Re: [Qemu-devel] [PULL for-1.8 1/2] pc: disable pci-info

2013-11-26 Thread Laszlo Ersek
On 11/18/13 12:53, Michael S. Tsirkin wrote:
 From: Igor Mammedov imamm...@redhat.com
 
 The BIOS that we ship in 1.7 does not use pci info
 from host and so far isn't going to use it.
 Taking in account problems it caused see 9604f70fdf and
 to avoid future incompatibility issues, it's safest to
 disable that interface by default for all machine types
 including 1.7 as it was never exposed/used by guest.
 And properly remove/cleanup it during 1.8 development cycle.
 
 Signed-off-by: Igor Mammedov imamm...@redhat.com
 Reviewed-by: Gerd Hoffmann kra...@redhat.com
 Reviewed-by: Michael S. Tsirkin m...@redhat.com
 Reviewed-by: Eduardo Habkost ehabk...@redhat.com
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  hw/i386/pc_piix.c | 2 +-
  hw/i386/pc_q35.c  | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

etc/pci-info is precisely the form and contents that OVMF needs to
download ACPI tables from qemu.

http://sourceforge.net/mailarchive/forum.php?thread_name=1385450282-27007-3-git-send-email-lersek%40redhat.comforum_name=edk2-devel

Please keep this exported in 1.8, for OVMF's sake.

Thanks
Laszlo



Re: [Qemu-devel] [PATCH v4 1/4] hw/timer: add sunxi timer device

2013-11-26 Thread Peter Crosthwaite
On Tue, Nov 26, 2013 at 5:22 PM, liguang lig.f...@cn.fujitsu.com wrote:
 Signed-off-by: liguang lig.f...@cn.fujitsu.com
 ---
  default-configs/arm-softmmu.mak |2 +
  hw/timer/Makefile.objs  |1 +
  hw/timer/sunxi-pit.c|  295 
 +++
  include/hw/timer/sunxi-pit.h|   37 +
  4 files changed, 335 insertions(+), 0 deletions(-)
  create mode 100644 hw/timer/sunxi-pit.c
  create mode 100644 include/hw/timer/sunxi-pit.h

 diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
 index a555eef..7bf5ad0 100644
 --- a/default-configs/arm-softmmu.mak
 +++ b/default-configs/arm-softmmu.mak
 @@ -81,3 +81,5 @@ CONFIG_VERSATILE_I2C=y

  CONFIG_SDHCI=y
  CONFIG_INTEGRATOR_DEBUG=y
 +
 +CONFIG_SUNXI_PIT=y
 diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs
 index eca5905..f7888e9 100644
 --- a/hw/timer/Makefile.objs
 +++ b/hw/timer/Makefile.objs
 @@ -27,3 +27,4 @@ obj-$(CONFIG_SH4) += sh_timer.o
  obj-$(CONFIG_TUSB6010) += tusb6010.o

  obj-$(CONFIG_MC146818RTC) += mc146818rtc.o
 +obj-$(CONFIG_SUNXI_PIT) += sunxi-pit.o
 diff --git a/hw/timer/sunxi-pit.c b/hw/timer/sunxi-pit.c
 new file mode 100644
 index 000..39b84ab
 --- /dev/null
 +++ b/hw/timer/sunxi-pit.c
 @@ -0,0 +1,295 @@
 +/*
 + * Allwinner sunxi timer device emulation
 + *
 + * Copyright (C) 2013 Li Guang
 + * Written by Li Guang lig.f...@cn.fujitsu.com
 + *
 + * This program is free software; you can redistribute it and/or modify it
 + * under the terms of the GNU General Public License as published by the
 + * Free Software Foundation; either version 2 of the License, or
 + * (at your option) any later version.
 + *
 + * This program is distributed in the hope that it will be useful, but 
 WITHOUT
 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 + * for more details.
 + */
 +
 +#include hw/sysbus.h
 +#include hw/ptimer.h
 +#include sysemu/sysemu.h
 +#include hw/timer/sunxi-pit.h
 +
 +
 +typedef struct SunxiTimer {
 +ptimer_state *timer;
 +} SunxiTimer;
 +

I don't understand the need for this struct. What was wrong with the
direct array of ptimers you had before?

 +typedef struct SunxiPITState {
 +/* private */
 +SysBusDevice parent_obj;
 +/* public */
 +qemu_irq irq[SUNXI_TIMER_NR];
 +SunxiTimer timer[SUNXI_TIMER_NR];
 +MemoryRegion iomem;
 +
 +uint32_t  irq_enable;
 +uint32_t irq_status;
 +uint32_t control[SUNXI_TIMER_NR];
 +uint32_t interval[SUNXI_TIMER_NR];
 +uint32_t count[SUNXI_TIMER_NR];
 +uint32_t watch_dog_mode;
 +uint32_t watch_dog_control;
 +uint32_t count_lo;
 +uint32_t count_hi;
 +uint32_t count_ctl;
 +} SunxiPITState;
 +
 +static uint64_t sunxi_pit_read(void *opaque, hwaddr offset, unsigned size)
 +{
 +SunxiPITState *s = SUNXI_PIT(opaque);
 +uint8_t index = 0;

initializer to 0 un-needed.

 +
 +switch (offset) {
 +case SUNXI_TIMER_IRQ_EN:
 +return s-irq_enable;
 +break;
 +case SUNXI_TIMER_IRQ_ST:
 +return s-irq_status;
 +break;
 +case SUNXI_TIMER_BASE ...  SUNXI_TIMER_BASE * 6 + SUNXI_TIMER_COUNT:
 +index = offset  0xf0;
 +index = 4;
 +index -= 1;
 +switch (offset  0x0f) {
 +case SUNXI_TIMER_CONTROL:
 +return s-control[index];
 +break;
 +case SUNXI_TIMER_INTERVAL:
 +return s-interval[index];
 +break;
 +case SUNXI_TIMER_COUNT: {
 +SunxiTimer *t = s-timer[index];
 +s-count[index] = ptimer_get_count(t-timer);
 +return s-count[index];
 +}
 +default:
 +break;
 +}
 +break;
 +case SUNXI_WDOG_CONTROL:
 +break;
 +case SUNXI_WDOG_MODE:
 +break;
 +case SUNXI_COUNT_LO:
 +return s-count_lo;
 +break;
 +case SUNXI_COUNT_HI:
 +return s-count_hi;
 +break;
 +case SUNXI_COUNT_CTL:
 +return s-count_ctl;
 +default:
 +break;

Usual to do a log_guest error here. Same for writes below.

 +}
 +
 +return 0;
 +}
 +
 +static void sunxi_pit_write(void *opaque, hwaddr offset, uint64_t value,
 +unsigned size)
 +{
 + SunxiPITState *s = SUNXI_PIT(opaque);
 + uint8_t index = 0;
 +
 +switch (offset) {
 +case SUNXI_TIMER_IRQ_EN:
 +s-irq_enable = value;
 +break;
 +case SUNXI_TIMER_IRQ_ST:
 +s-irq_status = value;

Are you missing a ~ ? This is a clear-to-clear semantic rather than
write-to-clear.

Also shouldn't this de-assert the relevant interrupt lines?

 +break;
 +case SUNXI_TIMER_BASE ...  SUNXI_TIMER_BASE * 6 + SUNXI_TIMER_COUNT:
 +index = offset  0xf0;
 +index = 4;
 +index -= 1;
 +switch (offset  0x0f) {
 +case SUNXI_TIMER_CONTROL: {
 +SunxiTimer 

[Qemu-devel] [PATCHv2 1.8 1/9] qemu-img: add support for skipping zeroes in input during convert

2013-11-26 Thread Peter Lieven
we currently do not check if a sector is allocated during convert.
This means if a sector is unallocated that we allocate a bounce
buffer of zeroes, find out its zero later and do not write it
in the best case. In the worst case this can lead to reading
blocks from a raw device (like iSCSI) altough we could easily
know via get_block_status that they are zero and simply skip them.

This patch also fixes the progress output not being at 100% after
a successful conversion.

Signed-off-by: Peter Lieven p...@kamp.de
---
 qemu-img.c |   85 ++--
 1 file changed, 48 insertions(+), 37 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index dc0c2f0..efb744c 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1125,13 +1125,15 @@ out3:
 
 static int img_convert(int argc, char **argv)
 {
-int c, ret = 0, n, n1, bs_n, bs_i, compress, cluster_size,
+int c, n, n1, bs_n, bs_i, compress, cluster_size,
 cluster_sectors, skip_create;
+int64_t ret = 0;
 int progress = 0, flags;
 const char *fmt, *out_fmt, *cache, *out_baseimg, *out_filename;
 BlockDriver *drv, *proto_drv;
 BlockDriverState **bs = NULL, *out_bs = NULL;
-int64_t total_sectors, nb_sectors, sector_num, bs_offset;
+int64_t total_sectors, nb_sectors, sector_num, bs_offset,
+sector_num_next_status = 0;
 uint64_t bs_sectors;
 uint8_t * buf = NULL;
 const uint8_t *buf1;
@@ -1140,7 +1142,6 @@ static int img_convert(int argc, char **argv)
 QEMUOptionParameter *out_baseimg_param;
 char *options = NULL;
 const char *snapshot_name = NULL;
-float local_progress = 0;
 int min_sparse = 8; /* Need at least 4k of zeros for sparse detection */
 bool quiet = false;
 Error *local_err = NULL;
@@ -1403,10 +1404,6 @@ static int img_convert(int argc, char **argv)
 sector_num = 0;
 
 nb_sectors = total_sectors;
-if (nb_sectors != 0) {
-local_progress = (float)100 /
-(nb_sectors / MIN(nb_sectors, cluster_sectors));
-}
 
 for(;;) {
 int64_t bs_num;
@@ -1464,7 +1461,7 @@ static int img_convert(int argc, char **argv)
 }
 }
 sector_num += n;
-qemu_progress_print(local_progress, 100);
+qemu_progress_print(100.0 * sector_num / total_sectors, 0);
 }
 /* signal EOF to align */
 bdrv_write_compressed(out_bs, 0, NULL, 0);
@@ -1481,21 +1478,13 @@ static int img_convert(int argc, char **argv)
 
 sector_num = 0; // total number of sectors converted so far
 nb_sectors = total_sectors - sector_num;
-if (nb_sectors != 0) {
-local_progress = (float)100 /
-(nb_sectors / MIN(nb_sectors, IO_BUF_SIZE / 512));
-}
 
 for(;;) {
 nb_sectors = total_sectors - sector_num;
 if (nb_sectors = 0) {
+ret = 0;
 break;
 }
-if (nb_sectors = (IO_BUF_SIZE / 512)) {
-n = (IO_BUF_SIZE / 512);
-} else {
-n = nb_sectors;
-}
 
 while (sector_num - bs_offset = bs_sectors) {
 bs_i ++;
@@ -1507,34 +1496,53 @@ static int img_convert(int argc, char **argv)
sector_num, bs_i, bs_offset, bs_sectors); */
 }
 
-if (n  bs_offset + bs_sectors - sector_num) {
-n = bs_offset + bs_sectors - sector_num;
-}
-
-/* If the output image is being created as a copy on write image,
-   assume that sectors which are unallocated in the input image
-   are present in both the output's and input's base images (no
-   need to copy them). */
-if (out_baseimg) {
-ret = bdrv_is_allocated(bs[bs_i], sector_num - bs_offset,
-n, n1);
+if ((out_baseimg || has_zero_init) 
+sector_num = sector_num_next_status) {
+n = nb_sectors  INT_MAX ? INT_MAX : nb_sectors;
+ret = bdrv_get_block_status(bs[bs_i], sector_num - bs_offset,
+n, n1);
 if (ret  0) {
-error_report(error while reading metadata for sector 
- % PRId64 : %s,
- sector_num - bs_offset, strerror(-ret));
+error_report(error while reading block status of sector %
+ PRId64 : %s, sector_num - bs_offset,
+ strerror(-ret));
 goto out;
 }
-if (!ret) {
+/* If the output image is zero initialized, we are not working
+ * on a shared base and the input is zero we can skip the next
+ * n1 sectors */
+if 

[Qemu-devel] [PATCHv2 1.8 4/9] block: add opt_transfer_length to BlockLimits

2013-11-26 Thread Peter Lieven
Signed-off-by: Peter Lieven p...@kamp.de
---
 include/block/block_int.h |3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 95140b6..6499516 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -242,6 +242,9 @@ typedef struct BlockLimits {
 
 /* optimal alignment for write zeroes requests in sectors */
 int64_t write_zeroes_alignment;
+
+/* optimal transfer length in sectors */
+int opt_transfer_length;
 } BlockLimits;
 
 /*
-- 
1.7.9.5




[Qemu-devel] [PATCHv2 1.8 2/9] qemu-img: fix usage instruction for qemu-img convert

2013-11-26 Thread Peter Lieven
Reviewed-by: Eric Blake ebl...@redhat.com
Signed-off-by: Peter Lieven p...@kamp.de
---
 qemu-img.c |1 -
 1 file changed, 1 deletion(-)

diff --git a/qemu-img.c b/qemu-img.c
index efb744c..e2d1a0a 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -105,7 +105,6 @@ static void help(void)
   conversion. If the number of bytes is 0, the source will 
not be scanned for\n
   unallocated or zero sectors, and the destination image will 
always be\n
   fully allocated\n
-  images will always be fully allocated\n
  '--output' takes the format in which the output must be done 
(human or json)\n
  '-n' skips the target volume creation (useful if the volume is 
created\n
   prior to running qemu-img)\n
-- 
1.7.9.5




[Qemu-devel] [PATCHv2 1.8 3/9] block/iscsi: set bdi-cluster_size

2013-11-26 Thread Peter Lieven
this patch aims to set bdi-cluster_size to the internal page size
of the iscsi target so that enabled callers can align requests
properly.

Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Peter Lieven p...@kamp.de
---
 block/iscsi.c |7 +++
 1 file changed, 7 insertions(+)

diff --git a/block/iscsi.c b/block/iscsi.c
index 93fee6d..75d6b87 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1580,6 +1580,13 @@ static int iscsi_get_info(BlockDriverState *bs, 
BlockDriverInfo *bdi)
 IscsiLun *iscsilun = bs-opaque;
 bdi-unallocated_blocks_are_zero = !!iscsilun-lbprz;
 bdi-can_write_zeroes_with_unmap = iscsilun-lbprz  iscsilun-lbp.lbpws;
+/* Guess the internal cluster (page) size of the iscsi target by the means
+ * of opt_unmap_gran. Transfer the unmap granularity only if it has a
+ * reasonable size for bdi-cluster_size */
+if (iscsilun-bl.opt_unmap_gran * iscsilun-block_size = 64 * 1024 
+iscsilun-bl.opt_unmap_gran * iscsilun-block_size = 16 * 1024 * 
1024) {
+bdi-cluster_size = iscsilun-bl.opt_unmap_gran * iscsilun-block_size;
+}
 return 0;
 }
 
-- 
1.7.9.5




[Qemu-devel] [PATCHv2 1.8 5/9] block/iscsi: set bs-bl.opt_transfer_length

2013-11-26 Thread Peter Lieven
Signed-off-by: Peter Lieven p...@kamp.de
---
 block/iscsi.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/iscsi.c b/block/iscsi.c
index 75d6b87..1109d8c 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1457,6 +1457,9 @@ static int iscsi_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 bs-bl.write_zeroes_alignment = 
sector_lun2qemu(iscsilun-bl.opt_unmap_gran,
 iscsilun);
+
+bs-bl.opt_transfer_length = sector_lun2qemu(iscsilun-bl.opt_xfer_len,
+ iscsilun);
 }
 
 #if defined(LIBISCSI_FEATURE_NOP_COUNTER)
-- 
1.7.9.5




[Qemu-devel] [PATCHv2 1.8 9/9] qemu-img: increase min_sparse to 128 sectors (64kb)

2013-11-26 Thread Peter Lieven
Suggested-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Peter Lieven p...@kamp.de
---
 qemu-img.c|4 ++--
 qemu-img.texi |2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index cc7540f..6f1179b 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -101,7 +101,7 @@ static void help(void)
  '-p' show progress of command (only certain commands)\n
  '-pp' show progress of command in sectors (only convert 
command)\n
  '-q' use Quiet mode - do not print any output (except errors)\n
- '-S' indicates the consecutive number of bytes (defaults to 4k) 
that must\n
+ '-S' indicates the consecutive number of bytes (defaults to 64k) 
that must\n
   contain only zeros for qemu-img to create a sparse image 
during\n
   conversion. If the number of bytes is 0, the source will 
not be scanned for\n
   unallocated or zero sectors, and the destination image will 
always be\n
@@ -1158,7 +1158,7 @@ static int img_convert(int argc, char **argv)
 QEMUOptionParameter *out_baseimg_param;
 char *options = NULL;
 const char *snapshot_name = NULL;
-int min_sparse = 8; /* Need at least 4k of zeros for sparse detection */
+int min_sparse = 128; /* Need at least 64k of zeros for sparse detection */
 bool quiet = false;
 Error *local_err = NULL;
 
diff --git a/qemu-img.texi b/qemu-img.texi
index cb4a3eb..c0e86ab 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -195,7 +195,7 @@ Image conversion is also useful to get smaller image when 
using a
 growable format such as @code{qcow} or @code{cow}: the empty sectors
 are detected and suppressed from the destination image.
 
-@var{sparse_size} indicates the consecutive number of bytes (defaults to 4k)
+@var{sparse_size} indicates the consecutive number of bytes (defaults to 64k)
 that must contain only zeros for qemu-img to create a sparse image during
 conversion. If @var{sparse_size} is 0, the source will not be scanned for
 unallocated or zero sectors, and the destination image will always be
-- 
1.7.9.5




[Qemu-devel] [PATCHv2 1.8 0/9] qemu-img convert optimizations

2013-11-26 Thread Peter Lieven
this series adds some optimizations for qemu-img during convert which
have been developed recently:
- skipping input based on get_block_status
- variable I/O buffer size
- align write requests to cluster_size
- show progress in sectors or percent

v1-v2:
  - introduce opt_transfer_length in BlockLimits [Paolo]
  - remove knobs for iobuffer_size and alignment and
use them unconditionally [Paolo]
  - calculate I/O buffer size by BlockLimits information [Paolo]
  - change the alignment patch to round down to the
last and not to the next aligned sector [Paolo]
  - limit updates in the sector progress output
  - new patch to increase the default for min_sparse [Paolo]

Peter Lieven (9):
  qemu-img: add support for skipping zeroes in input during convert
  qemu-img: fix usage instruction for qemu-img convert
  block/iscsi: set bdi-cluster_size
  block: add opt_transfer_length to BlockLimits
  block/iscsi: set bs-bl.opt_transfer_length
  qemu-img: dynamically adjust iobuffer size during convert
  qemu-img: round down request length to an aligned sector
  qemu-img: add option to show progress in sectors
  qemu-img: increase min_sparse to 128 sectors (64kb)

 block/iscsi.c |   10 +++
 include/block/block_int.h |3 +
 qemu-img-cmds.hx  |4 +-
 qemu-img.c|  159 ++---
 qemu-img.texi |6 +-
 5 files changed, 126 insertions(+), 56 deletions(-)

-- 
1.7.9.5




[Qemu-devel] [PATCHv2 1.8 6/9] qemu-img: dynamically adjust iobuffer size during convert

2013-11-26 Thread Peter Lieven
since the convert process is basically a sync operation it might
be benificial in some case to change the hardcoded I/O buffer
size to a greater value.

This patch increases the I/O buffer size if the output
driver advertises an optimal transfer length or discard alignment
that is greater than the default buffer size of 2M.

Signed-off-by: Peter Lieven p...@kamp.de
---
 qemu-img.c |   18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index e2d1a0a..b076d44 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1135,6 +1135,7 @@ static int img_convert(int argc, char **argv)
 sector_num_next_status = 0;
 uint64_t bs_sectors;
 uint8_t * buf = NULL;
+size_t bufsectors = IO_BUF_SIZE / BDRV_SECTOR_SIZE;
 const uint8_t *buf1;
 BlockDriverInfo bdi;
 QEMUOptionParameter *param = NULL, *create_options = NULL;
@@ -1371,7 +1372,16 @@ static int img_convert(int argc, char **argv)
 bs_i = 0;
 bs_offset = 0;
 bdrv_get_geometry(bs[0], bs_sectors);
-buf = qemu_blockalign(out_bs, IO_BUF_SIZE);
+
+/* increase bufsectors from the default 4096 (2M) if opt_transfer_length
+ * or discard_alignment of the out_bs is greater. Limit to 32768 (16MB)
+ * as maximum. */
+bufsectors = MIN(32768,
+ MAX(bufsectors, MAX(out_bs-bl.opt_transfer_length,
+ out_bs-bl.discard_alignment))
+);
+
+buf = qemu_blockalign(out_bs, bufsectors * BDRV_SECTOR_SIZE);
 
 if (skip_create) {
 int64_t output_length = bdrv_getlength(out_bs);
@@ -1394,7 +1404,7 @@ static int img_convert(int argc, char **argv)
 goto out;
 }
 cluster_size = bdi.cluster_size;
-if (cluster_size = 0 || cluster_size  IO_BUF_SIZE) {
+if (cluster_size = 0 || cluster_size  bufsectors * BDRV_SECTOR_SIZE) 
{
 error_report(invalid cluster size);
 ret = -1;
 goto out;
@@ -1531,8 +1541,8 @@ static int img_convert(int argc, char **argv)
 sector_num_next_status = sector_num + n1;
 }
 
-if (nb_sectors = (IO_BUF_SIZE / 512)) {
-n = (IO_BUF_SIZE / 512);
+if (nb_sectors = bufsectors) {
+n = bufsectors;
 } else {
 n = nb_sectors;
 }
-- 
1.7.9.5




[Qemu-devel] [PATCHv2 1.8 8/9] qemu-img: add option to show progress in sectors

2013-11-26 Thread Peter Lieven
Signed-off-by: Peter Lieven p...@kamp.de
---
 qemu-img-cmds.hx |4 ++--
 qemu-img.c   |   31 ---
 qemu-img.texi|4 +++-
 3 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index da1d965..6c8183b 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -34,9 +34,9 @@ STEXI
 ETEXI
 
 DEF(convert, img_convert,
-convert [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-O output_fmt] [-o 
options] [-s snapshot_name] [-S sparse_size] filename [filename2 [...]] 
output_filename)
+convert [-c] [-p|-pp] [-q] [-n] [-f fmt] [-t cache] [-O output_fmt] [-o 
options] [-s snapshot_name] [-S sparse_size] filename [filename2 [...]] 
output_filename)
 STEXI
-@item convert [-c] [-p] [-q] [-n] [-f @var{fmt}] [-t @var{cache}] [-O 
@var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S 
@var{sparse_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
+@item convert [-c] [-p|-pp] [-q] [-n] [-f @var{fmt}] [-t @var{cache}] [-O 
@var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S 
@var{sparse_size}] @var{filename} [@var{filename2} [...]] @var{output_filename}
 ETEXI
 
 DEF(info, img_info,
diff --git a/qemu-img.c b/qemu-img.c
index 1421f0f..cc7540f 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -99,6 +99,7 @@ static void help(void)
   rebasing in this case (useful for renaming the backing 
file)\n
  '-h' with or without a command shows this help and lists the 
supported formats\n
  '-p' show progress of command (only certain commands)\n
+ '-pp' show progress of command in sectors (only convert 
command)\n
  '-q' use Quiet mode - do not print any output (except errors)\n
  '-S' indicates the consecutive number of bytes (defaults to 4k) 
that must\n
   contain only zeros for qemu-img to create a sparse image 
during\n
@@ -1122,6 +1123,22 @@ out3:
 return ret;
 }
 
+static void print_sector_progress(int progress, int64_t sector_num,
+  int64_t total_sectors)
+{
+static int64_t last_sector = -1;
+if (progress == 2) {
+if (sector_num == 0 ||
+sector_num  last_sector + 0.02 * total_sectors ||
+sector_num == total_sectors) {
+printf(%ld of %ld sectors converted.\r, sector_num,
+   total_sectors);
+fflush(stdout);
+last_sector = sector_num;
+}
+}
+}
+
 static int img_convert(int argc, char **argv)
 {
 int c, n, n1, bs_n, bs_i, compress, cluster_sectors, skip_create;
@@ -1130,7 +1147,7 @@ static int img_convert(int argc, char **argv)
 const char *fmt, *out_fmt, *cache, *out_baseimg, *out_filename;
 BlockDriver *drv, *proto_drv;
 BlockDriverState **bs = NULL, *out_bs = NULL;
-int64_t total_sectors, nb_sectors, sector_num, bs_offset,
+int64_t total_sectors = 0, nb_sectors, sector_num, bs_offset,
 sector_num_next_status = 0;
 uint64_t bs_sectors;
 uint8_t * buf = NULL;
@@ -1201,7 +1218,7 @@ static int img_convert(int argc, char **argv)
 break;
 }
 case 'p':
-progress = 1;
+progress++;
 break;
 case 't':
 cache = optarg;
@@ -1227,7 +1244,7 @@ static int img_convert(int argc, char **argv)
 out_filename = argv[argc - 1];
 
 /* Initialize before goto out */
-qemu_progress_init(progress, 2.0);
+qemu_progress_init(progress == 1, 2.0);
 
 if (options  is_help_option(options)) {
 ret = print_block_option_help(out_filename, out_fmt);
@@ -1258,6 +1275,8 @@ static int img_convert(int argc, char **argv)
 total_sectors += bs_sectors;
 }
 
+print_sector_progress(progress, 0, total_sectors);
+
 if (snapshot_name != NULL) {
 if (bs_n  1) {
 error_report(No support for concatenating multiple snapshot);
@@ -1472,6 +1491,7 @@ static int img_convert(int argc, char **argv)
 }
 sector_num += n;
 qemu_progress_print(100.0 * sector_num / total_sectors, 0);
+print_sector_progress(progress, sector_num, total_sectors);
 }
 /* signal EOF to align */
 bdrv_write_compressed(out_bs, 0, NULL, 0);
@@ -1587,11 +1607,16 @@ static int img_convert(int argc, char **argv)
 buf1 += n1 * 512;
 }
 qemu_progress_print(100.0 * sector_num / total_sectors, 0);
+print_sector_progress(progress, sector_num, total_sectors);
 }
 }
 out:
 if (!ret) {
 qemu_progress_print(100, 0);
+print_sector_progress(progress, total_sectors, total_sectors);
+}
+if (progress == 2) {
+printf(\n);
 }
 qemu_progress_end();
 free_option_parameters(create_options);
diff --git a/qemu-img.texi b/qemu-img.texi
index da36975..cb4a3eb 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -54,6 +54,8 

[Qemu-devel] [PATCHv2 1.8 7/9] qemu-img: round down request length to an aligned sector

2013-11-26 Thread Peter Lieven
this patch shortens requests to end at an aligned sector so that
the next request starts aligned.

Signed-off-by: Peter Lieven p...@kamp.de
---
 qemu-img.c |   30 --
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index b076d44..1421f0f 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1124,8 +1124,7 @@ out3:
 
 static int img_convert(int argc, char **argv)
 {
-int c, n, n1, bs_n, bs_i, compress, cluster_size,
-cluster_sectors, skip_create;
+int c, n, n1, bs_n, bs_i, compress, cluster_sectors, skip_create;
 int64_t ret = 0;
 int progress = 0, flags;
 const char *fmt, *out_fmt, *cache, *out_baseimg, *out_filename;
@@ -1397,19 +1396,21 @@ static int img_convert(int argc, char **argv)
 }
 }
 
+cluster_sectors = 0;
+ret = bdrv_get_info(out_bs, bdi);
+if (ret  0  compress) {
+error_report(could not get block driver info);
+goto out;
+} else {
+cluster_sectors = bdi.cluster_size / BDRV_SECTOR_SIZE;
+}
+
 if (compress) {
-ret = bdrv_get_info(out_bs, bdi);
-if (ret  0) {
-error_report(could not get block driver info);
-goto out;
-}
-cluster_size = bdi.cluster_size;
-if (cluster_size = 0 || cluster_size  bufsectors * BDRV_SECTOR_SIZE) 
{
+if (cluster_sectors = 0 || cluster_sectors  bufsectors) {
 error_report(invalid cluster size);
 ret = -1;
 goto out;
 }
-cluster_sectors = cluster_size  9;
 sector_num = 0;
 
 nb_sectors = total_sectors;
@@ -1547,6 +1548,15 @@ static int img_convert(int argc, char **argv)
 n = nb_sectors;
 }
 
+/* round down request length to an aligned sector */
+if (cluster_sectors  0  n = cluster_sectors) {
+int64_t next_aligned_sector = (sector_num + n);
+next_aligned_sector -= next_aligned_sector % cluster_sectors;
+if (sector_num + n  next_aligned_sector) {
+n = next_aligned_sector - sector_num;
+}
+}
+
 if (n  bs_offset + bs_sectors - sector_num) {
 n = bs_offset + bs_sectors - sector_num;
 }
-- 
1.7.9.5




Re: [Qemu-devel] [PATCH v4 1/4] hw/timer: add sunxi timer device

2013-11-26 Thread Li Guang

Peter Crosthwaite wrote:

On Tue, Nov 26, 2013 at 5:22 PM, liguanglig.f...@cn.fujitsu.com  wrote:
   

Signed-off-by: liguanglig.f...@cn.fujitsu.com
---
  default-configs/arm-softmmu.mak |2 +
  hw/timer/Makefile.objs  |1 +
  hw/timer/sunxi-pit.c|  295 +++
  include/hw/timer/sunxi-pit.h|   37 +
  4 files changed, 335 insertions(+), 0 deletions(-)
  create mode 100644 hw/timer/sunxi-pit.c
  create mode 100644 include/hw/timer/sunxi-pit.h

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index a555eef..7bf5ad0 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -81,3 +81,5 @@ CONFIG_VERSATILE_I2C=y

  CONFIG_SDHCI=y
  CONFIG_INTEGRATOR_DEBUG=y
+
+CONFIG_SUNXI_PIT=y
diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs
index eca5905..f7888e9 100644
--- a/hw/timer/Makefile.objs
+++ b/hw/timer/Makefile.objs
@@ -27,3 +27,4 @@ obj-$(CONFIG_SH4) += sh_timer.o
  obj-$(CONFIG_TUSB6010) += tusb6010.o

  obj-$(CONFIG_MC146818RTC) += mc146818rtc.o
+obj-$(CONFIG_SUNXI_PIT) += sunxi-pit.o
diff --git a/hw/timer/sunxi-pit.c b/hw/timer/sunxi-pit.c
new file mode 100644
index 000..39b84ab
--- /dev/null
+++ b/hw/timer/sunxi-pit.c
@@ -0,0 +1,295 @@
+/*
+ * Allwinner sunxi timer device emulation
+ *
+ * Copyright (C) 2013 Li Guang
+ * Written by Li Guanglig.f...@cn.fujitsu.com
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include hw/sysbus.h
+#include hw/ptimer.h
+#include sysemu/sysemu.h
+#include hw/timer/sunxi-pit.h
+
+
+typedef struct SunxiTimer {
+ptimer_state *timer;
+} SunxiTimer;
+
 

I don't understand the need for this struct. What was wrong with the
direct array of ptimers you had before?
   


because I have to pack timer array into VMSD,
and there's no VMSTATE_PTIMER_ARRAY for ptimer_state.

   

+typedef struct SunxiPITState {
+/*  private*/
+SysBusDevice parent_obj;
+/*  public*/
+qemu_irq irq[SUNXI_TIMER_NR];
+SunxiTimer timer[SUNXI_TIMER_NR];
+MemoryRegion iomem;
+
+uint32_t  irq_enable;
+uint32_t irq_status;
+uint32_t control[SUNXI_TIMER_NR];
+uint32_t interval[SUNXI_TIMER_NR];
+uint32_t count[SUNXI_TIMER_NR];
+uint32_t watch_dog_mode;
+uint32_t watch_dog_control;
+uint32_t count_lo;
+uint32_t count_hi;
+uint32_t count_ctl;
+} SunxiPITState;
+
+static uint64_t sunxi_pit_read(void *opaque, hwaddr offset, unsigned size)
+{
+SunxiPITState *s = SUNXI_PIT(opaque);
+uint8_t index = 0;
 

initializer to 0 un-needed.

   


OK.

+
+switch (offset) {
+case SUNXI_TIMER_IRQ_EN:
+return s-irq_enable;
+break;
+case SUNXI_TIMER_IRQ_ST:
+return s-irq_status;
+break;
+case SUNXI_TIMER_BASE ...  SUNXI_TIMER_BASE * 6 + SUNXI_TIMER_COUNT:
+index = offset  0xf0;
+index= 4;
+index -= 1;
+switch (offset  0x0f) {
+case SUNXI_TIMER_CONTROL:
+return s-control[index];
+break;
+case SUNXI_TIMER_INTERVAL:
+return s-interval[index];
+break;
+case SUNXI_TIMER_COUNT: {
+SunxiTimer *t =s-timer[index];
+s-count[index] = ptimer_get_count(t-timer);
+return s-count[index];
+}
+default:
+break;
+}
+break;
+case SUNXI_WDOG_CONTROL:
+break;
+case SUNXI_WDOG_MODE:
+break;
+case SUNXI_COUNT_LO:
+return s-count_lo;
+break;
+case SUNXI_COUNT_HI:
+return s-count_hi;
+break;
+case SUNXI_COUNT_CTL:
+return s-count_ctl;
+default:
+break;
 

Usual to do a log_guest error here. Same for writes below.
   


OK.
   

+}
+
+return 0;
+}
+
+static void sunxi_pit_write(void *opaque, hwaddr offset, uint64_t value,
+unsigned size)
+{
+ SunxiPITState *s = SUNXI_PIT(opaque);
+ uint8_t index = 0;
+
+switch (offset) {
+case SUNXI_TIMER_IRQ_EN:
+s-irq_enable = value;
+break;
+case SUNXI_TIMER_IRQ_ST:
+s-irq_status= value;
 

Are you missing a ~ ? This is a clear-to-clear semantic rather than
write-to-clear.
   


yes

Also shouldn't this de-assert the relevant interrupt lines?

   


there's no related tips in sunxi SoC data-sheet,
and test is fine until now.


+break;
+case SUNXI_TIMER_BASE ...  SUNXI_TIMER_BASE * 6 + SUNXI_TIMER_COUNT:
+index = offset  

Re: [Qemu-devel] [PATCH for-1.7 0/2] block/drive-mirror: Reuse backing HD for sync=none

2013-11-26 Thread Paolo Bonzini
Il 25/11/2013 20:28, Max Reitz ha scritto:
 This series fixes the drive-mirror blockjob in case of none sync mode
 to always use the old (current) image file as the backing file of the
 newly created mirrored file (in case of absolute-paths mode).
 
 It is rather important to get this into 1.7, as we will introduce an at
 least pretty strange API in case the original file is unbacked
 otherwise.
 
 
 Max Reitz (2):
   block/drive-mirror: Reuse backing HD for sync=none
   qemu-iotests: Fix test 041

Reviewed-by: Paolo Bonzini pbonz...@redhat.com



Re: [Qemu-devel] [PATCH v4 2/4] hw/intc: add sunxi interrupt controller device

2013-11-26 Thread Peter Crosthwaite
On Tue, Nov 26, 2013 at 5:22 PM, liguang lig.f...@cn.fujitsu.com wrote:
 Signed-off-by: liguang lig.f...@cn.fujitsu.com
 ---
  default-configs/arm-softmmu.mak |1 +
  hw/intc/Makefile.objs   |1 +
  hw/intc/sunxi-pic.c |  238 
 +++
  include/hw/intc/sunxi-pic.h |   20 
  4 files changed, 260 insertions(+), 0 deletions(-)
  create mode 100644 hw/intc/sunxi-pic.c
  create mode 100644 include/hw/intc/sunxi-pic.h

 diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
 index 7bf5ad0..bbe00e4 100644
 --- a/default-configs/arm-softmmu.mak
 +++ b/default-configs/arm-softmmu.mak
 @@ -83,3 +83,4 @@ CONFIG_SDHCI=y
  CONFIG_INTEGRATOR_DEBUG=y

  CONFIG_SUNXI_PIT=y
 +CONFIG_SUNXI_PIC=y
 diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
 index 47ac442..dad8c43 100644
 --- a/hw/intc/Makefile.objs
 +++ b/hw/intc/Makefile.objs
 @@ -12,6 +12,7 @@ common-obj-$(CONFIG_IOAPIC) += ioapic_common.o
  common-obj-$(CONFIG_ARM_GIC) += arm_gic_common.o
  common-obj-$(CONFIG_ARM_GIC) += arm_gic.o
  common-obj-$(CONFIG_OPENPIC) += openpic.o
 +common-obj-$(CONFIG_SUNXI_PIC) += sunxi-pic.o

  obj-$(CONFIG_APIC) += apic.o apic_common.o
  obj-$(CONFIG_ARM_GIC_KVM) += arm_gic_kvm.o
 diff --git a/hw/intc/sunxi-pic.c b/hw/intc/sunxi-pic.c
 new file mode 100644
 index 000..e84fc55
 --- /dev/null
 +++ b/hw/intc/sunxi-pic.c
 @@ -0,0 +1,238 @@
 +/*
 + * Allwinner sunxi interrupt controller device emulation
 + *
 + * Copyright (C) 2013 Li Guang
 + * Written by Li Guang lig.f...@cn.fujitsu.com
 + *
 + * This program is free software; you can redistribute it and/or modify it
 + * under the terms of the GNU General Public License as published by the
 + * Free Software Foundation; either version 2 of the License, or
 + * (at your option) any later version.
 + *
 + * This program is distributed in the hope that it will be useful, but 
 WITHOUT
 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 + * for more details.
 + */
 +
 +#include hw/sysbus.h
 +#include hw/devices.h
 +#include sysemu/sysemu.h
 +#include hw/intc/sunxi-pic.h
 +
 +
 +typedef struct SunxiPICState {
 +/* private */
 +SysBusDevice parent_obj;
 +/* public */
 +MemoryRegion iomem;
 +qemu_irq parent_fiq;
 +qemu_irq parent_irq;

Blank line here for readability.

 +uint32_t vector;
 +uint32_t base_addr;
 +uint32_t protect;
 +uint32_t nmi;
 +uint32_t irq_pending[SUNXI_PIC_REG_IDX];
 +uint32_t fiq_pending[SUNXI_PIC_REG_IDX];

this appears to be constant 0. I cant find anywhere that sets fiq_pending.

 +uint32_t select[SUNXI_PIC_REG_IDX];
 +uint32_t enable[SUNXI_PIC_REG_IDX];
 +uint32_t mask[SUNXI_PIC_REG_IDX];
 +/*priority setting here*/
 +} SunxiPICState;
 +
 +static void sunxi_pic_update(SunxiPICState *s)
 +{
 +uint8_t i = 0, j = 0;

Initialisers un-needed.

 +bool irq = false;
 +
 +for (i = 0, j = 0; i  SUNXI_PIC_REG_IDX; i++) {
 +for (j = 0; j  32; j++) {
 +if (!test_bit(j, (void *)s-mask[i])) {

You can save on a level of indentation here with:

if (test_bit(j, (void *)s-mask[i])) {
continue;
}

 +if (test_bit(j, (void *)s-irq_pending[i])) {
 +qemu_set_irq(s-parent_irq, 1);

qemu_irq_raise() is simpler.

I can't find anywhere in the code where the interrupts are lowered.
How do they come down, once they are up?

 +irq = true;
 +}
 +if (test_bit(j, (void *)s-fiq_pending[i]) 
 +test_bit(j, (void *)s-select[i])) {
 +qemu_set_irq(s-parent_fiq, 1);
 +irq = true;
 +}
 +if (irq) {
 +goto out;

What happens if two interrupts are both active, the first setting IRQ
the second FIQ? Wont this escape logic cause FIQ raising to
potentionally be missed?

 +}
 +}
 +}
 +}
 +out:
 +return;
 +}
 +
 +static void sunxi_pic_set_irq(void *opaque, int irq, int level)
 +{
 +SunxiPICState *s = opaque;
 +
 +if (level) {
 +set_bit(irq, (void *)s-irq_pending[irq/32]);

set_bit(irq % 32, ...)

 +}

Is this supposed to set either fiq_pending or irq_pending depending on
the select bit?

 +sunxi_pic_update(s);
 +}
 +
 +static uint64_t sunxi_pic_read(void *opaque, hwaddr offset, unsigned size)
 +{
 +SunxiPICState *s = opaque;
 +uint8_t index = (offset  0xc)/4;
 +
 +switch (offset) {
 +case SUNXI_PIC_VECTOR:
 +return s-vector;
 +break;
 +case SUNXI_PIC_BASE_ADDR:
 +return s-base_addr;
 +break;
 +case SUNXI_PIC_PROTECT:
 +return s-protect;
 +break;
 +case SUNXI_PIC_NMI:
 +return s-nmi;
 +break;
 +case SUNXI_PIC_IRQ_PENDING ... SUNXI_PIC_IRQ_PENDING + 8:
 +

Re: [Qemu-devel] [PATCH rebased for-1.8] i386: pc: align gpa-hpa on 1GB boundary (v6)

2013-11-26 Thread Paolo Bonzini
Il 25/11/2013 22:05, Michael S. Tsirkin ha scritto:
  Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
  [Reorganize code, keep same logic. - Paolo]
  Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 BTW how about a unit-test for this?
 Can be something along the lines of the acpi tests:
 run BIOS, probe what it reported.
 

No, it can't because the change is not guest visible.  It is not enabled
for machine types =1.8 due to migration compatibility (the migration
stream uses offsets within the RAM block, and these change), but there
is no difference from the guest point of view.

Paolo



Re: [Qemu-devel] [PULL for-1.8 1/2] pc: disable pci-info

2013-11-26 Thread Michael S. Tsirkin
On Tue, Nov 26, 2013 at 09:12:50AM +0100, Laszlo Ersek wrote:
 On 11/18/13 12:53, Michael S. Tsirkin wrote:
  From: Igor Mammedov imamm...@redhat.com
  
  The BIOS that we ship in 1.7 does not use pci info
  from host and so far isn't going to use it.
  Taking in account problems it caused see 9604f70fdf and
  to avoid future incompatibility issues, it's safest to
  disable that interface by default for all machine types
  including 1.7 as it was never exposed/used by guest.
  And properly remove/cleanup it during 1.8 development cycle.
  
  Signed-off-by: Igor Mammedov imamm...@redhat.com
  Reviewed-by: Gerd Hoffmann kra...@redhat.com
  Reviewed-by: Michael S. Tsirkin m...@redhat.com
  Reviewed-by: Eduardo Habkost ehabk...@redhat.com
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   hw/i386/pc_piix.c | 2 +-
   hw/i386/pc_q35.c  | 2 +-
   2 files changed, 2 insertions(+), 2 deletions(-)
 
 etc/pci-info is precisely the form and contents that OVMF needs to
 download ACPI tables from qemu.
 
 http://sourceforge.net/mailarchive/forum.php?thread_name=1385450282-27007-3-git-send-email-lersek%40redhat.comforum_name=edk2-devel
 
 Please keep this exported in 1.8, for OVMF's sake.
 
 Thanks
 Laszlo

This pull request was misnamed, it was merged for 1.7.
Problem is pci-info can't be implemented correctly as defined:
for example we don't know where does MMCONFIG resize before
it is configured.
This patch was acked by several people so we'll need a stronger justification
for re-introducing it.
seabios manages to enumerate PCI with information exported from qemu
so why can't OVMF?

I think it's down to other qemu bugs (such as _CRS not covering
all of PCI memory), we shall just fix them.

-- 
MST



Re: [Qemu-devel] [PATCH v4 2/4] hw/intc: add sunxi interrupt controller device

2013-11-26 Thread Li Guang

Peter Crosthwaite wrote:

On Tue, Nov 26, 2013 at 5:22 PM, liguanglig.f...@cn.fujitsu.com  wrote:
   

Signed-off-by: liguanglig.f...@cn.fujitsu.com
---
  default-configs/arm-softmmu.mak |1 +
  hw/intc/Makefile.objs   |1 +
  hw/intc/sunxi-pic.c |  238 +++
  include/hw/intc/sunxi-pic.h |   20 
  4 files changed, 260 insertions(+), 0 deletions(-)
  create mode 100644 hw/intc/sunxi-pic.c
  create mode 100644 include/hw/intc/sunxi-pic.h

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index 7bf5ad0..bbe00e4 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -83,3 +83,4 @@ CONFIG_SDHCI=y
  CONFIG_INTEGRATOR_DEBUG=y

  CONFIG_SUNXI_PIT=y
+CONFIG_SUNXI_PIC=y
diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index 47ac442..dad8c43 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -12,6 +12,7 @@ common-obj-$(CONFIG_IOAPIC) += ioapic_common.o
  common-obj-$(CONFIG_ARM_GIC) += arm_gic_common.o
  common-obj-$(CONFIG_ARM_GIC) += arm_gic.o
  common-obj-$(CONFIG_OPENPIC) += openpic.o
+common-obj-$(CONFIG_SUNXI_PIC) += sunxi-pic.o

  obj-$(CONFIG_APIC) += apic.o apic_common.o
  obj-$(CONFIG_ARM_GIC_KVM) += arm_gic_kvm.o
diff --git a/hw/intc/sunxi-pic.c b/hw/intc/sunxi-pic.c
new file mode 100644
index 000..e84fc55
--- /dev/null
+++ b/hw/intc/sunxi-pic.c
@@ -0,0 +1,238 @@
+/*
+ * Allwinner sunxi interrupt controller device emulation
+ *
+ * Copyright (C) 2013 Li Guang
+ * Written by Li Guanglig.f...@cn.fujitsu.com
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include hw/sysbus.h
+#include hw/devices.h
+#include sysemu/sysemu.h
+#include hw/intc/sunxi-pic.h
+
+
+typedef struct SunxiPICState {
+/*  private*/
+SysBusDevice parent_obj;
+/*  public*/
+MemoryRegion iomem;
+qemu_irq parent_fiq;
+qemu_irq parent_irq;
 

Blank line here for readability.

   


OK

+uint32_t vector;
+uint32_t base_addr;
+uint32_t protect;
+uint32_t nmi;
+uint32_t irq_pending[SUNXI_PIC_REG_IDX];
+uint32_t fiq_pending[SUNXI_PIC_REG_IDX];
 

this appears to be constant 0. I cant find anywhere that sets fiq_pending.

   


here, only NMI can generate fiq,
and I did take care NMI now,
so there's no real case to set fiq.


+uint32_t select[SUNXI_PIC_REG_IDX];
+uint32_t enable[SUNXI_PIC_REG_IDX];
+uint32_t mask[SUNXI_PIC_REG_IDX];
+/*priority setting here*/
+} SunxiPICState;
+
+static void sunxi_pic_update(SunxiPICState *s)
+{
+uint8_t i = 0, j = 0;
 

Initialisers un-needed.

   

OK

+bool irq = false;
+
+for (i = 0, j = 0; i  SUNXI_PIC_REG_IDX; i++) {
+for (j = 0; j  32; j++) {
+if (!test_bit(j, (void *)s-mask[i])) {
 

You can save on a level of indentation here with:

if (test_bit(j, (void *)s-mask[i])) {
 continue;
}

   

OK

+if (test_bit(j, (void *)s-irq_pending[i])) {
+qemu_set_irq(s-parent_irq, 1);
 

qemu_irq_raise() is simpler.

I can't find anywhere in the code where the interrupts are lowered.
How do they come down, once they are up?

   





+irq = true;
+}
+if (test_bit(j, (void *)s-fiq_pending[i])
+test_bit(j, (void *)s-select[i])) {
+qemu_set_irq(s-parent_fiq, 1);
+irq = true;
+}
+if (irq) {
+goto out;
 

What happens if two interrupts are both active, the first setting IRQ
the second FIQ? Wont this escape logic cause FIQ raising to
potentionally be missed?

   

+}
+}
+}
+}
+out:
+return;
+}
+
+static void sunxi_pic_set_irq(void *opaque, int irq, int level)
+{
+SunxiPICState *s = opaque;
+
+if (level) {
+set_bit(irq, (void *)s-irq_pending[irq/32]);
 

set_bit(irq % 32, ...)

   


OK


+}
 

Is this supposed to set either fiq_pending or irq_pending depending on
the select bit?

   


Yes, but, I as I stated before, maybe I will take NMI later.

+sunxi_pic_update(s);
+}
+
+static uint64_t sunxi_pic_read(void *opaque, hwaddr offset, unsigned size)
+{
+SunxiPICState *s = opaque;
+uint8_t index = (offset  0xc)/4;
+
+switch (offset) {
+case SUNXI_PIC_VECTOR:
+return s-vector;
+break;
+case SUNXI_PIC_BASE_ADDR:
+return s-base_addr;
+break;
+case 

Re: [Qemu-devel] [PATCH v4 1/4] hw/timer: add sunxi timer device

2013-11-26 Thread Peter Crosthwaite
On Tue, Nov 26, 2013 at 6:59 PM, Li Guang lig.f...@cn.fujitsu.com wrote:
 Peter Crosthwaite wrote:

 On Tue, Nov 26, 2013 at 5:22 PM, liguanglig.f...@cn.fujitsu.com  wrote:


 Signed-off-by: liguanglig.f...@cn.fujitsu.com
 ---
   default-configs/arm-softmmu.mak |2 +
   hw/timer/Makefile.objs  |1 +
   hw/timer/sunxi-pit.c|  295
 +++
   include/hw/timer/sunxi-pit.h|   37 +
   4 files changed, 335 insertions(+), 0 deletions(-)
   create mode 100644 hw/timer/sunxi-pit.c
   create mode 100644 include/hw/timer/sunxi-pit.h

 diff --git a/default-configs/arm-softmmu.mak
 b/default-configs/arm-softmmu.mak
 index a555eef..7bf5ad0 100644
 --- a/default-configs/arm-softmmu.mak
 +++ b/default-configs/arm-softmmu.mak
 @@ -81,3 +81,5 @@ CONFIG_VERSATILE_I2C=y

   CONFIG_SDHCI=y
   CONFIG_INTEGRATOR_DEBUG=y
 +
 +CONFIG_SUNXI_PIT=y
 diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs
 index eca5905..f7888e9 100644
 --- a/hw/timer/Makefile.objs
 +++ b/hw/timer/Makefile.objs
 @@ -27,3 +27,4 @@ obj-$(CONFIG_SH4) += sh_timer.o
   obj-$(CONFIG_TUSB6010) += tusb6010.o

   obj-$(CONFIG_MC146818RTC) += mc146818rtc.o
 +obj-$(CONFIG_SUNXI_PIT) += sunxi-pit.o
 diff --git a/hw/timer/sunxi-pit.c b/hw/timer/sunxi-pit.c
 new file mode 100644
 index 000..39b84ab
 --- /dev/null
 +++ b/hw/timer/sunxi-pit.c
 @@ -0,0 +1,295 @@
 +/*
 + * Allwinner sunxi timer device emulation
 + *
 + * Copyright (C) 2013 Li Guang
 + * Written by Li Guanglig.f...@cn.fujitsu.com
 + *
 + * This program is free software; you can redistribute it and/or modify
 it
 + * under the terms of the GNU General Public License as published by the
 + * Free Software Foundation; either version 2 of the License, or
 + * (at your option) any later version.
 + *
 + * This program is distributed in the hope that it will be useful, but
 WITHOUT
 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 + * for more details.
 + */
 +
 +#include hw/sysbus.h
 +#include hw/ptimer.h
 +#include sysemu/sysemu.h
 +#include hw/timer/sunxi-pit.h
 +
 +
 +typedef struct SunxiTimer {
 +ptimer_state *timer;
 +} SunxiTimer;
 +


 I don't understand the need for this struct. What was wrong with the
 direct array of ptimers you had before?



 because I have to pack timer array into VMSD,
 and there's no VMSTATE_PTIMER_ARRAY for ptimer_state.


Anyway you can just use VMSTATE_STRUCT_ARRAY on ptimers own VMSD definition?

If you cant, then I think you make a reasonable case for moving the
relevant bits and pieces to headers so they are visible. That or
implement VMSTATE_PTIMER_ARRAY.




 +typedef struct SunxiPITState {
 +/*  private*/
 +SysBusDevice parent_obj;
 +/*  public*/
 +qemu_irq irq[SUNXI_TIMER_NR];
 +SunxiTimer timer[SUNXI_TIMER_NR];
 +MemoryRegion iomem;
 +
 +uint32_t  irq_enable;
 +uint32_t irq_status;
 +uint32_t control[SUNXI_TIMER_NR];
 +uint32_t interval[SUNXI_TIMER_NR];
 +uint32_t count[SUNXI_TIMER_NR];
 +uint32_t watch_dog_mode;
 +uint32_t watch_dog_control;
 +uint32_t count_lo;
 +uint32_t count_hi;
 +uint32_t count_ctl;
 +} SunxiPITState;
 +
 +static uint64_t sunxi_pit_read(void *opaque, hwaddr offset, unsigned
 size)
 +{
 +SunxiPITState *s = SUNXI_PIT(opaque);
 +uint8_t index = 0;


 initializer to 0 un-needed.




 OK.

 +
 +switch (offset) {
 +case SUNXI_TIMER_IRQ_EN:
 +return s-irq_enable;
 +break;
 +case SUNXI_TIMER_IRQ_ST:
 +return s-irq_status;
 +break;
 +case SUNXI_TIMER_BASE ...  SUNXI_TIMER_BASE * 6 + SUNXI_TIMER_COUNT:
 +index = offset  0xf0;

 +index= 4;
 +index -= 1;
 +switch (offset  0x0f) {

 +case SUNXI_TIMER_CONTROL:
 +return s-control[index];
 +break;
 +case SUNXI_TIMER_INTERVAL:
 +return s-interval[index];
 +break;
 +case SUNXI_TIMER_COUNT: {
 +SunxiTimer *t =s-timer[index];

 +s-count[index] = ptimer_get_count(t-timer);
 +return s-count[index];
 +}
 +default:
 +break;
 +}
 +break;
 +case SUNXI_WDOG_CONTROL:
 +break;
 +case SUNXI_WDOG_MODE:
 +break;
 +case SUNXI_COUNT_LO:
 +return s-count_lo;
 +break;
 +case SUNXI_COUNT_HI:
 +return s-count_hi;
 +break;
 +case SUNXI_COUNT_CTL:
 +return s-count_ctl;
 +default:
 +break;


 Usual to do a log_guest error here. Same for writes below.



 OK.



 +}
 +
 +return 0;
 +}
 +
 +static void sunxi_pit_write(void *opaque, hwaddr offset, uint64_t value,
 +unsigned size)
 +{
 + SunxiPITState *s = SUNXI_PIT(opaque);
 + uint8_t index = 0;
 +
 +switch (offset) {
 +case SUNXI_TIMER_IRQ_EN:
 +s-irq_enable = 

Re: [Qemu-devel] [PATCH v4 1/4] hw/timer: add sunxi timer device

2013-11-26 Thread Li Guang

Peter Crosthwaite wrote:

On Tue, Nov 26, 2013 at 6:59 PM, Li Guanglig.f...@cn.fujitsu.com  wrote:
   

Peter Crosthwaite wrote:
 

On Tue, Nov 26, 2013 at 5:22 PM, liguanglig.f...@cn.fujitsu.com   wrote:

   

Signed-off-by: liguanglig.f...@cn.fujitsu.com
---
   default-configs/arm-softmmu.mak |2 +
   hw/timer/Makefile.objs  |1 +
   hw/timer/sunxi-pit.c|  295
+++
   include/hw/timer/sunxi-pit.h|   37 +
   4 files changed, 335 insertions(+), 0 deletions(-)
   create mode 100644 hw/timer/sunxi-pit.c
   create mode 100644 include/hw/timer/sunxi-pit.h

diff --git a/default-configs/arm-softmmu.mak
b/default-configs/arm-softmmu.mak
index a555eef..7bf5ad0 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -81,3 +81,5 @@ CONFIG_VERSATILE_I2C=y

   CONFIG_SDHCI=y
   CONFIG_INTEGRATOR_DEBUG=y
+
+CONFIG_SUNXI_PIT=y
diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs
index eca5905..f7888e9 100644
--- a/hw/timer/Makefile.objs
+++ b/hw/timer/Makefile.objs
@@ -27,3 +27,4 @@ obj-$(CONFIG_SH4) += sh_timer.o
   obj-$(CONFIG_TUSB6010) += tusb6010.o

   obj-$(CONFIG_MC146818RTC) += mc146818rtc.o
+obj-$(CONFIG_SUNXI_PIT) += sunxi-pit.o
diff --git a/hw/timer/sunxi-pit.c b/hw/timer/sunxi-pit.c
new file mode 100644
index 000..39b84ab
--- /dev/null
+++ b/hw/timer/sunxi-pit.c
@@ -0,0 +1,295 @@
+/*
+ * Allwinner sunxi timer device emulation
+ *
+ * Copyright (C) 2013 Li Guang
+ * Written by Li Guanglig.f...@cn.fujitsu.com
+ *
+ * This program is free software; you can redistribute it and/or modify
it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include hw/sysbus.h
+#include hw/ptimer.h
+#include sysemu/sysemu.h
+#include hw/timer/sunxi-pit.h
+
+
+typedef struct SunxiTimer {
+ptimer_state *timer;
+} SunxiTimer;
+

 

I don't understand the need for this struct. What was wrong with the
direct array of ptimers you had before?

   


because I have to pack timer array into VMSD,
and there's no VMSTATE_PTIMER_ARRAY for ptimer_state.

 

Anyway you can just use VMSTATE_STRUCT_ARRAY on ptimers own VMSD definition?

If you cant, then I think you make a reasonable case for moving the
relevant bits and pieces to headers so they are visible. That or
implement VMSTATE_PTIMER_ARRAY.

   


right, but that seems be in a separated patch,
I consider to use current way, can I?
 
   

+typedef struct SunxiPITState {
+/*   private*/
+SysBusDevice parent_obj;
+/*   public*/
+qemu_irq irq[SUNXI_TIMER_NR];
+SunxiTimer timer[SUNXI_TIMER_NR];
+MemoryRegion iomem;
+
+uint32_t  irq_enable;
+uint32_t irq_status;
+uint32_t control[SUNXI_TIMER_NR];
+uint32_t interval[SUNXI_TIMER_NR];
+uint32_t count[SUNXI_TIMER_NR];
+uint32_t watch_dog_mode;
+uint32_t watch_dog_control;
+uint32_t count_lo;
+uint32_t count_hi;
+uint32_t count_ctl;
+} SunxiPITState;
+
+static uint64_t sunxi_pit_read(void *opaque, hwaddr offset, unsigned
size)
+{
+SunxiPITState *s = SUNXI_PIT(opaque);
+uint8_t index = 0;

 

initializer to 0 un-needed.


   


OK.
 

+
+switch (offset) {
+case SUNXI_TIMER_IRQ_EN:
+return s-irq_enable;
+break;
+case SUNXI_TIMER_IRQ_ST:
+return s-irq_status;
+break;
+case SUNXI_TIMER_BASE ...  SUNXI_TIMER_BASE * 6 + SUNXI_TIMER_COUNT:
+index = offset   0xf0;

+index= 4;
+index -= 1;
+switch (offset   0x0f) {

+case SUNXI_TIMER_CONTROL:
+return s-control[index];
+break;
+case SUNXI_TIMER_INTERVAL:
+return s-interval[index];
+break;
+case SUNXI_TIMER_COUNT: {
+SunxiTimer *t =s-timer[index];

+s-count[index] = ptimer_get_count(t-timer);
+return s-count[index];
+}
+default:
+break;
+}
+break;
+case SUNXI_WDOG_CONTROL:
+break;
+case SUNXI_WDOG_MODE:
+break;
+case SUNXI_COUNT_LO:
+return s-count_lo;
+break;
+case SUNXI_COUNT_HI:
+return s-count_hi;
+break;
+case SUNXI_COUNT_CTL:
+return s-count_ctl;
+default:
+break;

 

Usual to do a log_guest error here. Same for writes below.

   


OK.
 


   

+}
+
+return 0;
+}
+
+static void sunxi_pit_write(void *opaque, hwaddr offset, uint64_t value,
+unsigned size)
+{
+ SunxiPITState *s = SUNXI_PIT(opaque);
+ uint8_t 

Re: [Qemu-devel] [PATCH v4 2/7] block: Introduce op_blockers to BlockDriverState

2013-11-26 Thread Paolo Bonzini
Il 26/11/2013 03:07, Fam Zheng ha scritto:

 +void bdrv_op_unblock(BlockDriverState *bs, BlockOpType op, Error
 *reason)

 What about making BlockOpType a bitmask, and having bdrv_op_{,un}block
 take multiple ORed BlockOpTypes?

 bdrv_op_{,un}block_all then are not necessary, you only need a
 BLOCK_OPERATION_ALL value.

 
 Bitmap is not enough, at least it should be an array. For example when
 we enable multiple block jobs, there're two stoppers for drive_del, right?

I said bitmask, not bitmap. :)

So that you can add the same Error to multiple items of the array with a
single bdrv_op_block call (by ORing them into the second parameter).

Paolo



Re: [Qemu-devel] [PATCH v4 3/4] hw/arm: add sunxi machine type

2013-11-26 Thread Peter Crosthwaite
On Tue, Nov 26, 2013 at 5:22 PM, liguang lig.f...@cn.fujitsu.com wrote:
 Signed-off-by: liguang lig.f...@cn.fujitsu.com
 ---
  hw/arm/Makefile.objs |1 +
  hw/arm/sunxi-soc.c   |   98 
 ++
  2 files changed, 99 insertions(+), 0 deletions(-)
  create mode 100644 hw/arm/sunxi-soc.c

 diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
 index 3671b42..f9f3071 100644
 --- a/hw/arm/Makefile.objs
 +++ b/hw/arm/Makefile.objs
 @@ -5,3 +5,4 @@ obj-y += tosa.o versatilepb.o vexpress.o xilinx_zynq.o z2.o

  obj-y += armv7m.o exynos4210.o pxa2xx.o pxa2xx_gpio.o pxa2xx_pic.o
  obj-y += omap1.o omap2.o strongarm.o
 +obj-y += sunxi-soc.o
 diff --git a/hw/arm/sunxi-soc.c b/hw/arm/sunxi-soc.c
 new file mode 100644
 index 000..b45af6d
 --- /dev/null
 +++ b/hw/arm/sunxi-soc.c
 @@ -0,0 +1,98 @@
 +/*
 + * Allwinner sunxi series SoC emulation
 + *
 + * Copyright (C) 2013 Li Guang
 + * Written by Li Guang lig.f...@cn.fujitsu.com
 + *
 + * This program is free software; you can redistribute it and/or modify it
 + * under the terms of the GNU General Public License as published by the
 + * Free Software Foundation; either version 2 of the License, or
 + * (at your option) any later version.
 + *
 + * This program is distributed in the hope that it will be useful, but 
 WITHOUT
 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 + * for more details.
 + */
 +
 +#include hw/sysbus.h
 +#include hw/devices.h
 +#include hw/boards.h
 +#include hw/arm/arm.h
 +#include hw/ptimer.h
 +#include hw/char/serial.h
 +#include hw/timer/sunxi-pit.h
 +#include hw/intc/sunxi-pic.h
 +
 +#include sysemu/sysemu.h
 +#include exec/address-spaces.h
 +
 +
 +#define SUNXI_PIC_REG_BASE 0x01c20400
 +#define SUNXI_PIT_REG_BASE 0x01c20c00
 +#define SUNXI_UART0_REG_BASE 0x01c28000
 +
 +static struct arm_boot_info sunxi_binfo = {
 +.loader_start = 0x4000,
 +.board_id = 0x1008,
 +};
 +
 +static void sunxi_init(QEMUMachineInitArgs *args)

I would check with Andreas/PMM on what the go is with SoCs regarding
container devices and boards. My (vague) understanding is that SoCs
should be container devices and boards instantiate those containers
with off-chip connectivity. This seems flat to me, with everything on
board level.

 +{
 +ram_addr_t ram_size = args-ram_size;
 +const char *cpu_model = args-cpu_model;
 +const char *kernel_filename = args-kernel_filename;
 +const char *kernel_cmdline = args-kernel_cmdline;
 +ARMCPU *cpu;
 +MemoryRegion *address_space_mem = get_system_memory();
 +MemoryRegion *ram = g_new(MemoryRegion, 1);
 +MemoryRegion *ram_alias = g_new(MemoryRegion, 1);
 +qemu_irq pic[95];

[SUNXI_PIC_INT_NR]

Regards,
Peter

 +DeviceState *dev;
 +uint8_t i;
 +
 +/*here we currently support sunxi-4i*/
 +cpu_model = cortex-a8;
 +cpu = cpu_arm_init(cpu_model);
 +if (!cpu) {
 +fprintf(stderr, Unable to find CPU definition\n);
 +exit(1);
 +}
 +
 +memory_region_init_ram(ram, NULL, sunxi-soc.ram, ram_size);
 +memory_region_add_subregion(address_space_mem, 0, ram);
 +memory_region_init_alias(ram_alias, NULL, ram.alias, ram, 0, ram_size);
 +memory_region_add_subregion(address_space_mem, 0x4000, ram_alias);
 +
 +dev = sysbus_create_varargs(TYPE_SUNXI_PIC, SUNXI_PIC_REG_BASE,
 +qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_IRQ),
 +qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_FIQ),
 +NULL);
 +for (i = 0; i  SUNXI_PIC_INT_NR; i++) {
 +pic[i] = qdev_get_gpio_in(dev, i);
 +}
 +
 +sysbus_create_varargs(TYPE_SUNXI_PIT, SUNXI_PIT_REG_BASE, pic[22], 
 pic[23],
 +  pic[24], pic[25], pic[67], pic[68], NULL);
 +
 +serial_mm_init(address_space_mem, SUNXI_UART0_REG_BASE, 2, pic[1], 
 115200,
 +serial_hds[0], DEVICE_NATIVE_ENDIAN);
 +
 +sunxi_binfo.ram_size = ram_size;
 +sunxi_binfo.kernel_filename = kernel_filename;
 +sunxi_binfo.kernel_cmdline = kernel_cmdline;
 +arm_load_kernel(cpu, sunxi_binfo);
 +}
 +
 +static QEMUMachine sunxi_machine = {
 +.name = sunxi,
 +.desc = Allwinner's SoC (sunxi series),
 +.init = sunxi_init,
 +};
 +
 +static void sunxi_machine_init(void)
 +{
 +qemu_register_machine(sunxi_machine);
 +}
 +
 +machine_init(sunxi_machine_init);
 --
 1.7.2.5





Re: [Qemu-devel] [PATCH v4 1/4] hw/timer: add sunxi timer device

2013-11-26 Thread Peter Crosthwaite
On Tue, Nov 26, 2013 at 7:19 PM, Li Guang lig.f...@cn.fujitsu.com wrote:
 Peter Crosthwaite wrote:

 On Tue, Nov 26, 2013 at 6:59 PM, Li Guanglig.f...@cn.fujitsu.com  wrote:


 Peter Crosthwaite wrote:


 On Tue, Nov 26, 2013 at 5:22 PM, liguanglig.f...@cn.fujitsu.com
 wrote:



 Signed-off-by: liguanglig.f...@cn.fujitsu.com
 ---
default-configs/arm-softmmu.mak |2 +
hw/timer/Makefile.objs  |1 +
hw/timer/sunxi-pit.c|  295
 +++
include/hw/timer/sunxi-pit.h|   37 +
4 files changed, 335 insertions(+), 0 deletions(-)
create mode 100644 hw/timer/sunxi-pit.c
create mode 100644 include/hw/timer/sunxi-pit.h

 diff --git a/default-configs/arm-softmmu.mak
 b/default-configs/arm-softmmu.mak
 index a555eef..7bf5ad0 100644
 --- a/default-configs/arm-softmmu.mak
 +++ b/default-configs/arm-softmmu.mak
 @@ -81,3 +81,5 @@ CONFIG_VERSATILE_I2C=y

CONFIG_SDHCI=y
CONFIG_INTEGRATOR_DEBUG=y
 +
 +CONFIG_SUNXI_PIT=y
 diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs
 index eca5905..f7888e9 100644
 --- a/hw/timer/Makefile.objs
 +++ b/hw/timer/Makefile.objs
 @@ -27,3 +27,4 @@ obj-$(CONFIG_SH4) += sh_timer.o
obj-$(CONFIG_TUSB6010) += tusb6010.o

obj-$(CONFIG_MC146818RTC) += mc146818rtc.o
 +obj-$(CONFIG_SUNXI_PIT) += sunxi-pit.o
 diff --git a/hw/timer/sunxi-pit.c b/hw/timer/sunxi-pit.c
 new file mode 100644
 index 000..39b84ab
 --- /dev/null
 +++ b/hw/timer/sunxi-pit.c
 @@ -0,0 +1,295 @@
 +/*
 + * Allwinner sunxi timer device emulation
 + *
 + * Copyright (C) 2013 Li Guang
 + * Written by Li Guanglig.f...@cn.fujitsu.com
 + *
 + * This program is free software; you can redistribute it and/or
 modify
 it
 + * under the terms of the GNU General Public License as published by
 the
 + * Free Software Foundation; either version 2 of the License, or
 + * (at your option) any later version.
 + *
 + * This program is distributed in the hope that it will be useful, but
 WITHOUT
 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
 or
 + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
 License
 + * for more details.
 + */
 +
 +#include hw/sysbus.h
 +#include hw/ptimer.h
 +#include sysemu/sysemu.h
 +#include hw/timer/sunxi-pit.h
 +
 +
 +typedef struct SunxiTimer {
 +ptimer_state *timer;
 +} SunxiTimer;
 +



 I don't understand the need for this struct. What was wrong with the
 direct array of ptimers you had before?




 because I have to pack timer array into VMSD,
 and there's no VMSTATE_PTIMER_ARRAY for ptimer_state.



 Anyway you can just use VMSTATE_STRUCT_ARRAY on ptimers own VMSD
 definition?

 If you cant, then I think you make a reasonable case for moving the
 relevant bits and pieces to headers so they are visible. That or
 implement VMSTATE_PTIMER_ARRAY.




 right, but that seems be in a separated patch,
 I consider to use current way, can I?


Well you have effectively implemented VMSTATE_PTIMER_ARRAY here
locally. Its probably best to just fix ptimer as first patch in this
series and it keeps your device model cleaner from the start. You may
have already done the hard work.

Regards,
Peter



Re: [Qemu-devel] [PATCHv2 1.8 2/9] qemu-img: fix usage instruction for qemu-img convert

2013-11-26 Thread Paolo Bonzini
Il 26/11/2013 09:56, Peter Lieven ha scritto:
 Reviewed-by: Eric Blake ebl...@redhat.com
 Signed-off-by: Peter Lieven p...@kamp.de
 ---
  qemu-img.c |1 -
  1 file changed, 1 deletion(-)
 
 diff --git a/qemu-img.c b/qemu-img.c
 index efb744c..e2d1a0a 100644
 --- a/qemu-img.c
 +++ b/qemu-img.c
 @@ -105,7 +105,6 @@ static void help(void)
conversion. If the number of bytes is 0, the source will 
 not be scanned for\n
unallocated or zero sectors, and the destination image 
 will always be\n
fully allocated\n
 -  images will always be fully allocated\n
   '--output' takes the format in which the output must be done 
 (human or json)\n
   '-n' skips the target volume creation (useful if the volume is 
 created\n
prior to running qemu-img)\n
 

Reviewed-by: Paolo Bonzini pbonz...@redhat.com



Re: [Qemu-devel] [PATCHv2 1.8 5/9] block/iscsi: set bs-bl.opt_transfer_length

2013-11-26 Thread Paolo Bonzini
Il 26/11/2013 09:56, Peter Lieven ha scritto:
 Signed-off-by: Peter Lieven p...@kamp.de
 ---
  block/iscsi.c |3 +++
  1 file changed, 3 insertions(+)
 
 diff --git a/block/iscsi.c b/block/iscsi.c
 index 75d6b87..1109d8c 100644
 --- a/block/iscsi.c
 +++ b/block/iscsi.c
 @@ -1457,6 +1457,9 @@ static int iscsi_open(BlockDriverState *bs, QDict 
 *options, int flags,
  }
  bs-bl.write_zeroes_alignment = 
 sector_lun2qemu(iscsilun-bl.opt_unmap_gran,
  iscsilun);
 +
 +bs-bl.opt_transfer_length = 
 sector_lun2qemu(iscsilun-bl.opt_xfer_len,
 + iscsilun);
  }
  
  #if defined(LIBISCSI_FEATURE_NOP_COUNTER)
 

Reviewed-by: Paolo Bonzini pbonz...@redhat.com



Re: [Qemu-devel] [PATCHv2 1.8 4/9] block: add opt_transfer_length to BlockLimits

2013-11-26 Thread Paolo Bonzini
Il 26/11/2013 09:56, Peter Lieven ha scritto:
 Signed-off-by: Peter Lieven p...@kamp.de
 ---
  include/block/block_int.h |3 +++
  1 file changed, 3 insertions(+)
 
 diff --git a/include/block/block_int.h b/include/block/block_int.h
 index 95140b6..6499516 100644
 --- a/include/block/block_int.h
 +++ b/include/block/block_int.h
 @@ -242,6 +242,9 @@ typedef struct BlockLimits {
  
  /* optimal alignment for write zeroes requests in sectors */
  int64_t write_zeroes_alignment;
 +
 +/* optimal transfer length in sectors */
 +int opt_transfer_length;
  } BlockLimits;
  
  /*
 

Reviewed-by: Paolo Bonzini pbonz...@redhat.com



Re: [Qemu-devel] [PATCHv2 1.8 6/9] qemu-img: dynamically adjust iobuffer size during convert

2013-11-26 Thread Paolo Bonzini
Il 26/11/2013 09:56, Peter Lieven ha scritto:
 since the convert process is basically a sync operation it might
 be benificial in some case to change the hardcoded I/O buffer
 size to a greater value.
 
 This patch increases the I/O buffer size if the output
 driver advertises an optimal transfer length or discard alignment
 that is greater than the default buffer size of 2M.
 
 Signed-off-by: Peter Lieven p...@kamp.de
 ---
  qemu-img.c |   18 ++
  1 file changed, 14 insertions(+), 4 deletions(-)
 
 diff --git a/qemu-img.c b/qemu-img.c
 index e2d1a0a..b076d44 100644
 --- a/qemu-img.c
 +++ b/qemu-img.c
 @@ -1135,6 +1135,7 @@ static int img_convert(int argc, char **argv)
  sector_num_next_status = 0;
  uint64_t bs_sectors;
  uint8_t * buf = NULL;
 +size_t bufsectors = IO_BUF_SIZE / BDRV_SECTOR_SIZE;
  const uint8_t *buf1;
  BlockDriverInfo bdi;
  QEMUOptionParameter *param = NULL, *create_options = NULL;
 @@ -1371,7 +1372,16 @@ static int img_convert(int argc, char **argv)
  bs_i = 0;
  bs_offset = 0;
  bdrv_get_geometry(bs[0], bs_sectors);
 -buf = qemu_blockalign(out_bs, IO_BUF_SIZE);
 +
 +/* increase bufsectors from the default 4096 (2M) if opt_transfer_length
 + * or discard_alignment of the out_bs is greater. Limit to 32768 (16MB)
 + * as maximum. */
 +bufsectors = MIN(32768,
 + MAX(bufsectors, MAX(out_bs-bl.opt_transfer_length,
 + out_bs-bl.discard_alignment))
 +);
 +
 +buf = qemu_blockalign(out_bs, bufsectors * BDRV_SECTOR_SIZE);
  
  if (skip_create) {
  int64_t output_length = bdrv_getlength(out_bs);
 @@ -1394,7 +1404,7 @@ static int img_convert(int argc, char **argv)
  goto out;
  }
  cluster_size = bdi.cluster_size;
 -if (cluster_size = 0 || cluster_size  IO_BUF_SIZE) {
 +if (cluster_size = 0 || cluster_size  bufsectors * 
 BDRV_SECTOR_SIZE) {
  error_report(invalid cluster size);
  ret = -1;
  goto out;
 @@ -1531,8 +1541,8 @@ static int img_convert(int argc, char **argv)
  sector_num_next_status = sector_num + n1;
  }
  
 -if (nb_sectors = (IO_BUF_SIZE / 512)) {
 -n = (IO_BUF_SIZE / 512);
 +if (nb_sectors = bufsectors) {
 +n = bufsectors;
  } else {
  n = nb_sectors;
  }
 

Reviewed-by: Paolo Bonzini pbonz...@redhat.com



Re: [Qemu-devel] [PATCHv2 1.8 7/9] qemu-img: round down request length to an aligned sector

2013-11-26 Thread Paolo Bonzini
Il 26/11/2013 09:56, Peter Lieven ha scritto:
 this patch shortens requests to end at an aligned sector so that
 the next request starts aligned.
 
 Signed-off-by: Peter Lieven p...@kamp.de
 ---
  qemu-img.c |   30 --
  1 file changed, 20 insertions(+), 10 deletions(-)
 
 diff --git a/qemu-img.c b/qemu-img.c
 index b076d44..1421f0f 100644
 --- a/qemu-img.c
 +++ b/qemu-img.c
 @@ -1124,8 +1124,7 @@ out3:
  
  static int img_convert(int argc, char **argv)
  {
 -int c, n, n1, bs_n, bs_i, compress, cluster_size,
 -cluster_sectors, skip_create;
 +int c, n, n1, bs_n, bs_i, compress, cluster_sectors, skip_create;
  int64_t ret = 0;
  int progress = 0, flags;
  const char *fmt, *out_fmt, *cache, *out_baseimg, *out_filename;
 @@ -1397,19 +1396,21 @@ static int img_convert(int argc, char **argv)
  }
  }
  
 +cluster_sectors = 0;
 +ret = bdrv_get_info(out_bs, bdi);
 +if (ret  0  compress) {
 +error_report(could not get block driver info);
 +goto out;
 +} else {
 +cluster_sectors = bdi.cluster_size / BDRV_SECTOR_SIZE;
 +}
 +
  if (compress) {
 -ret = bdrv_get_info(out_bs, bdi);
 -if (ret  0) {
 -error_report(could not get block driver info);
 -goto out;
 -}
 -cluster_size = bdi.cluster_size;
 -if (cluster_size = 0 || cluster_size  bufsectors * 
 BDRV_SECTOR_SIZE) {
 +if (cluster_sectors = 0 || cluster_sectors  bufsectors) {
  error_report(invalid cluster size);
  ret = -1;
  goto out;
  }
 -cluster_sectors = cluster_size  9;
  sector_num = 0;
  
  nb_sectors = total_sectors;
 @@ -1547,6 +1548,15 @@ static int img_convert(int argc, char **argv)
  n = nb_sectors;
  }
  
 +/* round down request length to an aligned sector */

If you have to respin, /* do not bother doing this on short requests.
They happen when we found an all-zero area, and the next sector to write
will not be sector_num + n.  */

 +if (cluster_sectors  0  n = cluster_sectors) {
 +int64_t next_aligned_sector = (sector_num + n);
 +next_aligned_sector -= next_aligned_sector % cluster_sectors;
 +if (sector_num + n  next_aligned_sector) {
 +n = next_aligned_sector - sector_num;
 +}
 +}
 +
  if (n  bs_offset + bs_sectors - sector_num) {
  n = bs_offset + bs_sectors - sector_num;
  }
 

Reviewed-by: Paolo Bonzini pbonz...@redhat.com



Re: [Qemu-devel] [PATCHv2 1.8 9/9] qemu-img: increase min_sparse to 128 sectors (64kb)

2013-11-26 Thread Paolo Bonzini
Il 26/11/2013 09:56, Peter Lieven ha scritto:
 Suggested-by: Paolo Bonzini pbonz...@redhat.com
 Signed-off-by: Peter Lieven p...@kamp.de
 ---
  qemu-img.c|4 ++--
  qemu-img.texi |2 +-
  2 files changed, 3 insertions(+), 3 deletions(-)
 
 diff --git a/qemu-img.c b/qemu-img.c
 index cc7540f..6f1179b 100644
 --- a/qemu-img.c
 +++ b/qemu-img.c
 @@ -101,7 +101,7 @@ static void help(void)
   '-p' show progress of command (only certain commands)\n
   '-pp' show progress of command in sectors (only convert 
 command)\n
   '-q' use Quiet mode - do not print any output (except 
 errors)\n
 - '-S' indicates the consecutive number of bytes (defaults to 
 4k) that must\n
 + '-S' indicates the consecutive number of bytes (defaults to 
 64k) that must\n
contain only zeros for qemu-img to create a sparse image 
 during\n
conversion. If the number of bytes is 0, the source will 
 not be scanned for\n
unallocated or zero sectors, and the destination image 
 will always be\n
 @@ -1158,7 +1158,7 @@ static int img_convert(int argc, char **argv)
  QEMUOptionParameter *out_baseimg_param;
  char *options = NULL;
  const char *snapshot_name = NULL;
 -int min_sparse = 8; /* Need at least 4k of zeros for sparse detection */
 +int min_sparse = 128; /* Need at least 64k of zeros for sparse detection 
 */
  bool quiet = false;
  Error *local_err = NULL;
  
 diff --git a/qemu-img.texi b/qemu-img.texi
 index cb4a3eb..c0e86ab 100644
 --- a/qemu-img.texi
 +++ b/qemu-img.texi
 @@ -195,7 +195,7 @@ Image conversion is also useful to get smaller image when 
 using a
  growable format such as @code{qcow} or @code{cow}: the empty sectors
  are detected and suppressed from the destination image.
  
 -@var{sparse_size} indicates the consecutive number of bytes (defaults to 4k)
 +@var{sparse_size} indicates the consecutive number of bytes (defaults to 64k)
  that must contain only zeros for qemu-img to create a sparse image during
  conversion. If @var{sparse_size} is 0, the source will not be scanned for
  unallocated or zero sectors, and the destination image will always be
 

Reviewed-by: Paolo Bonzini pbonz...@redhat.com



Re: [Qemu-devel] [PATCH 14/16] pc: Postpone adding ACPI and SMBIOS to fw_cfg

2013-11-26 Thread Michael S. Tsirkin
On Thu, Nov 14, 2013 at 07:40:31AM -0600, Corey Minyard wrote:
 On 11/14/2013 07:38 AM, Michael S. Tsirkin wrote:
  On Thu, Nov 14, 2013 at 07:28:00AM -0600, Corey Minyard wrote:
  On 11/14/2013 01:30 AM, Michael S. Tsirkin wrote:
  On Tue, Nov 12, 2013 at 10:33:13AM -0600, Corey Minyard wrote:
  Postpone the addition of the ACPI and SMBIOS tables until after
  device initialization.  This allows devices to add entries to these
  tables.
 
  Signed-off-by: Corey Minyard cminy...@mvsita.com
  Why delay adding FW_CFG_ACPI_TABLES?
  These are normally specified by user so they are constant.
  Because the IPMI device has to dynamically add an entry based on its
  configuration.
  Where? Which patch does this? It would be a strange thing to do
  because FW_CFG_ACPI_TABLES is normally intended for user-supplied tables
  (though q35 misused it for other purposes).
 
 I don't have that patch out yet.  I had written some code to dynamically
 generate ACPI tables, but that was rejected and from what I understand
 something was done recently so these tables can be dynamically generated
 some other way.  I haven't had time to look into this, but I wanted to
 get the main patch set out first.
 
 I know that these tables are generally fixed, but the entry for IPMI
 should only be present if the device is specified, and its contents
 depend on the configuration supplied,
 
 -corey

OK need some help with that?
Main options are

- write code in ASL, supply entry in ACPI always, patch some fields
  to enable/disable it dynamically

see how pvpanic entry is disabled for an example

- write code in ASL append it to ACPI if necessary

see how CPU entries are added depending on number of CPUs for an example

- generate code in AML directly

see acpi based pci hotplug on pci branch in my tree for an example

 
  As it is the tables get added to the BIOS access before
  devices initialize.
 
  -corey
 
  ---
   hw/i386/pc.c | 38 ++
   1 file changed, 30 insertions(+), 8 deletions(-)
 
  diff --git a/hw/i386/pc.c b/hw/i386/pc.c
  index dee409d..765c95e 100644
  --- a/hw/i386/pc.c
  +++ b/hw/i386/pc.c
  @@ -607,8 +607,6 @@ static unsigned int pc_apic_id_limit(unsigned int 
  max_cpus)
   static FWCfgState *bochs_bios_init(void)
   {
   FWCfgState *fw_cfg;
  -uint8_t *smbios_table;
  -size_t smbios_len;
   uint64_t *numa_fw_cfg;
   int i, j;
   unsigned int apic_id_limit = pc_apic_id_limit(max_cpus);
  @@ -631,14 +629,8 @@ static FWCfgState *bochs_bios_init(void)
   fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)apic_id_limit);
   fw_cfg_add_i32(fw_cfg, FW_CFG_ID, 1);
   fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
  -fw_cfg_add_bytes(fw_cfg, FW_CFG_ACPI_TABLES,
  - acpi_tables, acpi_tables_len);
   fw_cfg_add_i32(fw_cfg, FW_CFG_IRQ0_OVERRIDE, 
  kvm_allows_irq0_override());
   
  -smbios_table = smbios_get_table(smbios_len);
  -if (smbios_table)
  -fw_cfg_add_bytes(fw_cfg, FW_CFG_SMBIOS_ENTRIES,
  - smbios_table, smbios_len);
   fw_cfg_add_bytes(fw_cfg, FW_CFG_E820_TABLE,
e820_table, sizeof(e820_table));
   
  @@ -1127,6 +1119,31 @@ void pc_acpi_init(const char *default_dsdt)
   }
   }
   
  +struct pc_bios_post_init {
  +Notifier post_init;
  +void *fw_cfg;
  +};
  +
  +/* Add the ACPI and SMBIOS tables after all the hardware has been 
  initialized.
  + * This gives devices a chance to add to those tables.
  + */
  +static void pc_bios_post_initfn(Notifier *n, void *opaque)
  +{
  +struct pc_bios_post_init *p = container_of(n, struct 
  pc_bios_post_init,
  +   post_init);
  +uint8_t *smbios_table;
  +size_t smbios_len;
  +
  +fw_cfg_add_bytes(p-fw_cfg, FW_CFG_ACPI_TABLES,
  + acpi_tables, acpi_tables_len);
  +smbios_table = smbios_get_table(smbios_len);
  +if (smbios_table)
  +fw_cfg_add_bytes(p-fw_cfg, FW_CFG_SMBIOS_ENTRIES,
  + smbios_table, smbios_len);
  +}
  +
  +static struct pc_bios_post_init post_init;
  +
   FWCfgState *pc_memory_init(MemoryRegion *system_memory,
  const char *kernel_filename,
  const char *kernel_cmdline,
  @@ -1196,6 +1213,11 @@ FWCfgState *pc_memory_init(MemoryRegion 
  *system_memory,
   rom_add_option(option_rom[i].name, option_rom[i].bootindex);
   }
   guest_info-fw_cfg = fw_cfg;
  +
  +post_init.fw_cfg = fw_cfg;
  +post_init.post_init.notify = pc_bios_post_initfn;
  +qemu_add_machine_init_done_notifier(post_init.post_init);
  +
   return fw_cfg;
   }
   
  -- 
  1.8.3.1



Re: [Qemu-devel] os x boot broken by commit 11948748495841bd54721b250d68c7b3cb0475ef

2013-11-26 Thread Michael S. Tsirkin
On Thu, Nov 21, 2013 at 05:02:07PM -0500, Gabriel L. Somlo wrote:
 Added qemu-devel, since that is where this stuff belongs now. Everyone
 else, sorry for the dupe...
 
 On Thu, Nov 21, 2013 at 07:14:27PM +0100, Paolo Bonzini wrote:
  Can you remind us about your DSDT modifications?  It should be possible
  to patch the HPET and applesmc bits appropriately from QEMU (or to move
  them from the DSDT to an SSDT that is built entirely in QEMU).
  
  It actually isn't impossible that Mac OS X would boot just fine with 1.8...
 
 My current DSDT patch (against QEMU) is enclosed below. The HPET
 basically needs IRQNoFlags() {2, 8}, which causes XP to bluescreen.
 
 So, I've made it conditional on the SMC STA method returning success
 (0x0B).
 
 The SMC node's STA method returns 0x0B unconditionally on real
 hardware. So I was planning on figuring out what's easier in the
 context of the most recent QEMU code base:
 
   1. dynamically generating (during qemu runtime initialization)
a DSDT entry for SMC with hardcoded 0x0B STA method, whenever
   --device isa-applesmc is present on the qemu command line
 
 or
 
   2. writing a static (compile-time) SMC node but with a slightly
   smarter _STA method, which returns 0x0B when --device isa-applesmc
   was given on the cmdline, or which returns 0x00 in the absence
   of --device isa-applesmc.
 
 Either 1. or 2. could be used with HPET -- I can make inclusion of
 IRQNoFlags dependent on either the success or on the presence of
 SMC._STA() :)
 
 Let me know what you think.
 
 Thanks,
 --Gabriel

Without discussing whether this is the right thing to do,
main options to implement this would be:

- write code in ASL, supply entry in ACPI always, patch some
  fields
  to enable/disable it dynamically

see how pvpanic entry is disabled for an example

- write code in ASL append it to ACPI if necessary

see how CPU entries are added depending on number of CPUs for an example

- generate code in AML directly

see acpi based pci hotplug on pci branch in my tree for an example




Re: [Qemu-devel] [PATCHv2 1.8 1/9] qemu-img: add support for skipping zeroes in input during convert

2013-11-26 Thread Paolo Bonzini
Il 26/11/2013 09:56, Peter Lieven ha scritto:
 we currently do not check if a sector is allocated during convert.
 This means if a sector is unallocated that we allocate a bounce
 buffer of zeroes, find out its zero later and do not write it
 in the best case. In the worst case this can lead to reading
 blocks from a raw device (like iSCSI) altough we could easily
 know via get_block_status that they are zero and simply skip them.
 
 This patch also fixes the progress output not being at 100% after
 a successful conversion.
 
 Signed-off-by: Peter Lieven p...@kamp.de
 ---
  qemu-img.c |   85 
 ++--
  1 file changed, 48 insertions(+), 37 deletions(-)
 
 diff --git a/qemu-img.c b/qemu-img.c
 index dc0c2f0..efb744c 100644
 --- a/qemu-img.c
 +++ b/qemu-img.c
 @@ -1125,13 +1125,15 @@ out3:
  
  static int img_convert(int argc, char **argv)
  {
 -int c, ret = 0, n, n1, bs_n, bs_i, compress, cluster_size,
 +int c, n, n1, bs_n, bs_i, compress, cluster_size,
  cluster_sectors, skip_create;
 +int64_t ret = 0;
  int progress = 0, flags;
  const char *fmt, *out_fmt, *cache, *out_baseimg, *out_filename;
  BlockDriver *drv, *proto_drv;
  BlockDriverState **bs = NULL, *out_bs = NULL;
 -int64_t total_sectors, nb_sectors, sector_num, bs_offset;
 +int64_t total_sectors, nb_sectors, sector_num, bs_offset,
 +sector_num_next_status = 0;
  uint64_t bs_sectors;
  uint8_t * buf = NULL;
  const uint8_t *buf1;
 @@ -1140,7 +1142,6 @@ static int img_convert(int argc, char **argv)
  QEMUOptionParameter *out_baseimg_param;
  char *options = NULL;
  const char *snapshot_name = NULL;
 -float local_progress = 0;
  int min_sparse = 8; /* Need at least 4k of zeros for sparse detection */
  bool quiet = false;
  Error *local_err = NULL;
 @@ -1403,10 +1404,6 @@ static int img_convert(int argc, char **argv)
  sector_num = 0;
  
  nb_sectors = total_sectors;
 -if (nb_sectors != 0) {
 -local_progress = (float)100 /
 -(nb_sectors / MIN(nb_sectors, cluster_sectors));
 -}
  
  for(;;) {
  int64_t bs_num;
 @@ -1464,7 +1461,7 @@ static int img_convert(int argc, char **argv)
  }
  }
  sector_num += n;
 -qemu_progress_print(local_progress, 100);
 +qemu_progress_print(100.0 * sector_num / total_sectors, 0);
  }
  /* signal EOF to align */
  bdrv_write_compressed(out_bs, 0, NULL, 0);
 @@ -1481,21 +1478,13 @@ static int img_convert(int argc, char **argv)
  
  sector_num = 0; // total number of sectors converted so far
  nb_sectors = total_sectors - sector_num;
 -if (nb_sectors != 0) {
 -local_progress = (float)100 /
 -(nb_sectors / MIN(nb_sectors, IO_BUF_SIZE / 512));
 -}
  
  for(;;) {
  nb_sectors = total_sectors - sector_num;
  if (nb_sectors = 0) {
 +ret = 0;
  break;
  }
 -if (nb_sectors = (IO_BUF_SIZE / 512)) {
 -n = (IO_BUF_SIZE / 512);
 -} else {
 -n = nb_sectors;
 -}
  
  while (sector_num - bs_offset = bs_sectors) {
  bs_i ++;
 @@ -1507,34 +1496,53 @@ static int img_convert(int argc, char **argv)
 sector_num, bs_i, bs_offset, bs_sectors); */
  }
  
 -if (n  bs_offset + bs_sectors - sector_num) {
 -n = bs_offset + bs_sectors - sector_num;
 -}
 -
 -/* If the output image is being created as a copy on write image,
 -   assume that sectors which are unallocated in the input image
 -   are present in both the output's and input's base images (no
 -   need to copy them). */
 -if (out_baseimg) {
 -ret = bdrv_is_allocated(bs[bs_i], sector_num - bs_offset,
 -n, n1);
 +if ((out_baseimg || has_zero_init) 
 +sector_num = sector_num_next_status) {
 +n = nb_sectors  INT_MAX ? INT_MAX : nb_sectors;
 +ret = bdrv_get_block_status(bs[bs_i], sector_num - bs_offset,
 +n, n1);
  if (ret  0) {
 -error_report(error while reading metadata for sector 
 - % PRId64 : %s,
 - sector_num - bs_offset, strerror(-ret));
 +error_report(error while reading block status of sector 
 %
 + PRId64 : %s, sector_num - bs_offset,
 + strerror(-ret));
  goto out;
  }
 -if (!ret) {
 +/* If the output image is zero 

Re: [Qemu-devel] [PATCHv2 1.8 8/9] qemu-img: add option to show progress in sectors

2013-11-26 Thread Paolo Bonzini
I think the right way to do this would be to add the functionality to
qemu-progress.c (i.e. pass a number of sectors and let it choose between
printing % or sectors).

I don't like the qemu-progress.c API anyway, so a rewrite would be
welcome. :)

Paolo

Il 26/11/2013 09:56, Peter Lieven ha scritto:
 Signed-off-by: Peter Lieven p...@kamp.de
 ---
  qemu-img-cmds.hx |4 ++--
  qemu-img.c   |   31 ---
  qemu-img.texi|4 +++-
  3 files changed, 33 insertions(+), 6 deletions(-)
 
 diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
 index da1d965..6c8183b 100644
 --- a/qemu-img-cmds.hx
 +++ b/qemu-img-cmds.hx
 @@ -34,9 +34,9 @@ STEXI
  ETEXI
  
  DEF(convert, img_convert,
 -convert [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-O output_fmt] [-o 
 options] [-s snapshot_name] [-S sparse_size] filename [filename2 [...]] 
 output_filename)
 +convert [-c] [-p|-pp] [-q] [-n] [-f fmt] [-t cache] [-O output_fmt] [-o 
 options] [-s snapshot_name] [-S sparse_size] filename [filename2 [...]] 
 output_filename)
  STEXI
 -@item convert [-c] [-p] [-q] [-n] [-f @var{fmt}] [-t @var{cache}] [-O 
 @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S 
 @var{sparse_size}] @var{filename} [@var{filename2} [...]] 
 @var{output_filename}
 +@item convert [-c] [-p|-pp] [-q] [-n] [-f @var{fmt}] [-t @var{cache}] [-O 
 @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S 
 @var{sparse_size}] @var{filename} [@var{filename2} [...]] 
 @var{output_filename}
  ETEXI
  
  DEF(info, img_info,
 diff --git a/qemu-img.c b/qemu-img.c
 index 1421f0f..cc7540f 100644
 --- a/qemu-img.c
 +++ b/qemu-img.c
 @@ -99,6 +99,7 @@ static void help(void)
rebasing in this case (useful for renaming the backing 
 file)\n
   '-h' with or without a command shows this help and lists the 
 supported formats\n
   '-p' show progress of command (only certain commands)\n
 + '-pp' show progress of command in sectors (only convert 
 command)\n
   '-q' use Quiet mode - do not print any output (except 
 errors)\n
   '-S' indicates the consecutive number of bytes (defaults to 
 4k) that must\n
contain only zeros for qemu-img to create a sparse image 
 during\n
 @@ -1122,6 +1123,22 @@ out3:
  return ret;
  }
  
 +static void print_sector_progress(int progress, int64_t sector_num,
 +  int64_t total_sectors)
 +{
 +static int64_t last_sector = -1;
 +if (progress == 2) {
 +if (sector_num == 0 ||
 +sector_num  last_sector + 0.02 * total_sectors ||
 +sector_num == total_sectors) {
 +printf(%ld of %ld sectors converted.\r, sector_num,
 +   total_sectors);
 +fflush(stdout);
 +last_sector = sector_num;
 +}
 +}
 +}
 +
  static int img_convert(int argc, char **argv)
  {
  int c, n, n1, bs_n, bs_i, compress, cluster_sectors, skip_create;
 @@ -1130,7 +1147,7 @@ static int img_convert(int argc, char **argv)
  const char *fmt, *out_fmt, *cache, *out_baseimg, *out_filename;
  BlockDriver *drv, *proto_drv;
  BlockDriverState **bs = NULL, *out_bs = NULL;
 -int64_t total_sectors, nb_sectors, sector_num, bs_offset,
 +int64_t total_sectors = 0, nb_sectors, sector_num, bs_offset,
  sector_num_next_status = 0;
  uint64_t bs_sectors;
  uint8_t * buf = NULL;
 @@ -1201,7 +1218,7 @@ static int img_convert(int argc, char **argv)
  break;
  }
  case 'p':
 -progress = 1;
 +progress++;
  break;
  case 't':
  cache = optarg;
 @@ -1227,7 +1244,7 @@ static int img_convert(int argc, char **argv)
  out_filename = argv[argc - 1];
  
  /* Initialize before goto out */
 -qemu_progress_init(progress, 2.0);
 +qemu_progress_init(progress == 1, 2.0);
  
  if (options  is_help_option(options)) {
  ret = print_block_option_help(out_filename, out_fmt);
 @@ -1258,6 +1275,8 @@ static int img_convert(int argc, char **argv)
  total_sectors += bs_sectors;
  }
  
 +print_sector_progress(progress, 0, total_sectors);
 +
  if (snapshot_name != NULL) {
  if (bs_n  1) {
  error_report(No support for concatenating multiple snapshot);
 @@ -1472,6 +1491,7 @@ static int img_convert(int argc, char **argv)
  }
  sector_num += n;
  qemu_progress_print(100.0 * sector_num / total_sectors, 0);
 +print_sector_progress(progress, sector_num, total_sectors);
  }
  /* signal EOF to align */
  bdrv_write_compressed(out_bs, 0, NULL, 0);
 @@ -1587,11 +1607,16 @@ static int img_convert(int argc, char **argv)
  buf1 += n1 * 512;
  }
  qemu_progress_print(100.0 * sector_num / total_sectors, 0);
 +print_sector_progress(progress, sector_num, 

Re: [Qemu-devel] os x boot broken by commit 11948748495841bd54721b250d68c7b3cb0475ef

2013-11-26 Thread Michael S. Tsirkin
On Fri, Nov 22, 2013 at 10:00:20AM +0100, Paolo Bonzini wrote:
 Il 21/11/2013 23:02, Gabriel L. Somlo ha scritto:
  On Thu, Nov 21, 2013 at 07:14:27PM +0100, Paolo Bonzini wrote:
   Can you remind us about your DSDT modifications?  It should be possible
   to patch the HPET and applesmc bits appropriately from QEMU (or to move
   them from the DSDT to an SSDT that is built entirely in QEMU).
   
   It actually isn't impossible that Mac OS X would boot just fine with 
   1.8...
  My current DSDT patch (against QEMU) is enclosed below. The HPET
  basically needs IRQNoFlags() {2, 8}, which causes XP to bluescreen.
 
 The IRQNoFlags(){2,8} setting makes sense if the general configuration
 register of the HPET has bits 0..1=1 (HPET enabled = 1 and HPET legacy
 replacement route = 1).
 
 That would be something like
 
  Field(HPTM, DWordAcc, Lock, Preserve) {
  VEND, 32,
  PRD, 32,
  UNUS, 32
  GCNF, 32
  }
 
  ...
 
  Method(_CRS, 0) {
 Store(GCNF, Local0)
 If (LEqual(LAnd(Local0, 3), 3)) {   // Legacy replacement route
  ConcatenateResTemplate(RESP, RESI, Local1)
  Return (Local1)
 } else {
  Return (RESP)
 }
 }

Which reminds me. We run C preprocessor over the source so there's
no good reason to use 4-byte names anymore really (iasl also has
an integrated preprocessor but that only appeared in 2012, not
sure it's wise to rely on that).

So simply
#define HPET_MEMORY HPTM

and use HPET_MEMORY everywhere.

 If that doesn't work, there are various choices here...
 
 (1) Does Mac OS work if you add a _PRS with IRQNoFlags and
 Memory32Fixed, but leave _CRS as it is?
 
 (2) does it work with -no-hpet?
 
 (3) you could also make that dependent on _OSI(Darwin).  It's unlikely
 that Linux and/or Windows expose _OSI(Darwin), and anyway the BSOD is
 only there for Windows XP as I understand it.
 
  So, I've made it conditional on the SMC STA method returning success
  (0x0B).
 
 That would mean that running Windows XP on Mac OS X hardware breaks,
 though.
 
  The SMC node's STA method returns 0x0B unconditionally on real
  hardware. So I was planning on figuring out what's easier in the
  context of the most recent QEMU code base:
  
  1. dynamically generating (during qemu runtime initialization)
   a DSDT entry for SMC with hardcoded 0x0B STA method, whenever
  --device isa-applesmc is present on the qemu command line
  
  or
  
  2. writing a static (compile-time) SMC node but with a slightly
  smarter _STA method, which returns 0x0B when --device isa-applesmc
  was given on the cmdline, or which returns 0x00 in the absence
  of --device isa-applesmc.
 
 Either would work.  See acpi_get_misc_info and patch_ssdt in
 hw/i386/acpi-build.c.
 
 I think device-dependent ACPI stuff should become a QOM interface, but
 you need not do that.
 
 Paolo



Re: [Qemu-devel] [PATCH v4 2/7] block: Introduce op_blockers to BlockDriverState

2013-11-26 Thread Paolo Bonzini
Il 26/11/2013 11:19, Fam Zheng ha scritto:
 So that you can add the same Error to multiple items of the array with a
 single bdrv_op_block call (by ORing them into the second parameter).
 
 What data type to contain this? I'm not sure 64 is enough in long term...

I would just use an uint64_t, I think you have ~20 operations now.
There is still quite some breathing room.

Paolo




Re: [Qemu-devel] [PATCH v4 2/7] block: Introduce op_blockers to BlockDriverState

2013-11-26 Thread Fam Zheng
On Tue, Nov 26, 2013 at 6:25 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 26/11/2013 11:19, Fam Zheng ha scritto:
 So that you can add the same Error to multiple items of the array with a
 single bdrv_op_block call (by ORing them into the second parameter).

 What data type to contain this? I'm not sure 64 is enough in long term...

 I would just use an uint64_t, I think you have ~20 operations now.
 There is still quite some breathing room.

OK. Then, can we still use QAPI enum? Because I think being possible
to query blockers makes sense, that way the user doesn't have to trial
and error to know what operation is blocked.

Fam



Re: [Qemu-devel] [PATCH v4 2/7] block: Introduce op_blockers to BlockDriverState

2013-11-26 Thread Fam Zheng
On Tue, Nov 26, 2013 at 5:22 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 26/11/2013 03:07, Fam Zheng ha scritto:

 +void bdrv_op_unblock(BlockDriverState *bs, BlockOpType op, Error
 *reason)

 What about making BlockOpType a bitmask, and having bdrv_op_{,un}block
 take multiple ORed BlockOpTypes?

 bdrv_op_{,un}block_all then are not necessary, you only need a
 BLOCK_OPERATION_ALL value.


 Bitmap is not enough, at least it should be an array. For example when
 we enable multiple block jobs, there're two stoppers for drive_del, right?

 I said bitmask, not bitmap. :)


OK. Sorry..

 So that you can add the same Error to multiple items of the array with a
 single bdrv_op_block call (by ORing them into the second parameter).


What data type to contain this? I'm not sure 64 is enough in long term...

Fam



Re: [Qemu-devel] [PATCH v4 2/7] block: Introduce op_blockers to BlockDriverState

2013-11-26 Thread Paolo Bonzini
Il 26/11/2013 11:31, Fam Zheng ha scritto:
  I would just use an uint64_t, I think you have ~20 operations now.
  There is still quite some breathing room.
 
 OK. Then, can we still use QAPI enum? Because I think being possible
 to query blockers makes sense, that way the user doesn't have to trial
 and error to know what operation is blocked.

Yes, we can.  You would still have to add a bunch of

#define BLOCK_OPERATION_MASK_RESIZE (1ULL  BLOCK_OPERATION_TYPE_RESIZE)

in block.h.

Paolo



[Qemu-devel] [PATCH] qcow2: Zero-initialise first cluster for new images

2013-11-26 Thread Kevin Wolf
Strictly speaking, this is only required for has_zero_init() == false,
but it's easy enough to just do a cluster-aligned write that is padded
with zeros after the header.

This fixes that after 'qemu-img create' header extensions are attempted
to be parsed that are really just random leftover data.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block/qcow2.c | 37 +
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 6e5d98d..7c18587 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1471,7 +1471,7 @@ static int qcow2_create2(const char *filename, int64_t 
total_size,
  * size for any qcow2 image.
  */
 BlockDriverState* bs;
-QCowHeader header;
+QCowHeader *header;
 uint8_t* refcount_table;
 Error *local_err = NULL;
 int ret;
@@ -1489,30 +1489,35 @@ static int qcow2_create2(const char *filename, int64_t 
total_size,
 }
 
 /* Write the header */
-memset(header, 0, sizeof(header));
-header.magic = cpu_to_be32(QCOW_MAGIC);
-header.version = cpu_to_be32(version);
-header.cluster_bits = cpu_to_be32(cluster_bits);
-header.size = cpu_to_be64(0);
-header.l1_table_offset = cpu_to_be64(0);
-header.l1_size = cpu_to_be32(0);
-header.refcount_table_offset = cpu_to_be64(cluster_size);
-header.refcount_table_clusters = cpu_to_be32(1);
-header.refcount_order = cpu_to_be32(3 + REFCOUNT_SHIFT);
-header.header_length = cpu_to_be32(sizeof(header));
+QEMU_BUILD_BUG_ON((1  MIN_CLUSTER_BITS)  sizeof(*header));
+header = g_malloc(cluster_size);
+memset(header, 0, cluster_size);
+*header = (QCowHeader) {
+.magic  = cpu_to_be32(QCOW_MAGIC),
+.version= cpu_to_be32(version),
+.cluster_bits   = cpu_to_be32(cluster_bits),
+.size   = cpu_to_be64(0),
+.l1_table_offset= cpu_to_be64(0),
+.l1_size= cpu_to_be32(0),
+.refcount_table_offset  = cpu_to_be64(cluster_size),
+.refcount_table_clusters= cpu_to_be32(1),
+.refcount_order = cpu_to_be32(3 + REFCOUNT_SHIFT),
+.header_length  = cpu_to_be32(sizeof(*header)),
+};
 
 if (flags  BLOCK_FLAG_ENCRYPT) {
-header.crypt_method = cpu_to_be32(QCOW_CRYPT_AES);
+header-crypt_method = cpu_to_be32(QCOW_CRYPT_AES);
 } else {
-header.crypt_method = cpu_to_be32(QCOW_CRYPT_NONE);
+header-crypt_method = cpu_to_be32(QCOW_CRYPT_NONE);
 }
 
 if (flags  BLOCK_FLAG_LAZY_REFCOUNTS) {
-header.compatible_features |=
+header-compatible_features |=
 cpu_to_be64(QCOW2_COMPAT_LAZY_REFCOUNTS);
 }
 
-ret = bdrv_pwrite(bs, 0, header, sizeof(header));
+ret = bdrv_pwrite(bs, 0, header, cluster_size);
+g_free(header);
 if (ret  0) {
 error_setg_errno(errp, -ret, Could not write qcow2 header);
 goto out;
-- 
1.8.1.4




Re: [Qemu-devel] [PULL for-1.8 1/2] pc: disable pci-info

2013-11-26 Thread Gerd Hoffmann
  Hi,

 I think it's down to other qemu bugs (such as _CRS not covering
 all of PCI memory), we shall just fix them.

i.e. the attached patch should fixup things.

cheers,
  Gerd

From a81b8d66e24fd298ce7654d424a378337e6cf132 Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann kra...@redhat.com
Date: Tue, 26 Nov 2013 11:46:11 +0100
Subject: [PATCH] piix: fix 32bit pci hole

Make the 32bit pci hole start at end of ram, so all possible address
space is covered.  Of course the firmware can use less than that.
Leaving space unused is no problem, mapping pci bars outside the
hole causes problems though.

Signed-off-by: Gerd Hoffmann kra...@redhat.com
---
 hw/pci-host/piix.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
index edc974e..1414a2b 100644
--- a/hw/pci-host/piix.c
+++ b/hw/pci-host/piix.c
@@ -345,15 +345,7 @@ PCIBus *i440fx_init(PCII440FXState **pi440fx_state,
 f-ram_memory = ram_memory;
 
 i440fx = I440FX_PCI_HOST_BRIDGE(dev);
-/* Set PCI window size the way seabios has always done it. */
-/* Power of 2 so bios can cover it with a single MTRR */
-if (ram_size = 0x8000) {
-i440fx-pci_info.w32.begin = 0x8000;
-} else if (ram_size = 0xc000) {
-i440fx-pci_info.w32.begin = 0xc000;
-} else {
-i440fx-pci_info.w32.begin = 0xe000;
-}
+i440fx-pci_info.w32.begin = ram_size;
 
 memory_region_init_alias(f-pci_hole, OBJECT(d), pci-hole, f-pci_address_space,
  pci_hole_start, pci_hole_size);
-- 
1.8.3.1



Re: [Qemu-devel] [PATCH v4 7/7] block: Allow backup on referenced named BlockDriverState

2013-11-26 Thread Kevin Wolf
Am 26.11.2013 um 04:06 hat Fam Zheng geschrieben:
 On 2013年11月25日 19:23, Kevin Wolf wrote:
 Am 22.11.2013 um 06:24 hat Fam Zheng geschrieben:
 Drive backup is a read only operation on source bs. We want to allow
 this specific case to enable image-fleecing. Note that when
 image-fleecing job starts, the job still add its blocker to source bs,
 and any other operation on it will be blocked by that.
 
 Signed-off-by: Fam Zheng f...@redhat.com
 ---
   block.c | 2 ++
   1 file changed, 2 insertions(+)
 
 diff --git a/block.c b/block.c
 index a5da656..d30be51 100644
 --- a/block.c
 +++ b/block.c
 @@ -1179,6 +1179,8 @@ int bdrv_open(BlockDriverState *bs, const char 
 *filename, QDict *options,
  device is used as backing hd of '%s',
  bs-device_name);
   bdrv_op_block_all(bs-backing_hd, bs-backing_blocker);
 +bdrv_op_unblock(bs-backing_hd, BLOCK_OP_TYPE_BACKUP,
 +bs-backing_blocker);
   pstrcpy(bs-backing_file, sizeof(bs-backing_file),
   bs-backing_hd-filename);
   pstrcpy(bs-backing_format, sizeof(bs-backing_format),
 
 We probably need separate blockers for can be a backup source and can
 be a backup target. Because I think this allows using it as a
 read-write target as well, which was not intended.
 
 
 Yes. Will do it.
 
 Do we need to cover this in other parts of the code as well, like when
 adding a new BDS during external snapshot creation?
 
 
 Does it have a name? If not I think we are safe there.

Not yet, but I think it won't be long until we do have named nodes
created this way, so considering it now certainly can't hurt.

Kevin



Re: [Qemu-devel] [PATCH v4 2/7] block: Introduce op_blockers to BlockDriverState

2013-11-26 Thread Fam Zheng
On Tue, Nov 26, 2013 at 6:41 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 26/11/2013 11:31, Fam Zheng ha scritto:
  I would just use an uint64_t, I think you have ~20 operations now.
  There is still quite some breathing room.
 
 OK. Then, can we still use QAPI enum? Because I think being possible
 to query blockers makes sense, that way the user doesn't have to trial
 and error to know what operation is blocked.

 Yes, we can.  You would still have to add a bunch of

 #define BLOCK_OPERATION_MASK_RESIZE (1ULL  BLOCK_OPERATION_TYPE_RESIZE)

 in block.h.


Well that works, but then we have a repeated list of operations in two
places, I'm not sure that would be good to have.

Kevin, Stefan, any opinion?

Fam



Re: [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram

2013-11-26 Thread Lei Li

On 11/25/2013 05:48 PM, Paolo Bonzini wrote:

Il 25/11/2013 08:29, Lei Li ha scritto:


In this case, if the migration would fail just because the misconfiguration
of device state on destination, in the meantime the outgoing migration has
no aware of this failure, I think it should add such handling (like synchronize
of the device state list in incoming side?) to the current migration protocol
as it is kind of missing... It can not just rely on the resume of source
guest for such failure... or maybe it should be handled in management
app to force the configuration right?

It is already handled by libvirt, indeed.

Basically, -incoming without -S is a broken option because of the
missing handshake at the end of migration.  With -S something else
(either a human or a program) can check that everything went well and
choose whether to restart the source or the destination.


I see, thanks for your explanation.  :-)

BTW, do you think we should add such handling to the current migration
protocol?




Postcopy would fix this (assuming the postcopy phase is reliable) by
migrating device data before any page flipping occurs.

Are you suggesting that page flipping should be coupled with the postcopy
migration for live upgrade of QEMU as your comments in the previous
version?

In order to make live upgrade reliable, it should.


The whole procedure for page flipping migration is straight forward, and
the cases of failure I listed are in theory, which never happened at least
since many times I have tested (except the case you raised above). But I
agree with you on coupling with postcopy migration to make it more reliable,
specially for the undetected problems.

For this, I am not quite sure I understand it correctly, seems the latest
update of post copy migration was sent on last Oct, would you please give
some insights on what else could I do for the coupling with postcopy migration?

If no, now page flipping is implemented as a migration capability, and it's
a good shape already as your comments in the previous version. Although it
still needs a little more time to get the numbers of the new vmsplice, I'd to
ask your opinion that do you consider it could be merged as an experimental
version for now?




Paolo




--
Lei




Re: [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram

2013-11-26 Thread Paolo Bonzini
Il 26/11/2013 12:07, Lei Li ha scritto:
 Basically, -incoming without -S is a broken option because of the
 missing handshake at the end of migration.  With -S something else
 (either a human or a program) can check that everything went well and
 choose whether to restart the source or the destination.
 
 I see, thanks for your explanation.  :-)
 
 BTW, do you think we should add such handling to the current migration
 protocol?

I think it's not included by design.

 The whole procedure for page flipping migration is straight forward, and
 the cases of failure I listed are in theory, which never happened at least
 since many times I have tested (except the case you raised above). But I
 agree with you on coupling with postcopy migration to make it more
 reliable, specially for the undetected problems.

The only problem that worries me is failing to load device data (most
likely due to misconfiguration or a bug).

 For this, I am not quite sure I understand it correctly, seems the latest
 update of post copy migration was sent on last Oct, would you please give
 some insights on what else could I do for the coupling with postcopy
 migration?

I don't know the state exactly.  Orit and Andrea should know.

 If no, now page flipping is implemented as a migration capability, and it's
 a good shape already as your comments in the previous version. Although it
 still needs a little more time to get the numbers of the new vmsplice,
 I'd to ask your opinion that do you consider it could be merged as an
 experimental version for now?

Yes, that could be useful.  I will review the patch as soon as possible.

Paolo



Re: [Qemu-devel] [PATCH 10/17] migration-local: override save_page for page transmit

2013-11-26 Thread Paolo Bonzini
Il 21/11/2013 10:11, Lei Li ha scritto:
 This patch implements save_page callback for the outside
 of page flipping. It will write the address of the page
 on the Unix socket and flip the page data on pipe by
 vmsplice(). Every page address would have a header flag
 RAM_SAVE_FLAG_HOOK, which will trigger the load hook to
 receive it in incoming side as well.
 
 Signed-off-by: Lei Li li...@linux.vnet.ibm.com
 ---
  migration-local.c |   54 
 +
  1 files changed, 54 insertions(+), 0 deletions(-)
 
 diff --git a/migration-local.c b/migration-local.c
 index 0f0896b..14207e9 100644
 --- a/migration-local.c
 +++ b/migration-local.c
 @@ -200,6 +200,59 @@ static int qemu_local_send_pipefd(QEMUFile *f, void 
 *opaque,
  return 0;
  }
  
 +static size_t qemu_local_save_ram(QEMUFile *f, void *opaque,
 +  MemoryRegion *mr, ram_addr_t offset,
 +  size_t size, int *bytes_sent)
 +{
 +QEMUFileLocal *s = opaque;
 +ram_addr_t current_addr = mr-ram_addr + offset;
 +void *ram_addr;
 +ssize_t ret;
 +
 +if (s-unix_page_flipping) {
 +qemu_fflush(s-file);
 +qemu_put_be64(s-file, RAM_SAVE_FLAG_HOOK);
 +
 +/* Write page address to unix socket */
 +qemu_put_be64(s-file, current_addr);
 +

You can write current_addr | RAM_SAVE_FLAG_HOOK.  The value will be in
the flags argument of the hook_ram_load, you can extract it with flags
 ~RAM_SAVE_FLAG_HOOK.  This cuts by half the data written to the Unix
socket.

Paolo

 +ram_addr = memory_region_get_ram_ptr(mr) + offset;
 +
 +/* vmsplice page data to pipe */
 +struct iovec iov = {
 +.iov_base = ram_addr,
 +.iov_len  = size,
 +};
 +
 +/*
 + * The flag SPLICE_F_MOVE is introduced in kernel for the page
 + * flipping feature in QEMU, which will movie pages rather than
 + * copying, previously unused.
 + *
 + * If a move is not possible the kernel will transparently falls
 + * back to copying data.
 + *
 + * For older kernels the SPLICE_F_MOVE would be ignored and a copy
 + * would occur.
 + */
 +ret = vmsplice(s-pipefd[1], iov, 1, SPLICE_F_GIFT | SPLICE_F_MOVE);
 +if (ret == -1) {
 +if (errno != EAGAIN  errno != EINTR) {
 +fprintf(stderr, vmsplice save error: %s\n, 
 strerror(errno));
 +return ret;
 +}
 +} else {
 +if (bytes_sent) {
 +*bytes_sent = 1;
 +}
 +DPRINTF(block_offset: %lu, offset: %lu\n, block_offset, 
 offset);
 +return 0;
 +}
 +}
 +
 +return RAM_SAVE_CONTROL_NOT_SUPP;
 +}
 +
  static const QEMUFileOps pipe_read_ops = {
  .get_fd= qemu_local_get_sockfd,
  .get_buffer= qemu_local_get_buffer,
 @@ -211,6 +264,7 @@ static const QEMUFileOps pipe_write_ops = {
  .writev_buffer  = qemu_local_writev_buffer,
  .close  = qemu_local_close,
  .before_ram_iterate = qemu_local_send_pipefd,
 +.save_page  = qemu_local_save_ram
  };
  
  QEMUFile *qemu_fopen_socket_local(int sockfd, const char *mode)
 




Re: [Qemu-devel] [PATCH] SPARC: Fix LEON3 power down instruction

2013-11-26 Thread Fabien Chouteau
On 11/25/2013 03:22 PM, Sebastian Huber wrote:
 The env-pc is not necessarily up-to-date in the helper function.  Use
 the program counter of the disassembly context instead.
 

Looks good. Thanks Sebastian.

Reviewed-by: Fabien Chouteau chout...@adacore.com

 Signed-off-by: Sebastian Huber sebastian.hu...@embedded-brains.de
 ---
  target-sparc/helper.c|6 +++---
  target-sparc/helper.h|2 +-
  target-sparc/translate.c |3 ++-
  3 files changed, 6 insertions(+), 5 deletions(-)
 
 diff --git a/target-sparc/helper.c b/target-sparc/helper.c
 index e70d1bc..50912ff 100644
 --- a/target-sparc/helper.c
 +++ b/target-sparc/helper.c
 @@ -314,14 +314,14 @@ target_ulong helper_tsubcctv(CPUSPARCState *env, 
 target_ulong src1,
  }
  
  #ifndef TARGET_SPARC64
 -void helper_power_down(CPUSPARCState *env)
 +void helper_power_down(CPUSPARCState *env, uint32_t pc)
  {
  CPUState *cs = CPU(sparc_env_get_cpu(env));
  
  cs-halted = 1;
  env-exception_index = EXCP_HLT;
 -env-pc = env-npc;
 -env-npc = env-pc + 4;
 +env-pc = pc + 4;
 +env-npc = pc + 8;
  cpu_loop_exit(env);
  }
  #endif
 diff --git a/target-sparc/helper.h b/target-sparc/helper.h
 index fc49cd8..c4752c7 100644
 --- a/target-sparc/helper.h
 +++ b/target-sparc/helper.h
 @@ -6,7 +6,7 @@ DEF_HELPER_2(trace_insn, void, env, i32)
  DEF_HELPER_1(rett, void, env)
  DEF_HELPER_2(wrpsr, void, env, tl)
  DEF_HELPER_1(rdpsr, tl, env)
 -DEF_HELPER_1(power_down, void, env)
 +DEF_HELPER_2(power_down, void, env, i32)
  #else
  DEF_HELPER_2(wrpil, void, env, tl)
  DEF_HELPER_2(wrpstate, void, env, tl)
 diff --git a/target-sparc/translate.c b/target-sparc/translate.c
 index 0588d23..d9ee90c 100644
 --- a/target-sparc/translate.c
 +++ b/target-sparc/translate.c
 @@ -3631,7 +3631,8 @@ static void disas_sparc_insn(DisasContext * dc, 
 unsigned int insn)
  if ((rd == 0x13)  (dc-def-features 
   CPU_FEATURE_POWERDOWN)) 
 {
  /* LEON3 power-down */
 -gen_helper_power_down(cpu_env);
 +tcg_gen_movi_i32(cpu_tmp0, dc-pc);
 +gen_helper_power_down(cpu_env, cpu_tmp0);
  }
  break;
  #else
 




Re: [Qemu-devel] [PATCH 12/17] migration-local: override hook_ram_load

2013-11-26 Thread Paolo Bonzini
Il 21/11/2013 10:11, Lei Li ha scritto:
 +static int qemu_local_ram_load(QEMUFile *f, void *opaque,
 +   uint64_t flags)
 +{
 +QEMUFileLocal *s = opaque;
 +ram_addr_t addr;
 +struct iovec iov;
 +ssize_t ret = -EINVAL;
 +
 +/*
 + * PIPE file descriptor will be received by another callback
 + * get_buffer.
 + */
 +if (pipefd_passed) {
 +void *host;
 +/*
 + * Extract the page address from the 8-byte record and
 + * read the page data from the pipe.
 + */
 +addr = qemu_get_be64(s-file);
 +host = qemu_get_ram_ptr(addr);
 +
 +iov.iov_base = host;
 +iov.iov_len = TARGET_PAGE_SIZE;
 +
 +/* The flag SPLICE_F_MOVE is introduced in kernel for the page
 + * flipping feature in QEMU, which will movie pages rather than
 + * copying, previously unused.
 + *
 + * If a move is not possible the kernel will transparently falls
 + * back to copying data.
 + *
 + * For older kernels the SPLICE_F_MOVE would be ignored and a copy
 + * would occur.
 + */
 +ret = vmsplice(s-pipefd[0], iov, 1, SPLICE_F_MOVE);
 +if (ret == -1) {
 +if (errno != EAGAIN  errno != EINTR) {
 +fprintf(stderr, vmsplice() load error: %s, 
 strerror(errno));
 +return ret;
 +}
 +DPRINTF(vmsplice load error\n);
 +} else if (ret == 0) {
 +DPRINTF(stderr, load_page: zero read\n);
 +}
 +
 +DPRINTF(vmsplice (read): %zu\n, ret);
 +return ret;
 +}
 +
 +return 0;
 +}

I think you need to return -EINVAL if there is no pipe.

Paolo



Re: [Qemu-devel] [PATCH 06/17] migration-local: add send_pipefd()

2013-11-26 Thread Paolo Bonzini
Il 21/11/2013 10:11, Lei Li ha scritto:
 +struct cmsghdr *cmptr;
 +char req[1] = { 0x01 };

About this, see my reply to patch 8.

 +if (pipefd  0) {
 +msg.msg_control = NULL;
 +msg.msg_controllen = 0;
 +/* Negative status means error */
 +req[0] = pipefd;

No need for this.  qemu_fopen_socket_local has failed already, and you
will never get here.

Paolo

 +} else {
 +msg.msg_control = control_un.control;
 +msg.msg_controllen = sizeof(control_un.control);
 +
 +cmptr = CMSG_FIRSTHDR(msg);
 +cmptr-cmsg_len = CMSG_LEN(sizeof(int));
 +cmptr-cmsg_level = SOL_SOCKET;
 +cmptr-cmsg_type = SCM_RIGHTS;
 +*((int *) CMSG_DATA(cmptr)) = pipefd;
 +
 +msg.msg_name = NULL;
 +msg.msg_namelen = 0;
 +
 +iov[0].iov_base = req;
 +iov[0].iov_len = sizeof(req);
 +msg.msg_iov = iov;
 +msg.msg_iovlen = 1;
 +}




Re: [Qemu-devel] [PATCH 08/17] add unix_msgfd_lookup() to callback get_buffer

2013-11-26 Thread Lei Li

On 11/21/2013 05:11 PM, Lei Li wrote:

The control message for exchange of pipe file descriptor should
be received by recvmsg, and it might be eaten to stream file by
qemu_recv() when receiving by two callbacks. So this patch adds
unix_msgfd_lookup() to callback get_buffer as the only one receiver,
where the pipe file descriptor would be caughted.

Signed-off-by: Lei Li li...@linux.vnet.ibm.com
---
  migration-local.c |   68 ++--
  1 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/migration-local.c b/migration-local.c
index e028beb..0f0896b 100644
--- a/migration-local.c
+++ b/migration-local.c
@@ -50,6 +50,8 @@ typedef struct QEMUFileLocal {
  bool unix_page_flipping;
  } QEMUFileLocal;

+static bool pipefd_passed;
+
  static int qemu_local_get_sockfd(void *opaque)
  {
  QEMUFileLocal *s = opaque;
@@ -57,16 +59,76 @@ static int qemu_local_get_sockfd(void *opaque)
  return s-sockfd;
  }

+static int unix_msgfd_lookup(void *opaque, struct msghdr *msg)
+{
+QEMUFileLocal *s = opaque;
+struct cmsghdr *cmsg;
+bool found = false;
+
+for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
+if (cmsg-cmsg_len != CMSG_LEN(sizeof(int)) ||
+cmsg-cmsg_level != SOL_SOCKET ||
+cmsg-cmsg_type != SCM_RIGHTS)
+continue;
+
+/* PIPE file descriptor to be received */
+s-pipefd[0] = *((int *)CMSG_DATA(cmsg));
+}
+
+if (s-pipefd[0] = 0) {


And this should be if (s-pipefd[0]  0)..


+fprintf(stderr, no pipe fd can be received\n);
+return found;
+}
+
+DPRINTF(pipefd successfully received\n);
+return s-pipefd[0];
+}
+
  static int qemu_local_get_buffer(void *opaque, uint8_t *buf,
   int64_t pos, int size)
  {
  QEMUFileLocal *s = opaque;
  ssize_t len;
+struct msghdr msg = { NULL, };
+struct iovec iov[1];
+union {
+struct cmsghdr cmsg;
+char control[CMSG_SPACE(sizeof(int))];
+} msg_control;
+
+iov[0].iov_base = buf;
+iov[0].iov_len = size;
+
+msg.msg_iov = iov;
+msg.msg_iovlen = 1;
+msg.msg_control = msg_control;
+msg.msg_controllen = sizeof(msg_control);

  for (;;) {
-len = qemu_recv(s-sockfd, buf, size, 0);
-if (len != -1) {
-break;
+if (!pipefd_passed) {
+/*
+ * recvmsg is called here to catch the control message for
+ * the exchange of PIPE file descriptor until it is received.
+ */
+len = recvmsg(s-sockfd, msg, 0);
+if (len != -1) {
+if (unix_msgfd_lookup(s, msg)  0) {
+pipefd_passed = 1;
+/*
+ * Do not count one byte taken by the PIPE file
+ * descriptor.
+ */
+len--;
+} else {
+len = -1;
+}


Just found that this 'else' should go away as it will break the normal
Unix migration since pipefd_passed will always be 0 for it. I have
fixed this in my code, seems I mis-send it for some reason, sorry
for this...:-[  


+break;
+}
+} else {
+len = qemu_recv(s-sockfd, buf, size, 0);
+if (len != -1) {
+break;
+}
  }

  if (socket_error() == EAGAIN) {



--
Lei




Re: [Qemu-devel] [PATCH 08/17] add unix_msgfd_lookup() to callback get_buffer

2013-11-26 Thread Paolo Bonzini
Il 21/11/2013 10:11, Lei Li ha scritto:
 +/*
 + * recvmsg is called here to catch the control message for
 + * the exchange of PIPE file descriptor until it is received.
 + */
 +len = recvmsg(s-sockfd, msg, 0);
 +if (len != -1) {
 +if (unix_msgfd_lookup(s, msg)  0) {
 +pipefd_passed = 1;
 +/*
 + * Do not count one byte taken by the PIPE file
 + * descriptor.
 + */
 +len--;

I think adding a byte in the middle of the stream is not reliable.

Rather, you should transmit the socket always at the same place, for
example in the first call of qemu_local_save_ram, after it has written
the 64-bit field.

The matching code in qemu_local_ram_load will be like this:

static int qemu_local_ram_load(QEMUFile *f, void *opaque,
   uint64_t flags)
{
QEMUFileLocal *s = opaque;
ram_addr_t addr;
struct iovec iov;
ssize_t ret = -EINVAL;

if (!s-pipefd_received) {
/*
 * send_pipefd was called at this point, and it wrote one byte
 * to the stream.
 */
qemu_get_byte(s);
s-pipefd_received = true;
}

if (pipefd_passed) {
...
}
return -EINVAL;
}

Also, please move pipefd_passed within QEMUFileLocal.

Thanks,

Paolo



Re: [Qemu-devel] [PATCH 16/17] migration: adjust migration_thread() process for page flipping

2013-11-26 Thread Paolo Bonzini
Il 21/11/2013 10:11, Lei Li ha scritto:
 Signed-off-by: Lei Li li...@linux.vnet.ibm.com
 ---
  migration.c |   10 +++---
  1 files changed, 7 insertions(+), 3 deletions(-)
 
 diff --git a/migration.c b/migration.c
 index 4ac466b..0f98ac1 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -579,10 +579,11 @@ static void *migration_thread(void *opaque)
  pending_size = qemu_savevm_state_pending(s-file, max_size);
  DPRINTF(pending size % PRIu64  max % PRIu64 \n,
  pending_size, max_size);
 -if (pending_size  pending_size = max_size) {
 +if (pending_size  pending_size = max_size 
 +!runstate_needs_reset()) {
  qemu_savevm_state_iterate(s-file);

I'm not sure why you need this.

  } else {
 -int ret;
 +int ret = 0;
  
  DPRINTF(done iterating\n);
  qemu_mutex_lock_iothread();
 @@ -590,7 +591,10 @@ static void *migration_thread(void *opaque)
  qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
  old_vm_running = runstate_is_running();
  
 -ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
 +if (!runstate_needs_reset()) {
 +ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
 +}

This however is okay.

Paolo

  if (ret = 0) {
  qemu_file_set_rate_limit(s-file, INT_MAX);
  qemu_savevm_state_complete(s-file);
 




Re: [Qemu-devel] [PATCH 29/60] AArch64: Add orri instruction emulation

2013-11-26 Thread Claudio Fontana
On 09/27/2013 09:42 PM, Richard Henderson wrote:
 On 09/26/2013 05:48 PM, Alexander Graf wrote:
 +if (setflags) {
 +tcg_dst = cpu_reg(dest);
 +} else {
 +tcg_dst = cpu_reg_sp(dest);
 +}
 
 Never sp for logicals.

This should be ok in my view, the manual explicitly shows in the pseudocode:

if d == 31  !setflags then
 SP[] = result;
else
 X[d] = result;

Claudio

 
 +handle_orri(s, insn);
 
 And yet again, a better function name.
 
 
 r~
 




Re: [Qemu-devel] [PATCH 16/17] migration: adjust migration_thread() process for page flipping

2013-11-26 Thread Lei Li

On 11/26/2013 07:32 PM, Paolo Bonzini wrote:

Il 21/11/2013 10:11, Lei Li ha scritto:

Signed-off-by: Lei Li li...@linux.vnet.ibm.com
---
  migration.c |   10 +++---
  1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/migration.c b/migration.c
index 4ac466b..0f98ac1 100644
--- a/migration.c
+++ b/migration.c
@@ -579,10 +579,11 @@ static void *migration_thread(void *opaque)
  pending_size = qemu_savevm_state_pending(s-file, max_size);
  DPRINTF(pending size % PRIu64  max % PRIu64 \n,
  pending_size, max_size);
-if (pending_size  pending_size = max_size) {
+if (pending_size  pending_size = max_size 
+!runstate_needs_reset()) {
  qemu_savevm_state_iterate(s-file);

I'm not sure why you need this.


The adjustment here is to avoid the iteration stage for page flipping.
Because pending_size = ram_save_remaining() * TARGET_PAGE_SIZE which is
not 0 and pending_size  max_size (0) at start.

In the previous version it was like this:

if (pending_size  pending_size = max_size 
!migrate_unix_page_flipping()) {

And you said 'This is a bit ugly but I understand the need. Perhaps 
!runstate_needs_reset() like below?' :)




  } else {
-int ret;
+int ret = 0;
  
  DPRINTF(done iterating\n);

  qemu_mutex_lock_iothread();
@@ -590,7 +591,10 @@ static void *migration_thread(void *opaque)
  qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
  old_vm_running = runstate_is_running();
  
-ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);

+if (!runstate_needs_reset()) {
+ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+}

This however is okay.

Paolo


  if (ret = 0) {
  qemu_file_set_rate_limit(s-file, INT_MAX);
  qemu_savevm_state_complete(s-file);






--
Lei




Re: [Qemu-devel] [PATCH 29/60] AArch64: Add orri instruction emulation

2013-11-26 Thread Laurent Desnogues
On Tue, Nov 26, 2013 at 12:56 PM, Claudio Fontana
claudio.font...@linaro.org wrote:
 On 09/27/2013 09:42 PM, Richard Henderson wrote:
 On 09/26/2013 05:48 PM, Alexander Graf wrote:
 +if (setflags) {
 +tcg_dst = cpu_reg(dest);
 +} else {
 +tcg_dst = cpu_reg_sp(dest);
 +}

 Never sp for logicals.

 This should be ok in my view, the manual explicitly shows in the pseudocode:

 if d == 31  !setflags then
  SP[] = result;
 else
  X[d] = result;

Agreed:  for immediate logical instructions, destination can be SP
except for ANDS.  ANDS with destination as r31 is aliased to TST.


Laurent

 Claudio


 +handle_orri(s, insn);

 And yet again, a better function name.


 r~





Re: [Qemu-devel] [PATCH 10/17] migration-local: override save_page for page transmit

2013-11-26 Thread Lei Li

On 11/26/2013 07:22 PM, Paolo Bonzini wrote:

Il 21/11/2013 10:11, Lei Li ha scritto:

This patch implements save_page callback for the outside
of page flipping. It will write the address of the page
on the Unix socket and flip the page data on pipe by
vmsplice(). Every page address would have a header flag
RAM_SAVE_FLAG_HOOK, which will trigger the load hook to
receive it in incoming side as well.

Signed-off-by: Lei Li li...@linux.vnet.ibm.com
---
  migration-local.c |   54 +
  1 files changed, 54 insertions(+), 0 deletions(-)

diff --git a/migration-local.c b/migration-local.c
index 0f0896b..14207e9 100644
--- a/migration-local.c
+++ b/migration-local.c
@@ -200,6 +200,59 @@ static int qemu_local_send_pipefd(QEMUFile *f, void 
*opaque,
  return 0;
  }
  
+static size_t qemu_local_save_ram(QEMUFile *f, void *opaque,

+  MemoryRegion *mr, ram_addr_t offset,
+  size_t size, int *bytes_sent)
+{
+QEMUFileLocal *s = opaque;
+ram_addr_t current_addr = mr-ram_addr + offset;
+void *ram_addr;
+ssize_t ret;
+
+if (s-unix_page_flipping) {
+qemu_fflush(s-file);
+qemu_put_be64(s-file, RAM_SAVE_FLAG_HOOK);
+
+/* Write page address to unix socket */
+qemu_put_be64(s-file, current_addr);
+

You can write current_addr | RAM_SAVE_FLAG_HOOK.  The value will be in
the flags argument of the hook_ram_load, you can extract it with flags
 ~RAM_SAVE_FLAG_HOOK.  This cuts by half the data written to the Unix
socket.


OK, thanks.


Paolo


+ram_addr = memory_region_get_ram_ptr(mr) + offset;
+
+/* vmsplice page data to pipe */
+struct iovec iov = {
+.iov_base = ram_addr,
+.iov_len  = size,
+};
+
+/*
+ * The flag SPLICE_F_MOVE is introduced in kernel for the page
+ * flipping feature in QEMU, which will movie pages rather than
+ * copying, previously unused.
+ *
+ * If a move is not possible the kernel will transparently falls
+ * back to copying data.
+ *
+ * For older kernels the SPLICE_F_MOVE would be ignored and a copy
+ * would occur.
+ */
+ret = vmsplice(s-pipefd[1], iov, 1, SPLICE_F_GIFT | SPLICE_F_MOVE);
+if (ret == -1) {
+if (errno != EAGAIN  errno != EINTR) {
+fprintf(stderr, vmsplice save error: %s\n, strerror(errno));
+return ret;
+}
+} else {
+if (bytes_sent) {
+*bytes_sent = 1;
+}
+DPRINTF(block_offset: %lu, offset: %lu\n, block_offset, offset);
+return 0;
+}
+}
+
+return RAM_SAVE_CONTROL_NOT_SUPP;
+}
+
  static const QEMUFileOps pipe_read_ops = {
  .get_fd= qemu_local_get_sockfd,
  .get_buffer= qemu_local_get_buffer,
@@ -211,6 +264,7 @@ static const QEMUFileOps pipe_write_ops = {
  .writev_buffer  = qemu_local_writev_buffer,
  .close  = qemu_local_close,
  .before_ram_iterate = qemu_local_send_pipefd,
+.save_page  = qemu_local_save_ram
  };
  
  QEMUFile *qemu_fopen_socket_local(int sockfd, const char *mode)







--
Lei




Re: [Qemu-devel] [PATCH 12/17] migration-local: override hook_ram_load

2013-11-26 Thread Lei Li

On 11/26/2013 07:25 PM, Paolo Bonzini wrote:

Il 21/11/2013 10:11, Lei Li ha scritto:

+static int qemu_local_ram_load(QEMUFile *f, void *opaque,
+   uint64_t flags)
+{
+QEMUFileLocal *s = opaque;
+ram_addr_t addr;
+struct iovec iov;
+ssize_t ret = -EINVAL;
+
+/*
+ * PIPE file descriptor will be received by another callback
+ * get_buffer.
+ */
+if (pipefd_passed) {
+void *host;
+/*
+ * Extract the page address from the 8-byte record and
+ * read the page data from the pipe.
+ */
+addr = qemu_get_be64(s-file);
+host = qemu_get_ram_ptr(addr);
+
+iov.iov_base = host;
+iov.iov_len = TARGET_PAGE_SIZE;
+
+/* The flag SPLICE_F_MOVE is introduced in kernel for the page
+ * flipping feature in QEMU, which will movie pages rather than
+ * copying, previously unused.
+ *
+ * If a move is not possible the kernel will transparently falls
+ * back to copying data.
+ *
+ * For older kernels the SPLICE_F_MOVE would be ignored and a copy
+ * would occur.
+ */
+ret = vmsplice(s-pipefd[0], iov, 1, SPLICE_F_MOVE);
+if (ret == -1) {
+if (errno != EAGAIN  errno != EINTR) {
+fprintf(stderr, vmsplice() load error: %s, strerror(errno));
+return ret;
+}
+DPRINTF(vmsplice load error\n);
+} else if (ret == 0) {
+DPRINTF(stderr, load_page: zero read\n);
+}
+
+DPRINTF(vmsplice (read): %zu\n, ret);
+return ret;
+}
+
+return 0;
+}

I think you need to return -EINVAL if there is no pipe.


Yes, you are right..



Paolo




--
Lei




Re: [Qemu-devel] [Qemu-stable] [PATCH] qcow2: Zero-initialise first cluster for new images

2013-11-26 Thread Fam Zheng

On 2013年11月26日 18:48, Kevin Wolf wrote:

Strictly speaking, this is only required for has_zero_init() == false,
but it's easy enough to just do a cluster-aligned write that is padded
with zeros after the header.

This fixes that after 'qemu-img create' header extensions are attempted
to be parsed that are really just random leftover data.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Kevin Wolf kw...@redhat.com
---
  block/qcow2.c | 37 +
  1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 6e5d98d..7c18587 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1471,7 +1471,7 @@ static int qcow2_create2(const char *filename, int64_t 
total_size,
   * size for any qcow2 image.
   */
  BlockDriverState* bs;
-QCowHeader header;
+QCowHeader *header;
  uint8_t* refcount_table;
  Error *local_err = NULL;
  int ret;
@@ -1489,30 +1489,35 @@ static int qcow2_create2(const char *filename, int64_t 
total_size,
  }

  /* Write the header */
-memset(header, 0, sizeof(header));
-header.magic = cpu_to_be32(QCOW_MAGIC);
-header.version = cpu_to_be32(version);
-header.cluster_bits = cpu_to_be32(cluster_bits);
-header.size = cpu_to_be64(0);
-header.l1_table_offset = cpu_to_be64(0);
-header.l1_size = cpu_to_be32(0);
-header.refcount_table_offset = cpu_to_be64(cluster_size);
-header.refcount_table_clusters = cpu_to_be32(1);
-header.refcount_order = cpu_to_be32(3 + REFCOUNT_SHIFT);
-header.header_length = cpu_to_be32(sizeof(header));
+QEMU_BUILD_BUG_ON((1  MIN_CLUSTER_BITS)  sizeof(*header));
+header = g_malloc(cluster_size);
+memset(header, 0, cluster_size);


Could just be:

header = g_malloc0(cluster_size);

But either way,

Reviewed-by: Fam Zheng f...@redhat.com


+*header = (QCowHeader) {
+.magic  = cpu_to_be32(QCOW_MAGIC),
+.version= cpu_to_be32(version),
+.cluster_bits   = cpu_to_be32(cluster_bits),
+.size   = cpu_to_be64(0),
+.l1_table_offset= cpu_to_be64(0),
+.l1_size= cpu_to_be32(0),
+.refcount_table_offset  = cpu_to_be64(cluster_size),
+.refcount_table_clusters= cpu_to_be32(1),
+.refcount_order = cpu_to_be32(3 + REFCOUNT_SHIFT),
+.header_length  = cpu_to_be32(sizeof(*header)),
+};

  if (flags  BLOCK_FLAG_ENCRYPT) {
-header.crypt_method = cpu_to_be32(QCOW_CRYPT_AES);
+header-crypt_method = cpu_to_be32(QCOW_CRYPT_AES);
  } else {
-header.crypt_method = cpu_to_be32(QCOW_CRYPT_NONE);
+header-crypt_method = cpu_to_be32(QCOW_CRYPT_NONE);
  }

  if (flags  BLOCK_FLAG_LAZY_REFCOUNTS) {
-header.compatible_features |=
+header-compatible_features |=
  cpu_to_be64(QCOW2_COMPAT_LAZY_REFCOUNTS);
  }

-ret = bdrv_pwrite(bs, 0, header, sizeof(header));
+ret = bdrv_pwrite(bs, 0, header, cluster_size);
+g_free(header);
  if (ret  0) {
  error_setg_errno(errp, -ret, Could not write qcow2 header);
  goto out;






Re: [Qemu-devel] [PATCH 1.7] [PATCH v3 1/2] linux-user: create target_structsheader to place ipc_perm and shmid_dss

2013-11-26 Thread Petar Jovanovic
ping
http://patchwork.ozlabs.org/patch/287211/
http://patchwork.ozlabs.org/patch/287213/

Can we get these two changes in 1.7?
It has been reviewed and waiting for someone who has commits rights for two 
months now.

Regards,
Petar


From: Petar Jovanovic
Sent: Wednesday, November 20, 2013 12:03 AM
To: qemu-devel@nongnu.org; riku.voi...@linaro.org
Cc: Petar Jovanovic; peter.mayd...@linaro.org; aurel...@aurel32.net; 
afaer...@suse.de; Alex Bennée
Subject: RE: [Qemu-devel] [PATCH v3 1/2] linux-user: create 
target_structsheader to place ipc_perm and shmid_dss

ping
http://patchwork.ozlabs.org/patch/287211/
http://patchwork.ozlabs.org/patch/287213/


From: Petar Jovanovic
Sent: Tuesday, November 12, 2013 4:40 PM
To: qemu-devel@nongnu.org; riku.voi...@linaro.org
Cc: Petar Jovanovic; peter.mayd...@linaro.org; aurel...@aurel32.net; 
afaer...@suse.de; Alex Bennée
Subject: RE: [Qemu-devel] [PATCH v3 1/2] linux-user: create 
target_structsheader to place ipc_perm and shmid_dss

ping
http://patchwork.ozlabs.org/patch/287211/
http://patchwork.ozlabs.org/patch/287213/

Riku?

Regards,
Petar




Re: [Qemu-devel] [PATCHv2 1.8 8/9] qemu-img: add option to show progress in sectors

2013-11-26 Thread Peter Lieven

On 26.11.2013 11:04, Paolo Bonzini wrote:

I think the right way to do this would be to add the functionality to
qemu-progress.c (i.e. pass a number of sectors and let it choose between
printing % or sectors).

I was thinking about the same, but is this not beyond the scope of this patch? 
;-)
I don't mind leaving this patch out if you or the maintainers are strongly 
against it.
I just need it internally to show the progress of a convert operation. I could
theoretically calculate this the information I need also from the percentage 
output.
Its a little bit more complicated if more than one hard drive is converted.

Peter



Re: [Qemu-devel] [PATCH 14/17] add new RanState RAN_STATE_MEMORY_STALE

2013-11-26 Thread Paolo Bonzini
Il 21/11/2013 10:11, Lei Li ha scritto:
  
  { RUN_STATE_DEBUG, RUN_STATE_SUSPENDED },

DEBUG - MEMORY_STALE is missing.

Paolo

  { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED },
  { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
  { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
 +{ RUN_STATE_SUSPENDED, RUN_STATE_MEMORY_STALE },
  




Re: [Qemu-devel] [qemu PATCH] hw/i386/pc_sysfw: support more than one flash drive

2013-11-26 Thread Markus Armbruster
Laszlo Ersek ler...@redhat.com writes:

 On 11/25/13 16:22, Markus Armbruster wrote:
 Laszlo Ersek ler...@redhat.com writes:
 
 On 11/22/13 13:21, Markus Armbruster wrote:
 Laszlo Ersek ler...@redhat.com writes:

 This patch allows the user to usefully specify

   -drive file=img_1,if=pflash,format=raw,readonly \
   -drive file=img_2,if=pflash,format=raw

 on the command line. The flash images will be mapped under 4G in their
 reverse unit order -- that is, with their base addresses progressing
 downwards, in increasing unit order.

 (The unit number increases with command line order if not explicitly
 specified.)

 This accommodates the following use case: suppose that OVMF is split in
 two parts, a writeable host file for non-volatile variable storage, and a
 read-only part for bootstrap and decompressible executable code.

 The binary code part would be read-only, centrally managed on the host
 system, and passed in as unit 0. The variable store would be writeable,
 VM-specific, and passed in as unit 1.

   ffe0-ffe1 (prio 0, R-): system.flash1
   ffe2- (prio 0, R-): system.flash0

 (If the guest tries to write to the flash range that is backed by the
 read-only drive, bdrv_write() in pflash_update() will correctly deny the
 write with -EACCES, and pflash_update() won't care, which suits us well.)

 Before this patch:

 You can define as many if=pflash drives as you want.  Any with non-zero
 index are silently ignored.

 If you don't specify one with index=0, you get a ROM at the top of the
 32 bit address space, contents taken from -bios (default bios.bin).
 Up to its last 128KiB are aliased at the top of the ISA address space.

 If you do specify one with index=0, you get a pflash device at the top
 of the 32 bit address space, with contents from the drive, and -bios is
 silently ignored.  Up to its last 128KiB are copied to a ROM at the top
 of the (20 bit) ISA address space.

 After this patch (please correct misunderstandings):

 Now the drives after the first unused index are silently ignored.

 If you don't specify one with index=0, no change.

 If you do, you now get N pflash devices, where N is the first unused
 index.  Each pflash's contents is taken from the respective drive.  The
 flashes are mapped at the top of the 32 bit address space in reverse
 index order.  -bios is silently ignored, as before.  Up to the last
 128KiB of the index=0 flash are copied to a ROM at the top of the ISA
 address space.

 Thus, no change for index=0.  For index=1..N, we now get additional
 flash devices.

 Correct?

 Yes.
 
 Thanks.
 
 1. Multiple pflash devices
 
 Is there any way for a guest to see the N separate pflash devices, or do
 they look exactly like a single pflash device?

 The interpretation of multiple -pflash options is board specific. I
 grepped the source for IF_PFLASH first, and found some boards that use
 more than one flash device. Merging them in one contiguous target-phys
 address range would be i386 specific.

 This doesn't break compatibility (because multiple pflash devices used
 not to be supported for i386 targets at all), but I agree that this
 would posit their interpretation for the future, and thus it might need
 deeper thought.

From looking at hw/block/pflash_cfi01.c, it seems that the guest can
 interrogate the flash device about its size (nb_blocs is stored in
 cfi_table starting at 0x2D, and cfi_table can be queried by command 0x98
 in pflash_read()). So, if the guest cares, it can figure out that there
 are multiple devices backing the range. I think.

Your stated purpose for multiple -pflash:

This accommodates the following use case: suppose that OVMF is split in
two parts, a writeable host file for non-volatile variable storage, and a
read-only part for bootstrap and decompressible executable code.

Such a split between writable part and read-only part makes sense to me.
How is it done in physical hardware?  Single device with configurable
write-protect, or two separate devices?

 2. More board code device configuration magic
 
 Our long term goal is to configure machines by configuration files
 (i.e. data) rather than code.
 
 Your patch extends the PC board code dealing with if=pflash for multiple
 drives.
 
 Reminder: -drive if=X (where X != none) creates a backend, and leaves it
 in a place where board code can find it.  Board code may elect to create
 a frontend using that backend.

 Yes, I'm aware. I did think about this -- eg. just create a drive with
 if=none, and add a frontend with -device something. But, the frontend
 here is not a device (a peripheral), it's a memory region with special
 mmio ops.

 Pflash frontends can apparently be created with -device cfi.pflash01,...':

 cfi.pflash01.drive=drive
 cfi.pflash01.num-blocks=uint32
 cfi.pflash01.sector-length=uint64
 cfi.pflash01.width=uint8
 cfi.pflash01.big-endian=uint8
 cfi.pflash01.id0=uint16
 cfi.pflash01.id1=uint16
 cfi.pflash01.id2=uint16
 

Re: [Qemu-devel] [PATCHv2 1.8 8/9] qemu-img: add option to show progress in sectors

2013-11-26 Thread Paolo Bonzini
Il 26/11/2013 13:23, Peter Lieven ha scritto:
 I think the right way to do this would be to add the functionality to
 qemu-progress.c (i.e. pass a number of sectors and let it choose between
 printing % or sectors).
 I was thinking about the same, but is this not beyond the scope of this
 patch? ;-)

I think the functionality is not important enough to warrant more code
in qemu-img.c (even 20 lines is already too much).  We should improve
the utility libraries instead.

 I don't mind leaving this patch out if you or the maintainers are
 strongly against it.

Yes, please leave it out.

Paolo



[Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table

2013-11-26 Thread Zhanghaoyu (A)
Hi all,

When guest set irq smp_affinity, VMEXIT occurs, then the vcpu thread will IOCTL 
return to QEMU from hypervisor, then vcpu thread ask the hypervisor to update 
the irq routing table,
in kvm_set_irq_routing, synchronize_rcu is called, current vcpu thread is 
blocked for so much time to wait RCU grace period, and during this period, this 
vcpu cannot provide service to VM,
so those interrupts delivered to this vcpu cannot be handled in time, and the 
apps running on this vcpu cannot be serviced too.
It's unacceptable in some real-time scenario, e.g. telecom. 

So, I want to create a single workqueue for each VM, to asynchronously 
performing the RCU synchronization for irq routing table, 
and let the vcpu thread return and VMENTRY to service VM immediately, no more 
need to blocked to wait RCU grace period.
And, I have implemented a raw patch, took a test in our telecom environment, 
above problem disappeared.

Any better ideas?

Thanks,
Zhang Haoyu




Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table

2013-11-26 Thread Paolo Bonzini
Il 26/11/2013 13:40, Zhanghaoyu (A) ha scritto:
 When guest set irq smp_affinity, VMEXIT occurs, then the vcpu thread will 
 IOCTL return to QEMU from hypervisor, then vcpu thread ask the hypervisor to 
 update the irq routing table,
 in kvm_set_irq_routing, synchronize_rcu is called, current vcpu thread is 
 blocked for so much time to wait RCU grace period, and during this period, 
 this vcpu cannot provide service to VM,
 so those interrupts delivered to this vcpu cannot be handled in time, and the 
 apps running on this vcpu cannot be serviced too.
 It's unacceptable in some real-time scenario, e.g. telecom. 
 
 So, I want to create a single workqueue for each VM, to asynchronously 
 performing the RCU synchronization for irq routing table, 
 and let the vcpu thread return and VMENTRY to service VM immediately, no more 
 need to blocked to wait RCU grace period.
 And, I have implemented a raw patch, took a test in our telecom environment, 
 above problem disappeared.

I don't think a workqueue is even needed.  You just need to use call_rcu
to free old after releasing kvm-irq_lock.

What do you think?

Paolo



Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table

2013-11-26 Thread Gleb Natapov
On Tue, Nov 26, 2013 at 12:40:36PM +, Zhanghaoyu (A) wrote:
 Hi all,
 
 When guest set irq smp_affinity, VMEXIT occurs, then the vcpu thread will 
 IOCTL return to QEMU from hypervisor, then vcpu thread ask the hypervisor to 
 update the irq routing table,
Why vcpu thread ask the hypervisor to update the irq routing table on
pcpu migration?

 in kvm_set_irq_routing, synchronize_rcu is called, current vcpu thread is 
 blocked for so much time to wait RCU grace period, and during this period, 
 this vcpu cannot provide service to VM,
 so those interrupts delivered to this vcpu cannot be handled in time, and the 
 apps running on this vcpu cannot be serviced too.
 It's unacceptable in some real-time scenario, e.g. telecom. 
 
 So, I want to create a single workqueue for each VM, to asynchronously 
 performing the RCU synchronization for irq routing table, 
 and let the vcpu thread return and VMENTRY to service VM immediately, no more 
 need to blocked to wait RCU grace period.
 And, I have implemented a raw patch, took a test in our telecom environment, 
 above problem disappeared.
 
 Any better ideas?
 
 Thanks,
 Zhang Haoyu

--
Gleb.



Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table

2013-11-26 Thread Gleb Natapov
On Tue, Nov 26, 2013 at 02:48:10PM +0200, Gleb Natapov wrote:
 On Tue, Nov 26, 2013 at 12:40:36PM +, Zhanghaoyu (A) wrote:
  Hi all,
  
  When guest set irq smp_affinity, VMEXIT occurs, then the vcpu thread will 
  IOCTL return to QEMU from hypervisor, then vcpu thread ask the hypervisor 
  to update the irq routing table,
 Why vcpu thread ask the hypervisor to update the irq routing table on
 pcpu migration?
 
Ah, I misread. Guest sets irq smp_affinity not host.

--
Gleb.



Re: [Qemu-devel] [qemu PATCH] hw/i386/pc_sysfw: support more than one flash drive

2013-11-26 Thread Markus Armbruster
Laszlo Ersek ler...@redhat.com writes:

 On 11/22/13 13:21, Markus Armbruster wrote:
 Laszlo Ersek ler...@redhat.com writes:
 
 This patch allows the user to usefully specify

   -drive file=img_1,if=pflash,format=raw,readonly \
   -drive file=img_2,if=pflash,format=raw

 on the command line. The flash images will be mapped under 4G in their
 reverse unit order -- that is, with their base addresses progressing
 downwards, in increasing unit order.

 (The unit number increases with command line order if not explicitly
 specified.)

 This accommodates the following use case: suppose that OVMF is split in
 two parts, a writeable host file for non-volatile variable storage, and a
 read-only part for bootstrap and decompressible executable code.

 The binary code part would be read-only, centrally managed on the host
 system, and passed in as unit 0. The variable store would be writeable,
 VM-specific, and passed in as unit 1.

   ffe0-ffe1 (prio 0, R-): system.flash1
   ffe2- (prio 0, R-): system.flash0

 (If the guest tries to write to the flash range that is backed by the
 read-only drive, bdrv_write() in pflash_update() will correctly deny the
 write with -EACCES, and pflash_update() won't care, which suits us well.)
 
 Before this patch:
 
 You can define as many if=pflash drives as you want.  Any with non-zero
 index are silently ignored.
 
 If you don't specify one with index=0, you get a ROM at the top of the
 32 bit address space, contents taken from -bios (default bios.bin).
 Up to its last 128KiB are aliased at the top of the ISA address space.
 
 If you do specify one with index=0, you get a pflash device at the top
 of the 32 bit address space, with contents from the drive, and -bios is
 silently ignored.  Up to its last 128KiB are copied to a ROM at the top
 of the (20 bit) ISA address space.
 
 After this patch (please correct misunderstandings):
 
 Now the drives after the first unused index are silently ignored.
 
 If you don't specify one with index=0, no change.
 
 If you do, you now get N pflash devices, where N is the first unused
 index.  Each pflash's contents is taken from the respective drive.  The
 flashes are mapped at the top of the 32 bit address space in reverse
 index order.  -bios is silently ignored, as before.  Up to the last
 128KiB of the index=0 flash are copied to a ROM at the top of the ISA
 address space.
 
 Thus, no change for index=0.  For index=1..N, we now get additional
 flash devices.
 
 Correct?

 Yes.

Thus, we grab *all* if=pflash drives for this purpose.

Your stated use case wants just two.

Hmm.  Are we sure we'll never want to map an if=pflash device somewhere
else?

I'm asking because such ABI things are a pain to change later on...



Re: [Qemu-devel] [PATCH 16/17] migration: adjust migration_thread() process for page flipping

2013-11-26 Thread Paolo Bonzini
Il 26/11/2013 13:03, Lei Li ha scritto:

 +if (pending_size  pending_size = max_size 
 +!runstate_needs_reset()) {
   qemu_savevm_state_iterate(s-file);
 I'm not sure why you need this.
 
 The adjustment here is to avoid the iteration stage for page flipping.
 Because pending_size = ram_save_remaining() * TARGET_PAGE_SIZE which is
 not 0 and pending_size  max_size (0) at start.

It's still not clear to me that avoiding the iteration stage is
necessary.  I think it's just an optimization to avoid scanning the
bitmap, but:

(1) Juan's bitmap optimization will make this mostly unnecessary

(2) getting good downtime from page flipping will require postcopy anyway.

 And you said 'This is a bit ugly but I understand the need. Perhaps 
 !runstate_needs_reset() like below?' :)

Oops.  I might have said this before thinking about postcopy and/or
before seeing the benchmark results from Juan's patches.  If this part
of the patch is just an optimization, I'd rather leave it out for now.

Thanks for putting up with me. :)

Paolo



Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table

2013-11-26 Thread Gleb Natapov
On Tue, Nov 26, 2013 at 01:47:03PM +0100, Paolo Bonzini wrote:
 Il 26/11/2013 13:40, Zhanghaoyu (A) ha scritto:
  When guest set irq smp_affinity, VMEXIT occurs, then the vcpu thread will 
  IOCTL return to QEMU from hypervisor, then vcpu thread ask the hypervisor 
  to update the irq routing table,
  in kvm_set_irq_routing, synchronize_rcu is called, current vcpu thread is 
  blocked for so much time to wait RCU grace period, and during this period, 
  this vcpu cannot provide service to VM,
  so those interrupts delivered to this vcpu cannot be handled in time, and 
  the apps running on this vcpu cannot be serviced too.
  It's unacceptable in some real-time scenario, e.g. telecom. 
  
  So, I want to create a single workqueue for each VM, to asynchronously 
  performing the RCU synchronization for irq routing table, 
  and let the vcpu thread return and VMENTRY to service VM immediately, no 
  more need to blocked to wait RCU grace period.
  And, I have implemented a raw patch, took a test in our telecom 
  environment, above problem disappeared.
 
 I don't think a workqueue is even needed.  You just need to use call_rcu
 to free old after releasing kvm-irq_lock.
 
 What do you think?
 
It should be rate limited somehow. Since it guest triggarable guest may cause
host to allocate a lot of memory this way.

Is this about MSI interrupt affinity? IIRC changing INT interrupt
affinity should not trigger kvm_set_irq_routing update. If this is about
MSI only then what about changing userspace to use KVM_SIGNAL_MSI for
MSI injection?

--
Gleb.



Re: [Qemu-devel] virtio-net: network stops responding in Win2k3 server

2013-11-26 Thread ing. Mario De Chenno
Since it's a production machine, I had to try a remedy first: I changed one
network card (pub, of course) to e1000 and now it's up from 62 hours (maybe
a record!)

Here is the output for the other card (virtio): I guess parameters should
have been the same for the affected interface too (they are now the same
for the tap interface connected to e1000 driver, and reflect the underlying
physical interface).

Offload parameters for tap-solariconsi:
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: on
large-receive-offload: off

I cannot switch back the virtual nic just now. I will try to increase debug
on other machines, but most are 64bit Win2k3 OSs, so the virtio-net driver
is not the same (it's a 64bit version at least...).
Soon or later I will have some debug data to report to the list. In the
meantime if you have some configuration advises, feel free to post them.

Thanks,

Mario


2013/11/24 Yan Vugenfirer yvuge...@redhat.com

 Hi Mario,

 Can you check the offload settings of the tap device that is connected to
 guest?

 Run “ethtool -k tap-solaripub”.

 On the guest. Raise the log verbosity by going to device manager - NetKVM
 device - Advanced tab - Logging.Level and changing it to 4. Use DebugView
 to record the driver tracing (enable kernel trace):
 http://technet.microsoft.com/en-us/sysinternals/bb896647.aspx


 Best regards,
 Yan.






Re: [Qemu-devel] [qemu PATCH] hw/i386/pc_sysfw: support more than one flash drive

2013-11-26 Thread Markus Armbruster
Laszlo Ersek ler...@redhat.com writes:

 On 11/25/13 16:32, Markus Armbruster wrote:
 Laszlo Ersek ler...@redhat.com writes:
 
 This patch allows the user to usefully specify

   -drive file=img_1,if=pflash,format=raw,readonly \
   -drive file=img_2,if=pflash,format=raw

 on the command line. The flash images will be mapped under 4G in their
 reverse unit order -- that is, with their base addresses progressing
 downwards, in increasing unit order.

 (The unit number increases with command line order if not explicitly
 specified.)

 This accommodates the following use case: suppose that OVMF is split in
 two parts, a writeable host file for non-volatile variable storage, and a
 read-only part for bootstrap and decompressible executable code.

 The binary code part would be read-only, centrally managed on the host
 system, and passed in as unit 0. The variable store would be writeable,
 VM-specific, and passed in as unit 1.

   ffe0-ffe1 (prio 0, R-): system.flash1
   ffe2- (prio 0, R-): system.flash0

 (If the guest tries to write to the flash range that is backed by the
 read-only drive, bdrv_write() in pflash_update() will correctly deny the
 write with -EACCES, and pflash_update() won't care, which suits us well.)

 Signed-off-by: Laszlo Ersek ler...@redhat.com
 ---
  hw/i386/pc_sysfw.c | 60 
 --
  1 file changed, 45 insertions(+), 15 deletions(-)

 diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
 index e917c83..1c3e3d6 100644
 --- a/hw/i386/pc_sysfw.c
 +++ b/hw/i386/pc_sysfw.c
 @@ -72,35 +72,65 @@ static void pc_isa_bios_init(MemoryRegion *rom_memory,
  memory_region_set_readonly(isa_bios, true);
  }
  
 +/* This function maps flash drives from 4G downward, in order of their unit
 + * numbers. Addressing within one flash drive is of course not reversed.
 + *
 + * The drive with unit number 0 is mapped at the highest address, and it is
 + * passed to pc_isa_bios_init(). Merging severral drives for isa-bios is 
 not
 + * supported.
 + *
 + * The caller is responsible to pass in the non-NULL @pflash_drv that
 + * corresponds to the flash drive with unit number 0.
 + */
  static void pc_system_flash_init(MemoryRegion *rom_memory,
   DriveInfo *pflash_drv)
  {
 +int unit = 0;
  BlockDriverState *bdrv;
  int64_t size;
 -hwaddr phys_addr;
 +hwaddr phys_addr = 0x1ULL;
  int sector_bits, sector_size;
  pflash_t *system_flash;
  MemoryRegion *flash_mem;
 +char name[64];
  
 -bdrv = pflash_drv-bdrv;
 -size = bdrv_getlength(pflash_drv-bdrv);
  sector_bits = 12;
  sector_size = 1  sector_bits;
  
 -if ((size % sector_size) != 0) {
 -fprintf(stderr,
 -qemu: PC system firmware (pflash) must be a multiple of 
 0x%x\n,
 -sector_size);
 -exit(1);
 -}
 +do {
 +bdrv = pflash_drv-bdrv;
 +size = bdrv_getlength(bdrv);
 +if ((size % sector_size) != 0) {
 +fprintf(stderr,
 +qemu: PC system firmware (pflash segment %d) must be 
 a 
 +multiple of 0x%x\n, unit, sector_size);
 +exit(1);
 +}
 +if (size  phys_addr) {
 +fprintf(stderr, qemu: pflash segments must fit under 4G 
 +cumulatively\n);

You're just following existing bad practice here, but correct use of
error_report() would give you nicer messages.  Happy to explain if
you're interested.

 I suspect things go haywire long before you hit address zero.
 
 Note that both pc_piix.c and pc_q35.c leave a hole in RAM just below 4G.
 The former's hole starts at 0xe000, the latter's at 0xb000.
 Should that be the limit?

 I wanted to do the bare minimal here, to catch obviously wrong backing
 drives (like a 10G file). This is already more verification than what
 the current code does.

 I wouldn't mind more a specific check, but I don't want to suggest (with
 more specific code) that it's actually *safe* to go down to the limit
 that I'd put here.

 For example, the IO-APIC mmio range is [0xFEE0..0xFEE01000[, leaving
 free 18MB-4KB just below 4G. (Of which the current OVMF, including
 variable store, takes up 2MB.) Grep IO_APIC_DEFAULT_ADDRESS.

 I just don't want to send the message it's safe to go all the way down
 there. Right now the responsibility is with the user (you can specify a
 single pflash device that's 20MB in size even now), and I'd like to
 stick with that.

 I believe that

   pflash_cfi01_register()
 sysbus_mmio_map()
   sysbus_mmio_map_common()
 memory_region_add_subregion()
   memory_region_add_subregion_common()

 could, in theory, find a conflict at runtime (see the #if 0-surrounded
 collision warning in memory_region_add_subregion_common()). But the
 memory API doesn't consider such collisions hard errors, and no status
 code is 

Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table

2013-11-26 Thread Paolo Bonzini
Il 26/11/2013 13:56, Gleb Natapov ha scritto:
  I don't think a workqueue is even needed.  You just need to use call_rcu
  to free old after releasing kvm-irq_lock.
  
  What do you think?
 
 It should be rate limited somehow. Since it guest triggarable guest may cause
 host to allocate a lot of memory this way.

True, though if I understand Zhanghaoyu's proposal a workqueue would be
even worse.

Paolo



Re: [Qemu-devel] [PATCH 04/27] vl: convert -m to qemu_opts_parse()

2013-11-26 Thread Igor Mammedov
On Thu, 21 Nov 2013 11:12:43 +0100
Markus Armbruster arm...@redhat.com wrote:

 Igor Mammedov imamm...@redhat.com writes:
 
  Along with conversion extend -m option to support following parameters:
 
 Please split this into two patches: first conversion to QemuOpts, then
 extension.
 
mem - startup memory amount
slots - total number of hotplug memory slots
maxmem - maximum possible memory
 
  slots and maxmem should go in pair and maxmem should be greater
  than mem for memory hotplug to be usable.
 
  v2:
make sure maxmem is not less than ram size
 
  Signed-off-by: Igor Mammedov imamm...@redhat.com
  ---
   qemu-options.hx |9 +-
   vl.c|   73 
  ++-
   2 files changed, 68 insertions(+), 14 deletions(-)
 
  diff --git a/qemu-options.hx b/qemu-options.hx
  index 8b94264..fe4559b 100644
  --- a/qemu-options.hx
  +++ b/qemu-options.hx
  @@ -210,8 +210,13 @@ use is discouraged as it may be removed from future 
  versions.
   ETEXI
   
   DEF(m, HAS_ARG, QEMU_OPTION_m,
  --m megs set virtual RAM size to megs MB [default=
  -stringify(DEFAULT_RAM_SIZE) ]\n, QEMU_ARCH_ALL)
  +-m [mem=]megs[,slots=n,maxmem=size]\n
  +set virtual RAM size to megs MB [default=
  +stringify(DEFAULT_RAM_SIZE) ]\n
  +mem=start-up memory amount\n
  +slots=maximum number of hotplug slots\n
  +maxmem=maximum total amount of memory\n,
  +QEMU_ARCH_ALL)
 
 Help text is confusing, because it discusses megs twice.  Fits right in,
 as output of -help is generally confusing (to put it politely).
 
 What about something like this:
 
 -m [mem=]megs[,slots=n,maxmem=size]
 configure guest RAM
 mem: initial amount of guest memory (default: XXX)
 slots: number of hotplug slots (default: none)
 maxmem: maximum amount of guest memory (default: mem)
Sure, I'll fix it.

   STEXI
   @item -m @var{megs}
   @findex -m
  diff --git a/vl.c b/vl.c
  index f28674f..5974f0f 100644
  --- a/vl.c
  +++ b/vl.c
  @@ -529,6 +529,28 @@ static QemuOptsList qemu_msg_opts = {
   },
   };
   
  +static QemuOptsList qemu_mem_opts = {
  +.name = memory-opts,
  +.implied_opt_name = mem,
  +.head = QTAILQ_HEAD_INITIALIZER(qemu_mem_opts.head),
  +.merge_lists = true,
 
 Yes, because multiple -m need to accumulate.
 
  +.desc = {
  +{
  +.name = mem,
  +.type = QEMU_OPT_SIZE,
  +},
  +{
  +.name = slots,
  +.type = QEMU_OPT_NUMBER,
  +},
  +{
  +.name = maxmem,
  +.type = QEMU_OPT_SIZE,
  +},
  +{ /* end of list */ }
  +},
  +};
  +
   /**
* Get machine options
*
  @@ -2816,6 +2838,14 @@ static int object_create(QemuOpts *opts, void 
  *opaque)
   return 0;
   }
   
  +static void qemu_init_default_mem_opts(uint64_t size)
  +{
  +QemuOpts *opts = qemu_opts_create_nofail(qemu_mem_opts);
  +qemu_opt_set_number(opts, mem, size);
  +qemu_opt_set_number(opts, maxmem, size);
  +qemu_opt_set_number(opts, slots, 0);
  +}
  +
 
 We usually don't put defaults in QemuOpts.  Instead, the code querying
 QemuOpts detects and handles absence of the parameter.  Can be as simple
 as qemu_opt_get_size(opts, mem, DEFAULT_RAM_SIZE * 1024 * 1024).
It allows to make number of uses a bit simpler:

for example v6:
QemuOpts *opts = qemu_opts_find(qemu_find_opts(memory-opts), NULL);
if (!opts) { /* no -m x,... was passed to cmd line so no mem hotplug */
return;
}
mem_st-dev_count = qemu_opt_get_number(opts, slots, 0);

becomes
QemuOpts *opts = qemu_opts_find(qemu_find_opts(memory-opts), NULL);
state-dev_count = qemu_opt_get_number(opts, slots, 0);

and eliminates need to pepper code with DEFAULT_RAM_SIZE * 1024 * 1024 or
like constants wherever qemu_opt_get_..() is called, that was main reason
to set defaults. i.e. set defaults only once instead of spreading them in
every place qemu_opt_get_..() is called.

 
 See also below.
 
   int main(int argc, char **argv, char **envp)
   {
   int i;
  @@ -2887,6 +2917,7 @@ int main(int argc, char **argv, char **envp)
   qemu_add_opts(qemu_tpmdev_opts);
   qemu_add_opts(qemu_realtime_opts);
   qemu_add_opts(qemu_msg_opts);
  +qemu_add_opts(qemu_mem_opts);
   
   runstate_init();
   
  @@ -2901,7 +2932,8 @@ int main(int argc, char **argv, char **envp)
   module_call_init(MODULE_INIT_MACHINE);
   machine = find_default_machine();
   cpu_model = NULL;
  -ram_size = 0;
  +ram_size = DEFAULT_RAM_SIZE * 1024 * 1024;
  +qemu_init_default_mem_opts(ram_size);
   snapshot = 0;
   cyls = heads = secs = 0;
   translation = BIOS_ATA_TRANSLATION_AUTO;
  @@ -3178,21 +3210,43 @@ int main(int argc, char **argv, char **envp)
   

Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table

2013-11-26 Thread Avi Kivity
On Tue, Nov 26, 2013 at 2:47 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 26/11/2013 13:40, Zhanghaoyu (A) ha scritto:
 When guest set irq smp_affinity, VMEXIT occurs, then the vcpu thread
will IOCTL return to QEMU from hypervisor, then vcpu thread ask the
hypervisor to update the irq routing table,
 in kvm_set_irq_routing, synchronize_rcu is called, current vcpu thread
is blocked for so much time to wait RCU grace period, and during this
period, this vcpu cannot provide service to VM,
 so those interrupts delivered to this vcpu cannot be handled in time,
and the apps running on this vcpu cannot be serviced too.
 It's unacceptable in some real-time scenario, e.g. telecom.

 So, I want to create a single workqueue for each VM, to asynchronously
performing the RCU synchronization for irq routing table,
 and let the vcpu thread return and VMENTRY to service VM immediately, no
more need to blocked to wait RCU grace period.
 And, I have implemented a raw patch, took a test in our telecom
environment, above problem disappeared.

 I don't think a workqueue is even needed.  You just need to use call_rcu
 to free old after releasing kvm-irq_lock.

 What do you think?

Can this cause an interrupt to be delivered to the wrong (old) vcpu?

The way Linux sets interrupt affinity, it cannot, since changing the
affinity is (IIRC) done in the interrupt handler, so the next interrupt
cannot be in flight and thus pick up the old interrupt routing table.

However it may be vulnerable in other ways.


[Qemu-devel] sparc64 with openbios-sparc64

2013-11-26 Thread Mahmood Naderan
Hello,
Is there any news on emulating sparc64 machine with openbios-sparc64 on a 
x86_64 host machine? I followed some posts regarding this from 2010 however 
didn't end-up with a conclusion. 

 
Regards,
Mahmood

Re: [Qemu-devel] [qemu PATCH] hw/i386/pc_sysfw: support more than one flash drive

2013-11-26 Thread Laszlo Ersek
On 11/26/13 13:53, Markus Armbruster wrote:

 Thus, we grab *all* if=pflash drives for this purpose.
 
 Your stated use case wants just two.
 
 Hmm.  Are we sure we'll never want to map an if=pflash device somewhere
 else?

No, I'm not sure.

Thanks
Laszlo



Re: [Qemu-devel] [qemu PATCH] hw/i386/pc_sysfw: support more than one flash drive

2013-11-26 Thread Laszlo Ersek
On 11/26/13 13:36, Markus Armbruster wrote:

 Your stated purpose for multiple -pflash:
 
 This accommodates the following use case: suppose that OVMF is split in
 two parts, a writeable host file for non-volatile variable storage, and a
 read-only part for bootstrap and decompressible executable code.
 
 Such a split between writable part and read-only part makes sense to me.
 How is it done in physical hardware?  Single device with configurable
 write-protect, or two separate devices?

(Jordan could help more.)

Likely one device that's fully writeable.

The flash driver (through which the NvVar updates go) makes sure that a
kind of journal is written and that the live variable store is not
corrupted even if power is cut during an update.

However, if something writes to the flash without going through the
driver, it can brick the board. (Trample over the bootstrap code for
example.) I think.

Laszlo



Re: [Qemu-devel] [qemu PATCH] hw/i386/pc_sysfw: support more than one flash drive

2013-11-26 Thread Laszlo Ersek
On 11/26/13 14:11, Markus Armbruster wrote:
 Laszlo Ersek ler...@redhat.com writes:
 
 On 11/25/13 16:32, Markus Armbruster wrote:
 Laszlo Ersek ler...@redhat.com writes:

 This patch allows the user to usefully specify

   -drive file=img_1,if=pflash,format=raw,readonly \
   -drive file=img_2,if=pflash,format=raw

 on the command line. The flash images will be mapped under 4G in their
 reverse unit order -- that is, with their base addresses progressing
 downwards, in increasing unit order.

 (The unit number increases with command line order if not explicitly
 specified.)

 This accommodates the following use case: suppose that OVMF is split in
 two parts, a writeable host file for non-volatile variable storage, and a
 read-only part for bootstrap and decompressible executable code.

 The binary code part would be read-only, centrally managed on the host
 system, and passed in as unit 0. The variable store would be writeable,
 VM-specific, and passed in as unit 1.

   ffe0-ffe1 (prio 0, R-): system.flash1
   ffe2- (prio 0, R-): system.flash0

 (If the guest tries to write to the flash range that is backed by the
 read-only drive, bdrv_write() in pflash_update() will correctly deny the
 write with -EACCES, and pflash_update() won't care, which suits us well.)

 Signed-off-by: Laszlo Ersek ler...@redhat.com
 ---
  hw/i386/pc_sysfw.c | 60 
 --
  1 file changed, 45 insertions(+), 15 deletions(-)

 diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
 index e917c83..1c3e3d6 100644
 --- a/hw/i386/pc_sysfw.c
 +++ b/hw/i386/pc_sysfw.c
 @@ -72,35 +72,65 @@ static void pc_isa_bios_init(MemoryRegion *rom_memory,
  memory_region_set_readonly(isa_bios, true);
  }
  
 +/* This function maps flash drives from 4G downward, in order of their 
 unit
 + * numbers. Addressing within one flash drive is of course not reversed.
 + *
 + * The drive with unit number 0 is mapped at the highest address, and it 
 is
 + * passed to pc_isa_bios_init(). Merging severral drives for isa-bios is 
 not
 + * supported.
 + *
 + * The caller is responsible to pass in the non-NULL @pflash_drv that
 + * corresponds to the flash drive with unit number 0.
 + */
  static void pc_system_flash_init(MemoryRegion *rom_memory,
   DriveInfo *pflash_drv)
  {
 +int unit = 0;
  BlockDriverState *bdrv;
  int64_t size;
 -hwaddr phys_addr;
 +hwaddr phys_addr = 0x1ULL;
  int sector_bits, sector_size;
  pflash_t *system_flash;
  MemoryRegion *flash_mem;
 +char name[64];
  
 -bdrv = pflash_drv-bdrv;
 -size = bdrv_getlength(pflash_drv-bdrv);
  sector_bits = 12;
  sector_size = 1  sector_bits;
  
 -if ((size % sector_size) != 0) {
 -fprintf(stderr,
 -qemu: PC system firmware (pflash) must be a multiple of 
 0x%x\n,
 -sector_size);
 -exit(1);
 -}
 +do {
 +bdrv = pflash_drv-bdrv;
 +size = bdrv_getlength(bdrv);
 +if ((size % sector_size) != 0) {
 +fprintf(stderr,
 +qemu: PC system firmware (pflash segment %d) must be 
 a 
 +multiple of 0x%x\n, unit, sector_size);
 +exit(1);
 +}
 +if (size  phys_addr) {
 +fprintf(stderr, qemu: pflash segments must fit under 4G 
 +cumulatively\n);
 
 You're just following existing bad practice here, but correct use of
 error_report() would give you nicer messages.  Happy to explain if
 you're interested.

You have the Location thing in mind, eg. automatically binding the error
message to the offending command line option, right?

I've seen you use it before in another patch. Hm... It was commit
ec2df8c10a4585ba4641ae482cf2f5f13daa810e.

I will try to address the rest of your comments too when I get back to
this patch.

Thanks
Laszlo

 
 I suspect things go haywire long before you hit address zero.

 Note that both pc_piix.c and pc_q35.c leave a hole in RAM just below 4G.
 The former's hole starts at 0xe000, the latter's at 0xb000.
 Should that be the limit?

 I wanted to do the bare minimal here, to catch obviously wrong backing
 drives (like a 10G file). This is already more verification than what
 the current code does.

 I wouldn't mind more a specific check, but I don't want to suggest (with
 more specific code) that it's actually *safe* to go down to the limit
 that I'd put here.

 For example, the IO-APIC mmio range is [0xFEE0..0xFEE01000[, leaving
 free 18MB-4KB just below 4G. (Of which the current OVMF, including
 variable store, takes up 2MB.) Grep IO_APIC_DEFAULT_ADDRESS.

 I just don't want to send the message it's safe to go all the way down
 there. Right now the responsibility is with the user (you can specify a
 single pflash device that's 20MB in size even now), and I'd like to
 stick with that.

 I believe that

   

Re: [Qemu-devel] [PATCHv2 1.8 1/9] qemu-img: add support for skipping zeroes in input during convert

2013-11-26 Thread Peter Lieven

On 26.11.2013 11:02, Paolo Bonzini wrote:

Il 26/11/2013 09:56, Peter Lieven ha scritto:

we currently do not check if a sector is allocated during convert.
This means if a sector is unallocated that we allocate a bounce
buffer of zeroes, find out its zero later and do not write it
in the best case. In the worst case this can lead to reading
blocks from a raw device (like iSCSI) altough we could easily
know via get_block_status that they are zero and simply skip them.

This patch also fixes the progress output not being at 100% after
a successful conversion.

Signed-off-by: Peter Lieven p...@kamp.de
---
  qemu-img.c |   85 ++--
  1 file changed, 48 insertions(+), 37 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index dc0c2f0..efb744c 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1125,13 +1125,15 @@ out3:
  
  static int img_convert(int argc, char **argv)

  {
-int c, ret = 0, n, n1, bs_n, bs_i, compress, cluster_size,
+int c, n, n1, bs_n, bs_i, compress, cluster_size,
  cluster_sectors, skip_create;
+int64_t ret = 0;
  int progress = 0, flags;
  const char *fmt, *out_fmt, *cache, *out_baseimg, *out_filename;
  BlockDriver *drv, *proto_drv;
  BlockDriverState **bs = NULL, *out_bs = NULL;
-int64_t total_sectors, nb_sectors, sector_num, bs_offset;
+int64_t total_sectors, nb_sectors, sector_num, bs_offset,
+sector_num_next_status = 0;
  uint64_t bs_sectors;
  uint8_t * buf = NULL;
  const uint8_t *buf1;
@@ -1140,7 +1142,6 @@ static int img_convert(int argc, char **argv)
  QEMUOptionParameter *out_baseimg_param;
  char *options = NULL;
  const char *snapshot_name = NULL;
-float local_progress = 0;
  int min_sparse = 8; /* Need at least 4k of zeros for sparse detection */
  bool quiet = false;
  Error *local_err = NULL;
@@ -1403,10 +1404,6 @@ static int img_convert(int argc, char **argv)
  sector_num = 0;
  
  nb_sectors = total_sectors;

-if (nb_sectors != 0) {
-local_progress = (float)100 /
-(nb_sectors / MIN(nb_sectors, cluster_sectors));
-}
  
  for(;;) {

  int64_t bs_num;
@@ -1464,7 +1461,7 @@ static int img_convert(int argc, char **argv)
  }
  }
  sector_num += n;
-qemu_progress_print(local_progress, 100);
+qemu_progress_print(100.0 * sector_num / total_sectors, 0);
  }
  /* signal EOF to align */
  bdrv_write_compressed(out_bs, 0, NULL, 0);
@@ -1481,21 +1478,13 @@ static int img_convert(int argc, char **argv)
  
  sector_num = 0; // total number of sectors converted so far

  nb_sectors = total_sectors - sector_num;
-if (nb_sectors != 0) {
-local_progress = (float)100 /
-(nb_sectors / MIN(nb_sectors, IO_BUF_SIZE / 512));
-}
  
  for(;;) {

  nb_sectors = total_sectors - sector_num;
  if (nb_sectors = 0) {
+ret = 0;
  break;
  }
-if (nb_sectors = (IO_BUF_SIZE / 512)) {
-n = (IO_BUF_SIZE / 512);
-} else {
-n = nb_sectors;
-}
  
  while (sector_num - bs_offset = bs_sectors) {

  bs_i ++;
@@ -1507,34 +1496,53 @@ static int img_convert(int argc, char **argv)
 sector_num, bs_i, bs_offset, bs_sectors); */
  }
  
-if (n  bs_offset + bs_sectors - sector_num) {

-n = bs_offset + bs_sectors - sector_num;
-}
-
-/* If the output image is being created as a copy on write image,
-   assume that sectors which are unallocated in the input image
-   are present in both the output's and input's base images (no
-   need to copy them). */
-if (out_baseimg) {
-ret = bdrv_is_allocated(bs[bs_i], sector_num - bs_offset,
-n, n1);
+if ((out_baseimg || has_zero_init) 
+sector_num = sector_num_next_status) {
+n = nb_sectors  INT_MAX ? INT_MAX : nb_sectors;
+ret = bdrv_get_block_status(bs[bs_i], sector_num - bs_offset,
+n, n1);
  if (ret  0) {
-error_report(error while reading metadata for sector 
- % PRId64 : %s,
- sector_num - bs_offset, strerror(-ret));
+error_report(error while reading block status of sector %
+ PRId64 : %s, sector_num - bs_offset,
+ strerror(-ret));
  goto out;
  }
-if (!ret) {
+/* If the output image is zero initialized, we are 

Re: [Qemu-devel] [edk2] [qemu PATCH] hw/i386/pc_sysfw: support more than one flash drive

2013-11-26 Thread Paolo Bonzini
Il 25/11/2013 20:45, Laszlo Ersek ha scritto:
 From looking at hw/block/pflash_cfi01.c, it seems that the guest can
 interrogate the flash device about its size (nb_blocs is stored in
 cfi_table starting at 0x2D, and cfi_table can be queried by command 0x98
 in pflash_read()). So, if the guest cares, it can figure out that there
 are multiple devices backing the range. I think.

IIUC in the case of OVMF the guest just knows that there are multiple
devices backing the range.  Is that right?

But yes, I think that the guest can figure out that there are multiple
devices backing the range.  From reading the pflash code further:

* the pflash device doesn't care about the location where you write the
command (the exception is the block erase command)

* the pflash device only cares about the LSB you read from when you read
data from it

So you can use the last 256 bytes of the first flash (you know it ends
at 4GB) to check for the device (there's probably some suggested
protocol to do that, I don't know) and query its size.  Example with
-qtest stdio:

writeb 0xff00 0x98
OK
readb 0xfff2d
OK 0x001f
readb 0xfff2e
OK 0x
readb 0xfff2f
OK 0x0010
readb 0xfff30
OK 0x
writeb 0xff00 0x98
OK

This means the device has 31+1 blocks each 4KB in size.  You can then
query the next device at 4GB-128KB-256, and so on.

Paolo



[Qemu-devel] [PATCH] virtio-rng: correct the default limit rate

2013-11-26 Thread Amos Kong
We have the following comment about the default limit rate:

/* Set a default rate limit of 2^47 bytes per minute or roughly 2TB/s.  If
 * you have an entropy source capable of generating more entropy than this
 * and you can pass it through via virtio-rng, then hats off to you.  Until
 * then, this is unlimited for all practical purposes.
 */

But the current rate is (INT64_MAX) bytes per (1  16) ms, it's 128,000 TB/s

2TB/s is a reasonable rate, so this patch fixes the code, not update the 
document.

* change the default period to 60,000 ms -- 1 mins
* change the default max-bytes to 2^47 bytes -- INT64_MAX  16

 INT64_MAX  =  0x 8000    - 1 = 2 ^ 63 - 1
 INT64_MAX  16 =  0x 8000   - 1 = 2 ^ 47

Signed-off-by: Amos Kong ak...@redhat.com
---
 include/hw/virtio/virtio-rng.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/hw/virtio/virtio-rng.h b/include/hw/virtio/virtio-rng.h
index debaa15..d64e804 100644
--- a/include/hw/virtio/virtio-rng.h
+++ b/include/hw/virtio/virtio-rng.h
@@ -53,7 +53,7 @@ typedef struct VirtIORNG {
 */
 #define DEFINE_VIRTIO_RNG_PROPERTIES(_state, _conf_field)\
 DEFINE_PROP_UINT64(max-bytes, _state, _conf_field.max_bytes,   \
-   INT64_MAX),   \
-DEFINE_PROP_UINT32(period, _state, _conf_field.period_ms, 1  16)
+   INT64_MAX  16), \
+DEFINE_PROP_UINT32(period, _state, _conf_field.period_ms, 6)
 
 #endif
-- 
1.8.3.1




Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table

2013-11-26 Thread Paolo Bonzini
Il 26/11/2013 14:18, Avi Kivity ha scritto:
 
 I don't think a workqueue is even needed.  You just need to use call_rcu
 to free old after releasing kvm-irq_lock.

 What do you think?
 
 Can this cause an interrupt to be delivered to the wrong (old) vcpu?

No, this would be exactly the same code that is running now:

mutex_lock(kvm-irq_lock);
old = kvm-irq_routing;
kvm_irq_routing_update(kvm, new);
mutex_unlock(kvm-irq_lock);

synchronize_rcu();
kfree(old);
return 0;

Except that the kfree would run in the call_rcu kernel thread instead of
the vcpu thread.  But the vcpus already see the new routing table after
the rcu_assign_pointer that is in kvm_irq_routing_update.

There is still the problem that Gleb pointed out, though.

Paolo

 The way Linux sets interrupt affinity, it cannot, since changing the
 affinity is (IIRC) done in the interrupt handler, so the next interrupt
 cannot be in flight and thus pick up the old interrupt routing table.
 
 However it may be vulnerable in other ways.
 
 




Re: [Qemu-devel] [edk2] [qemu PATCH] hw/i386/pc_sysfw: support more than one flash drive

2013-11-26 Thread Laszlo Ersek
On 11/26/13 14:41, Paolo Bonzini wrote:
 Il 25/11/2013 20:45, Laszlo Ersek ha scritto:
 From looking at hw/block/pflash_cfi01.c, it seems that the guest can
 interrogate the flash device about its size (nb_blocs is stored in
 cfi_table starting at 0x2D, and cfi_table can be queried by command 0x98
 in pflash_read()). So, if the guest cares, it can figure out that there
 are multiple devices backing the range. I think.
 
 IIUC in the case of OVMF the guest just knows that there are multiple
 devices backing the range.  Is that right?

No, the guest (the flash driver) is unaware of that. It could know if it
wanted to, but it doesn't care. It cares about a base address and a
size, and that range is subdivided into various roles. But how many
flash devices back the range is not interesting for the driver.

Jordan wrote the driver with one flash device in mind, and when I split
the image in two parts, I took care to map them so that the base
address, the size, and those boundaries stay the same. It's sufficient
for the driver to continue working.

 But yes, I think that the guest can figure out that there are multiple
 devices backing the range.  From reading the pflash code further:
 
 * the pflash device doesn't care about the location where you write the
 command (the exception is the block erase command)
 
 * the pflash device only cares about the LSB you read from when you read
 data from it
 
 So you can use the last 256 bytes of the first flash (you know it ends
 at 4GB) to check for the device (there's probably some suggested
 protocol to do that, I don't know) and query its size.  Example with
 -qtest stdio:
 
 writeb 0xff00 0x98
 OK
 readb 0xfff2d
 OK 0x001f
 readb 0xfff2e
 OK 0x
 readb 0xfff2f
 OK 0x0010
 readb 0xfff30
 OK 0x
 writeb 0xff00 0x98
 OK
 
 This means the device has 31+1 blocks each 4KB in size.  You can then
 query the next device at 4GB-128KB-256, and so on.

Thanks for testing this in practice.
Laszlo



Re: [Qemu-devel] [PATCH 16/17] migration: adjust migration_thread() process for page flipping

2013-11-26 Thread Lei Li

On 11/26/2013 08:54 PM, Paolo Bonzini wrote:

Il 26/11/2013 13:03, Lei Li ha scritto:

+if (pending_size  pending_size = max_size 
+!runstate_needs_reset()) {
   qemu_savevm_state_iterate(s-file);

I'm not sure why you need this.

The adjustment here is to avoid the iteration stage for page flipping.
Because pending_size = ram_save_remaining() * TARGET_PAGE_SIZE which is
not 0 and pending_size  max_size (0) at start.

It's still not clear to me that avoiding the iteration stage is


The purpose of it is not just for optimization, but to avoid the
iteration for better alignment.

The current flow of page flipping basically has two stages:

1) ram_save_setup stage, it will send all the bytes in this stages
   to destination, and send_pipefd by ram_control_before_iterate
   at the end of it.
2) ram_save_complete, it will start to transmit the ram page
   in ram_save_block, and send the device state after that.

So it needs to adjust the current migration process to avoid
the iteration stage.


necessary.  I think it's just an optimization to avoid scanning the
bitmap, but:

(1) Juan's bitmap optimization will make this mostly unnecessary

(2) getting good downtime from page flipping will require postcopy anyway.


And you said 'This is a bit ugly but I understand the need. Perhaps 
!runstate_needs_reset() like below?' :)

Oops.  I might have said this before thinking about postcopy and/or
before seeing the benchmark results from Juan's patches.  If this part
of the patch is just an optimization, I'd rather leave it out for now.


I am afraid that page flipping can not proceed correctly without this..



Thanks for putting up with me. :)

Paolo




--
Lei




Re: [Qemu-devel] [PATCH] virtio-rng: correct the default limit rate

2013-11-26 Thread Paolo Bonzini
Il 26/11/2013 14:43, Amos Kong ha scritto:
 /* Set a default rate limit of 2^47 bytes per minute or roughly 2TB/s.  If
  * you have an entropy source capable of generating more entropy than this
  * and you can pass it through via virtio-rng, then hats off to you.  Until
  * then, this is unlimited for all practical purposes.
  */
 
 But the current rate is (INT64_MAX) bytes per (1  16) ms, it's 128,000 TB/s

You are changing:

* max-bytes from 2^63 to 2^47

* period from 65536 to 6

For a user, changing only period would have no effect, the limit rate
would remain effectively infinite.  Changing max-bytes would give a 7%
higher rate after your patch.

Not a big deal, and max-bytes is easier to explain after your patch
(bytes/minute) than before (bytes/65536ms).

Reviewed-by: Paolo Bonzini pbonz...@redhat.com

 * change the default period to 60,000 ms -- 1 mins
 * change the default max-bytes to 2^47 bytes -- INT64_MAX  16
 
  INT64_MAX  =  0x 8000    - 1 = 2 ^ 63 - 1
  INT64_MAX  16 =  0x 8000   - 1 = 2 ^ 47
 
 Signed-off-by: Amos Kong ak...@redhat.com
 ---
  include/hw/virtio/virtio-rng.h | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/include/hw/virtio/virtio-rng.h b/include/hw/virtio/virtio-rng.h
 index debaa15..d64e804 100644
 --- a/include/hw/virtio/virtio-rng.h
 +++ b/include/hw/virtio/virtio-rng.h
 @@ -53,7 +53,7 @@ typedef struct VirtIORNG {
  */
  #define DEFINE_VIRTIO_RNG_PROPERTIES(_state, _conf_field)
 \
  DEFINE_PROP_UINT64(max-bytes, _state, _conf_field.max_bytes,   
 \
 -   INT64_MAX),   
 \
 -DEFINE_PROP_UINT32(period, _state, _conf_field.period_ms, 1  16)
 +   INT64_MAX  16), 
 \
 +DEFINE_PROP_UINT32(period, _state, _conf_field.period_ms, 6)




Re: [Qemu-devel] [PULL for-1.8 1/2] pc: disable pci-info

2013-11-26 Thread Michael S. Tsirkin
On Tue, Nov 26, 2013 at 12:00:51PM +0100, Gerd Hoffmann wrote:
   Hi,
 
  I think it's down to other qemu bugs (such as _CRS not covering
  all of PCI memory), we shall just fix them.
 
 i.e. the attached patch should fixup things.
 
 cheers,
   Gerd
 

Yes, I think it's a start. Q35 is a bit harder because of the MMIO
region.  Do we want to tweak end too? There's all kind of
stuff there so need to be careful ...

 From a81b8d66e24fd298ce7654d424a378337e6cf132 Mon Sep 17 00:00:00 2001
 From: Gerd Hoffmann kra...@redhat.com
 Date: Tue, 26 Nov 2013 11:46:11 +0100
 Subject: [PATCH] piix: fix 32bit pci hole
 
 Make the 32bit pci hole start at end of ram, so all possible address
 space is covered.  Of course the firmware can use less than that.
 Leaving space unused is no problem, mapping pci bars outside the
 hole causes problems though.
 
 Signed-off-by: Gerd Hoffmann kra...@redhat.com
 ---
  hw/pci-host/piix.c | 10 +-
  1 file changed, 1 insertion(+), 9 deletions(-)
 
 diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
 index edc974e..1414a2b 100644
 --- a/hw/pci-host/piix.c
 +++ b/hw/pci-host/piix.c
 @@ -345,15 +345,7 @@ PCIBus *i440fx_init(PCII440FXState **pi440fx_state,
  f-ram_memory = ram_memory;
  
  i440fx = I440FX_PCI_HOST_BRIDGE(dev);
 -/* Set PCI window size the way seabios has always done it. */
 -/* Power of 2 so bios can cover it with a single MTRR */
 -if (ram_size = 0x8000) {
 -i440fx-pci_info.w32.begin = 0x8000;
 -} else if (ram_size = 0xc000) {
 -i440fx-pci_info.w32.begin = 0xc000;
 -} else {
 -i440fx-pci_info.w32.begin = 0xe000;
 -}
 +i440fx-pci_info.w32.begin = ram_size;
  
  memory_region_init_alias(f-pci_hole, OBJECT(d), pci-hole, 
 f-pci_address_space,
   pci_hole_start, pci_hole_size);
 -- 
 1.8.3.1
 




Re: [Qemu-devel] [PATCH 08/17] add unix_msgfd_lookup() to callback get_buffer

2013-11-26 Thread Lei Li

On 11/26/2013 07:31 PM, Paolo Bonzini wrote:

Il 21/11/2013 10:11, Lei Li ha scritto:

+/*
+ * recvmsg is called here to catch the control message for
+ * the exchange of PIPE file descriptor until it is received.
+ */
+len = recvmsg(s-sockfd, msg, 0);
+if (len != -1) {
+if (unix_msgfd_lookup(s, msg)  0) {
+pipefd_passed = 1;
+/*
+ * Do not count one byte taken by the PIPE file
+ * descriptor.
+ */
+len--;

I think adding a byte in the middle of the stream is not reliable.

Rather, you should transmit the socket always at the same place, for
example in the first call of qemu_local_save_ram, after it has written
the 64-bit field.


I guess 'transmit the socket' you mean transmit the fd?

Sorry that I am quite understand your suggestion here.. Do you
mean that send_pipefd in the first call of qemu_local_save_ram
after it has written the 64-bit field? In this way, get rid of
qemu_local_send_pipefd?

Currently, the fd control message is sent at the end of the stream
in ram_save_setup stage, followed by the ram page. The control
message of fd is always at the same place.



The matching code in qemu_local_ram_load will be like this:

static int qemu_local_ram_load(QEMUFile *f, void *opaque,
uint64_t flags)
{
 QEMUFileLocal *s = opaque;
 ram_addr_t addr;
 struct iovec iov;
 ssize_t ret = -EINVAL;

 if (!s-pipefd_received) {
 /*
  * send_pipefd was called at this point, and it wrote one byte
  * to the stream.
  */
 qemu_get_byte(s);
 s-pipefd_received = true;
 }

 if (pipefd_passed) {
 ...
 }
 return -EINVAL;
}

Also, please move pipefd_passed within QEMUFileLocal.

Thanks,

Paolo




--
Lei




Re: [Qemu-devel] [PATCH 14/17] add new RanState RAN_STATE_MEMORY_STALE

2013-11-26 Thread Lei Li

On 11/26/2013 08:28 PM, Paolo Bonzini wrote:

Il 21/11/2013 10:11, Lei Li ha scritto:
  
  { RUN_STATE_DEBUG, RUN_STATE_SUSPENDED },

DEBUG - MEMORY_STALE is missing.


Good catch, I will add it, thanks. :)



Paolo


  { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED },
  { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING },
  { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
+{ RUN_STATE_SUSPENDED, RUN_STATE_MEMORY_STALE },
  





--
Lei




[Qemu-devel] [PATCH] SPARC: Add and use CPU_FEATURE_CASA

2013-11-26 Thread Sebastian Huber
The LEON3 processor has support for the CASA instruction which is
normally only available for SPARC V9 processors.  Binutils 2.24
and GCC 4.9 will support this instruction for LEON3.  GCC uses it to
generate C11 atomic operations.
---
 target-sparc/cpu.c |3 +-
 target-sparc/cpu.h |4 ++-
 target-sparc/helper.h  |4 ++-
 target-sparc/ldst_helper.c |   26 +---
 target-sparc/translate.c   |   47 ---
 5 files changed, 52 insertions(+), 32 deletions(-)

diff --git a/target-sparc/cpu.c b/target-sparc/cpu.c
index e7f878e..5806e59 100644
--- a/target-sparc/cpu.c
+++ b/target-sparc/cpu.c
@@ -458,7 +458,8 @@ static const sparc_def_t sparc_defs[] = {
 .mmu_trcr_mask = 0x,
 .nwindows = 8,
 .features = CPU_DEFAULT_FEATURES | CPU_FEATURE_TA0_SHUTDOWN |
-CPU_FEATURE_ASR17 | CPU_FEATURE_CACHE_CTRL | CPU_FEATURE_POWERDOWN,
+CPU_FEATURE_ASR17 | CPU_FEATURE_CACHE_CTRL | CPU_FEATURE_POWERDOWN |
+CPU_FEATURE_CASA,
 },
 #endif
 };
diff --git a/target-sparc/cpu.h b/target-sparc/cpu.h
index 41194ec..f87d7fb 100644
--- a/target-sparc/cpu.h
+++ b/target-sparc/cpu.h
@@ -271,12 +271,14 @@ typedef struct sparc_def_t {
 #define CPU_FEATURE_ASR17(1  15)
 #define CPU_FEATURE_CACHE_CTRL   (1  16)
 #define CPU_FEATURE_POWERDOWN(1  17)
+#define CPU_FEATURE_CASA (1  18)
 
 #ifndef TARGET_SPARC64
 #define CPU_DEFAULT_FEATURES (CPU_FEATURE_FLOAT | CPU_FEATURE_SWAP |  \
   CPU_FEATURE_MUL | CPU_FEATURE_DIV | \
   CPU_FEATURE_FLUSH | CPU_FEATURE_FSQRT | \
-  CPU_FEATURE_FMUL | CPU_FEATURE_FSMULD)
+  CPU_FEATURE_FMUL | CPU_FEATURE_FSMULD | \
+  CPU_FEATURE_CASA)
 #else
 #define CPU_DEFAULT_FEATURES (CPU_FEATURE_FLOAT | CPU_FEATURE_SWAP |  \
   CPU_FEATURE_MUL | CPU_FEATURE_DIV | \
diff --git a/target-sparc/helper.h b/target-sparc/helper.h
index 5e0eea1..9c4fd56 100644
--- a/target-sparc/helper.h
+++ b/target-sparc/helper.h
@@ -22,7 +22,6 @@ DEF_HELPER_1(popc, tl, tl)
 DEF_HELPER_4(ldda_asi, void, env, tl, int, int)
 DEF_HELPER_5(ldf_asi, void, env, tl, int, int, int)
 DEF_HELPER_5(stf_asi, void, env, tl, int, int, int)
-DEF_HELPER_5(cas_asi, tl, env, tl, tl, tl, i32)
 DEF_HELPER_5(casx_asi, tl, env, tl, tl, tl, i32)
 DEF_HELPER_2(set_softint, void, env, i64)
 DEF_HELPER_2(clear_softint, void, env, i64)
@@ -31,6 +30,9 @@ DEF_HELPER_2(tick_set_count, void, ptr, i64)
 DEF_HELPER_1(tick_get_count, i64, ptr)
 DEF_HELPER_2(tick_set_limit, void, ptr, i64)
 #endif
+#if !defined(CONFIG_USER_ONLY) || defined(TARGET_SPARC64)
+DEF_HELPER_5(cas_asi, tl, env, tl, tl, tl, i32)
+#endif
 DEF_HELPER_3(check_align, void, env, tl, i32)
 DEF_HELPER_1(debug, void, env)
 DEF_HELPER_1(save, void, env)
diff --git a/target-sparc/ldst_helper.c b/target-sparc/ldst_helper.c
index 2936b58..c51b9b0 100644
--- a/target-sparc/ldst_helper.c
+++ b/target-sparc/ldst_helper.c
@@ -2224,33 +2224,35 @@ void helper_stf_asi(CPUSPARCState *env, target_ulong 
addr, int asi, int size,
 }
 }
 
-target_ulong helper_cas_asi(CPUSPARCState *env, target_ulong addr,
-target_ulong val1, target_ulong val2, uint32_t asi)
+target_ulong helper_casx_asi(CPUSPARCState *env, target_ulong addr,
+ target_ulong val1, target_ulong val2,
+ uint32_t asi)
 {
 target_ulong ret;
 
-val2 = 0xUL;
-ret = helper_ld_asi(env, addr, asi, 4, 0);
-ret = 0xUL;
+ret = helper_ld_asi(env, addr, asi, 8, 0);
 if (val2 == ret) {
-helper_st_asi(env, addr, val1  0xUL, asi, 4);
+helper_st_asi(env, addr, val1, asi, 8);
 }
 return ret;
 }
+#endif /* TARGET_SPARC64 */
 
-target_ulong helper_casx_asi(CPUSPARCState *env, target_ulong addr,
- target_ulong val1, target_ulong val2,
- uint32_t asi)
+#if !defined(CONFIG_USER_ONLY) || defined(TARGET_SPARC64)
+target_ulong helper_cas_asi(CPUSPARCState *env, target_ulong addr,
+target_ulong val1, target_ulong val2, uint32_t asi)
 {
 target_ulong ret;
 
-ret = helper_ld_asi(env, addr, asi, 8, 0);
+val2 = 0xUL;
+ret = helper_ld_asi(env, addr, asi, 4, 0);
+ret = 0xUL;
 if (val2 == ret) {
-helper_st_asi(env, addr, val1, asi, 8);
+helper_st_asi(env, addr, val1  0xUL, asi, 4);
 }
 return ret;
 }
-#endif /* TARGET_SPARC64 */
+#endif /* !defined(CONFIG_USER_ONLY) || defined(TARGET_SPARC64) */
 
 void helper_ldqf(CPUSPARCState *env, target_ulong addr, int mem_idx)
 {
diff --git a/target-sparc/translate.c b/target-sparc/translate.c
index 38b4519..86743dc 100644
--- a/target-sparc/translate.c
+++ b/target-sparc/translate.c
@@ -2107,18 +2107,6 @@ static 

Re: [Qemu-devel] [edk2] [qemu PATCH] hw/i386/pc_sysfw: support more than one flash drive

2013-11-26 Thread Paolo Bonzini
Il 26/11/2013 14:53, Laszlo Ersek ha scritto:
  
  IIUC in the case of OVMF the guest just knows that there are multiple
  devices backing the range.  Is that right?
 No, the guest (the flash driver) is unaware of that. It could know if it
 wanted to, but it doesn't care. It cares about a base address and a
 size, and that range is subdivided into various roles. But how many
 flash devices back the range is not interesting for the driver.
 
 Jordan wrote the driver with one flash device in mind, and when I split
 the image in two parts, I took care to map them so that the base
 address, the size, and those boundaries stay the same. It's sufficient
 for the driver to continue working.

Ah, I see it now.  That's because the driver never needs to write an
offset into the flash device.  Offsets are communicated by writing to
different memory addresses.

Since the OVMF driver never queries the size of the device, it doesn't
care whether the flash is one or two.

Paolo



Re: [Qemu-devel] [PULL for-1.8 1/2] pc: disable pci-info

2013-11-26 Thread Laszlo Ersek
On 11/26/13 10:10, Michael S. Tsirkin wrote:

 seabios manages to enumerate PCI with information exported from qemu
 so why can't OVMF?

SeaBIOS and qemu duplicate logic (code) between each other.

src/fw/pciinit.c grabs RamSize over fw_cfg, and i440fx_mem_addr_setup()
sets pcimem_start to one of three values based on RamSize. (One of the
three is not explicit there, it's the build default 0xe000.)

The same code is visible in qemu in i440fx_init().

I duplicated the same logic in OVMF's PEI one week ago, and it solved
the problem. See the patch attached to
http://thread.gmane.org/gmane.comp.bios.tianocore.devel/4881/focus=4995.

But code/logic duplication is ugly, which is why I've been looking for a
better, dynamic solution ever since.

 I think it's down to other qemu bugs (such as _CRS not covering
 all of PCI memory), we shall just fix them.

I don't know how to fix them. I don't know how to enumerate all PCI
regions in use, plus all unassigned ranges, from below, like with a
MemoryListener.

If I understand correctly, Igor suggested to track devices as they map
their regions, but (again, if I understood correctly) you didn't seem to
like the idea.

Thanks
Laszlo



Re: [Qemu-devel] [PATCH 16/17] migration: adjust migration_thread() process for page flipping

2013-11-26 Thread Paolo Bonzini
Il 26/11/2013 14:53, Lei Li ha scritto:
 1) ram_save_setup stage, it will send all the bytes in this stages
to destination, and send_pipefd by ram_control_before_iterate
at the end of it.

ram_save_setup runs doesn't send anything from guest RAM.  It sends the
lengths of the various blocks.  As you said, at the end of
ram_save_setup you send the pipefd.

ram_save_iterate runs before ram_save_complete.  ram_save_iterate and
ram_save_complete write data with exactly the same format.  Both of them
can use ram_save_page

It should not matter if some pages are sent as part of ram_save_iterate
and others as part of ram_save_complete.

One possibility is that you are hitting a bug due to the way you ignore
the 0x01 byte that send_pipefd places on the socket.

 Oops.  I might have said this before thinking about postcopy and/or
 before seeing the benchmark results from Juan's patches.  If this part
 of the patch is just an optimization, I'd rather leave it out for now.
 
 I am afraid that page flipping can not proceed correctly without this..

I really would like to understand why, because it really shouldn't (this
shouldn't be a place where you need a hook).

Paolo




Re: [Qemu-devel] [PULL for-1.8 1/2] pc: disable pci-info

2013-11-26 Thread Laszlo Ersek
On 11/26/13 15:04, Michael S. Tsirkin wrote:
 On Tue, Nov 26, 2013 at 12:00:51PM +0100, Gerd Hoffmann wrote:
   Hi,

 I think it's down to other qemu bugs (such as _CRS not covering
 all of PCI memory), we shall just fix them.

 i.e. the attached patch should fixup things.

 cheers,
   Gerd

 
 Yes, I think it's a start. Q35 is a bit harder because of the MMIO
 region.  Do we want to tweak end too? There's all kind of
 stuff there so need to be careful ...

I sent you a very similar patch last evening, and you ignored it. In
that same email I asked about the mmconfig stuff as well, and you
ignored that too. Attached.

 From a81b8d66e24fd298ce7654d424a378337e6cf132 Mon Sep 17 00:00:00 2001
 From: Gerd Hoffmann kra...@redhat.com
 Date: Tue, 26 Nov 2013 11:46:11 +0100
 Subject: [PATCH] piix: fix 32bit pci hole

 Make the 32bit pci hole start at end of ram, so all possible address
 space is covered.  Of course the firmware can use less than that.
 Leaving space unused is no problem, mapping pci bars outside the
 hole causes problems though.

 Signed-off-by: Gerd Hoffmann kra...@redhat.com
 ---
  hw/pci-host/piix.c | 10 +-
  1 file changed, 1 insertion(+), 9 deletions(-)

 diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
 index edc974e..1414a2b 100644
 --- a/hw/pci-host/piix.c
 +++ b/hw/pci-host/piix.c
 @@ -345,15 +345,7 @@ PCIBus *i440fx_init(PCII440FXState **pi440fx_state,
  f-ram_memory = ram_memory;
  
  i440fx = I440FX_PCI_HOST_BRIDGE(dev);
 -/* Set PCI window size the way seabios has always done it. */
 -/* Power of 2 so bios can cover it with a single MTRR */
 -if (ram_size = 0x8000) {
 -i440fx-pci_info.w32.begin = 0x8000;
 -} else if (ram_size = 0xc000) {
 -i440fx-pci_info.w32.begin = 0xc000;
 -} else {
 -i440fx-pci_info.w32.begin = 0xe000;
 -}
 +i440fx-pci_info.w32.begin = ram_size;

This doesn't clamp the w32.begin value into [0x8000, 0xe000],
which seems wrong.

Guys, I'm confused and annoyed out of my brains by the divergence here.
In my perception Michael, Igor, and Gerd are all proposing different
things, and my ideas are either rejected or ignored (even though they
occasionally overlap with ideas from others). After struggling with this
for more than a week, and having being awake for 27 hrs, I'm officially
stopping to care right now. Ping me when qemu has something to offer
that's neither ridden by SeaBIOS legacy nor requires an AML interpreter
in OVMF's PEI phase.

Thanks,
Laszlo
---BeginMessage---
CC'ing Igor

On 11/25/13 14:24, Michael S. Tsirkin wrote:
 On Sun, Nov 24, 2013 at 06:43:38PM +0100, Laszlo Ersek wrote:
 Hello Michael,

 I didn't completely get your point on IRC regarding this subject. You
 were saying that OVMF shouldn't be changed, but the boundaries
 exported by qemu should (and that qemu should stop caring about
 SeaBIOS tradition in this regard).

 You mentioned MMCONFIG and I don't understand that reference. Can you
 please elaborate how you think we should proceed?

 Thanks!
 Laszlo

 I refer to this:
 /* Leave enough space for the biggest MCFG BAR */
 /* TODO: this matches current bios behaviour, but
  * it's not a power of two, which means an MTRR
  * can't cover it exactly.
  */
 s-mch.pci_info.w32.begin = MCH_HOST_BRIDGE_PCIEXBAR_DEFAULT +
 MCH_HOST_BRIDGE_PCIEXBAR_MAX;
 s-mch.pci_info.w32.end = IO_APIC_DEFAULT_ADDRESS;

 We must make sure CRS does not include the mmcfg region.

Sorry but I still don't understand :)

First, OVMF is currently for i440fx only, and the quoted part is from
hw/pci-host/q35.c. I of course agree that we should fix this in a
uniform manner in qemu -- just saying that for OVMF the following would
suffice:

  diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
  index edc974e..9753fea 100644
  --- a/hw/pci-host/piix.c
  +++ b/hw/pci-host/piix.c
  @@ -345,12 +345,10 @@ PCIBus *i440fx_init(PCII440FXState **pi440fx_state,
   f-ram_memory = ram_memory;

   i440fx = I440FX_PCI_HOST_BRIDGE(dev);
  -/* Set PCI window size the way seabios has always done it. */
  -/* Power of 2 so bios can cover it with a single MTRR */
   if (ram_size = 0x8000) {
   i440fx-pci_info.w32.begin = 0x8000;
  -} else if (ram_size = 0xc000) {
  -i440fx-pci_info.w32.begin = 0xc000;
  +} else if (ram_size = 0xe000) {
  +i440fx-pci_info.w32.begin = ram_size;
   } else {
   i440fx-pci_info.w32.begin = 0xe000;
   }

Second, the code that you quoted from hw/pci-host/q35.c seems to do
exactly what you're saying we must do.

  static Property mch_props[] = {
  DEFINE_PROP_UINT64(PCIE_HOST_MCFG_BASE, Q35PCIHost, parent_obj.base_addr,
  MCH_HOST_BRIDGE_PCIEXBAR_DEFAULT),

This sets the default mmconfig base address at
MCH_HOST_BRIDGE_PCIEXBAR_DEFAULT. The code you qouted seems to ensure
that the range starting at 

Re: [Qemu-devel] [PATCH] virtio-rng: correct the default limit rate

2013-11-26 Thread Amos Kong
On Tue, Nov 26, 2013 at 02:58:26PM +0100, Paolo Bonzini wrote:
 Il 26/11/2013 14:43, Amos Kong ha scritto:
  /* Set a default rate limit of 2^47 bytes per minute or roughly 2TB/s.  If
   * you have an entropy source capable of generating more entropy than this
   * and you can pass it through via virtio-rng, then hats off to you.  Until
   * then, this is unlimited for all practical purposes.
   */
  
  But the current rate is (INT64_MAX) bytes per (1  16) ms, it's 128,000 
  TB/s
 
 You are changing:
 
 * max-bytes from 2^63 to 2^47
 
 * period from 65536 to 6
 
 For a user, changing only period would have no effect, the limit rate
 would remain effectively infinite.  Changing max-bytes would give a 7%
 higher rate after your patch.

The rate will be effected when period OR max-bytes is changed.

Before patch (as documenet described): 2^47 / 65536 = 1.953125 TB/s
Applied patch:  2^47 / 600,000 = 2.1328125 TB/s

(2.1328125 - 1.953125) / 1.953125 = 0.092

It increased 9.2%, as you said, not a big deal.
 
 Not a big deal, and max-bytes is easier to explain after your patch
 (bytes/minute) than before (bytes/65536ms).
 
 Reviewed-by: Paolo Bonzini pbonz...@redhat.com
 
Thanks for the quick feedback.

Amos



Re: [Qemu-devel] [PATCH] virtio-rng: correct the default limit rate

2013-11-26 Thread Eric Blake
[adding libvirt]

On 11/26/2013 06:58 AM, Paolo Bonzini wrote:
 Il 26/11/2013 14:43, Amos Kong ha scritto:
 /* Set a default rate limit of 2^47 bytes per minute or roughly 2TB/s.  If
  * you have an entropy source capable of generating more entropy than this
  * and you can pass it through via virtio-rng, then hats off to you.  Until
  * then, this is unlimited for all practical purposes.
  */

 But the current rate is (INT64_MAX) bytes per (1  16) ms, it's 128,000 TB/s
 
 You are changing:
 
 * max-bytes from 2^63 to 2^47
 
 * period from 65536 to 6
 
 For a user, changing only period would have no effect, the limit rate
 would remain effectively infinite.  Changing max-bytes would give a 7%
 higher rate after your patch.
 
 Not a big deal, and max-bytes is easier to explain after your patch
 (bytes/minute) than before (bytes/65536ms).
 
 Reviewed-by: Paolo Bonzini pbonz...@redhat.com
 

Hmm.  Libvirt is already converting a user's rate of bytes/period into
the qemu parameters, defaulting to 1 second as its default period.  Am I
correct that as long as libvirt specified both rate AND period, then
this change has no impact (and that the 7% change occurs if you specify
period while leaving max-bytes alone)?  Or is this an ABI change where
libvirt will have to be taught to be smart enough to know whether it is
old qemu or new qemu to adjust how libvirt does its calculations when
converting the user's rate into qemu terms?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] KVM call agenda for 2013-11-26

2013-11-26 Thread Juan Quintela
Juan Quintela quint...@redhat.com wrote:
 Hi

 Please, send any topic that you are interested in covering.

As there are no topics for agenda, call is cancelled.

Later,  Juan.




 Thanks, Juan.

 Call details:

 10:00 AM to 11:00 AM EDT
 Every two weeks

 If you need phone number details,  contact me privately.



Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table

2013-11-26 Thread Avi Kivity
On Tue, Nov 26, 2013 at 3:47 PM, Paolo Bonzini pbonz...@redhat.com wrote:

 Il 26/11/2013 14:18, Avi Kivity ha scritto:
 
  I don't think a workqueue is even needed.  You just need to use call_rcu
  to free old after releasing kvm-irq_lock.
 
  What do you think?
 
  Can this cause an interrupt to be delivered to the wrong (old) vcpu?

 No, this would be exactly the same code that is running now:

 mutex_lock(kvm-irq_lock);
 old = kvm-irq_routing;
 kvm_irq_routing_update(kvm, new);
 mutex_unlock(kvm-irq_lock);

 synchronize_rcu();
 kfree(old);
 return 0;

 Except that the kfree would run in the call_rcu kernel thread instead of
 the vcpu thread.  But the vcpus already see the new routing table after
 the rcu_assign_pointer that is in kvm_irq_routing_update.



I understood the proposal was also to eliminate the synchronize_rcu(), so
while new interrupts would see the new routing table, interrupts already in
flight could pick up the old one.


Re: [Qemu-devel] [PATCH 4/4] tpm: Provide libtpms software TPM backend

2013-11-26 Thread Corey Bryant


On 11/25/2013 10:04 PM, Xu, Quan wrote:

  Thanks Bryant, this problem has been solved by following 
http://www.mail-archive.com/qemu-devel@nongnu.org/msg200808.html;.
  But there is another problem when run configure with ./configure --target-list=x86_64-softmmu --enable-tpm. The 
value of libtpms is still no. when I modified tpm_libtpms to yes in configure 
file directly and make, then reported with error
hw/tpm/tpm_libtpms.c:21:33: fatal error: libtpms/tpm_library.h: No such file or 
directory.  Now I am installing libtpms with https://github.com/coreycb/libtpms for 
libtpms lib. Could you share specific step to configure QEMU based on your patch, if it 
comes
easily to you?


Here's what I've been using to build libtpms:

$ CFLAGS='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
-fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic'

$ export CFLAGS
$ ./configure --build=x86_64-redhat-linux-gnu --prefix=/usr 
--libdir=/usr/lib64

$ make
$ sudo make install

And then the configure you're using above should work for QEMU.


  BTW, one target of my team is enabling stubdom vtpm for HVM virtual 
machine on Xen virtualization, your patches and seabios are big breakthroughs. 
My team is very interested to collaborate with you / Qemu community on similar 
areas.


That's great to hear!

Unfortunately, the current approach of linking QEMU against libtpms 
doesn't look like it's going to make it upstream.  So it looks like we 
need to take a different approach.


Btw, I thought Xen already had TPM support.  Is that not supported in 
stubdom's?


--
Regards,
Corey Bryant



I'd be really pleased if you can help me on these issues.

Quan Xu
Intel



-Original Message-
From: Corey Bryant [mailto:cor...@linux.vnet.ibm.com]
Sent: Monday, November 25, 2013 9:53 PM
To: Xu, Quan
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 4/4] tpm: Provide libtpms software TPM
backend



On 11/24/2013 10:36 PM, Xu, Quan wrote:

Bryant,

   I found that there is some conflict in qemu-options.hx between your

patch andqemu-1.7.0-rc1.tar.bz2
http://wiki.qemu-project.org/download/qemu-1.7.0-rc1.tar.bz2.


What QEMU version does this patch base on? Thanks.





Quan Xu

Intel





Thanks Quan.  I believe I built these on top of commit
c2d30667760e3d7b81290d801e567d4f758825ca.  I don't think this series is going
to make it upstream though so I likely won't be submitting a v2.

--
Regards,
Corey Bryant








Re: [Qemu-devel] [PATCH] virtio-rng: correct the default limit rate

2013-11-26 Thread Paolo Bonzini
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Il 26/11/2013 15:32, Eric Blake ha scritto:
 Hmm.  Libvirt is already converting a user's rate of bytes/period
 into the qemu parameters, defaulting to 1 second as its default
 period.  Am I correct that as long as libvirt specified both rate
 AND period, then this change has no impact (and that the 7% change
 occurs if you specify period while leaving max-bytes alone)?  Or is
 this an ABI change where libvirt will have to be taught to be smart
 enough to know whether it is old qemu or new qemu to adjust how
 libvirt does its calculations when converting the user's rate into
 qemu terms?

Yes, it is the former---i.e. you're right.

Paolo
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.22 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJSlLPjAAoJEBvWZb6bTYbyDg8QAIk8BIh5ZY0w4aF7AHufxpmn
XtvbGIAdb+ztNBtPrL5HJXAvaOvbZNfktmKDA7Cee1Cq8opJPUZQDpwmv7xUshpG
yc1YQu3Djb0/kCDV0oNiMQARSwxJ3XMR94SSLfmlI3b7lJVDeviouhV4UdjS3eal
ZNWUXya+1bloOC+zX4P0atB5hyrpVhcVHddWLsIs4+6EYcOatHPgl5CK5f5uxq+P
5qkAe1FJTHMkBOJ86kis3s57ZTTqIlkvFiqtJY4IViifHjbMR4PJraO6EQigOxf6
VdV+MIkPm3GA6F3SYOoZNacITBsaRXTJ0kjD+BjUUAFONB7X8//Uob5gSUgmkED7
ATqiDnqSq1jwyV4FAVwYDB397ViLAZYd9k/ai4QZLaclyMqHQKx+RCNN7dmFwCuL
EYK0Z9AkoVkJJQQUcERO3ps/T89+tlBVQhBKpae1V2ZKJqCGxLaf0344S+kFicYk
bqp87ra5L5J8Qdn+/eBMJsQNa5NDWjKU0TvZSr3tGGoGKKamzwFmnsTzcZ9DHoTZ
IVcdHAbToCYnqbnJzORoxFVycSrY8B2AhZOqIz5YQgNV1LDPcK7hzcXZ+rpBu5mU
zF9MFHBKrGJo7l4GyC4jFa5osDfGOyGzMyoSvlMuFTG8CUXN/QLmcpzE+NL1QlCj
lL7TLkbhYKrQxUPw37+P
=mb5e
-END PGP SIGNATURE-



Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table

2013-11-26 Thread Paolo Bonzini
Il 26/11/2013 15:36, Avi Kivity ha scritto:
 
 No, this would be exactly the same code that is running now:
 
 mutex_lock(kvm-irq_lock);
 old = kvm-irq_routing;
 kvm_irq_routing_update(kvm, new);
 mutex_unlock(kvm-irq_lock);
 
 synchronize_rcu();
 kfree(old);
 return 0;
 
 Except that the kfree would run in the call_rcu kernel thread instead of
 the vcpu thread.  But the vcpus already see the new routing table after
 the rcu_assign_pointer that is in kvm_irq_routing_update.
 
 I understood the proposal was also to eliminate the synchronize_rcu(),
 so while new interrupts would see the new routing table, interrupts
 already in flight could pick up the old one.

Isn't that always the case with RCU?  (See my answer above: the vcpus
already see the new routing table after the rcu_assign_pointer that is
in kvm_irq_routing_update).

If you eliminate the synchronize_rcu, new interrupts would see the new
routing table, while interrupts already in flight will get a dangling
pointer.

Paolo



  1   2   3   >