date:20141118

Re: [Qemu-devel] [PATCH v6 00/10] block: Incremental backup series

2014-11-18 Thread Denis V. Lunev


On 18/11/14 19:08, John Snow wrote:



On 11/18/2014 08:09 AM, Vladimir Sementsov-Ogievskiy wrote:

(3) Data Integrity

The dirty flag could work something like:

- If, on first open, the file has the dirty flag set, we need to
discard the bitmap data because we can no longer trust it.
- If the bitmap file is clean, proceed as normal, but take a lock
against any of the bitmap functions to prevent them from marking any
bits dirty.
- On first write to a clean persistent bitmap, delay the write until
we can mark the bitmap as dirty first. This incurs a write penalty
when we try to use the bitmap at first...
- Unlock the bitmap functions and allow them to mark blocks as needed.
- At some point, based on a sync policy, re-commit the dirty
information to the file and mark the file as clean once more and
re-take the persistence lock.

Correct me if I'm wrong.



- first of all what we are protecting against? Any QEMU or kernel
crash leads to the disaster. You can not guarantee at all the
consistency between data written to the main file (disk) and data
written to bitmap except if you wait that dirty bitmap is really
updated. In this case you will have performance halved in comparison
with the host (each guest write means 2 IOPS instead of 1)

- this effectively means that we can not be protected against crash

- if we don't we could not care at all about bitmap updates when
QEMU is running. We should write it only on stop/suspend/etc.

This simplifies things a lot.

So, the procedure is simple
- stop main VM by using vm_pause
- open/create bitmap file
- set dirty
- unpause

That's all. If QEMU is running, bitmap is always dirty. The only 
exception is backup creation. New backup means that old bitmap is

synced with dirty clear, new one is created with dirty set and
backup starts working with old bitmap.

Once again. In all other cases we can not guarantee that we will
report _all_ changed blocks to backup software if crash happens
in the middle. One missed block in the bitmap and the entire backup
is blown up.

This also means that (4) aka sync policy is simple. Do this on
close.


#Read bitmap:
read in blockdev_init, before any write to device, so no lock is needed.

#Set bits in bitmap:
if bitmap.dirty_flag:
set bits
else:
LOCK
set bits
set bitmap.dirty_flag
set dirty_flag in bitmap file
UNLOCK

#Sync:
if not bitmap.dirty_flag:
skip sync
else:
LOCK
save one of bitmap levels (saving the last one is too long and not
very good idea, because it is fast-updateing)
unset dirty_flag in bitmap file
unset bitmap.dirty_flag
UNLOCK

#Last sync in bdrv_close:
Just save the last bitmap level and unset dirty_flag in bitmap file

Also.. I'm not quite sure about locking.. As I understand, co-routines
in qemu are not running in parallel, is locking required? Or sync timer
will not be co-routine based?

Best regards,
Vladimir


Might be being too informal. I just meant a lock or barrier to prevent
actual IO throughput until we can confirm the dirty flag has been
adjusted to indicate that the persistent bitmap is now officially out of
date. Nothing fancy.

Wasn't trying to imply that we needed threading protection, just
"locking" the IO until we can configure the bitmap as we need it to be.

[Qemu-devel] [PATCH] usb: delete redundant brackets in usb_host_handle_control()

2014-11-18 Thread Jun Li

When see usb codes, find there are redundant brackets !((udev->port->speedmask
& USB_SPEED_MASK_SUPER)) here. So delete it.

Signed-off-by: Jun Li 
---
 hw/usb/host-libusb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/usb/host-libusb.c b/hw/usb/host-libusb.c
index a5f9dab..cff4f7c 100644
--- a/hw/usb/host-libusb.c
+++ b/hw/usb/host-libusb.c
@@ -1237,7 +1237,7 @@ static void usb_host_handle_control(USBDevice *udev, 
USBPacket *p,
 /* Fix up USB-3 ep0 maxpacket size to allow superspeed connected devices
  * to work redirected to a not superspeed capable hcd */
 if (udev->speed == USB_SPEED_SUPER &&
-!((udev->port->speedmask & USB_SPEED_MASK_SUPER)) &&
+!(udev->port->speedmask & USB_SPEED_MASK_SUPER) &&
 request == 0x8006 && value == 0x100 && index == 0) {
 r->usb3ep0quirk = true;
 }
-- 
1.9.3

[Qemu-devel] [PATCH v2] tests: Use "command -v" instead of which(1) in shell scripts

2014-11-18 Thread Fam Zheng

When which(1) is not installed, we would complain "perl not found"
because it's the first set_prog_path check. The error message is
wrong.

Fix it by using "command -v", a native way to query the existence of a
command.

Suggested-by: Eric Blake 
Signed-off-by: Fam Zheng 

---
v2: Use "command -v" as suggested by Eric. Also change a few other
occasions of which(1) in tests/qemu-iotests/common.
---
 tests/qemu-iotests/common| 8 
 tests/qemu-iotests/common.config | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tests/qemu-iotests/common b/tests/qemu-iotests/common
index 9e12bec..bc27f6a 100644
--- a/tests/qemu-iotests/common
+++ b/tests/qemu-iotests/common
@@ -289,10 +289,10 @@ testlist options
 
 if [ ! -z "$DISPLAY" ]
 then
-which xdiff >/dev/null 2>&1 && diff=xdiff
-which gdiff >/dev/null 2>&1 && diff=gdiff
-which tkdiff >/dev/null 2>&1 && diff=tkdiff
-which xxdiff >/dev/null 2>&1 && diff=xxdiff
+command -v xdiff >/dev/null 2>&1 && diff=xdiff
+command -v gdiff >/dev/null 2>&1 && diff=gdiff
+command -v tkdiff >/dev/null 2>&1 && diff=tkdiff
+command -v xxdiff >/dev/null 2>&1 && diff=xxdiff
 fi
 ;;
 
diff --git a/tests/qemu-iotests/common.config b/tests/qemu-iotests/common.config
index bd6790b..91a5ef6 100644
--- a/tests/qemu-iotests/common.config
+++ b/tests/qemu-iotests/common.config
@@ -47,7 +47,7 @@ export PWD=`pwd`
 # $1 = prog to look for, $2* = default pathnames if not found in $PATH
 set_prog_path()
 {
-p=`which $1 2> /dev/null`
+p=`command -v $1 2> /dev/null`
 if [ -n "$p" -a -x "$p" ]; then
 echo $p
 return 0
-- 
1.9.3

Re: [Qemu-devel] [PATCH] usb: delete redundant brackets in usb_host_handle_control()

2014-11-18 Thread Fam Zheng

On Wed, 11/19 14:57, Jun Li wrote:
> When see usb codes, find there are redundant brackets !((udev->port->speedmask
> & USB_SPEED_MASK_SUPER)) here. So delete it.
> 
> Signed-off-by: Jun Li 
> ---
>  hw/usb/host-libusb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/usb/host-libusb.c b/hw/usb/host-libusb.c
> index a5f9dab..cff4f7c 100644
> --- a/hw/usb/host-libusb.c
> +++ b/hw/usb/host-libusb.c
> @@ -1237,7 +1237,7 @@ static void usb_host_handle_control(USBDevice *udev, 
> USBPacket *p,
>  /* Fix up USB-3 ep0 maxpacket size to allow superspeed connected devices
>   * to work redirected to a not superspeed capable hcd */
>  if (udev->speed == USB_SPEED_SUPER &&
> -!((udev->port->speedmask & USB_SPEED_MASK_SUPER)) &&
> +!(udev->port->speedmask & USB_SPEED_MASK_SUPER) &&
>  request == 0x8006 && value == 0x100 && index == 0) {
>  r->usb3ep0quirk = true;
>  }
> -- 
> 1.9.3
> 
> 

Reviewed-by: Fam Zheng 

Cc'ing qemu-triv...@nongnu.org.

Fam

[Qemu-devel] [PATCH] configure: Replace which(1) with "command -v"

2014-11-18 Thread Fam Zheng

Because which(1) is not always installed, whereas "command -v" is
the more native way to check for a command.

Signed-off-by: Fam Zheng 
---
 configure | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure b/configure
index 47048f0..986a13d 100755
--- a/configure
+++ b/configure
@@ -2727,7 +2727,7 @@ fi
 if test "$modules" = yes; then
 shacmd_probe="sha1sum sha1 shasum"
 for c in $shacmd_probe; do
-if which $c >/dev/null 2>&1; then
+if command -v $c >/dev/null 2>&1; then
 shacmd="$c"
 break
 fi
-- 
1.9.3

Re: [Qemu-devel] [PATCH for-2.2] acpi-build: mark RAM dirty on table update

2014-11-18 Thread Amit Shah

On (Mon) 17 Nov 2014 [19:04:13], Michael S. Tsirkin wrote:
> acpi build modifies internal FW CFG RAM on first access
> but we forgot to mark it dirty.
> If this RAM has been migrated already, it won't be
> migrated again, returning corrupted tables to guest.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  include/hw/loader.h  |  2 +-
>  hw/core/loader.c |  8 +---
>  hw/i386/acpi-build.c | 11 ---
>  3 files changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/include/hw/loader.h b/include/hw/loader.h
> index 054c6a2..6481639 100644
> --- a/include/hw/loader.h
> +++ b/include/hw/loader.h
> @@ -59,7 +59,7 @@ extern bool rom_file_has_mr;
>  int rom_add_file(const char *file, const char *fw_dir,
>   hwaddr addr, int32_t bootindex,
>   bool option_rom);
> -void *rom_add_blob(const char *name, const void *blob, size_t len,
> +ram_addr_t rom_add_blob(const char *name, const void *blob, size_t len,
> hwaddr addr, const char *fw_file_name,
> FWCfgReadCallback fw_callback, void *callback_opaque);

Here, and in the next hunks where function signatures are modified,
indent of following lines go off.  Minor nit.

>  int rom_add_elf_program(const char *name, void *data, size_t datasize,
> diff --git a/hw/core/loader.c b/hw/core/loader.c
> index bbe6eb3..5cf686d 100644
> --- a/hw/core/loader.c
> +++ b/hw/core/loader.c
> @@ -798,12 +798,12 @@ err:
>  return -1;
>  }
>  
> -void *rom_add_blob(const char *name, const void *blob, size_t len,
> +ram_addr_t rom_add_blob(const char *name, const void *blob, size_t len,
> hwaddr addr, const char *fw_file_name,
> FWCfgReadCallback fw_callback, void *callback_opaque)
>  {
>  Rom *rom;
> -void *data = NULL;
> +ram_addr_t ret = RAM_ADDR_MAX;
>  
>  rom   = g_malloc0(sizeof(*rom));
>  rom->name = g_strdup(name);
> @@ -815,11 +815,13 @@ void *rom_add_blob(const char *name, const void *blob, 
> size_t len,
>  rom_insert(rom);
>  if (fw_file_name && fw_cfg) {
>  char devpath[100];
> +void *data;
>  
>  snprintf(devpath, sizeof(devpath), "/rom@%s", fw_file_name);
>  
>  if (rom_file_has_mr) {
>  data = rom_set_mr(rom, OBJECT(fw_cfg), devpath);
> +ret = memory_region_get_ram_addr(rom->mr);
>  } else {
>  data = rom->data;
>  }
> @@ -828,7 +830,7 @@ void *rom_add_blob(const char *name, const void *blob, 
> size_t len,
>   fw_callback, callback_opaque,
>   data, rom->romsize);
>  }
> -return data;
> +return ret;
>  }
>  
>  /* This function is specific for elf program because we don't need to 
> allocate
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 4003b6b..92a36e3 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -56,6 +56,7 @@
>  
>  #include "qapi/qmp/qint.h"
>  #include "qom/qom-qobject.h"
> +#include "exec/ram_addr.h"
>  
>  /* These are used to size the ACPI tables for -M pc-i440fx-1.7 and
>   * -M pc-i440fx-2.0.  Even if the actual amount of AML generated grows
> @@ -1511,7 +1512,7 @@ static inline void 
> acpi_build_tables_cleanup(AcpiBuildTables *tables, bool mfre)
>  typedef
>  struct AcpiBuildState {
>  /* Copy of table in RAM (for patching). */
> -uint8_t *table_ram;
> +ram_addr_t table_ram;
>  uint32_t table_size;
>  /* Is table patched? */
>  uint8_t patched;
> @@ -1716,9 +1717,12 @@ static void acpi_build_update(void *build_opaque, 
> uint32_t offset)
>  acpi_build(build_state->guest_info, &tables);
>  
>  assert(acpi_data_len(tables.table_data) == build_state->table_size);
> -memcpy(build_state->table_ram, tables.table_data->data,
> +memcpy(qemu_get_ram_ptr(build_state->table_ram), tables.table_data->data,
> build_state->table_size);

This looks like something not necessary for this patch?  Can be split
off into another one?

> +cpu_physical_memory_set_dirty_range_nocode(build_state->table_ram,
> +   build_state->table_size);
> +
>  acpi_build_tables_cleanup(&tables, true);
>  }
>  
> @@ -1728,7 +1732,7 @@ static void acpi_build_reset(void *build_opaque)
>  build_state->patched = 0;
>  }
>  
> -static void *acpi_add_rom_blob(AcpiBuildState *build_state, GArray *blob,
> +static ram_addr_t acpi_add_rom_blob(AcpiBuildState *build_state, GArray 
> *blob,
> const char *name)
>  {
>  return rom_add_blob(name, blob->data, acpi_data_len(blob), -1, name,
> @@ -1777,6 +1781,7 @@ void acpi_setup(PcGuestInfo *guest_info)
>  /* Now expose it all to Guest */
>  build_state->table_ram = acpi_add_rom_blob(build_state, 
> tables.table_data,
> ACPI_BUILD_TABLE_FILE);
> +assert(build_state->table_ram != RAM_ADDR_MAX);
>

Re: [Qemu-devel] [PATCH 0/5] pc: make ROMs resizeable

2014-11-18 Thread Amit Shah

On (Mon) 17 Nov 2014 [22:11:13], Michael S. Tsirkin wrote:

> And to make it clear, I'd like to see review from
> migration maintainers on this.
> The patchset crosses into pc so it's probably best to
> merge through my tree to avoic conflicts.

Fine by me.

Amit

Re: [Qemu-devel] [PATCH 0/5] pc: make ROMs resizeable

2014-11-18 Thread Amit Shah

On (Mon) 17 Nov 2014 [22:08:46], Michael S. Tsirkin wrote:
> At the moment we migrate ROMs which reside in fw cfg, which allows
> changing ROM code at will, and supports migrating largish blocks early,
> with good performance.
> However, we are running into a problem: changing size breaks
> migration every time.
> This already requires somewhat messy compatibility support in
> acpi generation code, and it looks like there'll be more to come.
> 
> Rather than try to guess the correct size once and for all,
> this patchset tries to make code future-proof, by
> adding support for resizeable ram blocks.
> 
> A (possibly very high) amount of space in ram_addr_t space is reserved
> for each block, but never allocated.
> If incoming block size differs from current size, block is
> reallocated. FW CFG is also notified and updated accordingly.
> 
> To simplify things, I didn't add support for resizing
> actual RAM: device RAM such as fw cfg ROMs are never mapped
> into guests directly, so instead I added an API to
> flag device RAM explicitly, and manage them using
> simple alloc/free/realloc
> 
> Considering this promises to rid us of worries about ROM size considerations
> once and for all, I thinking about pushing this as a "kind of bugfix" before
> 2.2, so we don't need to maintain more band-aids in 2.3 and on.

I'd rather wait for 2.3; we've done this for a couple of releases
already, so what's one more.  And we're at rc2 already..

> Note: migration stream is unaffected by these patches.
> This makes it possible to enable this functionality
> unconditionally, for all machine types.
> 
> In the future, this might be handy for other things,
> such as changing kernels loaded on command line
> across migrations.

I think that'll be too risky; unless we do S4 before / after
migration to ensure the kernel realises things might be changing
beneath its feet.

Amit

Re: [Qemu-devel] [PATCH 1/4] virtio-mmio: introduce set_host_notifier()

2014-11-18 Thread Fam Zheng

On Tue, 11/04 20:47, Shannon Zhao wrote:
> set_host_notifier() is introduced into virtio-mmio now. Most of codes came
> from virtio-pci.
> 
> Signed-off-by: Ying-Shiuan Pan 
> Signed-off-by: Li Liu 
> Signed-off-by: Shannon Zhao 
> ---
>  hw/virtio/virtio-mmio.c |   70 
> +++
>  1 files changed, 70 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
> index 2450c13..d8ec2d1 100644
> --- a/hw/virtio/virtio-mmio.c
> +++ b/hw/virtio/virtio-mmio.c
> @@ -23,6 +23,7 @@
>  #include "hw/virtio/virtio.h"
>  #include "qemu/host-utils.h"
>  #include "hw/virtio/virtio-bus.h"
> +#include "qemu/error-report.h"
>  
>  /* #define DEBUG_VIRTIO_MMIO */
>  
> @@ -87,8 +88,58 @@ typedef struct {
>  uint32_t guest_page_shift;
>  /* virtio-bus */
>  VirtioBusState bus;
> +bool ioeventfd_disabled;
> +bool ioeventfd_started;
>  } VirtIOMMIOProxy;
>  
> +static int virtio_mmio_set_host_notifier_internal(VirtIOMMIOProxy *proxy,
> + int n, bool assign, bool 
> set_handler)

I didn't review the code, but checkpatch.pl noticed this line and one more
below [*] is too long (over 80 columes).

> +{
> +VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
> +VirtQueue *vq = virtio_get_queue(vdev, n);
> +EventNotifier *notifier = virtio_queue_get_host_notifier(vq);
> +int r = 0;
> +
> +if (assign) {
> +r = event_notifier_init(notifier, 1);
> +if (r < 0) {
> +error_report("%s: unable to init event notifier: %d",
> + __func__, r);
> +return r;
> +}
> +virtio_queue_set_host_notifier_fd_handler(vq, true, set_handler);
> +memory_region_add_eventfd(&proxy->iomem, VIRTIO_MMIO_QUEUENOTIFY, 4,
> +  true, n, notifier);
> +} else {
> +memory_region_del_eventfd(&proxy->iomem, VIRTIO_MMIO_QUEUENOTIFY, 4,
> +  true, n, notifier);
> +virtio_queue_set_host_notifier_fd_handler(vq, false, false);
> +event_notifier_cleanup(notifier);
> +}
> +return r;
> +}
> +
> +static void virtio_mmio_stop_ioeventfd(VirtIOMMIOProxy *proxy)
> +{
> +int r;
> +int n;
> +VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
> +
> +if (!proxy->ioeventfd_started) {
> +return;
> +}
> +
> +for (n = 0; n < VIRTIO_PCI_QUEUE_MAX; n++) {
> +if (!virtio_queue_get_num(vdev, n)) {
> +continue;
> +}
> +
> +r = virtio_mmio_set_host_notifier_internal(proxy, n, false, false);
> +assert(r >= 0);
> +}
> +proxy->ioeventfd_started = false;
> +}
> +
>  static uint64_t virtio_mmio_read(void *opaque, hwaddr offset, unsigned size)
>  {
>  VirtIOMMIOProxy *proxy = (VirtIOMMIOProxy *)opaque;
> @@ -342,6 +393,24 @@ static void virtio_mmio_reset(DeviceState *d)
>  proxy->guest_page_shift = 0;
>  }
>  
> +static int virtio_mmio_set_host_notifier(DeviceState *opaque, int n, bool 
> assign)

[*]

No need to respin yet just for this. Please wait for a serious review.

Thanks,

Fam

Re: [Qemu-devel] [PATCH 1/9] l2tpv3: fix fd leak

2014-11-18 Thread Gonglei

On 2014/11/18 15:50, Markus Armbruster wrote:

> Michael Tokarev  writes:
> 
>> 15.11.2014 13:06, arei.gong...@huawei.com wrote:
>>> From: Gonglei 
>>>
>>> In this false branch, fd will leak when it is zero.
>>> Change the testing condition.
>>
>> Why fd==0 is a concern here?  It is a very unlikely
>> situation that fd0 will be picked - firstly because
>> fd0 is almost always open, and second - even if it
>> isn't open, it will be picked much earlier than this
>> code path, ie, some other file will use fd0 before.
>>
>> But even if the concern is real (after all, better
>> stay correct than spread bad code pattern, even if
>> in reality we don't care as this can't happen), why
>> not add 0 to the equality?
>>
>> Why people especially compare with -1?  Any negative
>> value is illegal here and in lots of other places,
>> and many software packages used to return -errno in
>> error cases, which is definitely != -1.  I'm not
>> saying that comparing with -1 is bad in _this_
>> particular case, but why not do it generally in
>> all cases?
>>
>> More, comparing with 0 is faster and shorter than
>> comparing with -1...
>>
>> So if it were me, I'd change it to >= 0, not to
>> == -1.  Here and in all other millions of places
>> in qemu code where it compares with -1... ;)
> 
> Yup.
> 
> In the case of close(), I wouldn't even bother to check, except Coverity
> gets excited when it sees an invalid fd flow into close().  Rightfully
> so when the invalid fd is non-negative, overeager when it's negative.

Thank you, guys.
Paolo had fixed it :)

Best regards,
-Gonglei

Re: [Qemu-devel] [RFC][PATCH v2] block: add write threshold reporting for block devices

2014-11-18 Thread Francesco Romani

- Original Message -
> From: "Stefan Hajnoczi" 
> To: "Francesco Romani" 
> Cc: kw...@redhat.com, lcapitul...@redhat.com, qemu-devel@nongnu.org, 
> stefa...@redhat.com, mdr...@linux.vnet.ibm.com
> Sent: Monday, November 17, 2014 5:49:36 PM
> Subject: Re: [Qemu-devel] [RFC][PATCH v2] block: add write threshold 
> reporting for block devices
> 
> On Fri, Nov 07, 2014 at 02:12:13PM +0100, Francesco Romani wrote:
> 
> Sorry for the long review delay.  Looks pretty good, just one real issue
> to think about at the bottom.

Hi Stefan, thanks for the review and no problem for the delay :)

> 
> > +static void usage_threshold_disable(BlockDriverState *bs)
> > +{
> 
> It would be safest to make this idempotent:
> 
> if (!usage_threshold_is_set(bs)) {
> return;
> }
> 
> That way it can be called multiple times.

Will change.

> 
> > +notifier_with_return_remove(&bs->wr_usage_threshold_notifier);
> > +bs->wr_offset_threshold = 0;
> > +}
> > +
> > +static int usage_threshold_is_set(const BlockDriverState *bs)
> > +{
> > +return !!(bs->wr_offset_threshold > 0);
> > +}
> 
> Please use the bool type instead of an int return value.

Sure, will fix.

> > +static int coroutine_fn before_write_notify(NotifierWithReturn *notifier,
> > +void *opaque)
> > +{
> > +BdrvTrackedRequest *req = opaque;
> > +BlockDriverState *bs = req->bs;
> > +int64_t amount = 0;
> > +
> > +assert((req->offset & (BDRV_SECTOR_SIZE - 1)) == 0);
> > +assert((req->bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
> 
> Does the code still make these assumptions or are the asserts left over
> from previous versions of the patch?

It's a leftover.
I understood they don't hurt and add a bit of safety, but if they are confusing
I'll remove them.

> > +void bdrv_set_usage_threshold(BlockDriverState *bs, int64_t
> > threshold_bytes)
> > +{
> > +BlockDriverState *target_bs = bs;
> > +if (bs->file) {
> > +target_bs = bs->file;
> > +}
> 
> Hmm...I think now I understand why you are trying to use bs->file.  This
> is an attempt to make image formats work with the threshold.
> 
> Unfortunately the BlockDriverState topology can be more complicated than
> just 1 level.

I thought so but I can't reproduce yet more complex topologies.
Disclosure: I'm testing against the topology I know to be used on oVirt, lacking
immediate availability of others: suggestions welcome.

At risk of being redundant, in oVirt the usecase is QCOW2 over lvm block device,
and we'd like to be notified about the allocation of the lvm block device, 
which (IIUC)
is the last bs->file.

This is a simple topology (unless I'm missing something), and that's
the reason why I go down just one level.

Of course I want a general solution for this change, so...

> If we hardcode a strategy to traverse bs->file then it will work in most
> cases:
> 
>   while (bs->file) {
>   bs = bs->file;
>   }
> 
> But there are cases like VMDK extent files where a BlockDriverState
> actually has multiple children.
> 
> One way to solve this is to require that the management tool tells QEMU
> which exact BlockDriverState node the threshold applies to.  Then QEMU
> doesn't need any hardcoded policy.  But I'm not sure how realistic that
> it at the moment (whether management tools are uses node names for each
> node yet), so it may be best to hardcode the bs->file traversal that
> I've suggested.

oVirt relies on libvirt[1], and uses device node (e.g. 'vda').

BTW, I haven't found a way to inspect from the outside (e.g. monitor command) 
the BlockDriverState
topology, there is a way I'm missing?

Another issue I don't yet have a proper solution is related to this one.

The management app will have deal with VM with more than one block device disk, 
so we need
a way to map the notification with the corresponding device.

If we descend the bs->file chain, AFAIU the easiest mapping, being the device 
name,
is easily lost because only the outermost BlockDriverState has a device name 
attached, so when the
notification trigger
bdrv_get_device_name() will return NULL

I believe we don't necessarily need a device name in the notification, for 
example something
like an opaque cookie set together with the threshold and returned back with 
the notification
will suffice. Of course the device name is nicer :)

+++

[1] if that can help further to understand the usecase, these are the libvirt 
APIs being used
by oVirt:
http://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockStatsFlags
http://libvirt.org/html/libvirt-libvirt-domain.html#virDomainGetBlockInfo
both relies on the output[2] of 'query-blockstats' monitor command.

[2] AFAIU -but this is just my guesswork!- it also assumes a quite simple 
topology like I did

Thanks and bests,

-- 
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani

Re: [Qemu-devel] [Bug 1393486] [NEW] hw/virtio/virtio-rng.c:150: bad test ?

2014-11-18 Thread Amit Shah

On (Mon) 17 Nov 2014 [17:15:03], dcb wrote:
> Public bug reported:
> 
> hw/virtio/virtio-rng.c:150:31: warning: logical not is only applied to
> the left hand side of comparison [-Wlogical-not-parentheses]
> 
> if (!vrng->conf.period_ms > 0) {
> error_setg(errp, "'period' parameter expects a positive integer");
> return;
> }
> 
> Maybe better code
> 
> if (vrng->conf.period_ms <= 0) {
> error_setg(errp, "'period' parameter expects a positive integer");
> return;
> }

Thanks!

Do you want to submit a patch, since you've identified the fix as
well?


Amit

Re: [Qemu-devel] [PATCH v3 for-2.2 3/3] raw-posix: The SEEK_HOLE code is flawed, rewrite it

2014-11-18 Thread Max Reitz


On 2014-11-17 at 17:43, Eric Blake wrote:

On 11/17/2014 03:18 AM, Markus Armbruster wrote:

On systems where SEEK_HOLE in a trailing hole seeks to EOF (Solaris,
but not Linux), try_seek_hole() reports trailing data instead.

Maybe worth a comment that this is not fatal, but also not optimal.


Additionally, unlikely lseek() failures are treated badly:

* When SEEK_HOLE fails, try_seek_hole() reports trailing data.  For
   -ENXIO, there's in fact a trailing hole.  Can happen only when
   something truncated the file since we opened it.

* When SEEK_HOLE succeeds, SEEK_DATA fails, and SEEK_END succeeds,
   then try_seek_hole() reports a trailing hole.  This is okay only
   when SEEK_DATA failed with -ENXIO (which means the non-trailing hole
   found by SEEK_HOLE has since become trailing somehow).  For other
   failures (unlikely), it's wrong.

* When SEEK_HOLE succeeds, SEEK_DATA fails, SEEK_END fails (unlikely),
   then try_seek_hole() reports bogus data [-1,start), which its caller
   raw_co_get_block_status() turns into zero sectors of data.  Could
   theoretically lead to infinite loops in code that attempts to scan
   data vs. hole forward.

Rewrite from scratch, with very careful comments.

Thanks for the careful commit message as well as the careful comments :)


Signed-off-by: Markus Armbruster 
---
  block/raw-posix.c | 111 +-
  1 file changed, 85 insertions(+), 26 deletions(-)

@@ -1542,25 +1600,26 @@ static int64_t coroutine_fn 
raw_co_get_block_status(BlockDriverState *bs,
  nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
  }
  
+} else if (data == start) {

  /* On a data extent, compute sectors to the end of the extent.  */
  *pnum = MIN(nb_sectors, (hole - start) / BDRV_SECTOR_SIZE);

I think we are safe for now that no file system supports holes smaller
than 512 bytes, so (hole - start) should always be a non-zero multiple
of sectors.  Similarly for the hole case of (data - start).  Maybe it's
worth assert(*pnum > 0) to ensure that we never hit a situation where we
go into an infinite loop where we aren't progressing because pnum is
never advancing to the next sector?


That's something the callers of bdrv_get_block_status() have to take 
care of. See commit 4b25bbc4c22cf39350b75bd250d568a4d975f7c5, for example.


Max


But that would be okay as a
separate patch, and I don't want to delay getting _this_ patch into 2.2.

Reviewed-by: Eric Blake

Re: [Qemu-devel] [PATCH v3 for-2.2 0/3] raw-posix: Get rid of FIEMAP, fix SEEK_HOLE

2014-11-18 Thread Max Reitz


On 2014-11-17 at 11:18, Markus Armbruster wrote:

PATCH 1 is just a comment fix.  PATCH 2 drops FIEMAP use and explains
why it needs to go.  PATCH 3 carefully rewrites the SEEK_HOLE code.

Why 2.2?  The series fixes bugs, but the bugs are either not terribly
severe, or not particularly likely to bite.  The reason I want it
included is we've already changed the code fundamentally in 2.2, from

 Incorrect use of FIEMAP if available, else somewhat incorrect use
 of SEEK_HOLE if available, else pretend no holes

to

 Somewhat incorrect use of SEEK_HOLE if available, else correct but
 slow-as-molasses use of FIEMAP if available, else pretend no holes

Let's finish the job:

 Correct use of SEEK_HOLE if available, else pretend no holes

Less complex, more robust, and no half-broken intermediate version
released.

v3:
* PATCH 1&2 unchanged, except for a commit message typo [Eric]
* PATCH 3&4 redone as PATCH 3, very carefully [Eric, Kevin]

v2:
* PATCH 1 unchanged
* PATCH 2 revised and split up [Paolo, Fam, Eric, Max]

Markus Armbruster (3):
   raw-posix: Fix comment for raw_co_get_block_status()
   raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
   raw-posix: The SEEK_HOLE code is flawed, rewrite it

  block/raw-posix.c | 167 --
  1 file changed, 86 insertions(+), 81 deletions(-)


Thanks, applied to my block tree with the hunk removing skip_fiemap 
squashed into patch 2:


https://github.com/XanClic/qemu/commits/block

Re: [Qemu-devel] [PATCH 0/4] migration: fix CVE-2014-7840

2014-11-18 Thread Amit Shah

On (Wed) 12 Nov 2014 [11:44:35], Michael S. Tsirkin wrote:
> This patchset fixes CVE-2014-7840: invalid
> migration stream can cause arbitrary qemu memory
> overwrite.
> First patch includes the minimal fix for the issue.
> Follow-up patches on top add extra checking to reduce the
> chance this kind of bug recurs.
> 
> Note: these are already (tentatively-pending review)
> queued in my tree, so only review/ack
> is necessary.

Any more acks for this?  Dave?

Also, Michael agreed to just push patch 1 for now, and 2-4 can go in
later (2.3)?  Michael, is that fine with you?


Amit

Re: [Qemu-devel] [PATCH 0/4] migration: fix CVE-2014-7840

2014-11-18 Thread Amit Shah

On (Mon) 17 Nov 2014 [14:36:45], Michael S. Tsirkin wrote:
> On Mon, Nov 17, 2014 at 05:50:34PM +0530, Amit Shah wrote:
> > On (Mon) 17 Nov 2014 [13:48:58], Michael S. Tsirkin wrote:
> > > On Mon, Nov 17, 2014 at 04:37:50PM +0530, Amit Shah wrote:
> > > > On (Mon) 17 Nov 2014 [12:52:59], Michael S. Tsirkin wrote:
> > > > > On Mon, Nov 17, 2014 at 04:08:58PM +0530, Amit Shah wrote:
> > > > > > On (Mon) 17 Nov 2014 [12:32:57], Michael S. Tsirkin wrote:
> > > > > > > On Mon, Nov 17, 2014 at 12:06:38PM +0530, Amit Shah wrote:
> > > > > > > > On (Wed) 12 Nov 2014 [11:44:35], Michael S. Tsirkin wrote:
> > > > > > > > > This patchset fixes CVE-2014-7840: invalid
> > > > > > > > > migration stream can cause arbitrary qemu memory
> > > > > > > > > overwrite.
> > > > > > > > > First patch includes the minimal fix for the issue.
> > > > > > > > > Follow-up patches on top add extra checking to reduce the
> > > > > > > > > chance this kind of bug recurs.
> > > > > > > > > 
> > > > > > > > > Note: these are already (tentatively-pending review)
> > > > > > > > > queued in my tree, so only review/ack
> > > > > > > > > is necessary.
> > > > > > > > 
> > > > > > > > Why not let this go in via the migration tree?
> > > > > > > 
> > > > > > > Well I Cc'd Juan and David, so if they had a problem with this, I 
> > > > > > > expect
> > > > > > > they'd complain.  David acked so I assume it's ok.  Since I 
> > > > > > > wasted time
> > > > > > > testing this and have it on my tree already, might as well just 
> > > > > > > merge.
> > > > > > 
> > > > > > IMO asking as a courtesy would've been better than just stating it.
> > > > > 
> > > > > Right, thanks for reminding me.
> > > > > 
> > > > > BTW, there is actually a good reason to special-case it: it's a CVE 
> > > > > fix,
> > > > > which I handle.  So they stay on my private queue and are passed
> > > > > to vendors so vendors can fix downstreams, until making fix public is
> > > > > cleared with all reporters and vendors.
> > > > > After reporting is cleared, I try to collect acks but don't normally 
> > > > > route
> > > > > patches through separate queues - that would make it harder to
> > > > > track the status which we need for CVEs.
> > > > 
> > > > Patch is public, so all of this doesn't really matter.
> > > > 
> > > > But: involving maintainers in their areas, even if the patch is
> > > > embargoed, should be a pre-requisite.  I'm not sure if we're doing
> > > > that, but please do that so patches get a proper review from the
> > > > maintainers.
> > > 
> > > Involving more people means more back and forth with reporters which
> > > must approve any disclosure.  If the issue isn't clear, I do involve
> > > maintainers.  I send patches on list and try to merge them only after
> > > they get ack from relevant people. I'm sorry, but this is as far as I
> > > have the time to go.
> > 
> > The other aspect of the thing is sub-optimal, or patches with bugs,
> > get pushed in, because the maintainers didn't get involved.  
> 
> Patches don't get merged before they are on list for a while.
> I typically ping people if I don't get acks.

BTW I was talking about embargoed bugs / patches.  That's not relevant
for this discussion.  I'll create a new thread to discuss qemu's
security policy for embargoed bugs.


Amit

Re: [Qemu-devel] [PATCH 0/4] migration: fix CVE-2014-7840

2014-11-18 Thread Dr. David Alan Gilbert

* Amit Shah (amit.s...@redhat.com) wrote:
> On (Wed) 12 Nov 2014 [11:44:35], Michael S. Tsirkin wrote:
> > This patchset fixes CVE-2014-7840: invalid
> > migration stream can cause arbitrary qemu memory
> > overwrite.
> > First patch includes the minimal fix for the issue.
> > Follow-up patches on top add extra checking to reduce the
> > chance this kind of bug recurs.
> > 
> > Note: these are already (tentatively-pending review)
> > queued in my tree, so only review/ack
> > is necessary.
> 
> Any more acks for this?  Dave?

Yep,
Reviewed-by: Dr. David Alan Gilbert 
(for the set)

> Also, Michael agreed to just push patch 1 for now, and 2-4 can go in
> later (2.3)?  Michael, is that fine with you?
> 
> 
>   Amit

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

[Qemu-devel] mute button not work

2014-11-18 Thread li

Hi all,
 
i use qemu-system-x86_64 to run windows 7, and fund that "mute/unmute" button 
in the right bottom of guest desktop not work.
 
related info is as below:
 
$qemu-system-x86_64 --version
QEMU emulator version 1.4.0, Copyright (c) 2003-2008 Fabrice Bellard
 
$ps -elf | grep qemu-system-x86_64
qemu-system-x86_64 -machine accel=kvm:tcg -m 4096 -smp 4 -M pc-i440fx-1.4 -spi
ce port=5906,addr=0.0.0.0,disable-ticketing,jpeg-wan-compression=always -vga 
qxl -global qxl-vga.vram_size=268435456 
/opt/instances/win7_sh_2.1.1_driver.qcow2 -net 
nic,macaddr=00:16:3e:00:00:04,model=virtio
-net tap,ifname=tap04 -device 
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6  -chardev 
pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev 
spicevmc,id=charchannel0,name=vdagent -device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0
 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device 
hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7
 
$uname -a
Linux mymachine 3.8.0-34-generic #49~precise1-Ubuntu SMP Wed Nov 13 18:05:00 
UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
 
Best Regards,
Stone

Re: [Qemu-devel] [PATCH 2/5] block: Add AioContextNotifier functions to BB

2014-11-18 Thread Max Reitz


On 2014-11-17 at 18:26, Paolo Bonzini wrote:

On 17/11/2014 16:30, Max Reitz wrote:

Because all BlockDriverStates behind a single BlockBackend reside in a
single AioContext, it is fine to just pass these functions
(blk_add_aio_context_notifier() and blk_remove_aio_context_notifier())
through to the root BlockDriverState.

Signed-off-by: Max Reitz 

The logical question then is: are there any function in BlockDriverState
that do not make sense as a BlockBackend API?


Well, surely bdrv_swap(), or bdrv_drop_intermediate(). These are 
functions which work on multiple BDSs (in the same tree, that is, behind 
the same BB) so they don't make sense on the BB.


Others could simply be passed through to the root BDS but it somehow 
doesn't make sense to execute them on the BB. For instance, 
bdrv_set_key(); this is something for an individual BDS, in contrast to 
other operations like reading and writing which will probably be passed 
through the BDS tree; or bdrv_get_info().


The functions added in this patch do make sense on a BB level: Since all 
BDSs behind one BB are always in the same AioContext, it makes sense to 
consider that the BB's AioContext.


The next patch is more difficult to justify. Closing a BDS is somehow 
passed through, but at first glance it doesn't make a whole lot of sense 
on the BB level. However, when you consider (as far as I looked into it) 
that a BDS is only closed when there are either no references to it 
(which will not happen as long as it has a BB) or when it is ejected, it 
suddenly does make sense: "Ejecting" really is something for the BB, so 
it makes sense to wait for that event (even though the name "close 
notifier" doesn't sound much like it...). Maybe I should sometimes take 
a deeper look into when a BDS belonging to a BB may be closed and if 
it's really only due to ejection, rename the "close notifiers" to 
something like "eject notifiers" (on the BB level).


Max

Re: [Qemu-devel] [PATCH 0/4] migration: fix CVE-2014-7840

2014-11-18 Thread Michael S. Tsirkin

On Tue, Nov 18, 2014 at 02:31:34PM +0530, Amit Shah wrote:
> On (Wed) 12 Nov 2014 [11:44:35], Michael S. Tsirkin wrote:
> > This patchset fixes CVE-2014-7840: invalid
> > migration stream can cause arbitrary qemu memory
> > overwrite.
> > First patch includes the minimal fix for the issue.
> > Follow-up patches on top add extra checking to reduce the
> > chance this kind of bug recurs.
> > 
> > Note: these are already (tentatively-pending review)
> > queued in my tree, so only review/ack
> > is necessary.
> 
> Any more acks for this?  Dave?
> 
> Also, Michael agreed to just push patch 1 for now, and 2-4 can go in
> later (2.3)?  Michael, is that fine with you?
> 
> 
>   Amit

I'm fine with putting them off.
Someone wants to take them or should I?


-- 
MST

Re: [Qemu-devel] [PATCH 5/5] nbd: Use BlockBackend internally

2014-11-18 Thread Max Reitz


On 2014-11-17 at 18:29, Paolo Bonzini wrote:


On 17/11/2014 16:30, Max Reitz wrote:

With all externally visible functions changed to use BlockBackend, this
patch makes nbd use BlockBackend for everything internally as well.

While touching them, substitute 512 by BDRV_SECTOR_SIZE in the calls to
blk_read(), blk_write() and blk_co_discard().

Signed-off-by: Max Reitz 

qemu-nbd.c has more uses of "bs" that probably should be changed to
"blk".  Not sure if it should be in this patch, patch 4 or an additional
patch 6 though.


Good point. I'll go for patch 6.


If the third possibility, series

Reviewed-by: Paolo Bonzini 


Thanks!

Max

Re: [Qemu-devel] [PATCH 0/4] migration: fix CVE-2014-7840

2014-11-18 Thread Dr. David Alan Gilbert

* Michael S. Tsirkin (m...@redhat.com) wrote:
> On Tue, Nov 18, 2014 at 02:31:34PM +0530, Amit Shah wrote:
> > On (Wed) 12 Nov 2014 [11:44:35], Michael S. Tsirkin wrote:
> > > This patchset fixes CVE-2014-7840: invalid
> > > migration stream can cause arbitrary qemu memory
> > > overwrite.
> > > First patch includes the minimal fix for the issue.
> > > Follow-up patches on top add extra checking to reduce the
> > > chance this kind of bug recurs.
> > > 
> > > Note: these are already (tentatively-pending review)
> > > queued in my tree, so only review/ack
> > > is necessary.
> > 
> > Any more acks for this?  Dave?
> > 
> > Also, Michael agreed to just push patch 1 for now, and 2-4 can go in
> > later (2.3)?  Michael, is that fine with you?
> > 
> > 
> > Amit
> 
> I'm fine with putting them off.
> Someone wants to take them or should I?

exec.c stuff seems to be pretty much all over, so I think you're
as good as anyone.

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH 2/5] block: Add AioContextNotifier functions to BB

2014-11-18 Thread Paolo Bonzini

On 18/11/2014 10:26, Max Reitz wrote:
> However, when you consider (as far as I looked into it) that a BDS is
> only closed when there are either no references to it (which will not
> happen as long as it has a BB) or when it is ejected, it suddenly does
> make sense: "Ejecting" really is something for the BB, so it makes sense
> to wait for that event (even though the name "close notifier" doesn't
> sound much like it...).

I agree, and indeed the notifier was added to deal with ejection.

Thanks for clarifying.  I guess if QEMU were written in C++ we would
have a hierarchy like this:

BlockDevice (abstract)
BlockBackend
BlockDriverState

and there would be no duplication of function names; the BlockBackend
implementation would still have to forward to the top BlockDriverState.

Paolo

[Qemu-devel] hotremoving a disk qmp/hmp

2014-11-18 Thread William Dauchy

Hello,

When hotremoving a disk I'm using the QMP API with device_del command;

Previous query-block command result:

{   u'device': u'disk1',
u'inserted': {   u'backing_file_depth': 0,
 u'bps': 0,
 u'bps_rd': 0,
 u'bps_wr': 0,
 u'detect_zeroes': u'off',
 u'drv': u'raw',
 u'encrypted': False,
 u'encryption_key_missing': False,
 u'file': u'/dev/sda',
 u'image': {   u'actual-size': 0,
   u'dirty-flag': False,
   u'filename': u'/dev/sda',
   u'format': u'raw',
   u'virtual-size': 3221225472},
 u'iops': 0,
 u'iops_rd': 0,
 u'iops_wr': 0,
 u'ro': False},
u'io-status': u'ok',
u'locked': True,
u'removable': True,
u'tray_open': False,
u'type': u'unknown'}

After a device_del command I have the same thing but "'locked': False"
Then I can also do `eject device=disk1` which bring me to:

{   u'device': u'disk1',
   u'io-status': u'ok',
   u'locked': False,
   u'removable': True,
   u'tray_open': False,
   u'type': u'unknown'}

But I did not found a way to completely remove the disk1 entry in order
to be able to add it again.
To complete the operation I need to use the old HMP API and use
`drive_del` command.

Did I miss something? or is it still something missing in QMP API?

Best regards,
-- 
William


signature.asc
Description: Digital signature

Re: [Qemu-devel] [BUGFIX][PATCH for 2.2 1/1] hw/ide/core.c: Prevent SIGSEGV during migration

2014-11-18 Thread Paolo Bonzini



On 17/11/2014 22:20, Don Slutz wrote:
> The other callers to blk_set_enable_write_cache() in this file
> already check for s->blk == NULL.
> 
> Signed-off-by: Don Slutz 
> ---
> 
> I think this is a bugfix that should be back ported to stable
> releases.
> 
> I also think this should be done in xen's copy of QEMU for 4.5 with
> back port(s) to active stable releases.
> 
> Note: In 2.1 and earlier the routine is
> bdrv_set_enable_write_cache(); variable is s->bs.
> 
>  hw/ide/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 00e21cf..d4af5e2 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -2401,7 +2401,7 @@ static int ide_drive_post_load(void *opaque, int 
> version_id)
>  {
>  IDEState *s = opaque;
>  
> -if (s->identify_set) {
> +if (s->blk && s->identify_set) {
>  blk_set_enable_write_cache(s->blk, !!(s->identify_data[85] & (1 << 
> 5)));
>  }
>  return 0;
> 

Reviewed-by: Paolo Bonzini

[Qemu-devel] [PATCH for-2.2 0/3] block/raw-posix: Fixes for preallocation

2014-11-18 Thread Max Reitz

This series brings some (minor) fixes for full preallocation in
raw-posix; thanks to László for finding these bugs.

Since the fixes are not too grave, I would be fine with not getting them
into 2.2. But because they are rather simple and they are bug fixes
after all, I sent this series with the "for-2.2" infix.


Max Reitz (3):
  block/raw-posix: Fix preallocating write() loop
  block/raw-posix: Only sync after successful preallocation
  block/raw-posix: Catch fsync() errors

 block/raw-posix.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

-- 
1.9.3

[Qemu-devel] [PATCH for-2.2 2/3] block/raw-posix: Only sync after successful preallocation

2014-11-18 Thread Max Reitz

The loop which filled the file with zeroes may have been left early due
to an error. In that case, the fsync() should be skipped.

Signed-off-by: Max Reitz 
---
 block/raw-posix.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 4e6552f..f67fb11 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1459,7 +1459,9 @@ static int raw_create(const char *filename, QemuOpts 
*opts, Error **errp)
 }
 left -= result;
 }
-fsync(fd);
+if (result >= 0) {
+fsync(fd);
+}
 g_free(buf);
 break;
 }
-- 
1.9.3

[Qemu-devel] [PATCH for-2.2 1/3] block/raw-posix: Fix preallocating write() loop

2014-11-18 Thread Max Reitz

write() may write less bytes than requested; in this case, the number of
bytes written is returned. This is the byte count we should be
subtracting from the number of bytes still to be written, and not the
byte count we requested to write.

Reported-by: László Érsek 
Signed-off-by: Max Reitz 
---
An interesting anecdote: My German man page for write(2) says the
following about its return value:

> Bei Erfolg wird Null zurückgegeben.

Which translates to:

> On success, zero is returned.

Whereas write(2) for LANG=C says the following (which is correct):

> On success, the number of bytes written is returned (zero indicates
> nothing was written).

I think I know why I somehow prefer the English versions...
---
 block/raw-posix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index e100ae2..4e6552f 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1457,7 +1457,7 @@ static int raw_create(const char *filename, QemuOpts 
*opts, Error **errp)
  "Could not write to the new file");
 break;
 }
-left -= num;
+left -= result;
 }
 fsync(fd);
 g_free(buf);
-- 
1.9.3

[Qemu-devel] [PATCH for-2.2 3/3] block/raw-posix: Catch fsync() errors

2014-11-18 Thread Max Reitz

fsync() may fail, and that case should be handled.

Reported-by: László Érsek 
Signed-off-by: Max Reitz 
---
 block/raw-posix.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index f67fb11..666cdec 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1460,7 +1460,12 @@ static int raw_create(const char *filename, QemuOpts 
*opts, Error **errp)
 left -= result;
 }
 if (result >= 0) {
-fsync(fd);
+result = fsync(fd);
+if (result < 0) {
+result = -errno;
+error_setg_errno(errp, -result,
+ "Could not flush new file to disk");
+}
 }
 g_free(buf);
 break;
-- 
1.9.3

Re: [Qemu-devel] [PATCH v6 00/10] block: Incremental backup series

2014-11-18 Thread Vladimir Sementsov-Ogievskiy




(2) File Format

Some standard file magic, which includes:

- Some magic byte(s)
- Dirty flag. Needed to tell if we can trust this data or not.
- The size of the bitmap
- The granularity of the bitmap
- The offset to the first sector of bitmap data (Maybe? It can't hurt 
if we give ourselves a sector's worth to write metadata within.)
- Data starting at... PAGESIZE? 

- The name of the bitmap and also the size of this name



(5) Partial Persistence

We did not discuss only saving higher levels of the bitmap. What's the 
primary benefit you're seeking? 
Hmm. It may be used for faster sync. Maybe, save some of bitmap levels 
on timer while vm is running and save the last level on shutdown?


CC qemu-devel - ok.

Best regards,
Vladimir

On 18.11.2014 02:46, John Snow wrote:



On 11/13/2014 08:54 AM, Vladimir Sementsov-Ogievskiy wrote:

Hi

I'd just like to start working on persistent dirty bitmap. My thoughts
about it are the following:
- qemu -drive file=file,dirty_bitmap=file
 so,  bitmap will be loaded with drive open and saved with drive 
close.

- save only meaningful (the last) level of the bitmap, restore all
levels on bitmap loading
- bool parameter "persistent" for bdrv_create_dirty_bitmap and
BdrvDirtyBitmap
- internal dirty_bitmaps, saved in qcow2 file

Best regards,
Vladimir


I am thinking:

(1) Command Lines

If you enable dirty bitmaps and give it a file that doesn't exist, it 
should error out on you.


If you enable dirty bitmaps and give it a file that's blank, it 
understands that it is to create a persistent bitmap file in this 
location and it should enable persistence.


If a bitmap file is given and it has valid magic, this should imply 
persistence.


I am hesitant to have it auto-create files that don't already exist in 
case the files become large in size and a misconfiguration leads to 
repeated creation of these files that get orphaned in random folders. 
Perhaps we can add a create=auto flag or similar to allow this 
behavior if wanted.


(2) File Format

Some standard file magic, which includes:

- Some magic byte(s)
- Dirty flag. Needed to tell if we can trust this data or not.
- The size of the bitmap
- The granularity of the bitmap
- The offset to the first sector of bitmap data (Maybe? It can't hurt 
if we give ourselves a sector's worth to write metadata within.)

- Data starting at... PAGESIZE?

(3) Data Integrity

The dirty flag could work something like:

- If, on first open, the file has the dirty flag set, we need to 
discard the bitmap data because we can no longer trust it.
- If the bitmap file is clean, proceed as normal, but take a lock 
against any of the bitmap functions to prevent them from marking any 
bits dirty.
- On first write to a clean persistent bitmap, delay the write until 
we can mark the bitmap as dirty first. This incurs a write penalty 
when we try to use the bitmap at first...

- Unlock the bitmap functions and allow them to mark blocks as needed.
- At some point, based on a sync policy, re-commit the dirty 
information to the file and mark the file as clean once more and 
re-take the persistence lock.


(4) Synchronization Policy

- Sync after so many bits become dirty in the bitmap, either as an 
absolute threshold or a density percentage?

- Sync periodically on a fixed timer?
- Sync periodically opportunistically when I/O utilization becomes 
relatively low? (With some sort of starvation prevention timer?)

- Sync only at shutdown?

In discussing with Stefan, I think we rather liked the idea of a timer 
that tries to re-commit the block data during lulls in the I/O.


(5) Partial Persistence

We did not discuss only saving higher levels of the bitmap. What's the 
primary benefit you're seeking?


(6) Inclusion as qcow2 Metadata

And lastly, we did discuss the inclusion of the bitmap as qcow2 
metadata, but decided it wasn't our principle target for the format to 
allow generality to other file formats. We didn't really discuss the 
idea of having it as an option or an extension, but I don't (off the 
top of my head) have any reasonings against it, but I will likely not 
work on it myself.



You didn't CC qemu-devel on this (so I won't!), but perhaps we should 
re-send out our ideas to the wider list for feedback before we proceed 
any further. Maybe we can split the work if we agree upon a design.


Thanks!
--js

P.S.: I'm still cleaning up Fam's first patchset based on Max's and 
your feedback. Hope to have it out by the end of this week.



On 11.11.2014 18:59, John Snow wrote:



On 11/10/2014 03:15 AM, Vladimir Sementsov-Ogievskiy wrote:

Hi Fam, hi Jorn.

Jagane's project - http://wiki.qemu.org/Features/Livebackup

In two words:
Normal delta - like in qemu, while backuping, we save all new 
writes to

separate virtual disk - delta. When backup is done, we can merge delta
back to original image.
Reverse delta - while backuping, we don't stop writing to original 
image
(and qemu works with it, not with delta), but before every writ

Re: [Qemu-devel] [PATCH for-2.2 0/3] block/raw-posix: Fixes for preallocation

2014-11-18 Thread Kevin Wolf

Am 18.11.2014 um 11:23 hat Max Reitz geschrieben:
> This series brings some (minor) fixes for full preallocation in
> raw-posix; thanks to László for finding these bugs.
> 
> Since the fixes are not too grave, I would be fine with not getting them
> into 2.2. But because they are rather simple and they are bug fixes
> after all, I sent this series with the "for-2.2" infix.

Thanks, applied to the block branch.

Kevin

Re: [Qemu-devel] [PATCH] exec: Handle multipage ranges in invalidate_and_set_dirty()

2014-11-18 Thread Peter Maydell

On 17 November 2014 11:03, Paolo Bonzini  wrote:
>
>
> On 16/11/2014 20:44, Peter Maydell wrote:
>> The code in invalidate_and_set_dirty() needs to handle addr/length
>> combinations which cross guest physical page boundaries. This can happen,
>> for example, when disk I/O reads large blocks into guest RAM which previously
>> held code that we have cached translations for. Unfortunately we were only
>> checking the clean/dirty status of the first page in the range, and then
>> were calling a tb_invalidate function which only handles ranges that don't
>> cross page boundaries. Fix the function to deal with multipage ranges.
>>
>> The symptoms of this bug were that guest code would misbehave (eg segfault),
>> in particular after a guest reboot but potentially any time the guest
>> reused a page of its physical RAM for new code.
>>
>> Cc: qemu-sta...@nongnu.org
>> Signed-off-by: Peter Maydell 
>> ---
>> This seems pretty nasty, and I have no idea why it hasn't been wreaking
>> more havoc than this before. I'm not entirely sure why we invalidate TBs
>> if any of the dirty bits is set rather than only if the code bit is set,
>> but I left that logic as it is.
>
> I think it's a remain of when we had a single bitmap with three bits in
> it.  We can clean up in 2.3.
>
>> Review appreciated -- it would be nice to get this into rc2 if
>> we can, I think.
>
> Reviewed-by: Paolo Bonzini 

Applied to master; thanks.

-- PMM

[Qemu-devel] [PATCH v2 3/6] block: Add blk_add_close_notifier() for BB

2014-11-18 Thread Max Reitz

Adding something like a "delete notifier" to a BlockBackend would not
make much sense, because whoever is interested in registering there will
probably hold a reference to that BlockBackend; therefore, the notifier
will never be called (or only when the notifiee already relinquished its
reference and thus most probably is no longer interested in that
notification).

Therefore, this patch just passes through the close notifier interface
of the root BDS. This will be called when the device is ejected, for
instance, and therefore does make sense.

Signed-off-by: Max Reitz 
Reviewed-by: Paolo Bonzini 
---
 block/block-backend.c  | 5 +
 include/sysemu/block-backend.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 7a7f690..ef16d73 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -642,6 +642,11 @@ void blk_remove_aio_context_notifier(BlockBackend *blk,
  detach_aio_context, opaque);
 }
 
+void blk_add_close_notifier(BlockBackend *blk, Notifier *notify)
+{
+bdrv_add_close_notifier(blk->bs, notify);
+}
+
 void blk_io_plug(BlockBackend *blk)
 {
 bdrv_io_plug(blk->bs);
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index d9c1337..8871a02 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -143,6 +143,7 @@ void blk_remove_aio_context_notifier(BlockBackend *blk,
   void *),
  void (*detach_aio_context)(void *),
  void *opaque);
+void blk_add_close_notifier(BlockBackend *blk, Notifier *notify);
 void blk_io_plug(BlockBackend *blk);
 void blk_io_unplug(BlockBackend *blk);
 BlockAcctStats *blk_get_stats(BlockBackend *blk);
-- 
1.9.3

[Qemu-devel] [PATCH v2 4/6] nbd: Change external interface to BlockBackend

2014-11-18 Thread Max Reitz

Substitute BlockDriverState by BlockBackend in every globally visible
function provided by nbd.

Signed-off-by: Max Reitz 
Reviewed-by: Paolo Bonzini 
---
 blockdev-nbd.c  | 15 ---
 include/block/nbd.h |  7 +++
 nbd.c   | 11 ++-
 qemu-nbd.c  |  2 +-
 4 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index 06f901e..22e95d1 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -10,6 +10,7 @@
  */
 
 #include "sysemu/blockdev.h"
+#include "sysemu/block-backend.h"
 #include "hw/block/block.h"
 #include "monitor/monitor.h"
 #include "qapi/qmp/qerror.h"
@@ -73,7 +74,7 @@ static void nbd_close_notifier(Notifier *n, void *data)
 void qmp_nbd_server_add(const char *device, bool has_writable, bool writable,
 Error **errp)
 {
-BlockDriverState *bs;
+BlockBackend *blk;
 NBDExport *exp;
 NBDCloseNotifier *n;
 
@@ -87,12 +88,12 @@ void qmp_nbd_server_add(const char *device, bool 
has_writable, bool writable,
 return;
 }
 
-bs = bdrv_find(device);
-if (!bs) {
+blk = blk_by_name(device);
+if (!blk) {
 error_set(errp, QERR_DEVICE_NOT_FOUND, device);
 return;
 }
-if (!bdrv_is_inserted(bs)) {
+if (!blk_is_inserted(blk)) {
 error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
 return;
 }
@@ -100,18 +101,18 @@ void qmp_nbd_server_add(const char *device, bool 
has_writable, bool writable,
 if (!has_writable) {
 writable = false;
 }
-if (bdrv_is_read_only(bs)) {
+if (blk_is_read_only(blk)) {
 writable = false;
 }
 
-exp = nbd_export_new(bs, 0, -1, writable ? 0 : NBD_FLAG_READ_ONLY, NULL);
+exp = nbd_export_new(blk, 0, -1, writable ? 0 : NBD_FLAG_READ_ONLY, NULL);
 
 nbd_export_set_name(exp, device);
 
 n = g_new0(NBDCloseNotifier, 1);
 n->n.notify = nbd_close_notifier;
 n->exp = exp;
-bdrv_add_close_notifier(bs, &n->n);
+blk_add_close_notifier(blk, &n->n);
 QTAILQ_INSERT_TAIL(&close_notifiers, n, next);
 }
 
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 9e835d2..348302c 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -85,14 +85,13 @@ int nbd_disconnect(int fd);
 typedef struct NBDExport NBDExport;
 typedef struct NBDClient NBDClient;
 
-NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset,
-  off_t size, uint32_t nbdflags,
-  void (*close)(NBDExport *));
+NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
+  uint32_t nbdflags, void (*close)(NBDExport *));
 void nbd_export_close(NBDExport *exp);
 void nbd_export_get(NBDExport *exp);
 void nbd_export_put(NBDExport *exp);
 
-BlockDriverState *nbd_export_get_blockdev(NBDExport *exp);
+BlockBackend *nbd_export_get_blockdev(NBDExport *exp);
 
 NBDExport *nbd_export_find(const char *name);
 void nbd_export_set_name(NBDExport *exp, const char *name);
diff --git a/nbd.c b/nbd.c
index a7bce45..3fd5743 100644
--- a/nbd.c
+++ b/nbd.c
@@ -19,6 +19,7 @@
 #include "block/nbd.h"
 #include "block/block.h"
 #include "block/block_int.h"
+#include "sysemu/block-backend.h"
 
 #include "block/coroutine.h"
 
@@ -957,10 +958,10 @@ static void bs_aio_detach(void *opaque)
 exp->ctx = NULL;
 }
 
-NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset,
-  off_t size, uint32_t nbdflags,
-  void (*close)(NBDExport *))
+NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
+  uint32_t nbdflags, void (*close)(NBDExport *))
 {
+BlockDriverState *bs = blk_bs(blk);
 NBDExport *exp = g_malloc0(sizeof(NBDExport));
 exp->refcount = 1;
 QTAILQ_INIT(&exp->clients);
@@ -1056,9 +1057,9 @@ void nbd_export_put(NBDExport *exp)
 }
 }
 
-BlockDriverState *nbd_export_get_blockdev(NBDExport *exp)
+BlockBackend *nbd_export_get_blockdev(NBDExport *exp)
 {
-return exp->bs;
+return exp->bs->blk;
 }
 
 void nbd_export_close_all(void)
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 5cd6c6d..60ce50f 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -730,7 +730,7 @@ int main(int argc, char **argv)
 }
 }
 
-exp = nbd_export_new(bs, dev_offset, fd_size, nbdflags, nbd_export_closed);
+exp = nbd_export_new(blk, dev_offset, fd_size, nbdflags, 
nbd_export_closed);
 
 if (sockpath) {
 fd = unix_socket_incoming(sockpath);
-- 
1.9.3

[Qemu-devel] [PATCH v2 0/6] nbd: Use BlockBackend

2014-11-18 Thread Max Reitz

>From the block layer's perspective, the nbd server is pretty similar to
a guest device. Therefore, it should use BlockBackend to access block
devices, just like any other guest device does.

This series consequently makes the nbd server use BlockBackend for
referencing block devices.


v2:
- Added patch 6, which converts qemu-nbd to BlockBackend as far as
  reasonable [Paolo]


git-backport-diff against v1:

Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/6:[] [--] 'block: Lift more functions into BlockBackend'
002/6:[] [--] 'block: Add AioContextNotifier functions to BB'
003/6:[] [--] 'block: Add blk_add_close_notifier() for BB'
004/6:[] [--] 'nbd: Change external interface to BlockBackend'
005/6:[] [--] 'nbd: Use BlockBackend internally'
006/6:[down] 'qemu-nbd: Use BlockBackend where reasonable'


Max Reitz (6):
  block: Lift more functions into BlockBackend
  block: Add AioContextNotifier functions to BB
  block: Add blk_add_close_notifier() for BB
  nbd: Change external interface to BlockBackend
  nbd: Use BlockBackend internally
  qemu-nbd: Use BlockBackend where reasonable

 block/block-backend.c  | 38 +
 blockdev-nbd.c | 15 +-
 include/block/nbd.h|  7 ++---
 include/sysemu/block-backend.h | 12 
 nbd.c  | 63 +-
 qemu-nbd.c | 12 
 6 files changed, 99 insertions(+), 48 deletions(-)

-- 
1.9.3

[Qemu-devel] [PATCH v2 5/6] nbd: Use BlockBackend internally

2014-11-18 Thread Max Reitz

With all externally visible functions changed to use BlockBackend, this
patch makes nbd use BlockBackend for everything internally as well.

While touching them, substitute 512 by BDRV_SECTOR_SIZE in the calls to
blk_read(), blk_write() and blk_co_discard().

Signed-off-by: Max Reitz 
Reviewed-by: Paolo Bonzini 
---
 nbd.c | 56 
 1 file changed, 28 insertions(+), 28 deletions(-)

diff --git a/nbd.c b/nbd.c
index 3fd5743..53cf82b 100644
--- a/nbd.c
+++ b/nbd.c
@@ -17,8 +17,6 @@
  */
 
 #include "block/nbd.h"
-#include "block/block.h"
-#include "block/block_int.h"
 #include "sysemu/block-backend.h"
 
 #include "block/coroutine.h"
@@ -102,7 +100,7 @@ struct NBDExport {
 int refcount;
 void (*close)(NBDExport *exp);
 
-BlockDriverState *bs;
+BlockBackend *blk;
 char *name;
 off_t dev_offset;
 off_t size;
@@ -930,7 +928,7 @@ static void nbd_request_put(NBDRequest *req)
 nbd_client_put(client);
 }
 
-static void bs_aio_attached(AioContext *ctx, void *opaque)
+static void blk_aio_attached(AioContext *ctx, void *opaque)
 {
 NBDExport *exp = opaque;
 NBDClient *client;
@@ -944,7 +942,7 @@ static void bs_aio_attached(AioContext *ctx, void *opaque)
 }
 }
 
-static void bs_aio_detach(void *opaque)
+static void blk_aio_detach(void *opaque)
 {
 NBDExport *exp = opaque;
 NBDClient *client;
@@ -961,24 +959,23 @@ static void bs_aio_detach(void *opaque)
 NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
   uint32_t nbdflags, void (*close)(NBDExport *))
 {
-BlockDriverState *bs = blk_bs(blk);
 NBDExport *exp = g_malloc0(sizeof(NBDExport));
 exp->refcount = 1;
 QTAILQ_INIT(&exp->clients);
-exp->bs = bs;
+exp->blk = blk;
 exp->dev_offset = dev_offset;
 exp->nbdflags = nbdflags;
-exp->size = size == -1 ? bdrv_getlength(bs) : size;
+exp->size = size == -1 ? blk_getlength(blk) : size;
 exp->close = close;
-exp->ctx = bdrv_get_aio_context(bs);
-bdrv_ref(bs);
-bdrv_add_aio_context_notifier(bs, bs_aio_attached, bs_aio_detach, exp);
+exp->ctx = blk_get_aio_context(blk);
+blk_ref(blk);
+blk_add_aio_context_notifier(blk, blk_aio_attached, blk_aio_detach, exp);
 /*
  * NBD exports are used for non-shared storage migration.  Make sure
  * that BDRV_O_INCOMING is cleared and the image is ready for write
  * access since the export could be available before migration handover.
  */
-bdrv_invalidate_cache(bs, NULL);
+blk_invalidate_cache(blk, NULL);
 return exp;
 }
 
@@ -1025,11 +1022,11 @@ void nbd_export_close(NBDExport *exp)
 }
 nbd_export_set_name(exp, NULL);
 nbd_export_put(exp);
-if (exp->bs) {
-bdrv_remove_aio_context_notifier(exp->bs, bs_aio_attached,
- bs_aio_detach, exp);
-bdrv_unref(exp->bs);
-exp->bs = NULL;
+if (exp->blk) {
+blk_remove_aio_context_notifier(exp->blk, blk_aio_attached,
+blk_aio_detach, exp);
+blk_unref(exp->blk);
+exp->blk = NULL;
 }
 }
 
@@ -1059,7 +1056,7 @@ void nbd_export_put(NBDExport *exp)
 
 BlockBackend *nbd_export_get_blockdev(NBDExport *exp)
 {
-return exp->bs->blk;
+return exp->blk;
 }
 
 void nbd_export_close_all(void)
@@ -1138,7 +1135,7 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, 
struct nbd_request *reque
 
 command = request->type & NBD_CMD_MASK_COMMAND;
 if (command == NBD_CMD_READ || command == NBD_CMD_WRITE) {
-req->data = qemu_blockalign(client->exp->bs, request->len);
+req->data = blk_blockalign(client->exp->blk, request->len);
 }
 if (command == NBD_CMD_WRITE) {
 TRACE("Reading %u byte(s)", request->len);
@@ -1204,7 +1201,7 @@ static void nbd_trip(void *opaque)
 TRACE("Request type is READ");
 
 if (request.type & NBD_CMD_FLAG_FUA) {
-ret = bdrv_co_flush(exp->bs);
+ret = blk_co_flush(exp->blk);
 if (ret < 0) {
 LOG("flush failed");
 reply.error = -ret;
@@ -1212,8 +1209,9 @@ static void nbd_trip(void *opaque)
 }
 }
 
-ret = bdrv_read(exp->bs, (request.from + exp->dev_offset) / 512,
-req->data, request.len / 512);
+ret = blk_read(exp->blk,
+   (request.from + exp->dev_offset) / BDRV_SECTOR_SIZE,
+   req->data, request.len / BDRV_SECTOR_SIZE);
 if (ret < 0) {
 LOG("reading from file failed");
 reply.error = -ret;
@@ -1235,8 +1233,9 @@ static void nbd_trip(void *opaque)
 
 TRACE("Writing to device");
 
-ret = bdrv_write(exp->bs, (request.from + exp->dev_offset) / 512,
- req->data, request.len / 512);
+ret = blk_write(exp->blk,
+(request.from + exp->dev

[Qemu-devel] [PATCH v2 6/6] qemu-nbd: Use BlockBackend where reasonable

2014-11-18 Thread Max Reitz

Because qemu-nbd creates the BlockBackend by itself, it should create
the according BlockDriverState tree by itself as well; that means, it
has call bdrv_open() on its own. This is one of the places where
qemu-nbd still needs to use a BlockDriverState directly (the root BDS
below the BB); other places are the configuration of zero detection
(which may be lifted into the BB eventually, but is not yet) and
temporarily loading a snapshot.

Everywhere else, though, qemu-nbd can and thus should use BlockBackend.

Suggested-by: Paolo Bonzini 
Signed-off-by: Max Reitz 
---
 qemu-nbd.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index 60ce50f..d222512 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -146,7 +146,7 @@ static void read_partition(uint8_t *p, struct 
partition_record *r)
 r->nb_sectors_abs = p[12] | p[13] << 8 | p[14] << 16 | p[15] << 24;
 }
 
-static int find_partition(BlockDriverState *bs, int partition,
+static int find_partition(BlockBackend *blk, int partition,
   off_t *offset, off_t *size)
 {
 struct partition_record mbr[4];
@@ -155,7 +155,7 @@ static int find_partition(BlockDriverState *bs, int 
partition,
 int ext_partnum = 4;
 int ret;
 
-if ((ret = bdrv_read(bs, 0, data, 1)) < 0) {
+if ((ret = blk_read(blk, 0, data, 1)) < 0) {
 errno = -ret;
 err(EXIT_FAILURE, "error while reading");
 }
@@ -175,7 +175,7 @@ static int find_partition(BlockDriverState *bs, int 
partition,
 uint8_t data1[512];
 int j;
 
-if ((ret = bdrv_read(bs, mbr[i].start_sector_abs, data1, 1)) < 0) {
+if ((ret = blk_read(blk, mbr[i].start_sector_abs, data1, 1)) < 0) {
 errno = -ret;
 err(EXIT_FAILURE, "error while reading");
 }
@@ -720,10 +720,10 @@ int main(int argc, char **argv)
 }
 
 bs->detect_zeroes = detect_zeroes;
-fd_size = bdrv_getlength(bs);
+fd_size = blk_getlength(blk);
 
 if (partition != -1) {
-ret = find_partition(bs, partition, &dev_offset, &fd_size);
+ret = find_partition(blk, partition, &dev_offset, &fd_size);
 if (ret < 0) {
 errno = -ret;
 err(EXIT_FAILURE, "Could not find partition %d", partition);
-- 
1.9.3

[Qemu-devel] [PATCH v2 1/6] block: Lift more functions into BlockBackend

2014-11-18 Thread Max Reitz

There are already some blk_aio_* functions, so we might as well have
blk_co_* functions (as far as we need them). This patch adds
blk_co_flush(), blk_co_discard(), and also blk_invalidate_cache() (which
is not a blk_co_* function but is needed nonetheless).

Signed-off-by: Max Reitz 
Reviewed-by: Paolo Bonzini 
---
 block/block-backend.c  | 15 +++
 include/sysemu/block-backend.h |  3 +++
 2 files changed, 18 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index d0692b1..89f69b7 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -497,6 +497,16 @@ BlockAIOCB *blk_aio_ioctl(BlockBackend *blk, unsigned long 
int req, void *buf,
 return bdrv_aio_ioctl(blk->bs, req, buf, cb, opaque);
 }
 
+int blk_co_discard(BlockBackend *blk, int64_t sector_num, int nb_sectors)
+{
+return bdrv_co_discard(blk->bs, sector_num, nb_sectors);
+}
+
+int blk_co_flush(BlockBackend *blk)
+{
+return bdrv_co_flush(blk->bs);
+}
+
 int blk_flush(BlockBackend *blk)
 {
 return bdrv_flush(blk->bs);
@@ -549,6 +559,11 @@ void blk_set_enable_write_cache(BlockBackend *blk, bool 
wce)
 bdrv_set_enable_write_cache(blk->bs, wce);
 }
 
+void blk_invalidate_cache(BlockBackend *blk, Error **errp)
+{
+bdrv_invalidate_cache(blk->bs, errp);
+}
+
 int blk_is_inserted(BlockBackend *blk)
 {
 return bdrv_is_inserted(blk->bs);
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 52d13c1..0c46b82 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -108,6 +108,8 @@ int blk_aio_multiwrite(BlockBackend *blk, BlockRequest 
*reqs, int num_reqs);
 int blk_ioctl(BlockBackend *blk, unsigned long int req, void *buf);
 BlockAIOCB *blk_aio_ioctl(BlockBackend *blk, unsigned long int req, void *buf,
   BlockCompletionFunc *cb, void *opaque);
+int blk_co_discard(BlockBackend *blk, int64_t sector_num, int nb_sectors);
+int blk_co_flush(BlockBackend *blk);
 int blk_flush(BlockBackend *blk);
 int blk_flush_all(void);
 void blk_drain_all(void);
@@ -120,6 +122,7 @@ int blk_is_read_only(BlockBackend *blk);
 int blk_is_sg(BlockBackend *blk);
 int blk_enable_write_cache(BlockBackend *blk);
 void blk_set_enable_write_cache(BlockBackend *blk, bool wce);
+void blk_invalidate_cache(BlockBackend *blk, Error **errp);
 int blk_is_inserted(BlockBackend *blk);
 void blk_lock_medium(BlockBackend *blk, bool locked);
 void blk_eject(BlockBackend *blk, bool eject_flag);
-- 
1.9.3

[Qemu-devel] [PATCH v2 2/6] block: Add AioContextNotifier functions to BB

2014-11-18 Thread Max Reitz

Because all BlockDriverStates behind a single BlockBackend reside in a
single AioContext, it is fine to just pass these functions
(blk_add_aio_context_notifier() and blk_remove_aio_context_notifier())
through to the root BlockDriverState.

Signed-off-by: Max Reitz 
Reviewed-by: Paolo Bonzini 
---
 block/block-backend.c  | 18 ++
 include/sysemu/block-backend.h |  8 
 2 files changed, 26 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 89f69b7..7a7f690 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -624,6 +624,24 @@ void blk_set_aio_context(BlockBackend *blk, AioContext 
*new_context)
 bdrv_set_aio_context(blk->bs, new_context);
 }
 
+void blk_add_aio_context_notifier(BlockBackend *blk,
+void (*attached_aio_context)(AioContext *new_context, void *opaque),
+void (*detach_aio_context)(void *opaque), void *opaque)
+{
+bdrv_add_aio_context_notifier(blk->bs, attached_aio_context,
+  detach_aio_context, opaque);
+}
+
+void blk_remove_aio_context_notifier(BlockBackend *blk,
+ void (*attached_aio_context)(AioContext *,
+  void *),
+ void (*detach_aio_context)(void *),
+ void *opaque)
+{
+bdrv_remove_aio_context_notifier(blk->bs, attached_aio_context,
+ detach_aio_context, opaque);
+}
+
 void blk_io_plug(BlockBackend *blk)
 {
 bdrv_io_plug(blk->bs);
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 0c46b82..d9c1337 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -135,6 +135,14 @@ void blk_op_block_all(BlockBackend *blk, Error *reason);
 void blk_op_unblock_all(BlockBackend *blk, Error *reason);
 AioContext *blk_get_aio_context(BlockBackend *blk);
 void blk_set_aio_context(BlockBackend *blk, AioContext *new_context);
+void blk_add_aio_context_notifier(BlockBackend *blk,
+void (*attached_aio_context)(AioContext *new_context, void *opaque),
+void (*detach_aio_context)(void *opaque), void *opaque);
+void blk_remove_aio_context_notifier(BlockBackend *blk,
+ void (*attached_aio_context)(AioContext *,
+  void *),
+ void (*detach_aio_context)(void *),
+ void *opaque);
 void blk_io_plug(BlockBackend *blk);
 void blk_io_unplug(BlockBackend *blk);
 BlockAcctStats *blk_get_stats(BlockBackend *blk);
-- 
1.9.3

[Qemu-devel] [PATCH 1/1] migration: fix parameter validation on ram load

2014-11-18 Thread Amit Shah

From: "Michael S. Tsirkin" 

During migration, the values read from migration stream during ram load
are not validated. Especially offset in host_from_stream_offset() and
also the length of the writes in the callers of said function.

To fix this, we need to make sure that the [offset, offset + length]
range fits into one of the allocated memory regions.

Validating addr < len should be sufficient since data seems to always be
managed in TARGET_PAGE_SIZE chunks.

Fixes: CVE-2014-7840

Note: follow-up patches add extra checks on each block->host access.

Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Amit Shah 
---
 arch_init.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 88a5ba0..593a990 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1006,7 +1006,7 @@ static inline void *host_from_stream_offset(QEMUFile *f,
 uint8_t len;
 
 if (flags & RAM_SAVE_FLAG_CONTINUE) {
-if (!block) {
+if (!block || block->length <= offset) {
 error_report("Ack, bad migration stream!");
 return NULL;
 }
@@ -1019,8 +1019,9 @@ static inline void *host_from_stream_offset(QEMUFile *f,
 id[len] = 0;
 
 QTAILQ_FOREACH(block, &ram_list.blocks, next) {
-if (!strncmp(id, block->idstr, sizeof(id)))
+if (!strncmp(id, block->idstr, sizeof(id)) && block->length > offset) {
 return memory_region_get_ram_ptr(block->mr) + offset;
+}
 }
 
 error_report("Can't find block %s!", id);
-- 
2.1.0

[Qemu-devel] [PULL] migration: fix for CVE-2014-7840

2014-11-18 Thread Amit Shah

The following changes since commit d6be29e3fb5659102ac0e48e295d177cb67e32c5:

  target-arm: handle address translations that start at level 3 (2014-11-17 
19:30:28 +)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/qemu/amit/migration.git tags/for-2.2

for you to fetch changes up to 0be839a2701369f669532ea5884c15bead1c6e08:

  migration: fix parameter validation on ram load (2014-11-18 16:49:44 +0530)


Fix for CVE-2014-7840, avoiding arbitrary qemu memory overwrite for
migration by Michael S. Tsirkin.


Michael S. Tsirkin (1):
  migration: fix parameter validation on ram load

 arch_init.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

-- 
2.1.0

[Qemu-devel] [PULL 2/6] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP

2014-11-18 Thread Kevin Wolf

From: Markus Armbruster 

Commit 5500316 (May 2012) implemented raw_co_is_allocated() as
follows:

1. If defined(CONFIG_FIEMAP), use the FS_IOC_FIEMAP ioctl

2. Else if defined(SEEK_HOLE) && defined(SEEK_DATA), use lseek()

3. Else pretend there are no holes

Later on, raw_co_is_allocated() was generalized to
raw_co_get_block_status().

Commit 4f11aa8 (May 2014) changed it to try the three methods in order
until success, because "there may be implementations which support
[SEEK_HOLE/SEEK_DATA] but not [FIEMAP] (e.g., NFSv4.2) as well as vice
versa."

Unfortunately, we used FIEMAP incorrectly: we lacked FIEMAP_FLAG_SYNC.
Commit 38c4d0a (Sep 2014) added it.  Because that's a significant
speed hit, the next commit 7c159037 put SEEK_HOLE/SEEK_DATA first.

As you see, the obvious use of FIEMAP is wrong, and the correct use is
slow.  I guess this puts it somewhere between -7 "The obvious use is
wrong" and -10 "It's impossible to get right" on Rusty Russel's Hard
to Misuse scale[*].

"Fortunately", the FIEMAP code is used only when

* SEEK_HOLE/SEEK_DATA aren't defined, but CONFIG_FIEMAP is

  Uncommon.  SEEK_HOLE had no XFS implementation between 2011 (when it
  was introduced for ext4 and btrfs) and 2012.

* SEEK_HOLE/SEEK_DATA and CONFIG_FIEMAP are defined, but lseek() fails

  Unlikely.

Thus, the FIEMAP code executes rarely.  Makes it a nice hidey-hole for
bugs.  Worse, bugs hiding there can theoretically bite even on a host
that has SEEK_HOLE/SEEK_DATA.

I don't want to worry about this crap, not even theoretically.  Get
rid of it.

[*] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html

Signed-off-by: Markus Armbruster 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
Signed-off-by: Max Reitz 
---
 block/raw-posix.c | 63 ---
 1 file changed, 4 insertions(+), 59 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 706d3c0..a29130e 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -60,9 +60,6 @@
 #define FS_NOCOW_FL 0x0080 /* Do not cow file */
 #endif
 #endif
-#ifdef CONFIG_FIEMAP
-#include 
-#endif
 #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
 #include 
 #endif
@@ -151,9 +148,6 @@ typedef struct BDRVRawState {
 bool has_write_zeroes:1;
 bool discard_zeroes:1;
 bool needs_alignment;
-#ifdef CONFIG_FIEMAP
-bool skip_fiemap;
-#endif
 } BDRVRawState;
 
 typedef struct BDRVRawReopenState {
@@ -1481,52 +1475,6 @@ out:
 return result;
 }
 
-static int try_fiemap(BlockDriverState *bs, off_t start, off_t *data,
-  off_t *hole, int nb_sectors)
-{
-#ifdef CONFIG_FIEMAP
-BDRVRawState *s = bs->opaque;
-int ret = 0;
-struct {
-struct fiemap fm;
-struct fiemap_extent fe;
-} f;
-
-if (s->skip_fiemap) {
-return -ENOTSUP;
-}
-
-f.fm.fm_start = start;
-f.fm.fm_length = (int64_t)nb_sectors * BDRV_SECTOR_SIZE;
-f.fm.fm_flags = FIEMAP_FLAG_SYNC;
-f.fm.fm_extent_count = 1;
-f.fm.fm_reserved = 0;
-if (ioctl(s->fd, FS_IOC_FIEMAP, &f) == -1) {
-s->skip_fiemap = true;
-return -errno;
-}
-
-if (f.fm.fm_mapped_extents == 0) {
-/* No extents found, data is beyond f.fm.fm_start + f.fm.fm_length.
- * f.fm.fm_start + f.fm.fm_length must be clamped to the file size!
- */
-off_t length = lseek(s->fd, 0, SEEK_END);
-*hole = f.fm.fm_start;
-*data = MIN(f.fm.fm_start + f.fm.fm_length, length);
-} else {
-*data = f.fe.fe_logical;
-*hole = f.fe.fe_logical + f.fe.fe_length;
-if (f.fe.fe_flags & FIEMAP_EXTENT_UNWRITTEN) {
-ret |= BDRV_BLOCK_ZERO;
-}
-}
-
-return ret;
-#else
-return -ENOTSUP;
-#endif
-}
-
 static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
  off_t *hole)
 {
@@ -1593,13 +1541,10 @@ static int64_t coroutine_fn 
raw_co_get_block_status(BlockDriverState *bs,
 
 ret = try_seek_hole(bs, start, &data, &hole);
 if (ret < 0) {
-ret = try_fiemap(bs, start, &data, &hole, nb_sectors);
-if (ret < 0) {
-/* Assume everything is allocated. */
-data = 0;
-hole = start + nb_sectors * BDRV_SECTOR_SIZE;
-ret = 0;
-}
+/* Assume everything is allocated. */
+data = 0;
+hole = start + nb_sectors * BDRV_SECTOR_SIZE;
+ret = 0;
 }
 
 assert(ret >= 0);
-- 
1.8.3.1

[Qemu-devel] [PULL 6/6] block/raw-posix: Catch fsync() errors

2014-11-18 Thread Kevin Wolf

From: Max Reitz 

fsync() may fail, and that case should be handled.

Reported-by: László Érsek 
Signed-off-by: Max Reitz 
Signed-off-by: Kevin Wolf 
---
 block/raw-posix.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index d106fc4..b1af77e 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1454,7 +1454,12 @@ static int raw_create(const char *filename, QemuOpts 
*opts, Error **errp)
 left -= result;
 }
 if (result >= 0) {
-fsync(fd);
+result = fsync(fd);
+if (result < 0) {
+result = -errno;
+error_setg_errno(errp, -result,
+ "Could not flush new file to disk");
+}
 }
 g_free(buf);
 break;
-- 
1.8.3.1

[Qemu-devel] [PULL 1/6] raw-posix: Fix comment for raw_co_get_block_status()

2014-11-18 Thread Kevin Wolf

From: Markus Armbruster 

Missed in commit 705be72.

Signed-off-by: Markus Armbruster 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Fam Zheng 
Reviewed-by: Eric Blake 
Reviewed-by: Max Reitz 
Signed-off-by: Max Reitz 
---
 block/raw-posix.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index e100ae2..706d3c0 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1555,9 +1555,7 @@ static int try_seek_hole(BlockDriverState *bs, off_t 
start, off_t *data,
 }
 
 /*
- * Returns true iff the specified sector is present in the disk image. Drivers
- * not implementing the functionality are assumed to not support backing files,
- * hence all their sectors are reported as allocated.
+ * Returns the allocation status of the specified sectors.
  *
  * If 'sector_num' is beyond the end of the disk image the return value is 0
  * and 'pnum' is set to 0.
-- 
1.8.3.1

[Qemu-devel] [PULL 0/6] Block patches for 2.2.0-rc2

2014-11-18 Thread Kevin Wolf

The following changes since commit 1aba4be97eb01b650d146c7f01dc961d55da62ab:

  Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging 
(2014-11-17 17:22:03 +)

are available in the git repository at:


  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to 098ffa6674a82ceac0e3ccb3a8a5bf6ca44adcd5:

  block/raw-posix: Catch fsync() errors (2014-11-18 12:09:00 +0100)


Block patches for 2.2.0-rc2


Kevin Wolf (1):
  Merge remote-tracking branch 'mreitz/block' into queue-block

Markus Armbruster (3):
  raw-posix: Fix comment for raw_co_get_block_status()
  raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
  raw-posix: The SEEK_HOLE code is flawed, rewrite it

Max Reitz (3):
  block/raw-posix: Fix preallocating write() loop
  block/raw-posix: Only sync after successful preallocation
  block/raw-posix: Catch fsync() errors

 block/raw-posix.c | 173 --
 1 file changed, 91 insertions(+), 82 deletions(-)

[Qemu-devel] [PULL 3/6] raw-posix: The SEEK_HOLE code is flawed, rewrite it

2014-11-18 Thread Kevin Wolf

From: Markus Armbruster 

On systems where SEEK_HOLE in a trailing hole seeks to EOF (Solaris,
but not Linux), try_seek_hole() reports trailing data instead.

Additionally, unlikely lseek() failures are treated badly:

* When SEEK_HOLE fails, try_seek_hole() reports trailing data.  For
  -ENXIO, there's in fact a trailing hole.  Can happen only when
  something truncated the file since we opened it.

* When SEEK_HOLE succeeds, SEEK_DATA fails, and SEEK_END succeeds,
  then try_seek_hole() reports a trailing hole.  This is okay only
  when SEEK_DATA failed with -ENXIO (which means the non-trailing hole
  found by SEEK_HOLE has since become trailing somehow).  For other
  failures (unlikely), it's wrong.

* When SEEK_HOLE succeeds, SEEK_DATA fails, SEEK_END fails (unlikely),
  then try_seek_hole() reports bogus data [-1,start), which its caller
  raw_co_get_block_status() turns into zero sectors of data.  Could
  theoretically lead to infinite loops in code that attempts to scan
  data vs. hole forward.

Rewrite from scratch, with very careful comments.

Signed-off-by: Markus Armbruster 
Reviewed-by: Max Reitz 
Reviewed-by: Eric Blake 
Signed-off-by: Max Reitz 
---
 block/raw-posix.c | 111 +-
 1 file changed, 85 insertions(+), 26 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index a29130e..414e6d1 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1475,28 +1475,86 @@ out:
 return result;
 }
 
-static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
- off_t *hole)
+/*
+ * Find allocation range in @bs around offset @start.
+ * May change underlying file descriptor's file offset.
+ * If @start is not in a hole, store @start in @data, and the
+ * beginning of the next hole in @hole, and return 0.
+ * If @start is in a non-trailing hole, store @start in @hole and the
+ * beginning of the next non-hole in @data, and return 0.
+ * If @start is in a trailing hole or beyond EOF, return -ENXIO.
+ * If we can't find out, return a negative errno other than -ENXIO.
+ */
+static int find_allocation(BlockDriverState *bs, off_t start,
+   off_t *data, off_t *hole)
 {
 #if defined SEEK_HOLE && defined SEEK_DATA
 BDRVRawState *s = bs->opaque;
+off_t offs;
 
-*hole = lseek(s->fd, start, SEEK_HOLE);
-if (*hole == -1) {
-return -errno;
+/*
+ * SEEK_DATA cases:
+ * D1. offs == start: start is in data
+ * D2. offs > start: start is in a hole, next data at offs
+ * D3. offs < 0, errno = ENXIO: either start is in a trailing hole
+ *  or start is beyond EOF
+ * If the latter happens, the file has been truncated behind
+ * our back since we opened it.  All bets are off then.
+ * Treating like a trailing hole is simplest.
+ * D4. offs < 0, errno != ENXIO: we learned nothing
+ */
+offs = lseek(s->fd, start, SEEK_DATA);
+if (offs < 0) {
+return -errno;  /* D3 or D4 */
+}
+assert(offs >= start);
+
+if (offs > start) {
+/* D2: in hole, next data at offs */
+*hole = start;
+*data = offs;
+return 0;
 }
 
-if (*hole > start) {
+/* D1: in data, end not yet known */
+
+/*
+ * SEEK_HOLE cases:
+ * H1. offs == start: start is in a hole
+ * If this happens here, a hole has been dug behind our back
+ * since the previous lseek().
+ * H2. offs > start: either start is in data, next hole at offs,
+ *   or start is in trailing hole, EOF at offs
+ * Linux treats trailing holes like any other hole: offs ==
+ * start.  Solaris seeks to EOF instead: offs > start (blech).
+ * If that happens here, a hole has been dug behind our back
+ * since the previous lseek().
+ * H3. offs < 0, errno = ENXIO: start is beyond EOF
+ * If this happens, the file has been truncated behind our
+ * back since we opened it.  Treat it like a trailing hole.
+ * H4. offs < 0, errno != ENXIO: we learned nothing
+ * Pretend we know nothing at all, i.e. "forget" about D1.
+ */
+offs = lseek(s->fd, start, SEEK_HOLE);
+if (offs < 0) {
+return -errno;  /* D1 and (H3 or H4) */
+}
+assert(offs >= start);
+
+if (offs > start) {
+/*
+ * D1 and H2: either in data, next hole at offs, or it was in
+ * data but is now in a trailing hole.  In the latter case,
+ * all bets are off.  Treating it as if it there was data all
+ * the way to EOF is safe, so simply do that.
+ */
 *data = start;
-} else {
-/* On a hole.  We need another syscall to find its end.  */
-*data = lseek(s->fd, start, SEEK_DATA);
-if (*data == -1) {
-*data = lseek(s->fd, 0, SEEK_END);
-}
+*hole = offs;
+return 0;
 }
 
-

[Qemu-devel] [PULL 5/6] block/raw-posix: Only sync after successful preallocation

2014-11-18 Thread Kevin Wolf

From: Max Reitz 

The loop which filled the file with zeroes may have been left early due
to an error. In that case, the fsync() should be skipped.

Signed-off-by: Max Reitz 
Signed-off-by: Kevin Wolf 
---
 block/raw-posix.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index e0e48c5..d106fc4 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1453,7 +1453,9 @@ static int raw_create(const char *filename, QemuOpts 
*opts, Error **errp)
 }
 left -= result;
 }
-fsync(fd);
+if (result >= 0) {
+fsync(fd);
+}
 g_free(buf);
 break;
 }
-- 
1.8.3.1

[Qemu-devel] [PULL 4/6] block/raw-posix: Fix preallocating write() loop

2014-11-18 Thread Kevin Wolf

From: Max Reitz 

write() may write less bytes than requested; in this case, the number of
bytes written is returned. This is the byte count we should be
subtracting from the number of bytes still to be written, and not the
byte count we requested to write.

Reported-by: László Érsek 
Signed-off-by: Max Reitz 
Signed-off-by: Kevin Wolf 
---
 block/raw-posix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 414e6d1..e0e48c5 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1451,7 +1451,7 @@ static int raw_create(const char *filename, QemuOpts 
*opts, Error **errp)
  "Could not write to the new file");
 break;
 }
-left -= num;
+left -= result;
 }
 fsync(fd);
 g_free(buf);
-- 
1.8.3.1

Re: [Qemu-devel] [BUGFIX][PATCH for 2.2 1/1] hw/ide/core.c: Prevent SIGSEGV during migration

2014-11-18 Thread Stefan Hajnoczi

On Mon, Nov 17, 2014 at 04:20:39PM -0500, Don Slutz wrote:
> The other callers to blk_set_enable_write_cache() in this file
> already check for s->blk == NULL.
> 
> Signed-off-by: Don Slutz 
> ---
> 
> I think this is a bugfix that should be back ported to stable
> releases.
> 
> I also think this should be done in xen's copy of QEMU for 4.5 with
> back port(s) to active stable releases.
> 
> Note: In 2.1 and earlier the routine is
> bdrv_set_enable_write_cache(); variable is s->bs.
> 
>  hw/ide/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi 


pgp82QmAe9sMB.pgp
Description: PGP signature

Re: [Qemu-devel] [BUGFIX][PATCH for 2.2 1/1] hw/ide/core.c: Prevent SIGSEGV during migration

2014-11-18 Thread Kevin Wolf

Am 17.11.2014 um 22:20 hat Don Slutz geschrieben:
> The other callers to blk_set_enable_write_cache() in this file
> already check for s->blk == NULL.
> 
> Signed-off-by: Don Slutz 
> ---
> 
> I think this is a bugfix that should be back ported to stable
> releases.

Please remember to include a 'Cc: qemu-sta...@nongnu.org' line in your
commit messages for such patches. I've added it and applied the patch to
my block branch, thanks. (Saw it too late for my -rc2 pull request,
though, so it'll have to wait for -rc3.)

Kevin

Re: [Qemu-devel] [PATCH 4/5] memory: interface to allocate device ram

2014-11-18 Thread Michael S. Tsirkin

On Mon, Nov 17, 2014 at 08:21:00PM +, Peter Maydell wrote:
> On 17 November 2014 20:08, Michael S. Tsirkin  wrote:
> > Add API to allocate on-device RAM.
> > This looks just like regular RAM from migration POV,
> > but has two special properties internally:
> >  - it is never exposed to guest
> 
> If it's not exposed to the guest why is it a MemoryRegion?
> Those are pretty much the definition of regions of memory
> we expose to the guest (via one address space or another)...
> 
> -- PMM

Mostly for migration: we are using page migration machinery to
migrate this memory early rather than as part of device state
after VM is stopped.

-- 
MST

Re: [Qemu-devel] [PATCH v2 6/6] qemu-nbd: Use BlockBackend where reasonable

2014-11-18 Thread Paolo Bonzini



On 18/11/2014 12:21, Max Reitz wrote:
> Because qemu-nbd creates the BlockBackend by itself, it should create
> the according BlockDriverState tree by itself as well; that means, it
> has call bdrv_open() on its own. This is one of the places where
> qemu-nbd still needs to use a BlockDriverState directly (the root BDS
> below the BB); other places are the configuration of zero detection
> (which may be lifted into the BB eventually, but is not yet) and
> temporarily loading a snapshot.
> 
> Everywhere else, though, qemu-nbd can and thus should use BlockBackend.

... which is not much, but is nevertheless a start. :)

Reviewed-by: Paolo Bonzini 

> Suggested-by: Paolo Bonzini 
> Signed-off-by: Max Reitz 
> ---
>  qemu-nbd.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index 60ce50f..d222512 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -146,7 +146,7 @@ static void read_partition(uint8_t *p, struct 
> partition_record *r)
>  r->nb_sectors_abs = p[12] | p[13] << 8 | p[14] << 16 | p[15] << 24;
>  }
>  
> -static int find_partition(BlockDriverState *bs, int partition,
> +static int find_partition(BlockBackend *blk, int partition,
>off_t *offset, off_t *size)
>  {
>  struct partition_record mbr[4];
> @@ -155,7 +155,7 @@ static int find_partition(BlockDriverState *bs, int 
> partition,
>  int ext_partnum = 4;
>  int ret;
>  
> -if ((ret = bdrv_read(bs, 0, data, 1)) < 0) {
> +if ((ret = blk_read(blk, 0, data, 1)) < 0) {
>  errno = -ret;
>  err(EXIT_FAILURE, "error while reading");
>  }
> @@ -175,7 +175,7 @@ static int find_partition(BlockDriverState *bs, int 
> partition,
>  uint8_t data1[512];
>  int j;
>  
> -if ((ret = bdrv_read(bs, mbr[i].start_sector_abs, data1, 1)) < 
> 0) {
> +if ((ret = blk_read(blk, mbr[i].start_sector_abs, data1, 1)) < 
> 0) {
>  errno = -ret;
>  err(EXIT_FAILURE, "error while reading");
>  }
> @@ -720,10 +720,10 @@ int main(int argc, char **argv)
>  }
>  
>  bs->detect_zeroes = detect_zeroes;
> -fd_size = bdrv_getlength(bs);
> +fd_size = blk_getlength(blk);
>  
>  if (partition != -1) {
> -ret = find_partition(bs, partition, &dev_offset, &fd_size);
> +ret = find_partition(blk, partition, &dev_offset, &fd_size);
>  if (ret < 0) {
>  errno = -ret;
>  err(EXIT_FAILURE, "Could not find partition %d", partition);
>

Re: [Qemu-devel] [PATCH] linux-headers: update to 3.18-rc5

2014-11-18 Thread Peter Maydell

On 18 November 2014 05:19, Paolo Bonzini  wrote:
>
>
> On 17/11/2014 19:32, Peter Maydell wrote:
>> On 17 November 2014 18:28, Ard Biesheuvel  wrote:
>>> This updates the Linux header to version 3.18-rc5, adding support for
>>> (among other things) read-only memslots on ARM and arm64.
>>>
>>> Signed-off-by: Ard Biesheuvel 
>>
>> So, to be clear, the idea is that this should go into 2.2 because
>> it (effectively) fixes a bug where running from emulated flash
>> devices under KVM/ARM doesn't work?
>>
>> A KVM header update seems pretty safe to me; cc'ing Paolo
>> as KVM maintainer to ack/nak it.
>
> Sure, this is a good time (kernel-wise) to update headers.  And if it
> fixes a bug, it's acceptable for hard freeze.

OK; applied to master.

thanks
-- PMM

[Qemu-devel] [PATCH 0/4] MAINTAINERS: add myself to migration, rng; update others

2014-11-18 Thread Amit Shah

Hello,

These patches add myself to the migration maintainers list, and a new
entry for virtio-rng.  Also update the virtio-serial entry for an
include file.

Please ack.

Amit Shah (4):
  MAINTAINERS: Add myself to migration maintainers
  MAINTAINERS: migration: add vmstate static checker files
  MAINTAINERS: add entry for virtio-rng
  MAINTAINERS: add include files to virtio-serial entry

 MAINTAINERS | 11 +++
 1 file changed, 11 insertions(+)

-- 
2.1.0

[Qemu-devel] [PATCH 1/4] MAINTAINERS: Add myself to migration maintainers

2014-11-18 Thread Amit Shah

Signed-off-by: Amit Shah 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index bcb69e8..e517c41 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -928,6 +928,7 @@ F: scripts/checkpatch.pl
 
 Migration
 M: Juan Quintela 
+M: Amit Shah 
 S: Maintained
 F: include/migration/
 F: migration*
-- 
2.1.0

[Qemu-devel] [PATCH 2/4] MAINTAINERS: migration: add vmstate static checker files

2014-11-18 Thread Amit Shah

Signed-off-by: Amit Shah 
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e517c41..af8d856 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -935,6 +935,8 @@ F: migration*
 F: savevm.c
 F: arch_init.c
 F: vmstate.c
+F: scripts/vmstate-static-checker.py
+F: tests/vmstate-static-checker-data/
 
 Seccomp
 M: Eduardo Otubo 
-- 
2.1.0

[Qemu-devel] [PATCH 3/4] MAINTAINERS: add entry for virtio-rng

2014-11-18 Thread Amit Shah

Signed-off-by: Amit Shah 
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index af8d856..4c46cfe 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -697,6 +697,13 @@ S: Supported
 F: hw/char/virtio-serial-bus.c
 F: hw/char/virtio-console.c
 
+virtio-rng
+M: Amit Shah 
+S: Supported
+F: hw/virtio/virtio-rng.c
+F: include/hw/virtio/virtio-rng.h
+F: backends/rng*.c
+
 nvme
 M: Keith Busch 
 S: Supported
-- 
2.1.0

[Qemu-devel] [PATCH 4/4] MAINTAINERS: add include files to virtio-serial entry

2014-11-18 Thread Amit Shah

Signed-off-by: Amit Shah 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 4c46cfe..d2f4a11 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -696,6 +696,7 @@ M: Amit Shah 
 S: Supported
 F: hw/char/virtio-serial-bus.c
 F: hw/char/virtio-console.c
+F: include/hw/virtio/virtio-serial.h
 
 virtio-rng
 M: Amit Shah 
-- 
2.1.0

Re: [Qemu-devel] [PATCH Part1 1/5] acpi, pc: Add hotunplug request cb for pc machine.

2014-11-18 Thread Igor Mammedov

On Mon, 17 Nov 2014 13:03:13 +0800
Tang Chen  wrote:

in subj s/cb/callback|handler/

> Memory and CPU hot unplug are both asynchronize procedures.
s/asynchronize/asynchronous/
> They both need unplug request cb when the unplug operation happens.
s/cb when the unplug operation happens/callback to initiate unplug operation/

> 
> This patch adds hotunplug request cb for pc machine, and memory and CPU
> hot unplug will base on it.
Add unplug handler to pc machine that will be used by following
CPU and memory unplug patches.


> 
> Signed-off-by: Tang Chen 
> ---
>  hw/i386/pc.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 1205db8..5c48435 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1647,6 +1647,13 @@ static void pc_machine_device_plug_cb(HotplugHandler 
> *hotplug_dev,
>  }
>  }
>  
> +static void pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
> +DeviceState *dev, Error 
> **errp)
> +{
> +error_setg(errp, "acpi: device unplug request for not supported device"
> +   " type: %s", object_get_typename(OBJECT(dev)));
it's not necessarily acpi related in general so maybe drop 'acpi:' prefix.
Also it would be nice to add device's ID or use it instead of type name.

> +}
> +
>  static HotplugHandler *pc_get_hotpug_handler(MachineState *machine,
>   DeviceState *dev)
>  {
> @@ -1753,6 +1760,7 @@ static void pc_machine_class_init(ObjectClass *oc, void 
> *data)
>  pcmc->get_hotplug_handler = mc->get_hotplug_handler;
>  mc->get_hotplug_handler = pc_get_hotpug_handler;
>  hc->plug = pc_machine_device_plug_cb;
> +hc->unplug_request = pc_machine_device_unplug_request_cb;
>  }
>  
>  static const TypeInfo pc_machine_info = {

Re: [Qemu-devel] [PATCH 1/3] s390: Add PCI bus support

2014-11-18 Thread Frank Blaschka

On Mon, Nov 10, 2014 at 04:14:16PM +0100, Alexander Graf wrote:
> 
> 
> On 10.11.14 15:20, Frank Blaschka wrote:
> > From: Frank Blaschka 
> > 
> > This patch implements a pci bus for s390x together with infrastructure
> > to generate and handle hotplug events, to configure/unconfigure via
> > sclp instruction, to do iommu translations and provide s390 support for
> > MSI/MSI-X notification processing.
> > 
> > Signed-off-by: Frank Blaschka 
> > ---
> >  default-configs/s390x-softmmu.mak |   1 +
> >  hw/s390x/Makefile.objs|   1 +
> >  hw/s390x/css.c|   5 +
> >  hw/s390x/css.h|   1 +
> >  hw/s390x/s390-pci-bus.c   | 485 
> > ++
> >  hw/s390x/s390-pci-bus.h   | 254 
> >  hw/s390x/s390-virtio-ccw.c|   3 +
> >  hw/s390x/sclp.c   |  10 +-
> >  include/hw/s390x/sclp.h   |   8 +
> >  target-s390x/ioinst.c |  52 
> >  target-s390x/ioinst.h |   1 +
> >  11 files changed, 820 insertions(+), 1 deletion(-)
> >  create mode 100644 hw/s390x/s390-pci-bus.c
> >  create mode 100644 hw/s390x/s390-pci-bus.h
> > 
> > diff --git a/default-configs/s390x-softmmu.mak 
> > b/default-configs/s390x-softmmu.mak
> > index 126d88d..6ee2ff8 100644
> > --- a/default-configs/s390x-softmmu.mak
> > +++ b/default-configs/s390x-softmmu.mak
> > @@ -1,3 +1,4 @@
> > +include pci.mak
> >  CONFIG_VIRTIO=y
> >  CONFIG_SCLPCONSOLE=y
> >  CONFIG_S390_FLIC=y
> > diff --git a/hw/s390x/Makefile.objs b/hw/s390x/Makefile.objs
> > index 1ba6c3a..428d957 100644
> > --- a/hw/s390x/Makefile.objs
> > +++ b/hw/s390x/Makefile.objs
> > @@ -8,3 +8,4 @@ obj-y += ipl.o
> >  obj-y += css.o
> >  obj-y += s390-virtio-ccw.o
> >  obj-y += virtio-ccw.o
> > +obj-y += s390-pci-bus.o
> > diff --git a/hw/s390x/css.c b/hw/s390x/css.c
> > index b67c039..7553085 100644
> > --- a/hw/s390x/css.c
> > +++ b/hw/s390x/css.c
> > @@ -1299,6 +1299,11 @@ void css_generate_chp_crws(uint8_t cssid, uint8_t 
> > chpid)
> >  /* TODO */
> >  }
> >  
> > +void css_generate_css_crws(uint8_t cssid)
> > +{
> > +css_queue_crw(CRW_RSC_CSS, 0, 0, 0);
> > +}
> > +
> >  int css_enable_mcsse(void)
> >  {
> >  trace_css_enable_facility("mcsse");
> > diff --git a/hw/s390x/css.h b/hw/s390x/css.h
> > index 33104ac..7e53148 100644
> > --- a/hw/s390x/css.h
> > +++ b/hw/s390x/css.h
> > @@ -101,6 +101,7 @@ void css_queue_crw(uint8_t rsc, uint8_t erc, int chain, 
> > uint16_t rsid);
> >  void css_generate_sch_crws(uint8_t cssid, uint8_t ssid, uint16_t schid,
> > int hotplugged, int add);
> >  void css_generate_chp_crws(uint8_t cssid, uint8_t chpid);
> > +void css_generate_css_crws(uint8_t cssid);
> >  void css_adapter_interrupt(uint8_t isc);
> >  
> >  #define CSS_IO_ADAPTER_VIRTIO 1
> > diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> > new file mode 100644
> > index 000..f2fa6ba
> > --- /dev/null
> > +++ b/hw/s390x/s390-pci-bus.c
> > @@ -0,0 +1,485 @@
> > +/*
> > + * s390 PCI BUS
> > + *
> > + * Copyright 2014 IBM Corp.
> > + * Author(s): Frank Blaschka 
> > + *Hong Bo Li 
> > + *Yi Min Zhao 
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or (at
> > + * your option) any later version. See the COPYING file in the top-level
> > + * directory.
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include "qemu/error-report.h"
> > +#include "s390-pci-bus.h"
> > +
> > +/* #define DEBUG_S390PCI_BUS */
> > +#ifdef DEBUG_S390PCI_BUS
> > +#define DPRINTF(fmt, ...) \
> > +do { fprintf(stderr, "S390pci-bus: " fmt, ## __VA_ARGS__); } while (0)
> > +#else
> > +#define DPRINTF(fmt, ...) \
> > +do { } while (0)
> > +#endif
> > +
> > +static const unsigned long be_to_le = BITS_PER_LONG - 1;
> > +static QTAILQ_HEAD(, SeiContainer) pending_sei =
> > +QTAILQ_HEAD_INITIALIZER(pending_sei);
> > +static QTAILQ_HEAD(, S390PCIBusDevice) device_list =
> > +QTAILQ_HEAD_INITIALIZER(device_list);
> 
> Please get rid of all statics ;). All state has to live in objects.
>

be_to_le was misleading and unnecesary will remove this one but
static QTAILQ_HEAD seems to be a common practice for list anchors.
If you really want me to change this do you have any prefered way,
or can you point me to some code doing this?

> > +
> > +int chsc_sei_nt2_get_event(void *res)
> > +{
> > +ChscSeiNt2Res *nt2_res = (ChscSeiNt2Res *)res;
> > +PciCcdfAvail *accdf;
> > +PciCcdfErr *eccdf;
> > +int rc = 1;
> > +SeiContainer *sei_cont;
> > +
> > +sei_cont = QTAILQ_FIRST(&pending_sei);
> > +if (sei_cont) {
> > +QTAILQ_REMOVE(&pending_sei, sei_cont, link);
> > +nt2_res->nt = 2;
> > +nt2_res->cc = sei_cont->cc;
> > +switch (sei_cont->cc) {
> > +case 1: /* error event */
> > +eccdf = (PciCcdfErr *)nt2_res->ccdf;
> > +eccdf->f

Re: [Qemu-devel] [PATCH v6 00/10] block: Incremental backup series

2014-11-18 Thread Vladimir Sementsov-Ogievskiy


(3) Data Integrity

The dirty flag could work something like:

- If, on first open, the file has the dirty flag set, we need to 
discard the bitmap data because we can no longer trust it.
- If the bitmap file is clean, proceed as normal, but take a lock 
against any of the bitmap functions to prevent them from marking any 
bits dirty.
- On first write to a clean persistent bitmap, delay the write until 
we can mark the bitmap as dirty first. This incurs a write penalty 
when we try to use the bitmap at first...

- Unlock the bitmap functions and allow them to mark blocks as needed.
- At some point, based on a sync policy, re-commit the dirty 
information to the file and mark the file as clean once more and 
re-take the persistence lock. 

Correct me if I'm wrong.

#Read bitmap:
read in blockdev_init, before any write to device, so no lock is needed.

#Set bits in bitmap:
if bitmap.dirty_flag:
   set bits
else:
   LOCK
   set bits
   set bitmap.dirty_flag
   set dirty_flag in bitmap file
   UNLOCK

#Sync:
if not bitmap.dirty_flag:
   skip sync
else:
   LOCK
   save one of bitmap levels (saving the last one is too long and not 
very good idea, because it is fast-updateing)

   unset dirty_flag in bitmap file
   unset bitmap.dirty_flag
   UNLOCK

#Last sync in bdrv_close:
Just save the last bitmap level and unset dirty_flag in bitmap file

Also.. I'm not quite sure about locking.. As I understand, co-routines 
in qemu are not running in parallel, is locking required? Or sync timer 
will not be co-routine based?


Best regards,
Vladimir

On 18.11.2014 13:54, Vladimir Sementsov-Ogievskiy wrote:



(2) File Format

Some standard file magic, which includes:

- Some magic byte(s)
- Dirty flag. Needed to tell if we can trust this data or not.
- The size of the bitmap
- The granularity of the bitmap
- The offset to the first sector of bitmap data (Maybe? It can't hurt 
if we give ourselves a sector's worth to write metadata within.)
- Data starting at... PAGESIZE? 

- The name of the bitmap and also the size of this name



(5) Partial Persistence

We did not discuss only saving higher levels of the bitmap. What's 
the primary benefit you're seeking? 
Hmm. It may be used for faster sync. Maybe, save some of bitmap levels 
on timer while vm is running and save the last level on shutdown?


CC qemu-devel - ok.

Best regards,
Vladimir

On 18.11.2014 02:46, John Snow wrote:



On 11/13/2014 08:54 AM, Vladimir Sementsov-Ogievskiy wrote:

Hi

I'd just like to start working on persistent dirty bitmap. My thoughts
about it are the following:
- qemu -drive file=file,dirty_bitmap=file
 so,  bitmap will be loaded with drive open and saved with drive 
close.

- save only meaningful (the last) level of the bitmap, restore all
levels on bitmap loading
- bool parameter "persistent" for bdrv_create_dirty_bitmap and
BdrvDirtyBitmap
- internal dirty_bitmaps, saved in qcow2 file

Best regards,
Vladimir


I am thinking:

(1) Command Lines

If you enable dirty bitmaps and give it a file that doesn't exist, it 
should error out on you.


If you enable dirty bitmaps and give it a file that's blank, it 
understands that it is to create a persistent bitmap file in this 
location and it should enable persistence.


If a bitmap file is given and it has valid magic, this should imply 
persistence.


I am hesitant to have it auto-create files that don't already exist 
in case the files become large in size and a misconfiguration leads 
to repeated creation of these files that get orphaned in random 
folders. Perhaps we can add a create=auto flag or similar to allow 
this behavior if wanted.


(2) File Format

Some standard file magic, which includes:

- Some magic byte(s)
- Dirty flag. Needed to tell if we can trust this data or not.
- The size of the bitmap
- The granularity of the bitmap
- The offset to the first sector of bitmap data (Maybe? It can't hurt 
if we give ourselves a sector's worth to write metadata within.)

- Data starting at... PAGESIZE?

(3) Data Integrity

The dirty flag could work something like:

- If, on first open, the file has the dirty flag set, we need to 
discard the bitmap data because we can no longer trust it.
- If the bitmap file is clean, proceed as normal, but take a lock 
against any of the bitmap functions to prevent them from marking any 
bits dirty.
- On first write to a clean persistent bitmap, delay the write until 
we can mark the bitmap as dirty first. This incurs a write penalty 
when we try to use the bitmap at first...

- Unlock the bitmap functions and allow them to mark blocks as needed.
- At some point, based on a sync policy, re-commit the dirty 
information to the file and mark the file as clean once more and 
re-take the persistence lock.


(4) Synchronization Policy

- Sync after so many bits become dirty in the bitmap, either as an 
absolute threshold or a density percentage?

- Sync periodically on a fixed timer?
- Sync periodically opportunistically when I/O utilization becomes 
relati

Re: [Qemu-devel] runtime configurable semihosting

2014-11-18 Thread Peter Maydell

On 13 November 2014 09:44, Liviu Ionescu  wrote:
> Peter Maydell wrote:
>> I took a quick look at the syntax of command options and monitor
>> commands, and I would suggest the following:
>>
>> - extend the option "-semihosting" with an optional
>> "target=native|gdb|auto", default auto
>
> unfortunately the parser is not able to detect a missing optional, and
> will always consume the next option, so this syntax is not available.

Sorry, yes, you're right; I should have looked more carefully.
So we do need a new option; however I think it would be better
for the new option to be a general qemuopts option, like this:

 -semihosting-config target=[native|gdb|auto]

because that then gives us a place to put future semihosting
related config options, and also allows the config files used
by -readconfig/-writeconfig to change this setting. (We can
have -semihosting-config have an implied 'enable=true', so
you don't have to use the old "-semihosting" option if you're
specifying -semihosting-config.)

thanks
-- PMM

Re: [Qemu-devel] [PULL] migration: fix for CVE-2014-7840

2014-11-18 Thread Peter Maydell

On 18 November 2014 11:29, Amit Shah  wrote:
> The following changes since commit d6be29e3fb5659102ac0e48e295d177cb67e32c5:
>
>   target-arm: handle address translations that start at level 3 (2014-11-17 
> 19:30:28 +)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/virt/qemu/amit/migration.git tags/for-2.2
>
> for you to fetch changes up to 0be839a2701369f669532ea5884c15bead1c6e08:
>
>   migration: fix parameter validation on ram load (2014-11-18 16:49:44 +0530)
>
> 
> Fix for CVE-2014-7840, avoiding arbitrary qemu memory overwrite for
> migration by Michael S. Tsirkin.
>
> 

Applied, thanks.

-- PMM

Re: [Qemu-devel] [PATCH 00/13] linux-aio/virtio-scsi: support AioContext wide IO submission as batch

2014-11-18 Thread Paolo Bonzini



On 09/11/2014 08:42, Ming Lei wrote:
> This patch implements AioContext wide IO submission as batch, and
> the idea behind is very simple:
> 
>   - linux native aio(io_submit) supports to enqueue read/write requests
>   to different files
> 
>   - in one AioContext, I/O requests from VM can be submitted to different
>   backend in host, one typical example is multi-lun scsi 
> 
> This patch changes 'struct qemu_laio_state' as per AioContext, and
> multiple 'bs' can be associted with one single instance of
> 'struct qemu_laio_state', then AioContext wide IO submission as batch
> becomes easy to implement.
> 
> One simple test in my laptop shows ~20% throughput improvement
> on randread from VM(using AioContext wide IO batch vs. not using io batch)
> with below config:
> 
>   -drive 
> id=drive_scsi1-0-0-0,if=none,format=raw,cache=none,aio=native,file=/dev/nullb2
>  \
>   -drive 
> id=drive_scsi1-0-0-1,if=none,format=raw,cache=none,aio=native,file=/dev/nullb3
>  \
>   -device 
> virtio-scsi-pci,num_queues=4,id=scsi1,addr=07,iothread=iothread0 \
>   -device 
> scsi-disk,bus=scsi1.0,channel=0,scsi-id=1,lun=0,drive=drive_scsi1-0-0-0,id=scsi1-0-0-0
>  \
>   -device 
> scsi-disk,bus=scsi1.0,channel=0,scsi-id=1,lun=1,drive=drive_scsi1-0-0-1,id=scsi1-0-0-1
>  \
> 
> BTW, maybe more boost can be obtained since ~33K/sec write() system call
> can be observed when this test case is running, and it might be a recent
> regression(BH?).

Ming,

these patches are interesting.  I would like to compare them with the
opposite approach (and, I think, more similar to your old work) where
the qemu_laio_state API is moved entirely into AioContext, with lazy
allocation (reference-counted too, probably).

Most of the patches would be the same, but you would replace
aio_attach_aio_bs/aio_detach_aio_bs with something like
aio_native_get/aio_native_unref.  Ultimately block/{linux,win32}-aio.c
could be merged into block/aio-{posix,win32}.c, but you do not have to
do that now.

Could you try that?  This way we can see which API turns out to be nicer.

Thanks,

Paolo

> This patchset can be found on below tree too:
> 
>   git://kernel.ubuntu.com/ming/qemu.git aio-io-batch.2
> 
> and these patches depend on "linux-aio: fix batch submission" patches
> in below link:
> 
>   http://marc.info/?l=qemu-devel&m=141528663106557&w=2
> 
> Any comments and suggestions are welcome.
> 
>  async.c |1 +
>  block.c |   16 +++
>  block/linux-aio.c   |  251 
> ++-
>  block/raw-aio.h |6 +-
>  block/raw-posix.c   |4 +-
>  hw/scsi/virtio-scsi-dataplane.c |8 ++
>  hw/scsi/virtio-scsi.c   |2 -
>  include/block/aio.h |   27 +
>  include/block/block.h   |3 +
>  9 files changed, 259 insertions(+), 59 deletions(-)
> 
> Thanks,
> Ming Lei
>

Re: [Qemu-devel] [PATCH v2 1/5] libqos: Change use of pointers to uint64_t in virtio

2014-11-18 Thread Stefan Hajnoczi

On Mon, Nov 17, 2014 at 04:25:50PM +0100, Andreas Färber wrote:
> Am 17.11.2014 um 16:19 schrieb Marc Marí:
> > El Mon, 17 Nov 2014 15:16:21 +
> > Stefan Hajnoczi  escribió:
> >> On Sat, Nov 01, 2014 at 06:02:26PM +0100, Marc Marí wrote:
> >>> Convert use of pointers in functions of virtio to uint64_t in order
> >>> to make it platform-independent.
> >>>
> >>> Add casting from pointers (in PCI functions) to uint64_t and vice
> >>> versa through uintptr_t.
> >>>
> >>> Signed-off-by: Marc Marí 
> >>> ---
> >>>  tests/libqos/virtio-pci.c |   20 +++-
> >>>  tests/libqos/virtio.c |8 
> >>>  tests/libqos/virtio.h |   16 
> >>>  tests/virtio-blk-test.c   |   21 ++---
> >>>  4 files changed, 37 insertions(+), 28 deletions(-)
> >>
> >> This makes sense.  To fully abolish pointers the PCI code also needs
> >> to be converted.  Do plan plan to do that?
> >>
> >> Stefan
> > 
> > Not planned, but if asked, I may do it.
> 
> Pretty please. :) I was wondering the same thing when reading the patch.

That said, you do not need to convert PCI in this patch series.

If you do the PCI conversion in a separate patch series we can keep
merging code incrementally.  It's more satisfying for you that way and
less intimidating for code reviews to review short series.

Stefan


pgpK9RujbspDI.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH] Tracing: Fix simpletrace.py error on tcg enabled binary traces

2014-11-18 Thread Stefan Hajnoczi

On Mon, Nov 17, 2014 at 07:57:08PM -0600, Lluís Vilanova wrote:
> Stefan Hajnoczi writes:
> 
> > On Sun, Nov 02, 2014 at 10:37:59PM +0100, christoph.seif...@posteo.de wrote:
> >> From: Christoph Seifert 
> >> 
> >> simpletrace.py does not recognize the tcg option while reading trace-events
> >> file. In result simpletrace does not work on binary traces and tcg enabled
> >> events. Moved transformation of tcg enabled events to _read_events() which 
> >> is
> >> used by simpletrace.
> >> 
> >> Signed-off-by: Christoph Seifert 
> >> ---
> >> scripts/tracetool/__init__.py | 67 
> >> +--
> >> 1 file changed, 33 insertions(+), 34 deletions(-)
> 
> > Looks good to me.
> 
> > Lluís: Any comments?
> 
> I've looked at how it affects some patches I did not have time yet to send, 
> and
> everything looks good.

Fantastic, thanks for your time, Lluís!

Stefan


pgpVCTHO1OlEy.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH v3 2/3] linux-aio: handling -EAGAIN for !s->io_q.plugged case

2014-11-18 Thread Paolo Bonzini



On 06/11/2014 16:10, Ming Lei wrote:
> +/* don't submit until next completion for -EAGAIN of non plug case */
> +if (unlikely(!s->io_q.plugged)) {
> +return 0;
> +}
> +

Is this an optimization or a fix for something?

> +/*
> + * Switch to queue mode until -EAGAIN is handled, we suppose
> + * there is always uncompleted I/O, so try to enqueue it first,
> + * and will be submitted again in following aio completion cb.
> + */
> +if (ret == -EAGAIN) {
> +goto enqueue;
> +} else if (ret < 0) {
>  goto out_free_aiocb;
>  }

Better:

 if (!s->io_q.plugged && !s->io_q.idx) {
ret = io_submit(s->ctx, 1, &iocbs);
if (ret >= 0) {
return &laiocb->common;
}
if (ret != -EAGAIN) {
goto out_free_aiocb;
}
}

/* code for queue mode.  */

Paolo

Re: [Qemu-devel] [PATCH v3 3/3] linux-aio: remove 'node' from 'struct qemu_laiocb'

2014-11-18 Thread Paolo Bonzini



On 06/11/2014 16:10, Ming Lei wrote:
> No one uses the 'node' field any more, so remove it
> from 'struct qemu_laiocb', and this can save 16byte
> for the struct on 64bit arch.
> 
> Signed-off-by: Ming Lei 
> ---
>  block/linux-aio.c |1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/block/linux-aio.c b/block/linux-aio.c
> index f5ca41d..b12da25 100644
> --- a/block/linux-aio.c
> +++ b/block/linux-aio.c
> @@ -35,7 +35,6 @@ struct qemu_laiocb {
>  size_t nbytes;
>  QEMUIOVector *qiov;
>  bool is_read;
> -QLIST_ENTRY(qemu_laiocb) node;
>  };
>  
>  /*
> 

Reviewed-by: Paolo Bonzini

Re: [Qemu-devel] [PATCH] Tracing: Fix simpletrace.py error on tcg enabled binary traces

2014-11-18 Thread Stefan Hajnoczi

On Sun, Nov 02, 2014 at 10:37:59PM +0100, christoph.seif...@posteo.de wrote:
> From: Christoph Seifert 
> 
> simpletrace.py does not recognize the tcg option while reading trace-events  
> file. In result simpletrace does not work on binary traces and tcg enabled 
> events. Moved transformation of tcg enabled events to _read_events() which is 
> used by simpletrace.
> 
> Signed-off-by: Christoph Seifert 
> ---
>  scripts/tracetool/__init__.py | 67 
> +--
>  1 file changed, 33 insertions(+), 34 deletions(-)

Thanks, applied to my tracing tree:
https://github.com/stefanha/qemu/commits/tracing

Stefan


pgpZe1fWea7WZ.pgp
Description: PGP signature

Re: [Qemu-devel] [BUGFIX][PATCH for 2.2 1/1] hw/ide/core.c: Prevent SIGSEGV during migration

2014-11-18 Thread Stefano Stabellini

Konrad,
I think we should have this fix in Xen 4.5. Should I go ahead and
backport it?

On Mon, 17 Nov 2014, Don Slutz wrote:
> The other callers to blk_set_enable_write_cache() in this file
> already check for s->blk == NULL.
> 
> Signed-off-by: Don Slutz 
> ---
> 
> I think this is a bugfix that should be back ported to stable
> releases.
> 
> I also think this should be done in xen's copy of QEMU for 4.5 with
> back port(s) to active stable releases.
> 
> Note: In 2.1 and earlier the routine is
> bdrv_set_enable_write_cache(); variable is s->bs.
> 
>  hw/ide/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 00e21cf..d4af5e2 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -2401,7 +2401,7 @@ static int ide_drive_post_load(void *opaque, int 
> version_id)
>  {
>  IDEState *s = opaque;
>  
> -if (s->identify_set) {
> +if (s->blk && s->identify_set) {
>  blk_set_enable_write_cache(s->blk, !!(s->identify_data[85] & (1 << 
> 5)));
>  }
>  return 0;
> -- 
> 1.8.4
>

Re: [Qemu-devel] [PATCH v3 1/3] linux-aio: fix submit aio as a batch

2014-11-18 Thread Paolo Bonzini



> @@ -137,6 +145,12 @@ static void qemu_laio_completion_bh(void *opaque)
>  }
>  }
>  
> +static void qemu_laio_start_retry(struct qemu_laio_state *s)
> +{
> +if (s->io_q.idx)
> +qemu_bh_schedule(s->io_q.retry);
> +}
> +
>  static void qemu_laio_completion_cb(EventNotifier *e)
>  {
>  struct qemu_laio_state *s = container_of(e, struct qemu_laio_state, e);
> @@ -144,6 +158,7 @@ static void qemu_laio_completion_cb(EventNotifier *e)
>  if (event_notifier_test_and_clear(&s->e)) {
>  qemu_bh_schedule(s->completion_bh);
>  }
> +qemu_laio_start_retry(s);

I think you do not even need two bottom halves.  Just call ioq_submit
from completion_bh instead, after the call to io_getevents.

>  }
>  
>  static void laio_cancel(BlockAIOCB *blockacb)
> @@ -163,6 +178,9 @@ static void laio_cancel(BlockAIOCB *blockacb)
>  }
>  
>  laiocb->common.cb(laiocb->common.opaque, laiocb->ret);
> +
> +/* check if there are requests in io queue */
> +qemu_laio_start_retry(laiocb->ctx);
>  }
>  
>  static const AIOCBInfo laio_aiocb_info = {
> @@ -177,45 +195,80 @@ static void ioq_init(LaioQueue *io_q)
>  io_q->plugged = 0;
>  }
>  
> -static int ioq_submit(struct qemu_laio_state *s)
> +static void abort_queue(struct qemu_laio_state *s)
> +{
> +int i;
> +for (i = 0; i < s->io_q.idx; i++) {
> +struct qemu_laiocb *laiocb = container_of(s->io_q.iocbs[i],
> +  struct qemu_laiocb,
> +  iocb);
> +laiocb->ret = -EIO;
> +qemu_laio_process_completion(s, laiocb);
> +}
> +}
> +
> +static int ioq_submit(struct qemu_laio_state *s, bool enqueue)
>  {
>  int ret, i = 0;
>  int len = s->io_q.idx;
> +int j = 0;
>  
> -do {
> -ret = io_submit(s->ctx, len, s->io_q.iocbs);
> -} while (i++ < 3 && ret == -EAGAIN);
> +if (!len) {
> +return 0;
> +}
>  
> -/* empty io queue */
> -s->io_q.idx = 0;
> +ret = io_submit(s->ctx, len, s->io_q.iocbs);
> +if (ret == -EAGAIN) { /* retry in following completion cb */
> +return 0;
> +} else if (ret < 0) {
> +if (enqueue) {
> +return ret;
> +}
>  
> -if (ret < 0) {
> -i = 0;
> -} else {
> -i = ret;
> +/* in non-queue path, all IOs have to be completed */
> +abort_queue(s);
> +ret = len;
> +} else if (ret == 0) {
> +goto out;

No need for goto; just move the "for" loop inside this conditional.  Or
better, just use memmove.  That is:

if (ret < 0) {
if (ret == -EAGAIN) {
return 0;
}
if (enqueue) {
return ret;
}

abort_queue(s);
ret = len;
}

if (ret > 0) {
memmove(...)
s->io_q.idx -= ret;
}
return ret;

> + * update io queue, for partial completion, retry will be
> + * started automatically in following completion cb.
> + */
> +s->io_q.idx -= ret;
> +
>  return ret;
>  }
>  
> -static void ioq_enqueue(struct qemu_laio_state *s, struct iocb *iocb)
> +static void ioq_submit_retry(void *opaque)
> +{
> +struct qemu_laio_state *s = opaque;
> +ioq_submit(s, false);
> +}
> +
> +static int ioq_enqueue(struct qemu_laio_state *s, struct iocb *iocb)
>  {
>  unsigned int idx = s->io_q.idx;
>  
> +if (unlikely(idx == s->io_q.size)) {
> +return -1;

return -EAGAIN?

Thanks,

Paolo

> +}
> +
>  s->io_q.iocbs[idx++] = iocb;
>  s->io_q.idx = idx;
>  
> -/* submit immediately if queue is full */
> -if (idx == s->io_q.size) {
> -ioq_submit(s);
> +/* submit immediately if queue depth is above 2/3 */
> +if (idx > s->io_q.size * 2 / 3) {
> +return ioq_submit(s, true);
>  }
> +
> +return 0;
>  }
>  
>  void laio_io_plug(BlockDriverState *bs, void *aio_ctx)
> @@ -237,7 +290,7 @@ int laio_io_unplug(BlockDriverState *bs, void *aio_ctx, 
> bool unplug)
>  }
>  
>  if (s->io_q.idx > 0) {
> -ret = ioq_submit(s);
> +ret = ioq_submit(s, false);
>  }
>  
>  return ret;
> @@ -281,7 +334,9 @@ BlockAIOCB *laio_submit(BlockDriverState *bs, void 
> *aio_ctx, int fd,
>  goto out_free_aiocb;
>  }
>  } else {
> -ioq_enqueue(s, iocbs);
> +if (ioq_enqueue(s, iocbs) < 0) {
> +goto out_free_aiocb;
> +}
>  }
>  return &laiocb->common;
>  
> @@ -296,12 +351,14 @@ void laio_detach_aio_context(void *s_, AioContext 
> *old_context)
>  
>  aio_set_event_notifier(old_context, &s->e, NULL);
>  qemu_bh_delete(s->completion_bh);
> +qemu_bh_delete(s->io_q.retry);
>  }
>  
>  void laio_attach_aio_context(void *s_, AioContext *new_context)
>  {
>  struct qemu_laio_state *s = s_;
>  
> +s->io_q.retry = aio_bh_new(new_context, ioq_submit_retry, s);
>  s->completion_bh = aio_bh_new(new_context, qemu_laio_completion_b

Re: [Qemu-devel] hotremoving a disk qmp/hmp

2014-11-18 Thread Markus Armbruster

William Dauchy  writes:

> Hello,
>
> When hotremoving a disk I'm using the QMP API with device_del command;
>
> Previous query-block command result:
>
> {   u'device': u'disk1',
> u'inserted': {   u'backing_file_depth': 0,
>  u'bps': 0,
>  u'bps_rd': 0,
>  u'bps_wr': 0,
>  u'detect_zeroes': u'off',
>  u'drv': u'raw',
>  u'encrypted': False,
>  u'encryption_key_missing': False,
>  u'file': u'/dev/sda',
>  u'image': {   u'actual-size': 0,
>u'dirty-flag': False,
>u'filename': u'/dev/sda',
>u'format': u'raw',
>u'virtual-size': 3221225472},
>  u'iops': 0,
>  u'iops_rd': 0,
>  u'iops_wr': 0,
>  u'ro': False},
> u'io-status': u'ok',
> u'locked': True,
> u'removable': True,
> u'tray_open': False,
> u'type': u'unknown'}

This is block backend "disk1".

> After a device_del command I have the same thing but "'locked': False"

I presume you're deleting the device using backend "disk1".

What kind of device is this?  PCI, perhaps?

> Then I can also do `eject device=disk1` which bring me to:
>
> {   u'device': u'disk1',
>u'io-status': u'ok',
>u'locked': False,
>u'removable': True,
>u'tray_open': False,
>u'type': u'unknown'}
>
> But I did not found a way to completely remove the disk1 entry in order
> to be able to add it again.
> To complete the operation I need to use the old HMP API and use
> `drive_del` command.
>
> Did I miss something? or is it still something missing in QMP API?

Please show us a complete QMP conversation.

Re: [Qemu-devel] [PATCH v6 00/10] block: Incremental backup series

2014-11-18 Thread Vladimir Sementsov-Ogievskiy

Also, if we sync not the last level, bitmap.dirty_flag should be related 
to syncing level, not to the whole bitmap, to reduce sync overhead.
Or, if we implement difficult sync policy, there should be dirty flags 
for each bitmap level. Despite this, only one level is needed to be 
saved in the bitmap file.


PS: more ideas about file format - thanks to Denis V. Lunev 

1) Shouldn't we consider a possibility of storing several bitmaps in one 
file? Or one bitmap = one file?

2) Implement header extensions like in qcow2.

Best regards,
Vladimir

On 18.11.2014 16:09, Vladimir Sementsov-Ogievskiy wrote:

(3) Data Integrity

The dirty flag could work something like:

- If, on first open, the file has the dirty flag set, we need to 
discard the bitmap data because we can no longer trust it.
- If the bitmap file is clean, proceed as normal, but take a lock 
against any of the bitmap functions to prevent them from marking any 
bits dirty.
- On first write to a clean persistent bitmap, delay the write until 
we can mark the bitmap as dirty first. This incurs a write penalty 
when we try to use the bitmap at first...

- Unlock the bitmap functions and allow them to mark blocks as needed.
- At some point, based on a sync policy, re-commit the dirty 
information to the file and mark the file as clean once more and 
re-take the persistence lock. 

Correct me if I'm wrong.

#Read bitmap:
read in blockdev_init, before any write to device, so no lock is needed.

#Set bits in bitmap:
if bitmap.dirty_flag:
   set bits
else:
   LOCK
   set bits
   set bitmap.dirty_flag
   set dirty_flag in bitmap file
   UNLOCK

#Sync:
if not bitmap.dirty_flag:
   skip sync
else:
   LOCK
   save one of bitmap levels (saving the last one is too long and not 
very good idea, because it is fast-updateing)

   unset dirty_flag in bitmap file
   unset bitmap.dirty_flag
   UNLOCK

#Last sync in bdrv_close:
Just save the last bitmap level and unset dirty_flag in bitmap file

Also.. I'm not quite sure about locking.. As I understand, co-routines 
in qemu are not running in parallel, is locking required? Or sync 
timer will not be co-routine based?


Best regards,
Vladimir

On 18.11.2014 13:54, Vladimir Sementsov-Ogievskiy wrote:



(2) File Format

Some standard file magic, which includes:

- Some magic byte(s)
- Dirty flag. Needed to tell if we can trust this data or not.
- The size of the bitmap
- The granularity of the bitmap
- The offset to the first sector of bitmap data (Maybe? It can't 
hurt if we give ourselves a sector's worth to write metadata within.)
- Data starting at... PAGESIZE? 

- The name of the bitmap and also the size of this name



(5) Partial Persistence

We did not discuss only saving higher levels of the bitmap. What's 
the primary benefit you're seeking? 
Hmm. It may be used for faster sync. Maybe, save some of bitmap 
levels on timer while vm is running and save the last level on shutdown?


CC qemu-devel - ok.

Best regards,
Vladimir

On 18.11.2014 02:46, John Snow wrote:



On 11/13/2014 08:54 AM, Vladimir Sementsov-Ogievskiy wrote:

Hi

I'd just like to start working on persistent dirty bitmap. My thoughts
about it are the following:
- qemu -drive file=file,dirty_bitmap=file
 so,  bitmap will be loaded with drive open and saved with 
drive close.

- save only meaningful (the last) level of the bitmap, restore all
levels on bitmap loading
- bool parameter "persistent" for bdrv_create_dirty_bitmap and
BdrvDirtyBitmap
- internal dirty_bitmaps, saved in qcow2 file

Best regards,
Vladimir


I am thinking:

(1) Command Lines

If you enable dirty bitmaps and give it a file that doesn't exist, 
it should error out on you.


If you enable dirty bitmaps and give it a file that's blank, it 
understands that it is to create a persistent bitmap file in this 
location and it should enable persistence.


If a bitmap file is given and it has valid magic, this should imply 
persistence.


I am hesitant to have it auto-create files that don't already exist 
in case the files become large in size and a misconfiguration leads 
to repeated creation of these files that get orphaned in random 
folders. Perhaps we can add a create=auto flag or similar to allow 
this behavior if wanted.


(2) File Format

Some standard file magic, which includes:

- Some magic byte(s)
- Dirty flag. Needed to tell if we can trust this data or not.
- The size of the bitmap
- The granularity of the bitmap
- The offset to the first sector of bitmap data (Maybe? It can't 
hurt if we give ourselves a sector's worth to write metadata within.)

- Data starting at... PAGESIZE?

(3) Data Integrity

The dirty flag could work something like:

- If, on first open, the file has the dirty flag set, we need to 
discard the bitmap data because we can no longer trust it.
- If the bitmap file is clean, proceed as normal, but take a lock 
against any of the bitmap functions to prevent them from marking any 
bits dirty.
- On first write to a clean persistent bitmap, delay the

Re: [Qemu-devel] [PULL 0/6] Block patches for 2.2.0-rc2

2014-11-18 Thread Peter Maydell

On 18 November 2014 11:35, Kevin Wolf  wrote:
> The following changes since commit 1aba4be97eb01b650d146c7f01dc961d55da62ab:
>
>   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into 
> staging (2014-11-17 17:22:03 +)
>
> are available in the git repository at:
>
>
>   git://repo.or.cz/qemu/kevin.git tags/for-upstream
>
> for you to fetch changes up to 098ffa6674a82ceac0e3ccb3a8a5bf6ca44adcd5:
>
>   block/raw-posix: Catch fsync() errors (2014-11-18 12:09:00 +0100)
>
> 
> Block patches for 2.2.0-rc2
>
> 

Applied, thanks.

-- PMM

Re: [Qemu-devel] exec.c:invalidate_and_set_dirty() only checks whether first page in its range is dirty...

2014-11-18 Thread Stefan Hajnoczi

On Sun, Nov 16, 2014 at 06:11:48PM +, Peter Maydell wrote:
> I'm trying to track down a bug in ARM TCG where we:
>  * boot a guest
>  * run 'shutdown -r now' to trigger a reboot
>  * on reboot, crash when running userspace because the contents
>of physical RAM have changed but the translated code from
>before the shutdown was never invalidated
> 
> This is with a virtio-mmio block device as the disk.
> 
> Debugging indicates that when the post-reboot guest reloads
> binaries from disk into ram we fail to invalidate the cached
> translations. For the specific case I looked at, we have a
> translation of code at ramaddr_t 0x806e000. The disk load
> pulls 0x16000 bytes of data off disk to address 0x806a000.
> Virtio correctly calls address_space_unmap(), which is supposed
> to be what marks the ram range as dirty. It in turn calls
> invalidate_and_set_clean(). However invalidate_and_set_clean()
> just does this:
> 
> if (cpu_physical_memory_is_clean(addr)) {
> /* invalidate code */
> tb_invalidate_phys_page_range(addr, addr + length, 0);
> /* set dirty bit */
> cpu_physical_memory_set_dirty_range_nocode(addr, length);
> }
> 
> So if the first page in the range (here 0x806a000) happens
> to be dirty then we won't do anything, even if later pages
> in the range do need to be invalidated. Also, we'll call
> tb_invalidate_phys_page_range() with a start/end which may
> be in different physical pages, which is forbidden by that
> function's API.
> 
> I guess invalidate_and_set_clean() really needs to be
> fixed to loop through each page in the range; does anybody
> know how this is supposed to work (or why nobody's noticed
> this bug before :-)) ?

Not directly but I don't like this code because it's not atomic.  I'll
send patches soon for atomic test-and-set and test-and-clear.  Hopefully
it won't impact performance too much.

What you've discovered seems like a plain old bug.  It needs a loop.

Stefan


pgpG50EJ2bRrO.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH 0/5] pc: make ROMs resizeable

2014-11-18 Thread Michael S. Tsirkin

On Tue, Nov 18, 2014 at 03:47:16PM +0100, Markus Armbruster wrote:
> "Michael S. Tsirkin"  writes:
> 
> > At the moment we migrate ROMs which reside in fw cfg, which allows
> > changing ROM code at will, and supports migrating largish blocks early,
> > with good performance.
> > However, we are running into a problem: changing size breaks
> > migration every time.
> 
> Pardon my ignorance... changing ROM in migration?  I would expect
> migration to be as transparent for ROM as it is for RAM.  What am I
> missing?
> 
> [...]

Nothing really.

We migrate RAM size too - but if there's a mismatch, migration fails.

RAM size is user configurable.

ROM is used internally so we have to figure out the size,
and it turned out to be a hard problem, so we end up maintaining
logic for ROM size "like we did in 1.7" "like we did in 2.0" etc.

I don't want to add any more: let's just accept whatever is migrated,
and stick to it.

-- 
MST

[Qemu-devel] [PULL 1/2] Tracing docs fix configure option and description

2014-11-18 Thread Stefan Hajnoczi

From: "Dr. David Alan Gilbert" 

Fix the example trace configure option.
Update the text to say that multiple backends are allowed and what
happens when multiple backends are enabled.

Signed-off-by: Dr. David Alan Gilbert 
Message-id: 1412691161-31785-1-git-send-email-dgilb...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 docs/tracing.txt | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/tracing.txt b/docs/tracing.txt
index 7d38926..7117c5e 100644
--- a/docs/tracing.txt
+++ b/docs/tracing.txt
@@ -139,12 +139,12 @@ events are not tightly coupled to a specific trace 
backend, such as LTTng or
 SystemTap.  Support for trace backends can be added by extending the 
"tracetool"
 script.
 
-The trace backend is chosen at configure time and only one trace backend can
-be built into the binary:
+The trace backends are chosen at configure time:
 
-./configure --trace-backends=simple
+./configure --enable-trace-backends=simple
 
 For a list of supported trace backends, try ./configure --help or see below.
+If multiple backends are enabled, the trace is sent to them all.
 
 The following subsections describe the supported trace backends.
 
-- 
2.1.0

[Qemu-devel] [PULL 0/2] Tracing patches

2014-11-18 Thread Stefan Hajnoczi

The following changes since commit 1aba4be97eb01b650d146c7f01dc961d55da62ab:

  Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging 
(2014-11-17 17:22:03 +)

are available in the git repository at:

  git://github.com/stefanha/qemu.git tags/tracing-pull-request

for you to fetch changes up to 776ec96f790e2c943c13313d8ecab4713b47ab65:

  Tracing: Fix simpletrace.py error on tcg enabled binary traces (2014-11-18 
14:05:58 +)





Christoph Seifert (1):
  Tracing: Fix simpletrace.py error on tcg enabled binary traces

Dr. David Alan Gilbert (1):
  Tracing docs fix configure option and description

 docs/tracing.txt  |  6 ++--
 scripts/tracetool/__init__.py | 67 +--
 2 files changed, 36 insertions(+), 37 deletions(-)

-- 
2.1.0

[Qemu-devel] [PULL 2/2] Tracing: Fix simpletrace.py error on tcg enabled binary traces

2014-11-18 Thread Stefan Hajnoczi

From: Christoph Seifert 

simpletrace.py does not recognize the tcg option while reading trace-events  
file. In result simpletrace does not work on binary traces and tcg enabled 
events. Moved transformation of tcg enabled events to _read_events() which is 
used by simpletrace.

Signed-off-by: Christoph Seifert 
Signed-off-by: Stefan Hajnoczi 
---
 scripts/tracetool/__init__.py | 67 +--
 1 file changed, 33 insertions(+), 34 deletions(-)

diff --git a/scripts/tracetool/__init__.py b/scripts/tracetool/__init__.py
index 3d5743f..181675f 100644
--- a/scripts/tracetool/__init__.py
+++ b/scripts/tracetool/__init__.py
@@ -253,14 +253,44 @@ class Event(object):
 
 
 def _read_events(fobj):
-res = []
+events = []
 for line in fobj:
 if not line.strip():
 continue
 if line.lstrip().startswith('#'):
 continue
-res.append(Event.build(line))
-return res
+
+event = Event.build(line)
+
+# transform TCG-enabled events
+if "tcg" not in event.properties:
+events.append(event)
+else:
+event_trans = event.copy()
+event_trans.name += "_trans"
+event_trans.properties += ["tcg-trans"]
+event_trans.fmt = event.fmt[0]
+args_trans = []
+for atrans, aorig in zip(
+event_trans.transform(tracetool.transform.TCG_2_HOST).args,
+event.args):
+if atrans == aorig:
+args_trans.append(atrans)
+event_trans.args = Arguments(args_trans)
+event_trans = event_trans.copy()
+
+event_exec = event.copy()
+event_exec.name += "_exec"
+event_exec.properties += ["tcg-exec"]
+event_exec.fmt = event.fmt[1]
+event_exec = event_exec.transform(tracetool.transform.TCG_2_HOST)
+
+new_event = [event_trans, event_exec]
+event.event_trans, event.event_exec = new_event
+
+events.extend(new_event)
+
+return events
 
 
 class TracetoolError (Exception):
@@ -333,35 +363,4 @@ def generate(fevents, format, backends,
 
 events = _read_events(fevents)
 
-# transform TCG-enabled events
-new_events = []
-for event in events:
-if "tcg" not in event.properties:
-new_events.append(event)
-else:
-event_trans = event.copy()
-event_trans.name += "_trans"
-event_trans.properties += ["tcg-trans"]
-event_trans.fmt = event.fmt[0]
-args_trans = []
-for atrans, aorig in zip(
-event_trans.transform(tracetool.transform.TCG_2_HOST).args,
-event.args):
-if atrans == aorig:
-args_trans.append(atrans)
-event_trans.args = Arguments(args_trans)
-event_trans = event_trans.copy()
-
-event_exec = event.copy()
-event_exec.name += "_exec"
-event_exec.properties += ["tcg-exec"]
-event_exec.fmt = event.fmt[1]
-event_exec = event_exec.transform(tracetool.transform.TCG_2_HOST)
-
-new_event = [event_trans, event_exec]
-event.event_trans, event.event_exec = new_event
-
-new_events.extend(new_event)
-events = new_events
-
 tracetool.format.generate(events, format, backend)
-- 
2.1.0

Re: [Qemu-devel] hotremoving a disk qmp/hmp

2014-11-18 Thread William Dauchy

On Nov18 15:22, Markus Armbruster wrote:
> This is block backend "disk1".

yes indeed.

> I presume you're deleting the device using backend "disk1".

yes

> What kind of device is this?  PCI, perhaps?
> Please show us a complete QMP conversation.

Here it is:

live vm with one disk:

(QEMU) query-block
{   u'return': [   {   u'device': u'disk0',
   u'inserted': {   u'backing_file_depth': 0,
u'bps': 0,
u'bps_rd': 0,
u'bps_wr': 0,
u'detect_zeroes': u'off',
u'drv': u'raw',
u'encrypted': False,
u'encryption_key_missing': False,
u'file': u'/dev/sda',
u'image': {   u'actual-size': 0,
  u'dirty-flag': False,
  u'filename': u'/dev/sda',
  u'format': u'raw',
  u'virtual-size': 
3221225472},
u'iops': 0,
u'iops_rd': 0,
u'iops_wr': 0,
u'ro': False},
   u'io-status': u'ok',
   u'locked': True,
   u'removable': True,
   u'tray_open': False,
   u'type': u'unknown'}]}

hotplugging one disk:

(QEMU) blockdev-add
with
{'options' : {
'driver': 'raw',
'id': 'disk1',
'file': {
'driver': 'file',
'filename': /dev/sdb,
}
}}

(QEMU) device_add
with:
{
'driver': 'scsi-hd',
'id': 'disk1',
'drive': 'disk1',
'scsi-id': 1,
'removable': 'on',
'dpofua': 'off'
}

(QEMU) query-block
{   u'return': [   {   u'device': u'disk0',
   u'inserted': {   u'backing_file_depth': 0,
u'bps': 0,
u'bps_rd': 0,
u'bps_wr': 0,
u'detect_zeroes': u'off',
u'drv': u'raw',
u'encrypted': False,
u'encryption_key_missing': False,
u'file': u'/dev/sda',
u'image': {   u'actual-size': 0,
  u'dirty-flag': False,
  u'filename': u'/dev/sda',
  u'format': u'raw',
  u'virtual-size': 
3221225472},
u'iops': 0,
u'iops_rd': 0,
u'iops_wr': 0,
u'ro': False},
   u'io-status': u'ok',
   u'locked': True,
   u'removable': True,
   u'tray_open': False,
   u'type': u'unknown'},,
   {   u'device': u'disk1',
   u'inserted': {   u'backing_file_depth': 0,
u'bps': 0,
u'bps_rd': 0,
u'bps_wr': 0,
u'detect_zeroes': u'off',
u'drv': u'raw',
u'encrypted': False,
u'encryption_key_missing': False,
u'file': u'/dev/sdb',
u'image': {   u'actual-size': 0,
  u'dirty-flag': False,
  u'filename': u'/dev/sdb',
  u'format': u'raw',
  u'virtual-size': 
3221225472},
u'iops': 0,
u'iops_rd': 0,
u'iops_wr': 0,
u'ro': False},
   u'io-status': u'ok',
   u'locked': True,
   u'removable': True,
   u'tray_open': False,
   u'type': u'unknown'}]}

hotremoving disk1:

(QEMU) device_del
with:
{'id': 'disk1'}

(QEMU) query-block
{   u'return': [   {

[Qemu-devel] [PULL] Net patches

2014-11-18 Thread Stefan Hajnoczi

The following changes since commit 1aba4be97eb01b650d146c7f01dc961d55da62ab:

  Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging 
(2014-11-17 17:22:03 +)

are available in the git repository at:

  git://github.com/stefanha/qemu.git tags/net-pull-request

for you to fetch changes up to ed6273e26fdfb94a282dbbf1234a75422c6b4c4b:

  net: The third parameter of getsockname should be initialized (2014-11-18 
15:04:35 +)





zhanghailiang (1):
  net: The third parameter of getsockname should be initialized

 net/socket.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.1.0

Re: [Qemu-devel] [PATCH 0/5] pc: make ROMs resizeable

2014-11-18 Thread Markus Armbruster

"Michael S. Tsirkin"  writes:

> At the moment we migrate ROMs which reside in fw cfg, which allows
> changing ROM code at will, and supports migrating largish blocks early,
> with good performance.
> However, we are running into a problem: changing size breaks
> migration every time.

Pardon my ignorance... changing ROM in migration?  I would expect
migration to be as transparent for ROM as it is for RAM.  What am I
missing?

[...]

[Qemu-devel] [PULL] net: The third parameter of getsockname should be initialized

2014-11-18 Thread Stefan Hajnoczi

From: zhanghailiang 

Signed-off-by: zhanghailiang 
Reviewed-by: Markus Armbruster 
Signed-off-by: Stefan Hajnoczi 
---
 net/socket.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/socket.c b/net/socket.c
index fb21e20..ca4b8ba 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -352,7 +352,7 @@ static NetSocketState 
*net_socket_fd_init_dgram(NetClientState *peer,
 {
 struct sockaddr_in saddr;
 int newfd;
-socklen_t saddr_len;
+socklen_t saddr_len = sizeof(saddr);
 NetClientState *nc;
 NetSocketState *s;
 
-- 
2.1.0

Re: [Qemu-devel] [PATCH v6 00/10] block: Incremental backup series

2014-11-18 Thread John Snow




On 11/18/2014 08:09 AM, Vladimir Sementsov-Ogievskiy wrote:

(3) Data Integrity

The dirty flag could work something like:

- If, on first open, the file has the dirty flag set, we need to
discard the bitmap data because we can no longer trust it.
- If the bitmap file is clean, proceed as normal, but take a lock
against any of the bitmap functions to prevent them from marking any
bits dirty.
- On first write to a clean persistent bitmap, delay the write until
we can mark the bitmap as dirty first. This incurs a write penalty
when we try to use the bitmap at first...
- Unlock the bitmap functions and allow them to mark blocks as needed.
- At some point, based on a sync policy, re-commit the dirty
information to the file and mark the file as clean once more and
re-take the persistence lock.

Correct me if I'm wrong.

#Read bitmap:
read in blockdev_init, before any write to device, so no lock is needed.

#Set bits in bitmap:
if bitmap.dirty_flag:
set bits
else:
LOCK
set bits
set bitmap.dirty_flag
set dirty_flag in bitmap file
UNLOCK

#Sync:
if not bitmap.dirty_flag:
skip sync
else:
LOCK
save one of bitmap levels (saving the last one is too long and not
very good idea, because it is fast-updateing)
unset dirty_flag in bitmap file
unset bitmap.dirty_flag
UNLOCK

#Last sync in bdrv_close:
Just save the last bitmap level and unset dirty_flag in bitmap file

Also.. I'm not quite sure about locking.. As I understand, co-routines
in qemu are not running in parallel, is locking required? Or sync timer
will not be co-routine based?

Best regards,
Vladimir


Might be being too informal. I just meant a lock or barrier to prevent 
actual IO throughput until we can confirm the dirty flag has been 
adjusted to indicate that the persistent bitmap is now officially out of 
date. Nothing fancy.


Wasn't trying to imply that we needed threading protection, just 
"locking" the IO until we can configure the bitmap as we need it to be.



On 18.11.2014 13:54, Vladimir Sementsov-Ogievskiy wrote:



(2) File Format

Some standard file magic, which includes:

- Some magic byte(s)
- Dirty flag. Needed to tell if we can trust this data or not.
- The size of the bitmap
- The granularity of the bitmap
- The offset to the first sector of bitmap data (Maybe? It can't hurt
if we give ourselves a sector's worth to write metadata within.)
- Data starting at... PAGESIZE?

- The name of the bitmap and also the size of this name



(5) Partial Persistence

We did not discuss only saving higher levels of the bitmap. What's
the primary benefit you're seeking?

Hmm. It may be used for faster sync. Maybe, save some of bitmap levels
on timer while vm is running and save the last level on shutdown?

CC qemu-devel - ok.

Best regards,
Vladimir

On 18.11.2014 02:46, John Snow wrote:



On 11/13/2014 08:54 AM, Vladimir Sementsov-Ogievskiy wrote:

Hi

I'd just like to start working on persistent dirty bitmap. My thoughts
about it are the following:
- qemu -drive file=file,dirty_bitmap=file
 so,  bitmap will be loaded with drive open and saved with drive
close.
- save only meaningful (the last) level of the bitmap, restore all
levels on bitmap loading
- bool parameter "persistent" for bdrv_create_dirty_bitmap and
BdrvDirtyBitmap
- internal dirty_bitmaps, saved in qcow2 file

Best regards,
Vladimir


I am thinking:

(1) Command Lines

If you enable dirty bitmaps and give it a file that doesn't exist, it
should error out on you.

If you enable dirty bitmaps and give it a file that's blank, it
understands that it is to create a persistent bitmap file in this
location and it should enable persistence.

If a bitmap file is given and it has valid magic, this should imply
persistence.

I am hesitant to have it auto-create files that don't already exist
in case the files become large in size and a misconfiguration leads
to repeated creation of these files that get orphaned in random
folders. Perhaps we can add a create=auto flag or similar to allow
this behavior if wanted.

(2) File Format

Some standard file magic, which includes:

- Some magic byte(s)
- Dirty flag. Needed to tell if we can trust this data or not.
- The size of the bitmap
- The granularity of the bitmap
- The offset to the first sector of bitmap data (Maybe? It can't hurt
if we give ourselves a sector's worth to write metadata within.)
- Data starting at... PAGESIZE?

(3) Data Integrity

The dirty flag could work something like:

- If, on first open, the file has the dirty flag set, we need to
discard the bitmap data because we can no longer trust it.
- If the bitmap file is clean, proceed as normal, but take a lock
against any of the bitmap functions to prevent them from marking any
bits dirty.
- On first write to a clean persistent bitmap, delay the write until
we can mark the bitmap as dirty first. This incurs a write penalty
when we try to use the bitmap at first...
- Unlock the bitmap functions and allow them to mark blocks as needed.
-

[Qemu-devel] Fwd: [PATCH v2 1/6] geometry: add bdrv functions for geometry and blocksize

2014-11-18 Thread Ekaterina Tumanova



copying to mail list

 Forwarded Message 
Subject: [PATCH v2 1/6] geometry: add bdrv functions for geometry and 
blocksize

Date: Tue, 18 Nov 2014 17:09:56 +0100
From: Ekaterina Tumanova 
To: arm...@redhat.com, pbonz...@redhat.com, stefa...@redhat.com, 
kw...@redhat.com
CC: borntrae...@de.ibm.com, cornelia.h...@de.ibm.com, 
d...@linux.vnet.ibm.com, mihaj...@linux.vnet.ibm.com, Ekaterina Tumanova 



Add driver functions for geometry and blocksize detection

Signed-off-by: Ekaterina Tumanova 
---
 block.c   | 26 ++
 include/block/block.h | 20 
 include/block/block_int.h |  3 +++
 3 files changed, 49 insertions(+)

diff --git a/block.c b/block.c
index a612594..5df35cf 100644
--- a/block.c
+++ b/block.c
@@ -548,6 +548,32 @@ void bdrv_refresh_limits(BlockDriverState *bs, 
Error **errp)

 }
 }

+struct ProbeBlockSize bdrv_probe_blocksizes(BlockDriverState *bs)
+{
+BlockDriver *drv = bs->drv;
+struct ProbeBlockSize err_geo = { .rc = -1 };
+
+assert(drv != NULL);
+if (drv->bdrv_probe_blocksizes) {
+return drv->bdrv_probe_blocksizes(bs);
+}
+
+return err_geo;
+}
+
+struct ProbeGeometry bdrv_probe_geometry(BlockDriverState *bs)
+{
+BlockDriver *drv = bs->drv;
+struct ProbeGeometry err_geo = { .rc = -1 };
+
+assert(drv != NULL);
+if (drv->bdrv_probe_geometry) {
+return drv->bdrv_probe_geometry(bs);
+}
+
+return err_geo;
+}
+
 /*
  * Create a uniquely-named empty temporary file.
  * Return 0 upon success, otherwise a negative errno value.
diff --git a/include/block/block.h b/include/block/block.h
index 5450610..3287dbc 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -60,6 +60,24 @@ typedef enum {
 BDRV_REQ_MAY_UNMAP= 0x4,
 } BdrvRequestFlags;

+struct ProbeBlockSize {
+int rc;
+struct BlockSize {
+uint16_t phys;
+uint16_t log;
+} size;
+};
+
+struct ProbeGeometry {
+int rc;
+struct HDGeometry {
+uint32_t heads;
+uint32_t sectors;
+uint32_t cylinders;
+uint32_t start;
+} geo;
+};
+
 #define BDRV_O_RDWR0x0002
 #define BDRV_O_SNAPSHOT0x0008 /* open the file read only and save 
writes in a snapshot */

 #define BDRV_O_TEMPORARY   0x0010 /* delete the file after use */
@@ -538,6 +556,8 @@ AioContext *bdrv_get_aio_context(BlockDriverState *bs);
  * the old #AioContext is not executing.
  */
 void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context);
+struct ProbeBlockSize bdrv_probe_blocksizes(BlockDriverState *bs);
+struct ProbeGeometry bdrv_probe_geometry(BlockDriverState *bs);

 void bdrv_io_plug(BlockDriverState *bs);
 void bdrv_io_unplug(BlockDriverState *bs);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index a1c17b9..830e564 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -271,6 +271,9 @@ struct BlockDriver {
 void (*bdrv_io_unplug)(BlockDriverState *bs);
 void (*bdrv_flush_io_queue)(BlockDriverState *bs);

+struct ProbeBlockSize (*bdrv_probe_blocksizes)(BlockDriverState *bs);
+struct ProbeGeometry (*bdrv_probe_geometry)(BlockDriverState *bs);
+
 QLIST_ENTRY(BlockDriver) list;
 };

--
1.8.5.5

[Qemu-devel] Fwd: [PATCH v2 2/6] geometry: Detect blocksize via ioctls in separate static functions

2014-11-18 Thread Ekaterina Tumanova


copying to mail list


 Forwarded Message 
Subject: [PATCH v2 2/6] geometry: Detect blocksize via ioctls in 
separate static functions

Date: Tue, 18 Nov 2014 17:09:57 +0100
From: Ekaterina Tumanova 
To: arm...@redhat.com, pbonz...@redhat.com, stefa...@redhat.com, 
kw...@redhat.com
CC: borntrae...@de.ibm.com, cornelia.h...@de.ibm.com, 
d...@linux.vnet.ibm.com, mihaj...@linux.vnet.ibm.com, Ekaterina Tumanova 



Move the IOCTL calls that detect logical blocksize from raw_probe_alignment
into separate function (probe_logical_blocksize).
Introduce function which detect physical blocksize via IOCTL
(probe_physical_blocksize).
Both functions will be used in the next patch.

Signed-off-by: Ekaterina Tumanova 
---
 block/raw-posix.c | 58 
+--

 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index e100ae2..45f1d79 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -223,50 +223,70 @@ static int raw_normalize_devicepath(const char 
**filename)

 }
 #endif

-static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
+static unsigned int probe_logical_blocksize(BlockDriverState *bs, int fd)
 {
-BDRVRawState *s = bs->opaque;
-char *buf;
-unsigned int sector_size;
-
-/* For /dev/sg devices the alignment is not really used.
-   With buffered I/O, we don't have any restrictions. */
-if (bs->sg || !s->needs_alignment) {
-bs->request_alignment = 1;
-s->buf_align = 1;
-return;
-}
+unsigned int sector_size = 0;

 /* Try a few ioctls to get the right size */
-bs->request_alignment = 0;
-s->buf_align = 0;
-
 #ifdef BLKSSZGET
 if (ioctl(fd, BLKSSZGET, §or_size) >= 0) {
-bs->request_alignment = sector_size;
+return sector_size;
 }
 #endif
 #ifdef DKIOCGETBLOCKSIZE
 if (ioctl(fd, DKIOCGETBLOCKSIZE, §or_size) >= 0) {
-bs->request_alignment = sector_size;
+return sector_size;
 }
 #endif
 #ifdef DIOCGSECTORSIZE
 if (ioctl(fd, DIOCGSECTORSIZE, §or_size) >= 0) {
-bs->request_alignment = sector_size;
+return sector_size;
 }
 #endif
 #ifdef CONFIG_XFS
 if (s->is_xfs) {
 struct dioattr da;
 if (xfsctl(NULL, fd, XFS_IOC_DIOINFO, &da) >= 0) {
-bs->request_alignment = da.d_miniosz;
+sector_size = da.d_miniosz;
 /* The kernel returns wrong information for d_mem */
 /* s->buf_align = da.d_mem; */
+return sector_size;
 }
 }
 #endif

+return 0;
+}
+
+static unsigned int probe_physical_blocksize(BlockDriverState *bs, int fd)
+{
+unsigned int blk_size = 0;
+#ifdef BLKPBSZGET
+if (ioctl(fd, BLKPBSZGET, &blk_size) >= 0) {
+return blk_size;
+}
+#endif
+
+return 0;
+}
+
+static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
+{
+BDRVRawState *s = bs->opaque;
+char *buf;
+
+/* For /dev/sg devices the alignment is not really used.
+   With buffered I/O, we don't have any restrictions. */
+if (bs->sg || !s->needs_alignment) {
+bs->request_alignment = 1;
+s->buf_align = 1;
+return;
+}
+
+s->buf_align = 0;
+/* Let's try to use the logical blocksize for the alignment. */
+bs->request_alignment = probe_logical_blocksize(bs, fd);
+
 /* If we could not get the sizes so far, we can only guess them */
 if (!s->buf_align) {
 size_t align;
--
1.8.5.5

[Qemu-devel] Fwd: [PATCH v2 3/6] geometry: Add driver methods to probe blocksizes and geometry

2014-11-18 Thread Ekaterina Tumanova



copying to mail list

 Forwarded Message 
Subject: [PATCH v2 3/6] geometry: Add driver methods to probe blocksizes 
and geometry

Date: Tue, 18 Nov 2014 17:09:58 +0100
From: Ekaterina Tumanova 
To: arm...@redhat.com, pbonz...@redhat.com, stefa...@redhat.com, 
kw...@redhat.com
CC: borntrae...@de.ibm.com, cornelia.h...@de.ibm.com, 
d...@linux.vnet.ibm.com, mihaj...@linux.vnet.ibm.com, Ekaterina Tumanova 



This patch introduces driver methods of defining disk blocksizes
(physical and logical) and hard drive geometry.
The method is only implemented for "host_device". For "raw" devices
driver calls child's method.
The detection will only work for DASD devices. In order to check that
a local CheckForDASD function was introduced, which calls BIODASDINFO2 ioctl
and returns 0 only if it succeeds.

Signed-off-by: Ekaterina Tumanova 
---
 block/raw-posix.c | 65 
+++

 block/raw_bsd.c   | 12 ++
 2 files changed, 77 insertions(+)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 45f1d79..274ceda 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -56,6 +56,7 @@
 #include 
 #include 
 #include 
+#include 
 #ifndef FS_NOCOW_FL
 #define FS_NOCOW_FL 0x0080 /* Do not cow file */
 #endif
@@ -93,6 +94,10 @@
 #include 
 #endif

+#ifdef __s390__
+#include 
+#endif
+
 //#define DEBUG_FLOPPY

 //#define DEBUG_BLOCK
@@ -678,6 +683,64 @@ static void raw_refresh_limits(BlockDriverState 
*bs, Error **errp)

 bs->bl.opt_mem_alignment = s->buf_align;
 }

+static int CheckForDASD(int fd)
+{
+#ifdef BIODASDINFO2
+struct dasd_information2_t info = {0};
+
+return ioctl(fd, BIODASDINFO2, &info);
+#endif
+return -1;
+}
+
+static struct ProbeBlockSize hdev_probe_blocksizes(BlockDriverState *bs)
+{
+BDRVRawState *s = bs->opaque;
+struct ProbeBlockSize block_sizes = {0};
+
+block_sizes.rc = CheckForDASD(s->fd);
+/* If DASD, get blocksizes */
+if (block_sizes.rc == 0) {
+block_sizes.size.log = probe_logical_blocksize(bs, s->fd);
+block_sizes.size.phys = probe_physical_blocksize(bs, s->fd);
+}
+
+return block_sizes;
+}
+
+static struct ProbeGeometry hdev_probe_geometry(BlockDriverState *bs)
+{
+BDRVRawState *s = bs->opaque;
+struct ProbeGeometry geometry = {0};
+struct hd_geometry ioctl_geo = {0};
+
+geometry.rc = CheckForDASD(s->fd);
+if (geometry.rc != 0) {
+goto done;
+}
+/* If DASD, get it's geometry */
+geometry.rc = ioctl(s->fd, HDIO_GETGEO, &ioctl_geo);
+/* Do not return a geometry for partition */
+if (ioctl_geo.start != 0) {
+geometry.rc = -1;
+goto done;
+}
+/* HDIO_GETGEO may return success even though geo contains zeros
+   (e.g. certain multipath setups) */
+if (!ioctl_geo.heads || !ioctl_geo.sectors || !ioctl_geo.cylinders) {
+geometry.rc = -1;
+goto done;
+}
+if (geometry.rc == 0) {
+geometry.geo.heads = (uint32_t)ioctl_geo.heads;
+geometry.geo.sectors = (uint32_t)ioctl_geo.sectors;
+geometry.geo.cylinders = (uint32_t)ioctl_geo.cylinders;
+geometry.geo.start = (uint32_t)ioctl_geo.start;
+}
+done:
+   return geometry;
+}
+
 static ssize_t handle_aiocb_ioctl(RawPosixAIOData *aiocb)
 {
 int ret;
@@ -2128,6 +2191,8 @@ static BlockDriver bdrv_host_device = {
 .bdrv_get_info = raw_get_info,
 .bdrv_get_allocated_file_size
 = raw_get_allocated_file_size,
+.bdrv_probe_blocksizes = hdev_probe_blocksizes,
+.bdrv_probe_geometry = hdev_probe_geometry,

 .bdrv_detach_aio_context = raw_detach_aio_context,
 .bdrv_attach_aio_context = raw_attach_aio_context,
diff --git a/block/raw_bsd.c b/block/raw_bsd.c
index 401b967..d164eba 100644
--- a/block/raw_bsd.c
+++ b/block/raw_bsd.c
@@ -173,6 +173,16 @@ static int raw_probe(const uint8_t *buf, int 
buf_size, const char *filename)

 return 1;
 }

+static struct ProbeBlockSize raw_probe_blocksizes(BlockDriverState *bs)
+{
+return bdrv_probe_blocksizes(bs->file);
+}
+
+static struct ProbeGeometry raw_probe_geometry(BlockDriverState *bs)
+{
+return bdrv_probe_geometry(bs->file);
+}
+
 static BlockDriver bdrv_raw = {
 .format_name  = "raw",
 .bdrv_probe   = &raw_probe,
@@ -190,6 +200,8 @@ static BlockDriver bdrv_raw = {
 .has_variable_length  = true,
 .bdrv_get_info= &raw_get_info,
 .bdrv_refresh_limits  = &raw_refresh_limits,
+.bdrv_probe_blocksizes = &raw_probe_blocksizes,
+.bdrv_probe_geometry  = &raw_probe_geometry,
 .bdrv_is_inserted = &raw_is_inserted,
 .bdrv_media_changed   = &raw_media_changed,
 .bdrv_eject   = &raw_eject,
--
1.8.5.5

[Qemu-devel] Fwd: [PATCH v2 5/6] geometry: Call backend function to detect geometry and blocksize

2014-11-18 Thread Ekaterina Tumanova


copying to mail list


 Forwarded Message 
Subject: [PATCH v2 5/6] geometry: Call backend function to detect 
geometry and blocksize

Date: Tue, 18 Nov 2014 17:10:00 +0100
From: Ekaterina Tumanova 
To: arm...@redhat.com, pbonz...@redhat.com, stefa...@redhat.com, 
kw...@redhat.com
CC: borntrae...@de.ibm.com, cornelia.h...@de.ibm.com, 
d...@linux.vnet.ibm.com, mihaj...@linux.vnet.ibm.com, Ekaterina Tumanova 



hd_geometry_guess function autodetects the drive geometry. This patch
adds a block backend call, that probes the backing device geometry.
If the inner driver method is implemented and succeeds (currently only 
DASDs),

the blkconf_geometry will pass-through the backing device geometry.

Signed-off-by: Ekaterina Tumanova 
---
 hw/block/block.c | 11 +++
 hw/block/hd-geometry.c   |  9 +
 hw/block/virtio-blk.c|  1 +
 include/hw/block/block.h |  1 +
 4 files changed, 22 insertions(+)

diff --git a/hw/block/block.c b/hw/block/block.c
index a625773..f1d29bc 100644
--- a/hw/block/block.c
+++ b/hw/block/block.c
@@ -25,6 +25,17 @@ void blkconf_serial(BlockConf *conf, char **serial)
 }
 }

+void blkconf_blocksizes(BlockConf *conf)
+{
+BlockBackend *blk = conf->blk;
+struct ProbeBlockSize blocksize = blk_probe_blocksizes(blk);
+
+if (blocksize.rc == 0) {
+conf->physical_block_size = blocksize.size.phys;
+conf->logical_block_size = blocksize.size.log;
+}
+}
+
 void blkconf_geometry(BlockConf *conf, int *ptrans,
   unsigned cyls_max, unsigned heads_max, unsigned 
secs_max,

   Error **errp)
diff --git a/hw/block/hd-geometry.c b/hw/block/hd-geometry.c
index 6fcf74d..4972114 100644
--- a/hw/block/hd-geometry.c
+++ b/hw/block/hd-geometry.c
@@ -121,6 +121,14 @@ void hd_geometry_guess(BlockBackend *blk,
int *ptrans)
 {
 int cylinders, heads, secs, translation;
+struct ProbeGeometry geometry = blk_probe_geometry(blk);
+
+if (geometry.rc == 0) {
+*pcyls = geometry.geo.cylinders;
+*psecs = geometry.geo.sectors;
+*pheads = geometry.geo.heads;
+goto done;
+}

 if (guess_disk_lchs(blk, &cylinders, &heads, &secs) < 0) {
 /* no LCHS guess: use a standard physical disk geometry  */
@@ -143,6 +151,7 @@ void hd_geometry_guess(BlockBackend *blk,
the logical geometry */
 translation = BIOS_ATA_TRANSLATION_NONE;
 }
+done:
 if (ptrans) {
 *ptrans = translation;
 }
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index b19b102..6f01565 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -745,6 +745,7 @@ static void virtio_blk_device_realize(DeviceState 
*dev, Error **errp)

 error_propagate(errp, err);
 return;
 }
+blkconf_blocksizes(&conf->conf);

 virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK,
 sizeof(struct virtio_blk_config));
diff --git a/include/hw/block/block.h b/include/hw/block/block.h
index 0d0ce9a..856bf75 100644
--- a/include/hw/block/block.h
+++ b/include/hw/block/block.h
@@ -63,6 +63,7 @@ void blkconf_serial(BlockConf *conf, char **serial);
 void blkconf_geometry(BlockConf *conf, int *trans,
   unsigned cyls_max, unsigned heads_max, unsigned 
secs_max,

   Error **errp);
+void blkconf_blocksizes(BlockConf *conf);

 /* Hard disk geometry */

--
1.8.5.5

Re: [Qemu-devel] [PULL 0/2] Tracing patches

2014-11-18 Thread Peter Maydell

On 18 November 2014 15:04, Stefan Hajnoczi  wrote:
> The following changes since commit 1aba4be97eb01b650d146c7f01dc961d55da62ab:
>
>   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into 
> staging (2014-11-17 17:22:03 +)
>
> are available in the git repository at:
>
>   git://github.com/stefanha/qemu.git tags/tracing-pull-request
>
> for you to fetch changes up to 776ec96f790e2c943c13313d8ecab4713b47ab65:
>
>   Tracing: Fix simpletrace.py error on tcg enabled binary traces (2014-11-18 
> 14:05:58 +)
>
> 
>
> 

Applied, thanks.

-- PMM

[Qemu-devel] Fwd: [PATCH v2 4/6] geometry: Add block-backend wrappers for geometry probing

2014-11-18 Thread Ekaterina Tumanova


copying to mail list


 Forwarded Message 
Subject: [PATCH v2 4/6] geometry: Add block-backend wrappers for 
geometry probing

Date: Tue, 18 Nov 2014 17:09:59 +0100
From: Ekaterina Tumanova 
To: arm...@redhat.com, pbonz...@redhat.com, stefa...@redhat.com, 
kw...@redhat.com
CC: borntrae...@de.ibm.com, cornelia.h...@de.ibm.com, 
d...@linux.vnet.ibm.com, mihaj...@linux.vnet.ibm.com, Ekaterina Tumanova 



Signed-off-by: Ekaterina Tumanova 
---
 block/block-backend.c  | 10 ++
 include/sysemu/block-backend.h |  2 ++
 2 files changed, 12 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index d0692b1..6717301 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -629,3 +629,13 @@ void *blk_aio_get(const AIOCBInfo *aiocb_info, 
BlockBackend *blk,

 {
 return qemu_aio_get(aiocb_info, blk_bs(blk), cb, opaque);
 }
+
+struct ProbeBlockSize blk_probe_blocksizes(BlockBackend *blk)
+{
+return bdrv_probe_blocksizes(blk->bs);
+}
+
+struct ProbeGeometry blk_probe_geometry(BlockBackend *blk)
+{
+return bdrv_probe_geometry(blk->bs);
+}
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 52d13c1..e8b497a 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -138,5 +138,7 @@ BlockAcctStats *blk_get_stats(BlockBackend *blk);

 void *blk_aio_get(const AIOCBInfo *aiocb_info, BlockBackend *blk,
   BlockCompletionFunc *cb, void *opaque);
+struct ProbeBlockSize blk_probe_blocksizes(BlockBackend *blk);
+struct ProbeGeometry blk_probe_geometry(BlockBackend *blk);

 #endif
--
1.8.5.5

[Qemu-devel] Fwd: [PATCH v2 6/6] geometry: Target specific hook for s390x in geometry guessing

2014-11-18 Thread Ekaterina Tumanova



copying to mail list

 Forwarded Message 
Subject: [PATCH v2 6/6] geometry: Target specific hook for s390x in 
geometry guessing

Date: Tue, 18 Nov 2014 17:10:01 +0100
From: Ekaterina Tumanova 
To: arm...@redhat.com, pbonz...@redhat.com, stefa...@redhat.com, 
kw...@redhat.com
CC: borntrae...@de.ibm.com, cornelia.h...@de.ibm.com, 
d...@linux.vnet.ibm.com, mihaj...@linux.vnet.ibm.com, Ekaterina Tumanova 



For target-s390x, the behaviour is chosen as follows:
If no geo could be guessed from backing device, guess_disk_lchs (partition
table guessing) is called.
If this is not successful (e.g. image files) geometry is derived from the
size of the disk (as before).
We have no translation on s390, so we simply set in to NONE.

Signed-off-by: Ekaterina Tumanova 
---
 hw/block/Makefile.objs |  6 +-
 hw/block/hd-geometry.c | 36 +++-
 2 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/hw/block/Makefile.objs b/hw/block/Makefile.objs
index d4c3ab7..1f7da7a 100644
--- a/hw/block/Makefile.objs
+++ b/hw/block/Makefile.objs
@@ -1,4 +1,4 @@
-common-obj-y += block.o cdrom.o hd-geometry.o
+common-obj-y += block.o cdrom.o
 common-obj-$(CONFIG_FDC) += fdc.o
 common-obj-$(CONFIG_SSI_M25P80) += m25p80.o
 common-obj-$(CONFIG_NAND) += nand.o
@@ -13,3 +13,7 @@ obj-$(CONFIG_SH4) += tc58128.o

 obj-$(CONFIG_VIRTIO) += virtio-blk.o
 obj-$(CONFIG_VIRTIO) += dataplane/
+
+# geometry is target/architecture dependent and therefore needs to be built
+# for every target platform
+obj-y += hd-geometry.o
diff --git a/hw/block/hd-geometry.c b/hw/block/hd-geometry.c
index 4972114..b462225 100644
--- a/hw/block/hd-geometry.c
+++ b/hw/block/hd-geometry.c
@@ -49,11 +49,12 @@ struct partition {

 /* try to guess the disk logical geometry from the MSDOS partition table.
Return 0 if OK, -1 if could not guess */
-static int guess_disk_lchs(BlockBackend *blk,
-   int *pcylinders, int *pheads, int *psectors)
+static int guess_disk_lchs(BlockBackend *blk, uint32_t *pcylinders,
+   uint32_t *pheads, uint32_t *psectors)
 {
 uint8_t buf[BDRV_SECTOR_SIZE];
-int i, heads, sectors, cylinders;
+int i;
+uint32_t heads, sectors, cylinders;
 struct partition *p;
 uint32_t nr_sects;
 uint64_t nb_sectors;
@@ -116,11 +117,11 @@ static void guess_chs_for_size(BlockBackend *blk,
 *psecs = 63;
 }

+#ifdef TARGET_S390X
 void hd_geometry_guess(BlockBackend *blk,
uint32_t *pcyls, uint32_t *pheads, uint32_t *psecs,
int *ptrans)
 {
-int cylinders, heads, secs, translation;
 struct ProbeGeometry geometry = blk_probe_geometry(blk);

 if (geometry.rc == 0) {
@@ -129,7 +130,32 @@ void hd_geometry_guess(BlockBackend *blk,
 *pheads = geometry.geo.heads;
 goto done;
 }
+if (guess_disk_lchs(blk, pcyls, pheads, psecs) == 0) {
+goto done;
+}
+guess_chs_for_size(blk, pcyls, pheads, psecs);
+done:
+if (ptrans) {
+*ptrans = BIOS_ATA_TRANSLATION_NONE;
+}

+trace_hd_geometry_guess(blk, *pcyls, *pheads, *psecs,
+BIOS_ATA_TRANSLATION_NONE);
+}
+#else
+void hd_geometry_guess(BlockBackend *blk,
+   uint32_t *pcyls, uint32_t *pheads, uint32_t *psecs,
+   int *ptrans)
+{
+int cylinders, heads, secs, translation;
+struct ProbeGeometry geometry = blk_probe_geometry(blk);
+
+if (geometry.rc == 0) {
+*pcyls = geometry.geo.cylinders;
+*psecs = geometry.geo.sectors;
+*pheads = geometry.geo.heads;
+goto done;
+}
 if (guess_disk_lchs(blk, &cylinders, &heads, &secs) < 0) {
 /* no LCHS guess: use a standard physical disk geometry  */
 guess_chs_for_size(blk, pcyls, pheads, psecs);
@@ -157,7 +183,7 @@ done:
 }
 trace_hd_geometry_guess(blk, *pcyls, *pheads, *psecs, translation);
 }
-
+#endif
 int hd_bios_chs_auto_trans(uint32_t cyls, uint32_t heads, uint32_t secs)
 {
 return cyls <= 1024 && heads <= 16 && secs <= 63
--
1.8.5.5

[Qemu-devel] Fwd: [PATCH v2 0/6] Geometry and blocksize support for backing devices

2014-11-18 Thread Ekaterina Tumanova



copying to mail list

 Forwarded Message 
Subject: [PATCH v2 0/6] Geometry and blocksize support for backing devices
Date: Tue, 18 Nov 2014 17:09:55 +0100
From: Ekaterina Tumanova 
To: arm...@redhat.com, pbonz...@redhat.com, stefa...@redhat.com, 
kw...@redhat.com
CC: borntrae...@de.ibm.com, cornelia.h...@de.ibm.com, 
d...@linux.vnet.ibm.com, mihaj...@linux.vnet.ibm.com, Ekaterina Tumanova 



This is the rework of the geometry+blocksize patch, which was
recently discussed here:
http://lists.gnu.org/archive/html/qemu-devel/2014-11/msg01148.html

Markus suggested that we only detect blocksize and geometry for DASDs.

According to this agreement new version contains DASD special casing.
The driver methods are implemented only for "host_device" and inner hdev_xxx
functions check if the backing storage is a DASD by means of
BIODASDINFO2 ioctl.

Original patchset can be found here:
http://lists.gnu.org/archive/html/qemu-devel/2014-07/msg03791.html

Ekaterina Tumanova (6):
  geometry: add bdrv functions for geometry and blocksize
  geometry: Detect blocksize via ioctls in separate static functions
  geometry: Add driver methods to probe blocksizes and geometry
  geometry: Add block-backend wrappers for geometry probing
  geometry: Call backend function to detect geometry and blocksize
  geometry: Target specific hook for s390x in geometry guessing

 block.c|  26 +
 block/block-backend.c  |  10 
 block/raw-posix.c  | 123 
++---

 block/raw_bsd.c|  12 
 hw/block/Makefile.objs |   6 +-
 hw/block/block.c   |  11 
 hw/block/hd-geometry.c |  43 --
 hw/block/virtio-blk.c  |   1 +
 include/block/block.h  |  20 +++
 include/block/block_int.h  |   3 +
 include/hw/block/block.h   |   1 +
 include/sysemu/block-backend.h |   2 +
 12 files changed, 234 insertions(+), 24 deletions(-)

--
1.8.5.5

Re: [Qemu-devel] Sending packets up to VM using vhost-net User.

2014-11-18 Thread Anshul Makkar

Sorry, forgot to mention I am using " git clone -b vhost-user-v5
https://github.com/virtualopensystems/qemu.git"; for vhost-user backend
implementation.

and "git clone https://github.com/virtualopensystems/vapp.git " for
reference implementation.

Anshul Makkar

On Tue, Nov 18, 2014 at 5:29 PM, Anshul Makkar <
anshul.mak...@profitbricks.com> wrote:

> Hi,
>
> I am developing an application that is using vhost-user backend for packet
> transfer.
>
> The architecture:
>
> 1) VM1 is using Vhost-user and executing on server1.
>
> ".qemu-system-x86_64 -m 1024 -mem-path /dev/hugepages,prealloc=on,share=on
> -drive
> file=/home/amakkar/test.img,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native
> -device
> virtio-blk-pci,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2
> -vga std -vnc 0.0.0.0:3 -netdev
> type=vhost-user,id=net0,file=/home/amakkar/qemu.sock -device
> virtio-net-pci,netdev=net0"
>
> 2) App1 on server1: executing in user-mode connects with vhost-user
> backend over qemu.sock. As expected, initialization is done and guest
> addresses including addresses of descriptor ring , available ring and used
> ring and mapped to my userspace app and I can directly access them.
>
> I launch PACKETH on VM1 and transfer some packets using eth0 on VM1
> (packet transfer uses virtio-net backend. ifconfig eth0 shows correct TX
> stats)
>
> In App1 I directly access the avail_ring buffer and consume the packet and
> then I do RDMA transfer to server 2 .
>
> 3) VM2 and App2 executing on server2 and again using VHost-User.
>
> App2: Vring initializations are successfully done and vring buffers are
> mapped. I get the buffer from App1 and now *I want to transfer this
> buffer (Raw packet) to VM2.*
>
> To transfer the buffer from App2 to VM2, I directly access the descriptor
> ring, place the buffer in it and update the available index and then issue
> the kick.
>
> code snippet for it:
>
> dest_buf = (void *)handler->map_handler(handler->context,
> desc[a_idx].addr);
> memcpy(dest_buf + hdr_len, buf, size);
> avail->ring[avail->idx % num] = a_idx;
> avail->idx++;
> fprintf(stdout, "put_vring, synching memory \n");
> sync_shm(dest_buf, size);
> sync_shm((void *)&(avail), sizeof(struct vring_avail));
>
> kick(&vhost_user->vring_table, rx_idx);
>
> But the buffer never reaches to VM2. (I do ifconfig eth0 in VM2 and RX
> stats are 0)
>
> Please can you share if my approach is correct in transferring the packet
> from App2 to VM. Can I directly place the buffer in descriptor ring and
> issue a kick to notify virtio-net that a packet is available or you can
> smell some implementation problem.
>
> Thanks
> Anshul Makkar
>
>

[Qemu-devel] Sending packets up to VM using vhost-net User.

2014-11-18 Thread Anshul Makkar

Hi,

I am developing an application that is using vhost-user backend for packet
transfer.

The architecture:

1) VM1 is using Vhost-user and executing on server1.

".qemu-system-x86_64 -m 1024 -mem-path /dev/hugepages,prealloc=on,share=on
-drive
file=/home/amakkar/test.img,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native
-device
virtio-blk-pci,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2
-vga std -vnc 0.0.0.0:3 -netdev
type=vhost-user,id=net0,file=/home/amakkar/qemu.sock -device
virtio-net-pci,netdev=net0"

2) App1 on server1: executing in user-mode connects with vhost-user backend
over qemu.sock. As expected, initialization is done and guest addresses
including addresses of descriptor ring , available ring and used ring and
mapped to my userspace app and I can directly access them.

I launch PACKETH on VM1 and transfer some packets using eth0 on VM1 (packet
transfer uses virtio-net backend. ifconfig eth0 shows correct TX stats)

In App1 I directly access the avail_ring buffer and consume the packet and
then I do RDMA transfer to server 2 .

3) VM2 and App2 executing on server2 and again using VHost-User.

App2: Vring initializations are successfully done and vring buffers are
mapped. I get the buffer from App1 and now *I want to transfer this buffer
(Raw packet) to VM2.*

To transfer the buffer from App2 to VM2, I directly access the descriptor
ring, place the buffer in it and update the available index and then issue
the kick.

code snippet for it:

dest_buf = (void *)handler->map_handler(handler->context, desc[a_idx].addr);
memcpy(dest_buf + hdr_len, buf, size);
avail->ring[avail->idx % num] = a_idx;
avail->idx++;
fprintf(stdout, "put_vring, synching memory \n");
sync_shm(dest_buf, size);
sync_shm((void *)&(avail), sizeof(struct vring_avail));

kick(&vhost_user->vring_table, rx_idx);

But the buffer never reaches to VM2. (I do ifconfig eth0 in VM2 and RX
stats are 0)

Please can you share if my approach is correct in transferring the packet
from App2 to VM. Can I directly place the buffer in descriptor ring and
issue a kick to notify virtio-net that a packet is available or you can
smell some implementation problem.

Thanks
Anshul Makkar

[Qemu-devel] qmp-commands.hx and inherited types (Was: Re: [PATCH v6 02/10] qmp: Add block-dirty-bitmap-add and block-dirty-bitmap-remove)

2014-11-18 Thread John Snow




On 11/07/2014 08:00 AM, Eric Blake wrote:

On 10/30/2014 04:22 AM, Fam Zheng wrote:


[snip]


+++ b/qapi/block-core.json
@@ -865,6 +865,61 @@
  '*on-target-error': 'BlockdevOnError' } }

  ##
+# @BlockDirtyBitmap
+#
+# @device: name of device which the bitmap is tracking
+#
+# @name: name of the dirty bitmap
+#
+# Since 2.3
+##
+{ 'type': 'BlockDirtyBitmap',
+  'data': { 'device': 'str', 'name': 'str' } }
+
+##
+# @BlockDirtyBitmapAdd
+#
+# @device: name of device which the bitmap is tracking
+#
+# @name: name of the dirty bitmap
+#
+# @granularity: #optional the bitmap granularity, default is 64k for
+#   block-dirty-bitmap-add


Do you still need to call out the command, given that it is the only
client of this type?


+#
+# Since 2.3
+##
+{ 'type': 'BlockDirtyBitmapAdd',
+  'data': { 'device': 'str', 'name': 'str', '*granularity': 'int' } }


Is it worth using type inheritance, as in:

{ 'type': 'BlockDirtyBitmapAdd',
   'base': 'BlockDirtyBitmap',
   'data': { '*granularity': 'int' } }



Strictly speaking, I would argue against inheritance here because 
"BlockDirtyBitmapAdd" is not "isa" "BlockDirtyBitmap". It's more of a 
"Hasa" relationship.


At any rate, I tried to implement this for giggles to see if I could, 
and ran into the following issue with which I'd be curious to get an 
answer for.


As an example, If you have some type:

{ 'type': 'example',
  'data': { 'foo': 'int' } }

And an extension of it:

{ 'type': 'academicExample'
  'base': 'example',
  'data': { 'bar': 'str' } }

How would you write a command that expected both "foo" and "bar"?
The following doesn't seem appropriate (the generated code SKIPS the 
base fields, which leads to missing arguments in the prototype:


{ 'command': 'academic-command',
  'data': 'academicExample' }

...

{
.name = "academic-command",
.args_type = "foo:i,bar:s",
.mhandler.cmd_new = qmp_marshal_input_academic_command,
},


The generated prototype appears to skip the "foo" argument, including 
only the arguments associated with the base type, in this case, 'bar'.


Do we support this kind of use? I didn't see it in-use currently, but I 
only gave it a cursory skimming.

Re: [Qemu-devel] runtime configurable semihosting

2014-11-18 Thread Liviu Ionescu


On 18 Nov 2014, at 15:31, Peter Maydell  wrote:

> Sorry, yes, you're right; I should have looked more carefully.
> So we do need a new option;

or we should fix the parser. if the current keyword is not in the list, it can 
be pushed back, and defaults used.

> -semihosting-config target=[native|gdb|auto]

ok, I'll implement this, and if you fix the parser we'll change it to the plain 
-semihosting.

> have -semihosting-config have an implied 'enable=true', so

ok

Liviu

[Qemu-devel] Tunneled Migration with Non-Shared Storage

2014-11-18 Thread Gary R Hook


What I really need to figure out is why, when performing a migration
using non-shared storage, an entire VM is copied into memory before it’s 
sent across the wire using this method. A copy-on-read operation

is performed first, then the disk is sent, then the dirty pages, then
the RAM.

The odd thing is that non-tunneled migrations use a completely different
code path that does _not_ execute a copy-on-read first.

The problem: VMs (VM disks) that are larger than available memory can't
be migrated due to space constraints.

I'd like to disable this COR opration but I think it's important to
understand why the code is written this way. I've yet to run across any
comments that explain the need for this "extra" copy, and am looking
for background and advice.

Any insights are welcome.

--
Gary R Hook
Senior Kernel Engineer
NIMBOXX, Inc

Re: [Qemu-devel] runtime configurable semihosting

2014-11-18 Thread Peter Maydell

On 18 November 2014 16:46, Liviu Ionescu  wrote:
>
> On 18 Nov 2014, at 15:31, Peter Maydell  wrote:
>
>> Sorry, yes, you're right; I should have looked more carefully.
>> So we do need a new option;
>
> or we should fix the parser. if the current keyword is not in
> the list, it can be pushed back, and defaults used.

This is veering quite close to magic, really. It changes
the current handling of some valid command lines (admittedly
ones with implausible image names, but still).

>> -semihosting-config target=[native|gdb|auto]
>
> ok, I'll implement this, and if you fix the parser we'll change it to the 
> plain -semihosting.

Thanks.

-- PMM

Re: [Qemu-devel] [PATCH 1/3] s390: Add PCI bus support

2014-11-18 Thread Alexander Graf



On 18.11.14 13:50, Frank Blaschka wrote:
> On Mon, Nov 10, 2014 at 04:14:16PM +0100, Alexander Graf wrote:
>>
>>
>> On 10.11.14 15:20, Frank Blaschka wrote:
>>> From: Frank Blaschka 
>>>
>>> This patch implements a pci bus for s390x together with infrastructure
>>> to generate and handle hotplug events, to configure/unconfigure via
>>> sclp instruction, to do iommu translations and provide s390 support for
>>> MSI/MSI-X notification processing.
>>>
>>> Signed-off-by: Frank Blaschka 

[...]

>>> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
>>> new file mode 100644
>>> index 000..f2fa6ba
>>> --- /dev/null
>>> +++ b/hw/s390x/s390-pci-bus.c
>>> @@ -0,0 +1,485 @@
>>> +/*
>>> + * s390 PCI BUS
>>> + *
>>> + * Copyright 2014 IBM Corp.
>>> + * Author(s): Frank Blaschka 
>>> + *Hong Bo Li 
>>> + *Yi Min Zhao 
>>> + *
>>> + * This work is licensed under the terms of the GNU GPL, version 2 or (at
>>> + * your option) any later version. See the COPYING file in the top-level
>>> + * directory.
>>> + */
>>> +
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include 
>>> +#include "qemu/error-report.h"
>>> +#include "s390-pci-bus.h"
>>> +
>>> +/* #define DEBUG_S390PCI_BUS */
>>> +#ifdef DEBUG_S390PCI_BUS
>>> +#define DPRINTF(fmt, ...) \
>>> +do { fprintf(stderr, "S390pci-bus: " fmt, ## __VA_ARGS__); } while (0)
>>> +#else
>>> +#define DPRINTF(fmt, ...) \
>>> +do { } while (0)
>>> +#endif
>>> +
>>> +static const unsigned long be_to_le = BITS_PER_LONG - 1;
>>> +static QTAILQ_HEAD(, SeiContainer) pending_sei =
>>> +QTAILQ_HEAD_INITIALIZER(pending_sei);
>>> +static QTAILQ_HEAD(, S390PCIBusDevice) device_list =
>>> +QTAILQ_HEAD_INITIALIZER(device_list);
>>
>> Please get rid of all statics ;). All state has to live in objects.
>>
> 
> be_to_le was misleading and unnecesary will remove this one but
> static QTAILQ_HEAD seems to be a common practice for list anchors.
> If you really want me to change this do you have any prefered way,
> or can you point me to some code doing this?

For PCI devices, I don't think you need a list at all. Your PHB device
should already have a proper qbus that knows about all its child devices.

As for pending_sei, what is this about?

> 
>>> +
>>> +int chsc_sei_nt2_get_event(void *res)

[...]

>>> +
>>> +int chsc_sei_nt2_get_event(void *res);
>>> +int chsc_sei_nt2_have_event(void);
>>> +void s390_pci_sclp_configure(int configure, SCCB *sccb);
>>> +S390PCIBusDevice *s390_pci_find_dev_by_idx(uint32_t idx);
>>> +S390PCIBusDevice *s390_pci_find_dev_by_fh(uint32_t fh);
>>> +S390PCIBusDevice *s390_pci_find_dev_by_fid(uint32_t fid);
>>
>> I think it makes sense to pass the PHB device as parameter on these.
>> Don't assume you only have one.
> 
> We need to lookup our device mainly in the instruction handlers and there
> we do not have a PHB available.

Then have a way to find your PHB - either put a variable into the
machine object, or find it by path via QOM tree lookups. Maybe we need
multiple PHBs, identified by part of the ID? I know too little about the
way PCI works on s390x to really tell.

Again, are there specs?

> Also having one list for our S390PCIBusDevices
> devices does not prevent us from supporting more PHBs.
> 
>>
>>> +void s390_pci_bus_init(void);
>>> +uint64_t s390_pci_get_table_origin(uint64_t iota);
>>> +uint64_t s390_guest_io_table_walk(uint64_t guest_iota,
>>> +  uint64_t guest_dma_address);
>>
>> Why are these exported?
>>
>>> +
>>> +#endif
>>> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
>>> index bc4dc2a..2e25834 100644
>>> --- a/hw/s390x/s390-virtio-ccw.c
>>> +++ b/hw/s390x/s390-virtio-ccw.c
>>> @@ -18,6 +18,7 @@
>>>  #include "css.h"
>>>  #include "virtio-ccw.h"
>>>  #include "qemu/config-file.h"
>>> +#include "s390-pci-bus.h"
>>>  
>>>  #define TYPE_S390_CCW_MACHINE   "s390-ccw-machine"
>>>  
>>> @@ -127,6 +128,8 @@ static void ccw_init(MachineState *machine)
>>>machine->initrd_filename, "s390-ccw.img");
>>>  s390_flic_init();
>>>  
>>> +s390_pci_bus_init();
>>
>> Please just inline that function here.
>>
> 
> What do you mean by "just inline"?

The contents of the s390_pci_bus_init() function should just be standing
right here. There's no value in creating a public wrapper function for
initialization. We only did this back in the old days before qdev was
around, because initialization was difficult back then and some devices
didn't make the jump to get rid of their public init functions.

> 
>>> +
>>>  /* register hypercalls */
>>>  virtio_ccw_register_hcalls();
>>>  
>>> diff --git a/hw/s390x/sclp.c b/hw/s390x/sclp.c
>>> index a759da7..a969975 100644
>>> --- a/hw/s390x/sclp.c
>>> +++ b/hw/s390x/sclp.c
>>> @@ -20,6 +20,7 @@
>>>  #include "qemu/config-file.h"
>>>  #include "hw/s390x/sclp.h"
>>>  #include "hw/s390x/event-facility.h"
>>> +#include "hw/s390x/s390-pci-bus.h"
>>>  
>>>  static in

Re: [Qemu-devel] qmp-commands.hx and inherited types (Was: Re: [PATCH v6 02/10] qmp: Add block-dirty-bitmap-add and block-dirty-bitmap-remove)

2014-11-18 Thread Eric Blake

On 11/18/2014 09:44 AM, John Snow wrote:
>> Is it worth using type inheritance, as in:
>>
>> { 'type': 'BlockDirtyBitmapAdd',
>>'base': 'BlockDirtyBitmap',
>>'data': { '*granularity': 'int' } }
>>
> 
> Strictly speaking, I would argue against inheritance here because
> "BlockDirtyBitmapAdd" is not "isa" "BlockDirtyBitmap". It's more of a
> "Hasa" relationship.

Fair enough.

> 
> At any rate, I tried to implement this for giggles to see if I could,
> and ran into the following issue with which I'd be curious to get an
> answer for.
> 
> As an example, If you have some type:
> 
> { 'type': 'example',
>   'data': { 'foo': 'int' } }
> 
> And an extension of it:
> 
> { 'type': 'academicExample'
>   'base': 'example',
>   'data': { 'bar': 'str' } }
> 
> How would you write a command that expected both "foo" and "bar"?
> The following doesn't seem appropriate (the generated code SKIPS the
> base fields, which leads to missing arguments in the prototype:
> 
> { 'command': 'academic-command',
>   'data': 'academicExample' }

Ouch.  Sounds like a bug in the code generator.  Obviously, someone will
have to patch that (and add a testsuite entry to make sure it doesn't
regress) before we can rely on it.

> 
> ...
> 
> {
> .name = "academic-command",
> .args_type = "foo:i,bar:s",
> .mhandler.cmd_new = qmp_marshal_input_academic_command,
> },
> 
> 
> The generated prototype appears to skip the "foo" argument, including
> only the arguments associated with the base type, in this case, 'bar'.
> 
> Do we support this kind of use? I didn't see it in-use currently, but I
> only gave it a cursory skimming.

We supposedly document it as working, but as no one is using it
(including no testsuite entry), I'm not surprised that it doesn't work yet.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

1 2 >

1 - 100 of 135 matches

Mail list logo