Re: [Qemu-devel] [PATCH] ahci: map memory via device's address space instead of address_space_memory

2014-07-05 Thread Michael S. Tsirkin
On Fri, Jul 04, 2014 at 07:26:14AM +0200, Jan Kiszka wrote:
> On 2014-07-03 22:30, Michael S. Tsirkin wrote:
> > On Thu, Jul 03, 2014 at 06:45:03PM +0200, Jan Kiszka wrote:
> >> On 2014-07-03 12:02, Michael S. Tsirkin wrote:
> >>> On Thu, Jul 03, 2014 at 10:43:57AM +0200, Jan Kiszka wrote:
>  On 2014-07-03 10:26, Le Tan wrote:
> > In map_page() in hw/ide/ahci.c, replace cpu_physical_memory_map() and
> > cpu_physical_memory_unmap() with dma_memory_map() and 
> > dma_memory_unmap(),
> > because ahci devices should not access memory directly but via their 
> > address
> > space. Add an AddressSpace parameter to map_page(). In order to call
> > map_page(), we should pass the AHCIState.as as the AddressSpace 
> > argument.
> 
>  BTW, when doing "git grep cpu_physical_memory_map hw", there are some
>  more cases that should be checked (for x86). I suppose vhost is
>  incompatible with an IOMMU,
> >>>
> >>> vhost can be made to work: you just need to
> >>> update its memory tables as appropriate.
> >>> But see below
> >>>
>  but plain virtio should work,
> >>>
> >>> It doesn't: all guests pass in physical addresses at the moment.
> >>
> >> You mean they do not put virtio devices into IOMMU domains, or they do
> >> put them but ignore any translation rules that are not 1:1?
> > 
> > Look at the code. We just pass in physical addresses
> > ignoring which iommu domain device ended up with.
> 
> Should probably be fixed, at least in Linux. I suppose there is always a
> 1:1 domain where virtio devices can be put in so that they are not
> involved in any translation if this is not desired (PPC?).

This will need some experiments.

> > 
> >>> We discussed requiring this for virtio 1.0, but in the end,
> >>> most people thought that passing through virtio devices
> >>> isn't worthwhile.
> >>
> >> It should be consistent at least. If virtio is not translated, we have
> >> to exclude such devices via ACPI tables from the scope of our IOMMUs.
> > 
> > I didn't know this is possible. How does one do this?
> 
> Both the DMAR (Intel) and the IVRS (AMD) ACPI tables allow to report the
> scope of an IOMMU unit. There you list sets of devices on specific buses
> or a complete segment that the unit virtualizes. Le currently reports
> that there is a single DMAR unit for q35, and that one controls the
> complete segment 0. The structure could be adapted if a virtio device is
> present so that this device is not included. That should prevent that
> the guest tries to control virtio devices via IOMMUs or even assign them
> to some L2 guests.

Might be tricky to do with hotplugged devices.

> We could then provide a machine property that controls virtualization of
> virtio devices, default off, but you can enable it when running properly
> adjusted guest drivers.
> 
> > 
> >>> We can certainly add that as an option, with a feature bit.
> >>>
> >>> If you feel otherwise, you can comment on the latest spec draft.
> >>
> >> Does the spec at least state that "virtio devices are not subject to any
> >> guest configured IOMMU translation"? Is is this left undefined?
> >>
> >> Jan
> >>
> >>
> > 
> > I don't think we have anything like this.
> > 
> 
> And I'm not even sure it necessarily belongs to the spec. I think it's
> just a guest driver deficit to ignore IOMMU settings that the guest OS
> may apply.
> 
> Jan
> 

I guess we might want to at least document the historical driver
behaviour of skipping the IOMMU. But I think I agree this can wait:
we will have time to figure it out before we release QEMU 2.2.

-- 
MST



Re: [Qemu-devel] [RFC] qemu VGA endian swap low level drawing changes

2014-07-05 Thread Benjamin Herrenschmidt
On Sun, 2014-07-06 at 15:49 +1000, Benjamin Herrenschmidt wrote:
> We're basically tripping that test in cirrus_bitblt_rop_fwd_*
> 
> if (dstpitch < 0 || srcpitch < 0) {
> /* is 0 valid? srcpitch == 0 could be useful */
> return;
> }
> 
> Because when called from cirrus_bitblt_cputovideo_next() we
> always pass 0 for both pitch (we do one line at a time).

Ok, this is fun :-)

The above test was added supposedly as a fix for CVE2007-1320 in
2008, commit b2eb849d4b1fdb6f35d5c46958c7f703cf64cfef by "aurel32" (not
sure who that is).

However, looking at the description, it's a bit of a joke:

<<
I have just noticed that patch for CVE-2007-1320 has never been applied
to the QEMU CVS. Please find it below.

| Multiple heap-based buffer overflows in the cirrus_invalidate_region
| function in the Cirrus VGA extension in QEMU 0.8.2, as used in Xen and
| possibly other products, might allow local users to execute arbitrary
| code via unspecified vectors related to "attempting to mark
| non-existent regions as dirty," aka the "bitblt" heap overflow.
>>

This is bogus for at least these reasons:

 - I don't see how adding that check inside one of the ROPs will have any
effect on the call to cirrus_invalidate_region()

 - Whoever wrote the patch obviously didn't "notice" the two substractions
to the pitches right before, since the comment about value "0" makes no sense
unless you ignore those. If you don't, then the comment is completely spurrious
as of course, the case of the pitch being 0 after susbtraction would be quite
common.

 - Whoever wrote the patch didn't notice that we do call it with 0 in the copy
from host path

 - Assuming doing the fix inside the ROP makes sense, then why fix only one
of them ? The exploit, if it exists, is still present in all the other ones...

 - Finally, assuming that bound-checking the pitch has some value to avoid
writing outside of the framebuffer allocated area, doing it inside the ROP
makes little sense, this should be done when setting up the blit.

At this point, I"m tempted to just revert that commit. What do you think Gerd ?

I've noticed a few reports over the last few years of breakage with Windows
NT that seem to stem from this.

Cheers,
Ben.





Re: [Qemu-devel] [PATCH] qemu-char: add chr_add_watch support in mux chardev

2014-07-05 Thread Michael S. Tsirkin
On Fri, Jul 04, 2014 at 04:43:15PM +0400, Kirill Batuzov wrote:
> Forward chr_add_watch call from mux chardev to underlying
> implementation.
> 
> This should fix bug #1335444
> 
> Signed-off-by: Kirill Batuzov 

Applied for this once, but please note 2.1 patches
need to be tagged explicitly from now on.

> ---
>  qemu-char.c |9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/qemu-char.c b/qemu-char.c
> index 51917de..e1bd6f5 100644
> --- a/qemu-char.c
> +++ b/qemu-char.c
> @@ -581,6 +581,12 @@ static Notifier muxes_realize_notify = {
>  .notify = muxes_realize_done,
>  };
>  
> +static GSource *mux_chr_add_watch(CharDriverState *s, GIOCondition cond)
> +{
> +MuxDriver *d = s->opaque;
> +return d->drv->chr_add_watch(d->drv, cond);
> +}
> +
>  static CharDriverState *qemu_chr_open_mux(CharDriverState *drv)
>  {
>  CharDriverState *chr;
> @@ -597,6 +603,9 @@ static CharDriverState *qemu_chr_open_mux(CharDriverState 
> *drv)
>  chr->chr_accept_input = mux_chr_accept_input;
>  /* Frontend guest-open / -close notification is not support with muxes */
>  chr->chr_set_fe_open = NULL;
> +if (drv->chr_add_watch) {
> +chr->chr_add_watch = mux_chr_add_watch;
> +}
>  /* only default to opened state if we've realized the initial
>   * set of muxes
>   */
> -- 
> 1.7.10.4



Re: [Qemu-devel] [PATCH] virtio-pci: fix MSI memory region use after tree

2014-07-05 Thread Michael S. Tsirkin
On Fri, Jul 04, 2014 at 11:43:49AM +0200, Paolo Bonzini wrote:
> After memory region QOMification QEMU is stricter in detecting
> wrong usage of the memory region API.  Here it detected a
> memory_region_destroy done before the corresponding
> memory_region_del_subregion; the memory_region_destroy is
> done by msix_uninit_exclusive_bar, the memory_region_del_subregion
> is done by the PCI core's pci_unregister_io_regions before
> pc->exit is called.
> 
> The misuse caused an assertion when hot-unplugging virtio
> devices.  Using the API correctly fixes the assertion.
> 
> Signed-off-by: Paolo Bonzini 

Applied, thanks!
I also added some historical context in the comments
(the API misuse was introduced in 06a1307379fcd6c551185ad87679cd7ed896b9ea)


> ---
>  hw/virtio/virtio-pci.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 3c42cda..ecb2097 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -1003,11 +1003,9 @@ static void virtio_pci_device_plugged(DeviceState *d)
>  
>  static void virtio_pci_device_unplugged(DeviceState *d)
>  {
> -PCIDevice *pci_dev = PCI_DEVICE(d);
>  VirtIOPCIProxy *proxy = VIRTIO_PCI(d);
>  
>  virtio_pci_stop_ioeventfd(proxy);
> -msix_uninit_exclusive_bar(pci_dev);
>  }
>  
>  static int virtio_pci_init(PCIDevice *pci_dev)
> @@ -1024,6 +1022,8 @@ static int virtio_pci_init(PCIDevice *pci_dev)
>  static void virtio_pci_exit(PCIDevice *pci_dev)
>  {
>  VirtIOPCIProxy *proxy = VIRTIO_PCI(pci_dev);
> +
> +msix_uninit_exclusive_bar(pci_dev);
>  memory_region_destroy(&proxy->bar);
>  }
>  
> -- 
> 1.9.3



Re: [Qemu-devel] [RFC] qemu VGA endian swap low level drawing changes

2014-07-05 Thread Benjamin Herrenschmidt
On Sun, 2014-07-06 at 12:19 +1000, Benjamin Herrenschmidt wrote:
> One obvious issue: Your patch:
> 
> gtk: update mouse position in mouse_set()
> 
> Completely breaks cursor movement in NT4 (I haven't checked with other
> guests). It works fine without the patch. This is after cherry-picking
> on top of my series on github.

Ok it's not clear cut. The cursor goes into lalaland without your patch,
though with the patch it goes bonkers immediately. So something is still
wrong (using gtk). Added to my list of things to look it :-)

Cheers,
Ben.





Re: [Qemu-devel] [RFC] qemu VGA endian swap low level drawing changes

2014-07-05 Thread Benjamin Herrenschmidt
On Sun, 2014-07-06 at 12:19 +1000, Benjamin Herrenschmidt wrote:

> I've started to look (and while at it added use of the dirty bitmap to
> catch changes to the HW cursor image just because it looked easy).
> 
> One obvious issue: Your patch:
> 
> gtk: update mouse position in mouse_set()
> 
> Completely breaks cursor movement in NT4 (I haven't checked with other
> guests). It works fine without the patch. This is after cherry-picking
> on top of my series on github.
> 
> Now the cursor is still a white rectangle :-) I need to investigate that
> one a bit more.

Ok, I've found the cirrus bug that causes that white rectangle, but also
the broken icons in 8bpp mode.

We're basically tripping that test in cirrus_bitblt_rop_fwd_*

if (dstpitch < 0 || srcpitch < 0) {
/* is 0 valid? srcpitch == 0 could be useful */
return;
}

Because when called from cirrus_bitblt_cputovideo_next() we
always pass 0 for both pitch (we do one line at a time).

That done, we now get both HW "emulated" and HW "qemu" cursors
working in 8bpp.

The cursor is still absent in 16, probably a minor glitch, I'll
try to figure that out next.

Cheers,
Ben.





Re: [Qemu-devel] [RFC] qemu VGA endian swap low level drawing changes

2014-07-05 Thread Benjamin Herrenschmidt
On Wed, 2014-07-02 at 14:12 +0200, Gerd Hoffmann wrote:
> https://www.kraxel.org/cgit/qemu/log/?h=rebase/console-wip
> 
> Added a patch for hardware cursor support via dpy_cursor_define().
> Old hardware cursor code is still in, so in theory this gives you two
> pointers.  In practice it only shows that cursor support in relative
> mouse mode is (a) flaky and (b) our uis are very inconsistent ...
> 
> [ not hacking on it at the moment, busy with other stuff ]

I've started to look (and while at it added use of the dirty bitmap to
catch changes to the HW cursor image just because it looked easy).

One obvious issue: Your patch:

gtk: update mouse position in mouse_set()

Completely breaks cursor movement in NT4 (I haven't checked with other
guests). It works fine without the patch. This is after cherry-picking
on top of my series on github.

Now the cursor is still a white rectangle :-) I need to investigate that
one a bit more.

Cheers,
Ben.





[Qemu-devel] How to decide the value of Host Address Width in ACPI DMAR table

2014-07-05 Thread Le Tan
Hi,
I am doing Intel IOMMU emulation for q35. I need to add a DMAR table
to ACPI. There is one field in the DMAR table called "Host Address
Width", which indicates the maximum DMA physical addressability
supported by this platform. I am not sure what this value should be
and how to get this value. What is the best way to decide the value of
HAW?
Any suggestion is appreciated. :)
Thanks very much!

Regards,
Le Tan



Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-05 Thread Al Viro
On Sat, Jul 05, 2014 at 10:09:51PM +0100, Al Viro wrote:

> Anyway, the current delta (on top of 26f86) follows; seems to get IEEE
> insns behave on non-finite arguments as they do on 21264.  The main
> exception is that register bitmask supplied to trap isn't calculated in
> a bunch of cases; since its main purpose is to help locating the trapping
> insn and we report precise traps (amask feature bit 9), it's probably not
> an interesting problem.  Current Linux kernel definitely won't look at that
> thing under qemu; an old one might, but it would have to be something
> older than 2.3...  than 2.2.8, actually.  And the impact
> is that insns with /S getting a denorm argument won't be properly emulated
> and you'll get SIGFPE.  Again, it has to be a really old kernel (older than
> May 1999) to be affected at all.

... and a followup (and the last part of exception handling for non-VAX
insn inputs, AFAICS) - CVTQL.

* whether it triggers a trap or not, it sets IOV and INE on overflow.
* in case of trap it does *not* bugger off immediately - result is
calculated, stored and only then we trap.
* trap summary word is different from cvtql/v and cvtql/sv.  IOW, it's yet
another case of "we think that IEEE semantics is stupid and if you need it,
you'd damn better ask for it explicitly".  Note that cvtql/v sets IOV|INE and
hits SIGFPE no matter what, while cvtql/sv set INV instead and triggers SIGFPE
only if FP_INVALID is enabled.  All difference is kernel-side and it's
triggered by EXC_M_SWC in summary word.

AFAICS, that should be it for IEEE and shared insns, as far as exceptions
on inputs are concerned.

Combined delta follows:

diff --git a/target-alpha/fpu_helper.c b/target-alpha/fpu_helper.c
index 9b297de..25c83b5 100644
--- a/target-alpha/fpu_helper.c
+++ b/target-alpha/fpu_helper.c
@@ -44,6 +44,12 @@ uint32_t helper_fp_exc_get(CPUAlphaState *env)
 return get_float_exception_flags(&FP_STATUS);
 }
 
+enum {
+   Exc_Mask = float_flag_invalid | float_flag_int_overflow |
+  float_flag_divbyzero | float_flag_overflow |
+  float_flag_underflow | float_flag_inexact
+};
+
 static inline void fp_exc_raise1(CPUAlphaState *env, uintptr_t retaddr,
  uint32_t exc, uint32_t regno, uint32_t hw_exc)
 {
@@ -73,7 +79,7 @@ static inline void fp_exc_raise1(CPUAlphaState *env, 
uintptr_t retaddr,
doesn't apply.  */
 void helper_fp_exc_raise(CPUAlphaState *env, uint32_t ignore, uint32_t regno)
 {
-uint32_t exc = (uint8_t)env->fp_status.float_exception_flags;
+uint32_t exc = (uint8_t)env->fp_status.float_exception_flags & Exc_Mask;
 if (exc) {
 env->fpcr_exc_status |= exc;
 exc &= ~ignore;
@@ -86,7 +92,7 @@ void helper_fp_exc_raise(CPUAlphaState *env, uint32_t ignore, 
uint32_t regno)
 /* Raise exceptions for ieee fp insns with software completion.  */
 void helper_fp_exc_raise_s(CPUAlphaState *env, uint32_t ignore, uint32_t regno)
 {
-uint32_t exc = (uint8_t)env->fp_status.float_exception_flags;
+uint32_t exc = (uint8_t)env->fp_status.float_exception_flags & Exc_Mask;
 if (exc) {
 env->fpcr_exc_status |= exc;
 exc &= ~ignore;
@@ -105,16 +111,14 @@ void helper_ieee_input(CPUAlphaState *env, uint64_t val)
 uint64_t frac = val & 0xfull;
 
 if (exp == 0) {
-/* Denormals without DNZ set raise an exception.  */
-if (frac != 0 && !env->fp_status.flush_inputs_to_zero) {
-arith_excp(env, GETPC(), EXC_M_UNF, 0);
+/* Denormals without /S raise an exception.  */
+if (frac != 0) {
+arith_excp(env, GETPC(), EXC_M_INV, 0);
 }
 } else if (exp == 0x7ff) {
 /* Infinity or NaN.  */
-/* ??? I'm not sure these exception bit flags are correct.  I do
-   know that the Linux kernel, at least, doesn't rely on them and
-   just emulates the insn to figure out what exception to use.  */
-arith_excp(env, GETPC(), frac ? EXC_M_INV : EXC_M_FOV, 0);
+env->fpcr_exc_status |= float_flag_invalid;
+arith_excp(env, GETPC(), EXC_M_INV, 0);
 }
 }
 
@@ -125,16 +129,34 @@ void helper_ieee_input_cmp(CPUAlphaState *env, uint64_t 
val)
 uint64_t frac = val & 0xfull;
 
 if (exp == 0) {
-/* Denormals without DNZ set raise an exception.  */
-if (frac != 0 && !env->fp_status.flush_inputs_to_zero) {
-arith_excp(env, GETPC(), EXC_M_UNF, 0);
+/* Denormals raise an exception.  */
+if (frac != 0) {
+arith_excp(env, GETPC(), EXC_M_INV, 0);
 }
 } else if (exp == 0x7ff && frac) {
 /* NaN.  */
+env->fpcr_exc_status |= float_flag_invalid;
 arith_excp(env, GETPC(), EXC_M_INV, 0);
 }
 }
 
+/* Input handing with software completion.  Trap for denorms,
+   unless DNZ is set.  *IF* we try to support DNOD (which
+   none of the produced hardware did, AFAICS), we'll need
+   to suppress the trap when 

Re: [Qemu-devel] [RFC] alpha qemu arithmetic exceptions

2014-07-05 Thread Al Viro
On Sat, Jul 05, 2014 at 02:40:55AM +0100, Al Viro wrote:
> a) softfloat.c raises flags we don't care about.  So checking that
> FP_STATUS.float_exception_flags is non-zero is *not* good - we catch
> false positives that way.
> 
> b) DNZ has effect *only* for /S insns.  Without /S denorm means INV and
> that's it.  FPCR.INV isn't set, at that.  FPCR.INVD is ignored (it affects
> only insns with /S).
> 
> c) without DNZ or DNOD denorms trip INV even with /S.  Again, FPCR.INV is
> not set *and* FPCR.INVD is ignored.  It does stop INV from SQRTT/SU on
> -1, but not on DBL_MIN/2 (and on SQRTT/SU(-1) FPCR.INV is set).  Looks like
> this sucker is a separate kind of trap, the only similarity with INV being
> that it sets the same bit in trap summary word.

BTW, CVTTQ/SVI on denorms with DNZ shouldn't set Inexact.

> Right now I have duplicate of 21264 SQRTT behaviour on everything except
> infinities; hadn't looked into those yet.  I'm going to massage it a bit
> and see if the result causes any regressions for corner cases of MULT
> and friends.  Hopefully I'll have something usable by tomorrow...

Situation with infinities/NaNs: without /S we should trap on those guys
for arithmetics and conversions; in trap summary we get EXC_M_INV (regardless
of the argument) *and* (unlike the treatment of denorms there) we should
set FPCR.INV.  With /S they are passed to operation, which is responsible
for raising whatever it wants to raise (so far they all seem to be doing
the right thing in that area).

With comparisons, denorm handling is the same as for arithemtics; i.e.
with /S they trigger INV unless DNZ is set (and, presumably, working DNOD
would have the same effect on them).

Anyway, the current delta (on top of 26f86) follows; seems to get IEEE
insns behave on non-finite arguments as they do on 21264.  The main
exception is that register bitmask supplied to trap isn't calculated in
a bunch of cases; since its main purpose is to help locating the trapping
insn and we report precise traps (amask feature bit 9), it's probably not
an interesting problem.  Current Linux kernel definitely won't look at that
thing under qemu; an old one might, but it would have to be something
older than 2.3...  than 2.2.8, actually.  And the impact
is that insns with /S getting a denorm argument won't be properly emulated
and you'll get SIGFPE.  Again, it has to be a really old kernel (older than
May 1999) to be affected at all.

diff --git a/target-alpha/fpu_helper.c b/target-alpha/fpu_helper.c
index 9b297de..637d95e 100644
--- a/target-alpha/fpu_helper.c
+++ b/target-alpha/fpu_helper.c
@@ -44,6 +44,12 @@ uint32_t helper_fp_exc_get(CPUAlphaState *env)
 return get_float_exception_flags(&FP_STATUS);
 }
 
+enum {
+   Exc_Mask = float_flag_invalid | float_flag_int_overflow |
+  float_flag_divbyzero | float_flag_overflow |
+  float_flag_underflow | float_flag_inexact
+};
+
 static inline void fp_exc_raise1(CPUAlphaState *env, uintptr_t retaddr,
  uint32_t exc, uint32_t regno, uint32_t hw_exc)
 {
@@ -73,7 +79,7 @@ static inline void fp_exc_raise1(CPUAlphaState *env, 
uintptr_t retaddr,
doesn't apply.  */
 void helper_fp_exc_raise(CPUAlphaState *env, uint32_t ignore, uint32_t regno)
 {
-uint32_t exc = (uint8_t)env->fp_status.float_exception_flags;
+uint32_t exc = (uint8_t)env->fp_status.float_exception_flags & Exc_Mask;
 if (exc) {
 env->fpcr_exc_status |= exc;
 exc &= ~ignore;
@@ -86,7 +92,7 @@ void helper_fp_exc_raise(CPUAlphaState *env, uint32_t ignore, 
uint32_t regno)
 /* Raise exceptions for ieee fp insns with software completion.  */
 void helper_fp_exc_raise_s(CPUAlphaState *env, uint32_t ignore, uint32_t regno)
 {
-uint32_t exc = (uint8_t)env->fp_status.float_exception_flags;
+uint32_t exc = (uint8_t)env->fp_status.float_exception_flags & Exc_Mask;
 if (exc) {
 env->fpcr_exc_status |= exc;
 exc &= ~ignore;
@@ -105,16 +111,14 @@ void helper_ieee_input(CPUAlphaState *env, uint64_t val)
 uint64_t frac = val & 0xfull;
 
 if (exp == 0) {
-/* Denormals without DNZ set raise an exception.  */
-if (frac != 0 && !env->fp_status.flush_inputs_to_zero) {
-arith_excp(env, GETPC(), EXC_M_UNF, 0);
+/* Denormals without /S raise an exception.  */
+if (frac != 0) {
+arith_excp(env, GETPC(), EXC_M_INV, 0);
 }
 } else if (exp == 0x7ff) {
-/* Infinity or NaN.  */
-/* ??? I'm not sure these exception bit flags are correct.  I do
-   know that the Linux kernel, at least, doesn't rely on them and
-   just emulates the insn to figure out what exception to use.  */
-arith_excp(env, GETPC(), frac ? EXC_M_INV : EXC_M_FOV, 0);
+/* Infinity or NaN */
+env->fpcr_exc_status |= float_flag_invalid;
+arith_excp(env, GETPC(), EXC_M_INV, 0);
 }
 }
 
@@ -125,16 +129,34 @@ void helper_ieee

Re: [Qemu-devel] [PATCH 4/4] block: Assert qiov length matches request length

2014-07-05 Thread Max Reitz

On 04.07.2014 17:55, Kevin Wolf wrote:

At least raw-posix relies on this because it can allocate bounce buffers
based on the request length, but access it using all of the qiov entries
later.

Signed-off-by: Kevin Wolf 
---
  block.c   |  2 ++
  block/raw-posix.c | 15 +++
  2 files changed, 13 insertions(+), 4 deletions(-)


Somehow I'm not sure whether I want to give my R-b to this, in case one 
of these assertions should fail. ;-)


But they look correct (considering QIO vectors now have to be just as 
long as the request), so:


Reviewed-by: Max Reitz 



Re: [Qemu-devel] [PATCH 3/4] qed: Make qiov match request size until backing file EOF

2014-07-05 Thread Max Reitz

On 04.07.2014 17:55, Kevin Wolf wrote:

If a QED image has a shorter backing file and a read request to
unallocated clusters goes across EOF of the backing file, the backing
file sees a shortened request and the rest is filled with zeros.
However, the original too long qiov was used with the shortened request.

This patch makes the qiov size match the request size, avoiding a
potential buffer overflow in raw-posix.

Signed-off-by: Kevin Wolf 
---
  block/qed.c | 26 +++---
  block/qed.h |  1 +
  2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index b69374b..1f63b8f 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -772,6 +772,7 @@ static BDRVQEDState *acb_to_s(QEDAIOCB *acb)
   */
  static void qed_read_backing_file(BDRVQEDState *s, uint64_t pos,
QEMUIOVector *qiov,
+  QEMUIOVector **backing_qiov,


This could be documented in the comment above the function header.


BlockDriverCompletionFunc *cb, void *opaque)
  {
  uint64_t backing_length = 0;
@@ -804,15 +805,20 @@ static void qed_read_backing_file(BDRVQEDState *s, 
uint64_t pos,
  /* If the read straddles the end of the backing file, shorten it */
  size = MIN((uint64_t)backing_length - pos, qiov->size);
  
+*backing_qiov = g_new(QEMUIOVector, 1);


I guess at least because of the qemu_iovec_destroy() block in 
qed_aio_next_io() *backing_qiov always has to be NULL before this point. 
I guess I'd like an assert(!*backing_qiov) here (or 
assert(!acb->backing_qiov) before the call to qed_read_backing_file() in 
qed_aio_read_data()) anyway to express clearly that there can be no leaks.


Speaking of leaks: Shouldn't the backing_qiov be freed in 
qed_aio_complete()?


Max



Re: [Qemu-devel] [PATCH 2/4] qcow2: Make qiov match request size until backing file EOF

2014-07-05 Thread Max Reitz

On 04.07.2014 17:55, Kevin Wolf wrote:

If a qcow2 image has a shorter backing file and a read request to
unallocated clusters goes across EOF of the backing file, the backing
file sees a shortened request and the rest is filled with zeros.
However, the original too long qiov was used with the shortened request.

This patch makes the qiov size match the request size, avoiding a
potential buffer overflow in raw-posix.

Signed-off-by: Kevin Wolf 
---
  block/qcow2.c | 11 ++-
  1 file changed, 10 insertions(+), 1 deletion(-)


Reviewed-by: Max Reitz 



Re: [Qemu-devel] [PATCH 1/4] block: Make qiov match the request size until EOF

2014-07-05 Thread Max Reitz

On 04.07.2014 17:55, Kevin Wolf wrote:

If a read request goes across EOF, the block driver sees a shortened
request that stops at EOF (the rest is memsetted in block.c), however
the original qiov was used for this request.

This patch makes the qiov size match the request size, avoiding a
potential buffer overflow in raw-posix.

Signed-off-by: Kevin Wolf 
---
  block.c | 11 ++-
  1 file changed, 10 insertions(+), 1 deletion(-)


Reviewed-by: Max Reitz 



[Qemu-devel] [PATCH v9 02/14] qcow2: Implement bdrv_make_empty()

2014-07-05 Thread Max Reitz
Implement this function by making all clusters in the image file fall
through to the backing file (by using the recently extended discard).

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Kevin Wolf 
---
 block/qcow2.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index ed65679..cce6fa9 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2010,6 +2010,32 @@ fail:
 return ret;
 }
 
+static int qcow2_make_empty(BlockDriverState *bs)
+{
+int ret = 0;
+uint64_t start_sector;
+int sector_step = INT_MAX / BDRV_SECTOR_SIZE;
+
+for (start_sector = 0; start_sector < bs->total_sectors;
+ start_sector += sector_step)
+{
+/* As this function is generally used after committing an external
+ * snapshot, QCOW2_DISCARD_SNAPSHOT seems appropriate. Also, the
+ * default action for this kind of discard is to pass the discard,
+ * which will ideally result in an actually smaller image file, as
+ * is probably desired. */
+ret = qcow2_discard_clusters(bs, start_sector * BDRV_SECTOR_SIZE,
+ MIN(sector_step,
+ bs->total_sectors - start_sector),
+ QCOW2_DISCARD_SNAPSHOT, true);
+if (ret < 0) {
+break;
+}
+}
+
+return ret;
+}
+
 static coroutine_fn int qcow2_co_flush_to_os(BlockDriverState *bs)
 {
 BDRVQcowState *s = bs->opaque;
@@ -2406,6 +2432,7 @@ static BlockDriver bdrv_qcow2 = {
 .bdrv_co_discard= qcow2_co_discard,
 .bdrv_truncate  = qcow2_truncate,
 .bdrv_write_compressed  = qcow2_write_compressed,
+.bdrv_make_empty= qcow2_make_empty,
 
 .bdrv_snapshot_create   = qcow2_snapshot_create,
 .bdrv_snapshot_goto = qcow2_snapshot_goto,
-- 
2.0.1




[Qemu-devel] [PATCH v9 04/14] blockjob: Introduce block_job_complete_sync()

2014-07-05 Thread Max Reitz
Implement block_job_complete_sync() by doing the exact same thing as
block_job_cancel_sync() does, only with calling block_job_complete()
instead of block_job_cancel().

Signed-off-by: Max Reitz 
Reviewed-by: Kevin Wolf 
---
 blockjob.c   | 39 ---
 include/block/blockjob.h | 15 +++
 2 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 67a64ea..2c98689 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -152,7 +152,7 @@ void block_job_iostatus_reset(BlockJob *job)
 }
 }
 
-struct BlockCancelData {
+struct BlockFinishData {
 BlockJob *job;
 BlockDriverCompletionFunc *cb;
 void *opaque;
@@ -160,19 +160,22 @@ struct BlockCancelData {
 int ret;
 };
 
-static void block_job_cancel_cb(void *opaque, int ret)
+static void block_job_finish_cb(void *opaque, int ret)
 {
-struct BlockCancelData *data = opaque;
+struct BlockFinishData *data = opaque;
 
 data->cancelled = block_job_is_cancelled(data->job);
 data->ret = ret;
 data->cb(data->opaque, ret);
 }
 
-int block_job_cancel_sync(BlockJob *job)
+static int block_job_finish_sync(BlockJob *job,
+ void (*finish)(BlockJob *, Error **errp),
+ Error **errp)
 {
-struct BlockCancelData data;
+struct BlockFinishData data;
 BlockDriverState *bs = job->bs;
+Error *local_err = NULL;
 
 assert(bs->job == job);
 
@@ -183,15 +186,37 @@ int block_job_cancel_sync(BlockJob *job)
 data.cb = job->cb;
 data.opaque = job->opaque;
 data.ret = -EINPROGRESS;
-job->cb = block_job_cancel_cb;
+job->cb = block_job_finish_cb;
 job->opaque = &data;
-block_job_cancel(job);
+finish(job, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return -EBUSY;
+}
 while (data.ret == -EINPROGRESS) {
 qemu_aio_wait();
 }
 return (data.cancelled && data.ret == 0) ? -ECANCELED : data.ret;
 }
 
+/* A wrapper around block_job_cancel() taking an Error ** parameter so it may 
be
+ * used with block_job_finish_sync() without the need for (rather nasty)
+ * function pointer casts there. */
+static void block_job_cancel_err(BlockJob *job, Error **errp)
+{
+block_job_cancel(job);
+}
+
+int block_job_cancel_sync(BlockJob *job)
+{
+return block_job_finish_sync(job, &block_job_cancel_err, NULL);
+}
+
+int block_job_complete_sync(BlockJob *job, Error **errp)
+{
+return block_job_finish_sync(job, &block_job_complete, errp);
+}
+
 void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
 {
 assert(job->busy);
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index f3cf63f..88f4be2 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -273,6 +273,21 @@ bool block_job_is_paused(BlockJob *job);
 int block_job_cancel_sync(BlockJob *job);
 
 /**
+ * block_job_complete_sync:
+ * @job: The job to be completed.
+ * @errp: Error object which may be set by block_job_complete(); this is not
+ *necessarily set on every error, the job return value has to be
+ *checked as well.
+ *
+ * Synchronously complete the job.  The completion callback is called before 
the
+ * function returns, unless it is NULL (which is permissible when using this
+ * function).
+ *
+ * Returns the return value from the job.
+ */
+int block_job_complete_sync(BlockJob *job, Error **errp);
+
+/**
  * block_job_iostatus_reset:
  * @job: The job whose I/O status should be reset.
  *
-- 
2.0.1




[Qemu-devel] [PATCH v9 07/14] qemu-img: Implement commit like QMP

2014-07-05 Thread Max Reitz
qemu-img should use QMP commands whenever possible in order to ensure
feature completeness of both online and offline image operations. As
qemu-img itself has no access to QMP (since this would basically require
just everything being linked into qemu-img), imitate QMP's
implementation of block-commit by using commit_active_start() and then
waiting for the block job to finish.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/Makefile.objs |  2 +-
 qemu-img.c  | 83 +
 2 files changed, 65 insertions(+), 20 deletions(-)

diff --git a/block/Makefile.objs b/block/Makefile.objs
index fd88c03..2c37e80 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -9,6 +9,7 @@ block-obj-y += snapshot.o qapi.o
 block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
 block-obj-$(CONFIG_POSIX) += raw-posix.o
 block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
+block-obj-y += mirror.o
 
 ifeq ($(CONFIG_POSIX),y)
 block-obj-y += nbd.o nbd-client.o sheepdog.o
@@ -22,7 +23,6 @@ endif
 
 common-obj-y += stream.o
 common-obj-y += commit.o
-common-obj-y += mirror.o
 common-obj-y += backup.o
 
 iscsi.o-cflags := $(LIBISCSI_CFLAGS)
diff --git a/qemu-img.c b/qemu-img.c
index c98896b..26d8b11 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -30,6 +30,7 @@
 #include "qemu/osdep.h"
 #include "sysemu/sysemu.h"
 #include "block/block_int.h"
+#include "block/blockjob.h"
 #include "block/qapi.h"
 #include 
 #include 
@@ -717,12 +718,46 @@ fail:
 return ret;
 }
 
+typedef struct CommonBlockJobCBInfo {
+BlockDriverState *bs;
+Error **errp;
+} CommonBlockJobCBInfo;
+
+static void common_block_job_cb(void *opaque, int ret)
+{
+CommonBlockJobCBInfo *cbi = opaque;
+
+if (ret < 0) {
+error_setg_errno(cbi->errp, -ret, "Block job failed");
+}
+
+/* Drop this block job's reference */
+bdrv_unref(cbi->bs);
+}
+
+static void run_block_job(BlockJob *job, Error **errp)
+{
+AioContext *aio_context = bdrv_get_aio_context(job->bs);
+
+do {
+aio_poll(aio_context, true);
+
+if (!job->busy && !job->ready) {
+block_job_resume(job);
+}
+} while (!job->ready);
+
+block_job_complete_sync(job, errp);
+}
+
 static int img_commit(int argc, char **argv)
 {
 int c, ret, flags;
 const char *filename, *fmt, *cache;
-BlockDriverState *bs;
+BlockDriverState *bs, *base_bs;
 bool quiet = false;
+Error *local_err = NULL;
+CommonBlockJobCBInfo cbi;
 
 fmt = NULL;
 cache = BDRV_DEFAULT_CACHE;
@@ -763,29 +798,39 @@ static int img_commit(int argc, char **argv)
 if (!bs) {
 return 1;
 }
-ret = bdrv_commit(bs);
-switch(ret) {
-case 0:
-qprintf(quiet, "Image committed.\n");
-break;
-case -ENOENT:
-error_report("No disk inserted");
-break;
-case -EACCES:
-error_report("Image is read-only");
-break;
-case -ENOTSUP:
-error_report("Image is already committed");
-break;
-default:
-error_report("Error while committing image");
-break;
+
+/* This is different from QMP, which by default uses the deepest file in 
the
+ * backing chain (i.e., the very base); however, the traditional behavior 
of
+ * qemu-img commit is using the immediate backing file. */
+base_bs = bs->backing_hd;
+if (!base_bs) {
+error_set(&local_err, QERR_BASE_NOT_FOUND, "NULL");
+goto done;
+}
+
+cbi = (CommonBlockJobCBInfo){
+.errp = &local_err,
+.bs   = bs,
+};
+
+commit_active_start(bs, base_bs, 0, BLOCKDEV_ON_ERROR_REPORT,
+common_block_job_cb, &cbi, &local_err);
+if (local_err) {
+goto done;
 }
 
+run_block_job(bs->job, &local_err);
+
+done:
 bdrv_unref(bs);
-if (ret) {
+
+if (local_err) {
+qerror_report_err(local_err);
+error_free(local_err);
 return 1;
 }
+
+qprintf(quiet, "Image committed.\n");
 return 0;
 }
 
-- 
2.0.1




Re: [Qemu-devel] [libvirt] Not able to run "virsh qemu-agent-command" when socat is working

2014-07-05 Thread Puneet Bakshi
Sorry, that was a typo. But, it is still not working. (For some reason, I
had to move to Windows2012 with name vm_win_05).

[root@sdsr720-14 virtio-win]# echo "{'execute':'guest-ping'}" | socat
stdio,ignoreeof /var/lib/libvirt/qemu/g05.agent
{"return": {}}

[root@sdsr720-14 virtio-win]# virsh qemu-agent-command vm_win_05 '{
"execute":"guest-ping"}'

[root@sdsr720-14 virtio-win]#

Regards,
~Puneet


On Wed, Jul 2, 2014 at 10:09 PM, Eric Blake  wrote:

> On 07/02/2014 01:13 AM, Puneet Bakshi wrote:
> > Hi,
> >
> > I am running qemu guest agent in Windows 2k8. I am able to execute
> > "qemu-agent-commands" using socat but not through "virsh
> > qemu-agent-command".
> >
>
> > *Host CentOS system*
> >
> > socat returns response appropriately.
> > [root@sdsr720-14 ~]# echo "{'execute':'guest-ping'}" | socat
> > stdio,ignoreeof /var/lib/libvirt/qemu/g06.agent
> > {"return": {}}
>
> Note your spelling...
>
> >
> > *"virsh qemu-agent-command" returns blank.*
> >
> > [root@sdsr720-14 ~]# virsh qemu-agent-command vm_win_06 '{
> "execute":
> > "guest_ping"}'
>
> and compare it to here.  There is no guest_ping command, only
> guest-ping.  It's a bug in libvirt that guest-agent-command doesn't
> output a useful error message when attempting to run a non-existing
> command, but you'll never hit that bug if you pass valid commands to the
> agent in the first place.
>
> --
> Eric Blake   eblake redhat com+1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
>


[Qemu-devel] [PATCH v9 06/14] block/mirror: Improve progress report

2014-07-05 Thread Max Reitz
Instead of taking the total length of the block device as the block
job's length, use the number of dirty sectors. The progress is now the
number of sectors mirrored to the target block device. Note that this
may result in the job's length increasing during operation, which is
however in fact desirable.

Signed-off-by: Max Reitz 
---
 block/mirror.c | 34 ++
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 6c3ee70..39c52c3 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -45,6 +45,7 @@ typedef struct MirrorBlockJob {
 int64_t sector_num;
 int64_t granularity;
 size_t buf_size;
+int64_t bdev_length;
 unsigned long *cow_bitmap;
 BdrvDirtyBitmap *dirty_bitmap;
 HBitmapIter hbi;
@@ -54,6 +55,7 @@ typedef struct MirrorBlockJob {
 
 unsigned long *in_flight_bitmap;
 int in_flight;
+int sectors_in_flight;
 int ret;
 } MirrorBlockJob;
 
@@ -87,6 +89,7 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
 trace_mirror_iteration_done(s, op->sector_num, op->nb_sectors, ret);
 
 s->in_flight--;
+s->sectors_in_flight -= op->nb_sectors;
 iov = op->qiov.iov;
 for (i = 0; i < op->qiov.niov; i++) {
 MirrorBuffer *buf = (MirrorBuffer *) iov[i].iov_base;
@@ -98,8 +101,11 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
 chunk_num = op->sector_num / sectors_per_chunk;
 nb_chunks = op->nb_sectors / sectors_per_chunk;
 bitmap_clear(s->in_flight_bitmap, chunk_num, nb_chunks);
-if (s->cow_bitmap && ret >= 0) {
-bitmap_set(s->cow_bitmap, chunk_num, nb_chunks);
+if (ret >= 0) {
+if (s->cow_bitmap) {
+bitmap_set(s->cow_bitmap, chunk_num, nb_chunks);
+}
+s->common.offset += (uint64_t)op->nb_sectors * BDRV_SECTOR_SIZE;
 }
 
 qemu_iovec_destroy(&op->qiov);
@@ -172,7 +178,7 @@ static uint64_t coroutine_fn 
mirror_iteration(MirrorBlockJob *s)
 hbitmap_next_sector = s->sector_num;
 sector_num = s->sector_num;
 sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
-end = s->common.len >> BDRV_SECTOR_BITS;
+end = s->bdev_length / BDRV_SECTOR_SIZE;
 
 /* Extend the QEMUIOVector to include all adjacent blocks that will
  * be copied in this operation.
@@ -284,6 +290,7 @@ static uint64_t coroutine_fn 
mirror_iteration(MirrorBlockJob *s)
 
 /* Copy the dirty cluster.  */
 s->in_flight++;
+s->sectors_in_flight += nb_sectors;
 trace_mirror_one_iteration(s, sector_num, nb_sectors);
 bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
mirror_read_complete, op);
@@ -329,11 +336,11 @@ static void coroutine_fn mirror_run(void *opaque)
 goto immediate_exit;
 }
 
-s->common.len = bdrv_getlength(bs);
-if (s->common.len < 0) {
-ret = s->common.len;
+s->bdev_length = bdrv_getlength(bs);
+if (s->bdev_length < 0) {
+ret = s->bdev_length;
 goto immediate_exit;
-} else if (s->common.len == 0) {
+} else if (s->bdev_length == 0) {
 /* Report BLOCK_JOB_READY and wait for complete. */
 block_job_event_ready(&s->common);
 s->synced = true;
@@ -344,7 +351,7 @@ static void coroutine_fn mirror_run(void *opaque)
 goto immediate_exit;
 }
 
-length = DIV_ROUND_UP(s->common.len, s->granularity);
+length = DIV_ROUND_UP(s->bdev_length, s->granularity);
 s->in_flight_bitmap = bitmap_new(length);
 
 /* If we have no backing file yet in the destination, we cannot let
@@ -364,7 +371,7 @@ static void coroutine_fn mirror_run(void *opaque)
 }
 }
 
-end = s->common.len >> BDRV_SECTOR_BITS;
+end = s->bdev_length / BDRV_SECTOR_SIZE;
 s->buf = qemu_blockalign(bs, s->buf_size);
 sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
 mirror_free_init(s);
@@ -404,6 +411,12 @@ static void coroutine_fn mirror_run(void *opaque)
 }
 
 cnt = bdrv_get_dirty_count(bs, s->dirty_bitmap);
+/* s->common.offset contains the number of bytes already processed so
+ * far, cnt is the number of dirty sectors remaining and
+ * s->sectors_in_flight is the number of sectors currently being
+ * processed; together those are the current total operation length */
+s->common.len = s->common.offset +
+(cnt + s->sectors_in_flight) * BDRV_SECTOR_SIZE;
 
 /* Note that even when no rate limit is applied we need to yield
  * periodically with no pending I/O so that qemu_aio_flush() returns.
@@ -440,7 +453,6 @@ static void coroutine_fn mirror_run(void *opaque)
  * report completion.  This way, block-job-cancel will leave
  * the target in a consistent state.
  */
-s->common.offset = end * BDRV_SECTOR_SIZE;
 if (!s->synced) {
 block_job_event_ready(&s->common);
  

[Qemu-devel] [PATCH v9 12/14] iotests: Add test for backing-chain commits

2014-07-05 Thread Max Reitz
Add a test for qemu-img commit on backing chains with more than two
images. This test also checks whether the top image is emptied (unless
this is prevented by specifying either -d or -b) and does therefore not
work for qed and vmdk which requires it to be separate from 020.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 tests/qemu-iotests/097 | 122 +
 tests/qemu-iotests/097.out | 119 +++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 242 insertions(+)
 create mode 100755 tests/qemu-iotests/097
 create mode 100644 tests/qemu-iotests/097.out

diff --git a/tests/qemu-iotests/097 b/tests/qemu-iotests/097
new file mode 100755
index 000..c7a613b
--- /dev/null
+++ b/tests/qemu-iotests/097
@@ -0,0 +1,122 @@
+#!/bin/bash
+#
+# Commit changes into backing chains and empty the top image if the
+# backing image is not explicitly specified
+#
+# Copyright (C) 2014 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=mre...@redhat.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+tmp=/tmp/$$
+status=1   # failure is the default!
+
+_cleanup()
+{
+_cleanup_test_img
+_rm_test_img "$TEST_IMG.itmd"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.pattern
+
+# Any format supporting backing files and bdrv_make_empty
+_supported_fmt qcow qcow2
+_supported_proto file
+_supported_os Linux
+
+
+# Four passes:
+#  0: Two-layer backing chain, commit to upper backing file (implicitly)
+# (in this case, the top image will be emptied)
+#  1: Two-layer backing chain, commit to upper backing file (explicitly)
+# (in this case, the top image will implicitly stay unchanged)
+#  2: Two-layer backing chain, commit to upper backing file (implicitly with 
-d)
+# (in this case, the top image will explicitly stay unchanged)
+#  3: Two-layer backing chain, commit to lower backing file
+# (in this case, the top image will implicitly stay unchanged)
+#
+# 020 already tests committing, so this only tests whether image chains are
+# working properly and that all images above the base are emptied; therefore,
+# no complicated patterns are necessary
+for i in 0 1 2 3; do
+
+echo
+echo "=== Test pass $i ==="
+echo
+
+TEST_IMG="$TEST_IMG.base" _make_test_img 64M
+TEST_IMG="$TEST_IMG.itmd" _make_test_img -b "$TEST_IMG.base" 64M
+_make_test_img -b "$TEST_IMG.itmd" 64M
+
+$QEMU_IO -c 'write -P 1 0 192k' "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c 'write -P 2 64k 128k' "$TEST_IMG.itmd" | _filter_qemu_io
+$QEMU_IO -c 'write -P 3 128k 64k' "$TEST_IMG" | _filter_qemu_io
+
+if [ $i -lt 3 ]; then
+if [ $i == 0 ]; then
+# -b "$TEST_IMG.itmd" should be the default (that is, committing to the
+# first backing file in the chain)
+$QEMU_IMG commit "$TEST_IMG"
+elif [ $i == 1 ]; then
+# explicitly specify the commit target (this should imply -d)
+$QEMU_IMG commit -b "$TEST_IMG.itmd" "$TEST_IMG"
+else
+# do not explicitly specify the commit target, but use -d to leave the
+# top image unchanged
+$QEMU_IMG commit -d "$TEST_IMG"
+fi
+
+# Bottom should be unchanged
+$QEMU_IO -c 'read -P 1 0 192k' "$TEST_IMG.base" | _filter_qemu_io
+
+# Intermediate should contain changes from top
+$QEMU_IO -c 'read -P 1 0 64k' "$TEST_IMG.itmd" | _filter_qemu_io
+$QEMU_IO -c 'read -P 2 64k 64k' "$TEST_IMG.itmd" | _filter_qemu_io
+$QEMU_IO -c 'read -P 3 128k 64k' "$TEST_IMG.itmd" | _filter_qemu_io
+
+# And in pass 0, the top image should be empty, whereas in both other 
passes
+# it should be unchanged (which is both checked by qemu-img map)
+else
+$QEMU_IMG commit -b "$TEST_IMG.base" "$TEST_IMG"
+
+# Bottom should contain all changes
+$QEMU_IO -c 'read -P 1 0 64k' "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c 'read -P 2 64k 64k' "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c 'read -P 3 128k 64k' "$TEST_IMG.base" | _filter_qemu_io
+
+# Both top and intermediate should be unchanged
+fi
+
+$QEMU_IMG map "$TEST_IMG.base" | _filter_qemu_img_map
+$QEMU_IMG map "$TEST_IMG.itmd" | _filter_qemu_img_map
+$QEMU_IMG map "$TEST_IMG" | _filter_qemu_img_map
+
+done
+
+
+# success, all done

[Qemu-devel] [PATCH v9 05/14] blockjob: Add "ready" field

2014-07-05 Thread Max Reitz
When a block job signals readiness, this is currently reported only
through QMP. If qemu wants to use block jobs for internal tasks, there
needs to be another way to correctly detect when a block job may be
completed.

For this reason, introduce a bool "ready" which is set when the block
job may be completed.

Signed-off-by: Max Reitz 
---
 blockjob.c   | 3 +++
 include/block/blockjob.h | 5 +
 qapi/block-core.json | 4 +++-
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/blockjob.c b/blockjob.c
index 2c98689..ff1d78e 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -260,6 +260,7 @@ BlockJobInfo *block_job_query(BlockJob *job)
 info->offset= job->offset;
 info->speed = job->speed;
 info->io_status = job->iostatus;
+info->ready = job->ready;
 return info;
 }
 
@@ -295,6 +296,8 @@ void block_job_event_completed(BlockJob *job, const char 
*msg)
 
 void block_job_event_ready(BlockJob *job)
 {
+job->ready = true;
+
 qapi_event_send_block_job_ready(job->driver->job_type,
 bdrv_get_device_name(job->bs),
 job->len,
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 88f4be2..97658ca 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -91,6 +91,11 @@ struct BlockJob {
  */
 bool busy;
 
+/**
+ * Set to true when the job is ready to be completed.
+ */
+bool ready;
+
 /** Status that is published by the query-block-jobs QMP API */
 BlockDeviceIoStatus iostatus;
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index e378653..72ebfef 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -505,12 +505,14 @@
 #
 # @io-status: the status of the job (since 1.3)
 #
+# @ready: true if the job may be completed (since 2.1)
+#
 # Since: 1.1
 ##
 { 'type': 'BlockJobInfo',
   'data': {'type': 'str', 'device': 'str', 'len': 'int',
'offset': 'int', 'busy': 'bool', 'paused': 'bool', 'speed': 'int',
-   'io-status': 'BlockDeviceIoStatus'} }
+   'io-status': 'BlockDeviceIoStatus', 'ready': 'bool'} }
 
 ##
 # @query-block-jobs:
-- 
2.0.1




[Qemu-devel] [PATCH v9 11/14] iotests: Add _filter_qemu_img_map

2014-07-05 Thread Max Reitz
As different image formats most probably map guest addresses to
different host addresses, add a filter to filter the host addresses out;
also, the image filename should be filtered.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 tests/qemu-iotests/common.filter | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index a04df7f..8b14edb 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -170,5 +170,12 @@ _filter_qmp()
 -e 's#^{"QMP":.*}$#QMP_VERSION#'
 }
 
+# filter out offsets and file names from qemu-img map
+_filter_qemu_img_map()
+{
+sed -e 's/\([0-9a-fx]* *[0-9a-fx]* *\)[0-9a-fx]* */\1/g' \
+-e 's/Mapped to *//' | _filter_testdir | _filter_imgfmt
+}
+
 # make sure this script returns success
 /bin/true
-- 
2.0.1




[Qemu-devel] [PATCH v9 03/14] qcow2: Optimize bdrv_make_empty()

2014-07-05 Thread Max Reitz
bdrv_make_empty() is currently only called if the current image
represents an external snapshot that has been committed to its base
image; it is therefore unlikely to have internal snapshots. In this
case, bdrv_make_empty() can be greatly sped up by creating an empty L1
table and dropping all data clusters at once by recreating the refcount
structure accordingly instead of normally discarding all clusters.

If there are snapshots, fall back to the simple implementation (discard
all clusters).

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 block/qcow2.c | 389 +++---
 1 file changed, 374 insertions(+), 15 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index cce6fa9..0aba8dd 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2010,27 +2010,386 @@ fail:
 return ret;
 }
 
+/*
+ * Creates a reftable pointing to refblocks following right afterwards and an
+ * empty L1 table at the given @offset. @refblocks is the number of refblocks
+ * to create.
+ *
+ * This combination of structures (reftable+refblocks+L1) is here called a
+ * "blob".
+ */
+static int create_refcount_l1(BlockDriverState *bs, uint64_t offset,
+  uint64_t refblocks)
+{
+BDRVQcowState *s = bs->opaque;
+uint64_t *reftable = NULL;
+uint16_t *refblock = NULL;
+uint64_t reftable_clusters;
+uint64_t rbi;
+uint64_t blob_start, blob_end;
+uint64_t l2_tables, l1_clusters, l1_size2;
+uint8_t l1_size_and_offset[12];
+uint64_t rt_offset;
+int ret, i;
+
+reftable_clusters = DIV_ROUND_UP(refblocks, s->cluster_size / 8);
+l2_tables = DIV_ROUND_UP(bs->total_sectors / s->cluster_sectors,
+ s->cluster_size / 8);
+l1_clusters = DIV_ROUND_UP(l2_tables, s->cluster_size / 8);
+l1_size2 = l1_clusters << s->cluster_bits;
+
+blob_start = offset;
+blob_end = offset + ((reftable_clusters + refblocks + l1_clusters) <<
+s->cluster_bits);
+
+ret = qcow2_pre_write_overlap_check(bs, 0, blob_start,
+blob_end - blob_start);
+if (ret < 0) {
+return ret;
+}
+
+/* Create the reftable pointing with entries pointing consecutively to the
+ * following clusters */
+reftable = g_malloc0_n(reftable_clusters, s->cluster_size);
+
+for (rbi = 0; rbi < refblocks; rbi++) {
+reftable[rbi] = cpu_to_be64(offset + ((reftable_clusters + rbi) <<
+s->cluster_bits));
+}
+
+ret = bdrv_write(bs->file, offset >> BDRV_SECTOR_BITS, (uint8_t *)reftable,
+ reftable_clusters * s->cluster_sectors);
+if (ret < 0) {
+goto out;
+}
+
+offset += reftable_clusters << s->cluster_bits;
+
+/* Keep the reftable, as we will need it for the BDRVQcowState anyway */
+
+/* Now, create all the refblocks */
+refblock = g_malloc(s->cluster_size);
+
+for (rbi = 0; rbi < refblocks; rbi++) {
+for (i = 0; i < s->cluster_size / 2; i++) {
+uint64_t cluster_offset = ((rbi << (s->cluster_bits - 1)) + i)
+  << s->cluster_bits;
+
+/* Only 0 and 1 are possible as refcounts here */
+refblock[i] = cpu_to_be16(!cluster_offset ||
+  (cluster_offset >= blob_start &&
+   cluster_offset <  blob_end));
+}
+
+ret = bdrv_write(bs->file, offset >> BDRV_SECTOR_BITS,
+ (uint8_t *)refblock, s->cluster_sectors);
+if (ret < 0) {
+goto out;
+}
+
+offset += s->cluster_size;
+}
+
+g_free(refblock);
+refblock = NULL;
+
+/* The L1 table is very simple */
+ret = bdrv_write_zeroes(bs->file, offset >> BDRV_SECTOR_BITS,
+l1_clusters * s->cluster_sectors,
+BDRV_REQ_MAY_UNMAP);
+if (ret < 0) {
+goto out;
+}
+
+/* Now make sure all changes are stable and clear all caches */
+bdrv_flush(bs);
+
+/* This is probably not really necessary, but it cannot hurt, either */
+ret = qcow2_mark_clean(bs);
+if (ret < 0) {
+goto out;
+}
+
+ret = qcow2_cache_empty(bs, s->l2_table_cache);
+if (ret < 0) {
+goto out;
+}
+
+ret = qcow2_cache_empty(bs, s->refcount_block_cache);
+if (ret < 0) {
+goto out;
+}
+
+/* Modify the image header to point to the new blank L1 table. This will
+ * leak all currently existing data clusters, which is fine. */
+BLKDBG_EVENT(bs->file, BLKDBG_L1_UPDATE);
+
+assert(l1_size2 / 8 <= UINT32_MAX);
+cpu_to_be32w((uint32_t *)&l1_size_and_offset[0], l1_size2 / 8);
+cpu_to_be64w((uint64_t *)&l1_size_and_offset[4], offset);
+ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, l1_size),
+   l1_size_and_offset, sizeof(l1_size_and_offset));
+if (re

[Qemu-devel] [PATCH v9 10/14] qemu-img: Specify backing file for commit

2014-07-05 Thread Max Reitz
Introduce a new parameter for qemu-img commit which may be used to
explicitly specify the backing file into which an image should be
committed if the backing chain has more than a single layer.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 qemu-img-cmds.hx |  4 ++--
 qemu-img.c   | 24 +---
 qemu-img.texi|  9 -
 3 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index ea41d4f..4331949 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -22,9 +22,9 @@ STEXI
 ETEXI
 
 DEF("commit", img_commit,
-"commit [-q] [-f fmt] [-t cache] [-d] [-p] filename")
+"commit [-q] [-f fmt] [-t cache] [-b base] [-d] [-p] filename")
 STEXI
-@item commit [-q] [-f @var{fmt}] [-t @var{cache}] [-d] [-p] @var{filename}
+@item commit [-q] [-f @var{fmt}] [-t @var{cache}] [-b @var{base}] [-d] [-p] 
@var{filename}
 ETEXI
 
 DEF("compare", img_compare,
diff --git a/qemu-img.c b/qemu-img.c
index 8409062..daf4bba 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -759,7 +759,7 @@ static void run_block_job(BlockJob *job, Error **errp)
 static int img_commit(int argc, char **argv)
 {
 int c, ret, flags;
-const char *filename, *fmt, *cache;
+const char *filename, *fmt, *cache, *base;
 BlockDriverState *bs, *base_bs;
 bool progress = false, quiet = false, drop = false;
 Error *local_err = NULL;
@@ -767,8 +767,9 @@ static int img_commit(int argc, char **argv)
 
 fmt = NULL;
 cache = BDRV_DEFAULT_CACHE;
+base = NULL;
 for(;;) {
-c = getopt(argc, argv, "f:ht:dpq");
+c = getopt(argc, argv, "f:ht:b:dpq");
 if (c == -1) {
 break;
 }
@@ -783,6 +784,11 @@ static int img_commit(int argc, char **argv)
 case 't':
 cache = optarg;
 break;
+case 'b':
+base = optarg;
+/* -b implies -d */
+drop = true;
+break;
 case 'd':
 drop = true;
 break;
@@ -820,12 +826,16 @@ static int img_commit(int argc, char **argv)
 qemu_progress_init(progress, 1.f);
 qemu_progress_print(0.f, 100);
 
-/* This is different from QMP, which by default uses the deepest file in 
the
- * backing chain (i.e., the very base); however, the traditional behavior 
of
- * qemu-img commit is using the immediate backing file. */
-base_bs = bs->backing_hd;
+if (base) {
+base_bs = bdrv_find_backing_image(bs, base);
+} else {
+/* This is different from QMP, which by default uses the deepest file 
in
+ * the backing chain (i.e., the very base); however, the traditional
+ * behavior of qemu-img commit is using the immediate backing file. */
+base_bs = bs->backing_hd;
+}
 if (!base_bs) {
-error_set(&local_err, QERR_BASE_NOT_FOUND, "NULL");
+error_set(&local_err, QERR_BASE_NOT_FOUND, base ?: "NULL");
 goto done;
 }
 
diff --git a/qemu-img.texi b/qemu-img.texi
index 5c151af..04b6c9e 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -163,7 +163,7 @@ this case. @var{backing_file} will never be modified unless 
you use the
 The size can also be specified using the @var{size} option with @code{-o},
 it doesn't need to be specified separately in this case.
 
-@item commit [-q] [-f @var{fmt}] [-t @var{cache}] [-d] [-p] @var{filename}
+@item commit [-q] [-f @var{fmt}] [-t @var{cache}] [-b @var{base}] [-d] [-p] 
@var{filename}
 
 Commit the changes recorded in @var{filename} in its base image or backing 
file.
 If the backing file is smaller than the snapshot, then the backing file will be
@@ -176,6 +176,13 @@ The image @var{filename} is emptied after the operation 
has succeeded. If you do
 not need @var{filename} afterwards and intend to drop it, you may skip emptying
 @var{filename} by specifying the @code{-d} flag.
 
+If the backing chain of the given image file @var{filename} has more than one
+layer, the backing file into which the changes will be committed may be
+specified as @var{base} (which has to be part of @var{filename}'s backing
+chain). If @var{base} is not specified, the immediate backing file of the top
+image (which is @var{filename}) will be used. For reasons of consistency,
+explicitly specifying @var{base} will always imply @code{-d}.
+
 @item compare [-f @var{fmt}] [-F @var{fmt}] [-p] [-s] [-q] @var{filename1} 
@var{filename2}
 
 Check if two images have the same content. You can compare images with
-- 
2.0.1




[Qemu-devel] [PATCH v9 01/14] qcow2: Allow "full" discard

2014-07-05 Thread Max Reitz
Normally, discarded sectors should read back as zero. However, there are
cases in which a sector (or rather cluster) should be discarded as if
they were never written in the first place, that is, reading them should
fall through to the backing file again.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
Reviewed-by: Kevin Wolf 
---
 block/qcow2-cluster.c  | 27 +--
 block/qcow2-snapshot.c |  2 +-
 block/qcow2.c  |  2 +-
 block/qcow2.h  |  2 +-
 4 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 4208dc0..ce52f9b 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1351,7 +1351,7 @@ int qcow2_decompress_cluster(BlockDriverState *bs, 
uint64_t cluster_offset)
  * clusters.
  */
 static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
-unsigned int nb_clusters, enum qcow2_discard_type type)
+unsigned int nb_clusters, enum qcow2_discard_type type, bool full_discard)
 {
 BDRVQcowState *s = bs->opaque;
 uint64_t *l2_table;
@@ -1373,23 +1373,30 @@ static int discard_single_l2(BlockDriverState *bs, 
uint64_t offset,
 old_l2_entry = be64_to_cpu(l2_table[l2_index + i]);
 
 /*
- * Make sure that a discarded area reads back as zeroes for v3 images
- * (we cannot do it for v2 without actually writing a zero-filled
- * buffer). We can skip the operation if the cluster is already marked
- * as zero, or if it's unallocated and we don't have a backing file.
+ * If full_discard is false, make sure that a discarded area reads back
+ * as zeroes for v3 images (we cannot do it for v2 without actually
+ * writing a zero-filled buffer). We can skip the operation if the
+ * cluster is already marked as zero, or if it's unallocated and we
+ * don't have a backing file.
  *
  * TODO We might want to use bdrv_get_block_status(bs) here, but we're
  * holding s->lock, so that doesn't work today.
+ *
+ * If full_discard is true, the sector should not read back as zeroes,
+ * but rather fall through to the backing file.
  */
 switch (qcow2_get_cluster_type(old_l2_entry)) {
 case QCOW2_CLUSTER_UNALLOCATED:
-if (!bs->backing_hd) {
+if (full_discard || !bs->backing_hd) {
 continue;
 }
 break;
 
 case QCOW2_CLUSTER_ZERO:
-continue;
+if (!full_discard) {
+continue;
+}
+break;
 
 case QCOW2_CLUSTER_NORMAL:
 case QCOW2_CLUSTER_COMPRESSED:
@@ -1401,7 +1408,7 @@ static int discard_single_l2(BlockDriverState *bs, 
uint64_t offset,
 
 /* First remove L2 entries */
 qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
-if (s->qcow_version >= 3) {
+if (!full_discard && s->qcow_version >= 3) {
 l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
 } else {
 l2_table[l2_index + i] = cpu_to_be64(0);
@@ -1420,7 +1427,7 @@ static int discard_single_l2(BlockDriverState *bs, 
uint64_t offset,
 }
 
 int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
-int nb_sectors, enum qcow2_discard_type type)
+int nb_sectors, enum qcow2_discard_type type, bool full_discard)
 {
 BDRVQcowState *s = bs->opaque;
 uint64_t end_offset;
@@ -1443,7 +1450,7 @@ int qcow2_discard_clusters(BlockDriverState *bs, uint64_t 
offset,
 
 /* Each L2 table is handled by its own loop iteration */
 while (nb_clusters > 0) {
-ret = discard_single_l2(bs, offset, nb_clusters, type);
+ret = discard_single_l2(bs, offset, nb_clusters, type, full_discard);
 if (ret < 0) {
 goto fail;
 }
diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 0aa9def..c5ea2cd 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -436,7 +436,7 @@ int qcow2_snapshot_create(BlockDriverState *bs, 
QEMUSnapshotInfo *sn_info)
 qcow2_discard_clusters(bs, qcow2_vm_state_offset(s),
align_offset(sn->vm_state_size, s->cluster_size)
 >> BDRV_SECTOR_BITS,
-   QCOW2_DISCARD_NEVER);
+   QCOW2_DISCARD_NEVER, false);
 
 #ifdef DEBUG_ALLOC
 {
diff --git a/block/qcow2.c b/block/qcow2.c
index 67e55c9..ed65679 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1868,7 +1868,7 @@ static coroutine_fn int qcow2_co_discard(BlockDriverState 
*bs,
 
 qemu_co_mutex_lock(&s->lock);
 ret = qcow2_discard_clusters(bs, sector_num << BDRV_SECTOR_BITS,
-nb_sectors, QCOW2_DISCARD_REQUEST);
+nb_sectors, QCOW2_DISCARD_REQUEST, false);
 qemu_co_mutex_unlock(&s->lock);
 return ret;
 }
diff --git a/block/qcow2.h b/block/qcow2.h
i

[Qemu-devel] [PATCH v9 14/14] iotests: Omit length/offset test in 040 and 041

2014-07-05 Thread Max Reitz
As the length of a mirror block job no longer directly depends on the
size of the block device, drop those checks from this test. Instead,
just check whether the final offset equals the block job length.

As 041 uses the wait_until_completed function from iotests.py, the same
applies there as well which in turn affects tests 030, 055 and 056. On
the other hand, a block job's length does not have to be related to the
length of the image file in the first place, so that check was
questionable anyway.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 tests/qemu-iotests/040| 4 +---
 tests/qemu-iotests/041| 3 +--
 tests/qemu-iotests/iotests.py | 3 +--
 3 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index f1e16c1..2b432ad 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -43,8 +43,7 @@ class ImageCommitTestCase(iotests.QMPTestCase):
 if event['event'] == 'BLOCK_JOB_COMPLETED':
 self.assert_qmp(event, 'data/type', 'commit')
 self.assert_qmp(event, 'data/device', 'drive0')
-self.assert_qmp(event, 'data/offset', self.image_len)
-self.assert_qmp(event, 'data/len', self.image_len)
+self.assert_qmp(event, 'data/offset', event['data']['len'])
 if need_ready:
 self.assertTrue(ready, "Expecting BLOCK_JOB_COMPLETED 
event")
 completed = True
@@ -52,7 +51,6 @@ class ImageCommitTestCase(iotests.QMPTestCase):
 ready = True
 self.assert_qmp(event, 'data/type', 'commit')
 self.assert_qmp(event, 'data/device', 'drive0')
-self.assert_qmp(event, 'data/len', self.image_len)
 self.vm.qmp('block-job-complete', device='drive0')
 
 self.assert_no_active_block_jobs()
diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index 0815e19..d8bb76c 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -52,8 +52,7 @@ class ImageMirroringTestCase(iotests.QMPTestCase):
 event = self.cancel_and_wait(drive=drive)
 self.assertEquals(event['event'], 'BLOCK_JOB_COMPLETED')
 self.assert_qmp(event, 'data/type', 'mirror')
-self.assert_qmp(event, 'data/offset', self.image_len)
-self.assert_qmp(event, 'data/len', self.image_len)
+self.assert_qmp(event, 'data/offset', event['data']['len'])
 
 def complete_and_wait(self, drive='drive0', wait_ready=True):
 '''Complete a block job and wait for it to finish'''
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 39a4cfc..f57f154 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -267,8 +267,7 @@ class QMPTestCase(unittest.TestCase):
 self.assert_qmp(event, 'data/device', drive)
 self.assert_qmp_absent(event, 'data/error')
 if check_offset:
-self.assert_qmp(event, 'data/offset', self.image_len)
-self.assert_qmp(event, 'data/len', self.image_len)
+self.assert_qmp(event, 'data/offset', 
event['data']['len'])
 completed = True
 
 self.assert_no_active_block_jobs()
-- 
2.0.1




[Qemu-devel] [PATCH v9 09/14] qemu-img: Enable progress output for commit

2014-07-05 Thread Max Reitz
Implement progress output for the commit command by querying the
progress of the block job.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 qemu-img-cmds.hx |  4 ++--
 qemu-img.c   | 24 ++--
 qemu-img.texi|  2 +-
 3 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index b31d81c..ea41d4f 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -22,9 +22,9 @@ STEXI
 ETEXI
 
 DEF("commit", img_commit,
-"commit [-q] [-f fmt] [-t cache] [-d] filename")
+"commit [-q] [-f fmt] [-t cache] [-d] [-p] filename")
 STEXI
-@item commit [-q] [-f @var{fmt}] [-t @var{cache}] [-d] @var{filename}
+@item commit [-q] [-f @var{fmt}] [-t @var{cache}] [-d] [-p] @var{filename}
 ETEXI
 
 DEF("compare", img_compare,
diff --git a/qemu-img.c b/qemu-img.c
index 47cc94e..8409062 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -742,12 +742,18 @@ static void run_block_job(BlockJob *job, Error **errp)
 do {
 aio_poll(aio_context, true);
 
+qemu_progress_print((float)job->offset / job->len * 100.f, 0);
+
 if (!job->busy && !job->ready) {
 block_job_resume(job);
 }
 } while (!job->ready);
 
 block_job_complete_sync(job, errp);
+
+/* A block job may finish instantaneously without publishing any progress,
+ * so just signal completion here */
+qemu_progress_print(100.f, 0);
 }
 
 static int img_commit(int argc, char **argv)
@@ -755,14 +761,14 @@ static int img_commit(int argc, char **argv)
 int c, ret, flags;
 const char *filename, *fmt, *cache;
 BlockDriverState *bs, *base_bs;
-bool quiet = false, drop = false;
+bool progress = false, quiet = false, drop = false;
 Error *local_err = NULL;
 CommonBlockJobCBInfo cbi;
 
 fmt = NULL;
 cache = BDRV_DEFAULT_CACHE;
 for(;;) {
-c = getopt(argc, argv, "f:ht:dq");
+c = getopt(argc, argv, "f:ht:dpq");
 if (c == -1) {
 break;
 }
@@ -780,11 +786,20 @@ static int img_commit(int argc, char **argv)
 case 'd':
 drop = true;
 break;
+case 'p':
+progress = true;
+break;
 case 'q':
 quiet = true;
 break;
 }
 }
+
+/* Progress is not shown in Quiet mode */
+if (quiet) {
+progress = false;
+}
+
 if (optind != argc - 1) {
 error_exit("Expecting one image file name");
 }
@@ -802,6 +817,9 @@ static int img_commit(int argc, char **argv)
 return 1;
 }
 
+qemu_progress_init(progress, 1.f);
+qemu_progress_print(0.f, 100);
+
 /* This is different from QMP, which by default uses the deepest file in 
the
  * backing chain (i.e., the very base); however, the traditional behavior 
of
  * qemu-img commit is using the immediate backing file. */
@@ -850,6 +868,8 @@ unref_backing:
 }
 
 done:
+qemu_progress_end();
+
 bdrv_unref(bs);
 
 if (local_err) {
diff --git a/qemu-img.texi b/qemu-img.texi
index fd8ecd6..5c151af 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -163,7 +163,7 @@ this case. @var{backing_file} will never be modified unless 
you use the
 The size can also be specified using the @var{size} option with @code{-o},
 it doesn't need to be specified separately in this case.
 
-@item commit [-f @var{fmt}] [-t @var{cache}] [-d] @var{filename}
+@item commit [-q] [-f @var{fmt}] [-t @var{cache}] [-d] [-p] @var{filename}
 
 Commit the changes recorded in @var{filename} in its base image or backing 
file.
 If the backing file is smaller than the snapshot, then the backing file will be
-- 
2.0.1




[Qemu-devel] [PATCH v9 13/14] iotests: Add test for qcow2's bdrv_make_empty

2014-07-05 Thread Max Reitz
Add a test for qcow2's fast bdrv_make_empty implementation on images
without internal snapshots.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/098 | 75 ++
 tests/qemu-iotests/098.out | 26 
 tests/qemu-iotests/group   |  1 +
 3 files changed, 102 insertions(+)
 create mode 100755 tests/qemu-iotests/098
 create mode 100644 tests/qemu-iotests/098.out

diff --git a/tests/qemu-iotests/098 b/tests/qemu-iotests/098
new file mode 100755
index 000..116f935
--- /dev/null
+++ b/tests/qemu-iotests/098
@@ -0,0 +1,75 @@
+#!/bin/bash
+#
+# Test qcow2's bdrv_make_empty for images without internal snapshots
+#
+# Copyright (C) 2014 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=mre...@redhat.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+tmp=/tmp/$$
+status=1   # failure is the default!
+
+_cleanup()
+{
+_cleanup_test_img
+rm -f "$TEST_DIR/blkdebug.conf"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.pattern
+
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+
+for event in l1_update reftable_update; do
+
+echo
+echo "=== $event ==="
+echo
+
+TEST_IMG="$TEST_IMG.base" _make_test_img 64M
+_make_test_img -b "$TEST_IMG.base" 64M
+
+cat > "$TEST_DIR/blkdebug.conf" <&1 \
+| _filter_testdir | _filter_imgfmt
+
+_check_test_img
+
+done
+
+
+rm -f "$TEST_DIR/blkdebug.conf"
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/098.out b/tests/qemu-iotests/098.out
new file mode 100644
index 000..5f5af1b
--- /dev/null
+++ b/tests/qemu-iotests/098.out
@@ -0,0 +1,26 @@
+QA output created by 098
+
+=== l1_update ===
+
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=67108864 
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
backing_file='TEST_DIR/t.IMGFMT.base' 
+qemu-img: Could not empty blkdebug:TEST_DIR/blkdebug.conf:TEST_DIR/t.IMGFMT: 
Input/output error
+Leaked cluster 4 refcount=1 reference=0
+Leaked cluster 5 refcount=1 reference=0
+Leaked cluster 6 refcount=1 reference=0
+
+3 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
+
+=== reftable_update ===
+
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=67108864 
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
backing_file='TEST_DIR/t.IMGFMT.base' 
+qemu-img: Could not empty blkdebug:TEST_DIR/blkdebug.conf:TEST_DIR/t.IMGFMT: 
Input/output error
+Leaked cluster 3 refcount=1 reference=0
+Leaked cluster 4 refcount=1 reference=0
+Leaked cluster 5 refcount=1 reference=0
+
+3 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index bdf04cd..dfbd0ed 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -101,3 +101,4 @@
 092 rw auto quick
 095 rw auto quick
 097 rw auto backing
+098 rw auto backing quick
-- 
2.0.1




[Qemu-devel] [PATCH v9 08/14] qemu-img: Empty image after commit

2014-07-05 Thread Max Reitz
After the top image has been committed, it should be emptied unless
specified otherwise.

Signed-off-by: Max Reitz 
Reviewed-by: Eric Blake 
---
 qemu-img-cmds.hx |  4 ++--
 qemu-img.c   | 34 +++---
 qemu-img.texi|  6 +-
 3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index d029609..b31d81c 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -22,9 +22,9 @@ STEXI
 ETEXI
 
 DEF("commit", img_commit,
-"commit [-q] [-f fmt] [-t cache] filename")
+"commit [-q] [-f fmt] [-t cache] [-d] filename")
 STEXI
-@item commit [-q] [-f @var{fmt}] [-t @var{cache}] @var{filename}
+@item commit [-q] [-f @var{fmt}] [-t @var{cache}] [-d] @var{filename}
 ETEXI
 
 DEF("compare", img_compare,
diff --git a/qemu-img.c b/qemu-img.c
index 26d8b11..47cc94e 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -755,14 +755,14 @@ static int img_commit(int argc, char **argv)
 int c, ret, flags;
 const char *filename, *fmt, *cache;
 BlockDriverState *bs, *base_bs;
-bool quiet = false;
+bool quiet = false, drop = false;
 Error *local_err = NULL;
 CommonBlockJobCBInfo cbi;
 
 fmt = NULL;
 cache = BDRV_DEFAULT_CACHE;
 for(;;) {
-c = getopt(argc, argv, "f:ht:q");
+c = getopt(argc, argv, "f:ht:dq");
 if (c == -1) {
 break;
 }
@@ -777,6 +777,9 @@ static int img_commit(int argc, char **argv)
 case 't':
 cache = optarg;
 break;
+case 'd':
+drop = true;
+break;
 case 'q':
 quiet = true;
 break;
@@ -787,7 +790,7 @@ static int img_commit(int argc, char **argv)
 }
 filename = argv[optind++];
 
-flags = BDRV_O_RDWR;
+flags = BDRV_O_RDWR | BDRV_O_UNMAP;
 ret = bdrv_parse_cache_flags(cache, &flags);
 if (ret < 0) {
 error_report("Invalid cache option: %s", cache);
@@ -819,7 +822,32 @@ static int img_commit(int argc, char **argv)
 goto done;
 }
 
+/* The block job will swap base_bs and bs (which is not what we really want
+ * here, but okay) and unref base_bs (after the swap, i.e., the old top
+ * image). In order to still be able to empty that top image afterwards,
+ * increment the reference counter here preemptively. */
+if (!drop) {
+bdrv_ref(base_bs);
+}
+
 run_block_job(bs->job, &local_err);
+if (local_err) {
+goto unref_backing;
+}
+
+if (!drop && base_bs->drv->bdrv_make_empty) {
+ret = base_bs->drv->bdrv_make_empty(base_bs);
+if (ret) {
+error_setg_errno(&local_err, -ret, "Could not empty %s",
+ filename);
+goto unref_backing;
+}
+}
+
+unref_backing:
+if (!drop) {
+bdrv_unref(base_bs);
+}
 
 done:
 bdrv_unref(bs);
diff --git a/qemu-img.texi b/qemu-img.texi
index 8496f3b..fd8ecd6 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -163,7 +163,7 @@ this case. @var{backing_file} will never be modified unless 
you use the
 The size can also be specified using the @var{size} option with @code{-o},
 it doesn't need to be specified separately in this case.
 
-@item commit [-f @var{fmt}] [-t @var{cache}] @var{filename}
+@item commit [-f @var{fmt}] [-t @var{cache}] [-d] @var{filename}
 
 Commit the changes recorded in @var{filename} in its base image or backing 
file.
 If the backing file is smaller than the snapshot, then the backing file will be
@@ -172,6 +172,10 @@ the backing file, the backing file will not be truncated.  
If you want the
 backing file to match the size of the smaller snapshot, you can safely truncate
 it yourself once the commit operation successfully completes.
 
+The image @var{filename} is emptied after the operation has succeeded. If you 
do
+not need @var{filename} afterwards and intend to drop it, you may skip emptying
+@var{filename} by specifying the @code{-d} flag.
+
 @item compare [-f @var{fmt}] [-F @var{fmt}] [-p] [-s] [-q] @var{filename1} 
@var{filename2}
 
 Check if two images have the same content. You can compare images with
-- 
2.0.1




[Qemu-devel] [PATCH v9 00/14] qemu-img: Implement commit like QMP

2014-07-05 Thread Max Reitz
qemu-img should use QMP commands whenever possible in order to ensure
feature completeness of both online and offline image operations. For
the "commit" command, this is relatively easy, so implement it first
(in the hope that indeed others will follow).

As qemu-img does not have access to QMP (due to QMP being intertwined
with basically everything in qemu), we cannot directly use QMP, but at
least use the functions the corresponding QMP commands are using (which
would be "block-commit", in this case).


I'll still take a look on Kevin's proposal for patch 3 (make use of lazy
refcounts to simplify emptying a snapshot-less qcow2 image), but I still
think we'll need the size-calculation function anyway; that is why I
sent this version which still uses that function and no lazy refcounts.

In case using lazy refcounts turns out to be vastly easier (which I
actually think it will), I'll send an alternative for patch 3; the other
patches should be untouched by the decision which version is used
(except probably the test from patch 13).


v9: Rebase on master
- Patch 5: block_job_ready() is now called block_job_event_ready() and
  has been changed in the meantime
- Patch 6: rebased on the fixes for zero-length mirrors
- Patch 12: test number 094 is now taken as well, move to 097
- Patch 13:
  - test number 095 is taken, move to 098
  - remove blkdebug.conf when test is done
  - filter out test directory and image format from the commit output


Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/14:[] [--] 'qcow2: Allow "full" discard'
002/14:[] [--] 'qcow2: Implement bdrv_make_empty()'
003/14:[] [--] 'qcow2: Optimize bdrv_make_empty()'
004/14:[] [--] 'blockjob: Introduce block_job_complete_sync()'
005/14:[0004] [FC] 'blockjob: Add "ready" field'
006/14:[0006] [FC] 'block/mirror: Improve progress report'
007/14:[] [--] 'qemu-img: Implement commit like QMP'
008/14:[] [--] 'qemu-img: Empty image after commit'
009/14:[] [--] 'qemu-img: Enable progress output for commit'
010/14:[] [--] 'qemu-img: Specify backing file for commit'
011/14:[] [--] 'iotests: Add _filter_qemu_img_map'
012/14:[0004] [FC] 'iotests: Add test for backing-chain commits'
013/14:[0013] [FC] 'iotests: Add test for qcow2's bdrv_make_empty'
014/14:[] [-C] 'iotests: Omit length/offset test in 040 and 041'


Max Reitz (14):
  qcow2: Allow "full" discard
  qcow2: Implement bdrv_make_empty()
  qcow2: Optimize bdrv_make_empty()
  blockjob: Introduce block_job_complete_sync()
  blockjob: Add "ready" field
  block/mirror: Improve progress report
  qemu-img: Implement commit like QMP
  qemu-img: Empty image after commit
  qemu-img: Enable progress output for commit
  qemu-img: Specify backing file for commit
  iotests: Add _filter_qemu_img_map
  iotests: Add test for backing-chain commits
  iotests: Add test for qcow2's bdrv_make_empty
  iotests: Omit length/offset test in 040 and 041

 block/Makefile.objs  |   2 +-
 block/mirror.c   |  34 ++--
 block/qcow2-cluster.c|  27 ++-
 block/qcow2-snapshot.c   |   2 +-
 block/qcow2.c| 388 ++-
 block/qcow2.h|   2 +-
 blockjob.c   |  42 -
 include/block/blockjob.h |  20 ++
 qapi/block-core.json |   4 +-
 qemu-img-cmds.hx |   4 +-
 qemu-img.c   | 149 ---
 qemu-img.texi|  13 +-
 tests/qemu-iotests/040   |   4 +-
 tests/qemu-iotests/041   |   3 +-
 tests/qemu-iotests/097   | 122 
 tests/qemu-iotests/097.out   | 119 
 tests/qemu-iotests/098   |  75 
 tests/qemu-iotests/098.out   |  26 +++
 tests/qemu-iotests/common.filter |   7 +
 tests/qemu-iotests/group |   2 +
 tests/qemu-iotests/iotests.py|   3 +-
 21 files changed, 981 insertions(+), 67 deletions(-)
 create mode 100755 tests/qemu-iotests/097
 create mode 100644 tests/qemu-iotests/097.out
 create mode 100755 tests/qemu-iotests/098
 create mode 100644 tests/qemu-iotests/098.out

-- 
2.0.1




[Qemu-devel] virtualize sparc developer workstation?

2014-07-05 Thread dennis luehring

i want to virtualize my noisy sparc workstation
with the help of qemu and want to know if its possible to
get an system image to run or what other options available
for sparc virtualization

im developing a low-traffic network communication software
and sparc is one of my test platforms

is the sparc emulation good enough for
network communication tests?

my current system specs:
-SUN Ultra 10 SPARC Workstation
-384 MB RAM
-Operating System: Solaris 10 November 2006 - SunOS Release 5.10 
Generic_118833-33 64-bit

-Desktop: CDE 1.6.3, X11 Version 6.6.2
-Compiler: SunStudio 12 - Sun C/C++ v5.9 2007/05/03 (and gcc v.3.4.6)
-Workstation-Info: SUNW,Ultra-5_10;sparc;sun4u

it would be nice to have the desktop running
but i can compile & test my software fully on command line

thx for any help




[Qemu-devel] virtualize sparc developer workstation?

2014-07-05 Thread dennis luehring

i want to virtualize (under a linux x86 host) my noisy sparc workstation
with the help of qemu and want to know if its possible to
get an system image to run or what other options available
for sparc virtualization

im developing a low-traffic network communication software
and sparc is one of my test platforms

is the sparc emulation good enough for
(low-traffic) network communication tests?

my current system specs:
-SUN Ultra 10 SPARC Workstation
-384 MB RAM
-Operating System: Solaris 10 November 2006 - SunOS Release 5.10
Generic_118833-33 64-bit
-Desktop: CDE 1.6.3, X11 Version 6.6.2
-Compiler: SunStudio 12 - Sun C/C++ v5.9 2007/05/03 (and gcc v.3.4.6)
-Workstation-Info: SUNW,Ultra-5_10;sparc;sun4u

it would be nice to have the desktop running
but i can compile & test my software fully on command line

questions are:
-can i boot an image of my machine into qemu?
-what bios do i need - OpenBios, OpenBoot, original (how can i get this 
from my machine?)

-use qemu git head or other version?

thx for any help





Re: [Qemu-devel] [PATCH] compile QEMU with Xen support on ARM

2014-07-05 Thread Peter Maydell
On 5 July 2014 02:53, Stefano Stabellini
 wrote:
> I admit that this small series was lost in my mailbox. I had an older
> version of the build fixes patch and started from there again.
> Are you OK with v2 as per your link? It applies almost as is, except for
> the renaming xen-all.c -> xen-hvm.c.
> If your are OK with it, I'll rebase and resend with a PULL request?

Yes, that's fine. You can add my
Reviewed-by: Peter Maydell 

thanks
-- PMM



Re: [Qemu-devel] [PATCH 00/46] Postcopy implementation

2014-07-05 Thread Paolo Bonzini

Il 04/07/2014 19:41, Dr. David Alan Gilbert (git) ha scritto:

   e) I've added a 'migration_set_parameter' command as somewhere to put integer
  parameters associated with migration.
  e.1) And I use that initially for the length of precopy to try.


Could you have instead a "migrate_start_postcopy" command, and leave the 
policy to management instead?


We could also (later) add an event for the end of the migration bulk 
phase, that can help management deciding when to switch?


Paolo



Re: [Qemu-devel] [PATCH 15/46] Rework loadvm path for subloops

2014-07-05 Thread Paolo Bonzini

Il 04/07/2014 19:41, Dr. David Alan Gilbert (git) ha scritto:

From: "Dr. David Alan Gilbert" 

Postcopy needs to have two migration streams loading concurrently;
one from memory (with the device state) and the other from the fd
with the memory transactions.


Can you explain this?

I would have though the order is

precopy RAM and everything
prepare postcopy RAM ("sent && dirty" bitmap)
finish precopy non-RAM
finish devices
postcopy RAM

Why do you need to have all the packaging stuff and a separate 
memory-based migration stream for devices?  I'm sure I'm missing 
something. :)


Paolo



Re: [Qemu-devel] [PATCH 31/46] Postcopy: Rework migration thread for postcopy mode

2014-07-05 Thread Paolo Bonzini

Il 04/07/2014 19:41, Dr. David Alan Gilbert (git) ha scritto:

From: "Dr. David Alan Gilbert" 

Switch to postcopy if:
   1) There's still a significant amount to transfer
   2) Postcopy is enabled
   3) It's taken longer than the time set by the parameter.

and change the cleanup at the end of migration to match.

Signed-off-by: Dr. David Alan Gilbert 
---
 migration.c | 92 -
 1 file changed, 73 insertions(+), 19 deletions(-)

diff --git a/migration.c b/migration.c
index 0d567ef..c73fcfa 100644
--- a/migration.c
+++ b/migration.c
@@ -982,16 +982,40 @@ static int postcopy_start(MigrationState *ms)
 static void *migration_thread(void *opaque)
 {
 MigrationState *s = opaque;
+/* Used by the bandwidth calcs, updated later */
 int64_t initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+/* Really, the time we started */
+const int64_t initial_time_fixed = initial_time;
 int64_t setup_start = qemu_clock_get_ms(QEMU_CLOCK_HOST);
 int64_t initial_bytes = 0;
 int64_t max_size = 0;
 int64_t start_time = initial_time;
+int64_t pc_start_time;
+
 bool old_vm_running = false;
+pc_start_time = 
s->tunables[MIGRATION_PARAMETER_NAME_X_POSTCOPY_START_TIME];
+
+/* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
+enum MigrationPhase current_active_type = MIG_STATE_ACTIVE;

 qemu_savevm_state_begin(s->file, &s->params);

+if (migrate_postcopy_ram()) {
+/* Now tell the dest that it should open it's end so it can reply */
+qemu_savevm_send_openrp(s->file);
+
+/* And ask it to send an ack that will make stuff easier to debug */
+qemu_savevm_send_reqack(s->file, 1);
+
+/* Tell the destination that we *might* want to do postcopy later;
+ * if the other end can't do postcopy it should fail now, nice and
+ * early.
+ */
+qemu_savevm_send_postcopy_ram_advise(s->file);
+}
+
 s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
+current_active_type = MIG_STATE_ACTIVE;
 migrate_set_state(s, MIG_STATE_SETUP, MIG_STATE_ACTIVE);

 DPRINTF("setup complete\n");
@@ -1012,37 +1036,66 @@ static void *migration_thread(void *opaque)
 " nonpost=%" PRIu64 ")\n",
 pending_size, max_size, pend_post, pend_nonpost);
 if (pending_size && pending_size >= max_size) {
-qemu_savevm_state_iterate(s->file);
+/* Still a significant amount to transfer */
+
+current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+if (migrate_postcopy_ram() &&
+s->state != MIG_STATE_POSTCOPY_ACTIVE &&
+pend_nonpost == 0 &&
+(current_time >= initial_time_fixed + pc_start_time)) {
+
+if (!postcopy_start(s)) {
+current_active_type = MIG_STATE_POSTCOPY_ACTIVE;
+}
+
+continue;
+} else {


You don't really need the "else" if you have a continue.  However, do 
you need _any_ of the "else" and "continue"?  Would the next iteration 
of the "while" loop do anything else but invoking qemu_savevm_state_iterate.



+/* Just another iteration step */
+qemu_savevm_state_iterate(s->file);
+}
 } else {
 int ret;

-DPRINTF("done iterating\n");
-qemu_mutex_lock_iothread();
-start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
-qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
-old_vm_running = runstate_is_running();
-
-ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
-if (ret >= 0) {
-qemu_file_set_rate_limit(s->file, INT64_MAX);
-qemu_savevm_state_complete(s->file);
-}
-qemu_mutex_unlock_iothread();
-
-if (ret < 0) {
-migrate_set_state(s, MIG_STATE_ACTIVE, MIG_STATE_ERROR);
-break;
+DPRINTF("done iterating pending size %" PRIu64 "\n",
+pending_size);
+
+if (s->state == MIG_STATE_ACTIVE) {
+qemu_mutex_lock_iothread();
+start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
+old_vm_running = runstate_is_running();
+
+ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+if (ret >= 0) {
+qemu_file_set_rate_limit(s->file, INT64_MAX);
+qemu_savevm_state_complete(s->file);
+}
+qemu_mutex_unlock_iothread();
+if (ret < 0) {
+   

Re: [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets

2014-07-05 Thread Paolo Bonzini

Il 04/07/2014 19:41, Dr. David Alan Gilbert (git) ha scritto:

From: "Dr. David Alan Gilbert" 

Postcopy needs a method to send messages from the destination back to
the source, this is the 'return path'.

Wire it up for 'socket' QEMUFile's using a dup'd fd.

Signed-off-by: Dr. David Alan Gilbert 
---
 include/migration/qemu-file.h |  8 +
 qemu-file.c   | 74 +++
 2 files changed, 76 insertions(+), 6 deletions(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index df38646..ec1a342 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -87,6 +87,11 @@ typedef size_t (QEMURamSaveFunc)(QEMUFile *f, void *opaque,
 #define QEMUFILE_IO_BUF_SIZE 32768
 #define QEMUFILE_MAX_IOV_SIZE MIN(IOV_MAX, 64)

+/*
+ * Return a QEMUFile for comms in the opposite direction
+ */
+typedef QEMUFile *(QEMURetPathFunc)(void *opaque);
+
 typedef struct QEMUFileOps {
 QEMUFilePutBufferFunc *put_buffer;
 QEMUFileGetBufferFunc *get_buffer;
@@ -97,6 +102,7 @@ typedef struct QEMUFileOps {
 QEMURamHookFunc *after_ram_iterate;
 QEMURamHookFunc *hook_ram_load;
 QEMURamSaveFunc *save_page;
+QEMURetPathFunc *get_return_path;
 } QEMUFileOps;

 struct QEMUFile {
@@ -117,6 +123,7 @@ struct QEMUFile {

 int last_error;

+struct QEMUFile *return_path;
 MigrationIncomingState *mis;
 };

@@ -202,6 +209,7 @@ void qemu_file_set_rate_limit(QEMUFile *f, int64_t 
new_rate);
 int64_t qemu_file_get_rate_limit(QEMUFile *f);
 int qemu_file_get_error(QEMUFile *f);
 void qemu_file_set_error(QEMUFile *f, int ret);
+QEMUFile *qemu_file_get_return_path(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);

 static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
diff --git a/qemu-file.c b/qemu-file.c
index 88cacc7..98a6d2a 100644
--- a/qemu-file.c
+++ b/qemu-file.c
@@ -16,6 +16,54 @@ typedef struct QEMUFileSocket {
 QEMUFile *file;
 } QEMUFileSocket;

+/* Give a QEMUFile* off the same socket but data in the opposite
+ * direction.
+ * qemu_fopen_socket marks write fd's as blocking, but doesn't
+ * touch read fd's status, so we dup the fd just to keep settings
+ * separate. [TBD: Do I need to explicitly mark as non-block on read?]
+ */
+static QEMUFile *socket_dup_return_path(void *opaque)
+{
+QEMUFileSocket *qfs = opaque;
+int revfd;
+bool this_is_read;
+QEMUFile *result;
+
+/* If it's already open, return it */
+if (qfs->file->return_path) {
+return qfs->file->return_path;


Wouldn't this leave a dangling file descriptor if you call 
socket_dup_return_path twice, and then close the original QEMUFile?



+}
+
+if (qemu_file_get_error(qfs->file)) {
+/* If the forward file is in error, don't try and open a return */
+return NULL;
+}
+
+/* I don't think there's a better way to tell which direction 'this' is */
+this_is_read = qfs->file->ops->get_buffer != NULL;
+
+revfd = dup(qfs->fd);
+if (revfd == -1) {
+error_report("Error duplicating fd for return path: %s",
+  strerror(errno));
+return NULL;
+}
+
+qemu_set_nonblock(revfd);


Blocking/nonblocking is per-file *description*, not descriptor.  So 
you're making the original QEMUFile nonblocking as well.  Can you 
explain why this is needed before I reach the meat of the patch series?


In other words, can you draw a table with source/dest and read/write, 
and whether it should be blocking or non-blocking?


Paolo


+result = qemu_fopen_socket(revfd, this_is_read ? "wb" : "rb");
+qfs->file->return_path = result;
+
+if (result) {
+/* We are the reverse path of our reverse path (although I don't
+   expect this to be used, it would stop another dup if it was */
+result->return_path = qfs->file;
+} else {
+close(revfd);
+}
+
+return result;
+}
+
 static ssize_t socket_writev_buffer(void *opaque, struct iovec *iov, int 
iovcnt,
 int64_t pos)
 {
@@ -313,17 +361,31 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
 }

 static const QEMUFileOps socket_read_ops = {
-.get_fd = socket_get_fd,
-.get_buffer = socket_get_buffer,
-.close =  socket_close
+.get_fd  = socket_get_fd,
+.get_buffer  = socket_get_buffer,
+.close   = socket_close,
+.get_return_path = socket_dup_return_path
 };

 static const QEMUFileOps socket_write_ops = {
-.get_fd = socket_get_fd,
-.writev_buffer = socket_writev_buffer,
-.close =  socket_close
+.get_fd  = socket_get_fd,
+.writev_buffer   = socket_writev_buffer,
+.close   = socket_close,
+.get_return_path = socket_dup_return_path
 };

+/*
+ * Result: QEMUFile* for a 'return path' for comms in the opposite direction
+ * NULL if not available
+ */
+QEMUFile *qemu_file_get_return_path(QEMUFile *f)
+{
+if (!f->ops->get_retu

Re: [Qemu-devel] [PATCH 08/46] Return path: socket_writev_buffer: Block even on non-blocking fd's

2014-07-05 Thread Paolo Bonzini

Il 04/07/2014 19:41, Dr. David Alan Gilbert (git) ha scritto:

From: "Dr. David Alan Gilbert" 

The return path uses a non-blocking fd so as not to block waiting
for the (possibly broken) destination to finish returning a message,
however we still want outbound data to behave in the same way and block.

Signed-off-by: Dr. David Alan Gilbert 
---
 qemu-file.c | 39 +++
 1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/qemu-file.c b/qemu-file.c
index 98a6d2a..9809428 100644
--- a/qemu-file.c
+++ b/qemu-file.c
@@ -70,12 +70,43 @@ static ssize_t socket_writev_buffer(void *opaque, struct 
iovec *iov, int iovcnt,
 QEMUFileSocket *s = opaque;
 ssize_t len;
 ssize_t size = iov_size(iov, iovcnt);
+ssize_t offset = 0;
+int err;

-len = iov_send(s->fd, iov, iovcnt, 0, size);
-if (len < size) {
-len = -socket_error();
+while (size > 0) {
+len = iov_send(s->fd, iov, iovcnt, offset, size);
+
+if (len > 0) {
+size -= len;
+offset += len;
+}
+
+if (size > 0) {
+err = socket_error();
+
+if (err != EAGAIN) {
+error_report("socket_writev_buffer: Got err=%d for (%zd/%zd)",
+ err, size, len);
+/*
+ * If I've already sent some but only just got the error, I
+ * could return the amount validly sent so far and wait for the
+ * next call to report the error, but I'd rather flag the error
+ * immediately.
+ */
+return -err;
+}
+
+/* Emulate blocking */
+GPollFD pfd;
+
+pfd.fd = s->fd;
+pfd.events = G_IO_OUT | G_IO_ERR;
+pfd.revents = 0;
+g_poll(&pfd, 1 /* 1 fd */, -1 /* no timeout */);
+}
 }
-return len;
+
+return offset;
 }

 static int socket_get_fd(void *opaque)



I guess the table I just asked about would help clarifying this as well. 
 Also note that g_poll doesn't work on sockets on Windows, but I hope 
we can avoid it altogether. :)


Paolo



Re: [Qemu-devel] [PATCH for-2.1] target-i386: Add "kvmclock-stable-bit" feature bit name

2014-07-05 Thread Paolo Bonzini

Il 04/07/2014 21:44, Eduardo Habkost ha scritto:

KVM_FEATURE_CLOCKSOURCE_STABLE_BIT is enabled by default and supported
by KVM. But not having a name defined makes QEMU treat it as an unknown
and unmigratable feature flag (as any unknown feature may possibly
require state to be migrated), and disable it by default on "-cpu host".

As a side-effect, the new name also makes the flag configurable,
allowing the user to disable it (which may be useful for testing or for
compatibility with old kernels).

Signed-off-by: Eduardo Habkost 
---
 target-i386/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 45c662d..6d008ab 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -241,7 +241,7 @@ static const char *kvm_feature_name[] = {
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
-NULL, NULL, NULL, NULL,
+"kvmclock-stable-bit", NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 };




Applied to uq/master, thanks.

Paolo



Re: [Qemu-devel] [PATCH] qmp: show QOM properties in device-list-properties

2014-07-05 Thread Paolo Bonzini

Il 20/05/2014 14:29, Stefan Hajnoczi ha scritto:

Devices can use a mix of qdev and QOM properties.  Currently only the
qdev properties are displayed by device-list-properties.

This patch extends the property enumeration algorithm to also display
QOM properties (excluding the implicit "type", "realized",
"hotpluggable", and "parent_bus" properties).

When a qdev property exists, use the qdev type name to preserve
backwards compatibility.  QOM type names can be different for bool (qdev
on/off) and str (used by qdev pointers).

Signed-off-by: Stefan Hajnoczi 
---
Here is a demo:

$ cat test.py
#!/usr/bin/env python
import os
import sys; sys.path.append(os.path.join(os.path.dirname(__file__), 'tests', 
'qemu-iotests'))
import iotests

iotests.qemu_args = ['x86_64-softmmu/qemu-system-x86_64']

vm = iotests.VM()
vm.launch()
print vm.qmp('device-list-properties', typename='e1000')
print vm.qmp('device-list-properties', typename='virtio-blk-pci')
print vm.qmp('device-list-properties', typename='virtio-scsi-pci')
vm.shutdown()

$ ./test.py
{u'return': [{u'type': u'on/off', u'name': u'command_serr_enable'}, {u'type': 
u'on/off', u'name': u'multifunction'}, {u'type': u'uint32', u'name': 
u'rombar'}, {u'type': u'str', u'name': u'romfile'}, {u'type': u'pci-devfn', 
u'name': u'addr'}, {u'type': u'on/off', u'name': u'mitigation'}, {u'type': 
u'on/off', u'name': u'autonegotiation'}, {u'type': u'int32', u'name': 
u'bootindex'}, {u'type': u'netdev', u'name': u'netdev'}, {u'type': u'vlan', 
u'name': u'vlan'}, {u'type': u'macaddr', u'name': u'mac'}]}
{u'return': [{u'type': u'child', u'name': 
u'virtio-backend'}, {u'type': u'on/off', u'name': u'command_serr_enable'}, {u'type': 
u'on/off', u'name': u'multifunction'}, {u'type': u'uint32', u'name': u'rombar'}, 
{u'type': u'str', u'name': u'romfile'}, {u'type': u'pci-devfn', u'name': u'addr'}, 
{u'type': u'iothread', u'name': u'x-iothread'}, {u'type': u'on/off', u'name': 
u'scsi'}, {u'type': u'on/off', u'name': u'config-wce'}, {u'type': u'str', u'name': 
u'serial'}, {u'type': u'uint32', u'name': u'secs'}, {u'type': u'uint32', u'name': 
u'heads'}, {u'type': u'uint32', u'name': u'cyls'}, {u'type': u'uint32', u'name': 
u'discard_granularity'}, {u'type': u'int32', u'name': u'bootindex'}, {u'type': 
u'uint32', u'name': u'opt_io_size'}, {u'type': u'uint16', u'name': u'min_io_size'}, 
{u'
 type': u'blocksize', u'name': u'physical_block_size'}, {u'type': u'blocksize', 
u'name': u'logical_block_size'}, {u'type': u'drive', u'name': u'drive'}, 
{u'type': u'on/off', u'name': u'event!
 _idx'}, {u'type': u'on/off', u'name': u'indirect_desc'}, {u'type': u'on/off', 
u'name': u'x-data-plane'}, {u'type': u'uint32', u'name': u'vectors'}, {u'type': 
u'on/off', u'name': u'ioeventfd'}, {u'type': u'uint32', u'name': u'class'}]}
{u'return': [{u'type': u'child', u'name': 
u'virtio-backend'}, {u'type': u'on/off', u'name': u'command_serr_enable'}, {u'type': 
u'on/off', u'name': u'multifunction'}, {u'type': u'uint32', u'name': u'rombar'}, 
{u'type': u'str', u'name': u'romfile'}, {u'type': u'pci-devfn', u'name': u'addr'}, 
{u'type': u'uint32', u'name': u'cmd_per_lun'}, {u'type': u'uint32', u'name': 
u'max_sectors'}, {u'type': u'uint32', u'name': u'num_queues'}, {u'type': u'on/off', 
u'name': u'param_change'}, {u'type': u'on/off', u'name': u'hotplug'}, {u'type': 
u'on/off', u'name': u'event_idx'}, {u'type': u'on/off', u'name': u'indirect_desc'}, 
{u'type': u'uint32', u'name': u'vectors'}, {u'type': u'on/off', u'name': 
u'ioeventfd'}]}

 qmp.c | 99 +++
 1 file changed, 76 insertions(+), 23 deletions(-)

diff --git a/qmp.c b/qmp.c
index a7f432b..cb2577f 100644
--- a/qmp.c
+++ b/qmp.c
@@ -431,11 +431,57 @@ ObjectTypeInfoList *qmp_qom_list_types(bool 
has_implements,
 return ret;
 }

+/* Return a DevicePropertyInfo for a qdev property.
+ *
+ * If a qdev property with the given name does not exist, use the given default
+ * type.  If the qdev property info should not be shown, return NULL.
+ *
+ * The caller must free the return value.
+ */
+static DevicePropertyInfo *make_device_property_info(ObjectClass *klass,
+ const char *name,
+ const char *default_type)
+{
+DevicePropertyInfo *info;
+Property *prop;
+
+do {
+for (prop = DEVICE_CLASS(klass)->props; prop && prop->name; prop++) {
+if (strcmp(name, prop->name) != 0) {
+continue;
+}
+
+/*
+ * TODO Properties without a parser are just for dirty hacks.
+ * qdev_prop_ptr is the only such PropertyInfo.  It's marked
+ * for removal.  This conditional should be removed along with
+ * it.
+ */
+if (!prop->info->set) {
+return NULL;   /* no way to set it, don't show */
+}
+
+info = g_malloc0(sizeof(*info));
+