date:20160722

Re: [Qemu-devel] [RFC v1 13/13] target-ppc: introduce opc4 for Expanded Opcode

2016-07-22 Thread Nikunj A Dadhania

David Gibson  writes:

> On Fri, Jul 22, 2016 at 11:05:54AM +0530, Nikunj A Dadhania wrote:
>> David Gibson  writes:
>> 
>> > On Mon, Jul 18, 2016 at 10:35:17PM +0530, Nikunj A Dadhania wrote:
>> >> ISA 3.0 has introduced EO - Expanded Opcode. Introduce third level
>> >> indirect opcode table and corresponding parsing routines.
>> >> 
>> >> EO (11:12) Expanded opcode field
>> >> Formats: XX1
>> >> 
>> >> EO (11:15) Expanded opcode field
>> >> Formats: VX, X, XX2
>> >> 
>> >> Signed-off-by: Nikunj A Dadhania 
>> >> ---
>> >>  target-ppc/translate.c  |  73 +--
>> >>  target-ppc/translate_init.c | 103 
>> >> 
>> >>  2 files changed, 136 insertions(+), 40 deletions(-)
>> >> 
>> >> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
>> >> index 6c5a4a6..733d68d 100644
>> >> --- a/target-ppc/translate.c
>> >> +++ b/target-ppc/translate.c
>> >> @@ -40,6 +40,7 @@
>> >>  /* Include definitions for instructions classes and implementations flags */
>> >>  //#define PPC_DEBUG_DISAS
>> >>  //#define DO_PPC_STATISTICS
>> >> +//#define PPC_DUMP_CPU
>> >>  
>> >>  #ifdef PPC_DEBUG_DISAS
>> >>  #  define LOG_DISAS(...) qemu_log_mask(CPU_LOG_TB_IN_ASM, ## __VA_ARGS__)
>> >> @@ -367,12 +368,15 @@ GEN_OPCODE2(name, onam, opc1, opc2, opc3, inval, type, PPC_NONE)
>> >>  #define GEN_HANDLER2_E(name, onam, opc1, opc2, opc3, inval, type, type2) \
>> >>  GEN_OPCODE2(name, onam, opc1, opc2, opc3, inval, type, type2)
>> >>  
>> >> +#define GEN_HANDLER_E_2(name, opc1, opc2, opc3, opc4, inval, type, type2) \
>> >> +GEN_OPCODE3(name, opc1, opc2, opc3, opc4, inval, type, type2)
>> >> +
>> >>  typedef struct opcode_t {
>> >> -unsigned char opc1, opc2, opc3;
>> >> +unsigned char opc1, opc2, opc3, opc4;
>> >>  #if HOST_LONG_BITS == 64 /* Explicitly align to 64 bits */
>> >> -unsigned char pad[5];
>> >> +unsigned char pad[4];
>> >>  #else
>> >> -unsigned char pad[1];
>> >> +unsigned char pad[4]; /* 4-byte pad to maintain pad in opcode table */
>> >
>> > IIUC the point here is to align entries to the wordsize.  If the
>> > wordsize is 32-bit you shouldn't need any extra padding here.
>> 
>> You are right, the reason I had added this here is to keep the code
>> clean in the GEN_OPCODEx
>> 
>> #define GEN_OPCODE(name, op1, op2, op3, op4, invl, _typ, _typ2)   \
>> {  \
>> .opc1 = op1,   \
>> .opc2 = op2,   \
>> .opc3 = op3,   \
>> .opc4 = 0xff,  \
>> #if HOST_LONG_BITS == 64   \
>> .pad  = { 0, },\
>> #endif \
>
> Hrm.. you're using C99 designated initializers, which means I'm pretty
> sure you can just leave out the pad field, since you don't care about
> its value.  That should avoid the need for an ifdef.

Sure then, will update accordingly.

Regards,
Nikunj
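
The point about designated initializers can be checked with a small standalone sketch. The names below (demo_opcode_t, DEMO_OPCODE, demo_entry) are hypothetical stand-ins, not the QEMU definitions. C99 zero-initializes any member that a designated initializer leaves out, so the macro needs no #if around the pad field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for opcode_t; not the QEMU definition. */
typedef struct demo_opcode_t {
    unsigned char opc1, opc2, opc3, opc4;
    unsigned char pad[4];   /* padding whose value we do not care about */
    const char *oname;
} demo_opcode_t;

/* No .pad initializer: C99 zero-initializes members that are omitted. */
#define DEMO_OPCODE(name, op1, op2, op3)  \
    {                                     \
        .opc1 = (op1),                    \
        .opc2 = (op2),                    \
        .opc3 = (op3),                    \
        .opc4 = 0xff,                     \
        .oname = (name),                  \
    }

static const demo_opcode_t demo_entry = DEMO_OPCODE("demo", 0x1f, 0x0b, 0x08);

/* Returns 1 when every pad byte of the entry is zero, 0 otherwise. */
static int demo_pad_is_zero(const demo_opcode_t *op)
{
    assert(op != NULL);
    for (size_t i = 0; i < sizeof(op->pad); i++) {
        if (op->pad[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

Calling demo_pad_is_zero(&demo_entry) checks the omitted field; the same macro body works unchanged whatever size the pad array has.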

Re: [Qemu-devel] [PATCH] spapr: fix spapr-nvram migration

2016-07-22 Thread David Gibson

On Thu, Jul 21, 2016 at 02:05:46PM +0200, Laurent Vivier wrote:
> When spapr-nvram is backed by a file using pflash interface,
> migration fails on the destination guest with assert:
> 
> bdrv_co_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed.
> 
> This avoids the problem by delaying the pflash update until after
> the device loads complete.
> 
> This fix is similar to the one for the pflash_cfi01 migration:
> 
> 90c647d Fix pflash migration
> 
> Signed-off-by: Laurent Vivier 

It's a bit sad we lose the error checking, but I can't see any easy
way around that, so I've applied to ppc-for-2.7.

> ---
>  hw/nvram/spapr_nvram.c | 23 +++
>  1 file changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/nvram/spapr_nvram.c b/hw/nvram/spapr_nvram.c
> index 019f25d..4de5f70 100644
> --- a/hw/nvram/spapr_nvram.c
> +++ b/hw/nvram/spapr_nvram.c
> @@ -39,6 +39,7 @@ typedef struct sPAPRNVRAM {
>  uint32_t size;
>  uint8_t *buf;
>  BlockBackend *blk;
> +VMChangeStateEntry *vmstate;
>  } sPAPRNVRAM;
>  
>  #define TYPE_VIO_SPAPR_NVRAM "spapr-nvram"
> @@ -185,19 +186,25 @@ static int spapr_nvram_pre_load(void *opaque)
>  return 0;
>  }
>  
> +static void postload_update_cb(void *opaque, int running, RunState state)
> +{
> +sPAPRNVRAM *nvram = opaque;
> +
> +/* This is called after bdrv_invalidate_cache_all.  */
> +
> +qemu_del_vm_change_state_handler(nvram->vmstate);
> +nvram->vmstate = NULL;
> +
> +blk_pwrite(nvram->blk, 0, nvram->buf, nvram->size, 0);
> +}
> +
>  static int spapr_nvram_post_load(void *opaque, int version_id)
>  {
>  sPAPRNVRAM *nvram = VIO_SPAPR_NVRAM(opaque);
>  
>  if (nvram->blk) {
> -int alen = blk_pwrite(nvram->blk, 0, nvram->buf, nvram->size, 0);
> -
> -if (alen < 0) {
> -return alen;
> -}
> -if (alen != nvram->size) {
> -return -1;
> -}
> +nvram->vmstate = qemu_add_vm_change_state_handler(postload_update_cb,
> +  nvram);
>  }
>  
>  return 0;

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH v2] test: port postcopy test to ppc64

2016-07-22 Thread David Gibson

On Thu, Jul 21, 2016 at 06:47:56PM +0200, Laurent Vivier wrote:
> As userfaultfd syscall is available on powerpc, migration
> postcopy can be used.
> 
> This patch adds the support needed to test this on powerpc,
> instead of using a bootsector to run code to modify memory,
> we use a FORTH script in "boot-command" property.
> 
> As spapr machine doesn't support "-prom-env" argument
> (the nvram is initialized by SLOF and not by QEMU),
> "boot-command" is provided to SLOF via a file mapped nvram
> (with "-drive file=...,if=pflash")
> 
> Signed-off-by: Laurent Vivier 
> ---
> v2: move FORTH script directly in sprintf()
> use openbios_firmware_abi.h
> remove useless "default" case
> 
>  tests/Makefile.include |   1 +
>  tests/postcopy-test.c  | 116 
> +
>  2 files changed, 98 insertions(+), 19 deletions(-)

There's a mostly cosmetic problem with this.  If you run make check
for a ppc64 target on an x86 machine, you get:

GTESTER check-qtest-ppc64
"kvm" accelerator not found.
"kvm" accelerator not found.

> 
> diff --git a/tests/Makefile.include b/tests/Makefile.include
> index e7e50d6..e2d1885 100644
> --- a/tests/Makefile.include
> +++ b/tests/Makefile.include
> @@ -268,6 +268,7 @@ check-qtest-sparc-y += tests/prom-env-test$(EXESUF)
>  #check-qtest-sparc64-y += tests/prom-env-test$(EXESUF)
>  check-qtest-microblazeel-y = $(check-qtest-microblaze-y)
>  check-qtest-xtensaeb-y = $(check-qtest-xtensa-y)
> +check-qtest-ppc64-y += tests/postcopy-test$(EXESUF)
>  
>  check-qtest-generic-y += tests/qom-test$(EXESUF)
>  
> diff --git a/tests/postcopy-test.c b/tests/postcopy-test.c
> index 16465ab..229e9e9 100644
> --- a/tests/postcopy-test.c
> +++ b/tests/postcopy-test.c
> @@ -18,6 +18,9 @@
>  #include "qemu/sockets.h"
>  #include "sysemu/char.h"
>  #include "sysemu/sysemu.h"
> +#include "hw/nvram/openbios_firmware_abi.h"
> +
> +#define MIN_NVRAM_SIZE 8192 /* from spapr_nvram.c */
>  
>  const unsigned start_address = 1024 * 1024;
>  const unsigned end_address = 100 * 1024 * 1024;
> @@ -122,6 +125,44 @@ unsigned char bootsect[] = {
>0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x55, 0xaa
>  };
>  
> +static void init_bootfile_x86(const char *bootpath)
> +{
> +FILE *bootfile = fopen(bootpath, "wb");
> +
> +g_assert_cmpint(fwrite(bootsect, 512, 1, bootfile), ==, 1);
> +fclose(bootfile);
> +}
> +
> +static void init_bootfile_ppc(const char *bootpath)
> +{
> +FILE *bootfile;
> +char buf[MIN_NVRAM_SIZE];
> +struct OpenBIOS_nvpart_v1 *header = (struct OpenBIOS_nvpart_v1 *)buf;
> +
> +memset(buf, 0, MIN_NVRAM_SIZE);
> +
> +/* Create a "common" partition in nvram to store boot-command property */
> +
> +header->signature = OPENBIOS_PART_SYSTEM;
> +memcpy(header->name, "common", 6);
> +OpenBIOS_finish_partition(header, MIN_NVRAM_SIZE);
> +
> +/* FW_MAX_SIZE is 4MB, but slof.bin is only 900KB,
> + * so let's modify memory between 1MB and 100MB
> + * to do like PC bootsector
> + */
> +
> +sprintf(buf + 16,
> +"boot-command=hex .\" _\" begin %x %x do i c@ 1 + i c! 1000 +loop "
> +".\" B\" 0 until", end_address, start_address);
> +
> +/* Write partition to the NVRAM file */
> +
> +bootfile = fopen(bootpath, "wb");
> +g_assert_cmpint(fwrite(buf, MIN_NVRAM_SIZE, 1, bootfile), ==, 1);
> +fclose(bootfile);
> +}
> +
>  /*
>   * Wait for some output in the serial output file,
>   * we get an 'A' followed by an endless string of 'B's
> @@ -131,10 +172,29 @@ static void wait_for_serial(const char *side)
>  {
>  char *serialpath = g_strdup_printf("%s/%s", tmpfs, side);
>  FILE *serialfile = fopen(serialpath, "r");
> +const char *arch = qtest_get_arch();
> +int started = (strcmp(side, "src_serial") == 0 &&
> +   strcmp(arch, "ppc64") == 0) ? 0 : 1;
>  
>  do {
>  int readvalue = fgetc(serialfile);
>  
> +if (!started) {
> +/* SLOF prints its banner before starting test,
> + * to ignore it, mark the start of the test with '_',
> + * ignore all characters until this marker
> + */
> +switch (readvalue) {
> +case '_':
> +started = 1;
> +break;
> +case EOF:
> +fseek(serialfile, 0, SEEK_SET);
> +usleep(1000);
> +break;
> +}
> +continue;
> +}
>  switch (readvalue) {
>  case 'A':
>  /* Fine */
> @@ -147,6 +207,8 @@ static void wait_for_serial(const char *side)
>  return;
>  
>  case EOF:
> +started = (strcmp(side, "src_serial") == 0 &&
> +   strcmp(arch, "ppc64") == 0) ? 0 : 1;
>  fseek(serialfile, 0, SEEK_SET);
>  usleep(1000);
>  break;
> @@ -295,32 +357,48 @@ static void test_migrate(void)
>  char *uri = g_strdup_printf("unix:%

Re: [Qemu-devel] [PATCH v2 02/12] qapi-schema: add 'device_add'

2016-07-22 Thread Marc-André Lureau

Hi

On Fri, Jul 22, 2016 at 12:44 AM, Eric Blake  wrote:
> On 07/21/2016 08:00 AM, marcandre.lur...@redhat.com wrote:
>> From: Marc-André Lureau 
>>
>> Even though device_add is not fully qapi'fied, we may add it to the json
>> schema with 'gen': false, so registration and documentation can be
>> generated.
>>
>> Signed-off-by: Marc-André Lureau 
>> ---
>>  qapi-schema.json | 29 +
>>  1 file changed, 29 insertions(+)
>
>
>> +++ b/qapi-schema.json
>> @@ -2200,6 +2200,35 @@
>>  ##
>>  { 'command': 'xen-set-global-dirty-log', 'data': { 'enable': 'bool' } }
>>
>> +##
>> +# @device_add:
>> +#
>> +# @driver: the name of the new device's driver
>> +# @bus: #optional the device's parent bus (device tree path)
>> +# @id: the device's ID, must be unique
>> +# @props: #optional a dictionary of properties to be passed to the backend
>> +#
>> +# Add a device.
>> +#
>> +# Notes:
>> +# 1. For detailed information about this command, please refer to the
>> +#'docs/qdev-device-use.txt' file.
>> +#
>> +# 2. It's possible to list device properties by running QEMU with the
>> +#"-device DEVICE,help" command-line argument, where DEVICE is the
>> +#device's name
>> +#
>> +# Example:
>> +#
>> +# -> { "execute": "device_add",
>> +#  "arguments": { "driver": "e1000", "id": "net1" } }
>
> Is it worth an example that includes 'bus' and/or 'props'?

done

>
>> +# <- { "return": {} }
>> +#
>> +# Since: 0.13
>> +##
>> +{ 'command': 'device_add',
>> +  'data': {'driver': 'str', 'id': 'str'}, 'gen': false }
>
> The documentation mentions fields not listed here, but the 'gen':false
> explains why. We may yet get device_add QAPIfied for 2.8, but there's
> nothing wrong with documenting things now.
>
> Reviewed-by: Eric Blake 
>
>
> --
> Eric Blake   eblake redhat com+1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>



-- 
Marc-André Lureau

Re: [Qemu-devel] [RFC v1 05/13] target-ppc: add modulo word operations

2016-07-22 Thread David Gibson

On Fri, Jul 22, 2016 at 12:24:55PM +0530, Nikunj A Dadhania wrote:
> David Gibson  writes:
> 
> > On Fri, Jul 22, 2016 at 10:59:18AM +0530, Nikunj A Dadhania wrote:
> >> David Gibson  writes:
> >> 
> >> > On Mon, Jul 18, 2016 at 10:35:09PM +0530, Nikunj A Dadhania wrote:
> >> >> Adding following instructions:
> >> >> 
> >> >> moduw: Modulo Unsigned Word
> >> >> modsw: Modulo Signed Word
> >> >> 
> >> >> Signed-off-by: Nikunj A Dadhania 
> >> >
> >> > As rth has already mentioned this many branches probably means this
> >> > wants a helper.
> >> >
> >> >> ---
> >> >>  target-ppc/translate.c | 48 
> >> >> 
> >> >>  1 file changed, 48 insertions(+)
> >> >> 
> >> >> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> >> >> index d44f7af..487dd94 100644
> >> >> --- a/target-ppc/translate.c
> >> >> +++ b/target-ppc/translate.c
> >> >> @@ -1178,6 +1178,52 @@ GEN_DIVE(divde, divde, 0);
> >> >>  GEN_DIVE(divdeo, divde, 1);
> >> >>  #endif
> >> >>  
> >> >> +static inline void gen_op_arith_modw(DisasContext *ctx, TCGv ret, TCGv arg1,
> >> >> + TCGv arg2, int sign)
> >> >> +{
> >> >> +TCGLabel *l1 = gen_new_label();
> >> >> +TCGLabel *l2 = gen_new_label();
> >> >> +TCGv_i32 t0 = tcg_temp_local_new_i32();
> >> >> +TCGv_i32 t1 = tcg_temp_local_new_i32();
> >> >> +TCGv_i32 t2 = tcg_temp_local_new_i32();
> >> >> +
> >> >> +tcg_gen_trunc_tl_i32(t0, arg1);
> >> >> +tcg_gen_trunc_tl_i32(t1, arg2);
> >> >> +tcg_gen_brcondi_i32(TCG_COND_EQ, t1, 0, l1);
> >> 
> >> Result for:
> >>  % 0 and ...
> >> 
> >> >> +if (sign) {
> >> >> +TCGLabel *l3 = gen_new_label();
> >> >> +tcg_gen_brcondi_i32(TCG_COND_NE, t1, -1, l3);
> >> >> +tcg_gen_brcondi_i32(TCG_COND_EQ, t0, INT32_MIN, l1);
> >> >> +gen_set_label(l3);
> >> >
> >> > It's not really clear to me what the logic above is doing.
> >> 
> >> ... For signed case
> >> 0x8000_0000 % -1
> >> 
> >> Is undefined, addressing those cases.
> >
> > Do you mean the tcg operations have undefined results or that the ppc
> > instructions have undefined results?
> 
> TCG side, I haven't tried.
> 
> > If the latter, then why do you care about those cases?
> 
> That's how divd is implemented as well, I didn't want to break that. I am
> looking at doing both div and mod as helpers.
> 
> >> >> +tcg_gen_rem_i32(t2, t0, t1);
> >> >> +} else {
> >> >> +tcg_gen_remu_i32(t2, t0, t1);
> >> >> +}
> >> >> +tcg_gen_br(l2);
> >> >> +gen_set_label(l1);
> >> >> +if (sign) {
> >> >> +tcg_gen_sari_i32(t2, t0, 31);
> >> >
> >> > AFAICT this sets t2 to either 0 or -1 depending on the sign of t0,
> >> > which seems like an odd thing to do.
> >> 
> >> Extending the sign later ...
> >
> > Right, so after sign extension you have a 64-bit 0 or -1.  Still not
> > seeing what that 0 or -1 result is useful for.
> 
> Oh ok, I see why you got confused. I am rewriting all of it though, but
> for understanding:
> 
>   if (divisor == 0)
>  goto l1;
> 
>   if (signed) {
>  if (divisor == -1 && dividend == INT_MIN)
> goto l1;
>  compute_signed_rem(t2, t0, t1);
>   } else {
>  compute_unsigned_rem(t2, t0, t1);  
>   }
>   goto l2; /* jump to setting extending result and return */
> 
> l1: /* in case of invalid input set values */
>   if (signed)
>  t2 = -1 or 0;
>   else
>  t2 = 0;

Ok, so why do you ever need different result values in the case of
invalid input?  Why is always returning 0 not good enough?

> l2:
>   set (ret, t2)
> 
> Regards
> Nikunj
> 
> 
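
For reference, the helper-based approach mentioned earlier in this thread could look roughly like the sketch below. The function names are hypothetical, and returning 0 for both invalid inputs is an assumption taken from the question above, not a statement of what the final patch does. The thread treats x % 0 and INT32_MIN % -1 as undefined on the guest; both are also undefined behaviour in host C, so they have to be filtered before the % operator runs.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 32-bit modulo; returns 0 when the divisor is 0 (assumed policy). */
static uint32_t demo_moduw(uint32_t dividend, uint32_t divisor)
{
    if (divisor == 0) {
        return 0;
    }
    return dividend % divisor;
}

/*
 * Signed 32-bit modulo; returns 0 for a zero divisor and for
 * INT32_MIN % -1, both of which would be undefined in host C.
 */
static int32_t demo_modsw(int32_t dividend, int32_t divisor)
{
    if (divisor == 0 || (dividend == INT32_MIN && divisor == -1)) {
        return 0;
    }
    return dividend % divisor;
}
```

With a single return value for the invalid cases there is no branch on the sign of the dividend at all, which is the simplification the question is pointing at.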

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH 6/8] spapr: init CPUState->cpu_index with index relative to core-id

2016-07-22 Thread David Gibson

On Fri, Jul 22, 2016 at 11:40:03AM +0530, Bharata B Rao wrote:
> On Fri, Jul 22, 2016 at 01:23:01PM +1000, David Gibson wrote:
> > On Thu, Jul 21, 2016 at 05:54:37PM +0200, Igor Mammedov wrote:
> > > It will ensure that cpu_index for a given cpu stays the same
> > > regardless of the order cpus have been created/deleted and so
> > > it would be possible to migrate QEMU instance with out of order
> > > created CPU.
> > > 
> > > Signed-off-by: Igor Mammedov 
> > 
> > So, this isn't quite right (it wasn't right in my version either).
> > 
> > The problem occurs when smp_threads < kvmppc_smt_threads().  That is,
> > when the requested threads-per-core is less than the hardware's
> > maximum number of threads-per-core.
> > 
> > The core-id values are assigned essentially as i *
> > kvmppc_smt_threads(), meaning the patch below will leave gaps in the
> > cpu_index values and the last ones will exceed max_cpus, causing other
> > problems.
> 
> This would lead to hotplug failures as cpu_dt_id is still being
> derived from non-contiguous cpu_index resulting in wrong enumeration
> of CPU nodes in DT.

Which "This" are you referring to?

> 
> For -smp 8,threads=4 we see the following CPU nodes in DT
> 
> PowerPC,POWER8@0 PowerPC,POWER8@10
> 
> which otherwise should have been
> 
> PowerPC,POWER8@0 PowerPC,POWER8@8
> 
> The problem manifests as drmgr failure.
> 
> Greg's patchset that moved cpu_dt_id setting to machine code and that
> derived cpu_dt_id from core-id for sPAPR would be needed to fix this
> I guess.

No, it shouldn't be necessary to fix it.  But we certainly do want to
clean this stuff up.  I'm not terribly convinced by the current
approach in Greg's series though.  I'd actually prefer to remove
cpu_dt_id from the cpustate entirely and instead work it out from the
(now stable) cpu index when we go to construct the device tree.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH v3 1/2] qdev: ignore GlobalProperty.errp for hotplugged devices

2016-07-22 Thread Greg Kurz

On Fri, 22 Jul 2016 11:28:48 +1000
David Gibson  wrote:

> On Fri, Jul 22, 2016 at 01:01:26AM +0200, Greg Kurz wrote:
> > This patch ensures QEMU won't terminate while hotplugging a device if the
> > global property cannot be set and errp points to error_fatal or error_abort.
> > 
> > While here, it also fixes indentation of the typename argument.
> > 
> > Suggested-by: Eduardo Habkost 
> > Signed-off-by: Greg Kurz   
> 
> This seems kind of bogus to me - we have this whole infrastructure for
> handling errors, and here we throw it away.
> 
> It seems like the right solution would be to make the caller in the
> hotplug case *not* use error_abort or error_fatal, and instead get the
> error propagated back to the monitor which will display it.
> 

The caller is QOM initialization here. Are you asking to add an errp argument
to object_initialize() and friends ?

> > ---
> >  hw/core/qdev-properties.c |4 ++--
> >  include/hw/qdev-core.h|4 +++-
> >  2 files changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
> > index 14e544ab17d2..311af6da7684 100644
> > --- a/hw/core/qdev-properties.c
> > +++ b/hw/core/qdev-properties.c
> > @@ -1084,7 +1084,7 @@ int qdev_prop_check_globals(void)
> >  }
> >  
> >  static void qdev_prop_set_globals_for_type(DeviceState *dev,
> > -const char *typename)
> > +   const char *typename)
> >  {
> >  GList *l;
> >  
> > @@ -1100,7 +1100,7 @@ static void qdev_prop_set_globals_for_type(DeviceState *dev,
> >  if (err != NULL) {
> >  error_prepend(&err, "can't apply global %s.%s=%s: ",
> >prop->driver, prop->property, prop->value);
> > -if (prop->errp) {
> > +if (!dev->hotplugged && prop->errp) {
> >  error_propagate(prop->errp, err);
> >  } else {
> >  assert(prop->user_provided);
> > diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> > index 1d1f8612a9b8..4b4b33bec885 100644
> > --- a/include/hw/qdev-core.h
> > +++ b/include/hw/qdev-core.h
> > @@ -261,7 +261,9 @@ struct PropertyInfo {
> >   * @used: Set to true if property was used when initializing a device.
> >   * @errp: Error destination, used like first argument of error_setg()
> >   *in case property setting fails later. If @errp is NULL, we
> > - *print warnings instead of ignoring errors silently.
> > + *print warnings instead of ignoring errors silently. For
> > + *hotplugged devices, errp is always ignored and warnings are
> > + *printed instead.
> >   */
> >  typedef struct GlobalProperty {
> >  const char *driver;
> >   
> 




Re: [Qemu-devel] [PATCH v2 06/12] monitor: remove mhandler.cmd_new

2016-07-22 Thread Marc-André Lureau

Hi

On Fri, Jul 22, 2016 at 1:02 AM, Eric Blake  wrote:
> Might be worth s/correspoding/corresponding/ while touching this file.

sure, (although this block goes away with the last patch)


-- 
Marc-André Lureau

Re: [Qemu-devel] [PATCH 6/8] spapr: init CPUState->cpu_index with index relative to core-id

2016-07-22 Thread Bharata B Rao

On Fri, Jul 22, 2016 at 05:14:33PM +1000, David Gibson wrote:
> On Fri, Jul 22, 2016 at 11:40:03AM +0530, Bharata B Rao wrote:
> > On Fri, Jul 22, 2016 at 01:23:01PM +1000, David Gibson wrote:
> > > On Thu, Jul 21, 2016 at 05:54:37PM +0200, Igor Mammedov wrote:
> > > > It will ensure that cpu_index for a given cpu stays the same
> > > > regardless of the order cpus have been created/deleted and so
> > > > it would be possible to migrate QEMU instance with out of order
> > > > created CPU.
> > > > 
> > > > Signed-off-by: Igor Mammedov 
> > > 
> > > So, this isn't quite right (it wasn't right in my version either).
> > > 
> > > The problem occurs when smp_threads < kvmppc_smt_threads().  That is,
> > > when the requested threads-per-core is less than the hardware's
> > > maximum number of threads-per-core.
> > > 
> > > The core-id values are assigned essentially as i *
> > > kvmppc_smt_threads(), meaning the patch below will leave gaps in the
> > > cpu_index values and the last ones will exceed max_cpus, causing other
> > > problems.
> > 
> > This would lead to hotplug failures as cpu_dt_id is still being
> > derived from non-contiguous cpu_index resulting in wrong enumeration
> > of CPU nodes in DT.
> 
> Which "This" are you referring to?

:) Gaps in cpu_index values due to which cpu_dt_id gets calculated wrongly.

Regards,
Bharata.
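
The numbers quoted in this thread can be reproduced with a small arithmetic sketch. The formula below is an assumption about how cpu_dt_id is derived from cpu_index, chosen because it matches the @8 and @10 values reported above; it is not a quote of the QEMU source, and demo_cpu_dt_id is a hypothetical name.

```c
#include <assert.h>

/*
 * Assumed derivation: each core occupies max_smt device-tree ids, of which
 * only the first smp_threads are used.
 */
static int demo_cpu_dt_id(int cpu_index, int smp_threads, int max_smt)
{
    assert(smp_threads > 0 && max_smt >= smp_threads);
    return (cpu_index / smp_threads) * max_smt + (cpu_index % smp_threads);
}
```

For -smp 8,threads=4 on a host with 8 threads per core, a contiguous cpu_index of 4 for the second core gives demo_cpu_dt_id(4, 4, 8) == 8, i.e. PowerPC,POWER8@8. If cpu_index is instead taken from a core-id of 8, the same formula gives 16 (0x10), which is the PowerPC,POWER8@10 node seen above.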

Re: [Qemu-devel] [PATCH v2] test: port postcopy test to ppc64

2016-07-22 Thread Laurent Vivier



On 22/07/2016 08:43, David Gibson wrote:
> On Thu, Jul 21, 2016 at 06:47:56PM +0200, Laurent Vivier wrote:
>> As userfaultfd syscall is available on powerpc, migration
>> postcopy can be used.
>>
>> This patch adds the support needed to test this on powerpc,
>> instead of using a bootsector to run code to modify memory,
>> we use a FORTH script in "boot-command" property.
>>
>> As spapr machine doesn't support "-prom-env" argument
>> (the nvram is initialized by SLOF and not by QEMU),
>> "boot-command" is provided to SLOF via a file mapped nvram
>> (with "-drive file=...,if=pflash")
>>
>> Signed-off-by: Laurent Vivier 
>> ---
>> v2: move FORTH script directly in sprintf()
>> use openbios_firmware_abi.h
>> remove useless "default" case
>>
>>  tests/Makefile.include |   1 +
>>  tests/postcopy-test.c  | 116 
>> +
>>  2 files changed, 98 insertions(+), 19 deletions(-)
> 
> There's a mostly cosmetic problem with this.  If you run make check
> for a ppc64 target on an x86 machine, you get:
> 
> GTESTER check-qtest-ppc64
> "kvm" accelerator not found.
> "kvm" accelerator not found.

I think this is because of "-machine accel=kvm:tcg": it tries to use kvm
and falls back to tcg.

accel.c:

void configure_accelerator(MachineState *ms)
{
...
    acc = accel_find(buf);
    if (!acc) {
        fprintf(stderr, "\"%s\" accelerator not found.\n", buf);
        continue;
    }

We can remove the "-machine" argument to use the default instead (tcg or
kvm).

Laurent
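
A rough sketch of the fallback behaviour described above, assuming a colon-separated list is tried left to right and each unknown name is skipped. All names here (demo_available, demo_accel_find, demo_pick_accel) are hypothetical and this is not the accel.c implementation; the list of available accelerators is hard-coded to mimic a ppc64 target built on an x86 host.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: accelerators available in this build (no kvm). */
static const char *const demo_available[] = { "tcg" };

typedef struct demo_pick {
    int index;   /* index into demo_available, or -1 if nothing matched */
    int misses;  /* names that were tried and not found */
} demo_pick;

/* Look up a name of the given length; returns its index or -1. */
static int demo_accel_find(const char *name, size_t len)
{
    size_t n = sizeof(demo_available) / sizeof(demo_available[0]);
    for (size_t i = 0; i < n; i++) {
        if (strlen(demo_available[i]) == len &&
            strncmp(demo_available[i], name, len) == 0) {
            return (int)i;
        }
    }
    return -1;
}

/* Try each entry of a list such as "kvm:tcg" until one is found. */
static demo_pick demo_pick_accel(const char *list)
{
    demo_pick result = { .index = -1, .misses = 0 };
    const char *p = list;

    assert(list != NULL);
    while (*p != '\0') {
        size_t len = strcspn(p, ":");
        if (len > 0) {
            int idx = demo_accel_find(p, len);
            if (idx >= 0) {
                result.index = idx;
                return result;
            }
            result.misses++;   /* this is where a warning would be printed */
        }
        p += len;
        if (*p == ':') {
            p++;
        }
    }
    return result;
}
```

With the list "kvm:tcg" the sketch records one miss before settling on tcg, which corresponds to one warning per QEMU instance started by the test.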

Re: [Qemu-devel] [PATCH v2 09/12] qapi: remove the "middle" mode

2016-07-22 Thread Marc-André Lureau

Hi

On Fri, Jul 22, 2016 at 2:55 AM, Eric Blake  wrote:
> On 07/21/2016 08:00 AM, marcandre.lur...@redhat.com wrote:
>> From: Marc-André Lureau 
>>
>> Now that the register function is always generated, we can
>> remove the so-called "middle" mode from the generator script.
>>
>> Signed-off-by: Marc-André Lureau 
>> ---
>>  scripts/qapi-commands.py | 29 +
>>  1 file changed, 5 insertions(+), 24 deletions(-)
>>
>> diff --git a/scripts/qapi-commands.py b/scripts/qapi-commands.py
>> index a06a2c4..4754ae0 100644
>> --- a/scripts/qapi-commands.py
>> +++ b/scripts/qapi-commands.py
>> @@ -84,17 +84,8 @@ static void qmp_marshal_output_%(c_name)s(%(c_type)s ret_in, QObject **ret_out,
>>
>>
>>  def gen_marshal_proto(name):
>> -ret = 'void qmp_marshal_%s(QDict *args, QObject **ret, Error **errp)' % c_name(name)
>> -if not middle_mode:
>> -ret = 'static ' + ret
>> -return ret
>> -
>> -
>> -def gen_marshal_decl(name):
>> -return mcgen('''
>> -%(proto)s;
>> -''',
>> - proto=gen_marshal_proto(name))
>> +return 'static void qmp_marshal_%s' % c_name(name) + \
>> +'(QDict *args, QObject **ret, Error **errp)'
>
> I'm wondering if this should be:
>
> return mcgen('''
> static void qmp_marshal_%(c_name)s(QDict *args, QObject **ret, Error **errp)
> ''',
>  c_name=c_name(name))
>
> for consistency with our other code (I'm not sure why we weren't already
> using mcgen(), though).

Yes, it works fine too, and we can simplify the surrounding code a bit.

-- 
Marc-André Lureau

Re: [Qemu-devel] [PATCH V3] hw/virtio-pci: fix virtio behaviour

2016-07-22 Thread Marcel Apfelbaum


On 07/21/2016 11:18 PM, Michael S. Tsirkin wrote:

On Wed, Jul 20, 2016 at 06:28:21PM +0300, Marcel Apfelbaum wrote:

Enable transitional virtio devices by default.
Enable virtio-1.0 for devices plugged into


disable legacy is better, I agree.


PCIe ports (Root ports or Downstream ports).

Using the virtio-1 mode will remove the limitation
of the number of devices that can be attached to a machine
by removing the need for the IO BAR.

Signed-off-by: Marcel Apfelbaum 


I think you also want to add some comment with a description explaining
*why* you are disabling legacy for these specific devices.


Hi Michael,
I thought the above paragraph in the commit message explains it:

  " Using the virtio-1 mode will remove the limitation
   of the number of devices that can be attached to a machine
   by removing the need for the IO BAR."


What do you think I should add?

Thanks,
Marcel





---

Hi,

v2 -> v3:
   - Various code tweaks to simplify if statements (Michael)
   - Enable virtio modern by default (Gerd and Cornelia)
   - Replace virtio flags with actual fields (Gerd)
   - Wrappers for more readable code

v1 -> v2:
   - Stick to existing defaults for old machine types (Michael S. Tsirkin)

If everyone agrees, I am thinking about getting it into 2.7
to avoid the ~15 virtio devices limitation per machine.

My tests were limited to checking all possible disable-* configurations (and 
make check for all archs)

Thanks,
Marcel

  hw/display/virtio-gpu-pci.c |  4 +---
  hw/display/virtio-vga.c |  4 +---
  hw/virtio/virtio-pci.c  | 34 ++
  hw/virtio/virtio-pci.h  | 21 +
  include/hw/compat.h |  8 
  5 files changed, 45 insertions(+), 26 deletions(-)

diff --git a/hw/display/virtio-gpu-pci.c b/hw/display/virtio-gpu-pci.c
index a71b230..34a724c 100644
--- a/hw/display/virtio-gpu-pci.c
+++ b/hw/display/virtio-gpu-pci.c
@@ -30,9 +30,7 @@ static void virtio_gpu_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
  int i;

  qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
-/* force virtio-1.0 */
-vpci_dev->flags &= ~VIRTIO_PCI_FLAG_DISABLE_MODERN;
-vpci_dev->flags |= VIRTIO_PCI_FLAG_DISABLE_LEGACY;
+virtio_pci_force_virtio_1(vpci_dev);
  object_property_set_bool(OBJECT(vdev), true, "realized", errp);

  for (i = 0; i < g->conf.max_outputs; i++) {
diff --git a/hw/display/virtio-vga.c b/hw/display/virtio-vga.c
index 315b7fc..5b510a1 100644
--- a/hw/display/virtio-vga.c
+++ b/hw/display/virtio-vga.c
@@ -134,9 +134,7 @@ static void virtio_vga_realize(VirtIOPCIProxy *vpci_dev, Error **errp)

  /* init virtio bits */
  qdev_set_parent_bus(DEVICE(g), BUS(&vpci_dev->bus));
-/* force virtio-1.0 */
-vpci_dev->flags &= ~VIRTIO_PCI_FLAG_DISABLE_MODERN;
-vpci_dev->flags |= VIRTIO_PCI_FLAG_DISABLE_LEGACY;
+virtio_pci_force_virtio_1(vpci_dev);
  object_property_set_bool(OBJECT(g), true, "realized", &err);
  if (err) {
  error_propagate(errp, err);
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 2b34b43..11cd634 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -161,7 +161,7 @@ static bool virtio_pci_modern_state_needed(void *opaque)
  {
  VirtIOPCIProxy *proxy = opaque;

-return !(proxy->flags & VIRTIO_PCI_FLAG_DISABLE_MODERN);
+return virtio_pci_modern(proxy);
  }

  static const VMStateDescription vmstate_virtio_pci_modern_state = {
@@ -300,8 +300,8 @@ static int virtio_pci_ioeventfd_assign(DeviceState *d, EventNotifier *notifier,
  VirtIOPCIProxy *proxy = to_virtio_pci_proxy(d);
  VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
  VirtQueue *vq = virtio_get_queue(vdev, n);
-bool legacy = !(proxy->flags & VIRTIO_PCI_FLAG_DISABLE_LEGACY);
-bool modern = !(proxy->flags & VIRTIO_PCI_FLAG_DISABLE_MODERN);
+bool legacy = virtio_pci_legacy(proxy);
+bool modern = virtio_pci_modern(proxy);
  bool fast_mmio = kvm_ioeventfd_any_length_enabled();
  bool modern_pio = proxy->flags & VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY;
  MemoryRegion *modern_mr = &proxy->notify.mr;
@@ -1576,8 +1576,8 @@ static void virtio_pci_device_plugged(DeviceState *d, Error **errp)
  {
  VirtIOPCIProxy *proxy = VIRTIO_PCI(d);
  VirtioBusState *bus = &proxy->bus;
-bool legacy = !(proxy->flags & VIRTIO_PCI_FLAG_DISABLE_LEGACY);
-bool modern = !(proxy->flags & VIRTIO_PCI_FLAG_DISABLE_MODERN);
+bool legacy = virtio_pci_legacy(proxy);
+bool modern = virtio_pci_modern(proxy);
  bool modern_pio = proxy->flags & VIRTIO_PCI_FLAG_MODERN_PIO_NOTIFY;
  uint8_t *config;
  uint32_t size;
@@ -1696,7 +1696,7 @@ static void virtio_pci_device_plugged(DeviceState *d, Error **errp)
  static void virtio_pci_device_unplugged(DeviceState *d)
  {
  VirtIOPCIProxy *proxy = VIRTIO_PCI(d);
-bool modern = !(proxy->flags & VIRTIO_PCI_FLAG_DISABLE_MODERN);
+bool modern = virtio_pci_modern(proxy);

Re: [Qemu-devel] [PATCH v4] virtio-pci: error out when both legacy and modern modes are disabled

2016-07-22 Thread Marcel Apfelbaum


On 07/22/2016 12:55 AM, Greg Kurz wrote:

On Thu, 21 Jul 2016 23:21:16 +0200
Greg Kurz  wrote:


From: Greg Kurz 

Without presuming if we got there because of a user mistake or some
more subtle bug in the tooling, it really does not make sense to
implement a non-functional device.

Signed-off-by: Greg Kurz 
Reviewed-by: Marcel Apfelbaum 
Signed-off-by: Greg Kurz 
---
v4: - rephrased error message and provide a hint to the user
 - split string literals to stay below 80 characters
 - added Marcel's R-b tag
---


Marcel,

I see that Michael has comments on your patch. If you feel this patch is
valuable for 2.7, please consider carrying and pushing it, as I'm about to
take a 1-month leave.



I'll be sure to take it from here, thanks for the help!
Marcel


Thanks.

--
Greg


  hw/virtio/virtio-pci.c |8 
  1 file changed, 8 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 755f9218b77d..72c4b392ffda 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1842,6 +1842,14 @@ static void virtio_pci_dc_realize(DeviceState *qdev, Error **errp)
  VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
  PCIDevice *pci_dev = &proxy->pci_dev;

+if (!(virtio_pci_modern(proxy) || virtio_pci_legacy(proxy))) {
+error_setg(errp, "device cannot work when both modern and legacy modes"
+   " are disabled");
+error_append_hint(errp, "Set either disable-modern or disable-legacy"
+  " to off\n");
+return;
+}
+
  if (!(proxy->flags & VIRTIO_PCI_FLAG_DISABLE_PCIE) &&
  virtio_pci_modern(proxy)) {
  pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS;

Re: [Qemu-devel] [PATCH V3] hw/virtio-pci: fix virtio behaviour

2016-07-22 Thread Marcel Apfelbaum


On 07/22/2016 01:21 AM, Michael S. Tsirkin wrote:

On Thu, Jul 21, 2016 at 11:58:52PM +0200, Gerd Hoffmann wrote:

   Hi,


Actually this can still break existing scripts:
stick a device on express bus but add disable-modern=on
Gave you a legacy device previously but it no longer does.


Unlikely to happen in practice because there is little reason to use
disable-modern=on in 2.6 & older because that is the default ...


Good point, I forgot.


Still we can default to legacy=yes in case disable-modern=on +
disable-legacy=auto.


Given the above I'm not sure it's worth it. I'll leave it to Marcel
to decide.


Hi Michael,Gerd,

I don't think it is worth making the code more complicated
for an uninteresting scenario.




  And throw an error in case both modern and legacy
are explicitly disabled (as already suggested elsewhere in this thread).



There is already a patch for this upstream:
https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg05263.html

Michael, since rc1 is approaching fast, can you please advise
on what changes to make to the commit message, and whether you prefer me
to add Greg's patch to the series and re-send, or to take it separately?

Thanks,
Marcel



cheers,
   Gerd

Re: [Qemu-devel] [PATCH 28/37] virtio-input: free config list

2016-07-22 Thread Gerd Hoffmann

> --- a/hw/input/virtio-input-hid.c
> +++ b/hw/input/virtio-input-hid.c

> +.instance_finalize = virtio_input_finalize,

> --- a/hw/input/virtio-input.c
> +++ b/hw/input/virtio-input.c

> +void virtio_input_finalize(Object *obj)
> +{
> +VirtIOInput *vinput = VIRTIO_INPUT(obj);
> +VirtIOInputConfig *cfg, *next;
> +
> +QTAILQ_FOREACH_SAFE(cfg, &vinput->cfg_list, node, next) {
> +QTAILQ_REMOVE(&vinput->cfg_list, cfg, node);
> +g_free(cfg);
> +}
> +}

I think you can keep this local to virtio-input.c and simply hook it
into the abstract base class (TYPE_VIRTIO_INPUT).

Other than that it looks fine to me.

cheers,
  Gerd

Re: [Qemu-devel] [PATCH 30/37] usb: free USBDevice.strings

2016-07-22 Thread Gerd Hoffmann

On Di, 2016-07-19 at 12:54 +0400, marcandre.lur...@redhat.com wrote:
> The list is created during instance init and further populated with
> usb_desc_set_string(). Clear it when unrealizing the device.

Reviewed-by: Gerd Hoffmann

Re: [Qemu-devel] [PATCH 32/37] usb: free leaking path

2016-07-22 Thread Gerd Hoffmann

On Di, 2016-07-19 at 12:54 +0400, marcandre.lur...@redhat.com wrote:
> qdev_get_dev_path() returns an allocated string, free it when no
> longer
> needed.

Reviewed-by: Gerd Hoffmann

[Qemu-devel] [PULL 6/7] scripts: ensure monitor socket has SO_REUSEADDR set

2016-07-22 Thread Amit Shah

From: "Daniel P. Berrange" 

If tests use a TCP based monitor socket, the connection will
go into a TIME_WAIT state when the test exits. This will
randomly prevent the test from being re-run within a certain
time period. Set the SO_REUSEADDR flag on the socket to ensure
we can immediately re-run the tests.
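
A minimal standalone sketch of the same idea, outside of qmp.py. The
helper name make_monitor_listener and the default loopback address are
made up for illustration; only the setsockopt() call mirrors the patch.
It is written for Python 3, whereas the scripts in this series target
Python 2.

```python
import socket


def make_monitor_listener(address=("127.0.0.1", 0)):
    """Create a listening TCP socket that can be rebound right after a restart."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set the option before bind(); otherwise a lingering TIME_WAIT
    # connection on the same port can make bind() fail with EADDRINUSE.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(address)
    sock.listen(1)
    return sock
```

Port 0 asks the kernel for any free port, which keeps the sketch
runnable anywhere; a real test would pass its fixed monitor address.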

Signed-off-by: Daniel P. Berrange 
Message-Id: <1469020993-29426-6-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 scripts/qmp/qmp.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/scripts/qmp/qmp.py b/scripts/qmp/qmp.py
index 2d0d926..62d3651 100644
--- a/scripts/qmp/qmp.py
+++ b/scripts/qmp/qmp.py
@@ -43,6 +43,7 @@ class QEMUMonitorProtocol:
 self._debug = debug
 self.__sock = self.__get_sock()
 if server:
+self.__sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
 self.__sock.bind(self.__address)
 self.__sock.listen(1)
 
-- 
2.7.4

[Qemu-devel] [PULL 0/7] migration: fix, perf testing framework

2016-07-22 Thread Amit Shah

The following changes since commit 206d0c24361a083fbdcb2cc86fb75dc8b7f251a2:

  Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging (2016-07-21 20:12:37 +0100)

are available in the git repository at:

  http://git.kernel.org/pub/scm/virt/qemu/amit/migration.git tags/migration-for-2.7-6

for you to fetch changes up to 409437e16df273fc5f78f6cd1cb53023eaeb9b72:

  tests: introduce a framework for testing migration performance (2016-07-22 13:23:39 +0530)


Migration:
- Fix a postcopy bug
- Add a testsuite for measuring migration performance




Daniel P. Berrange (6):
  scripts: add __init__.py file to scripts/qmp/
  scripts: add a 'debug' parameter to QEMUMonitorProtocol
  scripts: refactor the VM class in iotests for reuse
  scripts: set timeout when waiting for qemu monitor connection
  scripts: ensure monitor socket has SO_REUSEADDR set
  tests: introduce a framework for testing migration performance

Dr. David Alan Gilbert (1):
  migration: set state to post-migrate on failure

 configure   |   2 +
 migration/migration.c   |   4 +
 scripts/qemu.py | 202 +++
 scripts/qmp/__init__.py |   0
 scripts/qmp/qmp.py  |  15 +-
 scripts/qtest.py|  34 ++
 tests/Makefile.include  |  12 +
 tests/migration/.gitignore  |   2 +
 tests/migration/guestperf-batch.py  |  26 ++
 tests/migration/guestperf-plot.py   |  26 ++
 tests/migration/guestperf.py|  27 ++
 tests/migration/guestperf/__init__.py   |   0
 tests/migration/guestperf/comparison.py | 124 +++
 tests/migration/guestperf/engine.py | 439 ++
 tests/migration/guestperf/hardware.py   |  62 
 tests/migration/guestperf/plot.py   | 623 
 tests/migration/guestperf/progress.py   | 117 ++
 tests/migration/guestperf/report.py |  98 +
 tests/migration/guestperf/scenario.py   |  95 +
 tests/migration/guestperf/shell.py  | 255 +
 tests/migration/guestperf/timings.py|  55 +++
 tests/migration/stress.c| 367 +++
 tests/qemu-iotests/iotests.py   | 135 +--
 23 files changed, 2587 insertions(+), 133 deletions(-)
 create mode 100644 scripts/qemu.py
 create mode 100644 scripts/qmp/__init__.py
 create mode 100644 tests/migration/.gitignore
 create mode 100755 tests/migration/guestperf-batch.py
 create mode 100755 tests/migration/guestperf-plot.py
 create mode 100755 tests/migration/guestperf.py
 create mode 100644 tests/migration/guestperf/__init__.py
 create mode 100644 tests/migration/guestperf/comparison.py
 create mode 100644 tests/migration/guestperf/engine.py
 create mode 100644 tests/migration/guestperf/hardware.py
 create mode 100644 tests/migration/guestperf/plot.py
 create mode 100644 tests/migration/guestperf/progress.py
 create mode 100644 tests/migration/guestperf/report.py
 create mode 100644 tests/migration/guestperf/scenario.py
 create mode 100644 tests/migration/guestperf/shell.py
 create mode 100644 tests/migration/guestperf/timings.py
 create mode 100644 tests/migration/stress.c

-- 
2.7.4

[Qemu-devel] [PULL 2/7] scripts: add __init__.py file to scripts/qmp/

2016-07-22 Thread Amit Shah

From: "Daniel P. Berrange" 

When searching for modules to load, python will ignore any
sub-directory which does not contain __init__.py. This means
that both scripts and scripts/qmp/ have to be explicitly added
to the python path. By adding a __init__.py file to scripts/qmp,
we only need add scripts/ to the python path and can then simply
do 'from qmp import qmp' to load scripts/qmp/qmp.py.
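
The effect can be reproduced with a throwaway directory tree. This is
only a sketch: the package name qmp_demo_pkg and the helper
import_from_package are hypothetical, and the code is Python 3. Note
that Python 3.3 and later also accept directories without __init__.py
as namespace packages, so the restriction described above applies to
the Python 2 interpreter these scripts were written for.

```python
import importlib
import os
import sys
import tempfile


def import_from_package(pkg_name="qmp_demo_pkg", mod_name="qmp"):
    """Build <tmp>/<pkg_name>/{__init__.py,<mod_name>.py} and import the module."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, pkg_name)
    os.mkdir(pkg_dir)
    # An empty __init__.py marks the directory as a regular package.
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    with open(os.path.join(pkg_dir, mod_name + ".py"), "w") as fh:
        fh.write("GREETING = 'hello from the package'\n")
    # Only the parent directory goes on the path, not the package itself.
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    return importlib.import_module(pkg_name + "." + mod_name)
```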

Signed-off-by: Daniel P. Berrange 
Message-Id: <1469020993-29426-2-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 scripts/qmp/__init__.py | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 scripts/qmp/__init__.py

diff --git a/scripts/qmp/__init__.py b/scripts/qmp/__init__.py
new file mode 100644
index 000..e69de29
-- 
2.7.4

[Qemu-devel] [PULL 5/7] scripts: set timeout when waiting for qemu monitor connection

2016-07-22 Thread Amit Shah

From: "Daniel P. Berrange" 

If QEMU fails to launch for some reason, the QEMUMonitorProtocol
class accept() method will wait forever in a socket accept call.
Set a timeout of 15 seconds so that we fail more gracefully
instead of hanging the test script forever
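
A rough illustration of the behaviour, independent of the
QEMUMonitorProtocol class. The helper accept_with_timeout is an
assumption made for this sketch (Python 3); the patch itself simply
calls settimeout(15) before accept() and lets the exception propagate.

```python
import socket


def accept_with_timeout(listener, timeout=15):
    """Wait up to `timeout` seconds for a peer; return the connection or None."""
    listener.settimeout(timeout)
    try:
        conn, _ = listener.accept()
    except socket.timeout:
        # Nobody connected in time, e.g. because QEMU failed to launch.
        return None
    return conn
```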

Signed-off-by: Daniel P. Berrange 
Message-Id: <1469020993-29426-5-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 scripts/qmp/qmp.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/scripts/qmp/qmp.py b/scripts/qmp/qmp.py
index 70e927e..2d0d926 100644
--- a/scripts/qmp/qmp.py
+++ b/scripts/qmp/qmp.py
@@ -140,6 +140,7 @@ class QEMUMonitorProtocol:
 @raise QMPConnectError if the greeting is not received
 @raise QMPCapabilitiesError if fails to negotiate capabilities
 """
+self.__sock.settimeout(15)
 self.__sock, _ = self.__sock.accept()
 self.__sockfile = self.__sock.makefile()
 return self.__negotiate_capabilities()
-- 
2.7.4

Re: [Qemu-devel] [RFC v1 05/13] target-ppc: add modulo word operations

2016-07-22 Thread Nikunj A Dadhania

David Gibson  writes:

> [ Unknown signature status ]
> On Fri, Jul 22, 2016 at 12:24:55PM +0530, Nikunj A Dadhania wrote:
>> David Gibson  writes:
>> 
>> > [ Unknown signature status ]
>> > On Fri, Jul 22, 2016 at 10:59:18AM +0530, Nikunj A Dadhania wrote:
>> >> David Gibson  writes:
>> >> 
>> >> > [ Unknown signature status ]
>> >> > On Mon, Jul 18, 2016 at 10:35:09PM +0530, Nikunj A Dadhania wrote:
>> >> >> Adding following instructions:
>> >> >> 
>> >> >> moduw: Modulo Unsigned Word
>> >> >> modsw: Modulo Signed Word
>> >> >> 
>> >> >> Signed-off-by: Nikunj A Dadhania 
>> >> >
>> >> > As rth has already mentioned this many branches probably means this
>> >> > wants a helper.
>> >> >
>> >> >> ---
>> >> >>  target-ppc/translate.c | 48 
>> >> >> 
>> >> >>  1 file changed, 48 insertions(+)
>> >> >> 
>> >> >> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
>> >> >> index d44f7af..487dd94 100644
>> >> >> --- a/target-ppc/translate.c
>> >> >> +++ b/target-ppc/translate.c
>> >> >> @@ -1178,6 +1178,52 @@ GEN_DIVE(divde, divde, 0);
>> >> >>  GEN_DIVE(divdeo, divde, 1);
>> >> >>  #endif
>> >> >>  
>> >> >> +static inline void gen_op_arith_modw(DisasContext *ctx, TCGv ret, TCGv arg1,
>> >> >> + TCGv arg2, int sign)
>> >> >> +{
>> >> >> +TCGLabel *l1 = gen_new_label();
>> >> >> +TCGLabel *l2 = gen_new_label();
>> >> >> +TCGv_i32 t0 = tcg_temp_local_new_i32();
>> >> >> +TCGv_i32 t1 = tcg_temp_local_new_i32();
>> >> >> +TCGv_i32 t2 = tcg_temp_local_new_i32();
>> >> >> +
>> >> >> +tcg_gen_trunc_tl_i32(t0, arg1);
>> >> >> +tcg_gen_trunc_tl_i32(t1, arg2);
>> >> >> +tcg_gen_brcondi_i32(TCG_COND_EQ, t1, 0, l1);
>> >> 
>> >> Result for:
>> >>  % 0 and ...
>> >> 
>> >> >> +if (sign) {
>> >> >> +TCGLabel *l3 = gen_new_label();
>> >> >> +tcg_gen_brcondi_i32(TCG_COND_NE, t1, -1, l3);
>> >> >> +tcg_gen_brcondi_i32(TCG_COND_EQ, t0, INT32_MIN, l1);
>> >> >> +gen_set_label(l3);
>> >> >
>> >> > It's not really clear to me what the logic above is doing.
>> >> 
>> >> ... For signed case
>> >> 0x8000_ % -1
>> >> 
>> >> Is undefined, addressing those cases.
>> >
>> > Do you mean the tcg operations have undefined results or that the ppc
>> > instructions have undefined results?
>> 
>> TCG side, I haven't tried.
>> 
>> > If the latter, then why do you care about those cases?
>> 
>> That's how divd is implemented as well; I didn't want to break that. I am
>> looking at doing both div and mod as helpers.
>> 
>> >> >> +tcg_gen_rem_i32(t2, t0, t1);
>> >> >> +} else {
>> >> >> +tcg_gen_remu_i32(t2, t0, t1);
>> >> >> +}
>> >> >> +tcg_gen_br(l2);
>> >> >> +gen_set_label(l1);
>> >> >> +if (sign) {
>> >> >> +tcg_gen_sari_i32(t2, t0, 31);
>> >> >
>> >> > AFAICT this sets t2 to either 0 or -1 depending on the sign of t0,
>> >> > which seems like an odd thing to do.
>> >> 
>> >> Extending the sign later ...
>> >
>> > Right, so after sign extension you have a 64-bit 0 or -1.  Still not
>> > seeing what that 0 or -1 result is useful for.
>> 
>> Oh OK, I see why you got confused. I am rewriting all of it anyway, but
>> for understanding:
>> 
>>   if (divisor == 0)
>>  goto l1;
>> 
>>   if (signed) {
>>  if (divisor == -1 && dividend == INT_MIN)
>> goto l1;
>>  compute_signed_rem(t2, t0, t1);
>>   } else {
>>  compute_unsigned_rem(t2, t0, t1);  
>>   }
>>   goto l2; /* jump to sign-extending the result and returning */
>> 
>> l1: /* in case of invalid input set values */
>>   if (signed)
>>  t2 = -1 or 0;
>>   else
>>  t2 = 0;
>
> Ok, so why do you ever need different result values in the case of
> invalid input?  Why is always returning 0 not good enough?

Let me go through the spec, as divd does the same thing.
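
For reference, the logic of the RFC patch as discussed above can be
modelled in plain Python. This is only a sketch of what the posted TCG
code computes, not a statement of what the ISA requires; the names
mod_word and to_s32 are made up for illustration, and whether always
returning 0 for the invalid cases would be enough is left open, as in
the thread.

```python
MASK32 = 0xFFFFFFFF
INT32_MIN = -(1 << 31)


def to_s32(value):
    """Interpret the low 32 bits of `value` as a signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def mod_word(dividend, divisor, signed):
    """Model of the modsw (signed) / moduw (unsigned) logic in the RFC patch."""
    if signed:
        a, b = to_s32(dividend), to_s32(divisor)
        if b == 0 or (a == INT32_MIN and b == -1):
            # Invalid input: the patch yields the sign of the dividend
            # replicated across the word (arithmetic shift right by 31).
            return -1 if a < 0 else 0
        # The remainder takes the sign of the dividend (truncating
        # division), unlike Python's floor-based % operator.
        r = abs(a) % abs(b)
        return -r if a < 0 else r
    a, b = dividend & MASK32, divisor & MASK32
    if b == 0:
        return 0  # invalid input in the unsigned case
    return a % b
```

For example, mod_word(-7, 3, True) gives -1, whereas Python's own
-7 % 3 gives 2.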

Regards
Nikunj

[Qemu-devel] [PULL 7/7] tests: introduce a framework for testing migration performance

2016-07-22 Thread Amit Shah

From: "Daniel P. Berrange" 

This introduces a moderately general purpose framework for
testing performance of migration.

The initial guest workload is provided by the included 'stress'
program, which is configured to spawn one thread per guest CPU
and run a maximally memory intensive workload. It will loop
over GB of memory, xor'ing each byte with data from a 4k array
of random bytes. This ensures heavy read and write load across
all of guest memory to stress the migration performance. While
running, the 'stress' program will record how long it takes to
xor each GB of memory and print this data for later reporting.
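
The core of that workload can be pictured with the following Python
sketch. It is not the 'stress' program itself (that is written in C in
tests/migration/stress.c); the function name xor_pass and the use of a
bytearray to stand in for guest memory are assumptions made for
illustration.

```python
import os
import time

PAGE_SIZE = 4096  # size of the random pattern, matching the 4k array above


def xor_pass(memory, pattern):
    """XOR every byte of `memory` in place with the repeating `pattern`.

    Returns the elapsed wall-clock time in seconds for the whole pass.
    """
    start = time.time()
    for offset in range(0, len(memory), len(pattern)):
        chunk = memory[offset:offset + len(pattern)]
        # zip() stops at the shorter input, so a short final chunk is fine.
        memory[offset:offset + len(chunk)] = bytes(
            a ^ b for a, b in zip(chunk, pattern))
    return time.time() - start
```

Every byte is both read and written on each pass, which is what keeps
dirtying pages while migration is in progress.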

The test engine will spawn a pair of QEMU processes, either on
the same host, or with the target on a remote host via ssh,
using the host kernel and a custom initrd built with 'stress'
as the /init binary. Kernel command line args are set to ensure
a fast kernel boot time (< 1 second) between launching QEMU and
the stress program starting execution.

Nonetheless, the test engine will initially wait N seconds for
the guest workload to stabilize, before starting the migration
operation. When migration is running, the engine will use pause,
post-copy, autoconverge, xbzrle compression and multithread
compression features, as well as downtime & bandwidth tuning
to encourage completion. If migration completes, the test engine
will wait N seconds again for the guest workload to stabilize on
the target host. If migration does not complete after a preset
number of iterations, it will be aborted.

While the QEMU process is running on the source host, the test
engine will sample the host CPU usage of QEMU as a whole, and
each vCPU thread. While migration is running, it will record
all the stats reported by 'query-migration'. Finally, it will
capture the output of the stress program running in the guest.

All the data produced from a single test execution is recorded
in a structured JSON file. A separate program is then able to
create interactive charts using the "plotly" python + javascript
libraries, showing the characteristics of the migration.

The data output provides visualization of the effect on guest
vCPU workloads from the migration process, the corresponding
vCPU utilization on the host, and the overall CPU hit from
QEMU on the host. This is correlated with statistics from the
migration process, such as downtime, vCPU throttling and iteration
number.

While the tests can be run individually with arbitrary parameters,
there is also a facility for producing batch reports for a number
of pre-defined scenarios / comparisons, in order to be able to
get standardized results across different hardware configurations
(eg TCP vs RDMA, or comparing different VCPU counts / memory
sizes, etc).

To use this, first you must build the initrd image

 $ make tests/migration/initrd-stress.img

To run a one-shot test with all default parameters

 $ ./tests/migration/guestperf.py > result.json

This has many command line args for varying its behaviour.
For example, to increase the RAM size and CPU count and
bind it to specific host NUMA nodes

 $ ./tests/migration/guestperf.py \
   --mem 4 --cpus 2 \
   --src-mem-bind 0 --src-cpu-bind 0,1 \
   --dst-mem-bind 1 --dst-cpu-bind 2,3 \
   > result.json

Using mem + cpu binding is strongly recommended on NUMA
machines, otherwise the guest performance results will
vary wildly between runs of the test due to lucky/unlucky
NUMA placement, making sensible data analysis impossible.

To make it run across separate hosts:

 $ ./tests/migration/guestperf.py \
   --dst-host somehostname > result.json

To request that post-copy is enabled, with switchover
after 5 iterations

 $ ./tests/migration/guestperf.py \
   --post-copy --post-copy-iters 5 > result.json

Once a result.json file is created, a graph of the data
can be generated, showing guest workload performance per
thread and the migration iteration points:

 $ ./tests/migration/guestperf-plot.py --output result.html \
--migration-iters --split-guest-cpu result.json

To further include host vCPU utilization and overall QEMU
utilization

 $ ./tests/migration/guestperf-plot.py --output result.html \
--migration-iters --split-guest-cpu \
--qemu-cpu --vcpu-cpu result.json

NB, the 'guestperf-plot.py' command requires that you have
the plotly python library installed. eg you must do

 $ pip install --user  plotly

Viewing the result.html file requires that you have the
plotly.min.js file in the same directory as the HTML
output. This js file is installed as part of the plotly
python library, so can be found in

  $HOME/.local/lib/python2.7/site-packages/plotly/offline/plotly.min.js

The guestperf-plot.py program can accept multiple json files
to plot, enabling results from different configurations to
be compared.

Finally, to run the entire standardized set of comparisons

  $ ./tests/migration/guestperf-batch.py \
   --dst-host somehost \
   --mem 4 --cpus 2 \
   --src

[Qemu-devel] [PULL 1/7] migration: set state to post-migrate on failure

2016-07-22 Thread Amit Shah

From: "Dr. David Alan Gilbert" 

If a migration fails/is cancelled during the postcopy stage we currently
end up with the runstate as finish-migrate, where it should be post-migrate.
There's a small window in precopy where I think the same thing can
happen, but I've never seen it.

It rarely matters; the only postcopy case is if you restart a migration, which
again is a case that rarely matters in postcopy because it's only
safe to restart the migration if you know the destination hasn't
been running (which you might if you started the destination with -S
and hadn't got around to 'c' ing it before the postcopy failed).
Even then it's a small window, but potentially you could hit it if
there's a problem loading the devices on the destination.

This corresponds to:
https://bugzilla.redhat.com/show_bug.cgi?id=1355683

Signed-off-by: Dr. David Alan Gilbert 
Reviewed-by: Amit Shah 
Message-Id: <1468601086-32117-1-git-send-email-dgilb...@redhat.com>
Signed-off-by: Amit Shah 
---
 migration/migration.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index c4e0193..955d5ee 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1837,6 +1837,10 @@ static void *migration_thread(void *opaque)
 } else {
 if (old_vm_running && !entered_postcopy) {
 vm_start();
+} else {
+if (runstate_check(RUN_STATE_FINISH_MIGRATE)) {
+runstate_set(RUN_STATE_POSTMIGRATE);
+}
 }
 }
 qemu_bh_schedule(s->cleanup_bh);
-- 
2.7.4

Re: [Qemu-devel] [RFC PATCH 0/2] Migration: support working on file:url

2016-07-22 Thread Daniel P. Berrange

On Thu, Jul 21, 2016 at 06:41:23PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrange (berra...@redhat.com) wrote:
> > On Thu, Jul 21, 2016 at 01:05:58PM +0800, zhanghailiang wrote:
> > > It is simpler to use file:url to migrate a VM into a file.
> > > Besides, it will be used for memory snapshots.
> > 
> > NB, you can already migrate into a file
> > 
> >"exec:/bin/cat > /path/to/file"
> > 
> > and likewise
> > 
> >   qemu -incoming "exec:/bin/cat /path/to/file"
> > 
> > This avoids the problem with POSIX I/O on plain files not actually
> > supporting O_NONBLOCK in any sensible manner, which your file:
> > suggestion suffers from.
> 
> Hmm that's a shame; I liked this idea as a nice simplification of the
> exec: stuff.

The only way to achieve that would be to spawn a thread in QEMU that
does I/O to the actual file, and then have the migration code do I/O
to/from that thread via a pipe. IOW, the thread would take the role
of cat.  Whether that's worth doing or not I don't know, given that it
is already possible to use exec+cat.
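
To make the idea concrete, here is a small Python sketch of a helper
thread playing the role of cat. It says nothing about how this would
be done inside QEMU; the function name start_file_writer is
hypothetical and the 64 KiB chunk size is arbitrary.

```python
import os
import threading


def start_file_writer(path):
    """Return (write_fd, thread); bytes written to write_fd end up in `path`."""
    read_fd, write_fd = os.pipe()

    def drain():
        # The blocking file I/O happens here, off the caller's thread.
        with os.fdopen(read_fd, "rb") as src, open(path, "wb") as dst:
            while True:
                chunk = src.read(65536)
                if not chunk:  # EOF: the write end was closed
                    break
                dst.write(chunk)

    thread = threading.Thread(target=drain)
    thread.start()
    return write_fd, thread
```

The caller only ever touches the pipe, which does support non-blocking
mode, and closes write_fd when the stream is complete.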

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

[Qemu-devel] [PULL 3/7] scripts: add a 'debug' parameter to QEMUMonitorProtocol

2016-07-22 Thread Amit Shah

From: "Daniel P. Berrange" 

Add a 'debug' parameter to the QEMUMonitorProtocol class
which will cause it to print out all JSON strings on
sys.stderr

Signed-off-by: Daniel P. Berrange 
Message-Id: <1469020993-29426-3-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 scripts/qmp/qmp.py | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/scripts/qmp/qmp.py b/scripts/qmp/qmp.py
index 779332f..70e927e 100644
--- a/scripts/qmp/qmp.py
+++ b/scripts/qmp/qmp.py
@@ -11,6 +11,7 @@
 import json
 import errno
 import socket
+import sys
 
 class QMPError(Exception):
 pass
@@ -25,7 +26,7 @@ class QMPTimeoutError(QMPError):
 pass
 
 class QEMUMonitorProtocol:
-def __init__(self, address, server=False):
+def __init__(self, address, server=False, debug=False):
 """
 Create a QEMUMonitorProtocol class.
 
@@ -39,6 +40,7 @@ class QEMUMonitorProtocol:
 """
 self.__events = []
 self.__address = address
+self._debug = debug
 self.__sock = self.__get_sock()
 if server:
 self.__sock.bind(self.__address)
@@ -68,6 +70,8 @@ class QEMUMonitorProtocol:
 return
 resp = json.loads(data)
 if 'event' in resp:
+if self._debug:
+print >>sys.stderr, "QMP:<<< %s" % resp
 self.__events.append(resp)
 if not only_event:
 continue
@@ -148,13 +152,18 @@ class QEMUMonitorProtocol:
 @return QMP response as a Python dict or None if the connection has
 been closed
 """
+if self._debug:
+print >>sys.stderr, "QMP:>>> %s" % qmp_cmd
 try:
 self.__sock.sendall(json.dumps(qmp_cmd))
 except socket.error as err:
 if err[0] == errno.EPIPE:
 return
 raise socket.error(err)
-return self.__json_read()
+resp = self.__json_read()
+if self._debug:
+print >>sys.stderr, "QMP:<<< %s" % resp
+return resp
 
 def cmd(self, name, args=None, id=None):
 """
-- 
2.7.4

[Qemu-devel] [PULL 4/7] scripts: refactor the VM class in iotests for reuse

2016-07-22 Thread Amit Shah

From: "Daniel P. Berrange" 

The iotests module has a python class for controlling QEMU
processes. Pull the generic functionality out of this file
and create a scripts/qemu.py module containing a QEMUMachine
class. Put the QTest integration support into a subclass
QEMUQtestMachine.

Signed-off-by: Daniel P. Berrange 
Message-Id: <1469020993-29426-4-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 scripts/qemu.py   | 202 ++
 scripts/qtest.py  |  34 +++
 tests/qemu-iotests/iotests.py | 135 +---
 3 files changed, 240 insertions(+), 131 deletions(-)
 create mode 100644 scripts/qemu.py

diff --git a/scripts/qemu.py b/scripts/qemu.py
new file mode 100644
index 000..9cdad24
--- /dev/null
+++ b/scripts/qemu.py
@@ -0,0 +1,202 @@
+# QEMU library
+#
+# Copyright (C) 2015-2016 Red Hat Inc.
+# Copyright (C) 2012 IBM Corp.
+#
+# Authors:
+#  Fam Zheng 
+#
+# This work is licensed under the terms of the GNU GPL, version 2.  See
+# the COPYING file in the top-level directory.
+#
+# Based on qmp.py.
+#
+
+import errno
+import string
+import os
+import sys
+import subprocess
+import qmp.qmp
+
+
+class QEMUMachine(object):
+'''A QEMU VM'''
+
+def __init__(self, binary, args=[], wrapper=[], name=None, test_dir="/var/tmp",
+ monitor_address=None, debug=False):
+if name is None:
+name = "qemu-%d" % os.getpid()
+if monitor_address is None:
+monitor_address = os.path.join(test_dir, name + "-monitor.sock")
+self._monitor_address = monitor_address
+self._qemu_log_path = os.path.join(test_dir, name + ".log")
+self._popen = None
+self._binary = binary
+self._args = args
+self._wrapper = wrapper
+self._events = []
+self._iolog = None
+self._debug = debug
+
+# This can be used to add an unused monitor instance.
+def add_monitor_telnet(self, ip, port):
+args = 'tcp:%s:%d,server,nowait,telnet' % (ip, port)
+self._args.append('-monitor')
+self._args.append(args)
+
+def add_fd(self, fd, fdset, opaque, opts=''):
+'''Pass a file descriptor to the VM'''
+options = ['fd=%d' % fd,
+   'set=%d' % fdset,
+   'opaque=%s' % opaque]
+if opts:
+options.append(opts)
+
+self._args.append('-add-fd')
+self._args.append(','.join(options))
+return self
+
+def send_fd_scm(self, fd_file_path):
+# In iotest.py, the qmp should always use unix socket.
+assert self._qmp.is_scm_available()
+bin = socket_scm_helper
+if os.path.exists(bin) == False:
+print "Scm help program does not present, path '%s'." % bin
+return -1
+fd_param = ["%s" % bin,
+"%d" % self._qmp.get_sock_fd(),
+"%s" % fd_file_path]
+devnull = open('/dev/null', 'rb')
+p = subprocess.Popen(fd_param, stdin=devnull, stdout=sys.stdout,
+ stderr=sys.stderr)
+return p.wait()
+
+@staticmethod
+def _remove_if_exists(path):
+'''Remove file object at path if it exists'''
+try:
+os.remove(path)
+except OSError as exception:
+if exception.errno == errno.ENOENT:
+return
+raise
+
+def get_pid(self):
+if not self._popen:
+return None
+return self._popen.pid
+
+def _load_io_log(self):
+with open(self._qemu_log_path, "r") as fh:
+self._iolog = fh.read()
+
+def _base_args(self):
+if isinstance(self._monitor_address, tuple):
+moncdev = "socket,id=mon,host=%s,port=%s" % (
+self._monitor_address[0],
+self._monitor_address[1])
+else:
+moncdev = 'socket,id=mon,path=%s' % self._monitor_address
+return ['-chardev', moncdev,
+'-mon', 'chardev=mon,mode=control',
+'-display', 'none', '-vga', 'none']
+
+def _pre_launch(self):
+self._qmp = qmp.qmp.QEMUMonitorProtocol(self._monitor_address, server=True,
+debug=self._debug)
+
+def _post_launch(self):
+self._qmp.accept()
+
+def _post_shutdown(self):
+if not isinstance(self._monitor_address, tuple):
+self._remove_if_exists(self._monitor_address)
+self._remove_if_exists(self._qemu_log_path)
+
+def launch(self):
+'''Launch the VM and establish a QMP connection'''
+devnull = open('/dev/null', 'rb')
+qemulog = open(self._qemu_log_path, 'wb')
+try:
+self._pre_launch()
+args = self._wrapper + [self._binary] + self._base_args() + self._args
+self._popen = subprocess.Popen(args, stdin=devnull, stdout=qemulog,
+

Re: [Qemu-devel] [PATCH v4] virtio-pci: error out when both legacy and modern modes are disabled

2016-07-22 Thread Cornelia Huck

On Thu, 21 Jul 2016 23:21:16 +0200
Greg Kurz  wrote:

> From: Greg Kurz 
> 
> Without presuming if we got there because of a user mistake or some
> more subtle bug in the tooling, it really does not make sense to
> implement a non-functional device.
> 
> Signed-off-by: Greg Kurz 
> Reviewed-by: Marcel Apfelbaum 
> Signed-off-by: Greg Kurz 
> ---
> v4: - rephrased error message and provide a hint to the user
> - split string literals to stay below 80 characters
> - added Marcel's R-b tag
> ---
>  hw/virtio/virtio-pci.c |8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 755f9218b77d..72c4b392ffda 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -1842,6 +1842,14 @@ static void virtio_pci_dc_realize(DeviceState *qdev, Error **errp)
>  VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
>  PCIDevice *pci_dev = &proxy->pci_dev;
> 
> +if (!(virtio_pci_modern(proxy) || virtio_pci_legacy(proxy))) {

I'm not sure that I didn't mess up the sequence of the realize
callbacks, but could disable_legacy still be AUTO here? In that case,
we'd fail for disable-modern=on and disable-legacy unset (i.e., AUTO),
which would be ok for pcie but not for !pcie.

> +error_setg(errp, "device cannot work when both modern and legacy modes"
> +   " are disabled");
> +error_append_hint(errp, "Set either disable-modern or disable-legacy"
> +  " to off\n");
> +return;
> +}
> +
>  if (!(proxy->flags & VIRTIO_PCI_FLAG_DISABLE_PCIE) &&
>  virtio_pci_modern(proxy)) {
>  pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS;
>

[Qemu-devel] [PATCH v7 00/16] backup compression

2016-07-22 Thread Denis V. Lunev

The idea is simple - backup is "written-once" data. It is written block
by block and it is large enough. It would be nice to save storage
space and compress it.

These patches add the ability to compress data during backup. This
functionality is implemented by adding options to the qmp/hmp
commands (drive-backup, blockdev-backup). The implementation is quite
simple, because the responsibility for data compression is imposed on the
format driver.

Changes from v1:
- added unittest for backup compression (12, 13)

Changes from v2:
- implemented a new .bdrv_co_write_compressed interface to replace the
  old .bdrv_write_compressed (2,3,4,5,6)

Changes from v3:
- added the byte-based interfaces:
  blk_pwrite_compressed()/blk_co_pwritev_compressed() (1, 7)
- fix drive/blockdev-backup documentation (10, 11)

Changes from v4:
- added assert that offset and count are aligned (1)
- reuse RwCo and bdrv_co_pwritev() for write compressed (2)
- converted interfaces to byte-based for format drivers (2, 3, 5, 6)
- moved unrelated cleanups into separate patches (4, 7)
- turn on dirty_bitmaps for the compressed writes (9)
- simplified drive/blockdev-backup by using the boxed commands (10, 11)
- reworded drive/blockdev-backup documentation about compression (12, 13)
- fix s/bakup/backup/ (14)

Changes from v5:
- rebased on master
- fix grammar (5, 8)

Changes from v6:
- more grammar fixes (1,11)
- assignment cleanup as suggested by Eric in 11

Signed-off-by: Pavel Butsykin 
Signed-off-by: Denis V. Lunev 
CC: Jeff Cody 
CC: Markus Armbruster 
CC: Eric Blake 
CC: John Snow 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 

Pavel Butsykin (16):
  block: switch blk_write_compressed() to byte-based interface
  block: Convert bdrv_pwrite_compressed() to BdrvChild
  block/io: reuse bdrv_co_pwritev() for write compressed
  qcow2: add qcow2_co_pwritev_compressed
  qcow2: cleanup qcow2_co_pwritev_compressed to avoid the recursion
  vmdk: add vmdk_co_pwritev_compressed
  qcow: add qcow_co_pwritev_compressed
  qcow: cleanup qcow_co_pwritev_compressed to avoid the recursion
  block: remove BlockDriver.bdrv_write_compressed
  block/io: turn on dirty_bitmaps for the compressed writes
  block: simplify drive-backup
  block: simplify blockdev-backup
  drive-backup: added support for data compression
  blockdev-backup: added support for data compression
  qemu-iotests: test backup compression in 055
  qemu-iotests: add vmdk for test backup compression in 055

 block/backup.c |  12 ++-
 block/block-backend.c  |  27 +-
 block/io.c |  48 --
 block/qcow.c   | 113 +---
 block/qcow2.c  | 128 ++-
 block/vmdk.c   |  55 ++--
 blockdev.c | 193 ++---
 hmp-commands.hx|   8 +-
 hmp.c  |  24 ++---
 include/block/block.h  |   5 +-
 include/block/block_int.h  |   5 +-
 include/sysemu/block-backend.h |   4 +-
 qapi/block-core.json   |  18 +++-
 qemu-img.c |   8 +-
 qemu-io-cmds.c |   2 +-
 qmp-commands.hx|   9 +-
 tests/qemu-iotests/055 | 118 +
 tests/qemu-iotests/055.out |   4 +-
 tests/qemu-iotests/iotests.py  |  10 +--
 19 files changed, 366 insertions(+), 425 deletions(-)

-- 
2.1.4

[Qemu-devel] [PATCH v7 11/16] block: simplify drive-backup

2016-07-22 Thread Denis V. Lunev

From: Pavel Butsykin 

Now that we can support boxed commands, use it to greatly reduce the
number of parameters (and likelihood of getting out of sync) when
adjusting drive-backup parameters.

Signed-off-by: Pavel Butsykin 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Signed-off-by: Denis V. Lunev 
CC: Jeff Cody 
CC: Markus Armbruster 
CC: Eric Blake 
CC: John Snow 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
---
 blockdev.c   | 115 +--
 hmp.c|  21 +-
 qapi/block-core.json |   3 +-
 3 files changed, 50 insertions(+), 89 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index eafeba9..e29147a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1838,17 +1838,8 @@ typedef struct DriveBackupState {
 BlockJob *job;
 } DriveBackupState;
 
-static void do_drive_backup(const char *job_id, const char *device,
-const char *target, bool has_format,
-const char *format, enum MirrorSyncMode sync,
-bool has_mode, enum NewImageMode mode,
-bool has_speed, int64_t speed,
-bool has_bitmap, const char *bitmap,
-bool has_on_source_error,
-BlockdevOnError on_source_error,
-bool has_on_target_error,
-BlockdevOnError on_target_error,
-BlockJobTxn *txn, Error **errp);
+static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn,
+Error **errp);
 
 static void drive_backup_prepare(BlkActionState *common, Error **errp)
 {
@@ -1878,16 +1869,7 @@ static void drive_backup_prepare(BlkActionState *common, Error **errp)
 bdrv_drained_begin(blk_bs(blk));
 state->bs = blk_bs(blk);
 
-do_drive_backup(backup->has_job_id ? backup->job_id : NULL,
-backup->device, backup->target,
-backup->has_format, backup->format,
-backup->sync,
-backup->has_mode, backup->mode,
-backup->has_speed, backup->speed,
-backup->has_bitmap, backup->bitmap,
-backup->has_on_source_error, backup->on_source_error,
-backup->has_on_target_error, backup->on_target_error,
-common->block_job_txn, &local_err);
+do_drive_backup(backup, common->block_job_txn, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
 return;
@@ -3155,17 +3137,7 @@ out:
 aio_context_release(aio_context);
 }
 
-static void do_drive_backup(const char *job_id, const char *device,
-const char *target, bool has_format,
-const char *format, enum MirrorSyncMode sync,
-bool has_mode, enum NewImageMode mode,
-bool has_speed, int64_t speed,
-bool has_bitmap, const char *bitmap,
-bool has_on_source_error,
-BlockdevOnError on_source_error,
-bool has_on_target_error,
-BlockdevOnError on_target_error,
-BlockJobTxn *txn, Error **errp)
+static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn, Error **errp)
 {
 BlockBackend *blk;
 BlockDriverState *bs;
@@ -3178,23 +3150,26 @@ static void do_drive_backup(const char *job_id, const char *device,
 int flags;
 int64_t size;
 
-if (!has_speed) {
-speed = 0;
+if (!backup->has_speed) {
+backup->speed = 0;
 }
-if (!has_on_source_error) {
-on_source_error = BLOCKDEV_ON_ERROR_REPORT;
+if (!backup->has_on_source_error) {
+backup->on_source_error = BLOCKDEV_ON_ERROR_REPORT;
 }
-if (!has_on_target_error) {
-on_target_error = BLOCKDEV_ON_ERROR_REPORT;
+if (!backup->has_on_target_error) {
+backup->on_target_error = BLOCKDEV_ON_ERROR_REPORT;
+}
+if (!backup->has_mode) {
+backup->mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
 }
-if (!has_mode) {
-mode = NEW_IMAGE_MODE_ABSOLUTE_PATHS;
+if (!backup->has_job_id) {
+backup->job_id = NULL;
 }
 
-blk = blk_by_name(device);
+blk = blk_by_name(backup->device);
 if (!blk) {
 error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
-  "Device '%s' not found", device);
+  "Device '%s' not found", backup->device);
 return;
 }
 
@@ -3204,13 +3179,14 @@ static void do_drive_backup(const char *job_id, const char *device,
 /* Although backup_run has this check too, we need to use bs->drv below, so
  * do an early check redundantly. */
 if (!blk_is_available(blk)) {
-error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
+error_setg(err

[Qemu-devel] [PATCH v7 08/16] qcow: cleanup qcow_co_pwritev_compressed to avoid the recursion

2016-07-22 Thread Denis V. Lunev

From: Pavel Butsykin 

Now that the function uses a vector instead of a buffer, there is no
need to use recursive code.
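
The non-recursive padding step can be sketched in isolation as follows. This is an illustration under simplified assumptions: demo_fill_cluster is a hypothetical helper working on plain buffers, whereas the patch itself operates on a QEMUIOVector and a block-aligned allocation:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: place a write into a cluster-sized buffer.
 * A short write is accepted only if it ends exactly at the end of the
 * image; its tail is then zero-padded. Returns 0 on success, -1 otherwise.
 */
static int demo_fill_cluster(uint8_t *cluster, size_t cluster_size,
                             const uint8_t *data, size_t bytes,
                             uint64_t offset, uint64_t image_size)
{
    if (bytes != cluster_size) {
        if (bytes > cluster_size || offset + bytes != image_size) {
            return -1;
        }
        /* Zero-pad the part of the cluster that the write does not cover. */
        memset(cluster + bytes, 0, cluster_size - bytes);
    }
    memcpy(cluster, data, bytes);
    return 0;
}
```

Because the payload is copied into a buffer that is already a full cluster, the function can fall through to the compression step instead of calling itself with a padded copy.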

Signed-off-by: Pavel Butsykin 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Denis V. Lunev 
CC: Jeff Cody 
CC: Markus Armbruster 
CC: Eric Blake 
CC: John Snow 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
---
 block/qcow.c | 24 +++-
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/block/qcow.c b/block/qcow.c
index e1d335d..20d2e15 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -927,27 +927,17 @@ qcow_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
 uint8_t *buf, *out_buf;
 uint64_t cluster_offset;
 
+buf = qemu_blockalign(bs, s->cluster_size);
 if (bytes != s->cluster_size) {
-ret = -EINVAL;
-
-/* Zero-pad last write if image size is not cluster aligned */
-if (offset + bytes == bs->total_sectors << BDRV_SECTOR_BITS &&
-bytes < s->cluster_size)
+if (bytes > s->cluster_size ||
+offset + bytes != bs->total_sectors << BDRV_SECTOR_BITS)
 {
-uint8_t *pad_buf = qemu_blockalign(bs, s->cluster_size);
-memset(pad_buf, 0, s->cluster_size);
-qemu_iovec_to_buf(qiov, 0, pad_buf, s->cluster_size);
-iov = (struct iovec) {
-.iov_base   = pad_buf,
-.iov_len= s->cluster_size,
-};
-qemu_iovec_init_external(&hd_qiov, &iov, 1);
-ret = qcow_co_pwritev_compressed(bs, offset, bytes, &hd_qiov);
-qemu_vfree(pad_buf);
+qemu_vfree(buf);
+return -EINVAL;
 }
-return ret;
+/* Zero-pad last write if image size is not cluster aligned */
+memset(buf + bytes, 0, s->cluster_size - bytes);
 }
-buf = qemu_blockalign(bs, s->cluster_size);
 qemu_iovec_to_buf(qiov, 0, buf, qiov->size);
 
 out_buf = g_malloc(s->cluster_size + (s->cluster_size / 1000) + 128);
-- 
2.1.4

[Qemu-devel] [PATCH v7 07/16] qcow: add qcow_co_pwritev_compressed

2016-07-22 Thread Denis V. Lunev

From: Pavel Butsykin 

Add an implementation of qcow_co_pwritev_compressed(), which will allow
us to safely use compressed writes to qcow images from running VMs.

Signed-off-by: Pavel Butsykin 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Denis V. Lunev 
CC: Jeff Cody 
CC: Markus Armbruster 
CC: Eric Blake 
CC: John Snow 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
---
 block/qcow.c | 109 +++
 1 file changed, 42 insertions(+), 67 deletions(-)

diff --git a/block/qcow.c b/block/qcow.c
index 0c7b75b..e1d335d 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -913,75 +913,42 @@ static int qcow_make_empty(BlockDriverState *bs)
 return 0;
 }
 
-typedef struct QcowWriteCo {
-BlockDriverState *bs;
-int64_t sector_num;
-const uint8_t *buf;
-int nb_sectors;
-int ret;
-} QcowWriteCo;
-
-static void qcow_write_co_entry(void *opaque)
-{
-QcowWriteCo *co = opaque;
-QEMUIOVector qiov;
-
-struct iovec iov = (struct iovec) {
-.iov_base   = (uint8_t*) co->buf,
-.iov_len= co->nb_sectors * BDRV_SECTOR_SIZE,
-};
-qemu_iovec_init_external(&qiov, &iov, 1);
-
-co->ret = qcow_co_writev(co->bs, co->sector_num, co->nb_sectors, &qiov);
-}
-
-/* Wrapper for non-coroutine contexts */
-static int qcow_write(BlockDriverState *bs, int64_t sector_num,
-  const uint8_t *buf, int nb_sectors)
-{
-Coroutine *co;
-AioContext *aio_context = bdrv_get_aio_context(bs);
-QcowWriteCo data = {
-.bs = bs,
-.sector_num = sector_num,
-.buf= buf,
-.nb_sectors = nb_sectors,
-.ret= -EINPROGRESS,
-};
-co = qemu_coroutine_create(qcow_write_co_entry, &data);
-qemu_coroutine_enter(co);
-while (data.ret == -EINPROGRESS) {
-aio_poll(aio_context, true);
-}
-return data.ret;
-}
-
 /* XXX: put compressed sectors first, then all the cluster aligned
tables to avoid losing bytes in alignment */
-static int qcow_write_compressed(BlockDriverState *bs, int64_t sector_num,
- const uint8_t *buf, int nb_sectors)
+static coroutine_fn int
+qcow_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
+   uint64_t bytes, QEMUIOVector *qiov)
 {
 BDRVQcowState *s = bs->opaque;
+QEMUIOVector hd_qiov;
+struct iovec iov;
 z_stream strm;
 int ret, out_len;
-uint8_t *out_buf;
+uint8_t *buf, *out_buf;
 uint64_t cluster_offset;
 
-if (nb_sectors != s->cluster_sectors) {
+if (bytes != s->cluster_size) {
 ret = -EINVAL;
 
 /* Zero-pad last write if image size is not cluster aligned */
-if (sector_num + nb_sectors == bs->total_sectors &&
-nb_sectors < s->cluster_sectors) {
+if (offset + bytes == bs->total_sectors << BDRV_SECTOR_BITS &&
+bytes < s->cluster_size)
+{
 uint8_t *pad_buf = qemu_blockalign(bs, s->cluster_size);
 memset(pad_buf, 0, s->cluster_size);
-memcpy(pad_buf, buf, nb_sectors * BDRV_SECTOR_SIZE);
-ret = qcow_write_compressed(bs, sector_num,
-pad_buf, s->cluster_sectors);
+qemu_iovec_to_buf(qiov, 0, pad_buf, s->cluster_size);
+iov = (struct iovec) {
+.iov_base   = pad_buf,
+.iov_len= s->cluster_size,
+};
+qemu_iovec_init_external(&hd_qiov, &iov, 1);
+ret = qcow_co_pwritev_compressed(bs, offset, bytes, &hd_qiov);
 qemu_vfree(pad_buf);
 }
 return ret;
 }
+buf = qemu_blockalign(bs, s->cluster_size);
+qemu_iovec_to_buf(qiov, 0, buf, qiov->size);
 
 out_buf = g_malloc(s->cluster_size + (s->cluster_size / 1000) + 128);
 
@@ -1012,27 +979,35 @@ static int qcow_write_compressed(BlockDriverState *bs, int64_t sector_num,
 
 if (ret != Z_STREAM_END || out_len >= s->cluster_size) {
 /* could not compress: write normal cluster */
-ret = qcow_write(bs, sector_num, buf, s->cluster_sectors);
-if (ret < 0) {
-goto fail;
-}
-} else {
-cluster_offset = get_cluster_offset(bs, sector_num << 9, 2,
-out_len, 0, 0);
-if (cluster_offset == 0) {
-ret = -EIO;
-goto fail;
-}
-
-cluster_offset &= s->cluster_offset_mask;
-ret = bdrv_pwrite(bs->file, cluster_offset, out_buf, out_len);
+ret = qcow_co_writev(bs, offset >> BDRV_SECTOR_BITS,
+ bytes >> BDRV_SECTOR_BITS, qiov);
 if (ret < 0) {
 goto fail;
 }
+goto success;
+}
+qemu_co_mutex_lock(&s->lock);
+cluster_offset = get_cluster_offset(bs, offset, 2, out_len, 0, 0);
+qemu_co_mutex_unlock(&s->lock);
+if (cluster_offset == 0) {
+ret = -EIO;
+goto fail;
 }

[Qemu-devel] [PATCH v7 10/16] block/io: turn on dirty_bitmaps for the compressed writes

2016-07-22 Thread Denis V. Lunev

From: Pavel Butsykin 

The assert(QLIST_EMPTY(&bs->dirty_bitmaps)) on compressed writes was added by:

  commit 1755da16e32c15b22a521e8a38539e4b5cf367f3
  Author: Paolo Bonzini 
  Date:   Thu Oct 18 16:49:18 2012 +0200
  block: introduce new dirty bitmap functionality

Compressed writes now always run in a coroutine, and the dirty bits are
set after the write completes, so we can enable dirty bitmaps again for
compressed writes.

Signed-off-by: Pavel Butsykin 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Denis V. Lunev 
CC: Jeff Cody 
CC: Markus Armbruster 
CC: Eric Blake 
CC: John Snow 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
---
 block/io.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/block/io.c b/block/io.c
index 60922ed..bc9eee5 100644
--- a/block/io.c
+++ b/block/io.c
@@ -896,7 +896,6 @@ bdrv_driver_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
 return -ENOTSUP;
 }
 
-assert(QLIST_EMPTY(&bs->dirty_bitmaps));
 return drv->bdrv_co_pwritev_compressed(bs, offset, bytes, qiov);
 }
 
@@ -1317,6 +1316,8 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
 } else if (flags & BDRV_REQ_ZERO_WRITE) {
 bdrv_debug_event(bs, BLKDBG_PWRITEV_ZERO);
 ret = bdrv_co_do_pwrite_zeroes(bs, offset, bytes, flags);
+} else if (flags & BDRV_REQ_WRITE_COMPRESSED) {
+ret = bdrv_driver_pwritev_compressed(bs, offset, bytes, qiov);
 } else if (bytes <= max_transfer) {
 bdrv_debug_event(bs, BLKDBG_PWRITEV);
 ret = bdrv_driver_pwritev(bs, offset, bytes, qiov, flags);
@@ -1568,14 +1569,9 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
 bytes = ROUND_UP(bytes, align);
 }
 
-if (flags & BDRV_REQ_WRITE_COMPRESSED) {
-ret = bdrv_driver_pwritev_compressed(
-bs, offset, bytes, use_local_qiov ? &local_qiov : qiov);
-} else {
-ret = bdrv_aligned_pwritev(bs, &req, offset, bytes, align,
-   use_local_qiov ? &local_qiov : qiov,
-   flags);
-}
+ret = bdrv_aligned_pwritev(bs, &req, offset, bytes, align,
+   use_local_qiov ? &local_qiov : qiov,
+   flags);
 
 fail:
 
-- 
2.1.4

[Qemu-devel] [PATCH v7 14/16] blockdev-backup: added support for data compression

2016-07-22 Thread Denis V. Lunev

From: Pavel Butsykin 

The idea is simple: a backup is "written-once" data. It is written block
by block and is typically large, so it would be nice to compress it and
save storage space.

Signed-off-by: Pavel Butsykin 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Denis V. Lunev 
CC: Jeff Cody 
CC: Markus Armbruster 
CC: Eric Blake 
CC: John Snow 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
---
 blockdev.c   | 7 +--
 qapi/block-core.json | 4 
 qmp-commands.hx  | 4 +++-
 3 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 587d76b..89c403f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3287,6 +3287,9 @@ void do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn, Error **errp)
 if (!backup->has_job_id) {
 backup->job_id = NULL;
 }
+if (!backup->has_compress) {
+backup->compress = false;
+}
 
 blk = blk_by_name(backup->device);
 if (!blk) {
@@ -3320,8 +3323,8 @@ void do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn, Error **errp)
 }
 }
 backup_start(backup->job_id, bs, target_bs, backup->speed, backup->sync,
- NULL, false, backup->on_source_error, backup->on_target_error,
- block_job_cb, bs, txn, &local_err);
+ NULL, backup->compress, backup->on_source_error,
+ backup->on_target_error, block_job_cb, bs, txn, &local_err);
 if (local_err != NULL) {
 error_propagate(errp, local_err);
 }
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 6d98da7..7f3424b 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -935,6 +935,9 @@
 # @speed: #optional the maximum speed, in bytes per second. The default is 0,
 # for unlimited.
 #
+# @compress: #optional true to compress data, if the target format supports it.
+#(default: false) (since 2.7)
+#
 # @on-source-error: #optional the action to take on an error on the source,
 #   default 'report'.  'stop' and 'enospc' can only be used
 #   if the block device supports io-status (see BlockInfo).
@@ -953,6 +956,7 @@
   'data': { '*job-id': 'str', 'device': 'str', 'target': 'str',
 'sync': 'MirrorSyncMode',
 '*speed': 'int',
+'*compress': 'bool',
 '*on-source-error': 'BlockdevOnError',
 '*on-target-error': 'BlockdevOnError' } }
 
diff --git a/qmp-commands.hx b/qmp-commands.hx
index ff72194..d06ced5 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1275,7 +1275,7 @@ EQMP
 
 {
 .name   = "blockdev-backup",
-.args_type  = "job-id:s?,sync:s,device:B,target:B,speed:i?,"
+.args_type  = "job-id:s?,sync:s,device:B,target:B,speed:i?,compress:b?,"
   "on-source-error:s?,on-target-error:s?",
 .mhandler.cmd_new = qmp_marshal_blockdev_backup,
 },
@@ -1299,6 +1299,8 @@ Arguments:
   sectors allocated in the topmost image, or "none" to only replicate
   new I/O (MirrorSyncMode).
 - "speed": the maximum speed, in bytes per second (json-int, optional)
+- "compress": true to compress data, if the target format supports it.
+  (json-bool, optional, default false)
 - "on-source-error": the action to take on an error on the source, default
  'report'.  'stop' and 'enospc' can only be used
  if the block device supports io-status.
-- 
2.1.4

[Qemu-devel] [PATCH v7 09/16] block: remove BlockDriver.bdrv_write_compressed

2016-07-22 Thread Denis V. Lunev

From: Pavel Butsykin 

There are no block drivers left that implement the old
.bdrv_write_compressed interface, so it can be removed. The
bdrv_pwrite_compressed() function is no longer needed either, so remove
it entirely.

Signed-off-by: Pavel Butsykin 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Denis V. Lunev 
CC: Jeff Cody 
CC: Markus Armbruster 
CC: Eric Blake 
CC: John Snow 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
---
 block/block-backend.c |  8 ++--
 block/io.c| 31 ---
 include/block/block.h |  2 --
 include/block/block_int.h |  3 ---
 qemu-img.c|  2 +-
 5 files changed, 3 insertions(+), 43 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 4bfc2eb..53f7971 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1472,12 +1472,8 @@ int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
 int blk_pwrite_compressed(BlockBackend *blk, int64_t offset, const void *buf,
   int count)
 {
-int ret = blk_check_byte_request(blk, offset, count);
-if (ret < 0) {
-return ret;
-}
-
-return bdrv_pwrite_compressed(blk->root, offset, buf, count);
+return blk_prw(blk, offset, (void *) buf, count, blk_write_entry,
+   BDRV_REQ_WRITE_COMPRESSED);
 }
 
 int blk_truncate(BlockBackend *blk, int64_t offset)
diff --git a/block/io.c b/block/io.c
index 7fad5b7..60922ed 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1886,37 +1886,6 @@ int bdrv_is_allocated_above(BlockDriverState *top,
 return 0;
 }
 
-int bdrv_pwrite_compressed(BdrvChild *child, int64_t offset,
-   const void *buf, int bytes)
-{
-BlockDriverState *bs = child->bs;
-BlockDriver *drv = bs->drv;
-QEMUIOVector qiov;
-struct iovec iov;
-
-if (!drv) {
-return -ENOMEDIUM;
-}
-if (drv->bdrv_write_compressed) {
-int ret = bdrv_check_byte_request(bs, offset, bytes);
-if (ret < 0) {
-return ret;
-}
-assert(QLIST_EMPTY(&bs->dirty_bitmaps));
-assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
-assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
-return drv->bdrv_write_compressed(bs, offset >> BDRV_SECTOR_BITS, buf,
-  bytes >> BDRV_SECTOR_BITS);
-}
-iov = (struct iovec) {
-.iov_base = (void *)buf,
-.iov_len = bytes,
-};
-qemu_iovec_init_external(&qiov, &iov, 1);
-
-return bdrv_prwv_co(child, offset, &qiov, true, BDRV_REQ_WRITE_COMPRESSED);
-}
-
 typedef struct BdrvVmstateCo {
 BlockDriverState*bs;
 QEMUIOVector*qiov;
diff --git a/include/block/block.h b/include/block/block.h
index d8dacd2..7edce5c 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -400,8 +400,6 @@ const char *bdrv_get_node_name(const BlockDriverState *bs);
 const char *bdrv_get_device_name(const BlockDriverState *bs);
 const char *bdrv_get_device_or_node_name(const BlockDriverState *bs);
 int bdrv_get_flags(BlockDriverState *bs);
-int bdrv_pwrite_compressed(BdrvChild *child, int64_t offset,
-   const void *buf, int bytes);
 int bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi);
 ImageInfoSpecific *bdrv_get_specific_info(BlockDriverState *bs);
 void bdrv_round_sectors_to_clusters(BlockDriverState *bs,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index d2673a1..378c966 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -204,9 +204,6 @@ struct BlockDriver {
 bool has_variable_length;
 int64_t (*bdrv_get_allocated_file_size)(BlockDriverState *bs);
 
-int (*bdrv_write_compressed)(BlockDriverState *bs, int64_t sector_num,
- const uint8_t *buf, int nb_sectors);
-
 int coroutine_fn (*bdrv_co_pwritev_compressed)(BlockDriverState *bs,
 uint64_t offset, uint64_t bytes, QEMUIOVector *qiov);
 
diff --git a/qemu-img.c b/qemu-img.c
index d5676a5..014c408 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2034,7 +2034,7 @@ static int img_convert(int argc, char **argv)
 const char *preallocation =
 qemu_opt_get(opts, BLOCK_OPT_PREALLOC);
 
-if (!drv->bdrv_write_compressed && !drv->bdrv_co_pwritev_compressed) {
+if (!drv->bdrv_co_pwritev_compressed) {
 error_report("Compression not supported for this file format");
 ret = -1;
 goto out;
-- 
2.1.4

[Qemu-devel] [PATCH v7 01/16] block: switch blk_write_compressed() to byte-based interface

2016-07-22 Thread Denis V. Lunev

From: Pavel Butsykin 

This is a preparatory patch, which continues the general trend of the
transition to the byte-based interfaces. bdrv_check_request() and
blk_check_request() are no longer used, thus we can remove them.
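
As a rough sketch of what a byte-based entry point has to do before calling a driver callback that is still sector-based, the following mirrors the alignment asserts and shifts in this patch. The DEMO_ constants and both helper names are hypothetical; the 512-byte sector size (a shift of 9) is an assumption stated here rather than taken from the diff:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed sector geometry: 512-byte sectors, i.e. a shift of 9 bits. */
#define DEMO_SECTOR_BITS 9
#define DEMO_SECTOR_SIZE (1ULL << DEMO_SECTOR_BITS)

/* True when 'value' is a whole number of sectors. */
static bool demo_is_sector_aligned(uint64_t value)
{
    return (value & (DEMO_SECTOR_SIZE - 1)) == 0;
}

/*
 * Convert a byte-based request to sector units. Returns false, leaving the
 * outputs untouched, when either value is not sector aligned.
 */
static bool demo_bytes_to_sectors(uint64_t offset, uint64_t bytes,
                                  int64_t *sector_num, int *nb_sectors)
{
    if (!demo_is_sector_aligned(offset) || !demo_is_sector_aligned(bytes)) {
        return false;
    }
    *sector_num = (int64_t)(offset >> DEMO_SECTOR_BITS);
    *nb_sectors = (int)(bytes >> DEMO_SECTOR_BITS);
    return true;
}
```

The patch asserts on misalignment instead of returning an error; the boolean return here is only to keep the sketch easy to exercise.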

Signed-off-by: Pavel Butsykin 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Signed-off-by: Denis V. Lunev 
CC: Jeff Cody 
CC: Markus Armbruster 
CC: Eric Blake 
CC: John Snow 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
---
 block/block-backend.c  | 23 ---
 block/io.c | 22 +++---
 include/block/block.h  |  4 ++--
 include/sysemu/block-backend.h |  4 ++--
 qemu-img.c |  6 --
 qemu-io-cmds.c |  2 +-
 6 files changed, 20 insertions(+), 41 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index effa038..8f38ab4 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -727,21 +727,6 @@ static int blk_check_byte_request(BlockBackend *blk, int64_t offset,
 return 0;
 }
 
-static int blk_check_request(BlockBackend *blk, int64_t sector_num,
- int nb_sectors)
-{
-if (sector_num < 0 || sector_num > INT64_MAX / BDRV_SECTOR_SIZE) {
-return -EIO;
-}
-
-if (nb_sectors < 0 || nb_sectors > INT_MAX / BDRV_SECTOR_SIZE) {
-return -EIO;
-}
-
-return blk_check_byte_request(blk, sector_num * BDRV_SECTOR_SIZE,
-  nb_sectors * BDRV_SECTOR_SIZE);
-}
-
 int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
unsigned int bytes, QEMUIOVector *qiov,
BdrvRequestFlags flags)
@@ -1484,15 +1469,15 @@ int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
   flags | BDRV_REQ_ZERO_WRITE);
 }
 
-int blk_write_compressed(BlockBackend *blk, int64_t sector_num,
- const uint8_t *buf, int nb_sectors)
+int blk_pwrite_compressed(BlockBackend *blk, int64_t offset, const void *buf,
+  int count)
 {
-int ret = blk_check_request(blk, sector_num, nb_sectors);
+int ret = blk_check_byte_request(blk, offset, count);
 if (ret < 0) {
 return ret;
 }
 
-return bdrv_write_compressed(blk_bs(blk), sector_num, buf, nb_sectors);
+return bdrv_pwrite_compressed(blk_bs(blk), offset, buf, count);
 }
 
 int blk_truncate(BlockBackend *blk, int64_t offset)
diff --git a/block/io.c b/block/io.c
index 7323f0f..e9f35c6 100644
--- a/block/io.c
+++ b/block/io.c
@@ -540,17 +540,6 @@ static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset,
 return 0;
 }
 
-static int bdrv_check_request(BlockDriverState *bs, int64_t sector_num,
-  int nb_sectors)
-{
-if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
-return -EIO;
-}
-
-return bdrv_check_byte_request(bs, sector_num * BDRV_SECTOR_SIZE,
-   nb_sectors * BDRV_SECTOR_SIZE);
-}
-
 typedef struct RwCo {
 BdrvChild *child;
 int64_t offset;
@@ -1878,8 +1867,8 @@ int bdrv_is_allocated_above(BlockDriverState *top,
 return 0;
 }
 
-int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
-  const uint8_t *buf, int nb_sectors)
+int bdrv_pwrite_compressed(BlockDriverState *bs, int64_t offset,
+   const void *buf, int bytes)
 {
 BlockDriver *drv = bs->drv;
 int ret;
@@ -1890,14 +1879,17 @@ int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
 if (!drv->bdrv_write_compressed) {
 return -ENOTSUP;
 }
-ret = bdrv_check_request(bs, sector_num, nb_sectors);
+ret = bdrv_check_byte_request(bs, offset, bytes);
 if (ret < 0) {
 return ret;
 }
 
 assert(QLIST_EMPTY(&bs->dirty_bitmaps));
+assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
+assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
 
-return drv->bdrv_write_compressed(bs, sector_num, buf, nb_sectors);
+return drv->bdrv_write_compressed(bs, offset >> BDRV_SECTOR_BITS, buf,
+  bytes >> BDRV_SECTOR_BITS);
 }
 
 typedef struct BdrvVmstateCo {
diff --git a/include/block/block.h b/include/block/block.h
index 11c162d..b4a97f2 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -399,8 +399,8 @@ const char *bdrv_get_node_name(const BlockDriverState *bs);
 const char *bdrv_get_device_name(const BlockDriverState *bs);
 const char *bdrv_get_device_or_node_name(const BlockDriverState *bs);
 int bdrv_get_flags(BlockDriverState *bs);
-int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
-  const uint8_t *buf, int nb_sectors);
+int bdrv_pwrite_compressed(BlockDriverState *bs, int64_t offset,
+   const void *buf, int bytes);
 int bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi);
 I

[Qemu-devel] [PATCH v7 04/16] qcow2: add qcow2_co_pwritev_compressed

2016-07-22 Thread Denis V. Lunev

From: Pavel Butsykin 

Add an implementation of qcow2_co_pwritev_compressed(), which will allow
us to safely use compressed writes to qcow2 images from running VMs.

Signed-off-by: Pavel Butsykin 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Denis V. Lunev 
CC: Jeff Cody 
CC: Markus Armbruster 
CC: Eric Blake 
CC: John Snow 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
---
 block/qcow2.c | 124 +++---
 1 file changed, 50 insertions(+), 74 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index d620d0a..b5c69df 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2533,84 +2533,49 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset)
 return 0;
 }
 
-typedef struct Qcow2WriteCo {
-BlockDriverState *bs;
-int64_t sector_num;
-const uint8_t *buf;
-int nb_sectors;
-int ret;
-} Qcow2WriteCo;
-
-static void qcow2_write_co_entry(void *opaque)
-{
-Qcow2WriteCo *co = opaque;
-QEMUIOVector qiov;
-uint64_t offset = co->sector_num * BDRV_SECTOR_SIZE;
-uint64_t bytes = co->nb_sectors * BDRV_SECTOR_SIZE;
-
-struct iovec iov = (struct iovec) {
-.iov_base   = (uint8_t*) co->buf,
-.iov_len= bytes,
-};
-qemu_iovec_init_external(&qiov, &iov, 1);
-
-co->ret = qcow2_co_pwritev(co->bs, offset, bytes, &qiov, 0);
-}
-
-/* Wrapper for non-coroutine contexts */
-static int qcow2_write(BlockDriverState *bs, int64_t sector_num,
-   const uint8_t *buf, int nb_sectors)
-{
-Coroutine *co;
-AioContext *aio_context = bdrv_get_aio_context(bs);
-Qcow2WriteCo data = {
-.bs = bs,
-.sector_num = sector_num,
-.buf= buf,
-.nb_sectors = nb_sectors,
-.ret= -EINPROGRESS,
-};
-co = qemu_coroutine_create(qcow2_write_co_entry, &data);
-qemu_coroutine_enter(co);
-while (data.ret == -EINPROGRESS) {
-aio_poll(aio_context, true);
-}
-return data.ret;
-}
-
 /* XXX: put compressed sectors first, then all the cluster aligned
tables to avoid losing bytes in alignment */
-static int qcow2_write_compressed(BlockDriverState *bs, int64_t sector_num,
-  const uint8_t *buf, int nb_sectors)
+static coroutine_fn int
+qcow2_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
+uint64_t bytes, QEMUIOVector *qiov)
 {
 BDRVQcow2State *s = bs->opaque;
+QEMUIOVector hd_qiov;
+struct iovec iov;
 z_stream strm;
 int ret, out_len;
-uint8_t *out_buf;
+uint8_t *buf, *out_buf;
 uint64_t cluster_offset;
 
-if (nb_sectors == 0) {
+if (bytes == 0) {
 /* align end of file to a sector boundary to ease reading with
sector based I/Os */
 cluster_offset = bdrv_getlength(bs->file->bs);
 return bdrv_truncate(bs->file->bs, cluster_offset);
 }
 
-if (nb_sectors != s->cluster_sectors) {
+if (bytes != s->cluster_size) {
 ret = -EINVAL;
 
 /* Zero-pad last write if image size is not cluster aligned */
-if (sector_num + nb_sectors == bs->total_sectors &&
-nb_sectors < s->cluster_sectors) {
+if (offset + bytes == bs->total_sectors << BDRV_SECTOR_BITS &&
+bytes < s->cluster_size)
+{
 uint8_t *pad_buf = qemu_blockalign(bs, s->cluster_size);
 memset(pad_buf, 0, s->cluster_size);
-memcpy(pad_buf, buf, nb_sectors * BDRV_SECTOR_SIZE);
-ret = qcow2_write_compressed(bs, sector_num,
- pad_buf, s->cluster_sectors);
+qemu_iovec_to_buf(qiov, 0, pad_buf, s->cluster_size);
+iov = (struct iovec) {
+.iov_base   = pad_buf,
+.iov_len= s->cluster_size,
+};
+qemu_iovec_init_external(&hd_qiov, &iov, 1);
+ret = qcow2_co_pwritev_compressed(bs, offset, bytes, &hd_qiov);
 qemu_vfree(pad_buf);
 }
 return ret;
 }
+buf = qemu_blockalign(bs, s->cluster_size);
+qemu_iovec_to_buf(qiov, 0, buf, s->cluster_size);
 
 out_buf = g_malloc(s->cluster_size + (s->cluster_size / 1000) + 128);
 
@@ -2641,33 +2606,44 @@ static int qcow2_write_compressed(BlockDriverState *bs, int64_t sector_num,
 
 if (ret != Z_STREAM_END || out_len >= s->cluster_size) {
 /* could not compress: write normal cluster */
-ret = qcow2_write(bs, sector_num, buf, s->cluster_sectors);
+ret = qcow2_co_pwritev(bs, offset, bytes, qiov, 0);
 if (ret < 0) {
 goto fail;
 }
-} else {
-cluster_offset = qcow2_alloc_compressed_cluster_offset(bs,
-sector_num << 9, out_len);
-if (!cluster_offset) {
-ret = -EIO;
-goto fail;
-}
-cluster_offset &= s->cluster_offset_mask;
+goto success;
+}
 
-ret = qcow2_pre_write_overlap_c

[Qemu-devel] [PATCH v7 05/16] qcow2: cleanup qcow2_co_pwritev_compressed to avoid the recursion

2016-07-22 Thread Denis V. Lunev

From: Pavel Butsykin 

Now that the function uses a vector instead of a buffer, there is no
need to use recursive code.

Signed-off-by: Pavel Butsykin 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Denis V. Lunev 
CC: Jeff Cody 
CC: Markus Armbruster 
CC: Eric Blake 
CC: John Snow 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
---
 block/qcow2.c | 24 +++-
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index b5c69df..01bc003 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2554,27 +2554,17 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
 return bdrv_truncate(bs->file->bs, cluster_offset);
 }
 
+buf = qemu_blockalign(bs, s->cluster_size);
 if (bytes != s->cluster_size) {
-ret = -EINVAL;
-
-/* Zero-pad last write if image size is not cluster aligned */
-if (offset + bytes == bs->total_sectors << BDRV_SECTOR_BITS &&
-bytes < s->cluster_size)
+if (bytes > s->cluster_size ||
+offset + bytes != bs->total_sectors << BDRV_SECTOR_BITS)
 {
-uint8_t *pad_buf = qemu_blockalign(bs, s->cluster_size);
-memset(pad_buf, 0, s->cluster_size);
-qemu_iovec_to_buf(qiov, 0, pad_buf, s->cluster_size);
-iov = (struct iovec) {
-.iov_base   = pad_buf,
-.iov_len= s->cluster_size,
-};
-qemu_iovec_init_external(&hd_qiov, &iov, 1);
-ret = qcow2_co_pwritev_compressed(bs, offset, bytes, &hd_qiov);
-qemu_vfree(pad_buf);
+qemu_vfree(buf);
+return -EINVAL;
 }
-return ret;
+/* Zero-pad last write if image size is not cluster aligned */
+memset(buf + bytes, 0, s->cluster_size - bytes);
 }
-buf = qemu_blockalign(bs, s->cluster_size);
 qemu_iovec_to_buf(qiov, 0, buf, s->cluster_size);
 
 out_buf = g_malloc(s->cluster_size + (s->cluster_size / 1000) + 128);
-- 
2.1.4

[Qemu-devel] [PATCH v7 03/16] block/io: reuse bdrv_co_pwritev() for write compressed

2016-07-22 Thread Denis V. Lunev

From: Pavel Butsykin 

For bdrv_pwrite_compressed(), most of the coroutine-creation code would
duplicate what bdrv_prwv_co() already does. So we can just add a flag
(BDRV_REQ_WRITE_COMPRESSED) and use bdrv_prwv_co() as the generic path.
As a result, we get a coroutine-oriented compressed write by calling
bdrv_co_pwritev()/blk_co_pwritev() with the BDRV_REQ_WRITE_COMPRESSED flag.
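
The flag-based routing can be sketched on its own roughly as follows. The numeric values mirror the ones this patch adds to BdrvRequestFlags, but the DEMO_ names, the DemoPath enum and demo_select_write_path are hypothetical and only illustrate the dispatch:

```c
#include <stdint.h>

/* Values mirror the patch; the DEMO_ names themselves are hypothetical. */
enum {
    DEMO_REQ_FUA              = 0x10,
    DEMO_REQ_WRITE_COMPRESSED = 0x20,
    DEMO_REQ_MASK             = 0x3f,  /* mask of valid flags */
};

typedef enum {
    DEMO_PATH_INVALID,
    DEMO_PATH_PLAIN,
    DEMO_PATH_COMPRESSED,
} DemoPath;

/* Decide which write path a request takes, based only on its flags. */
static DemoPath demo_select_write_path(unsigned flags)
{
    if (flags & ~(unsigned)DEMO_REQ_MASK) {
        return DEMO_PATH_INVALID;      /* unknown flag bits set */
    }
    if (flags & DEMO_REQ_WRITE_COMPRESSED) {
        return DEMO_PATH_COMPRESSED;   /* driver's compressed-write callback */
    }
    return DEMO_PATH_PLAIN;            /* regular aligned write */
}
```

Widening the mask from 0x1f to 0x3f is what makes the new 0x20 bit a valid flag; with the old mask, a compressed request would be classified as invalid in this sketch.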

Signed-off-by: Pavel Butsykin 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Denis V. Lunev 
CC: Jeff Cody 
CC: Markus Armbruster 
CC: Eric Blake 
CC: John Snow 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
---
 block/io.c| 56 +--
 include/block/block.h |  3 ++-
 include/block/block_int.h |  3 +++
 qemu-img.c|  2 +-
 4 files changed, 46 insertions(+), 18 deletions(-)

diff --git a/block/io.c b/block/io.c
index 1503e09..7fad5b7 100644
--- a/block/io.c
+++ b/block/io.c
@@ -886,6 +886,20 @@ emulate_flags:
 return ret;
 }
 
+static int coroutine_fn
+bdrv_driver_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
+   uint64_t bytes, QEMUIOVector *qiov)
+{
+BlockDriver *drv = bs->drv;
+
+if (!drv->bdrv_co_pwritev_compressed) {
+return -ENOTSUP;
+}
+
+assert(QLIST_EMPTY(&bs->dirty_bitmaps));
+return drv->bdrv_co_pwritev_compressed(bs, offset, bytes, qiov);
+}
+
 static int coroutine_fn bdrv_co_do_copy_on_readv(BlockDriverState *bs,
 int64_t offset, unsigned int bytes, QEMUIOVector *qiov)
 {
@@ -1554,9 +1568,14 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
 bytes = ROUND_UP(bytes, align);
 }
 
-ret = bdrv_aligned_pwritev(bs, &req, offset, bytes, align,
-   use_local_qiov ? &local_qiov : qiov,
-   flags);
+if (flags & BDRV_REQ_WRITE_COMPRESSED) {
+ret = bdrv_driver_pwritev_compressed(
+bs, offset, bytes, use_local_qiov ? &local_qiov : qiov);
+} else {
+ret = bdrv_aligned_pwritev(bs, &req, offset, bytes, align,
+   use_local_qiov ? &local_qiov : qiov,
+   flags);
+}
 
 fail:
 
@@ -1872,25 +1891,30 @@ int bdrv_pwrite_compressed(BdrvChild *child, int64_t offset,
 {
 BlockDriverState *bs = child->bs;
 BlockDriver *drv = bs->drv;
-int ret;
+QEMUIOVector qiov;
+struct iovec iov;
 
 if (!drv) {
 return -ENOMEDIUM;
 }
-if (!drv->bdrv_write_compressed) {
-return -ENOTSUP;
-}
-ret = bdrv_check_byte_request(bs, offset, bytes);
-if (ret < 0) {
-return ret;
+if (drv->bdrv_write_compressed) {
+int ret = bdrv_check_byte_request(bs, offset, bytes);
+if (ret < 0) {
+return ret;
+}
+assert(QLIST_EMPTY(&bs->dirty_bitmaps));
+assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
+assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
+return drv->bdrv_write_compressed(bs, offset >> BDRV_SECTOR_BITS, buf,
+  bytes >> BDRV_SECTOR_BITS);
 }
+iov = (struct iovec) {
+.iov_base = (void *)buf,
+.iov_len = bytes,
+};
+qemu_iovec_init_external(&qiov, &iov, 1);
 
-assert(QLIST_EMPTY(&bs->dirty_bitmaps));
-assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
-assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
-
-return drv->bdrv_write_compressed(bs, offset >> BDRV_SECTOR_BITS, buf,
-  bytes >> BDRV_SECTOR_BITS);
+return bdrv_prwv_co(child, offset, &qiov, true, BDRV_REQ_WRITE_COMPRESSED);
 }
 
 typedef struct BdrvVmstateCo {
diff --git a/include/block/block.h b/include/block/block.h
index 7bb5ddb..d8dacd2 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -65,9 +65,10 @@ typedef enum {
 BDRV_REQ_MAY_UNMAP  = 0x4,
 BDRV_REQ_NO_SERIALISING = 0x8,
 BDRV_REQ_FUA= 0x10,
+BDRV_REQ_WRITE_COMPRESSED   = 0x20,
 
 /* Mask of valid flags */
-BDRV_REQ_MASK   = 0x1f,
+BDRV_REQ_MASK   = 0x3f,
 } BdrvRequestFlags;
 
 typedef struct BlockSizes {
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 1fe0fd9..d2673a1 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -207,6 +207,9 @@ struct BlockDriver {
 int (*bdrv_write_compressed)(BlockDriverState *bs, int64_t sector_num,
  const uint8_t *buf, int nb_sectors);
 
+int coroutine_fn (*bdrv_co_pwritev_compressed)(BlockDriverState *bs,
+uint64_t offset, uint64_t bytes, QEMUIOVector *qiov);
+
 int (*bdrv_snapshot_create)(BlockDriverState *bs,
 QEMUSnapshotInfo *sn_info);
 int (*bdrv_snapshot_goto)(BlockDriverState *bs,
diff --git a/qemu-img.c b/qemu-img.c
index c0f939b..d5676a5 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2034,7 +2034,7 @@ static int img_conver

[Qemu-devel] [PATCH v7 06/16] vmdk: add vmdk_co_pwritev_compressed

2016-07-22 Thread Denis V. Lunev

From: Pavel Butsykin 

Add an implementation of vmdk_co_pwritev_compressed(), which allows
compressed writes to vmdk images to be used safely from running VMs.

Signed-off-by: Pavel Butsykin 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Denis V. Lunev 
CC: Jeff Cody 
CC: Markus Armbruster 
CC: Eric Blake 
CC: John Snow 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
---
 block/vmdk.c | 55 +--
 1 file changed, 5 insertions(+), 50 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 46d474e..a11c27a 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1645,56 +1645,11 @@ vmdk_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
 return ret;
 }
 
-typedef struct VmdkWriteCompressedCo {
-BlockDriverState *bs;
-int64_t sector_num;
-const uint8_t *buf;
-int nb_sectors;
-int ret;
-} VmdkWriteCompressedCo;
-
-static void vmdk_co_write_compressed(void *opaque)
-{
-VmdkWriteCompressedCo *co = opaque;
-QEMUIOVector local_qiov;
-uint64_t offset = co->sector_num * BDRV_SECTOR_SIZE;
-uint64_t bytes = co->nb_sectors * BDRV_SECTOR_SIZE;
-
-struct iovec iov = (struct iovec) {
-.iov_base   = (uint8_t*) co->buf,
-.iov_len= bytes,
-};
-qemu_iovec_init_external(&local_qiov, &iov, 1);
-
-co->ret = vmdk_pwritev(co->bs, offset, bytes, &local_qiov, false, false);
-}
-
-static int vmdk_write_compressed(BlockDriverState *bs,
- int64_t sector_num,
- const uint8_t *buf,
- int nb_sectors)
+static int coroutine_fn
+vmdk_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
+   uint64_t bytes, QEMUIOVector *qiov)
 {
-BDRVVmdkState *s = bs->opaque;
-
-if (s->num_extents == 1 && s->extents[0].compressed) {
-Coroutine *co;
-AioContext *aio_context = bdrv_get_aio_context(bs);
-VmdkWriteCompressedCo data = {
-.bs = bs,
-.sector_num = sector_num,
-.buf= buf,
-.nb_sectors = nb_sectors,
-.ret= -EINPROGRESS,
-};
-co = qemu_coroutine_create(vmdk_co_write_compressed, &data);
-qemu_coroutine_enter(co);
-while (data.ret == -EINPROGRESS) {
-aio_poll(aio_context, true);
-}
-return data.ret;
-} else {
-return -ENOTSUP;
-}
+return vmdk_co_pwritev(bs, offset, bytes, qiov, 0);
 }
 
 static int coroutine_fn vmdk_co_pwrite_zeroes(BlockDriverState *bs,
@@ -2393,7 +2348,7 @@ static BlockDriver bdrv_vmdk = {
 .bdrv_reopen_prepare  = vmdk_reopen_prepare,
 .bdrv_co_preadv   = vmdk_co_preadv,
 .bdrv_co_pwritev  = vmdk_co_pwritev,
-.bdrv_write_compressed= vmdk_write_compressed,
+.bdrv_co_pwritev_compressed   = vmdk_co_pwritev_compressed,
 .bdrv_co_pwrite_zeroes= vmdk_co_pwrite_zeroes,
 .bdrv_close   = vmdk_close,
 .bdrv_create  = vmdk_create,
-- 
2.1.4

[Qemu-devel] [PATCH v7 02/16] block: Convert bdrv_pwrite_compressed() to BdrvChild

2016-07-22 Thread Denis V. Lunev

From: Pavel Butsykin 

Signed-off-by: Pavel Butsykin 
Signed-off-by: Denis V. Lunev 
Reviewed-by: Eric Blake 
CC: Jeff Cody 
CC: Markus Armbruster 
CC: Eric Blake 
CC: John Snow 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
---
 block/block-backend.c | 2 +-
 block/io.c| 3 ++-
 include/block/block.h | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 8f38ab4..4bfc2eb 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1477,7 +1477,7 @@ int blk_pwrite_compressed(BlockBackend *blk, int64_t offset, const void *buf,
 return ret;
 }
 
-return bdrv_pwrite_compressed(blk_bs(blk), offset, buf, count);
+return bdrv_pwrite_compressed(blk->root, offset, buf, count);
 }
 
 int blk_truncate(BlockBackend *blk, int64_t offset)
diff --git a/block/io.c b/block/io.c
index e9f35c6..1503e09 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1867,9 +1867,10 @@ int bdrv_is_allocated_above(BlockDriverState *top,
 return 0;
 }
 
-int bdrv_pwrite_compressed(BlockDriverState *bs, int64_t offset,
+int bdrv_pwrite_compressed(BdrvChild *child, int64_t offset,
const void *buf, int bytes)
 {
+BlockDriverState *bs = child->bs;
 BlockDriver *drv = bs->drv;
 int ret;
 
diff --git a/include/block/block.h b/include/block/block.h
index b4a97f2..7bb5ddb 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -399,7 +399,7 @@ const char *bdrv_get_node_name(const BlockDriverState *bs);
 const char *bdrv_get_device_name(const BlockDriverState *bs);
 const char *bdrv_get_device_or_node_name(const BlockDriverState *bs);
 int bdrv_get_flags(BlockDriverState *bs);
-int bdrv_pwrite_compressed(BlockDriverState *bs, int64_t offset,
+int bdrv_pwrite_compressed(BdrvChild *child, int64_t offset,
const void *buf, int bytes);
 int bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi);
 ImageInfoSpecific *bdrv_get_specific_info(BlockDriverState *bs);
-- 
2.1.4

[Qemu-devel] [PATCH v7 13/16] drive-backup: added support for data compression

2016-07-22 Thread Denis V. Lunev

From: Pavel Butsykin 

The idea is simple: a backup is "write-once" data. It is written block
by block and is usually large, so it is worth compressing it to save
storage space.

The patch adds a flag to the QMP/HMP drive-backup command which enables
block compression. The format driver must implement compression for this
feature to work.

Format drivers impose some limitations on compressed writes: data can be
written only once. For backup this is perfectly fine. The driver enforces
these limitations and reports an error if they are violated.
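
As a rough illustration of the new flag, the QMP request could be built along
these lines. This is only a sketch and not part of the patch: the helper name
build_drive_backup is hypothetical, the image name is made up, and it uses only
the argument names that appear in the series' tests (device, target, sync,
mode, compress).

```python
import json


def build_drive_backup(device, target, sync='full', compress=False, **extra):
    """Return a QMP 'drive-backup' request as JSON (hypothetical helper)."""
    arguments = {'device': device, 'target': target, 'sync': sync}
    # 'compress' is optional; leaving it out keeps compression disabled.
    if compress:
        arguments['compress'] = True
    arguments.update(extra)
    return json.dumps({'execute': 'drive-backup', 'arguments': arguments},
                      sort_keys=True)


# Example request for an existing target image (file name is illustrative).
request = build_drive_backup('drive0', 'target.qcow2', compress=True,
                             mode='existing')
```

Omitting the flag matches the patch's default handling in do_drive_backup(),
where a missing compress argument is treated as false.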

Signed-off-by: Pavel Butsykin 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Denis V. Lunev 
CC: Jeff Cody 
CC: Markus Armbruster 
CC: Eric Blake 
CC: John Snow 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
---
 block/backup.c| 12 +++-
 blockdev.c|  9 ++---
 hmp-commands.hx   |  8 +---
 hmp.c |  3 +++
 include/block/block_int.h |  1 +
 qapi/block-core.json  |  5 -
 qmp-commands.hx   |  5 -
 7 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 2c05323..bb3bb9a 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -47,6 +47,7 @@ typedef struct BackupBlockJob {
 uint64_t sectors_read;
 unsigned long *done_bitmap;
 int64_t cluster_size;
+bool compress;
 NotifierWithReturn before_write;
 QLIST_HEAD(, CowRequest) inflight_reqs;
 } BackupBlockJob;
@@ -154,7 +155,8 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
bounce_qiov.size, BDRV_REQ_MAY_UNMAP);
 } else {
 ret = blk_co_pwritev(job->target, start * job->cluster_size,
- bounce_qiov.size, &bounce_qiov, 0);
+ bounce_qiov.size, &bounce_qiov,
+ job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
 }
 if (ret < 0) {
 trace_backup_do_cow_write_fail(job, start, ret);
@@ -477,6 +479,7 @@ static void coroutine_fn backup_run(void *opaque)
 void backup_start(const char *job_id, BlockDriverState *bs,
   BlockDriverState *target, int64_t speed,
   MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
+  bool compress,
   BlockdevOnError on_source_error,
   BlockdevOnError on_target_error,
   BlockCompletionFunc *cb, void *opaque,
@@ -507,6 +510,12 @@ void backup_start(const char *job_id, BlockDriverState *bs,
 return;
 }
 
+if (compress && target->drv->bdrv_co_pwritev_compressed == NULL) {
+error_setg(errp, "Compression is not supported for this drive %s",
+   bdrv_get_device_name(target));
+return;
+}
+
 if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKUP_SOURCE, errp)) {
 return;
 }
@@ -555,6 +564,7 @@ void backup_start(const char *job_id, BlockDriverState *bs,
 job->sync_mode = sync_mode;
 job->sync_bitmap = sync_mode == MIRROR_SYNC_MODE_INCREMENTAL ?
sync_bitmap : NULL;
+job->compress = compress;
 
 /* If there is no backing file on the target, we cannot rely on COW if our
  * backup cluster size is smaller than the target cluster size. Even for
diff --git a/blockdev.c b/blockdev.c
index 0c5ea25..587d76b 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3154,6 +3154,9 @@ static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn, Error **errp)
 if (!backup->has_job_id) {
 backup->job_id = NULL;
 }
+if (!backup->has_compress) {
+backup->compress = false;
+}
 
 blk = blk_by_name(backup->device);
 if (!blk) {
@@ -3242,8 +3245,8 @@ static void do_drive_backup(DriveBackup *backup, BlockJobTxn *txn, Error **errp)
 }
 
 backup_start(backup->job_id, bs, target_bs, backup->speed, backup->sync,
- bmap, backup->on_source_error, backup->on_target_error,
- block_job_cb, bs, txn, &local_err);
+ bmap, backup->compress, backup->on_source_error,
+ backup->on_target_error, block_job_cb, bs, txn, &local_err);
 bdrv_unref(target_bs);
 if (local_err != NULL) {
 error_propagate(errp, local_err);
@@ -3317,7 +3320,7 @@ void do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn, Error **errp)
 }
 }
 backup_start(backup->job_id, bs, target_bs, backup->speed, backup->sync,
- NULL, backup->on_source_error, backup->on_target_error,
+ NULL, false, backup->on_source_error, backup->on_target_error,
  block_job_cb, bs, txn, &local_err);
 if (local_err != NULL) {
 error_propagate(errp, local_err);
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 848efee..74f32e5 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1182,8 +1182,8 @@ ETEXI

[Qemu-devel] [PATCH v7 15/16] qemu-iotests: test backup compression in 055

2016-07-22 Thread Denis V. Lunev

From: Pavel Butsykin 

Add cases that check backup compression from qcow2 and raw sources into a
qcow2 target, for both drive-backup and blockdev-backup.

Signed-off-by: Pavel Butsykin 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Denis V. Lunev 
CC: Jeff Cody 
CC: Markus Armbruster 
CC: Eric Blake 
CC: John Snow 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
---
 tests/qemu-iotests/055| 97 +++
 tests/qemu-iotests/055.out|  4 +-
 tests/qemu-iotests/iotests.py | 10 ++---
 3 files changed, 104 insertions(+), 7 deletions(-)

diff --git a/tests/qemu-iotests/055 b/tests/qemu-iotests/055
index c8e3578..be81a42 100755
--- a/tests/qemu-iotests/055
+++ b/tests/qemu-iotests/055
@@ -451,5 +451,102 @@ class TestSingleTransaction(iotests.QMPTestCase):
         self.assert_qmp(result, 'error/class', 'GenericError')
         self.assert_no_active_block_jobs()
 
+
+class TestDriveCompression(iotests.QMPTestCase):
+    image_len = 64 * 1024 * 1024 # MB
+    outfmt = 'qcow2'
+
+    def setUp(self):
+        # Write data to the image so we can compare later
+        qemu_img('create', '-f', iotests.imgfmt, test_img, str(TestDriveCompression.image_len))
+        qemu_io('-f', iotests.imgfmt, '-c', 'write -P0x11 0 64k', test_img)
+        qemu_io('-f', iotests.imgfmt, '-c', 'write -P0x00 64k 128k', test_img)
+        qemu_io('-f', iotests.imgfmt, '-c', 'write -P0x22 162k 32k', test_img)
+        qemu_io('-f', iotests.imgfmt, '-c', 'write -P0x33 67043328 64k', test_img)
+
+        qemu_img('create', '-f', TestDriveCompression.outfmt, blockdev_target_img,
+                 str(TestDriveCompression.image_len))
+        self.vm = iotests.VM().add_drive(test_img).add_drive(blockdev_target_img,
+                                                             format=TestDriveCompression.outfmt)
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(test_img)
+        os.remove(blockdev_target_img)
+        try:
+            os.remove(target_img)
+        except OSError:
+            pass
+
+    def do_test_compress_complete(self, cmd, **args):
+        self.assert_no_active_block_jobs()
+
+        result = self.vm.qmp(cmd, device='drive0', sync='full', compress=True, **args)
+        self.assert_qmp(result, 'return', {})
+
+        self.wait_until_completed()
+
+        self.vm.shutdown()
+        self.assertTrue(iotests.compare_images(test_img, blockdev_target_img,
+                                               iotests.imgfmt, TestDriveCompression.outfmt),
+                        'target image does not match source after backup')
+
+    def test_complete_compress_drive_backup(self):
+        self.do_test_compress_complete('drive-backup', target=blockdev_target_img, mode='existing')
+
+    def test_complete_compress_blockdev_backup(self):
+        self.do_test_compress_complete('blockdev-backup', target='drive1')
+
+    def do_test_compress_cancel(self, cmd, **args):
+        self.assert_no_active_block_jobs()
+
+        result = self.vm.qmp(cmd, device='drive0', sync='full', compress=True, **args)
+        self.assert_qmp(result, 'return', {})
+
+        event = self.cancel_and_wait()
+        self.assert_qmp(event, 'data/type', 'backup')
+
+    def test_compress_cancel_drive_backup(self):
+        self.do_test_compress_cancel('drive-backup', target=blockdev_target_img, mode='existing')
+
+    def test_compress_cancel_blockdev_backup(self):
+        self.do_test_compress_cancel('blockdev-backup', target='drive1')
+
+    def do_test_compress_pause(self, cmd, **args):
+        self.assert_no_active_block_jobs()
+
+        self.vm.pause_drive('drive0')
+        result = self.vm.qmp(cmd, device='drive0', sync='full', compress=True, **args)
+        self.assert_qmp(result, 'return', {})
+
+        result = self.vm.qmp('block-job-pause', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        self.vm.resume_drive('drive0')
+        time.sleep(1)
+        result = self.vm.qmp('query-block-jobs')
+        offset = self.dictpath(result, 'return[0]/offset')
+
+        time.sleep(1)
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return[0]/offset', offset)
+
+        result = self.vm.qmp('block-job-resume', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        self.wait_until_completed()
+
+        self.vm.shutdown()
+        self.assertTrue(iotests.compare_images(test_img, blockdev_target_img,
+                                               iotests.imgfmt, TestDriveCompression.outfmt),
+                        'target image does not match source after backup')
+
+    def test_compress_pause_drive_backup(self):
+        self.do_test_compress_pause('drive-backup', target=blockdev_target_img, mode='existing')
+
+    def test_compress_pause_blockdev_backup(self):
+        self.do_test_compress_pause('blockdev-backup', target='drive1')
+
 if __name__ == '__main__':
 iotests.main(supported_fmts

Re: [Qemu-devel] [RFC PATCH V9 5/7] qemu-char: Add qemu_chr_add_handlers_full() for GMaincontext

2016-07-22 Thread Zhang Chen


add to: Daniel P . Berrange


On 07/22/2016 02:56 PM, Zhang Chen wrote:



On 07/22/2016 02:45 PM, Li Zhijian wrote:



On 07/22/2016 01:38 PM, Zhang Chen wrote:

Add the qemu_chr_add_handlers_full() API. It takes a GMainContext,
so the handlers run in that context rather than in the main loop.
This was suggested by Daniel P. Berrange.

Cc: Daniel P . Berrange 
Cc: Paolo Bonzini 

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
---
  include/sysemu/char.h |  11 -
  qemu-char.c   | 119 +++---
  2 files changed, 84 insertions(+), 46 deletions(-)

diff --git a/include/sysemu/char.h b/include/sysemu/char.h
index 307fd8f..86888bc 100644
--- a/include/sysemu/char.h
+++ b/include/sysemu/char.h
@@ -65,7 +65,8 @@ struct CharDriverState {
  int (*chr_sync_read)(struct CharDriverState *s,
   const uint8_t *buf, int len);
  GSource *(*chr_add_watch)(struct CharDriverState *s, GIOCondition cond);

-void (*chr_update_read_handler)(struct CharDriverState *s);
+void (*chr_update_read_handler_full)(struct CharDriverState *s,
+ GMainContext *context);
  int (*chr_ioctl)(struct CharDriverState *s, int cmd, void *arg);
  int (*get_msgfds)(struct CharDriverState *s, int* fds, int num);
  int (*set_msgfds)(struct CharDriverState *s, int *fds, int num);
@@ -388,6 +389,14 @@ void qemu_chr_add_handlers(CharDriverState *s,
 IOEventHandler *fd_event,
 void *opaque);

+/* This API can make handler run in the context what you pass to. */
+void qemu_chr_add_handlers_full(CharDriverState *s,
+IOCanReadHandler *fd_can_read,
+IOReadHandler *fd_read,
+IOEventHandler *fd_event,
+void *opaque,
+GMainContext *context);
+
  void qemu_chr_be_generic_open(CharDriverState *s);
  void qemu_chr_accept_input(CharDriverState *s);
  int qemu_chr_add_client(CharDriverState *s, int fd);
diff --git a/qemu-char.c b/qemu-char.c
index b597ee1..0a45c9e 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -448,11 +448,12 @@ void qemu_chr_fe_printf(CharDriverState *s, const char *fmt, ...)


  static void remove_fd_in_watch(CharDriverState *chr);

-void qemu_chr_add_handlers(CharDriverState *s,
-   IOCanReadHandler *fd_can_read,
-   IOReadHandler *fd_read,
-   IOEventHandler *fd_event,
-   void *opaque)
+void qemu_chr_add_handlers_full(CharDriverState *s,
+IOCanReadHandler *fd_can_read,
+IOReadHandler *fd_read,
+IOEventHandler *fd_event,
+void *opaque,
+GMainContext *context)
  {
  int fe_open;

@@ -466,8 +467,9 @@ void qemu_chr_add_handlers(CharDriverState *s,
  s->chr_read = fd_read;
  s->chr_event = fd_event;
  s->handler_opaque = opaque;
-if (fe_open && s->chr_update_read_handler)
-s->chr_update_read_handler(s);
+if (fe_open && s->chr_update_read_handler_full) {
+s->chr_update_read_handler_full(s, context);
+}

  if (!s->explicit_fe_open) {
  qemu_chr_fe_set_open(s, fe_open);
@@ -480,6 +482,16 @@ void qemu_chr_add_handlers(CharDriverState *s,
  }
  }

+void qemu_chr_add_handlers(CharDriverState *s,
+   IOCanReadHandler *fd_can_read,
+   IOReadHandler *fd_read,
+   IOEventHandler *fd_event,
+   void *opaque)
+{
+qemu_chr_add_handlers_full(s, fd_can_read, fd_read,
+   fd_event, opaque, NULL);
+}
+
  static int null_chr_write(CharDriverState *chr, const uint8_t *buf, int len)

  {
  return len;
@@ -717,7 +729,8 @@ static void mux_chr_event(void *opaque, int event)
  mux_chr_send_event(d, i, event);
  }

-static void mux_chr_update_read_handler(CharDriverState *chr)
+static void mux_chr_update_read_handler_full(CharDriverState *chr,
+ GMainContext *context)
  {
  MuxDriver *d = chr->opaque;

@@ -731,8 +744,10 @@ static void mux_chr_update_read_handler(CharDriverState *chr)

  d->chr_event[d->mux_cnt] = chr->chr_event;
  /* Fix up the real driver with mux routines */
  if (d->mux_cnt == 0) {
-qemu_chr_add_handlers(d->drv, mux_chr_can_read, mux_chr_read,
-  mux_chr_event, chr);
+qemu_chr_add_handlers_full(d->drv, mux_chr_can_read,
+   mux_chr_read,
+   mux_chr_event,
+   chr, context);
  }
  if (d->focus != -1) {

[Qemu-devel] [PATCH v7 12/16] block: simplify blockdev-backup

2016-07-22 Thread Denis V. Lunev

From: Pavel Butsykin 

Now that boxed commands are supported, use them to greatly reduce the
number of parameters (and the likelihood of them getting out of sync)
when adjusting blockdev-backup parameters.

Signed-off-by: Pavel Butsykin 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Denis V. Lunev 
CC: Jeff Cody 
CC: Markus Armbruster 
CC: Eric Blake 
CC: John Snow 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
---
 blockdev.c   | 70 +---
 qapi/block-core.json |  6 -
 2 files changed, 27 insertions(+), 49 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index e29147a..0c5ea25 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1906,14 +1906,8 @@ typedef struct BlockdevBackupState {
 AioContext *aio_context;
 } BlockdevBackupState;
 
-static void do_blockdev_backup(const char *job_id, const char *device,
-   const char *target, enum MirrorSyncMode sync,
-   bool has_speed, int64_t speed,
-   bool has_on_source_error,
-   BlockdevOnError on_source_error,
-   bool has_on_target_error,
-   BlockdevOnError on_target_error,
-   BlockJobTxn *txn, Error **errp);
+static void do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn,
+   Error **errp);
 
 static void blockdev_backup_prepare(BlkActionState *common, Error **errp)
 {
@@ -1953,12 +1947,7 @@ static void blockdev_backup_prepare(BlkActionState *common, Error **errp)
 state->bs = blk_bs(blk);
 bdrv_drained_begin(state->bs);
 
-do_blockdev_backup(backup->has_job_id ? backup->job_id : NULL,
-   backup->device, backup->target, backup->sync,
-   backup->has_speed, backup->speed,
-   backup->has_on_source_error, backup->on_source_error,
-   backup->has_on_target_error, backup->on_target_error,
-   common->block_job_txn, &local_err);
+do_blockdev_backup(backup, common->block_job_txn, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
 return;
@@ -3275,14 +3264,7 @@ BlockDeviceInfoList *qmp_query_named_block_nodes(Error **errp)
 return bdrv_named_nodes_list(errp);
 }
 
-void do_blockdev_backup(const char *job_id, const char *device,
-const char *target, enum MirrorSyncMode sync,
- bool has_speed, int64_t speed,
- bool has_on_source_error,
- BlockdevOnError on_source_error,
- bool has_on_target_error,
- BlockdevOnError on_target_error,
- BlockJobTxn *txn, Error **errp)
+void do_blockdev_backup(BlockdevBackup *backup, BlockJobTxn *txn, Error **errp)
 {
 BlockBackend *blk;
 BlockDriverState *bs;
@@ -3290,19 +3272,22 @@ void do_blockdev_backup(const char *job_id, const char *device,
 Error *local_err = NULL;
 AioContext *aio_context;
 
-if (!has_speed) {
-speed = 0;
+if (!backup->has_speed) {
+backup->speed = 0;
 }
-if (!has_on_source_error) {
-on_source_error = BLOCKDEV_ON_ERROR_REPORT;
+if (!backup->has_on_source_error) {
+backup->on_source_error = BLOCKDEV_ON_ERROR_REPORT;
 }
-if (!has_on_target_error) {
-on_target_error = BLOCKDEV_ON_ERROR_REPORT;
+if (!backup->has_on_target_error) {
+backup->on_target_error = BLOCKDEV_ON_ERROR_REPORT;
+}
+if (!backup->has_job_id) {
+backup->job_id = NULL;
 }
 
-blk = blk_by_name(device);
+blk = blk_by_name(backup->device);
 if (!blk) {
-error_setg(errp, "Device '%s' not found", device);
+error_setg(errp, "Device '%s' not found", backup->device);
 return;
 }
 
@@ -3310,12 +3295,12 @@ void do_blockdev_backup(const char *job_id, const char *device,
 aio_context_acquire(aio_context);
 
 if (!blk_is_available(blk)) {
-error_setg(errp, "Device '%s' has no medium", device);
+error_setg(errp, "Device '%s' has no medium", backup->device);
 goto out;
 }
 bs = blk_bs(blk);
 
-target_bs = bdrv_lookup_bs(target, target, errp);
+target_bs = bdrv_lookup_bs(backup->target, backup->target, errp);
 if (!target_bs) {
 goto out;
 }
@@ -3331,8 +3316,9 @@ void do_blockdev_backup(const char *job_id, const char *device,
 goto out;
 }
 }
-backup_start(job_id, bs, target_bs, speed, sync, NULL, on_source_error,
- on_target_error, block_job_cb, bs, txn, &local_err);
+backup_start(backup->job_id, bs, target_bs, backup->speed, backup->sync,
+ NULL, backup->on_source_error, backup->on_target_error,
+ block_job_cb, bs, txn, &local_err);
 if (local_err != NULL) {
 error_pro

[Qemu-devel] [PATCH v7 16/16] qemu-iotests: add vmdk for test backup compression in 055

2016-07-22 Thread Denis V. Lunev

From: Pavel Butsykin 

The vmdk format supports compression, so add it to the backup
compression test.
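
To make the per-format setup concrete, here is a small sketch of how a format
table like the one in this patch maps to image-creation command lines. It
mirrors the argument order the test passes to qemu_img(); the function name
create_command and the file name target.img are illustrative assumptions, not
part of the patch.

```python
# Formats with compressed-write support, as listed in the patch.
FMT_SUPPORTS_COMPRESSION = [
    {'type': 'qcow2', 'args': ()},
    {'type': 'vmdk', 'args': ('-o', 'subformat=streamOptimized')},
]


def create_command(fmt, path, size):
    """Build the argv for creating a target image (hypothetical helper)."""
    # Extra per-format options are appended after the size, as in the test.
    return ['qemu-img', 'create', '-f', fmt['type'], path, str(size),
            *fmt['args']]


commands = [create_command(fmt, 'target.img', 64 * 1024 * 1024)
            for fmt in FMT_SUPPORTS_COMPRESSION]
```

Keeping the extra options in the table means a further format could be covered
by adding one entry rather than another test method.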

Signed-off-by: Pavel Butsykin 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Denis V. Lunev 
CC: Jeff Cody 
CC: Markus Armbruster 
CC: Eric Blake 
CC: John Snow 
CC: Stefan Hajnoczi 
CC: Kevin Wolf 
---
 tests/qemu-iotests/055 | 57 ++
 1 file changed, 39 insertions(+), 18 deletions(-)

diff --git a/tests/qemu-iotests/055 b/tests/qemu-iotests/055
index be81a42..cf5a423 100755
--- a/tests/qemu-iotests/055
+++ b/tests/qemu-iotests/055
@@ -454,7 +454,8 @@ class TestSingleTransaction(iotests.QMPTestCase):
 
 class TestDriveCompression(iotests.QMPTestCase):
     image_len = 64 * 1024 * 1024 # MB
-    outfmt = 'qcow2'
+    fmt_supports_compression = [{'type': 'qcow2', 'args': ()},
+                                {'type': 'vmdk', 'args': ('-o', 'subformat=streamOptimized')}]
 
     def setUp(self):
         # Write data to the image so we can compare later
@@ -464,12 +465,6 @@ class TestDriveCompression(iotests.QMPTestCase):
         qemu_io('-f', iotests.imgfmt, '-c', 'write -P0x22 162k 32k', test_img)
         qemu_io('-f', iotests.imgfmt, '-c', 'write -P0x33 67043328 64k', test_img)
 
-        qemu_img('create', '-f', TestDriveCompression.outfmt, blockdev_target_img,
-                 str(TestDriveCompression.image_len))
-        self.vm = iotests.VM().add_drive(test_img).add_drive(blockdev_target_img,
-                                                             format=TestDriveCompression.outfmt)
-        self.vm.launch()
-
     def tearDown(self):
         self.vm.shutdown()
         os.remove(test_img)
@@ -479,7 +474,18 @@ class TestDriveCompression(iotests.QMPTestCase):
         except OSError:
             pass
 
-    def do_test_compress_complete(self, cmd, **args):
+    def do_prepare_drives(self, fmt, args):
+        self.vm = iotests.VM().add_drive(test_img)
+
+        qemu_img('create', '-f', fmt, blockdev_target_img,
+                 str(TestDriveCompression.image_len), *args)
+        self.vm.add_drive(blockdev_target_img, format=fmt)
+
+        self.vm.launch()
+
+    def do_test_compress_complete(self, cmd, format, **args):
+        self.do_prepare_drives(format['type'], format['args'])
+
         self.assert_no_active_block_jobs()
 
         result = self.vm.qmp(cmd, device='drive0', sync='full', compress=True, **args)
@@ -489,16 +495,21 @@ class TestDriveCompression(iotests.QMPTestCase):
 
         self.vm.shutdown()
         self.assertTrue(iotests.compare_images(test_img, blockdev_target_img,
-                                               iotests.imgfmt, TestDriveCompression.outfmt),
+                                               iotests.imgfmt, format['type']),
                         'target image does not match source after backup')
 
     def test_complete_compress_drive_backup(self):
-        self.do_test_compress_complete('drive-backup', target=blockdev_target_img, mode='existing')
+        for format in TestDriveCompression.fmt_supports_compression:
+            self.do_test_compress_complete('drive-backup', format,
+                                           target=blockdev_target_img, mode='existing')
 
     def test_complete_compress_blockdev_backup(self):
-        self.do_test_compress_complete('blockdev-backup', target='drive1')
+        for format in TestDriveCompression.fmt_supports_compression:
+            self.do_test_compress_complete('blockdev-backup', format, target='drive1')
+
+    def do_test_compress_cancel(self, cmd, format, **args):
+        self.do_prepare_drives(format['type'], format['args'])
 
-    def do_test_compress_cancel(self, cmd, **args):
         self.assert_no_active_block_jobs()
 
         result = self.vm.qmp(cmd, device='drive0', sync='full', compress=True, **args)
@@ -507,13 +518,20 @@ class TestDriveCompression(iotests.QMPTestCase):
         event = self.cancel_and_wait()
         self.assert_qmp(event, 'data/type', 'backup')
 
+        self.vm.shutdown()
+
     def test_compress_cancel_drive_backup(self):
-        self.do_test_compress_cancel('drive-backup', target=blockdev_target_img, mode='existing')
+        for format in TestDriveCompression.fmt_supports_compression:
+            self.do_test_compress_cancel('drive-backup', format,
+                                         target=blockdev_target_img, mode='existing')
 
     def test_compress_cancel_blockdev_backup(self):
-        self.do_test_compress_cancel('blockdev-backup', target='drive1')
+        for format in TestDriveCompression.fmt_supports_compression:
+            self.do_test_compress_cancel('blockdev-backup', format, target='drive1')
+
+    def do_test_compress_pause(self, cmd, format, **args):
+        self.do_prepare_drives(format['type'], format['args'])
 
-    def do_test_compress_pause(self, cmd, **args):
         self.assert_no_active_block_jobs()
 
         self.vm.pause_drive('drive0')

Re: [Qemu-devel] [PATCH 28/37] virtio-input: free config list

2016-07-22 Thread Marc-André Lureau

Hi

- Original Message -
> > --- a/hw/input/virtio-input-hid.c
> > +++ b/hw/input/virtio-input-hid.c
> 
> > +.instance_finalize = virtio_input_finalize,
> 
> > --- a/hw/input/virtio-input.c
> > +++ b/hw/input/virtio-input.c
> 
> > +void virtio_input_finalize(Object *obj)
> > +{
> > +VirtIOInput *vinput = VIRTIO_INPUT(obj);
> > +VirtIOInputConfig *cfg, *next;
> > +
> > +QTAILQ_FOREACH_SAFE(cfg, &vinput->cfg_list, node, next) {
> > +QTAILQ_REMOVE(&vinput->cfg_list, cfg, node);
> > +g_free(cfg);
> > +}
> > +}
> 
> I think you can keep this local to virtio-input.c and simply hook it
> into the abstract base class (TYPE_VIRTIO_INPUT).
> 

Yes, not sure why I didn't do that in the first place. Fixed.

Re: [Qemu-devel] [Qemu-ppc] [PATCH 6/8] spapr: init CPUState->cpu_index with index relative to core-id

2016-07-22 Thread Greg Kurz

On Fri, 22 Jul 2016 17:14:33 +1000
David Gibson  wrote:

> On Fri, Jul 22, 2016 at 11:40:03AM +0530, Bharata B Rao wrote:
> > On Fri, Jul 22, 2016 at 01:23:01PM +1000, David Gibson wrote:  
> > > On Thu, Jul 21, 2016 at 05:54:37PM +0200, Igor Mammedov wrote:  
> > > > It will ensure that cpu_index for a given cpu stays the same
> > > > regardless of the order in which cpus have been created/deleted,
> > > > so it would be possible to migrate a QEMU instance with CPUs
> > > > created out of order.
> > > > 
> > > > Signed-off-by: Igor Mammedov   
> > > 
> > > So, this isn't quite right (it wasn't right in my version either).
> > > 
> > > The problem occurs when smp_threads < kvmppc_smt_threads().  That is,
> > > when the requested threads-per-core is less than the hardware's
> > > maximum number of threads-per-core.
> > > 
> > > The core-id values are assigned essentially as i *
> > > kvmppc_smt_threads(), meaning the patch below will leave gaps in the
> > > cpu_index values and the last ones will exceed max_cpus, causing other
> > > problems.  
> > 
> > This would lead to hotplug failures as cpu_dt_id is still being
> > derived from non-contiguous cpu_index resulting in wrong enumeration
> > of CPU nodes in DT.  
> 
> Which "This" are you referring to?
> 
> > 
> > For -smp 8,threads=4 we see the following CPU nodes in DT
> > 
> > PowerPC,POWER8@0 PowerPC,POWER8@10
> > 
> > which otherwise should have been
> > 
> > PowerPC,POWER8@0 PowerPC,POWER8@8
> > 
> > The problem manifests as drmgr failure.
> > 
> > Greg's patchset that moved cpu_dt_id setting to machine code and that
> > derived cpu_dt_id from core-id for sPAPR would be needed to fix this
> > I guess.  
> 
> No, it shouldn't be necessary to fix it.  But we certainly do want to
> clean this stuff up.  I'm not terribly convinced by the current
> approach in Greg's series though.  I'd actually prefer to remove
> cpu_dt_id from the cpustate entirely and instead work it out from the

That was the initial plan for my series :)

> (now stable) cpu index when we go to construct the device tree.
> 

Patch 5/8 provides a stable cpu_index for the pc machine type out of the
index in possible_cpus[].

Since we also store cores and threads in fixed size arrays, we could easily
follow the same logic: 

index_of_core_in_spapr_cores * smp_threads + index_of_thread_in_core_threads
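
A quick way to check that this formula yields contiguous indexes is the sketch
below. It is only an illustration under stated assumptions: the function and
parameter names are hypothetical, and the 2 cores x 4 threads layout is just
the -smp 8,threads=4 case mentioned earlier in the thread.

```python
def stable_cpu_index(core_slot, thread_slot, smp_threads):
    """cpu_index from the core's array slot and the thread's slot in the core."""
    if not 0 <= thread_slot < smp_threads:
        raise ValueError('thread slot out of range')
    return core_slot * smp_threads + thread_slot


# -smp 8,threads=4: two cores with four threads each.
SMP_THREADS = 4
indexes = [stable_cpu_index(core, thread, SMP_THREADS)
           for core in range(2)
           for thread in range(SMP_THREADS)]
```

Because the result depends only on array positions, it does not change with
the order in which cores are plugged, and it leaves no gaps when smp_threads is
smaller than the hardware threads-per-core.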

Cheers.

--
Greg



Re: [Qemu-devel] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server)

2016-07-22 Thread Dave Chinner

On Thu, Jul 21, 2016 at 10:23:48AM -0400, Paolo Bonzini wrote:
> > > 1) avoid copying zero data, to keep the copy process efficient.  For this,
> > > SEEK_HOLE/SEEK_DATA are enough.
> > > 
> > > 2) copy file contents while preserving the allocation state of the file's
> > > extents.
> > 
> > Which is /very difficult/ to do safely and reliably.
> > i.e. the use of fiemap to duplicate the exact layout of a file
> > from userspace is only possible if you can /guarantee/ the source
> > file has not changed in any way during the copy operation at the
> > point in time you finalise the destination data copy.
> 
> We don't do exactly that, exactly because it's messy when you have
> concurrent accesses (which shouldn't be done but you never know).

Which means you *cannot make the assumption it won't happen*.

FIEMAP is not guaranteed to tell you exactly where the data in the
file that you need to copy is, and nothing you can do from
userspace changes that. I can't say it any clearer than that.

> When
> doing a copy, we use(d to use) FIEMAP the same way as you'd use lseek,
> querying one extent at a time.  If you proceed this way, all of these
> can cause the same races:
> 
> - pread(ofs=10MB, len=10MB) returns all zeroes, so the 10MB..20MB is
> not copied
> 
> - pread(ofs=10MB, len=10MB) returns non-zero data, so the 10MB..20MB is
> copied
> 
> - lseek(SEEK_DATA, 10MB) returns 20MB, so the 10MB..20MB area is not
> copied
> 
> - lseek(SEEK_HOLE, 10MB) returns 20MB, so the 10MB..20MB area is
> copied
> 
> - ioctl(FIEMAP at 10MB) returns an extent starting at 20MB, so
> the 10MB..20MB area is not copied

No, FIEMAP is not guaranteed to behave like this. What is returned
is filesystem dependent. Filesystems that don't support holes will
return data extents. Filesystems that support compression might
return a compressed data extent rather than a hole. Encrypted files
might not expose holes at all, so people can't easily find known
plain text regions in the encrypted data. Filesystems could report
holes as deduplicated data, etc.  What do you do when FIEMAP returns
"OFFLINE" to indicate that the data is located elsewhere and will
need to be retrieved by the HSM operating on top of the filesystem
before layout can be determined?

All of the above are *valid* and *correct*, because the filesystem
defines what FIEMAP returns for a given file offset. Just because
ext4 and XFS have mostly the same behaviour, it doesn't mean that
every other filesystem behaves the same way.

The assumptions being made about FIEMAP behaviour will only lead to
user data corruption, as they already have several times in the past.
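
For the first use case quoted above (skipping zero regions while copying),
the lseek-based walk could look roughly like the sketch below. It is a
minimal illustration, not code from QEMU: count_data_bytes() is a
hypothetical name, the fallback SEEK_DATA/SEEK_HOLE values are the Linux
ones and are only used if the headers did not expose them, and the walk is
still subject to the races with concurrent writers discussed in this thread.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3   /* Linux value, assumed only when headers hide it */
#define SEEK_HOLE 4   /* Linux value, assumed only when headers hide it */
#endif

/*
 * Hypothetical helper: sum the bytes that lseek() reports as data.
 * Returns the byte count, or -1 on error.
 */
static off_t count_data_bytes(int fd)
{
    off_t end = lseek(fd, 0, SEEK_END);
    off_t pos = 0, total = 0;

    if (end < 0) {
        perror("lseek(SEEK_END)");
        return -1;
    }
    while (pos < end) {
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0) {
            if (errno == ENXIO) {
                break;              /* only a hole remains up to EOF */
            }
            perror("lseek(SEEK_DATA)");
            return -1;
        }
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole < 0) {
            perror("lseek(SEEK_HOLE)");
            return -1;
        }
        total += hole - data;       /* [data, hole) is a data range */
        pos = hole;
    }
    return total;
}
```

A copy loop would read and write each [data, hole) range instead of summing
it. A filesystem without hole support simply reports the whole file as one
data range, which is safe for this purpose.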

Cheers,

Dave.
-- 
Dave Chinner
dchin...@redhat.com

Re: [Qemu-devel] [PULL 15/15] translate-all: add tb hash bucket info to 'info jit' dump

2016-07-22 Thread Changlong Xie


On 06/10/2016 10:26 PM, Richard Henderson wrote:

From: "Emilio G. Cota" 

Examples:

- Good hashing, i.e. tb_hash_func5(phys_pc, pc, flags):
TB count            715135/2684354
[...]
TB hash buckets 388775/524288 (74.15% head buckets used)
TB hash occupancy   33.04% avg chain occ. Histogram: [0,10)%|▆ █  
▅▁▃▁▁|[90,100]%
TB hash avg chain   1.017 buckets. Histogram: 1|█▁▁|3

- Not-so-good hashing, i.e. tb_hash_func5(phys_pc, pc, 0):
TB count            712636/2684354
[...]
TB hash buckets 344924/524288 (65.79% head buckets used)
TB hash occupancy   31.64% avg chain occ. Histogram: [0,10)%|█ ▆  
▅▁▃▁▂|[90,100]%
TB hash avg chain   1.047 buckets. Histogram: 1|█▁▁▁|4

- Bad hashing, i.e. tb_hash_func5(phys_pc, 0, 0):
TB count            702818/2684354
[...]
TB hash buckets 112741/524288 (21.50% head buckets used)
TB hash occupancy   10.15% avg chain occ. Histogram: [0,10)%|█ ▁  
▁|[90,100]%
TB hash avg chain   2.107 buckets. Histogram: [1.0,10.2)|█▁|[83.8,93.0]

- Good hashing, but no auto-resize:
TB count            715634/2684354
TB hash buckets 8192/8192 (100.00% head buckets used)
TB hash occupancy   98.30% avg chain occ. Histogram: 
[95.3,95.8)%|▁▁▃▄▃▄▁▇▁█|[99.5,100.0]%
TB hash avg chain   22.070 buckets. Histogram: 
[15.0,16.7)|▁▂▅▄█▅|[30.3,32.0]

Acked-by: Sergey Fedorov 
Suggested-by: Richard Henderson 
Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
Message-Id: <1465412133-3029-16-git-send-email-c...@braap.org>
Signed-off-by: Richard Henderson 
---
  translate-all.c | 36 
  1 file changed, 36 insertions(+)

diff --git a/translate-all.c b/translate-all.c
index b620fcc..e8b88b4 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1668,6 +1668,10 @@ void dump_exec_info(FILE *f, fprintf_function 
cpu_fprintf)
  int i, target_code_size, max_target_code_size;
  int direct_jmp_count, direct_jmp2_count, cross_page;
  TranslationBlock *tb;
+struct qht_stats hst;
+uint32_t hgram_opts;
+size_t hgram_bins;
+char *hgram;

  target_code_size = 0;
  max_target_code_size = 0;
@@ -1718,6 +1722,38 @@ void dump_exec_info(FILE *f, fprintf_function 
cpu_fprintf)
  direct_jmp2_count,
  tcg_ctx.tb_ctx.nb_tbs ? (direct_jmp2_count * 100) /
  tcg_ctx.tb_ctx.nb_tbs : 0);
+
+qht_statistics_init(&tcg_ctx.tb_ctx.htable, &hst);


Hi Emilio

  If we use the KVM accelerator, we will encounter a segmentation fault.

slave:~/.xie # gdb --args qemu-colo/x86_64-softmmu/qemu-system-x86_64 
--enable-kvm -drive driver=qcow2,file.filename=fd16.qcow2 -monitor stdio 
-vnc :1


(qemu) info jit
Translation buffer state:
gen code size   0/0
TB count            0/0
TB avg target size  0 max=0 bytes
TB avg host size    0 bytes (expansion ratio: 0.0)
cross page TB count 0 (0%)
direct jump count   0 (0%) (2 jumps=0 0%)

Program received signal SIGSEGV, Segmentation fault.
0x55c0c432 in qht_statistics_init (ht=0x561ddfd0 
, stats=0x7fffcab0) at util/qht.c:786

786 stats->head_buckets = map->n_buckets;
(gdb) p *ht
$1 = {map = 0x0, lock = {lock = {__data = {__lock = 0, __count = 0, 
__owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 
0x0, __next = 0x0}}, __size =

'\000' , __align = 0}}, mode = 0}
(gdb)

Thanks
-Xie
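
The gdb output shows ht->map being NULL when qht_statistics_init()
dereferences it. A defensive check could look like the sketch below. It uses
stand-in types (the demo_* names are hypothetical and are not QEMU's struct
qht), and it is only one possible approach; this thread does not settle
whether the guard belongs in the hash table code or in the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the real structures, reduced to the fields involved. */
struct demo_map   { size_t n_buckets; };
struct demo_table { struct demo_map *map; };
struct demo_stats { size_t head_buckets; };

/*
 * Returns false and leaves zeroed stats when the table was never
 * initialised (map == NULL) instead of dereferencing a NULL pointer.
 */
static bool demo_statistics_init(const struct demo_table *ht,
                                 struct demo_stats *stats)
{
    assert(ht != NULL && stats != NULL);

    stats->head_buckets = 0;
    if (ht->map == NULL) {
        return false;               /* nothing to report, e.g. no TCG */
    }
    stats->head_buckets = ht->map->n_buckets;
    return true;
}
```

A caller would then skip printing the hash statistics when the function
returns false.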

+
+cpu_fprintf(f, "TB hash buckets %zu/%zu (%0.2f%% head buckets used)\n",
+hst.used_head_buckets, hst.head_buckets,
+(double)hst.used_head_buckets / hst.head_buckets * 100);
+
+hgram_opts =  QDIST_PR_BORDER | QDIST_PR_LABELS;
+hgram_opts |= QDIST_PR_100X   | QDIST_PR_PERCENT;
+if (qdist_xmax(&hst.occupancy) - qdist_xmin(&hst.occupancy) == 1) {
+hgram_opts |= QDIST_PR_NODECIMAL;
+}
+hgram = qdist_pr(&hst.occupancy, 10, hgram_opts);
+cpu_fprintf(f, "TB hash occupancy   %0.2f%% avg chain occ. Histogram: 
%s\n",
+qdist_avg(&hst.occupancy) * 100, hgram);
+g_free(hgram);
+
+hgram_opts = QDIST_PR_BORDER | QDIST_PR_LABELS;
+hgram_bins = qdist_xmax(&hst.chain) - qdist_xmin(&hst.chain);
+if (hgram_bins > 10) {
+hgram_bins = 10;
+} else {
+hgram_bins = 0;
+hgram_opts |= QDIST_PR_NODECIMAL | QDIST_PR_NOBINRANGE;
+}
+hgram = qdist_pr(&hst.chain, hgram_bins, hgram_opts);
+cpu_fprintf(f, "TB hash avg chain   %0.3f buckets. Histogram: %s\n",
+qdist_avg(&hst.chain), hgram);
+g_free(hgram);
+
+qht_statistics_destroy(&hst);
+
  cpu_fprintf(f, "\nStatistics:\n");
  cpu_fprintf(f, "TB flush count  %d\n", tcg_ctx.tb_ctx.tb_flush_count);
  cpu_fprintf(f, "TB invalidate count %d\n",

Re: [Qemu-devel] [PATCH v2 1/3] ipmi_bmc_sim: Add a proper unrealize function

2016-07-22 Thread Marc-André Lureau

Hi

- Original Message -
> From: Corey Minyard 
> 
> Add an unrealize function to free the timer allocated in the
> realize function, unregister the vmstate, and free any
> pending messages.

I don't know how to test this either. The device seems to be hotpluggable, but
doing device_del crashes qemu. It looks like that would be worth fixing too
(even better would be to automate this kind of test for all devices, but that's
just a thought).

> Also, get rid of the unnecessary mutex, it was a vestige
> of something else that was not done.  That way we don't
> have to free it.

You may want to split this into a separate patch.

> 
> Signed-off-by: Corey Minyard 
> Cc: Marc-André Lureau 
> ---
>  hw/ipmi/ipmi_bmc_sim.c | 22 --
>  1 file changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/ipmi/ipmi_bmc_sim.c b/hw/ipmi/ipmi_bmc_sim.c
> index dc9c14c..fe92b93 100644
> --- a/hw/ipmi/ipmi_bmc_sim.c
> +++ b/hw/ipmi/ipmi_bmc_sim.c
> @@ -217,7 +217,6 @@ struct IPMIBmcSim {
>  /* Odd netfns are for responses, so we only need the even ones. */
>  const IPMINetfn *netfns[MAX_NETFNS / 2];
>  
> -QemuMutex lock;
>  /* We allow one event in the buffer */
>  uint8_t evtbuf[16];
>  
> @@ -940,7 +939,6 @@ static void get_msg(IPMIBmcSim *ibs,
>  {
>  IPMIRcvBufEntry *msg;
>  
> -qemu_mutex_lock(&ibs->lock);
>  if (QTAILQ_EMPTY(&ibs->rcvbufs)) {
>  rsp_buffer_set_error(rsp, 0x80); /* Queue empty */
>  goto out;
> @@ -960,7 +958,6 @@ static void get_msg(IPMIBmcSim *ibs,
>  }
>  
>  out:
> -qemu_mutex_unlock(&ibs->lock);
>  return;
>  }
>  
> @@ -1055,11 +1052,9 @@ static void send_msg(IPMIBmcSim *ibs,
>   end_msg:
>  msg->buf[msg->len] = ipmb_checksum(msg->buf, msg->len, 0);
>  msg->len++;
> -qemu_mutex_lock(&ibs->lock);
>  QTAILQ_INSERT_TAIL(&ibs->rcvbufs, msg, entry);
>  ibs->msg_flags |= IPMI_BMC_MSG_FLAG_RCV_MSG_QUEUE;
>  k->set_atn(s, 1, attn_irq_enabled(ibs));
> -qemu_mutex_unlock(&ibs->lock);
>  }
>  
>  static void do_watchdog_reset(IPMIBmcSim *ibs)
> @@ -1753,7 +1748,6 @@ static void ipmi_sim_realize(DeviceState *dev, Error
> **errp)
>  unsigned int i;
>  IPMIBmcSim *ibs = IPMI_BMC_SIMULATOR(b);
>  
> -qemu_mutex_init(&ibs->lock);
>  QTAILQ_INIT(&ibs->rcvbufs);
>  
>  ibs->bmc_global_enables = (1 << IPMI_BMC_EVENT_LOG_BIT);
> @@ -1786,12 +1780,28 @@ static void ipmi_sim_realize(DeviceState *dev, Error
> **errp)
>  vmstate_register(NULL, 0, &vmstate_ipmi_sim, ibs);
>  }
>  
> +static void ipmi_sim_unrealize(DeviceState *dev, Error **errp)
> +{
> +IPMIBmc *b = IPMI_BMC(dev);
> +IPMIRcvBufEntry *msg, *tmp;
> +IPMIBmcSim *ibs = IPMI_BMC_SIMULATOR(b);
> +
> +vmstate_unregister(NULL, &vmstate_ipmi_sim, ibs);
> +timer_del(ibs->timer);
> +timer_free(ibs->timer);
> +QTAILQ_FOREACH_SAFE(msg, &ibs->rcvbufs, entry, tmp) {
> +QTAILQ_REMOVE(&ibs->rcvbufs, msg, entry);
> +g_free(msg);
> +}
> +}
> +

Otherwise, for completeness, this looks good so
Reviewed-by: Marc-André Lureau 
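
For readers unfamiliar with the pattern in ipmi_sim_unrealize(), here is a
minimal standalone sketch of draining a tail queue while freeing its
entries. It uses the TAILQ macros from <sys/queue.h> rather than QEMU's
QTAILQ, and the demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

struct demo_msg {
    int id;
    TAILQ_ENTRY(demo_msg) entry;
};
TAILQ_HEAD(demo_msg_list, demo_msg);

/* Append a message; returns 0 on success, -1 if allocation fails. */
static int demo_queue_push(struct demo_msg_list *q, int id)
{
    struct demo_msg *msg = malloc(sizeof(*msg));

    if (msg == NULL) {
        return -1;
    }
    msg->id = id;
    TAILQ_INSERT_TAIL(q, msg, entry);
    return 0;
}

/*
 * Free every pending message. Always taking the current head avoids
 * touching an entry after it has been freed. Returns the number freed.
 */
static int demo_queue_drain(struct demo_msg_list *q)
{
    struct demo_msg *msg;
    int freed = 0;

    assert(q != NULL);
    while ((msg = TAILQ_FIRST(q)) != NULL) {
        TAILQ_REMOVE(q, msg, entry);
        free(msg);
        freed++;
    }
    return freed;
}
```

The QTAILQ_FOREACH_SAFE loop in the patch achieves the same thing by saving
the next pointer before each removal.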

>  static void ipmi_sim_class_init(ObjectClass *oc, void *data)
>  {
>  DeviceClass *dc = DEVICE_CLASS(oc);
>  IPMIBmcClass *bk = IPMI_BMC_CLASS(oc);
>  
>  dc->realize = ipmi_sim_realize;
> +dc->unrealize = ipmi_sim_unrealize;
>  bk->handle_command = ipmi_sim_handle_command;
>  }
>  
> --
> 2.7.4
> 
>

Re: [Qemu-devel] [PATCH v2 3/3] wdt_ib700: Free timer

2016-07-22 Thread Marc-André Lureau

Hi

- Original Message -
> From: Corey Minyard 
> 
> Add an unrealize function to free the timer allocated in the
> realize function and to delete the port memory added there,
> too.
> 
> Signed-off-by: Corey Minyard 
> Cc: Richard W.M. Jones 
> Cc: Marc-André Lureau 
> Reviewed-by: Richard W.M. Jones 
> ---
>  hw/watchdog/wdt_ib700.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/hw/watchdog/wdt_ib700.c b/hw/watchdog/wdt_ib700.c
> index 532afe8..23d4857 100644
> --- a/hw/watchdog/wdt_ib700.c
> +++ b/hw/watchdog/wdt_ib700.c
> @@ -117,6 +117,15 @@ static void wdt_ib700_realize(DeviceState *dev, Error
> **errp)
>  portio_list_add(&s->port_list, isa_address_space_io(&s->parent_obj), 0);
>  }
>  
> +static void wdt_ib700_unrealize(DeviceState *dev, Error **errp)
> +{
> +IB700State *s = IB700(dev);
> +
> +timer_del(s->timer);
> +timer_free(s->timer);
> +portio_list_del(&s->port_list);

Actually, portio_list_destroy() seems more appropriate (it also cleans up
the allocations).

Btw, I do not know how to test this yet either.

> +}
> +
>  static void wdt_ib700_reset(DeviceState *dev)
>  {
>  IB700State *s = IB700(dev);
> @@ -136,6 +145,7 @@ static void wdt_ib700_class_init(ObjectClass *klass, void
> *data)
>  DeviceClass *dc = DEVICE_CLASS(klass);
>  
>  dc->realize = wdt_ib700_realize;
> +dc->unrealize = wdt_ib700_unrealize;
>  dc->reset = wdt_ib700_reset;
>  dc->vmsd = &vmstate_ib700;
>  set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> --
> 2.7.4
> 
>

Re: [Qemu-devel] [PULL v3 00/55] pc, pci, virtio: new features, cleanups, fixes

2016-07-22 Thread Peter Maydell

On 22 July 2016 at 02:13, Fam Zheng  wrote:
> On Thu, 07/21 11:45, Peter Maydell wrote:
>> It failed on several of my test builds, not just one, but these
>> things are tricky to avoid if they don't happen on all compilers.
>> In this case I think it is a compiler bug:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53119
>> so you'll only see it with an older compiler.
>
> Could you name the distro and gcc version? If it's worth keeping the buggy
> compiler happy, it could probably be added as a docker test. :)

I think ubuntu trusty stock gcc and the clang on OSX, probably
others too.

thanks
-- PMM

Re: [Qemu-devel] [RFC v1 13/13] target-ppc: introduce opc4 for Expanded Opcode

2016-07-22 Thread Bharata B Rao

On Mon, Jul 18, 2016 at 10:35 PM, Nikunj A Dadhania
 wrote:
> ISA 3.0 has introduced EO - Expanded Opcode. Introduce third level
> indirect opcode table and corresponding parsing routines.
>
> EO (11:12) Expanded opcode field
> Formats: XX1
>
> EO (11:15) Expanded opcode field
> Formats: VX, X, XX2
>
> Signed-off-by: Nikunj A Dadhania 
> ---
>  target-ppc/translate.c  |  73 +--
>  target-ppc/translate_init.c | 103 
> 
>  2 files changed, 136 insertions(+), 40 deletions(-)
>
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index 6c5a4a6..733d68d 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -40,6 +40,7 @@
>  /* Include definitions for instructions classes and implementations flags */
>  //#define PPC_DEBUG_DISAS
>  //#define DO_PPC_STATISTICS
> +//#define PPC_DUMP_CPU
>
>  #ifdef PPC_DEBUG_DISAS
>  #  define LOG_DISAS(...) qemu_log_mask(CPU_LOG_TB_IN_ASM, ## __VA_ARGS__)
> @@ -367,12 +368,15 @@ GEN_OPCODE2(name, onam, opc1, opc2, opc3, inval, type, 
> PPC_NONE)
>  #define GEN_HANDLER2_E(name, onam, opc1, opc2, opc3, inval, type, type2) 
>  \
>  GEN_OPCODE2(name, onam, opc1, opc2, opc3, inval, type, type2)
>
> +#define GEN_HANDLER_E_2(name, opc1, opc2, opc3, opc4, inval, type, type2)
>  \
> +GEN_OPCODE3(name, opc1, opc2, opc3, opc4, inval, type, type2)
> +
>  typedef struct opcode_t {
> -unsigned char opc1, opc2, opc3;
> +unsigned char opc1, opc2, opc3, opc4;
>  #if HOST_LONG_BITS == 64 /* Explicitly align to 64 bits */
> -unsigned char pad[5];
> +unsigned char pad[4];
>  #else
> -unsigned char pad[1];
> +unsigned char pad[4]; /* 4-byte pad to maintain pad in opcode table */
>  #endif
>  opc_handler_t handler;
>  const char *oname;
> @@ -452,6 +456,8 @@ EXTRACT_HELPER(opc1, 26, 6);
>  EXTRACT_HELPER(opc2, 1, 5);
>  /* Opcode part 3 */
>  EXTRACT_HELPER(opc3, 6, 5);
> +/* Opcode part 4 */
> +EXTRACT_HELPER(opc4, 16, 5);
>  /* Update Cr0 flags */
>  EXTRACT_HELPER(Rc, 0, 1);
>  /* Update Cr6 flags (Altivec) */
> @@ -589,6 +595,7 @@ EXTRACT_HELPER(SP, 19, 2);
>  .opc1 = op1, 
>  \
>  .opc2 = op2, 
>  \
>  .opc3 = op3, 
>  \
> +.opc4 = 0xff,
>  \
>  .pad  = { 0, },  
>  \
>  .handler = { 
>  \
>  .inval1  = invl, 
>  \
> @@ -604,6 +611,7 @@ EXTRACT_HELPER(SP, 19, 2);
>  .opc1 = op1, 
>  \
>  .opc2 = op2, 
>  \
>  .opc3 = op3, 
>  \
> +.opc4 = 0xff,
>  \
>  .pad  = { 0, },  
>  \
>  .handler = { 
>  \
>  .inval1  = invl1,
>  \
> @@ -620,6 +628,7 @@ EXTRACT_HELPER(SP, 19, 2);
>  .opc1 = op1, 
>  \
>  .opc2 = op2, 
>  \
>  .opc3 = op3, 
>  \
> +.opc4 = 0xff,
>  \
>  .pad  = { 0, },  
>  \
>  .handler = { 
>  \
>  .inval1  = invl, 
>  \
> @@ -630,12 +639,29 @@ EXTRACT_HELPER(SP, 19, 2);
>  },   
>  \
>  .oname = onam,   
>  \
>  }
> +#define GEN_OPCODE3(name, op1, op2, op3, op4, invl, _typ, _typ2) 
>  \
> +{
>  \
> +.opc1 = op1, 
>  \
> +.opc2 = op2, 
>  \
> +.opc3 = op3, 
>  \
> +.opc4 = op4, 
>  \
> +.pad  = { 0, },  
>  \
> +.handler = { 
>  \
> +.inval1  = invl,
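
As a reading aid for the EXTRACT_HELPER(opc4, 16, 5) line in the patch
quoted above: the ISA numbers bits from the most significant end, so the EO
field at bits 11:15 ends at bit 15, which is 31 - 15 = 16 bits above the
least significant bit, and it is 5 bits wide. The sketch below shows that
arithmetic with standalone helpers. The extract_field() and demo_* names are
hypothetical and only mirror the shift/width pairs visible in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Extract 'nbits' bits starting 'shift' bits above the LSB. */
static inline uint32_t extract_field(uint32_t insn, unsigned shift,
                                     unsigned nbits)
{
    assert(nbits > 0 && nbits < 32 && shift + nbits <= 32);
    return (insn >> shift) & ((1u << nbits) - 1u);
}

/* Same shift/width pairs as the EXTRACT_HELPER lines in the patch. */
static inline uint32_t demo_opc1(uint32_t insn)
{
    return extract_field(insn, 26, 6);
}

static inline uint32_t demo_opc2(uint32_t insn)
{
    return extract_field(insn, 1, 5);
}

static inline uint32_t demo_opc3(uint32_t insn)
{
    return extract_field(insn, 6, 5);
}

static inline uint32_t demo_opc4(uint32_t insn)
{
    return extract_field(insn, 16, 5);
}
```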

[Qemu-devel] [PATCH] hw/mips_malta: Fix YAMON API print routine

2016-07-22 Thread Paul Burton

The print routine provided as part of the in-built bootloader had a bug
in that it attempted to use a jump instruction as part of a loop, but
the target has its upper bits zeroed leading to control flow
transferring to 0xb0000814 rather than the intended 0xbfc00814. Fix this
by using a branch instruction instead, which seems more fit for purpose.

A simple way to test this is to build a Linux kernel with EVA enabled &
attempt to boot it in QEMU. It will attempt to print a message
indicating the configuration mismatch but QEMU would previously
incorrectly jump & wind up printing a continuous stream of the letter E.
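
The difference between the two encodings can be checked with a short
calculation. The sketch below decodes the targets of the old jump
(0x08000205) and the new branch (0x1000fff9) taken from this patch. The
helper names are hypothetical, and the instruction address 0xbfc0082c used
in the usage note is an assumption inferred from the branch offset, not
something stated in the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * MIPS32 J/JAL: the low 28 bits of the target come from the instruction,
 * the top 4 bits from the address of the delay slot (pc + 4).
 */
static uint32_t mips_jump_target(uint32_t pc, uint32_t insn)
{
    uint32_t index = insn & 0x03ffffffu;

    assert((pc & 3u) == 0);
    return ((pc + 4u) & 0xf0000000u) | (index << 2);
}

/*
 * MIPS32 PC-relative branch: sign-extended 16-bit word offset added to
 * the address of the delay slot (pc + 4).
 */
static uint32_t mips_branch_target(uint32_t pc, uint32_t insn)
{
    int32_t offset = (int16_t)(insn & 0xffffu);

    assert((pc & 3u) == 0);
    return pc + 4u + ((uint32_t)offset << 2);
}
```

With pc = 0xbfc0082c, the jump resolves to 0xb0000814 because only the top
four address bits are inherited, while the branch resolves to 0xbfc00814.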

Signed-off-by: Paul Burton 
Cc: Aurelien Jarno 
Cc: Leon Alrae 
---
 hw/mips/mips_malta.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/mips/mips_malta.c b/hw/mips/mips_malta.c
index 34d41ef..e90857e 100644
--- a/hw/mips/mips_malta.c
+++ b/hw/mips/mips_malta.c
@@ -727,7 +727,7 @@ static void write_bootloader(uint8_t *base, int64_t 
run_addr,
 stl_p(p++, 0x00000000); /* nop */
 stl_p(p++, 0x0ff0021c); /* jal 870 */
 stl_p(p++, 0x00000000); /* nop */
-stl_p(p++, 0x08000205); /* j 814 */
+stl_p(p++, 0x1000fff9); /* b 814 */
 stl_p(p++, 0x00000000); /* nop */
 stl_p(p++, 0x01a00009); /* jalr t5 */
 stl_p(p++, 0x01602021); /* move a0,t3 */
-- 
2.9.0

Re: [Qemu-devel] [RFC v1 13/13] target-ppc: introduce opc4 for Expanded Opcode

2016-07-22 Thread Nikunj A Dadhania

Bharata B Rao  writes:

> On Mon, Jul 18, 2016 at 10:35 PM, Nikunj A Dadhania
>  wrote:
>> ISA 3.0 has introduced EO - Expanded Opcode. Introduce third level
>> indirect opcode table and corresponding parsing routines.
>>
>> EO (11:12) Expanded opcode field
>> Formats: XX1
>>
>> EO (11:15) Expanded opcode field
>> Formats: VX, X, XX2
>>
>> Signed-off-by: Nikunj A Dadhania 
>> ---
>> +static int register_trplind_insn (opc_handler_t **ppc_opcodes,
>> +  unsigned char idx1, unsigned char idx2,
>> +  unsigned char idx3, unsigned char idx4,
>> +  opc_handler_t *handler)
>> +{
>> +opc_handler_t **table;
>> +
>> +if (register_ind_in_table(ppc_opcodes, idx1, idx2, NULL) < 0) {
>> +printf("*** ERROR: unable to join indirect table idx "
>> +   "[%02x-%02x]\n", idx1, idx2);
>> +return -1;
>> +}
>> +table = ind_table(ppc_opcodes[idx1]);
>> +if (register_ind_in_table(table, idx2, idx3, NULL) < 0) {
>> +printf("*** ERROR: unable to join 2nd-level indirect table idx "
>> +   "[%02x-%02x-%02x]\n", idx1, idx2, idx3);
>> +return -1;
>> +}
>> +table = ind_table(table[idx2]);
>> +if (register_ind_in_table(table, idx3, idx4, handler) < 0) {
>> +printf("*** ERROR: unable to insert opcode "
>> +   "[%02x-%02x-%02x-%02x]\n", idx1, idx2, idx3, idx4);
>> +return -1;
>> +}
>> +return 0;
>> +}
>
> If you are adding a 3rd level opcode table, explicit freeing of the
> same from ppc_cpu_unrealizefn() is necessary right ?

Yes, you are right, will add in my next revision.
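
To make the cleanup concern concrete, here is a minimal standalone sketch
of nested opcode tables that are allocated on demand and freed recursively.
It is an illustration under stated assumptions, not the QEMU implementation:
the demo_* names and the table length are hypothetical, and QEMU's real
tables store handlers and sub-tables differently.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_LEN 32

typedef struct demo_table {
    struct demo_table *sub[DEMO_LEN]; /* next-level tables, NULL if unused */
    int handler[DEMO_LEN];            /* leaf slots, 0 means empty */
} demo_table;

static demo_table *demo_get_or_create(demo_table *parent, unsigned idx)
{
    assert(idx < DEMO_LEN);
    if (parent->sub[idx] == NULL) {
        parent->sub[idx] = calloc(1, sizeof(demo_table));
    }
    return parent->sub[idx];
}

/* Register a handler behind three levels of indirect tables. */
static int demo_register(demo_table *root, unsigned i1, unsigned i2,
                         unsigned i3, unsigned i4, int handler)
{
    const unsigned path[3] = { i1, i2, i3 };
    demo_table *t = root;

    for (int level = 0; level < 3; level++) {
        t = demo_get_or_create(t, path[level]);
        if (t == NULL) {
            return -1;              /* allocation failed */
        }
    }
    assert(i4 < DEMO_LEN);
    if (t->handler[i4] != 0) {
        return -1;                  /* slot already taken */
    }
    t->handler[i4] = handler;
    return 0;
}

/* Free every sub-table at any depth; returns how many were freed. */
static int demo_free_subtables(demo_table *t)
{
    int freed = 0;

    for (unsigned i = 0; i < DEMO_LEN; i++) {
        if (t->sub[i] != NULL) {
            freed += demo_free_subtables(t->sub[i]);
            free(t->sub[i]);
            t->sub[i] = NULL;
            freed++;
        }
    }
    return freed;
}
```

The point is that a free routine written for two levels would leak the
third; a recursive walk handles any depth.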

Regards,
Nikunj

Re: [Qemu-devel] [PATCH 0/8] Fix migration issues with arbitrary cpu-hot(un)plug

2016-07-22 Thread Igor Mammedov

On Fri, 22 Jul 2016 14:35:05 +1000
David Gibson  wrote:

> On Fri, Jul 22, 2016 at 12:56:26AM +0300, Michael S. Tsirkin wrote:
> > On Thu, Jul 21, 2016 at 05:54:31PM +0200, Igor Mammedov wrote:  
> > > Series fixes migration issues caused by an unstable cpu_index, which depended
> > > on the order in which cpus were created/destroyed. It follows David's idea to make
> > > cpu_index assignable by selected boards if the board supports cpu-hotplug
> > > with device_add and needs a stable cpu_index/'migration id', but leaves
> > > the behaviour the same as before for users that don't care about
> > > cpu-hot(un)plug, making the changes low-risk.
> > > 
> > > tested with:
> > >   SRC -snapshot -enable-kvm -smp 1,maxcpus=3 -m 256M guest.img -monitor 
> > > stdio \
> > >-device qemu64-x86_64-cpu,id=cpudel,apic-id=1 \
> > >-device qemu64-x86_64-cpu,apic-id=2 
> > >   (qemu) device_del cpudel
> > >   (qemu) stop
> > >   (qemu) migrate "exec:gzip -c > STATEFILE.gz"
> > >   
> > >   DST -snapshot -enable-kvm -smp 1,maxcpus=3 -m 256M guest.img -monitor 
> > > stdio \
> > >   -device qemu64-x86_64-cpu,apic-id=2 \
> > >   -incoming "exec: gzip -c -d STATEFILE.gz"
> > > 
> > > git tree to test with:
> > >  https://github.com/imammedo/qemu cpu-index-stable
> > >  to view
> > >  https://github.com/imammedo/qemu/commits/cpu-index-stable  
> > 
> > For PC bits:
> > 
> > Reviewed-by: Michael S. Tsirkin 
> > 
> > This would be nice to have in 2.7.  
> 
> I agree.  Despite the lateness, I think this will avoid substantial
> future pain.
> 
> > Who's reviewing/merging the rest? Eduardo?  
> 
> I've reviewed.  I could merge through my tree if we don't have a
> better option, but merging pc specific pieces through the ppc tree
> would seem odd.
Eduardo,
if you take it through your tree, could you drop the spapr patches for now?
It looks like the PPC side is going to redefine core-id semantics,
so they'll post patches on top of this series.

[Qemu-devel] [PATCH] block/gluster: fix doc in the qapi schema

2016-07-22 Thread Prasanna Kumar Kalever

1. s/@debug-level/@debug_level/
2. rearrange the versioning
3. s/server description/servers description/

Signed-off-by: Prasanna Kumar Kalever 
---
 qapi/block-core.json | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index f462345..5af0ffd 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1689,8 +1689,9 @@
 #
 # @host_device, @host_cdrom: Since 2.1
 #
-# Since: 2.0
 # @gluster: Since 2.7
+#
+# Since: 2.0
 ##
 { 'enum': 'BlockdevDriver',
   'data': [ 'archipelago', 'blkdebug', 'blkverify', 'bochs', 'cloop',
@@ -2134,9 +2135,9 @@
 #
 # @path:absolute path to image file in gluster volume
 #
-# @server:  gluster server description
+# @server:  gluster servers description
 #
-# @debug-level: #optional libgfapi log level (default '4' which is Error)
+# @debug_level: #optional libgfapi log level (default '4' which is Error)
 #
 # Since: 2.7
 ##
-- 
2.7.4

Re: [Qemu-devel] [PATCH v4] virtio-pci: error out when both legacy and modern modes are disabled

2016-07-22 Thread Greg Kurz

On Fri, 22 Jul 2016 10:04:35 +0200
Cornelia Huck  wrote:

> On Thu, 21 Jul 2016 23:21:16 +0200
> Greg Kurz  wrote:
> 
> > From: Greg Kurz 
> > 
> > Without presuming if we got there because of a user mistake or some
> > more subtle bug in the tooling, it really does not make sense to
> > implement a non-functional device.
> > 
> > Signed-off-by: Greg Kurz 
> > Reviewed-by: Marcel Apfelbaum 
> > Signed-off-by: Greg Kurz 
> > ---
> > v4: - rephrased error message and provide a hint to the user
> > - split string literals to stay below 80 characters
> > - added Marcel's R-b tag
> > ---
> >  hw/virtio/virtio-pci.c |8 
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> > index 755f9218b77d..72c4b392ffda 100644
> > --- a/hw/virtio/virtio-pci.c
> > +++ b/hw/virtio/virtio-pci.c
> > @@ -1842,6 +1842,14 @@ static void virtio_pci_dc_realize(DeviceState *qdev, 
> > Error **errp)
> >  VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
> >  PCIDevice *pci_dev = &proxy->pci_dev;
> > 
> > +if (!(virtio_pci_modern(proxy) || virtio_pci_legacy(proxy))) {  
> 
> I'm not sure that I didn't mess up the sequence of the realize
> callbacks, but could disable_legacy still be AUTO here? In that case,
> we'd fail for disable-modern=on and disable-legacy unset (i.e., AUTO),
> which would be ok for pcie but not for !pcie.
> 

Marcel made the same comment in:

https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg05225.html

If the user explicitly disables modern, she shouldn't rely on QEMU
implicitly enabling legacy, hence the suggestion in error_append_hint().

> > +error_setg(errp, "device cannot work when both modern and legacy 
> > modes"
> > +   " are disabled");
> > +error_append_hint(errp, "Set either disable-modern or 
> > disable-legacy"
> > +  " to off\n");
> > +return;
> > +}
> > +
> >  if (!(proxy->flags & VIRTIO_PCI_FLAG_DISABLE_PCIE) &&
> >  virtio_pci_modern(proxy)) {
> >  pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS;
> >   
>

Re: [Qemu-devel] [PATCH v2 3/3] wdt_ib700: Free timer

2016-07-22 Thread Richard W.M. Jones

On Fri, Jul 22, 2016 at 05:26:27AM -0400, Marc-André Lureau wrote:
> btw, I do not know how to test this yet either.

I have a little test framework for the watchdog device which cuts
through a lot of the BS with running the full watchdog daemon, and
also has some simple instructions to follow:

  http://git.annexia.org/?p=watchdog-test-framework.git;a=tree

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/

[Qemu-devel] [PATCH v22 01/10] unblock backup operations in backing file

2016-07-22 Thread Wang WeiWei

From: Wen Congyang 

Signed-off-by: Wen Congyang 
Signed-off-by: Changlong Xie 
Signed-off-by: Wang WeiWei 
---
 block.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/block.c b/block.c
index 30d64e6..194a060 100644
--- a/block.c
+++ b/block.c
@@ -1311,6 +1311,23 @@ void bdrv_set_backing_hd(BlockDriverState *bs, 
BlockDriverState *backing_hd)
 /* Otherwise we won't be able to commit due to check in bdrv_commit */
 bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_COMMIT_TARGET,
 bs->backing_blocker);
+/*
+ * We do backup in 3 ways:
+ * 1. drive backup
+ *The target bs is new opened, and the source is top BDS
+ * 2. blockdev backup
+ *Both the source and the target are top BDSes.
+ * 3. internal backup(used for block replication)
+ *Both the source and the target are backing file
+ *
+ * In case 1 and 2, neither the source nor the target is the backing file.
+ * In case 3, we will block the top BDS, so there is only one block job
+ * for the top BDS and its backing chain.
+ */
+bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_SOURCE,
+bs->backing_blocker);
+bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_TARGET,
+bs->backing_blocker);
 out:
 bdrv_refresh_limits(bs, NULL);
 }
-- 
1.9.3

[Qemu-devel] [PATCH v22 06/10] auto complete active commit

2016-07-22 Thread Wang WeiWei

From: Wen Congyang 

Auto-complete the mirror job in the background so that the caller
does not block synchronously.

Signed-off-by: Wen Congyang 
Signed-off-by: Changlong Xie 
Signed-off-by: Wang WeiWei 
---
 block/mirror.c| 13 +
 blockdev.c|  2 +-
 include/block/block_int.h |  3 ++-
 qemu-img.c|  2 +-
 4 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 836a5d0..fce29c2 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -914,7 +914,8 @@ static void mirror_start_job(const char *job_id, 
BlockDriverState *bs,
  BlockCompletionFunc *cb,
  void *opaque, Error **errp,
  const BlockJobDriver *driver,
- bool is_none_mode, BlockDriverState *base)
+ bool is_none_mode, BlockDriverState *base,
+ bool auto_complete)
 {
 MirrorBlockJob *s;
 
@@ -950,6 +951,9 @@ static void mirror_start_job(const char *job_id, 
BlockDriverState *bs,
 s->granularity = granularity;
 s->buf_size = ROUND_UP(buf_size, granularity);
 s->unmap = unmap;
+if (auto_complete) {
+s->should_complete = true;
+}
 
 s->dirty_bitmap = bdrv_create_dirty_bitmap(bs, granularity, NULL, errp);
 if (!s->dirty_bitmap) {
@@ -988,14 +992,15 @@ void mirror_start(const char *job_id, BlockDriverState 
*bs,
 mirror_start_job(job_id, bs, target, replaces,
  speed, granularity, buf_size, backing_mode,
  on_source_error, on_target_error, unmap, cb, opaque, errp,
- &mirror_job_driver, is_none_mode, base);
+ &mirror_job_driver, is_none_mode, base, false);
 }
 
 void commit_active_start(const char *job_id, BlockDriverState *bs,
  BlockDriverState *base, int64_t speed,
  BlockdevOnError on_error,
  BlockCompletionFunc *cb,
- void *opaque, Error **errp)
+ void *opaque, Error **errp,
+ bool auto_complete)
 {
 int64_t length, base_length;
 int orig_base_flags;
@@ -1036,7 +1041,7 @@ void commit_active_start(const char *job_id, 
BlockDriverState *bs,
 mirror_start_job(job_id, bs, base, NULL, speed, 0, 0,
  MIRROR_LEAVE_BACKING_CHAIN,
  on_error, on_error, false, cb, opaque, &local_err,
- &commit_active_job_driver, false, base);
+ &commit_active_job_driver, false, base, auto_complete);
 if (local_err) {
 error_propagate(errp, local_err);
 goto error_restore_flags;
diff --git a/blockdev.c b/blockdev.c
index eafeba9..be7be7b 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3140,7 +3140,7 @@ void qmp_block_commit(bool has_job_id, const char 
*job_id, const char *device,
 goto out;
 }
 commit_active_start(has_job_id ? job_id : NULL, bs, base_bs, speed,
-on_error, block_job_cb, bs, &local_err);
+on_error, block_job_cb, bs, &local_err, false);
 } else {
 commit_start(has_job_id ? job_id : NULL, bs, base_bs, top_bs, speed,
  on_error, block_job_cb, bs,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 09be16f..810221e 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -699,13 +699,14 @@ void commit_start(const char *job_id, BlockDriverState 
*bs,
  * @cb: Completion function for the job.
  * @opaque: Opaque pointer value passed to @cb.
  * @errp: Error object.
+ * @auto_complete: Auto complete the job.
  *
  */
 void commit_active_start(const char *job_id, BlockDriverState *bs,
  BlockDriverState *base, int64_t speed,
  BlockdevOnError on_error,
  BlockCompletionFunc *cb,
- void *opaque, Error **errp);
+ void *opaque, Error **errp, bool auto_complete);
 /*
  * mirror_start:
  * @job_id: The id of the newly-created job, or %NULL to use the
diff --git a/qemu-img.c b/qemu-img.c
index 2e40e1f..ae204c9 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -921,7 +921,7 @@ static int img_commit(int argc, char **argv)
 };
 
 commit_active_start("commit", bs, base_bs, 0, BLOCKDEV_ON_ERROR_REPORT,
-common_block_job_cb, &cbi, &local_err);
+common_block_job_cb, &cbi, &local_err, false);
 if (local_err) {
 goto done;
 }
-- 
1.9.3

[Qemu-devel] [PATCH v22 05/10] docs: block replication's description

2016-07-22 Thread Wang WeiWei

From: Wen Congyang 

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
Signed-off-by: Wang WeiWei 
---
 docs/block-replication.txt | 239 +
 1 file changed, 239 insertions(+)
 create mode 100644 docs/block-replication.txt

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
new file mode 100644
index 0000000..6bde673
--- /dev/null
+++ b/docs/block-replication.txt
@@ -0,0 +1,239 @@
+Block replication
+
+Copyright Fujitsu, Corp. 2016
+Copyright (c) 2016 Intel Corporation
+Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+Block replication is used for continuous checkpoints. It is designed
+for COLO (COarse-grain LOck-stepping) where the Secondary VM is running.
+It can also be applied for FT/HA (Fault-tolerance/High Assurance) scenario,
+where the Secondary VM is not running.
+
+This document gives an overview of block replication's design.
+
+== Background ==
+High availability solutions such as micro checkpoint and COLO will do
+consecutive checkpoints. The VM state of the Primary and Secondary VM is
+identical right after a VM checkpoint, but becomes different as the VM
+executes till the next checkpoint. To support disk contents checkpoint,
+the modified disk contents in the Secondary VM must be buffered, and are
+only dropped at next checkpoint time. To reduce the network transportation
+effort during a vmstate checkpoint, the disk modification operations of
+the Primary disk are asynchronously forwarded to the Secondary node.
+
+== Workflow ==
+The following is the image of block replication workflow:
+
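+        +----------------------+            +------------------------+
+        |Primary Write Requests|            |Secondary Write Requests|
+        +----------------------+            +------------------------+
+                  |                                     |
+                  |                                    (4)
+                  |                                     V
+                  |                              /-------------\
+                  |      Copy and Forward        |             |
+                  |---------(1)----------+       | Disk Buffer |
+                  |                      |       |             |
+                  |                     (3)      \-------------/
+                  |                 speculative      ^
+                  |                write through    (2)
+                  |                      |           |
+                  V                      V           |
+           +--------------+           +----------------+
+           | Primary Disk |           | Secondary Disk |
+           +--------------+           +----------------+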
+
+1) Primary write requests will be copied and forwarded to Secondary
+   QEMU.
+2) Before Primary write requests are written to Secondary disk, the
+   original sector content will be read from Secondary disk and
+   buffered in the Disk buffer, but it will not overwrite the existing
+   sector content (it could be from either "Secondary Write Requests" or
+   previous COW of "Primary Write Requests") in the Disk buffer.
+3) Primary write requests will be written to Secondary disk.
+4) Secondary write requests will be buffered in the Disk buffer and it
+   will overwrite the existing sector content in the buffer.
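
The four steps above can be modelled with a toy in-memory replica. This is
a sketch under stated assumptions, not the block replication driver: the
demo_* names are hypothetical, sectors are single integers, and the read and
checkpoint rules (Secondary reads prefer the buffer, a checkpoint drops the
buffer) are inferred from the Background section.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_SECTORS 8

struct demo_replica {
    int  disk[DEMO_SECTORS];      /* Secondary disk */
    int  buffer[DEMO_SECTORS];    /* Disk buffer */
    bool buffered[DEMO_SECTORS];  /* which buffer slots are valid */
};

/* Steps 2 and 3: a forwarded Primary write reaches the Secondary. */
static void demo_primary_write(struct demo_replica *r, int sector, int value)
{
    assert(sector >= 0 && sector < DEMO_SECTORS);
    if (!r->buffered[sector]) {
        /* Step 2: save the old content, never overwrite a valid slot. */
        r->buffer[sector] = r->disk[sector];
        r->buffered[sector] = true;
    }
    r->disk[sector] = value;      /* Step 3: write to the Secondary disk */
}

/* Step 4: a Secondary write always overwrites the buffer slot. */
static void demo_secondary_write(struct demo_replica *r, int sector,
                                 int value)
{
    assert(sector >= 0 && sector < DEMO_SECTORS);
    r->buffer[sector] = value;
    r->buffered[sector] = true;
}

/* Assumed read rule: the Secondary VM sees the buffer first. */
static int demo_secondary_read(const struct demo_replica *r, int sector)
{
    assert(sector >= 0 && sector < DEMO_SECTORS);
    return r->buffered[sector] ? r->buffer[sector] : r->disk[sector];
}

/* Assumed checkpoint rule: buffered content is dropped. */
static void demo_checkpoint(struct demo_replica *r)
{
    memset(r->buffered, 0, sizeof(r->buffered));
}
```

In this model the Secondary VM keeps seeing its own view of a sector until
the next checkpoint, even though Primary writes have already landed on the
Secondary disk.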
+
+== Architecture ==
+We are going to implement block replication from many basic
+blocks that are already in QEMU.
+
+         virtio-blk       ||
+             ^            ||                            .----------
+             |            ||                            | Secondary
+        1 Quorum          ||                            '----------
+         /      \         ||
+        /        \        ||
+   Primary    2 filter
+     disk         ^                                                             virtio-blk
+                  |                                                                  ^
+                3 NBD  ------->  3 NBD                                               |
+                client    ||     server                                          2 filter
+                          ||        ^                                                ^
+--------.                 ||        |                                                |
+Primary |                 ||  Secondary disk <--------- hidden-disk 5 <--------- active-disk 4
+--------'                 ||        |          backing        ^       backing
+                          ||        |                         |
+                          ||        |                         |
+                          ||        '-------------------------'
+                          ||

[Qemu-devel] [PATCH v22 04/10] Link backup into block core

2016-07-22 Thread Wang WeiWei

From: Wen Congyang 

Some programs that add a dependency on it will use
the block layer directly.

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Jeff Cody 
Signed-off-by: Wang WeiWei 
---
 block/Makefile.objs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 2593a2f..8a3270b 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -22,11 +22,11 @@ block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
 block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o dirty-bitmap.o
 block-obj-y += write-threshold.o
+block-obj-y += backup.o
 
 block-obj-y += crypto.o
 
 common-obj-y += stream.o
-common-obj-y += backup.o
 
 iscsi.o-cflags := $(LIBISCSI_CFLAGS)
 iscsi.o-libs   := $(LIBISCSI_LIBS)
-- 
1.9.3

[Qemu-devel] [PATCH v22 09/10] tests: add unit test case for replication

2016-07-22 Thread Wang WeiWei

From: Changlong Xie 

Signed-off-by: Wen Congyang 
Signed-off-by: Changlong Xie 
Signed-off-by: Wang WeiWei 
---
 tests/.gitignore |   1 +
 tests/Makefile.include   |   4 +
 tests/test-replication.c | 575 +++
 3 files changed, 580 insertions(+)
 create mode 100644 tests/test-replication.c

diff --git a/tests/.gitignore b/tests/.gitignore
index dbb5263..b4a9cfc 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -63,6 +63,7 @@ test-qmp-introspect.[ch]
 test-qmp-marshal.c
 test-qmp-output-visitor
 test-rcu-list
+test-replication
 test-rfifolock
 test-string-input-visitor
 test-string-output-visitor
diff --git a/tests/Makefile.include b/tests/Makefile.include
index e7e50d6..2e77d35 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -111,6 +111,7 @@ check-unit-y += tests/test-crypto-xts$(EXESUF)
 check-unit-y += tests/test-crypto-block$(EXESUF)
 gcov-files-test-logging-y = tests/test-logging.c
 check-unit-y += tests/test-logging$(EXESUF)
+check-unit-y += tests/test-replication$(EXESUF)
 
 check-block-$(CONFIG_POSIX) += tests/qemu-iotests-quick.sh
 
@@ -478,6 +479,9 @@ tests/test-base64$(EXESUF): tests/test-base64.o \
 
 tests/test-logging$(EXESUF): tests/test-logging.o $(test-util-obj-y)
 
+tests/test-replication$(EXESUF): tests/test-replication.o $(test-util-obj-y) \
+   $(test-block-obj-y)
+
 tests/test-qapi-types.c tests/test-qapi-types.h :\
 $(SRC_PATH)/tests/qapi-schema/qapi-schema-test.json $(SRC_PATH)/scripts/qapi-types.py $(qapi-py)
$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-types.py \
diff --git a/tests/test-replication.c b/tests/test-replication.c
new file mode 100644
index 000..b63f1ef
--- /dev/null
+++ b/tests/test-replication.c
@@ -0,0 +1,575 @@
+/*
+ * Block replication tests
+ *
+ * Copyright (c) 2016 FUJITSU LIMITED
+ * Author: Changlong Xie 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include "qapi/error.h"
+#include "replication.h"
+#include "block/block_int.h"
+#include "sysemu/block-backend.h"
+
+#define IMG_SIZE (64 * 1024 * 1024)
+
+/* primary */
+#define P_ID "primary-id"
+static char p_local_disk[] = "/tmp/p_local_disk.XXXXXX";
+
+/* secondary */
+#define S_ID "secondary-id"
+#define S_LOCAL_DISK_ID "secondary-local-disk-id"
+static char s_local_disk[] = "/tmp/s_local_disk.XXXXXX";
+static char s_active_disk[] = "/tmp/s_active_disk.XXXXXX";
+static char s_hidden_disk[] = "/tmp/s_hidden_disk.XXXXXX";
+
+/* FIXME: steal from blockdev.c */
+QemuOptsList qemu_drive_opts = {
+    .name = "drive",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_drive_opts.head),
+    .desc = {
+        { /* end of list */ }
+    },
+};
+
+#define NOT_DONE 0x7fff
+
+static void blk_rw_done(void *opaque, int ret)
+{
+    *(int *)opaque = ret;
+}
+
+static void test_blk_read(BlockBackend *blk, long pattern,
+                          int64_t pattern_offset, int64_t pattern_count,
+                          int64_t offset, int64_t count,
+                          bool expect_failed)
+{
+    void *pattern_buf = NULL;
+    QEMUIOVector qiov;
+    void *cmp_buf = NULL;
+    int async_ret = NOT_DONE;
+
+    if (pattern) {
+        cmp_buf = g_malloc(pattern_count);
+        memset(cmp_buf, pattern, pattern_count);
+    }
+
+    pattern_buf = g_malloc(count);
+    if (pattern) {
+        memset(pattern_buf, pattern, count);
+    } else {
+        memset(pattern_buf, 0x00, count);
+    }
+
+    qemu_iovec_init(&qiov, 1);
+    qemu_iovec_add(&qiov, pattern_buf, count);
+
+    blk_aio_preadv(blk, offset, &qiov, 0, blk_rw_done, &async_ret);
+    while (async_ret == NOT_DONE) {
+        main_loop_wait(false);
+    }
+
+    if (expect_failed) {
+        g_assert(async_ret != 0);
+    } else {
+        g_assert(async_ret == 0);
+        if (pattern) {
+            g_assert(memcmp(pattern_buf + pattern_offset,
+                            cmp_buf, pattern_count) <= 0);
+        }
+    }
+
+    /* Release everything allocated above, including the compare buffer */
+    g_free(pattern_buf);
+    g_free(cmp_buf);
+    qemu_iovec_destroy(&qiov);
+}
+
+static void test_blk_write(BlockBackend *blk, long pattern, int64_t offset,
+                           int64_t count, bool expect_failed)
+{
+    void *pattern_buf = NULL;
+    QEMUIOVector qiov;
+    int async_ret = NOT_DONE;
+
+    pattern_buf = g_malloc(count);
+    if (pattern) {
+        memset(pattern_buf, pattern, count);
+    } else {
+        memset(pattern_buf, 0x00, count);
+    }
+
+    qemu_iovec_init(&qiov, 1);
+    qemu_iovec_add(&qiov, pattern_buf, count);
+
+    blk_aio_pwritev(blk, offset, &qiov, 0, blk_rw_done, &async_ret);
+    while (async_ret == NOT_DONE) {
+        main_loop_wait(false);
+    }
+
+    if (expect_failed) {
+        g_assert(async_ret != 0);
+    } else {
+        g_assert(async_ret == 0);
+    }
+
+    /* Release the buffer and the I/O vector */
+    g_free(pattern_buf);
+    qemu_iovec_destroy(&qiov);
+}
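
Both helpers turn an asynchronous request into a synchronous one by polling a sentinel value that the completion callback overwrites. A minimal stand-alone sketch of that pattern follows. The one-slot "event loop" (submit_async, toy_loop_iteration) is hypothetical and only stands in for blk_aio_preadv()/blk_aio_pwritev() and main_loop_wait().

```c
#include <assert.h>

#define NOT_DONE 0x7fff   /* sentinel: no completion has been delivered yet */

typedef void CompletionFunc(void *opaque, int ret);

/* Hypothetical one-slot queue standing in for the real event loop. */
typedef struct PendingOp {
    CompletionFunc *cb;
    void *opaque;
    int result;
    int active;
} PendingOp;

static PendingOp pending;

/* Queue an operation that will later complete with 'result'. */
static void submit_async(int result, CompletionFunc *cb, void *opaque)
{
    pending.cb = cb;
    pending.opaque = opaque;
    pending.result = result;
    pending.active = 1;
}

/* One iteration of the toy loop: deliver the pending completion, if any. */
static void toy_loop_iteration(void)
{
    if (pending.active) {
        pending.active = 0;
        pending.cb(pending.opaque, pending.result);
    }
}

/* Same shape as blk_rw_done(): store the result where the waiter looks. */
static void rw_done(void *opaque, int ret)
{
    *(int *)opaque = ret;
}

/* Submit, then spin the loop until the callback replaces the sentinel. */
static int run_sync(int result)
{
    int async_ret = NOT_DONE;

    submit_async(result, rw_done, &async_ret);
    while (async_ret == NOT_DONE) {
        toy_loop_iteration();
    }
    return async_ret;
}
```

The sentinel has to be a value that no real completion can return, otherwise the wait loop never terminates.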
+
+/*
+ * Create a uniquely-named empty temporary file.
+ */
+static void make_temp(char *template)
+{
+int fd;
+
+
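
The body of make_temp() is cut off in this archive. As a hedged illustration only, the sketch below shows one common way to create a uniquely named empty file with mkstemp(3), which requires the template to end in six 'X' characters. The name make_temp_file and its error convention are hypothetical and are not taken from the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Create an empty, uniquely named file from 'template' (which must end in
 * "XXXXXX" and be writable, since mkstemp() edits it in place).
 * Returns 0 on success and -1 on failure.
 */
static int make_temp_file(char *template)
{
    int fd = mkstemp(template);

    if (fd < 0) {
        return -1;
    }
    close(fd);   /* only the file name is needed afterwards */
    return 0;
}
```

The caller is responsible for unlinking the file once the test is finished.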

[Qemu-devel] [PATCH v22 07/10] Introduce new APIs to do replication operation

2016-07-22 Thread Wang WeiWei

From: Changlong Xie 

This commit introduces six replication interfaces (for block, network, etc.).
First, replication_new() and replication_remove() create and destroy
replication instances. Then, during migration, replication_start_all(),
replication_stop_all(), replication_do_checkpoint_all() and
replication_get_error_all() handle all replication operations. For more
details, please refer to replication.h.

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
Signed-off-by: Wang WeiWei 
---
 Makefile.objs|   1 +
 qapi/block-core.json |  13 
 replication.c| 107 +++
 replication.h| 174 +++
 4 files changed, 295 insertions(+)
 create mode 100644 replication.c
 create mode 100644 replication.h

diff --git a/Makefile.objs b/Makefile.objs
index 7f1f0a3..4abdc81 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -15,6 +15,7 @@ block-obj-$(CONFIG_POSIX) += aio-posix.o
 block-obj-$(CONFIG_WIN32) += aio-win32.o
 block-obj-y += block/
 block-obj-y += qemu-io-cmds.o
+block-obj-y += replication.o
 
 block-obj-m = block/
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index f462345..7f05b68 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2147,6 +2147,19 @@
 '*debug_level': 'int' } }
 
 ##
+# @ReplicationMode
+#
+# An enumeration of replication modes.
+#
+# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
+#
+# @secondary: Secondary mode, receive the vm's state from primary QEMU.
+#
+# Since: 2.7
+##
+{ 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] }
+
+##
 # @BlockdevOptions
 #
 # Options for creating a block device.  Many options are available for all
diff --git a/replication.c b/replication.c
new file mode 100644
index 000..be3a42f
--- /dev/null
+++ b/replication.c
@@ -0,0 +1,107 @@
+/*
+ * Replication filter
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 Intel Corporation
+ * Copyright (c) 2016 FUJITSU LIMITED
+ *
+ * Author:
+ *   Changlong Xie 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "replication.h"
+
+static QLIST_HEAD(, ReplicationState) replication_states;
+
+ReplicationState *replication_new(void *opaque, ReplicationOps *ops)
+{
+    ReplicationState *rs;
+
+    assert(ops != NULL);
+    rs = g_new0(ReplicationState, 1);
+    rs->opaque = opaque;
+    rs->ops = ops;
+    QLIST_INSERT_HEAD(&replication_states, rs, node);
+
+    return rs;
+}
+
+void replication_remove(ReplicationState *rs)
+{
+    if (rs) {
+        QLIST_REMOVE(rs, node);
+        g_free(rs);
+    }
+}
+
+/*
+ * The caller of this function MUST make sure the VM is stopped.
+ */
+void replication_start_all(ReplicationMode mode, Error **errp)
+{
+    ReplicationState *rs, *next;
+    Error *local_err = NULL;
+
+    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
+        if (rs->ops && rs->ops->start) {
+            rs->ops->start(rs, mode, &local_err);
+        }
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+}
+
+void replication_do_checkpoint_all(Error **errp)
+{
+    ReplicationState *rs, *next;
+    Error *local_err = NULL;
+
+    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
+        if (rs->ops && rs->ops->checkpoint) {
+            rs->ops->checkpoint(rs, &local_err);
+        }
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+}
+
+void replication_get_error_all(Error **errp)
+{
+    ReplicationState *rs, *next;
+    Error *local_err = NULL;
+
+    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
+        if (rs->ops && rs->ops->get_error) {
+            rs->ops->get_error(rs, &local_err);
+        }
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+}
+
+void replication_stop_all(bool failover, Error **errp)
+{
+    ReplicationState *rs, *next;
+    Error *local_err = NULL;
+
+    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
+        if (rs->ops && rs->ops->stop) {
+            rs->ops->stop(rs, failover, &local_err);
+        }
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+}
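
The pattern used by the *_all() functions above is a global registry whose entries carry an ops table: each call walks the list, skips entries without the callback, and stops at the first error. A reduced stand-alone sketch is shown here. Node, NodeOps and the int return convention are hypothetical simplifications that replace ReplicationState, ReplicationOps and the Error ** mechanism.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node Node;

/* Hypothetical, reduced ops table: a single optional callback. */
typedef struct NodeOps {
    int (*start)(Node *n);   /* returns 0 on success, negative on error */
} NodeOps;

struct Node {
    void *opaque;
    const NodeOps *ops;
    Node *next;
};

static Node *registry;       /* head of the singly linked registry */

/* Insert at the head, as QLIST_INSERT_HEAD does in the patch. */
static void node_register(Node *n, void *opaque, const NodeOps *ops)
{
    n->opaque = opaque;
    n->ops = ops;
    n->next = registry;
    registry = n;
}

static void node_unregister(Node *n)
{
    Node **p;

    for (p = &registry; *p; p = &(*p)->next) {
        if (*p == n) {
            *p = n->next;
            n->next = NULL;
            return;
        }
    }
}

/* Call start() on every entry that has one; stop at the first failure. */
static int start_all(void)
{
    Node *n;

    for (n = registry; n; n = n->next) {
        if (n->ops && n->ops->start) {
            int ret = n->ops->start(n);
            if (ret < 0) {
                return ret;
            }
        }
    }
    return 0;
}

/* Two sample callbacks used to exercise the registry. */
static int count_start(Node *n)
{
    ++*(int *)n->opaque;
    return 0;
}

static int fail_start(Node *n)
{
    (void)n;
    return -1;
}
```

Because registration inserts at the head, the most recently registered entry runs first, and entries after a failing one are never called.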
diff --git a/replication.h b/replication.h
new file mode 100644
index 000..ece6ca6
--- /dev/null
+++ b/replication.h
@@ -0,0 +1,174 @@
+/*
+ * Replication filter
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 Intel Corporation
+ * Copyright (c) 2016 FUJITSU LIMITED
+ *
+ * Author:
+ *   Changlong Xie 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef REPLICATION_H
+#define

[Qemu-devel] [PATCH v22 08/10] Implement new driver for block replication

2016-07-22 Thread Wang WeiWei

From: Wen Congyang 

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
Signed-off-by: Wang WeiWei 
---
 block/Makefile.objs |   1 +
 block/replication.c | 658 
 2 files changed, 659 insertions(+)
 create mode 100644 block/replication.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 8a3270b..ee1d849 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -23,6 +23,7 @@ block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o dirty-bitmap.o
 block-obj-y += write-threshold.o
 block-obj-y += backup.o
+block-obj-y += replication.o
 
 block-obj-y += crypto.o
 
diff --git a/block/replication.c b/block/replication.c
new file mode 100644
index 000..ec35348
--- /dev/null
+++ b/block/replication.c
@@ -0,0 +1,658 @@
+/*
+ * Replication Block filter
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 Intel Corporation
+ * Copyright (c) 2016 FUJITSU LIMITED
+ *
+ * Author:
+ *   Wen Congyang 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "block/nbd.h"
+#include "block/blockjob.h"
+#include "block/block_int.h"
+#include "block/block_backup.h"
+#include "sysemu/block-backend.h"
+#include "qapi/error.h"
+#include "replication.h"
+
+typedef struct BDRVReplicationState {
+    ReplicationMode mode;
+    int replication_state;
+    BdrvChild *active_disk;
+    BdrvChild *hidden_disk;
+    BdrvChild *secondary_disk;
+    char *top_id;
+    ReplicationState *rs;
+    Error *blocker;
+    int orig_hidden_flags;
+    int orig_secondary_flags;
+    int error;
+} BDRVReplicationState;
+
+enum {
+    BLOCK_REPLICATION_NONE,             /* block replication is not started */
+    BLOCK_REPLICATION_RUNNING,          /* block replication is running */
+    BLOCK_REPLICATION_FAILOVER,         /* failover is running in background */
+    BLOCK_REPLICATION_FAILOVER_FAILED,  /* failover failed */
+    BLOCK_REPLICATION_DONE,             /* block replication is done */
+};
+
+static void replication_start(ReplicationState *rs, ReplicationMode mode,
+                              Error **errp);
+static void replication_do_checkpoint(ReplicationState *rs, Error **errp);
+static void replication_get_error(ReplicationState *rs, Error **errp);
+static void replication_stop(ReplicationState *rs, bool failover,
+                             Error **errp);
+
+#define REPLICATION_MODE        "mode"
+#define REPLICATION_TOP_ID      "top-id"
+static QemuOptsList replication_runtime_opts = {
+    .name = "replication",
+    .head = QTAILQ_HEAD_INITIALIZER(replication_runtime_opts.head),
+    .desc = {
+        {
+            .name = REPLICATION_MODE,
+            .type = QEMU_OPT_STRING,
+        },
+        {
+            .name = REPLICATION_TOP_ID,
+            .type = QEMU_OPT_STRING,
+        },
+        { /* end of list */ }
+    },
+};
+
+static ReplicationOps replication_ops = {
+    .start = replication_start,
+    .checkpoint = replication_do_checkpoint,
+    .get_error = replication_get_error,
+    .stop = replication_stop,
+};
+
+static int replication_open(BlockDriverState *bs, QDict *options,
+                            int flags, Error **errp)
+{
+    int ret;
+    BDRVReplicationState *s = bs->opaque;
+    Error *local_err = NULL;
+    QemuOpts *opts = NULL;
+    const char *mode;
+    const char *top_id;
+
+    ret = -EINVAL;
+    opts = qemu_opts_create(&replication_runtime_opts, NULL, 0, &error_abort);
+    qemu_opts_absorb_qdict(opts, options, &local_err);
+    if (local_err) {
+        goto fail;
+    }
+
+    mode = qemu_opt_get(opts, REPLICATION_MODE);
+    if (!mode) {
+        error_setg(&local_err, "Missing the option mode");
+        goto fail;
+    }
+
+    if (!strcmp(mode, "primary")) {
+        s->mode = REPLICATION_MODE_PRIMARY;
+    } else if (!strcmp(mode, "secondary")) {
+        s->mode = REPLICATION_MODE_SECONDARY;
+        top_id = qemu_opt_get(opts, REPLICATION_TOP_ID);
+        s->top_id = g_strdup(top_id);
+        if (!s->top_id) {
+            error_setg(&local_err, "Missing the option top-id");
+            goto fail;
+        }
+    } else {
+        error_setg(&local_err,
+                   "The option mode's value should be primary or secondary");
+        goto fail;
+    }
+
+    s->rs = replication_new(bs, &replication_ops);
+
+    ret = 0;
+
+fail:
+    qemu_opts_del(opts);
+    error_propagate(errp, local_err);
+
+    return ret;
+}
+
+static void replication_close(BlockDriverState *bs)
+{
+    BDRVReplicationState *s = bs->opaque;
+
+    if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
+        replication_stop(s->rs, false, NULL);
+    }
+
+    if (s->mode == REPLICATION_MODE_SECONDARY) {
+        g_free(s->top_id);
+    }
+
+    replication_remove(s->rs);
+}
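
The option validation in replication_open() boils down to three rules: "mode" is mandatory, it must be "primary" or "secondary", and "top-id" is mandatory only in secondary mode. A stand-alone sketch of just that decision logic follows. ToyMode and parse_options are hypothetical names, and a plain string pointer replaces QEMU's Error object.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum ToyMode {
    TOY_MODE_PRIMARY,
    TOY_MODE_SECONDARY
} ToyMode;

/*
 * Validate the two runtime options.  On success returns 0 and stores the
 * parsed mode; on failure returns -1 and points *errmsg at a message.
 * A NULL string means the option was not given.
 */
static int parse_options(const char *mode_str, const char *top_id,
                         ToyMode *mode, const char **errmsg)
{
    if (!mode_str) {
        *errmsg = "Missing the option mode";
        return -1;
    }
    if (!strcmp(mode_str, "primary")) {
        *mode = TOY_MODE_PRIMARY;      /* top-id is not required here */
        return 0;
    }
    if (!strcmp(mode_str, "secondary")) {
        if (!top_id) {
            *errmsg = "Missing the option top-id";
            return -1;
        }
        *mode = TOY_MODE_SECONDARY;
        return 0;
    }
    *errmsg = "The option mode's value should be primary or secondary";
    return -1;
}
```

Keeping the checks in one function makes it easy to confirm that every failure path reports a message before returning.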
+
+st

[Qemu-devel] [PATCH v22 02/10] Backup: clear all bitmap when doing block checkpoint

2016-07-22 Thread Wang WeiWei

From: Wen Congyang 

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
Signed-off-by: Wang WeiWei 
---
 block/backup.c   | 18 ++
 include/block/block_backup.h |  3 +++
 2 files changed, 21 insertions(+)
 create mode 100644 include/block/block_backup.h

diff --git a/block/backup.c b/block/backup.c
index 2c05323..3bce416 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -17,6 +17,7 @@
 #include "block/block.h"
 #include "block/block_int.h"
 #include "block/blockjob.h"
+#include "block/block_backup.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
 #include "qemu/ratelimit.h"
@@ -253,6 +254,23 @@ static void backup_attached_aio_context(BlockJob *job, AioContext *aio_context)
 blk_set_aio_context(s->target, aio_context);
 }
 
+void backup_do_checkpoint(BlockJob *job, Error **errp)
+{
+    BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
+    int64_t len;
+
+    assert(job->driver->job_type == BLOCK_JOB_TYPE_BACKUP);
+
+    if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) {
+        error_setg(errp, "The backup job only supports block checkpoint in"
+                   " sync=none mode");
+        return;
+    }
+
+    len = DIV_ROUND_UP(backup_job->common.len, backup_job->cluster_size);
+    bitmap_zero(backup_job->done_bitmap, len);
+}
+
 static const BlockJobDriver backup_job_driver = {
 .instance_size  = sizeof(BackupBlockJob),
 .job_type   = BLOCK_JOB_TYPE_BACKUP,
diff --git a/include/block/block_backup.h b/include/block/block_backup.h
new file mode 100644
index 000..3753bcb
--- /dev/null
+++ b/include/block/block_backup.h
@@ -0,0 +1,3 @@
+#include "block/block_int.h"
+
+void backup_do_checkpoint(BlockJob *job, Error **errp);
-- 
1.9.3
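
For readers following the arithmetic in backup_do_checkpoint(): the job length is rounded up to a whole number of clusters, and that many bits are cleared in the done bitmap, so every cluster counts as "not yet copied" after the checkpoint. The stand-alone sketch below reproduces only that calculation. nb_clusters and toy_bitmap_zero are hypothetical helpers written for this note, not QEMU functions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Number of clusters needed to cover len_bytes, rounding up. */
static int64_t nb_clusters(int64_t len_bytes, int64_t cluster_size)
{
    return DIV_ROUND_UP(len_bytes, cluster_size);
}

/* Clear the first nbits bits by zeroing every word that holds one of them. */
static void toy_bitmap_zero(unsigned long *map, int64_t nbits)
{
    size_t words = DIV_ROUND_UP((size_t)nbits, BITS_PER_WORD);

    memset(map, 0, words * sizeof(unsigned long));
}
```

With the 64 MiB image used by the unit test and the default 64 KiB cluster size, this gives 1024 clusters.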

[Qemu-devel] [PATCH v22 03/10] Backup: export interfaces for extra serialization

2016-07-22 Thread Wang WeiWei

From: Changlong Xie 

Normal backup (sync='none') workflow:
step 1. NBD performs an I/O write from client to server
   qcow2_co_writev
    bdrv_co_writev
     ...
       bdrv_aligned_pwritev
        notifier_with_return_list_notify -> backup_do_cow
         bdrv_driver_pwritev // write new contents

step 2. drive-backup sync=none
   backup_do_cow
   {
    wait_for_overlapping_requests
    cow_request_begin
    for(; start < end; start++) {
            bdrv_co_readv_no_serialising // read old contents from Secondary disk
            bdrv_co_writev // write old contents to hidden-disk
    }
    cow_request_end
   }

step 3. Then return to "step 1" to write the new contents to the Secondary disk.

For replication, we must make sure that we only read the old contents from
the Secondary disk in order to keep the contents consistent.

1) Replication workflow of Secondary
                                                     virtio-blk
                                                          ^
------->  1 NBD                                           |
   ||     server                                  3 replication
   ||        ^                                            ^
   ||        |           backing                 backing  |
   ||  Secondary disk 6<-------- hidden-disk 5 <-------- active-disk 4
   ||        |                         ^
   ||        '-------------------------'
   ||           drive-backup sync=none 2

Hence, we need these interfaces to implement coarse-grained serialization
between COW of the Secondary disk and the read operation of replication.

Example code showing how to use them:

#include "block/block_backup.h"

static coroutine_fn int xxx_co_readv()
{
    CowRequest req;
    BlockJob *job = secondary_disk->bs->job;
    int ret;

    if (job) {
        backup_wait_for_overlapping_requests(job, start, end);
        backup_cow_request_begin(&req, job, start, end);
        ret = bdrv_co_readv();
        backup_cow_request_end(&req);
        goto out;
    }
    ret = bdrv_co_readv();
out:
    return ret;
}

Signed-off-by: Changlong Xie 
Signed-off-by: Wen Congyang 
Signed-off-by: Wang WeiWei 
---
 block/backup.c   | 41 ++---
 include/block/block_backup.h | 14 ++
 2 files changed, 48 insertions(+), 7 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 3bce416..919b63a 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -28,13 +28,6 @@
 #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
 #define SLICE_TIME 1ULL /* ns */
 
-typedef struct CowRequest {
-int64_t start;
-int64_t end;
-QLIST_ENTRY(CowRequest) list;
-CoQueue wait_queue; /* coroutines blocked on this request */
-} CowRequest;
-
 typedef struct BackupBlockJob {
 BlockJob common;
 BlockBackend *target;
@@ -271,6 +264,40 @@ void backup_do_checkpoint(BlockJob *job, Error **errp)
 bitmap_zero(backup_job->done_bitmap, len);
 }
 
+void backup_wait_for_overlapping_requests(BlockJob *job, int64_t sector_num,
+                                          int nb_sectors)
+{
+    BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
+    int64_t sectors_per_cluster = cluster_size_sectors(backup_job);
+    int64_t start, end;
+
+    assert(job->driver->job_type == BLOCK_JOB_TYPE_BACKUP);
+
+    start = sector_num / sectors_per_cluster;
+    end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
+    wait_for_overlapping_requests(backup_job, start, end);
+}
+
+void backup_cow_request_begin(CowRequest *req, BlockJob *job,
+                              int64_t sector_num,
+                              int nb_sectors)
+{
+    BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
+    int64_t sectors_per_cluster = cluster_size_sectors(backup_job);
+    int64_t start, end;
+
+    assert(job->driver->job_type == BLOCK_JOB_TYPE_BACKUP);
+
+    start = sector_num / sectors_per_cluster;
+    end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
+    cow_request_begin(req, backup_job, start, end);
+}
+
+void backup_cow_request_end(CowRequest *req)
+{
+    cow_request_end(req);
+}
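
Both exported wrappers convert a sector range into a half-open cluster range [start, end) before calling the internal helpers: the start is rounded down and the end is rounded up, so a request that touches only part of a cluster still covers that whole cluster. A stand-alone sketch of the conversion, with an overlap test of the kind such serialization relies on, is shown below. ClusterRange, sectors_to_clusters and ranges_overlap are hypothetical names introduced for this note.

```c
#include <assert.h>
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Half-open cluster range: start is included, end is excluded. */
typedef struct ClusterRange {
    int64_t start;
    int64_t end;
} ClusterRange;

/* Round the start down and the end up to cluster boundaries. */
static ClusterRange sectors_to_clusters(int64_t sector_num, int nb_sectors,
                                        int64_t sectors_per_cluster)
{
    ClusterRange r;

    r.start = sector_num / sectors_per_cluster;
    r.end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
    return r;
}

/* Two half-open ranges overlap when each one starts before the other ends. */
static int ranges_overlap(ClusterRange a, ClusterRange b)
{
    return a.start < b.end && b.start < a.end;
}
```

For example, with 128 sectors per cluster (64 KiB clusters and 512-byte sectors), a 16-sector request at sector 120 spans clusters 0 and 1, so it overlaps a request confined to cluster 1.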
+
 static const BlockJobDriver backup_job_driver = {
 .instance_size  = sizeof(BackupBlockJob),
 .job_type   = BLOCK_JOB_TYPE_BACKUP,
diff --git a/include/block/block_backup.h b/include/block/block_backup.h
index 3753bcb..e0e7ce6 100644
--- a/include/block/block_backup.h
+++ b/include/block/block_backup.h
@@ -1,3 +1,17 @@
 #include "block/block_int.h"
 
+typedef struct CowRequest {
+    int64_t start;
+    int64_t end;
+    QLIST_ENTRY(CowRequest) list;
+    CoQueue wait_queue; /* coroutines blocked on this request */
+} CowRequest;
+
+void backup_wait_for_overlapping_requests(BlockJob *job, int64_t sector_num,
+                                          int nb_sectors);
+void backup_cow_request_begin(CowRequest *req, BlockJob *job,
+

Re: [Qemu-devel] [PATCH 2/2] util/qemu-sockets: shoot unix_nonblocking_connect()

2016-07-22 Thread Cao jin


Hi Daniel

On 07/21/2016 11:39 PM, Daniel P. Berrange wrote:

> On Thu, Jul 21, 2016 at 08:42:25AM -0600, Eric Blake wrote:
>> On 07/21/2016 04:33 AM, Cao jin wrote:
>>> It is never used, and now all connect is nonblocking via
>>> inet_connect_addr().
>>>
>>
>> Could be squashed with 1/2.  In fact, if you squash it, I'd title the patch:
>>
>> util: Drop unused *_nonblocking_connect() functions
>>
>> You may also want to call out which commit id rendered the functions unused.
>
> Well once those two functions are dropped the only other place accepting
> NonBlockingConnectHandler is the socket_connect() method. Since nearly
> everything is converted to QIOChannel now, there's only one caller of
> socket_connect() left, and that's net/socket.c
>
> Any newly written code which needs a non-blocking connect should use the
> QIOChannel code, so I don't see any further usage of socket_connect()
> being added.
>
> IOW, we can rip out NonBlockingConnectHandler as a concept entirely, not
> merely drop the *_nonblocking_connect() methods.
>

I don't quite follow the "rip out NonBlockingConnectHandler" thing. 
According what I learned from code, we offered non-blocking connection 
mechanism, but it seems nobody use it(all callers of socket_connect() 
set callback as NULL), so, do you mean removing this mechanism?

more explanation will be much appreciated:)

> Regards,
> Daniel



--
Yours Sincerely,

Cao jin

[Qemu-devel] [PATCH v22 00/10] Block replication for continuous checkpoints

2016-07-22 Thread Wang WeiWei

Block replication is a very important feature which is used for
continuous checkpoints (for example, COLO).

You can get the detailed information about block replication from here:
http://wiki.qemu.org/Features/BlockReplication

Usage:
Please refer to docs/block-replication.txt

You can get the patch here:
https://github.com/wangww-fnst/qemu/tree/block-replication-v22

You can get the patch with framework here:
https://github.com/wangww-fnst/qemu/tree/colo_framework_v21

TODO:
1. Continuous block replication. It will be started after basic functions
   are accepted.

Change Log:
V22:
1. Rebase to the latest code
2. Modify the code to adapt to the changes in backup_start & commit_active_start
3. Rewrite io_read & io_write for interface changes
V21:
1. Rebase to the latest code
2. Use bdrv_pwrite_zeroes() and BDRV_SECTOR_BITS for p9
V20 Resend:
1. Resend to avoid bothering qemu-trivial maintainers
2. Address comments from Eric, fix header file issue and add a brief commit
   message for p7
V20:
1. Rebase to the latest code
2. Address comments from Stefan
p8:
1. error_setg() with an error message when check_top_bs() fails.
2. Remove bdrv_ref(s->hidden_disk->bs) since commit 5c438bc6
3. Use block_job_cancel_sync() before active commit
p9:
1. Fix uninitialized 'pattern_buf'
2. Introduce mkstemp(3) to fix unique filenames
3. Use qemu_vfree() for qemu_blockalign() memory
4. Add missing replication_start_all()
5. Remove useless pattern for io_write()
V19:
1. Rebase to v2.6.0
2. Address comments from Stefan
p3: a new patch that exports interfaces for extra serialization
p8:
1. Call replication_stop() before freeing s->top_id
2. Check top_bs
3. Reopen file readonly in error return paths
4. Enable extra serialization between read and COW
p9: try to handle SIGABRT
V18:
p6: add local_err in all replication callbacks to prevent "errp == NULL"
p7: add missing qemu_iovec_destroy(xxx)
V17:
1. Rebase to the latest code
p2: refactor backup_do_checkpoint to address comments from Jeff Cody
p4: fix bugs in "drive_add buddy xxx" hmp commands
p6: add "since: 2.7"
p7: fix bug in replication_close(), add missing "qapi/error.h", add
    test-replication
p8: add "since: 2.7"
V16:
1. Rebase to the newest code
2. Address comments from Stefan & hailiang
p3: we don't need this patch now
p4: add "top-id" parameters for secondary
p6: fix NULL pointer in replication callbacks, remove unnecessary typedefs,
    add doc comments that explain the semantics of Replication
p7: refactor AioContext for thread safety, remove unnecessary get_top_bs()
*Note*: I'm working on the replication test case now; it will be sent out in V17
V15:
1. Rebase to the newest code
2. Fix typos and coding style to address Eric's comments
3. Address Stefan's comments
   1) Make backup_do_checkpoint public, drop the changes on BlockJobDriver
   2) Update the message and description for [PATCH 4/9]
   3) Make replication_(start/stop/do_checkpoint)_all as global interfaces
   4) Introduce AioContext lock to protect start/stop/do_checkpoint callbacks
   5) Use BdrvChild instead of holding on to BlockDriverState * pointers
4. Clear BDRV_O_INACTIVE for hidden disk's open_flags since commit 09e0c771  
5. Introduce replication_get_error_all to check replication status
6. Remove useless discard interface
V14:
1. Implement auto complete active commit
2. Implement active commit block job for replication.c
3. Address the comments from Stefan, add replication-specific API and data
   structure, also remove old block layer APIs
V13:
1. Rebase to the newest code
2. Remove redundant macros and semicolon in replication.c
3. Fix typos in block-replication.txt
V12:
1. Rebase to the newest code
2. Use backing reference to replace 'allow-write-backing-file'
V11:
1. Reopen the backing file when starting block replication if it is not
   opened in R/W mode
2. Unblock BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET
   when opening backing file
3. Block the top BDS so there is only one block job for the top BDS and
   its backing chain.
V10:
1. Use blockdev-remove-medium and blockdev-insert-medium to replace backing
   reference.
2. Address the comments from Eric Blake
V9:
1. Update the error messages
2. Rebase to the newest qemu
3. Split child add/delete support. These patches are sent in another patchset.
V8:
1. Address Alberto Garcia's comments
V7:
1. Implement adding/removing quorum child. Remove the option non-connect.
2. Simplify the backing reference option according to Stefan Hajnoczi's
   suggestion
V6:
1. Rebase to the newest qemu.
V5:
1. Address the comments from Gong Lei
2. Speed the failover up. The secondary vm can take over very quickly even
   if there are too many I/O requests.
V4:
1. Introduce a new driver replication to avoid touching nbd and qcow2.
V3:
1. Use error_setg() instead of error_set()
2. Add a new block job API
3. Active disk, hidden disk and nbd target uses the same AioContext
4. Add a testcase to test new hbitmap API
V2:
1. Redesign the secondary qemu(use image-fleeci

[Qemu-devel] [PATCH v22 10/10] support replication driver in blockdev-add

2016-07-22 Thread Wang WeiWei

From: Wen Congyang 

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
Reviewed-by: Eric Blake 
Signed-off-by: Wang WeiWei 
---
 qapi/block-core.json | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 7f05b68..59565e9 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -248,6 +248,7 @@
 #   2.3: 'host_floppy' deprecated
 #   2.5: 'host_floppy' dropped
 #   2.6: 'luks' added
+#   2.7: 'replication' added
 #
 # @backing_file: #optional the name of the backing file (for copy-on-write)
 #
@@ -1696,8 +1697,8 @@
   'data': [ 'archipelago', 'blkdebug', 'blkverify', 'bochs', 'cloop',
 'dmg', 'file', 'ftp', 'ftps', 'gluster', 'host_cdrom',
 'host_device', 'http', 'https', 'luks', 'null-aio', 'null-co',
-'parallels', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'tftp',
-'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
+            'parallels', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'replication',
+'tftp', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
 
 ##
 # @BlockdevOptionsFile
@@ -2160,6 +2161,19 @@
 { 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] }
 
 ##
+# @BlockdevOptionsReplication
+#
+# Driver specific block device options for replication
+#
+# @mode: the replication mode
+#
+# Since: 2.7
+##
+{ 'struct': 'BlockdevOptionsReplication',
+  'base': 'BlockdevOptionsGenericFormat',
+  'data': { 'mode': 'ReplicationMode'  } }
+
+##
 # @BlockdevOptions
 #
 # Options for creating a block device.  Many options are available for all
@@ -2224,6 +2238,7 @@
   'quorum': 'BlockdevOptionsQuorum',
   'raw':'BlockdevOptionsGenericFormat',
 # TODO rbd: Wait for structured options
+  'replication':'BlockdevOptionsReplication',
 # TODO sheepdog: Wait for structured options
 # TODO ssh: Should take InetSocketAddress for 'host'?
   'tftp':   'BlockdevOptionsFile',
-- 
1.9.3

Re: [Qemu-devel] [PULL 0/7] migration: fix, perf testing framework

2016-07-22 Thread Peter Maydell

On 22 July 2016 at 09:00, Amit Shah  wrote:
> The following changes since commit 206d0c24361a083fbdcb2cc86fb75dc8b7f251a2:
>
>   Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging (2016-07-21 20:12:37 +0100)
>
> are available in the git repository at:
>
>   http://git.kernel.org/pub/scm/virt/qemu/amit/migration.git tags/migration-for-2.7-6
>
> for you to fetch changes up to 409437e16df273fc5f78f6cd1cb53023eaeb9b72:
>
>   tests: introduce a framework for testing migration performance (2016-07-22 13:23:39 +0530)
>
> 
> Migration:
> - Fix a postcopy bug
> - Add a testsuite for measuring migration performance
>
> 
>

Applied, thanks.

-- PMM

Re: [Qemu-devel] [PATCH 2/2] util/qemu-sockets: shoot unix_nonblocking_connect()

2016-07-22 Thread Daniel P. Berrange

On Fri, Jul 22, 2016 at 06:34:11PM +0800, Cao jin wrote:
> Hi Daniel
> 
> On 07/21/2016 11:39 PM, Daniel P. Berrange wrote:
> > On Thu, Jul 21, 2016 at 08:42:25AM -0600, Eric Blake wrote:
> > > On 07/21/2016 04:33 AM, Cao jin wrote:
> > > > It is never used, and now all connect is nonblocking via
> > > > inet_connect_addr().
> > > > 
> > > 
> > > > Could be squashed with 1/2.  In fact, if you squash it, I'd title the patch:
> > > > 
> > > > util: Drop unused *_nonblocking_connect() functions
> > > > 
> > > > You may also want to call out which commit id rendered the functions unused.
> > 
> > Well once those two functions are dropped the only other place accepting
> > NonBlockingConnectHandler is the socket_connect() method. Since nearly
> > everything is converted to QIOChannel now, there's only one caller of
> > socket_connect() left, and that's net/socket.c
> > 
> > Any newly written code which needs a non-blocking connect should use the
> > QIOChannel code, so I don't see any further usage of socket_connect()
> > being added.
> > 
> > IOW, we can rip out NonBlockingConnectHandler as a concept entirely, not
> > merely drop the *_nonblocking_connect() methods.
> > 
> 
> I don't quite follow the "rip out NonBlockingConnectHandler" thing.
> According what I learned from code, we offered non-blocking connection
> mechanism, but it seems nobody use it(all callers of socket_connect() set
> callback as NULL), so, do you mean removing this mechanism?

Yes, remove it all, as it is no longer needed.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: [Qemu-devel] [PATCH 2/2] util/qemu-sockets: shoot unix_nonblocking_connect()

2016-07-22 Thread Cao jin




On 07/22/2016 06:30 PM, Daniel P. Berrange wrote:
> On Fri, Jul 22, 2016 at 06:34:11PM +0800, Cao jin wrote:
> > Hi Daniel
> >
> > On 07/21/2016 11:39 PM, Daniel P. Berrange wrote:
> > > On Thu, Jul 21, 2016 at 08:42:25AM -0600, Eric Blake wrote:
> > > > On 07/21/2016 04:33 AM, Cao jin wrote:
> > > > > It is never used, and now all connect is nonblocking via
> > > > > inet_connect_addr().
> > > > >
> > > >
> > > > Could be squashed with 1/2.  In fact, if you squash it, I'd title the patch:
> > > >
> > > > util: Drop unused *_nonblocking_connect() functions
> > > >
> > > > You may also want to call out which commit id rendered the functions unused.
> > >
> > > Well once those two functions are dropped the only other place accepting
> > > NonBlockingConnectHandler is the socket_connect() method. Since nearly
> > > everything is converted to QIOChannel now, there's only one caller of
> > > socket_connect() left, and that's net/socket.c
> > >
> > > Any newly written code which needs a non-blocking connect should use the
> > > QIOChannel code, so I don't see any further usage of socket_connect()
> > > being added.
> > >
> > > IOW, we can rip out NonBlockingConnectHandler as a concept entirely, not
> > > merely drop the *_nonblocking_connect() methods.
> > >
> >
> > I don't quite follow the "rip out NonBlockingConnectHandler" thing.
> > According what I learned from code, we offered non-blocking connection
> > mechanism, but it seems nobody use it(all callers of socket_connect() set
> > callback as NULL), so, do you mean removing this mechanism?
>
> Yes, remove it all, as it is no longer needed.
>

Thanks for clarifying. Actually, I am still curious why nobody want to
use this mechanism, is there any disadvantage? And why this mechanism is
introduced in

> Regards,
> Daniel



--
Yours Sincerely,

Cao jin

Re: [Qemu-devel] [PATCH 2/2] util/qemu-sockets: shoot unix_nonblocking_connect()

2016-07-22 Thread Daniel P. Berrange

On Fri, Jul 22, 2016 at 06:43:51PM +0800, Cao jin wrote:
> 
> 
> On 07/22/2016 06:30 PM, Daniel P. Berrange wrote:
> > On Fri, Jul 22, 2016 at 06:34:11PM +0800, Cao jin wrote:
> > > Hi Daniel
> > > 
> > > On 07/21/2016 11:39 PM, Daniel P. Berrange wrote:
> > > > On Thu, Jul 21, 2016 at 08:42:25AM -0600, Eric Blake wrote:
> > > > > On 07/21/2016 04:33 AM, Cao jin wrote:
> > > > > > It is never used, and now all connect is nonblocking via
> > > > > > inet_connect_addr().
> > > > > > 
> > > > > 
> > > > > Could be squashed with 1/2.  In fact, if you squash it, I'd title the 
> > > > > patch:
> > > > > 
> > > > > util: Drop unused *_nonblocking_connect() functions
> > > > > 
> > > > > You may also want to call out which commit id rendered the functions 
> > > > > unused.
> > > > 
> > > > Well once those two functions are dropped the only other place accepting
> > > > NonBlockingConnectHandler is the socket_connect() method. Since nearly
> > > > everything is converted to QIOChannel now, there's only one caller of
> > > > socket_connect() left, and that's net/socket.c
> > > > 
> > > > Any newly written code which needs a non-blocking connect should use the
> > > > QIOChannel code, so I don't see any further usage of socket_connect()
> > > > being added.
> > > > 
> > > > IOW, we can rip out NonBlockingConnectHandler as a concept entirely, not
> > > > merely drop the *_nonblocking_connect() methods.
> > > > 
> > > 
> > > I don't quite follow the "rip out NonBlockingConnectHandler" thing.
> > > According what I learned from code, we offered non-blocking connection
> > > mechanism, but it seems nobody use it(all callers of socket_connect() set
> > > callback as NULL), so, do you mean removing this mechanism?
> > 
> > Yes, remove it all, as it is no longer needed.
> > 
> 
> Thanks for clarifying. Actually, I am still curious why nobody want to use
> this mechanism, is there any disadvantage? And why this mechanism is
> introduced in

As mentioned previously it is obsolete as all new code will use the QIOChannel
APIs which already provide non-blocking connect in a different manner. The
qemu-sockets non-blocking code never worked properly in the first place
because it calls getaddrinfo() which blocks on DNS lookups.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
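
To illustrate the point about getaddrinfo(), here is a minimal sketch in plain POSIX C. It is not QEMU code, and the sketch_* names are hypothetical. Setting O_NONBLOCK on a socket only affects calls made on that socket; name resolution happens before any socket exists, so it can still stall the caller unless the host is a numeric literal (AI_NUMERICHOST) or the lookup is moved to another thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not a QEMU function: put fd into non-blocking mode. */
static int sketch_set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: resolve host/port into an addrinfo list.
 * With numeric_only set, AI_NUMERICHOST makes getaddrinfo() accept only
 * literal addresses, so no DNS query is issued.  Without it, the call may
 * block in the resolver no matter how any socket is configured.
 */
static int sketch_resolve(const char *host, const char *port,
                          int numeric_only, struct addrinfo **res)
{
    struct addrinfo hints;

    assert(host && port && res);
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (numeric_only) {
        hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV;
    }
    return getaddrinfo(host, port, &hints, res);
}
```

A caller would resolve first, then create the socket, call sketch_set_nonblocking() and connect(); only the last step is covered by O_NONBLOCK, which is why a hostname lookup in the same thread defeats the purpose.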

Re: [Qemu-devel] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server)

2016-07-22 Thread Paolo Bonzini

> > > i.e. the use of fiemap to duplicate the exact layout of a file
> > > from userspace is only possible if you can /guarantee/ the source
> > > file has not changed in any way during the copy operation at the
> > > point in time you finalise the destination data copy.
> > 
> > We don't do exactly that, exactly because it's messy when you have
> > concurrent accesses (which shouldn't be done but you never know).
> 
> Which means you *cannot make the assumption it won't happen*.
> FIEMAP is not guaranteed to tell you exactly where the data in the
> file that you need to copy is, and nothing you can do from
> userspace changes that. I can't say it any clearer than that.

You've said it very clearly.  But I'm not saying "fix the damn FIEMAP", I'm
saying "this is what we need, lseek doesn't provide it, FIEMAP comes
close but really doesn't".  If the solution is to fix FIEMAP, if it's
possible at all, that'd be great.  But any other solution is okay.

Do you at least agree that it's possible to use the kind of information
in struct fiemap_extent (extent start, length and flags) in a way that is
not racy, or at least not any different from SEEK_DATA and SEEK_HOLE's
raciness?  I'm not saying that you'd get that information from FIEMAP.
It's just the kind of information that I'd like to get. 

(BTW, the documentation of FIEMAP is horrible.  It does not say at all
that FIEMAP_FLAG_SYNC is needed to return extents that match what
SEEK_HOLE/SEEK_DATA would return.  The obvious reading is that
FIEMAP_FLAG_SYNC would avoid returning FIEMAP_EXTENT_DELALLOC extents,
and that in turn would not be a problem if you don't need fe_physical.
Perhaps it would help if fiemap.txt started with *why* one would
use FIEMAP, not just what it does.)

> > When doing a copy, we use(d to use) FIEMAP the same way as you'd use lseek,
> > querying one extent at a time.  If you proceed this way, all of these
> > can cause the same races:
> > 
> > - pread(ofs=10MB, len=10MB) returns all zeroes, so the 10MB..20MB is
> > not copied
> > 
> > - pread(ofs=10MB, len=10MB) returns non-zero data, so the 10MB..20MB is
> > copied
> > 
> > - lseek(SEEK_DATA, 10MB) returns 20MB, so the 10MB..20MB area is not
> > copied
> > 
> > - lseek(SEEK_HOLE, 10MB) returns 20MB, so the 10MB..20MB area is
> > copied
> > 
> > - ioctl(FIEMAP at 10MB) returns an extent starting at 20MB, so
> > the 10MB..20MB area is not copied
> 
> No, FIEMAP is not guaranteed to behave like this. What is returned
> is filesystem dependent. Filesystems that don't support holes will
> return data extents. Filesystems that support compression might
> return a compressed data extent rather than a hole. Encrypted files
> might not expose holes at all, so people can't easily find known
> plain text regions in the encrypted data. Filesystems could report
> holes as deduplicated data, etc.  What do you do when FIEMAP returns
> "OFFLINE" to indicate that the data is located elsewhere and will
> need to be retrieved by the HSM operating on top of the filesystem
> before layout can be determined?

lseek(SEEK_DATA) might also say you're not on a hole because the file
is compressed/encrypted/deduplicated/offline/whatnot.  So lseek is
also filesystem dependent, isn't it?  It also doesn't work on block
devices, so it's really file descriptor dependent.  That's not news.

Of course read, lseek and FIEMAP will not return exactly the same
information.  The point is that they're subject to exactly the same
races, and that struct fiemap_extent *can* be parsed conservatively.
The code I attached to the previous message does that: if it finds any
extent kind other than an unwritten extent, it just treats it as data.

Based on this it would be nice to understand the reason why FIEMAP needs
FIEMAP_FLAG_SYNC to return meaningful values (the meaningful values
are there, they're just what lseek or read use), or to have a more
powerful function than just lseek(SEEK_DATA/SEEK_HOLE).  All we want is
to be able to distinguish between the three fallocate modes.

> The assumptions being made about FIEMAP behaviour will only lead to
> user data corruption, as they already have several times in the past.

Indeed, FIEMAP as is ranks just above gets() in usability.  But there's
no reason why it has to be that way.

Paolo
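
For readers without the earlier attachment, the conservative reading of struct fiemap_extent described above might look roughly like the sketch below. This is not the attached code; the enum and function names are hypothetical, and the three outcomes are only meant to mirror the three fallocate modes mentioned in the message. It assumes ext is the first extent returned by a query starting at ofs (NULL if none was returned). It does nothing about the races or the FIEMAP_FLAG_SYNC question discussed in this thread, and treating "no extent" as a hole is exactly the assumption being disputed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/fiemap.h>

/* Hypothetical classification, one value per fallocate-style outcome. */
enum sketch_range_kind {
    SKETCH_RANGE_HOLE,  /* nothing mapped at ofs */
    SKETCH_RANGE_ZERO,  /* allocated but unwritten, reads back as zeroes */
    SKETCH_RANGE_DATA   /* everything else, must be copied */
};

/*
 * Classify the byte at ofs given the first extent reported at or after ofs.
 * Only a plain unwritten extent (optionally carrying FIEMAP_EXTENT_LAST) is
 * trusted to be zero; any other flag combination is treated as data.
 */
static enum sketch_range_kind
sketch_classify(const struct fiemap_extent *ext, uint64_t ofs)
{
    uint32_t flags;

    if (ext == NULL || ofs < ext->fe_logical) {
        return SKETCH_RANGE_HOLE;
    }
    if (ofs - ext->fe_logical >= ext->fe_length) {
        /* Extent does not cover ofs at all: be conservative. */
        return SKETCH_RANGE_DATA;
    }
    flags = ext->fe_flags & ~(uint32_t)FIEMAP_EXTENT_LAST;
    if (flags == FIEMAP_EXTENT_UNWRITTEN) {
        return SKETCH_RANGE_ZERO;
    }
    return SKETCH_RANGE_DATA;
}
```

The design choice is that any flag the sketch does not explicitly understand (delalloc, encoded, unknown, and so on) falls through to the data case, so a misreport costs a redundant copy instead of a skipped one.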

Re: [Qemu-devel] [PATCH 2/2] util/qemu-sockets: shoot unix_nonblocking_connect()

2016-07-22 Thread Cao jin




On 07/22/2016 06:38 PM, Daniel P. Berrange wrote:
> On Fri, Jul 22, 2016 at 06:43:51PM +0800, Cao jin wrote:
> > On 07/22/2016 06:30 PM, Daniel P. Berrange wrote:
> > > On Fri, Jul 22, 2016 at 06:34:11PM +0800, Cao jin wrote:
> > > > Hi Daniel
> > > >
> > > > On 07/21/2016 11:39 PM, Daniel P. Berrange wrote:
> > > > > On Thu, Jul 21, 2016 at 08:42:25AM -0600, Eric Blake wrote:
> > > > > > On 07/21/2016 04:33 AM, Cao jin wrote:
> > > > > > > It is never used, and now all connect is nonblocking via
> > > > > > > inet_connect_addr().
> > > > > > >
> > > > > >
> > > > > > Could be squashed with 1/2.  In fact, if you squash it, I'd title the patch:
> > > > > >
> > > > > > util: Drop unused *_nonblocking_connect() functions
> > > > > >
> > > > > > You may also want to call out which commit id rendered the functions unused.
> > > > >
> > > > > Well once those two functions are dropped the only other place accepting
> > > > > NonBlockingConnectHandler is the socket_connect() method. Since nearly
> > > > > everything is converted to QIOChannel now, there's only one caller of
> > > > > socket_connect() left, and that's net/socket.c
> > > > >
> > > > > Any newly written code which needs a non-blocking connect should use the
> > > > > QIOChannel code, so I don't see any further usage of socket_connect()
> > > > > being added.
> > > > >
> > > > > IOW, we can rip out NonBlockingConnectHandler as a concept entirely, not
> > > > > merely drop the *_nonblocking_connect() methods.
> > > > >
> > > >
> > > > I don't quite follow the "rip out NonBlockingConnectHandler" thing.
> > > > According what I learned from code, we offered non-blocking connection
> > > > mechanism, but it seems nobody use it(all callers of socket_connect() set
> > > > callback as NULL), so, do you mean removing this mechanism?
> > >
> > > Yes, remove it all, as it is no longer needed.
> > >
> >
> > Thanks for clarifying. Actually, I am still curious why nobody want to use
> > this mechanism, is there any disadvantage? And why this mechanism is
> > introduced in
>
> As mentioned previously it is obsolete as all new code will use the QIOChannel
> APIs which already provide non-blocking connect in a different manner. The
> qemu-sockets non-blocking code never worked properly in the first place
> because it calls getaddrinfo() which blocks on DNS lookups.
>

Aha! I see! And also I see the comment you left in the code:
/* socket_connect() does a non-blocking connect(), but it
 * still blocks in DNS lookups, so we must use a thread */

Thanks very much, and I think maybe I can do this cleanup work:)

> Regards,
> Daniel



--
Yours Sincerely,

Cao jin

Re: [Qemu-devel] [PATCH] target-sh4: Use glib allocator in movcal helper

2016-07-22 Thread Peter Maydell

On 21 July 2016 at 17:28, Aurelien Jarno  wrote:
> On 2016-07-21 13:44, Peter Maydell wrote:
>> Ping?
>>
>> thanks
>> -- PMM
>>
>> On 12 July 2016 at 13:50, Peter Maydell  wrote:
>> > Coverity spots that helper_movcal() calls malloc() but doesn't
>> > check for failure. Fix this by switching to the glib allocation
>> > functions, which abort on allocation failure.
>> >
>> > Signed-off-by: Peter Maydell 
>> > ---
>> >  target-sh4/op_helper.c | 7 ---
>> >  1 file changed, 4 insertions(+), 3 deletions(-)
>
> I have just looked at it and tested it. It's all fine, sorry for the
> delay.
>
> Acked-by: Aurelien Jarno 

Applied to master, thanks.

-- PMM

[Qemu-devel] [PATCH RFC] spapr: disintricate core-id from DT semantics

2016-07-22 Thread Greg Kurz

The goal of this patch is to have a stable core-id which does not depend
on any DT related semantics, which involve non-obvious computations on
modern PowerPC server cpus.

With this patch, the DT core id is computed on-demand as:

   (core-id / smp_threads) * smt

where smt is the number of threads per core in the host.

This formula should be consolidated in a helper since it is needed in
several places.

Other uses for core-id include computing a stable cpu_index (which
allows random order hotplug/unplug without breaking migration) and
NUMA.

Signed-off-by: Greg Kurz 
---

It was first suggested here:

https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg01727.html

and as option 1) in the following discussion on IRC:

 imammedo, basically the options are: 1) change core-ids to be
 0, 1, .. n and compute cpu_index as core_id * threads + thread#, or
 2) leave core-ids as they are and calculate cpu_index as
 core-id / smt * threads + thread#

It is based on David's ppc-for-2.7 branch (commit bb6268f35f457).

It is lightly tested but I could at least do in-order hotplug/unplug.
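
As a rough idea of what the consolidated helper mentioned in the changelog could look like, here is a small sketch. The function names are hypothetical and the patch below still open-codes the expression; guest_threads stands for smp_threads and host_smt for the value returned by kvmppc_smt_threads().

```c
#include <assert.h>

/* Hypothetical helper: index of the core in spapr->cores[]. */
static int sketch_core_index(int core_id, int guest_threads)
{
    assert(guest_threads > 0);
    return core_id / guest_threads;
}

/* Hypothetical helper: DT id derived on demand from the stable core-id. */
static int sketch_core_dt_id(int core_id, int guest_threads, int host_smt)
{
    assert(host_smt > 0);
    /* core-ids are handed out as i * smp_threads, so this divides evenly */
    assert(core_id % guest_threads == 0);
    return sketch_core_index(core_id, guest_threads) * host_smt;
}
```

For example, with smp_threads = 2 on an SMT8 host, core-ids 0, 2 and 4 map to DT ids 0, 8 and 16, while the core-ids themselves no longer depend on the host's SMT mode.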


 hw/ppc/spapr.c  |   10 +-
 hw/ppc/spapr_cpu_core.c |   24 +++-
 2 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 9193ac2c122b..fbbd0518edd4 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1815,10 +1815,11 @@ static void ppc_spapr_init(MachineState *machine)
 
 spapr->cores = g_new0(Object *, spapr_max_cores);
 for (i = 0; i < spapr_max_cores; i++) {
-int core_dt_id = i * smt;
+int core_id = i * smp_threads;
 sPAPRDRConnector *drc =
 spapr_dr_connector_new(OBJECT(spapr),
-   SPAPR_DR_CONNECTOR_TYPE_CPU, 
core_dt_id);
+   SPAPR_DR_CONNECTOR_TYPE_CPU,
+   (core_id / smp_threads) * smt);
 
 qemu_register_reset(spapr_drc_reset, drc);
 
@@ -1834,7 +1835,7 @@ static void ppc_spapr_init(MachineState *machine)
 core  = object_new(type);
 object_property_set_int(core, smp_threads, "nr-threads",
 &error_fatal);
-object_property_set_int(core, core_dt_id, 
CPU_CORE_PROP_CORE_ID,
+object_property_set_int(core, core_id, CPU_CORE_PROP_CORE_ID,
 &error_fatal);
 object_property_set_bool(core, true, "realized", &error_fatal);
 }
@@ -2376,7 +2377,6 @@ static HotpluggableCPUList 
*spapr_query_hotpluggable_cpus(MachineState *machine)
 HotpluggableCPUList *head = NULL;
 sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
 int spapr_max_cores = max_cpus / smp_threads;
-int smt = kvmppc_smt_threads();
 
 for (i = 0; i < spapr_max_cores; i++) {
 HotpluggableCPUList *list_item = g_new0(typeof(*list_item), 1);
@@ -2386,7 +2386,7 @@ static HotpluggableCPUList 
*spapr_query_hotpluggable_cpus(MachineState *machine)
 cpu_item->type = spapr_get_cpu_core_type(machine->cpu_model);
 cpu_item->vcpus_count = smp_threads;
 cpu_props->has_core_id = true;
-cpu_props->core_id = i * smt;
+cpu_props->core_id = i * smp_threads;
 /* TODO: add 'has_node/node' here to describe
to which node core belongs */
 
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 4bfc96bd5a67..c04aaa47d7cd 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -103,7 +103,6 @@ static void spapr_core_release(DeviceState *dev, void 
*opaque)
 size_t size = object_type_get_instance_size(typename);
 sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
 CPUCore *cc = CPU_CORE(dev);
-int smt = kvmppc_smt_threads();
 int i;
 
 for (i = 0; i < cc->nr_threads; i++) {
@@ -117,7 +116,7 @@ static void spapr_core_release(DeviceState *dev, void 
*opaque)
 object_unparent(obj);
 }
 
-spapr->cores[cc->core_id / smt] = NULL;
+spapr->cores[cc->core_id / smp_threads] = NULL;
 
 g_free(sc->threads);
 object_unparent(OBJECT(dev));
@@ -128,18 +127,19 @@ void spapr_core_unplug(HotplugHandler *hotplug_dev, 
DeviceState *dev,
 {
 sPAPRMachineState *spapr = SPAPR_MACHINE(OBJECT(hotplug_dev));
 CPUCore *cc = CPU_CORE(dev);
+int smt = kvmppc_smt_threads();
+int index = cc->core_id / smp_threads;
 sPAPRDRConnector *drc =
-spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, cc->core_id);
+spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, index * smt);
 sPAPRDRConnectorClass *drck;
 Error *local_err = NULL;
-int smt = kvmppc_smt_threads();
-int index = cc->core_id / smt;
 int spapr_max_cores = max_cpus / smp_threads;
 int i;
 
 for (i = spapr_max_cores - 1; i > index; i--) {
 if (spapr->cores[i]) {
-error_set

[Qemu-devel] [PATCH v3] block/gluster: add support to choose libgfapi logfile

2016-07-22 Thread Prasanna Kumar Kalever

Currently all libgfapi logs default to '/dev/stderr', because that is hardcoded
in a call to the glfs logging API. If the debug level is set to DEBUG/TRACE, the
gfapi logs will be huge and fill/overflow the console view.

This patch provides a command-line option to specify a log file path, which
helps in logging to the specified file and also in persisting the gfapi logs.

Usage:
------
 *URI Style:
  ----------
  -drive file=gluster://hostname/volname/image.qcow2,file.debug=9,\
  file.logfile=/var/log/qemu/qemu-gfapi.log

 *JSON Style:
  -----------
  'json:{
     "driver":"qcow2",
     "file":{
        "driver":"gluster",
        "volume":"volname",
        "path":"image.qcow2",
        "debug":"9",
        "logfile":"/var/log/qemu/qemu-gfapi.log",
        "server":[
           {
              "type":"tcp",
              "host":"1.2.3.4",
              "port":24007
           },
           {
              "type":"unix",
              "socket":"/var/run/glusterd.socket"
           }
        ]
     }
  }'

Signed-off-by: Prasanna Kumar Kalever 
---
v3: rebased on master, which is now QMP compatible.
v2: address comments from Jeff Cody, thanks Jeff!
v1: initial patch
---
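
A minimal sketch of the option defaulting this patch relies on, for readers skimming the diff. The sketch_* names are hypothetical. Treating an empty string like an unset option and clamping the debug level to 0-9 are assumptions made for illustration; the visible hunks only show the "-" default and the "valid range is 0-9" help text.

```c
#include <stddef.h>

#define SKETCH_LOGFILE_DEFAULT "-"  /* per the patch comment: stderr in libgfapi */
#define SKETCH_DEBUG_MAX       9

/* Hypothetical helper: fall back to the default when no logfile is given. */
static const char *sketch_pick_logfile(const char *opt)
{
    /* Assumption: an empty string is treated the same as an unset option. */
    if (opt == NULL || opt[0] == '\0') {
        return SKETCH_LOGFILE_DEFAULT;
    }
    return opt;
}

/* Hypothetical helper: keep the debug level inside the documented 0-9 range. */
static int sketch_clamp_debug(int level)
{
    if (level < 0) {
        return 0;
    }
    if (level > SKETCH_DEBUG_MAX) {
        return SKETCH_DEBUG_MAX;
    }
    return level;
}
```

The two results would then be what gets passed as the logfile and loglevel arguments of glfs_set_logging().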
 block/gluster.c  | 47 ++-
 qapi/block-core.json |  5 -
 2 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index 01b479f..51a1089 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -26,10 +26,12 @@
 #define GLUSTER_OPT_IPV4"ipv4"
 #define GLUSTER_OPT_IPV6"ipv6"
 #define GLUSTER_OPT_SOCKET  "socket"
-#define GLUSTER_OPT_DEBUG   "debug"
 #define GLUSTER_DEFAULT_PORT24007
+#define GLUSTER_OPT_DEBUG   "debug"
 #define GLUSTER_DEBUG_DEFAULT   4
 #define GLUSTER_DEBUG_MAX   9
+#define GLUSTER_OPT_LOGFILE "logfile"
+#define GLUSTER_LOGFILE_DEFAULT "-" /* handled in libgfapi as /dev/stderr */
 
 #define GERR_INDEX_HINT "hint: check in 'server' array index '%d'\n"
 
@@ -44,6 +46,7 @@ typedef struct GlusterAIOCB {
 typedef struct BDRVGlusterState {
 struct glfs *glfs;
 struct glfs_fd *fd;
+char *logfile;
 bool supports_seek_data;
 int debug_level;
 } BDRVGlusterState;
@@ -73,6 +76,11 @@ static QemuOptsList qemu_gluster_create_opts = {
 .type = QEMU_OPT_NUMBER,
 .help = "Gluster log level, valid range is 0-9",
 },
+{
+.name = GLUSTER_OPT_LOGFILE,
+.type = QEMU_OPT_STRING,
+.help = "Logfile path of libgfapi",
+},
 { /* end of list */ }
 }
 };
@@ -91,6 +99,11 @@ static QemuOptsList runtime_opts = {
 .type = QEMU_OPT_NUMBER,
 .help = "Gluster log level, valid range is 0-9",
 },
+{
+.name = GLUSTER_OPT_LOGFILE,
+.type = QEMU_OPT_STRING,
+.help = "Logfile path of libgfapi",
+},
 { /* end of list */ }
 },
 };
@@ -341,7 +354,7 @@ static struct glfs 
*qemu_gluster_glfs_init(BlockdevOptionsGluster *gconf,
 }
 }
 
-ret = glfs_set_logging(glfs, "-", gconf->debug_level);
+ret = glfs_set_logging(glfs, gconf->logfile, gconf->debug_level);
 if (ret < 0) {
 goto out;
 }
@@ -576,7 +589,9 @@ static struct glfs 
*qemu_gluster_init(BlockdevOptionsGluster *gconf,
 if (ret < 0) {
 error_setg(errp, "invalid URI");
 error_append_hint(errp, "Usage: file=gluster[+transport]://"
-
"[host[:port]]/volume/path[?socket=...]\n");
+"[host[:port]]volname/image[?socket=...]"
+"[,file.debug=N]"
+"[,file.logfile=/path/filename.log]\n");
 errno = -ret;
 return NULL;
 }
@@ -586,7 +601,8 @@ static struct glfs 
*qemu_gluster_init(BlockdevOptionsGluster *gconf,
 error_append_hint(errp, "Usage: "
  "-drive driver=qcow2,file.driver=gluster,"
  "file.volume=testvol,file.path=/path/a.qcow2"
- "[,file.debug=9],file.server.0.type=tcp,"
+ 
"[,file.debug=9][,file.logfile=/path/filename.log]"
+ "file.server.0.type=tcp,"
  "file.server.0.host=1.2.3.4,"
  "file.server.0.port=24007,"
  "file.server.1.transport=unix,"
@@ -677,7 +693,7 @@ static int qemu_gluster_open(BlockDriverState *bs,  QDict 
*options,
 BlockdevOptionsGluster *gconf = NULL;
 QemuOpts *opts;
 Error *local_err = NULL;
-const char *filename;
+const char *filename, *logfile;
 
 opts = qemu_opts_create(&runtime_opts, NULL, 0,

Re: [Qemu-devel] [PATCH v2 3/3] wdt_ib700: Free timer

2016-07-22 Thread Corey Minyard


On 07/22/2016 05:12 AM, Richard W.M. Jones wrote:
> On Fri, Jul 22, 2016 at 05:26:27AM -0400, Marc-André Lureau wrote:
>> btw, I do not know how to test this yet either.
>
> I have a little test framework for the watchdog device which cuts
> through a lot of the BS with running the full watchdog daemon, and
> also has some simple instructions to follow:
>
>    http://git.annexia.org/?p=watchdog-test-framework.git;a=tree
>
> Rich.

Thanks, but unfortunately, I need to be able to test unrealizing
the device in qemu.  These changes don't affect the function
of the watchdog.  Nice little program, though, I'll point our
testers at it.

-corey

Re: [Qemu-devel] [PATCH v2 1/3] ipmi_bmc_sim: Add a proper unrealize function

2016-07-22 Thread Corey Minyard


On 07/22/2016 04:22 AM, Marc-André Lureau wrote:
> Hi
>
> - Original Message -
>> From: Corey Minyard
>>
>> Add an unrealize function to free the timer allocated in the
>> realize function, unregister the vmstate, and free any
>> pending messages.
>
> I don't know how to test this either, the device seems to be hotpluggable, but
> doing device_del crashes qemu. Looks like it would be worth fixing that too
> (even better would be to automate this kind of test for all devices, but that's
> just some thought)

That's actually a bug.  This device's hot plug should be tied to its
interface's hot plug, which is a separate device.  I was trying to
unplug the interface, which is on an ISA bus, not the BMC.

>> Also, get rid of the unnecessary mutex, it was a vestige
>> of something else that was not done.  That way we don't
>> have to free it.
>
> You may want to split this in a separate patch

Yeah, you are right.

Thanks,

-corey




Signed-off-by: Corey Minyard 
Cc: Marc-André Lureau 
---
  hw/ipmi/ipmi_bmc_sim.c | 22 --
  1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/hw/ipmi/ipmi_bmc_sim.c b/hw/ipmi/ipmi_bmc_sim.c
index dc9c14c..fe92b93 100644
--- a/hw/ipmi/ipmi_bmc_sim.c
+++ b/hw/ipmi/ipmi_bmc_sim.c
@@ -217,7 +217,6 @@ struct IPMIBmcSim {
  /* Odd netfns are for responses, so we only need the even ones. */
  const IPMINetfn *netfns[MAX_NETFNS / 2];
  
-QemuMutex lock;

  /* We allow one event in the buffer */
  uint8_t evtbuf[16];
  
@@ -940,7 +939,6 @@ static void get_msg(IPMIBmcSim *ibs,

  {
  IPMIRcvBufEntry *msg;
  
-qemu_mutex_lock(&ibs->lock);

  if (QTAILQ_EMPTY(&ibs->rcvbufs)) {
  rsp_buffer_set_error(rsp, 0x80); /* Queue empty */
  goto out;
@@ -960,7 +958,6 @@ static void get_msg(IPMIBmcSim *ibs,
  }
  
  out:

-qemu_mutex_unlock(&ibs->lock);
  return;
  }
  
@@ -1055,11 +1052,9 @@ static void send_msg(IPMIBmcSim *ibs,

   end_msg:
  msg->buf[msg->len] = ipmb_checksum(msg->buf, msg->len, 0);
  msg->len++;
-qemu_mutex_lock(&ibs->lock);
  QTAILQ_INSERT_TAIL(&ibs->rcvbufs, msg, entry);
  ibs->msg_flags |= IPMI_BMC_MSG_FLAG_RCV_MSG_QUEUE;
  k->set_atn(s, 1, attn_irq_enabled(ibs));
-qemu_mutex_unlock(&ibs->lock);
  }
  
  static void do_watchdog_reset(IPMIBmcSim *ibs)

@@ -1753,7 +1748,6 @@ static void ipmi_sim_realize(DeviceState *dev, Error
**errp)
  unsigned int i;
  IPMIBmcSim *ibs = IPMI_BMC_SIMULATOR(b);
  
-qemu_mutex_init(&ibs->lock);

  QTAILQ_INIT(&ibs->rcvbufs);
  
  ibs->bmc_global_enables = (1 << IPMI_BMC_EVENT_LOG_BIT);

@@ -1786,12 +1780,28 @@ static void ipmi_sim_realize(DeviceState *dev, Error
**errp)
  vmstate_register(NULL, 0, &vmstate_ipmi_sim, ibs);
  }
  
+static void ipmi_sim_unrealize(DeviceState *dev, Error **errp)

+{
+IPMIBmc *b = IPMI_BMC(dev);
+IPMIRcvBufEntry *msg, *tmp;
+IPMIBmcSim *ibs = IPMI_BMC_SIMULATOR(b);
+
+vmstate_unregister(NULL, &vmstate_ipmi_sim, ibs);
+timer_del(ibs->timer);
+timer_free(ibs->timer);
+QTAILQ_FOREACH_SAFE(msg, &ibs->rcvbufs, entry, tmp) {
+QTAILQ_REMOVE(&ibs->rcvbufs, msg, entry);
+g_free(msg);
+}
+}
+

Otherwise, for completeness, this looks good so
Reviewed-by: Marc-André Lureau 


  static void ipmi_sim_class_init(ObjectClass *oc, void *data)
  {
  DeviceClass *dc = DEVICE_CLASS(oc);
  IPMIBmcClass *bk = IPMI_BMC_CLASS(oc);
  
  dc->realize = ipmi_sim_realize;

+dc->unrealize = ipmi_sim_unrealize;
  bk->handle_command = ipmi_sim_handle_command;
  }
  
--

2.7.4

Re: [Qemu-devel] [PULL 0/4] Block patches

2016-07-22 Thread Max Reitz

On 21.07.2016 21:14, Peter Maydell wrote:
> On 20 July 2016 at 22:16, Eric Blake  wrote:
>> On 07/20/2016 10:05 AM, Peter Maydell wrote:
>>> On 19 July 2016 at 23:47, Max Reitz  wrote:
>>>> The following changes since commit
>>>> 5d3217340adcb6c4f0e4af5d2b865331eb2ff63d:
>>>>
>>>>   disas: Fix ATTRIBUTE_UNUSED define clash with ALSA headers (2016-07-19
>>>> 16:40:39 +0100)
>>>>
>>>> are available in the git repository at:
>>>>
>>>>   git://github.com/XanClic/qemu.git tags/pull-block-2016-07-20
>>>>
>>>> for you to fetch changes up to bafea5b7c26dd14895f7be64685a12645a75f4cf:
>>>>
>>>>   block: export LUKS specific data to qemu-img info (2016-07-20 00:34:03
>>>> +0200)
>>>>
>>>> Block patches for master
>>>>
>>>
>>> Fails to build on everything:
>>>
>>>   GEN   qapi-visit.h
>>> In file included from /Users/pm215/src/qemu-for-merges/qapi-schema.json:9:
>>> /Users/pm215/src/qemu-for-merges/qapi/crypto.json:299: Union
>>> 'QCryptoBlockInfo' data missing 'qcow' branch
>>
>> Aha. Cause is two branches developed in parallel; commit d0b18239 forces
>> all branches of a flat union to be listed (to avoid an abort() if the
>> user passes a branch that was not listed); solution is to expand the
>> crypto.json addition to cover all branches, even if it means an empty
>> type for the branches that have no additional data.
> 
> I'm just processing the last other outstanding pullreq now,
> so unless a respin of this arrives by tomorrow lunchtime UK
> time it's going to miss rc0, I think.

Since there's nothing critical in this pull request, I'll drop the
crypto patches and keep the non-offending patches for the next pull
request (for rc1).

So no need to wait for another pull request from me for rc0.

Max




[Qemu-devel] [PATCH] linux-user: Handle brk() attempts with very large sizes

2016-07-22 Thread Peter Maydell

In do_brk(), we were inadvertently truncating the size
of a requested brk() from the guest by putting it into an
'int' variable. This meant that we would incorrectly report
success back to the guest rather than a failed allocation,
typically resulting in the guest then segfaulting. Use
abi_ulong instead.

This fixes a crash in the '31370.cc' test in the gcc libstdc++ test
suite (the test case starts by trying to allocate a very large
size and reduces the size until the allocation succeeds).

Signed-off-by: Peter Maydell 
---
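
To make the failure mode concrete, here is a standalone sketch of the truncation; it is not the QEMU code. sketch_abi_ulong is a hypothetical stand-in for a 64-bit guest's abi_ulong, and the result of narrowing an out-of-range value to int is implementation-defined in C (common compilers keep the low 32 bits).

```c
#include <stdint.h>

typedef uint64_t sketch_abi_ulong;  /* stand-in for a 64-bit abi_ulong */

/* Buggy shape: the requested growth is narrowed to int before being used. */
static int sketch_size_as_int(sketch_abi_ulong new_brk,
                              sketch_abi_ulong brk_page)
{
    return (int)(new_brk - brk_page);
}

/* Fixed shape: the size keeps the full guest width. */
static sketch_abi_ulong sketch_size_as_ulong(sketch_abi_ulong new_brk,
                                             sketch_abi_ulong brk_page)
{
    return new_brk - brk_page;
}
```

A request to grow the heap by 0x100001000 bytes would typically be seen as a 0x1000-byte request in the int variant, which is small enough to be mapped successfully, so the guest is told the huge brk() worked.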
 linux-user/syscall.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index b4dc721..0e98593 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -830,7 +830,7 @@ void target_set_brk(abi_ulong new_brk)
 abi_long do_brk(abi_ulong new_brk)
 {
 abi_long mapped_addr;
-intnew_alloc_size;
+abi_ulong new_alloc_size;
 
 DEBUGF_BRK("do_brk(" TARGET_ABI_FMT_lx ") -> ", new_brk);
 
-- 
1.9.1

[Qemu-devel] [PATCH v4 1/2] crypto: add support for querying parameters for block encryption

2016-07-22 Thread Daniel P. Berrange

When creating new block encryption volumes, we accept a list of
parameters to control the formatting process. It is useful to
be able to query what those parameters were for existing block
devices. Add a qcrypto_block_get_info() method which returns a
QCryptoBlockInfo instance to report this data.

Signed-off-by: Daniel P. Berrange 
---
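
The key-slot loop in qcrypto_block_luks_get_info() below appends to a singly linked list by keeping a pointer to the link that must be filled in next (the "prev" variable). A standalone sketch of that idiom, with hypothetical type and function names and plain calloc() standing in for g_new0():

```c
#include <stdlib.h>

/* Hypothetical list type, shaped like a QAPI-generated FooList. */
typedef struct SketchSlotList {
    int value;
    struct SketchSlotList *next;
} SketchSlotList;

/* Build a list holding 0..n-1 in order, appending in O(1) per element. */
static SketchSlotList *sketch_build(int n)
{
    SketchSlotList *head = NULL;
    SketchSlotList **prev = &head;  /* address of the link to fill in next */
    int i;

    for (i = 0; i < n; i++) {
        SketchSlotList *node = calloc(1, sizeof(*node));

        if (node == NULL) {
            abort();                /* mimic g_new0() aborting on OOM */
        }
        node->value = i;
        *prev = node;               /* hook node into the list */
        prev = &node->next;         /* the next append goes after it */
    }
    return head;
}

/* Release every node of a list built by sketch_build(). */
static void sketch_free(SketchSlotList *list)
{
    while (list != NULL) {
        SketchSlotList *next = list->next;

        free(list);
        list = next;
    }
}
```

Because prev always points at the tail link, the list comes out in slot order without a second pass to reverse it or a walk to find the end.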
 crypto/block-luks.c| 67 ++
 crypto/block.c | 17 ++
 crypto/blockpriv.h |  4 +++
 include/crypto/block.h | 16 ++
 qapi/crypto.json   | 87 ++
 5 files changed, 191 insertions(+)

diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index fcf3b04..aba4455 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -201,6 +201,15 @@ QEMU_BUILD_BUG_ON(sizeof(struct QCryptoBlockLUKSHeader) != 
592);
 
 struct QCryptoBlockLUKS {
 QCryptoBlockLUKSHeader header;
+
+/* Cache parsed versions of what's in header fields,
+ * as we can't rely on QCryptoBlock.cipher being
+ * non-NULL */
+QCryptoCipherAlgorithm cipher_alg;
+QCryptoCipherMode cipher_mode;
+QCryptoIVGenAlgorithm ivgen_alg;
+QCryptoHashAlgorithm ivgen_hash_alg;
+QCryptoHashAlgorithm hash_alg;
 };
 
 
@@ -847,6 +856,12 @@ qcrypto_block_luks_open(QCryptoBlock *block,
 block->payload_offset = luks->header.payload_offset *
 QCRYPTO_BLOCK_LUKS_SECTOR_SIZE;
 
+luks->cipher_alg = cipheralg;
+luks->cipher_mode = ciphermode;
+luks->ivgen_alg = ivalg;
+luks->ivgen_hash_alg = ivhash;
+luks->hash_alg = hash;
+
 g_free(masterkey);
 g_free(password);
 
@@ -1271,6 +1286,12 @@ qcrypto_block_luks_create(QCryptoBlock *block,
 goto error;
 }
 
+luks->cipher_alg = luks_opts.cipher_alg;
+luks->cipher_mode = luks_opts.cipher_mode;
+luks->ivgen_alg = luks_opts.ivgen_alg;
+luks->ivgen_hash_alg = luks_opts.ivgen_hash_alg;
+luks->hash_alg = luks_opts.hash_alg;
+
 memset(masterkey, 0, luks->header.key_bytes);
 g_free(masterkey);
 memset(slotkey, 0, luks->header.key_bytes);
@@ -1305,6 +1326,51 @@ qcrypto_block_luks_create(QCryptoBlock *block,
 }
 
 
+static int qcrypto_block_luks_get_info(QCryptoBlock *block,
+   QCryptoBlockInfo *info,
+   Error **errp)
+{
+QCryptoBlockLUKS *luks = block->opaque;
+QCryptoBlockInfoLUKSSlot *slot;
+QCryptoBlockInfoLUKSSlotList *slots = NULL, **prev = &info->u.luks.slots;
+size_t i;
+
+info->u.luks.cipher_alg = luks->cipher_alg;
+info->u.luks.cipher_mode = luks->cipher_mode;
+info->u.luks.ivgen_alg = luks->ivgen_alg;
+if (info->u.luks.ivgen_alg == QCRYPTO_IVGEN_ALG_ESSIV) {
+info->u.luks.has_ivgen_hash_alg = true;
+info->u.luks.ivgen_hash_alg = luks->ivgen_hash_alg;
+}
+info->u.luks.hash_alg = luks->hash_alg;
+info->u.luks.payload_offset = block->payload_offset;
+info->u.luks.master_key_iters = luks->header.master_key_iterations;
+info->u.luks.uuid = g_strndup((const char *)luks->header.uuid,
+  sizeof(luks->header.uuid));
+
+for (i = 0; i < QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS; i++) {
+slots = g_new0(QCryptoBlockInfoLUKSSlotList, 1);
+*prev = slots;
+
+slots->value = slot = g_new0(QCryptoBlockInfoLUKSSlot, 1);
+slot->active = luks->header.key_slots[i].active ==
+QCRYPTO_BLOCK_LUKS_KEY_SLOT_ENABLED;
+slot->key_offset = luks->header.key_slots[i].key_offset
+ * QCRYPTO_BLOCK_LUKS_SECTOR_SIZE;
+if (slot->active) {
+slot->has_iters = true;
+slot->iters = luks->header.key_slots[i].iterations;
+slot->has_stripes = true;
+slot->stripes = luks->header.key_slots[i].stripes;
+}
+
+prev = &slots->next;
+}
+
+return 0;
+}
+
+
 static void qcrypto_block_luks_cleanup(QCryptoBlock *block)
 {
 g_free(block->opaque);
@@ -1342,6 +1408,7 @@ qcrypto_block_luks_encrypt(QCryptoBlock *block,
 const QCryptoBlockDriver qcrypto_block_driver_luks = {
 .open = qcrypto_block_luks_open,
 .create = qcrypto_block_luks_create,
+.get_info = qcrypto_block_luks_get_info,
 .cleanup = qcrypto_block_luks_cleanup,
 .decrypt = qcrypto_block_luks_decrypt,
 .encrypt = qcrypto_block_luks_encrypt,
diff --git a/crypto/block.c b/crypto/block.c
index da60eba..be823ee 100644
--- a/crypto/block.c
+++ b/crypto/block.c
@@ -105,6 +105,23 @@ QCryptoBlock 
*qcrypto_block_create(QCryptoBlockCreateOptions *options,
 }
 
 
+QCryptoBlockInfo *qcrypto_block_get_info(QCryptoBlock *block,
+ Error **errp)
+{
+QCryptoBlockInfo *info = g_new0(QCryptoBlockInfo, 1);
+
+info->format = block->format;
+
+if (block->driver->get_info &&
+block->driver->get_info(block, info, errp) < 0) {
+g_free(info);
+return NULL;
+}
+
+return inf

[Qemu-devel] [PATCH v4 2/2] block: export LUKS specific data to qemu-img info

2016-07-22 Thread Daniel P. Berrange

The qemu-img info command has the ability to expose format
specific metadata about volumes. Wire up this facility for
the LUKS driver to report on cipher configuration and key
slot usage.

$ qemu-img info ~/VirtualMachines/demo.luks
image: /home/berrange/VirtualMachines/demo.luks
file format: luks
virtual size: 98M (102760448 bytes)
disk size: 100M
encrypted: yes
Format specific information:
    ivgen alg: plain64
    hash alg: sha1
    cipher alg: aes-128
    uuid: 6ddee74b-3a22-408c-8909-6789d4fa2594
    cipher mode: xts
    slots:
        [0]:
            active: true
            iters: 572706
            key offset: 4096
            stripes: 4000
        [1]:
            active: false
            key offset: 135168
        [2]:
            active: false
            key offset: 266240
        [3]:
            active: false
            key offset: 397312
        [4]:
            active: false
            key offset: 528384
        [5]:
            active: false
            key offset: 659456
        [6]:
            active: false
            key offset: 790528
        [7]:
            active: false
            key offset: 921600
    payload offset: 2097152
    master key iters: 142375

One somewhat undesirable artifact is that the data fields are
printed out in (apparently) random order. This will be addressed
later by changing the way the block layer pretty-prints the
image specific data.

Signed-off-by: Daniel P. Berrange 
---
 block/crypto.c   | 49 +
 qapi/block-core.json |  6 +-
 2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/block/crypto.c b/block/crypto.c
index 7eaa057..7f61e12 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -563,6 +563,53 @@ static int block_crypto_create_luks(const char *filename,
filename, opts, errp);
 }
 
+static int block_crypto_get_info_luks(BlockDriverState *bs,
+  BlockDriverInfo *bdi)
+{
+BlockDriverInfo subbdi;
+int ret;
+
+ret = bdrv_get_info(bs->file->bs, &subbdi);
+if (ret != 0) {
+return ret;
+}
+
+bdi->unallocated_blocks_are_zero = false;
+bdi->can_write_zeroes_with_unmap = false;
+bdi->cluster_size = subbdi.cluster_size;
+
+return 0;
+}
+
+static ImageInfoSpecific *
+block_crypto_get_specific_info_luks(BlockDriverState *bs)
+{
+BlockCrypto *crypto = bs->opaque;
+ImageInfoSpecific *spec_info;
+QCryptoBlockInfo *info;
+
+info = qcrypto_block_get_info(crypto->block, NULL);
+if (!info) {
+return NULL;
+}
+if (info->format != Q_CRYPTO_BLOCK_FORMAT_LUKS) {
+qapi_free_QCryptoBlockInfo(info);
+return NULL;
+}
+
+spec_info = g_new(ImageInfoSpecific, 1);
+spec_info->type = IMAGE_INFO_SPECIFIC_KIND_LUKS;
+spec_info->u.luks.data = g_new(QCryptoBlockInfoLUKS, 1);
+*spec_info->u.luks.data = info->u.luks;
+
+/* Blank out pointers we've just stolen to avoid double free */
+memset(&info->u.luks, 0, sizeof(info->u.luks));
+
+qapi_free_QCryptoBlockInfo(info);
+
+return spec_info;
+}
+
 BlockDriver bdrv_crypto_luks = {
 .format_name= "luks",
 .instance_size  = sizeof(BlockCrypto),
@@ -576,6 +623,8 @@ BlockDriver bdrv_crypto_luks = {
 .bdrv_co_readv  = block_crypto_co_readv,
 .bdrv_co_writev = block_crypto_co_writev,
 .bdrv_getlength = block_crypto_getlength,
+.bdrv_get_info  = block_crypto_get_info_luks,
+.bdrv_get_specific_info = block_crypto_get_specific_info_luks,
 };
 
 static void block_crypto_init(void)
diff --git a/qapi/block-core.json b/qapi/block-core.json
index f462345..d4bab5d 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -85,7 +85,11 @@
 { 'union': 'ImageInfoSpecific',
   'data': {
   'qcow2': 'ImageInfoSpecificQCow2',
-  'vmdk': 'ImageInfoSpecificVmdk'
+  'vmdk': 'ImageInfoSpecificVmdk',
+  # If we need to add block driver specific parameters for
+  # LUKS in future, then we'll subclass QCryptoBlockInfoLUKS
+  # to define a ImageInfoSpecificLUKS
+  'luks': 'QCryptoBlockInfoLUKS'
   } }
 
 ##
-- 
2.7.4

[Qemu-devel] [PATCH v4 0/2] Report format specific info for LUKS block driver

2016-07-22 Thread Daniel P. Berrange

This is a followup to:

  v1: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg01723.html
  v2: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg03642.html
  v3: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg03885.html

The 'qemu-img info' tool has the ability to print format specific
information, e.g. with qcow2 it reports two extra items:

  $ qemu-img info ~/VirtualMachines/demo.qcow2
  image: /home/berrange/VirtualMachines/demo.qcow2
  file format: qcow2
  virtual size: 3.0G (3221225472 bytes)
  disk size: 140K
  cluster_size: 65536
  Format specific information:
      compat: 0.10
      refcount bits: 16


This is not currently wired up for the LUKS driver. This patch
series adds that support so that we can report useful data about
the LUKS volume such as the crypto algorithm choices, key slot
usage and other volume metadata.

The first patch extends the crypto API to allow querying of the
format specific metadata.

The second patch extends the block API to allow the LUKS driver
to report the format specific metadata.

$ qemu-img info ~/VirtualMachines/demo.luks
image: /home/berrange/VirtualMachines/demo.luks
file format: luks
virtual size: 98M (102760448 bytes)
disk size: 100M
encrypted: yes
Format specific information:
    ivgen alg: plain64
    hash alg: sha1
    cipher alg: aes-128
    uuid: 6ddee74b-3a22-408c-8909-6789d4fa2594
    cipher mode: xts
    slots:
        [0]:
            active: true
            iters: 572706
            key offset: 4096
            stripes: 4000
        [1]:
            active: false
            key offset: 135168
        [2]:
            active: false
            key offset: 266240
        [3]:
            active: false
            key offset: 397312
        [4]:
            active: false
            key offset: 528384
        [5]:
            active: false
            key offset: 659456
        [6]:
            active: false
            key offset: 790528
        [7]:
            active: false
            key offset: 921600
    payload offset: 2097152
    master key iters: 142375

Technically most of the code changes here are in the crypto
layer, rather than the block layer. I'm fine with both patches
going through the block maintainer tree, or I can submit both
patches myself, for the sake of simplicity of merge.

Changed in v4:

 - Introduce an empty QCryptoBlockInfoQCow struct to keep
   QAPI generator happy (Eric)

Changed in v3:

 - Do full struct copy instead of field-by-field copy (Max)
 - Simplify handling of linked list pointers (Max)
 - Use g_strndup with uuid to guarantee null termination (Max)
 - Misc typos (Max)
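
The simplified list-pointer handling mentioned for v3 can be illustrated with a small standalone sketch. This is not the QEMU code: `SlotNode`, `build_slot_list` and `free_slot_list` are hypothetical names, and plain `calloc` stands in for `g_new0`. The idea is to keep a pointer to the link that should receive the next node, so appending needs no special case for the first element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a QAPI-generated list node. */
typedef struct SlotNode {
    int index;
    struct SlotNode *next;
} SlotNode;

/* Build a list of n nodes in order, using a pointer to the tail link. */
static SlotNode *build_slot_list(size_t n)
{
    SlotNode *head = NULL;
    SlotNode **prev = &head;   /* always points at the link to fill next */
    size_t i;

    for (i = 0; i < n; i++) {
        SlotNode *node = calloc(1, sizeof(*node));
        assert(node != NULL);  /* sketch only: abort on allocation failure */
        node->index = (int)i;
        *prev = node;          /* hook the node into the list */
        prev = &node->next;    /* the following node goes after this one */
    }
    return head;
}

/* Release every node of the list. */
static void free_slot_list(SlotNode *head)
{
    while (head) {
        SlotNode *next = head->next;
        free(head);
        head = next;
    }
}
```

Because `prev` starts out pointing at `head`, an empty list simply stays NULL and the first node is handled by the same two assignments as every other node.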

Changed in v2:

 - Drop patches related to creating a text output visitor to
   format the ImageInfoSpecific data. This will be continued
   in a separate patch series
 - Fix key offset to be in bytes instead of sectors
 - Drop the duplicated ImageInfoSpecificLUKS type and just
   directly use QCryptoBlockInfoLUKS type in block layer
 - Skip reporting stripes/iters if keyslot is inactive
 - Add missing QAPI schema docs
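
The key offsets in the sample output above are consistent with a 512-byte sector size. A minimal sketch of the sector-to-byte conversion, assuming that sector size and using a hypothetical helper name (the sector numbers in the note below are derived by dividing the printed offsets by 512; they are not taken from the patch):

```c
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512u  /* assumed sector size in bytes */

/* Convert an on-disk offset expressed in sectors to a byte offset. */
static uint64_t sketch_sectors_to_bytes(uint32_t sectors)
{
    return (uint64_t)sectors * SKETCH_SECTOR_SIZE;
}
```

Under these assumptions sector 8 maps to byte offset 4096 and sector 264 to 135168, which matches slots [0] and [1] in the sample output.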




Daniel P. Berrange (2):
  crypto: add support for querying parameters for block encryption
  block: export LUKS specific data to qemu-img info

 block/crypto.c | 49 
 crypto/block-luks.c| 67 ++
 crypto/block.c | 17 ++
 crypto/blockpriv.h |  4 +++
 include/crypto/block.h | 16 ++
 qapi/block-core.json   |  6 +++-
 qapi/crypto.json   | 87 ++
 7 files changed, 245 insertions(+), 1 deletion(-)

-- 
2.7.4

[Qemu-devel] [PATCH] target-ppc: set MSR_CM bit for BookE 2.06 MMU

2016-07-22 Thread Michael Walle

64 bit user mode doesn't work for the e5500 core because the MSR_CM bit,
which enables the 64 bit mode for this MMU model, is not set. Memory
addresses are truncated to 32 bit, which results in "Invalid data memory
access" error messages. Fix it by setting the MSR_CM bit for this MMU model.

Signed-off-by: Michael Walle 
---
 target-ppc/translate_init.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 5ecafc7..1ebb143 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -10218,6 +10218,9 @@ static void ppc_cpu_reset(CPUState *s)
 if (env->mmu_model & POWERPC_MMU_64) {
 msr |= (1ULL << MSR_SF);
 }
+if (env->mmu_model == POWERPC_MMU_BOOKE206) {
+msr |= (1ULL << MSR_CM);
+}
 #endif
 
 hreg_store_msr(env, msr, 1);
-- 
2.1.4

Re: [Qemu-devel] [PATCH] hw/mips_malta: Fix YAMON API print routine

2016-07-22 Thread Aurelien Jarno

On 2016-07-22 10:55, Paul Burton wrote:
> The print routine provided as part of the in-built bootloader had a bug
> in that it attempted to use a jump instruction as part of a loop, but
> the target has its upper bits zeroed leading to control flow
> transferring to 0xb814 rather than the intended 0xbfc00814. Fix this
> by using a branch instruction instead, which seems more fit for purpose.
> 
> A simple way to test this is to build a Linux kernel with EVA enabled &
> attempt to boot it in QEMU. It will attempt to print a message
> indicating the configuration mismatch but QEMU would previously
> incorrectly jump & wind up printing a continuous stream of the letter E.
> 
> Signed-off-by: Paul Burton 
> Cc: Aurelien Jarno 
> Cc: Leon Alrae 
> ---
>  hw/mips/mips_malta.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/mips/mips_malta.c b/hw/mips/mips_malta.c
> index 34d41ef..e90857e 100644
> --- a/hw/mips/mips_malta.c
> +++ b/hw/mips/mips_malta.c
> @@ -727,7 +727,7 @@ static void write_bootloader(uint8_t *base, int64_t 
> run_addr,
>  stl_p(p++, 0x); /* nop */
>  stl_p(p++, 0x0ff0021c); /* jal 870 */
>  stl_p(p++, 0x); /* nop */
> -stl_p(p++, 0x08000205); /* j 814 */
> +stl_p(p++, 0x1000fff9); /* b 814 */
>  stl_p(p++, 0x); /* nop */
>  stl_p(p++, 0x01a9); /* jalr t5 */
>  stl_p(p++, 0x01602021); /* move 
> a0,t3 */

This looks fine. The switch from jump to branch is questionable given
there are other jumps around in the code, but that's just nitpicking.
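
For reference, the difference between the two encodings in the patch can be checked with a short sketch. It assumes the standard MIPS32 rules (a `j` keeps only the upper 4 bits of the delay-slot address, a `b` is PC-relative) and a hypothetical instruction address of 0xbfc0082c, which is inferred from the branch encoding and the intended target rather than stated in the patch. The helper names are made up for illustration.

```c
#include <stdint.h>

/* Target of a MIPS32 "j": upper 4 bits of (pc + 4), 26-bit index << 2. */
static uint32_t sketch_mips_j_target(uint32_t pc, uint32_t insn)
{
    uint32_t index = insn & 0x03ffffffu;
    return ((pc + 4) & 0xf0000000u) | (index << 2);
}

/* Target of a MIPS32 "b" (beq $0,$0): (pc + 4) + sign-extended imm16 * 4. */
static uint32_t sketch_mips_b_target(uint32_t pc, uint32_t insn)
{
    int32_t offset = (int16_t)(insn & 0xffffu);
    return pc + 4 + (uint32_t)(offset * 4);
}
```

With the assumed address, the old `j` word 0x08000205 resolves to a target whose middle bits are zero, while the new `b` word 0x1000fff9 lands on 0xbfc00814 regardless of where the bootloader is placed.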

Reviewed-by: Aurelien Jarno 

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH] target-ppc: set MSR_CM bit for BookE 2.06 MMU

2016-07-22 Thread Alexander Graf


> On 22 Jul 2016, at 15:00, Michael Walle  wrote:
> 
> 64 bit user mode doesn't work for the e5500 core because the MSR_CM bit is
> not set which enables the 64 bit mode for this MMU model. Memory addresses
> are truncated to 32 bit, which results in "Invalid data memory access"
> error messages. Fix it by setting the MSR_CM bit for this MMU model.
> 
> Signed-off-by: Michael Walle 
> ---
> target-ppc/translate_init.c | 3 +++
> 1 file changed, 3 insertions(+)
> 
> diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
> index 5ecafc7..1ebb143 100644
> --- a/target-ppc/translate_init.c
> +++ b/target-ppc/translate_init.c
> @@ -10218,6 +10218,9 @@ static void ppc_cpu_reset(CPUState *s)
> if (env->mmu_model & POWERPC_MMU_64) {
> msr |= (1ULL << MSR_SF);
> }
> +if (env->mmu_model == POWERPC_MMU_BOOKE206) {

Is this check correct? Doesn’t e500mc adhere to 2.06 as well? Running

  qemu-system-ppc64 -M ppce500 -cpu e500mc …

is perfectly valid and should just work. With your patch, it would start in 
invalid 64bit mode :).
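
To illustrate the concern, here is a rough sketch, not the actual reset code. The `SketchCpu` type, its `is_64bit_core` flag and the helper names are hypothetical, and bit position 31 for MSR_CM is an assumption. It only shows that keying the decision on the MMU model alone cannot tell a 32-bit e500mc from a 64-bit e5500.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MSR_CM 31  /* assumed bit position of the computation mode bit */

typedef enum { SKETCH_MMU_OTHER, SKETCH_MMU_BOOKE206 } SketchMmu;

typedef struct {
    SketchMmu mmu_model;
    bool is_64bit_core;   /* hypothetical: true for e5500, false for e500mc */
} SketchCpu;

/* Variant from the patch: looks at the MMU model only. */
static uint64_t sketch_reset_msr_mmu_only(const SketchCpu *cpu)
{
    uint64_t msr = 0;
    if (cpu->mmu_model == SKETCH_MMU_BOOKE206) {
        msr |= 1ULL << SKETCH_MSR_CM;
    }
    return msr;
}

/* One possible refinement: additionally require a 64-bit core. */
static uint64_t sketch_reset_msr_checked(const SketchCpu *cpu)
{
    uint64_t msr = 0;
    if (cpu->mmu_model == SKETCH_MMU_BOOKE206 && cpu->is_64bit_core) {
        msr |= 1ULL << SKETCH_MSR_CM;
    }
    return msr;
}
```

Whether such a per-core flag is actually available at reset time is an open question in this thread; the sketch only makes the distinction explicit.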


Alex

Re: [Qemu-devel] [PATCH v4] virtio-pci: error out when both legacy and modern modes are disabled

2016-07-22 Thread Cornelia Huck

On Fri, 22 Jul 2016 12:11:11 +0200
Greg Kurz  wrote:

> On Fri, 22 Jul 2016 10:04:35 +0200
> Cornelia Huck  wrote:
> 
> > On Thu, 21 Jul 2016 23:21:16 +0200
> > Greg Kurz  wrote:
> > 
> > > From: Greg Kurz 
> > > 
> > > Without presuming if we got there because of a user mistake or some
> > > more subtle bug in the tooling, it really does not make sense to
> > > implement a non-functional device.
> > > 
> > > Signed-off-by: Greg Kurz 
> > > Reviewed-by: Marcel Apfelbaum 
> > > Signed-off-by: Greg Kurz 
> > > ---
> > > v4: - rephrased error message and provide a hint to the user
> > > - split string literals to stay below 80 characters
> > > - added Marcel's R-b tag
> > > ---
> > >  hw/virtio/virtio-pci.c |8 
> > >  1 file changed, 8 insertions(+)
> > > 
> > > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> > > index 755f9218b77d..72c4b392ffda 100644
> > > --- a/hw/virtio/virtio-pci.c
> > > +++ b/hw/virtio/virtio-pci.c
> > > @@ -1842,6 +1842,14 @@ static void virtio_pci_dc_realize(DeviceState 
> > > *qdev, Error **errp)
> > >  VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
> > >  PCIDevice *pci_dev = &proxy->pci_dev;
> > > 
> > > +if (!(virtio_pci_modern(proxy) || virtio_pci_legacy(proxy))) {  
> > 
> > I'm not sure that I didn't mess up the sequence of the realize
> > callbacks, but could disable_legacy still be AUTO here? In that case,
> > we'd fail for disable-modern=on and disable-legacy unset (i.e., AUTO),
> > which would be ok for pcie but not for !pcie.
> > 
> 
> Marcel made the same comment in:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg05225.html
> 
> If the user explicitly disables modern, she shouldn't rely on QEMU
> implicitly enabling legacy, hence the suggestion in error_append_hint().

I don't know, I'd find that a bit surprising, especially as I would end
up with a legacy-capable device if I did not specify anything in
the !pcie case.
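
A small sketch of the case in question, with hypothetical names throughout (`SketchTri`, `SketchProxy` and the helpers are not the QEMU definitions). It assumes that legacy counts as enabled only when disable-legacy has been explicitly resolved to off, so a value still left at AUTO is treated as "not legacy":

```c
#include <stdbool.h>

/* Hypothetical tri-state, mirroring an on/off/auto property. */
typedef enum { SKETCH_AUTO, SKETCH_ON, SKETCH_OFF } SketchTri;

typedef struct {
    bool disable_modern;
    SketchTri disable_legacy;
} SketchProxy;

static bool sketch_modern(const SketchProxy *p)
{
    return !p->disable_modern;
}

/* Assumption: AUTO has not been resolved yet, so only OFF enables legacy. */
static bool sketch_legacy(const SketchProxy *p)
{
    return p->disable_legacy == SKETCH_OFF;
}

/* The check from the patch: reject a device with neither mode enabled. */
static bool sketch_rejected(const SketchProxy *p)
{
    return !(sketch_modern(p) || sketch_legacy(p));
}
```

Under that assumption, disable-modern=on with disable-legacy left unset is rejected, while the same device with disable-legacy=off is accepted.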

> 
> > > +error_setg(errp, "device cannot work when both modern and legacy 
> > > modes"
> > > +   " are disabled");

Suggest to change this wording to:

"device cannot work as neither modern nor legacy mode is enabled"

as this more accurately reflects what happened (the user did not
actively disable legacy in the case above).

> > > +error_append_hint(errp, "Set either disable-modern or 
> > > disable-legacy"
> > > +  " to off\n");

The hint looks fine to me :)

> > > +return;
> > > +}
> > > +
> > >  if (!(proxy->flags & VIRTIO_PCI_FLAG_DISABLE_PCIE) &&
> > >  virtio_pci_modern(proxy)) {
> > >  pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS;
> > >   
> > 
>

Re: [Qemu-devel] [PATCH v4] virtio-pci: error out when both legacy and modern modes are disabled

2016-07-22 Thread Greg Kurz

On Fri, 22 Jul 2016 12:32:24 +0200
Cornelia Huck  wrote:

> On Fri, 22 Jul 2016 12:11:11 +0200
> Greg Kurz  wrote:
> 
> > On Fri, 22 Jul 2016 10:04:35 +0200
> > Cornelia Huck  wrote:
> >   
> > > On Thu, 21 Jul 2016 23:21:16 +0200
> > > Greg Kurz  wrote:
> > >   
> > > > From: Greg Kurz 
> > > > 
> > > > Without presuming if we got there because of a user mistake or some
> > > > more subtle bug in the tooling, it really does not make sense to
> > > > implement a non-functional device.
> > > > 
> > > > Signed-off-by: Greg Kurz 
> > > > Reviewed-by: Marcel Apfelbaum 
> > > > Signed-off-by: Greg Kurz 
> > > > ---
> > > > v4: - rephrased error message and provide a hint to the user
> > > > - split string literals to stay below 80 characters
> > > > - added Marcel's R-b tag
> > > > ---
> > > >  hw/virtio/virtio-pci.c |8 
> > > >  1 file changed, 8 insertions(+)
> > > > 
> > > > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> > > > index 755f9218b77d..72c4b392ffda 100644
> > > > --- a/hw/virtio/virtio-pci.c
> > > > +++ b/hw/virtio/virtio-pci.c
> > > > @@ -1842,6 +1842,14 @@ static void virtio_pci_dc_realize(DeviceState 
> > > > *qdev, Error **errp)
> > > >  VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
> > > >  PCIDevice *pci_dev = &proxy->pci_dev;
> > > > 
> > > > +if (!(virtio_pci_modern(proxy) || virtio_pci_legacy(proxy))) {
> > > 
> > > I'm not sure that I didn't mess up the sequence of the realize
> > > callbacks, but could disable_legacy still be AUTO here? In that case,
> > > we'd fail for disable-modern=on and disable-legacy unset (i.e., AUTO),
> > > which would be ok for pcie but not for !pcie.
> > >   
> > 
> > Marcel made the same comment in:
> > 
> > https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg05225.html
> > 
> > If the user explicitly disables modern, she shouldn't rely on QEMU
> > implicitly enabling legacy, hence the suggestion in error_append_hint().  
> 
> I don't know, I'd find that a bit surprising, especially as I would end
> up with a legacy-capable device if I did not specify anything in
> the !pcie case.
> 

Isn't it already what happens with legacy being the default in pre-2.7 QEMU ?

Do you think we should have separate checks for pcie and !pcie ?

> >   
> > > > +error_setg(errp, "device cannot work when both modern and 
> > > > legacy modes"
> > > > +   " are disabled");  
> 
> Suggest to change this wording to:
> 
> "device cannot work as neither modern nor legacy mode is enabled"
> 
> as this more accurately reflects what happened (the user did not
> actively disable legacy in the case above).
> 

Thanks ! This is THE wording I was looking for :)

> > > > +error_append_hint(errp, "Set either disable-modern or 
> > > > disable-legacy"
> > > > +  " to off\n");  
> 
> The hint looks fine to me :)
> 

It was the easy part :)

> > > > +return;
> > > > +}
> > > > +
> > > >  if (!(proxy->flags & VIRTIO_PCI_FLAG_DISABLE_PCIE) &&
> > > >  virtio_pci_modern(proxy)) {
> > > >  pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS;
> > > > 
> > >   
> >   
> 
>

Re: [Qemu-devel] [PATCH v3 0/9] Third try at fixing sparc register allocation

2016-07-22 Thread Aurelien Jarno

On 2016-07-19 09:09, Richard Henderson wrote:
> On 06/24/2016 09:18 AM, Richard Henderson wrote:
> > I was unhappy about the complexity of the second try.
> > 
> > Better to convert to normal temps, allowing in rare
> > occasions, spilling the "globals" to the stack in order
> > to satisfy register allocation.
> > 
> > I can no longer provoke an allocation failure on i686.
> > Hopefully this fixes the OpenBSD case that Mark mentioned
> > re the second attempt.
> 
> Ping for review.  It would be nice to have this fixed for 2.7, but this is
> complex enough I'd prefer another set of eyes.

I'll try to have a look during the week-end. Sorry about the delay.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH] target-ppc: set MSR_CM bit for BookE 2.06 MMU

2016-07-22 Thread Michael Walle


On 2016-07-22 15:07, Alexander Graf wrote:

> On 22 Jul 2016, at 15:00, Michael Walle wrote:
>
>> 64 bit user mode doesn't work for the e5500 core because the MSR_CM bit is
>> not set which enables the 64 bit mode for this MMU model. Memory addresses
>> are truncated to 32 bit, which results in "Invalid data memory access"
>> error messages. Fix it by setting the MSR_CM bit for this MMU model.
>>
>> Signed-off-by: Michael Walle
>> ---
>> target-ppc/translate_init.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
>> index 5ecafc7..1ebb143 100644
>> --- a/target-ppc/translate_init.c
>> +++ b/target-ppc/translate_init.c
>> @@ -10218,6 +10218,9 @@ static void ppc_cpu_reset(CPUState *s)
>> if (env->mmu_model & POWERPC_MMU_64) {
>> msr |= (1ULL << MSR_SF);
>> }
>> +if (env->mmu_model == POWERPC_MMU_BOOKE206) {
>
> Is this check correct? Doesn’t e500mc adhere to 2.06 as well? Running
>
>   qemu-system-ppc64 -M ppce500 -cpu e500mc …
>
> is perfectly valid and should just work. With your patch, it would
> start in invalid 64bit mode :).
>
> Alex


Mhh, sorry I don't really have any understanding of the PPC state after 
reset. Should have flagged this as RFC.


Maybe I should explain my issue. I'm debugging a problem with the 64 bit
linux-user variant (qemu-ppc64). There the first instruction causes an
"Invalid data memory access" because the address is truncated to 32 bit.
This is because msr_is_64bit() returns false in my case. So first
question here, is qemu-ppc64 supposed to set the MSR to 64bit mode? I
guess so, because 32bit mode would be the qemu-ppc binary. What is the
MSR state in full system emulation for an e5500 core? 64bit or 32bit?


If it is 32bit, the simple solution would be to put #ifdef 
CONFIG_USER_ONLY around my patch, right?
If the MMU is in 64bit mode after reset, I would have to check for the 
e5500, too. Mhh, I don't see that this information is available in 
ppc_cpu_reset().


-michael

Re: [Qemu-devel] [PATCH v4] virtio-pci: error out when both legacy and modern modes are disabled

2016-07-22 Thread Cornelia Huck

On Fri, 22 Jul 2016 15:23:19 +0200
Greg Kurz  wrote:

> On Fri, 22 Jul 2016 12:32:24 +0200
> Cornelia Huck  wrote:
> 
> > On Fri, 22 Jul 2016 12:11:11 +0200
> > Greg Kurz  wrote:
> > 
> > > On Fri, 22 Jul 2016 10:04:35 +0200
> > > Cornelia Huck  wrote:
> > >   
> > > > On Thu, 21 Jul 2016 23:21:16 +0200
> > > > Greg Kurz  wrote:
> > > >   
> > > > > From: Greg Kurz 
> > > > > 
> > > > > Without presuming if we got there because of a user mistake or some
> > > > > more subtle bug in the tooling, it really does not make sense to
> > > > > implement a non-functional device.
> > > > > 
> > > > > Signed-off-by: Greg Kurz 
> > > > > Reviewed-by: Marcel Apfelbaum 
> > > > > Signed-off-by: Greg Kurz 
> > > > > ---
> > > > > v4: - rephrased error message and provide a hint to the user
> > > > > - split string literals to stay below 80 characters
> > > > > - added Marcel's R-b tag
> > > > > ---
> > > > >  hw/virtio/virtio-pci.c |8 
> > > > >  1 file changed, 8 insertions(+)
> > > > > 
> > > > > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> > > > > index 755f9218b77d..72c4b392ffda 100644
> > > > > --- a/hw/virtio/virtio-pci.c
> > > > > +++ b/hw/virtio/virtio-pci.c
> > > > > @@ -1842,6 +1842,14 @@ static void virtio_pci_dc_realize(DeviceState 
> > > > > *qdev, Error **errp)
> > > > >  VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
> > > > >  PCIDevice *pci_dev = &proxy->pci_dev;
> > > > > 
> > > > > +if (!(virtio_pci_modern(proxy) || virtio_pci_legacy(proxy))) {   
> > > > >  
> > > > 
> > > > I'm not sure that I didn't mess up the sequence of the realize
> > > > callbacks, but could disable_legacy still be AUTO here? In that case,
> > > > we'd fail for disable-modern=on and disable-legacy unset (i.e., AUTO),
> > > > which would be ok for pcie but not for !pcie.
> > > >   
> > > 
> > > Marcel made the same comment in:
> > > 
> > > https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg05225.html
> > > 
> > > If the user explicitly disables modern, she shouldn't rely on QEMU
> > > implicitly enabling legacy, hence the suggestion in error_append_hint().  
> > 
> > I don't know, I'd find that a bit surprising, especially as I would end
> > up with a legacy-capable device if I did not specify anything in
> > the !pcie case.
> > 
> 
> Isn't it already what happens with legacy being the default in pre-2.7 QEMU ?

Well, that is exactly my point; users may be surprised.

> 
> Do you think we should have separate checks for pcie and !pcie ?

I don't think we should overengineer this.

> 
> > >   
> > > > > +error_setg(errp, "device cannot work when both modern and 
> > > > > legacy modes"
> > > > > +   " are disabled");  
> > 
> > Suggest to change this wording to:
> > 
> > "device cannot work as neither modern nor legacy mode is enabled"
> > 
> > as this more accurately reflects what happened (the user did not
> > actively disable legacy in the case above).
> > 
> 
> Thanks ! This is THE wording I was looking for :)

:)

I'm fine with the patch with the changed wording, as it is less confusing
for the user.

Reviewed-by: Cornelia Huck

(if the only thing you change is the message)

Re: [Qemu-devel] [PATCH] gt64xxx: access right I/O port when activating byte swapping

2016-07-22 Thread Aurelien Jarno

On 2016-06-18 22:48, Hervé Poussineau wrote:
> Hi Aurélien,
> 
> On 20/05/2016 at 21:56, Aurelien Jarno wrote:
> > On 2016-05-20 15:05, Hervé Poussineau wrote:
> > > Incidentally, this fixes YAMON on big endian guest.
> > > 
> > > Signed-off-by: Hervé Poussineau 
> > > ---
> > >  hw/mips/gt64xxx_pci.c | 62 
> > > +--
> > >  1 file changed, 60 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/hw/mips/gt64xxx_pci.c b/hw/mips/gt64xxx_pci.c
> > > index 3f4523d..c76ee88 100644
> > > --- a/hw/mips/gt64xxx_pci.c
> > > +++ b/hw/mips/gt64xxx_pci.c
> > > @@ -177,6 +177,7 @@
> > > 
> > >  /* PCI Internal */
> > >  #define GT_PCI0_CMD  (0xc00 >> 2)
> > > +#define   GT_CMD_MWORDSWAP  (1 << 10)
> > >  #define GT_PCI0_TOR  (0xc04 >> 2)
> > >  #define GT_PCI0_BS_SCS10 (0xc08 >> 2)
> > >  #define GT_PCI0_BS_SCS32 (0xc0c >> 2)
> > > @@ -294,6 +295,62 @@ static void gt64120_isd_mapping(GT64120State *s)
> > >  memory_region_add_subregion(get_system_memory(), s->ISD_start, 
> > > &s->ISD_mem);
> > >  }
> > > 
> > > +static uint64_t gt64120_pci_io_read(void *opaque, hwaddr addr,
> > > +unsigned int size)
> > > +{
> > > +GT64120State *s = opaque;
> > > +uint8_t buf[4];
> > > +
> > > +if (s->regs[GT_PCI0_CMD] & GT_CMD_MWORDSWAP) {
> > 
> > First of all, it should be noted that this bit doesn't control byte
> > swapping, but swaps the 2 4-byte words in a 8-byte word.
> > 
> > > +addr = (addr & ~3) + 4 - size - (addr & 3);
> > 
> > This looks complicated, and I don't think it is correct. In addition
> > this doesn't behave correctly at the edges of the address space. For
> > example a 2 byte access at address 0x3 would access address
> > 0x.
> > 
> > For sizes <= 4, swapping the 2 words should be done with addr ^= 4.
> > Maybe you should also check for MBYTESWAP which also swaps the bytes
> > within a 8-byte word.
> 
> The real world problem (i.e. the one from Yamon) is:
> In LE Yamon, there is a read at 0x4d1 (len = 1). MWORDSWAP and MBYTESWAP are
> disabled.
> In BE Yamon, the same read is at address 0x4d2. MWORDSWAP is enabled while
> MBYTESWAP is disabled.
> 
> MWORDSWAP documentation is:
> "The GT-64120 PCI master swaps the words of the incoming and outgoing PCI 
> data (swap the 2 words of a long word)"
> 
> Do we have to ignore it, as QEMU only handles 4-bytes accesses?

I think indeed that it should be ignored.

> Then, how to change this address 0x4d2 to 0x4d1, the address where the
> i8259 ELCR register is located?
> Next accesses are for the RTC, at address 0x72 in BE and address 0x71 in LE.
> I think I'm missing something.

If you talk about byte accesses, it really looks like a simple byteswap.
0x4d2 ^ 3 = 0x4d1 and 0x72 ^ 3 = 0x71. It could be that a byteswap is
missing somewhere. It would be interesting to see how 16-bit and
32-bit accesses are changed between big and little endian.
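
A quick sketch to check the arithmetic, using hypothetical helper names and `uint64_t` as a stand-in for `hwaddr`. For 1-byte accesses the expression from the patch gives the same result as `addr ^ 3`, and it wraps around below zero for a 2-byte access at address 0x3.

```c
#include <stdint.h>

/* Address adjustment as written in the patch under review. */
static uint64_t sketch_patch_swizzle(uint64_t addr, unsigned size)
{
    return (addr & ~(uint64_t)3) + 4 - size - (addr & 3);
}

/* Byte-lane swap within a 32-bit word, meaningful for 1-byte accesses only. */
static uint64_t sketch_byte_lane_swap(uint64_t addr)
{
    return addr ^ 3;
}
```

For size 1 the patch expression reduces to (addr & ~3) + 3 - (addr & 3), which is exactly the XOR with 3, so the two agree on the 0x4d2 and 0x72 examples.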

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net

Re: [Qemu-devel] [PATCH v4 0/2] Report format specific info for LUKS block driver

2016-07-22 Thread Eric Blake

On 07/22/2016 06:53 AM, Daniel P. Berrange wrote:
> This is a followup to:
> 
>   v1: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg01723.html
>   v2: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg03642.html
>   v3: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg03885.html
> 
> The 'qemu-img info' tool has ability to print format specific
> information, eg with qcow2 it reports two extra items:
> 
>   $ qemu-img info ~/VirtualMachines/demo.qcow2
>   image: /home/berrange/VirtualMachines/demo.qcow2
>   file format: qcow2
>   virtual size: 3.0G (3221225472 bytes)
>   disk size: 140K
>   cluster_size: 65536
>   Format specific information:
>   compat: 0.10
>   refcount bits: 16
> 


> 
> Changed in v4:
> 
>  - Introduce an empty QCryptoBlockInfoQCow struct to keep
>QAPI generator happy (Eric)

> 
> 
> Daniel P. Berrange (2):
>   crypto: add support for querying parameters for block encryption
>   block: export LUKS specific data to qemu-img info

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] [PATCH v3 1/2] qdev: ignore GlobalProperty.errp for hotplugged devices

2016-07-22 Thread Eduardo Habkost

On Fri, Jul 22, 2016 at 11:28:48AM +1000, David Gibson wrote:
> On Fri, Jul 22, 2016 at 01:01:26AM +0200, Greg Kurz wrote:
> > This patch ensures QEMU won't terminate while hotplugging a device if the
> > global property cannot be set and errp points to error_fatal or error_abort.
> > 
> > While here, it also fixes indentation of the typename argument.
> > 
> > Suggested-by: Eduardo Habkost 
> > Signed-off-by: Greg Kurz 
> 
> This seems kind of bogus to me - we have this whole infrastructure for
> handling errors, and here we throw it away.

What is this patch throwing away? We have never been able to use
the error infrastructure properly while applying global
properties.

> 
> It seems like the right solution would be to make the caller in the
> hotplug case *not* use error_abort or error_fatal, and instead get the
> error propagated back to the monitor which will display it.

GlobalProperty::errp is a workaround for the fact that
ObjectClass::instance_post_init() can't report errors at all (and
that's because object_new() and object_initialize_with_type()
can't report errors). Do you have any suggestions to fix it?

I have suggested saving global property errors in a DeviceState
field and reporting them later on device_realize(). Maybe I
should implement it and send as RFC.
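
Roughly, the idea could look like the sketch below. Everything in it is hypothetical (`SketchDevice`, the `pending_error` field and both helpers); it only shows the shape of deferring an error from a step that cannot fail to a later step that can.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    const char *pending_error;  /* hypothetical field: first deferred error */
    bool realized;
} SketchDevice;

/* Called from a context that cannot fail, e.g. instance post-init. */
static void sketch_record_error(SketchDevice *dev, const char *msg)
{
    if (!dev->pending_error) {
        dev->pending_error = msg;  /* keep only the first error */
    }
}

/* Realize can fail, so the deferred error is reported here. */
static bool sketch_realize(SketchDevice *dev, const char **errp)
{
    assert(dev != NULL);
    if (dev->pending_error) {
        if (errp) {
            *errp = dev->pending_error;
        }
        return false;
    }
    dev->realized = true;
    return true;
}
```

The caller of realize then decides what to do with the error, which for hotplug would mean handing it back to the monitor instead of aborting.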

> 
> > ---
> >  hw/core/qdev-properties.c |4 ++--
> >  include/hw/qdev-core.h|4 +++-
> >  2 files changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
> > index 14e544ab17d2..311af6da7684 100644
> > --- a/hw/core/qdev-properties.c
> > +++ b/hw/core/qdev-properties.c
> > @@ -1084,7 +1084,7 @@ int qdev_prop_check_globals(void)
> >  }
> >  
> >  static void qdev_prop_set_globals_for_type(DeviceState *dev,
> > -const char *typename)
> > +   const char *typename)
> >  {
> >  GList *l;
> >  
> > @@ -1100,7 +1100,7 @@ static void 
> > qdev_prop_set_globals_for_type(DeviceState *dev,
> >  if (err != NULL) {
> >  error_prepend(&err, "can't apply global %s.%s=%s: ",
> >prop->driver, prop->property, prop->value);
> > -if (prop->errp) {
> > +if (!dev->hotplugged && prop->errp) {
> >  error_propagate(prop->errp, err);
> >  } else {
> >  assert(prop->user_provided);
> > diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> > index 1d1f8612a9b8..4b4b33bec885 100644
> > --- a/include/hw/qdev-core.h
> > +++ b/include/hw/qdev-core.h
> > @@ -261,7 +261,9 @@ struct PropertyInfo {
> >   * @used: Set to true if property was used when initializing a device.
> >   * @errp: Error destination, used like first argument of error_setg()
> >   *in case property setting fails later. If @errp is NULL, we
> > - *print warnings instead of ignoring errors silently.
> > + *print warnings instead of ignoring errors silently. For
> > + *hotplugged devices, errp is always ignored and warnings are
> > + *printed instead.
> >   */
> >  typedef struct GlobalProperty {
> >  const char *driver;
> > 
> 
> -- 
> David Gibson  | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au| minimalist, thank you.  NOT _the_ 
> _other_
>   | _way_ _around_!
> http://www.ozlabs.org/~dgibson



-- 
Eduardo

Re: [Qemu-devel] [PATCH v4] virtio-pci: error out when both legacy and modern modes are disabled

2016-07-22 Thread Greg Kurz

On Fri, 22 Jul 2016 15:42:48 +0200
Cornelia Huck  wrote:

> [...]
> > > > > On Thu, 21 Jul 2016 23:21:16 +0200
> > > > > Greg Kurz  wrote:
> > > > > 
> > > > > > From: Greg Kurz 
> > > > > > 
> > > > > > Without presuming if we got there because of a user mistake or some
> > > > > > more subtle bug in the tooling, it really does not make sense to
> > > > > > implement a non-functional device.
> > > > > > 
> > > > > > Signed-off-by: Greg Kurz 
> > > > > > Reviewed-by: Marcel Apfelbaum 
> > > > > > Signed-off-by: Greg Kurz 
> > > > > > ---
> > > > > > v4: - rephrased error message and provide a hint to the user
> > > > > > - split string literals to stay below 80 characters
> > > > > > - added Marcel's R-b tag
> > > > > > ---
> > > > > >  hw/virtio/virtio-pci.c |8 
> > > > > >  1 file changed, 8 insertions(+)
> > > > > > 
> > > > > > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> > > > > > index 755f9218b77d..72c4b392ffda 100644
> > > > > > --- a/hw/virtio/virtio-pci.c
> > > > > > +++ b/hw/virtio/virtio-pci.c
> > > > > > @@ -1842,6 +1842,14 @@ static void 
> > > > > > virtio_pci_dc_realize(DeviceState *qdev, Error **errp)
> > > > > >  VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
> > > > > >  PCIDevice *pci_dev = &proxy->pci_dev;
> > > > > > 
> > > > > > +if (!(virtio_pci_modern(proxy) || virtio_pci_legacy(proxy))) { 
> > > > > >  
> > > > > 
> > > > > I'm not sure that I didn't mess up the sequence of the realize
> > > > > callbacks, but could disable_legacy still be AUTO here? In that case,
> > > > > we'd fail for disable-modern=on and disable-legacy unset (i.e., AUTO),
> > > > > which would be ok for pcie but not for !pcie.
> > > > > 
> > > > 
> > > > Marcel made the same comment in:
> > > > 
> > > > https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg05225.html
> > > > 
> > > > If the user explicitly disables modern, she shouldn't rely on QEMU
> > > > implicitly enabling legacy, hence the suggestion in error_append_hint().
> > > 
> > > I don't know, I'd find that a bit surprising, especially as I would end
> > > up with a legacy-capable device if I did not specify anything in
> > > the !pcie case.
> > >   
> > 
> > Isn't that already what happens, with legacy being the default in pre-2.7 QEMU?
> 
> Well, that is exactly my point; users may be surprised.
> 

One day legacy will hopefully be buried :)

> > 
> > Do you think we should have separate checks for pcie and !pcie ?  
> 
> I don't think we should overengineer this.
> 

Agreed.
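
For reference, a minimal standalone model of the single check being discussed.
It is only a sketch: the names (tri_state, proxy_model, model_check) are
hypothetical and not the QEMU API, and it assumes that legacy counts as
enabled only when disable-legacy is explicitly off, so an unset (AUTO) value
combined with disable-modern=on is rejected, as in the case raised above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tri-state standing in for an on/off/auto property. */
enum tri_state { TRI_AUTO, TRI_ON, TRI_OFF };

/* Hypothetical stand-in for the two properties under discussion. */
struct proxy_model {
    bool disable_modern;            /* disable-modern=on|off */
    enum tri_state disable_legacy;  /* disable-legacy=on|off, or unset (AUTO) */
};

static bool model_modern(const struct proxy_model *p)
{
    return !p->disable_modern;
}

/* Assumption: AUTO has not been resolved yet, so it does not count as enabled. */
static bool model_legacy(const struct proxy_model *p)
{
    return p->disable_legacy == TRI_OFF;
}

/* Returns NULL when the device is usable, otherwise the error message. */
static const char *model_check(const struct proxy_model *p)
{
    assert(p != NULL);
    if (!(model_modern(p) || model_legacy(p))) {
        return "device cannot work as neither modern nor legacy mode is enabled";
    }
    return NULL;
}
```

With this model, disable-modern=on plus an unset disable-legacy fails the
check regardless of pcie or !pcie, which is why the message should not claim
that the user disabled both modes.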

> >   
> > > > 
> > > > > > > +error_setg(errp, "device cannot work when both modern and legacy modes"
> > > > > > > +   " are disabled");
> > > 
> > > Suggest to change this wording to:
> > > 
> > > "device cannot work as neither modern nor legacy mode is enabled"
> > > 
> > > as this more accurately reflects what happened (the user did not
> > > actively disable legacy in the case above).
> > >   
> > 
> > Thanks ! This is THE wording I was looking for :)  
> 
> :)
> 
> I'm fine with the patch with the changed wording, as it is less confusing
> for the user.
> 
>  
> Reviewed-by: Cornelia Huck 
> 
> if the only thing you change is the message
> 

I'll do this right away as I'll be offline for 1 month starting... just after
I post v5 :)

Cheers.

--
Greg
