date:20160308

Re: [Qemu-devel] [RFC PATCH v1 01/10] exec: Remove cpu from cpus list during cpu_exec_exit()

2016-03-08 Thread Bharata B Rao

On Mon, Mar 07, 2016 at 05:23:48PM +0100, Thomas Huth wrote:
> On 04.03.2016 07:54, Bharata B Rao wrote:
> > CPUState *cpu gets added to the cpus list during cpu_exec_init(). It
> > should be removed from cpu_exec_exit().
> > 
> > cpu_exec_init() is called from generic CPU::instance_finalize and some
> 
> s/cpu_exec_init/cpu_exec_exit/
> 
> > archs like PowerPC call it from CPU unrealizefn. So ensure that we
> > dequeue the cpu only once.
> > 
> > Now -1 value for cpu->cpu_index indicates that we have already dequeued
> > the cpu for CONFIG_USER_ONLY case also.
> > 
> > Signed-off-by: Bharata B Rao 
> > ---
> >  exec.c | 32 
> >  1 file changed, 24 insertions(+), 8 deletions(-)
> > 
> > diff --git a/exec.c b/exec.c
> > index c62c439..7c3f747 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -588,15 +588,9 @@ static int cpu_get_free_index(Error **errp)
> >  return cpu;
> >  }
> >  
> > -void cpu_exec_exit(CPUState *cpu)
> > +static void cpu_release_index(CPUState *cpu)
> >  {
> > -if (cpu->cpu_index == -1) {
> > -/* cpu_index was never allocated by this @cpu or was already 
> > freed. */
> > -return;
> > -}
> > -
> >  bitmap_clear(cpu_index_map, cpu->cpu_index, 1);
> > -cpu->cpu_index = -1;
> >  }
> >  #else
> >  
> > @@ -611,11 +605,33 @@ static int cpu_get_free_index(Error **errp)
> >  return cpu_index;
> >  }
> >  
> > -void cpu_exec_exit(CPUState *cpu)
> > +static void cpu_release_index(CPUState *cpu)
> >  {
> > +return;
> 
> You could also simply leave that return statement away, I think.
> 
> >  }
> >  #endif
> >  
> > +void cpu_exec_exit(CPUState *cpu)
> > +{
> > +#if defined(CONFIG_USER_ONLY)
> > +cpu_list_lock();
> > +#endif
> > +if (cpu->cpu_index == -1) {
> > +/* cpu_index was never allocated by this @cpu or was already 
> > freed. */
> > +#if defined(CONFIG_USER_ONLY)
> > +cpu_list_unlock();
> > +#endif
> > +return;
> > +}
> > +
> > +QTAILQ_REMOVE(&cpus, cpu, node);
> > +cpu_release_index(cpu);
> > +cpu->cpu_index = -1;
> > +#if defined(CONFIG_USER_ONLY)
> > +cpu_list_unlock();
> > +#endif
> > +}
> 
> Since there are a couple of these
> 
> #if defined(CONFIG_USER_ONLY)
> cpu_list_[un]lock();
> #endif
> 
> in exec.c already, it might be somewhat nices to declare them at the
> beginning of the file as empty functions, somewhat like:
> 
> #if !defined(CONFIG_USER_ONLY)
> static inline void cpu_list_lock(void)
> {
> }
> static inline void cpu_list_unlock(void)
> {
> }
> #endif
> 
> What do you think about that?

If you and/or the maintainer insist/prefer, I can make the change.

Regards,
Bharata.

Re: [Qemu-devel] [PATCH] usb: fix unbound stack warning for inotify_watchfn

2016-03-08 Thread Paolo Bonzini



On 09/03/2016 07:07, Peter Xu wrote:
> -char buf[len];
> +char buf[sizeof(struct inotify_event) + NAME_MAX + 1];
>  
>  for (;;) {
> -bytes = read(s->inotifyfd, buf, len);
> +bytes = read(s->inotifyfd, buf, sizeof(buf));

sizeof is good here, since read takes a size in bytes.

Reviewed-by: Paolo Bonzini

Re: [Qemu-devel] [PATCH] migration: fix warning for source_return_path_thread

2016-03-08 Thread Paolo Bonzini



On 09/03/2016 07:12, Peter Xu wrote:
> max_len is not necessary, while it brings a warning during compilation
> when specify "-Wstack-usage=100". Replacing using sizeof().
> 
> Signed-off-by: Peter Xu 
> ---
>  migration/migration.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 0129d9f..6680d95 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1268,8 +1268,7 @@ static void *source_return_path_thread(void *opaque)
>  MigrationState *ms = opaque;
>  QEMUFile *rp = ms->rp_state.from_dst_file;
>  uint16_t header_len, header_type;
> -const int max_len = 512;
> -uint8_t buf[max_len];
> +uint8_t buf[512];
>  uint32_t tmp32, sibling_error;
>  ram_addr_t start = 0; /* =0 to silence warning */
>  size_t  len = 0, expected_len;
> @@ -1292,7 +1291,7 @@ static void *source_return_path_thread(void *opaque)
>  
>  if ((rp_cmd_args[header_type].len != -1 &&
>  header_len != rp_cmd_args[header_type].len) ||
> -header_len > max_len) {
> +header_len > sizeof(buf)) {

sizeof works fine because buf is an array of bytes, but ARRAY_SIZE is
better because it works for any type of buffer.

Paolo

>  error_report("RP: Received '%s' message (0x%04x) with"
>  "incorrect length %d expecting %zu",
>  rp_cmd_args[header_type].name, header_type, header_len,
>

Re: [Qemu-devel] [RFC PATCH v1 08/10] spapr: CPU hotplug support

2016-03-08 Thread Bharata B Rao

On Wed, Mar 09, 2016 at 01:58:53PM +1100, David Gibson wrote:
> On Tue, Mar 08, 2016 at 10:37:08AM +0100, Igor Mammedov wrote:
> > On Tue, 8 Mar 2016 15:27:39 +1100
> > David Gibson  wrote:
> > 
> > > On Mon, Mar 07, 2016 at 11:59:42AM +0530, Bharata B Rao wrote:
> > > > On Mon, Mar 07, 2016 at 02:49:06PM +1100, David Gibson wrote:  
> > > > > On Fri, Mar 04, 2016 at 12:24:19PM +0530, Bharata B Rao wrote:  
> > > > > > Set up device tree entries for the hotplugged CPU core and use the
> > > > > > exising EPOW event infrastructure to send CPU hotplug notification 
> > > > > > to
> > > > > > the guest.
> > > > > > 
> > > > > > Signed-off-by: Bharata B Rao 
> > > > > > ---
> > > > > >  hw/ppc/spapr.c  | 73 
> > > > > > -
> > > > > >  hw/ppc/spapr_cpu_core.c | 60 
> > > > > > +
> > > > > >  hw/ppc/spapr_events.c   |  3 ++
> > > > > >  hw/ppc/spapr_rtas.c | 24 ++
> > > > > >  include/hw/ppc/spapr.h  |  4 +++
> > > > > >  include/hw/ppc/spapr_cpu_core.h |  2 ++
> > > > > >  6 files changed, 165 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > > > > index 5acb612..6c4ac50 100644
> > > > > > --- a/hw/ppc/spapr.c
> > > > > > +++ b/hw/ppc/spapr.c
> > > > > > @@ -603,6 +603,18 @@ static void spapr_populate_cpu_dt(CPUState 
> > > > > > *cs, void *fdt, int offset,
> > > > > >  size_t page_sizes_prop_size;
> > > > > >  uint32_t vcpus_per_socket = smp_threads * smp_cores;
> > > > > >  uint32_t pft_size_prop[] = {0, cpu_to_be32(spapr->htab_shift)};
> > > > > > +sPAPRMachineClass *smc = 
> > > > > > SPAPR_MACHINE_GET_CLASS(qdev_get_machine());
> > > > > > +sPAPRDRConnector *drc;
> > > > > > +sPAPRDRConnectorClass *drck;
> > > > > > +int drc_index;
> > > > > > +
> > > > > > +if (smc->dr_cpu_enabled) {
> > > > > > +drc = 
> > > > > > spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, index);
> > > > > > +g_assert(drc);
> > > > > > +drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > > > > +drc_index = drck->get_index(drc);
> > > > > > +_FDT((fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", 
> > > > > > drc_index)));
> > > > > > +}
> > > > > >  
> > > > > >  /* Note: we keep CI large pages off for now because a 64K 
> > > > > > capable guest
> > > > > >   * provisioned with large pages might otherwise try to map a 
> > > > > > qemu
> > > > > > @@ -987,6 +999,16 @@ static void 
> > > > > > spapr_finalize_fdt(sPAPRMachineState *spapr,
> > > > > >  _FDT(spapr_drc_populate_dt(fdt, 0, NULL, 
> > > > > > SPAPR_DR_CONNECTOR_TYPE_LMB));
> > > > > >  }
> > > > > >  
> > > > > > +if (smc->dr_cpu_enabled) {
> > > > > > +int offset = fdt_path_offset(fdt, "/cpus");
> > > > > > +ret = spapr_drc_populate_dt(fdt, offset, NULL,
> > > > > > +SPAPR_DR_CONNECTOR_TYPE_CPU);
> > > > > > +if (ret < 0) {
> > > > > > +error_report("Couldn't set up CPU DR device tree 
> > > > > > properties");
> > > > > > +exit(1);
> > > > > > +}
> > > > > > +}
> > > > > > +
> > > > > >  _FDT((fdt_pack(fdt)));
> > > > > >  
> > > > > >  if (fdt_totalsize(fdt) > FDT_MAX_SIZE) {
> > > > > > @@ -1181,7 +1203,7 @@ static void ppc_spapr_reset(void)
> > > > > >  
> > > > > >  }
> > > > > >  
> > > > > > -static void spapr_cpu_reset(void *opaque)
> > > > > > +void spapr_cpu_reset(void *opaque)
> > > > > >  {
> > > > > >  sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> > > > > >  PowerPCCPU *cpu = opaque;
> > > > > > @@ -1622,6 +1644,8 @@ static void spapr_boot_set(void *opaque, 
> > > > > > const char *boot_device,
> > > > > >  void spapr_cpu_init(sPAPRMachineState *spapr, PowerPCCPU *cpu, 
> > > > > > Error **errp)
> > > > > >  {
> > > > > >  CPUPPCState *env = &cpu->env;
> > > > > > +CPUState *cs = CPU(cpu);
> > > > > > +int i;
> > > > > >  
> > > > > >  /* Set time-base frequency to 512 MHz */
> > > > > >  cpu_ppc_tb_init(env, TIMEBASE_FREQ);
> > > > > > @@ -1646,6 +1670,14 @@ void spapr_cpu_init(sPAPRMachineState 
> > > > > > *spapr, PowerPCCPU *cpu, Error **errp)
> > > > > >  }
> > > > > >  }
> > > > > >  
> > > > > > +/* Set NUMA node for the added CPUs  */
> > > > > > +for (i = 0; i < nb_numa_nodes; i++) {
> > > > > > +if (test_bit(cs->cpu_index, numa_info[i].node_cpu)) {
> > > > > > +cs->numa_node = i;
> > > > > > +break;
> > > > > > +}
> > > > > > +}
> > > > > > +  
> > > > > 
> > > > > This hunk seems like it belongs in a different patch.  
> > > > 
> > > > It appears that this would be needed by other archs also to set the
> > > > NUMA node for the hot-plugged CPU. How about make an API out of this
> > > > and use this something like below ? Igor ?  
> > > 
> > > Is the

Re: [Qemu-devel] [PATCH 4/8] usb: fix unbounded stack for xhci_dma_write_u32s

2016-03-08 Thread Paolo Bonzini



On 09/03/2016 06:08, Peter Xu wrote:
> pxdev:bin# gcc -v
> Using built-in specs.
> COLLECT_GCC=/bin/gcc
> COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
> Target: x86_64-redhat-linux
> Configured with: ../configure --prefix=/usr --mandir=/usr/share/man 
> --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla 
> --enable-bootstrap --enable-shared --enable-threads=posix 
> --enable-checking=release --with-system-zlib --enable-__cxa_atexit 
> --disable-libunwind-exceptions --enable-gnu-unique-object 
> --enable-linker-build-id --with-linker-hash-style=gnu 
> --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin 
> --enable-initfini-array --disable-libgcj 
> --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install
>  
> --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install
>  --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 
> --build=x86_64-redhat-linux
> Thread model: posix
> gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC)
> 
> Do you know why "might not be inlinable"? Failed to figure it out
> myself as mentioned in cover letter..

It's just a difference in compiler versions.  But ARRAY_SIZE should be
enough to fix it.

Paolo

Re: [Qemu-devel] [PATCH 07/16] block: change drain to look only at one child at a time

2016-03-08 Thread Paolo Bonzini

On 09/03/2016 04:41, Fam Zheng wrote:
> > bdrv_requests_pending is checking children to also wait until internal
> > requests (such as metadata writes) have completed.  However, checking
> > children is in general overkill because, apart from this special case,
> > the parent's in_flight count will always be incremented by at least one
> > for every request in the child.
> 
> While the patch looks good, I'm not sure I understand this: aren't children's
> requests generated by the child itself, how is parent's in_flight incremented?

What I mean is that there are two cases of bs generating I/O on
bs->file->bs:

1) writes caused by an operation on bs, e.g. a bdrv_aio_write to bs.  In
this case, the bdrv_aio_write increments bs's in_flight count

2) asynchronous metadata writes or flushes.  Such writes can be started
even if bs's in_flight count is zero, but not after the .bdrv_drain
callback has been invoked.  So the correct sequence is:

- finish current I/O

- call bdrv_drain callback

- recurse on children

This is what is needed for QED's callback to work correctly, and I'll
implement it this way in v2.

Paolo

Re: [Qemu-devel] [PATCH 06/16] block: add BDS field to count in-flight requests

2016-03-08 Thread Paolo Bonzini

On 09/03/2016 04:35, Fam Zheng wrote:
>> >  enum BdrvTrackedRequestType {
>> >  BDRV_TRACKED_READ,
>> >  BDRV_TRACKED_WRITE,
>> > -BDRV_TRACKED_FLUSH,
>> > -BDRV_TRACKED_IOCTL,
>> >  BDRV_TRACKED_DISCARD,
> Okay, so flush and ioctl are not needed, but why is discard different?

Discard can modify the contents of the device, so I think it's safer to
serialize it against RMW and copy-on-read operations.

Paolo

Re: [Qemu-devel] [PATCH] hw/i386: fix unbounded stack for load_multiboot

2016-03-08 Thread Fam Zheng

On Wed, 03/09 14:14, Peter Xu wrote:
> Use heap rather than stack for kcmdline.
> 
> Signed-off-by: Peter Xu 
> ---
>  hw/i386/multiboot.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/i386/multiboot.c b/hw/i386/multiboot.c
> index 9e164e6..bc45394 100644
> --- a/hw/i386/multiboot.c
> +++ b/hw/i386/multiboot.c
> @@ -324,10 +324,9 @@ int load_multiboot(FWCfgState *fw_cfg,
>  }
>  
>  /* Commandline support */
> -char kcmdline[strlen(kernel_filename) + strlen(kernel_cmdline) + 2];
> -snprintf(kcmdline, sizeof(kcmdline), "%s %s",
> - kernel_filename, kernel_cmdline);
> +char *kcmdline = g_strdup_printf("%s %s", kernel_filename, 
> kernel_cmdline);
>  stl_p(bootinfo + MBI_CMDLINE, mb_add_cmdline(&mbs, kcmdline));
> +g_free(kcmdline);
>  
>  stl_p(bootinfo + MBI_BOOTLOADER, mb_add_bootloader(&mbs, 
> bootloader_name));

While at it, it's better to move the variable declaration to the beginning of
function.

Fam

Re: [Qemu-devel] [PATCH 05/16] mirror: use bottom half to re-enter coroutine

2016-03-08 Thread Paolo Bonzini



On 09/03/2016 04:19, Fam Zheng wrote:
>> > +/* The I/O operation is not finished until the callback returns.
>> > + * If we call qemu_coroutine_enter here, there is the possibility
>> > + * of a deadlock when the coroutine calls bdrv_drained_begin.
>> > + */
>> > +op->co_enter_bh = qemu_bh_new(mirror_bh_cb, op);
> Shouldn't this be aio_bh_new()?

Yes, of course.

Paolo

Re: [Qemu-devel] [PATCH 03/16] block: introduce bdrv_no_throttling_begin/end

2016-03-08 Thread Paolo Bonzini



On 09/03/2016 02:45, Fam Zheng wrote:
>> > @@ -555,11 +574,7 @@ static int bdrv_prwv_co(BlockDriverState *bs, int64_t 
>> > offset,
>> >   * will not fire; so the I/O throttling function has to be disabled 
>> > here
>> >   * if it has been enabled.
>> >   */
>> > -if (bs->io_limits_enabled) {
>> > -fprintf(stderr, "Disabling I/O throttling on '%s' due "
>> > -"to synchronous I/O.\n", 
>> > bdrv_get_device_name(bs));
>> > -bdrv_io_limits_disable(bs);
>> > -}
>> > +bdrv_no_throttling_begin(bs);
>> >  
>> >  if (qemu_in_coroutine()) {
>> >  /* Fast-path if already in coroutine context */
>> > @@ -573,6 +588,8 @@ static int bdrv_prwv_co(BlockDriverState *bs, int64_t 
>> > offset,
>> >  aio_poll(aio_context, true);
>> >  }
>> >  }
>> > +
>> > +bdrv_no_throttling_end(bs);
> 
> Does this change the behavior? There wasn't a bdrv_io_limits_enable() here, 
> and
> the throttle doesn't come back automatically. Just want to make sure it's
> intended.

Yes, it's intended.  As long as the I/O stays synchronous, throttling is
disabled.  If it starts to be asynchronous, it will be re-enabled
automatically.

Paolo

Re: [Qemu-devel] [PATCH 02/16] block: move restarting of throttled reqs to block/throttle-groups.c

2016-03-08 Thread Paolo Bonzini



On 09/03/2016 02:26, Fam Zheng wrote:
>> diff --git a/block/throttle-groups.c b/block/throttle-groups.c
>> index 4920e09..eccfc0d 100644
>> --- a/block/throttle-groups.c
>> +++ b/block/throttle-groups.c
>> @@ -313,6 +313,17 @@ void coroutine_fn 
>> throttle_group_co_io_limits_intercept(BlockDriverState *bs,
>>  qemu_mutex_unlock(&tg->lock);
>>  }
>>  
>> +void throttle_group_restart_bs(BlockDriverState *bs)
>> +{
>> +int i;
>> +
>> +for (i = 0; i < 2; i++) {
>> +while (qemu_co_enter_next(&bs->throttled_reqs[i])) {
>> +;
>> +}
>> +}
>> +}
>> +
>>  /* Update the throttle configuration for a particular group. Similar
>>   * to throttle_config(), but guarantees atomicity within the
>>   * throttling group.
>> @@ -335,6 +346,10 @@ void throttle_group_config(BlockDriverState *bs, 
>> ThrottleConfig *cfg)
>>  }
>>  throttle_config(ts, tt, cfg);
>>  qemu_mutex_unlock(&tg->lock);
>> +
>> +aio_context_acquire(bdrv_get_aio_context(bs));
>> +throttle_group_restart_bs(bs);
>> +aio_context_release(bdrv_get_aio_context(bs));
> 
> Could you explain why does this hunk belong to this patch?
> 
> Otherwise looks good.

Sure.  It's moved from bdrv_set_io_limits, which calls
throttle_group_config, to throttle_group_config itself:

>>  void bdrv_set_io_limits(BlockDriverState *bs,
>>  ThrottleConfig *cfg)
>>  {
>> -int i;
>> -
>>  throttle_group_config(bs, cfg);
>> -
>> -for (i = 0; i < 2; i++) {
>> -qemu_co_enter_next(&bs->throttled_reqs[i]);
>> -}
>>  }
>>  

But in v2 I'll change it to only restart the first request so there is
no semantic change.

Paolo

Re: [Qemu-devel] [PATCH v4 04/10] qapi: Emit implicit structs in generated C

2016-03-08 Thread Markus Armbruster

Eric Blake  writes:

> On 03/08/2016 12:09 PM, Markus Armbruster wrote:
>
>> 
>>> I think what would sway me over the fence is looking at some of our
>>> constructs: for example, qapi-types.py has gen_object() which it now
>>> calls recursively.  When called directly from visit_object_type(), we
>>> have all the pieces; when called from visit_alternate_type(), we have to
>>> create a 1-element members array; and when called recursively, we have
>>> to manually explode base into the four pieces.
>> 
>> What could be improved if we changed visit_object_type() to take a
>> QAPISchemaObjectType instead of name, info, base, members, variants?
>> 
>> I believe we'd leave gen_object() unchanged to keep
>> visit_alternate_type() simple.
>> 
>> Here's a different one: we could drop visit_object_type_flat().
>
> Indeed. But I'd rather get v5 of this series out sooner rather than
> later, so I'll save the change for a later day.

I agree that we've wandered beyond the scope of this series here.  It's
not about the visitor, it merely struggles with the visitor's narrow
view on the visited objects.  That led us to discuss whether that view
still makes sense.  We're not sure.

 Uh, these are "always reserved for use as identifiers with file scope"
 (C99 7.1.3).  I'm afraid we need to use the q_ prefix.
>
> Seems doable. We already reserved q_ for ourselves (commit 9fb081e0), so
> far using it mainly by c_name.  There's still the risk that c_name adds
> q_ onto something ticklish which then clashes with our use of q_ on an
> implicit type for something else; except that we haven't declared 'obj'
> as ticklish.

As far as I can tell, c_name() is still the only user of q_.  If we add
more, we need to update the "Reserve the entire 'q_' namespace for
c_name()" comment, and we should try to keep track of our use of q_
somehow.

Alternatively, reserve another chunk of the name space for implicit
object types.  Might be easier to maintain.

>> I feel awful generating >100KiB of code that gets included pretty much
>> every time we compile anything.  Perhaps the proper fix for that is to
>> find out *why* we include qapi-types.h so much, then figure out how to
>> include it much, much less.
>> 
>> Here's a guilty one: error.h.  I believe it includes qapi-types.h just
>> for QapiErrorClass.  I guess it's only in the QAPI schema to have
>> QapiErrorClass_lookup[] generated.  I'd gladly maintain those nine(!)
>> lines of code manually if it helps me drop >100KiB of useless code from
>> many (most?) compiles.
>
> Commit 13f59ae predates my work on qapi, but I think you're right that
> error.h including qapi-types.h is a big offender and appears to do so
> solely for ErrorClass.  But how about as a followup series, so that v5
> gets out the door sooner.

Teaching error.h manners doesn't depend on anything in your pipe.  I'm
invoking "Error Reporting" maintainer privileges: the task is mine :)

>> Another one are builtin QAPI types.  A few headers want them.  We could
>> put them into a separate header instead of generating them into
>> qapi-types.h.  Could also enable getting rid of the ugly "spew builtins
>> into every header, but guard them with #ifdeffery" trick.
>
> In that same series.

I'd expect this to be a series of its own, going through the QAPI tree.

Re: [Qemu-devel] [PULL] MAINTAINERS: Add Samuel Thibault as slirp maintainer

2016-03-08 Thread Peter Maydell

On 9 March 2016 at 07:28, Samuel Thibault  wrote:
> The following changes since commit 1464ad45cd6cdeb0b5c1a54d3d3791396e47e52f:
>
>   Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2016-03-04' 
> into staging (2016-03-06 11:53:27 +)
>
> are available in the git repository at:
>
>   http://people.debian.org/~sthibault/qemu.git tags/samuel-thibault
>
> for you to fetch changes up to eda509fa0ac7d4870b7ff206fd6c33cdd06c2d55:
>
>   MAINTAINERS: Add Samuel Thibault as slirp maintainer (2016-03-08 21:39:04 
> +0100)
>
> 
> Add Samuel Thibault as slirp maintainer
>
> 
> Samuel Thibault (1):
>   MAINTAINERS: Add Samuel Thibault as slirp maintainer

Applied, thanks.

-- PMM

Re: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization

2016-03-08 Thread Li, Liang Z

> On 04/03/2016 15:26, Li, Liang Z wrote:
> >> >
> >> > The memory usage will keep increasing due to ever growing caches,
> >> > etc, so you'll be left with very little free memory fairly soon.
> >> >
> > I don't think so.
> >
> 
> Roman is right.  For example, here I am looking at a 64 GB (physical) machine
> which was booted about 30 minutes ago, and which is running disk-heavy
> workloads (installing VMs).
> 
> Since I have started writing this email (2 minutes?), the amount of free
> memory has already gone down from 37 GB to 33 GB.  I expect that by the
> time I have finished running the workload, in two hours, it will not have any
> free memory.
> 
> Paolo

I have a VM which has 2GB of RAM, when the guest booted, there were about 1.4GB 
of free pages.
Then I tried to download a large file from the internet with the browser, after 
the downloading finished,
there were only 72MB of free pages left, as Roman pointed out, there were quite 
a lot of Cached memory.
Then I tried to compile the QEMU, after the compiling finished, there were 
about 1.3G free pages.

So even the cache will increase to a large amount, it will be freed if there 
are some other specific workloads. 
The cache memory is a big issue that should be taken into consideration.
 How about reclaim some cache before getting the free pages information?  

Liang

[Qemu-devel] [PATCH] hw/i386: fix unbounded stack for load_multiboot

2016-03-08 Thread Peter Xu

Use heap rather than stack for kcmdline.

Signed-off-by: Peter Xu 
---
 hw/i386/multiboot.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/i386/multiboot.c b/hw/i386/multiboot.c
index 9e164e6..bc45394 100644
--- a/hw/i386/multiboot.c
+++ b/hw/i386/multiboot.c
@@ -324,10 +324,9 @@ int load_multiboot(FWCfgState *fw_cfg,
 }
 
 /* Commandline support */
-char kcmdline[strlen(kernel_filename) + strlen(kernel_cmdline) + 2];
-snprintf(kcmdline, sizeof(kcmdline), "%s %s",
- kernel_filename, kernel_cmdline);
+char *kcmdline = g_strdup_printf("%s %s", kernel_filename, kernel_cmdline);
 stl_p(bootinfo + MBI_CMDLINE, mb_add_cmdline(&mbs, kcmdline));
+g_free(kcmdline);
 
 stl_p(bootinfo + MBI_BOOTLOADER, mb_add_bootloader(&mbs, bootloader_name));
 
-- 
2.4.3

[Qemu-devel] [PATCH] migration: fix warning for source_return_path_thread

2016-03-08 Thread Peter Xu

max_len is not necessary, while it brings a warning during compilation
when specify "-Wstack-usage=100". Replacing using sizeof().

Signed-off-by: Peter Xu 
---
 migration/migration.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 0129d9f..6680d95 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1268,8 +1268,7 @@ static void *source_return_path_thread(void *opaque)
 MigrationState *ms = opaque;
 QEMUFile *rp = ms->rp_state.from_dst_file;
 uint16_t header_len, header_type;
-const int max_len = 512;
-uint8_t buf[max_len];
+uint8_t buf[512];
 uint32_t tmp32, sibling_error;
 ram_addr_t start = 0; /* =0 to silence warning */
 size_t  len = 0, expected_len;
@@ -1292,7 +1291,7 @@ static void *source_return_path_thread(void *opaque)
 
 if ((rp_cmd_args[header_type].len != -1 &&
 header_len != rp_cmd_args[header_type].len) ||
-header_len > max_len) {
+header_len > sizeof(buf)) {
 error_report("RP: Received '%s' message (0x%04x) with"
 "incorrect length %d expecting %zu",
 rp_cmd_args[header_type].name, header_type, header_len,
-- 
2.4.3

[Qemu-devel] [PATCH 2/2] usb: trivial cleanup for usb_mtp_add_str

2016-03-08 Thread Peter Xu

Remove useless var "ret".

Signed-off-by: Peter Xu 
---
 hw/usb/dev-mtp.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/hw/usb/dev-mtp.c b/hw/usb/dev-mtp.c
index cf63fd0..38cc4fc 100644
--- a/hw/usb/dev-mtp.c
+++ b/hw/usb/dev-mtp.c
@@ -719,10 +719,8 @@ static void usb_mtp_add_str(MTPData *data, const char *str)
 {
 uint32_t len = strlen(str)+1;
 wchar_t *wstr = g_malloc(sizeof(wchar_t) * len);
-size_t ret;
 
-ret = mbstowcs(wstr, str, len);
-if (ret == -1) {
+if (mbstowcs(wstr, str, len) == -1) {
 usb_mtp_add_wstr(data, L"Oops");
 } else {
 usb_mtp_add_wstr(data, wstr);
-- 
2.4.3

[Qemu-devel] [PATCH 1/2] usb: fix unbound stack usage for usb_mtp_add_str

2016-03-08 Thread Peter Xu

Use heap instead of stack.

Signed-off-by: Peter Xu 
---
 hw/usb/dev-mtp.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/usb/dev-mtp.c b/hw/usb/dev-mtp.c
index 7391783..cf63fd0 100644
--- a/hw/usb/dev-mtp.c
+++ b/hw/usb/dev-mtp.c
@@ -718,7 +718,7 @@ static void usb_mtp_add_wstr(MTPData *data, const wchar_t 
*str)
 static void usb_mtp_add_str(MTPData *data, const char *str)
 {
 uint32_t len = strlen(str)+1;
-wchar_t wstr[len];
+wchar_t *wstr = g_malloc(sizeof(wchar_t) * len);
 size_t ret;
 
 ret = mbstowcs(wstr, str, len);
@@ -727,6 +727,8 @@ static void usb_mtp_add_str(MTPData *data, const char *str)
 } else {
 usb_mtp_add_wstr(data, wstr);
 }
+
+g_free(wstr);
 }
 
 static void usb_mtp_add_time(MTPData *data, time_t time)
-- 
2.4.3

[Qemu-devel] [PATCH 0/2] usb: trivial fixes

2016-03-08 Thread Peter Xu

Both for usb_mtp_add_str: fix unbounded stack for it, also cleanup
one variable.

Peter Xu (2):
  usb: fix unbound stack usage for usb_mtp_add_str
  usb: trivial cleanup for usb_mtp_add_str

 hw/usb/dev-mtp.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

-- 
2.4.3

[Qemu-devel] [PATCH] usb: fix unbound stack warning for inotify_watchfn

2016-03-08 Thread Peter Xu

Signed-off-by: Peter Xu 
---
 hw/usb/dev-mtp.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/usb/dev-mtp.c b/hw/usb/dev-mtp.c
index 7391783..76ad64e 100644
--- a/hw/usb/dev-mtp.c
+++ b/hw/usb/dev-mtp.c
@@ -433,12 +433,11 @@ static void inotify_watchfn(void *arg)
 MTPState *s = arg;
 ssize_t bytes;
 /* From the man page: atleast one event can be read */
-int len = sizeof(struct inotify_event) + NAME_MAX + 1;
 int pos;
-char buf[len];
+char buf[sizeof(struct inotify_event) + NAME_MAX + 1];
 
 for (;;) {
-bytes = read(s->inotifyfd, buf, len);
+bytes = read(s->inotifyfd, buf, sizeof(buf));
 pos = 0;
 
 if (bytes <= 0) {
-- 
2.4.3

[Qemu-devel] [PATCH] qdict: fix unbounded stack for qdict_array_entries

2016-03-08 Thread Peter Xu

Signed-off-by: Peter Xu 
---
 qobject/qdict.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/qobject/qdict.c b/qobject/qdict.c
index 9833bd0..9188a87 100644
--- a/qobject/qdict.c
+++ b/qobject/qdict.c
@@ -704,19 +704,16 @@ int qdict_array_entries(QDict *src, const char *subqdict)
 for (i = 0; i < INT_MAX; i++) {
 QObject *subqobj;
 int subqdict_entries;
-size_t slen = 32 + subqdict_len;
-char indexstr[slen], prefix[slen];
-size_t snprintf_ret;
+char *prefix = g_strdup_printf("%s%u.", subqdict, i);
 
-snprintf_ret = snprintf(indexstr, slen, "%s%u", subqdict, i);
-assert(snprintf_ret < slen);
+subqdict_entries = qdict_count_prefixed_entries(src, prefix);
 
-subqobj = qdict_get(src, indexstr);
+/* Remove ending "." */
+prefix[strlen(prefix) - 1] = 0x00;
+subqobj = qdict_get(src, prefix);
 
-snprintf_ret = snprintf(prefix, slen, "%s%u.", subqdict, i);
-assert(snprintf_ret < slen);
+g_free(prefix);
 
-subqdict_entries = qdict_count_prefixed_entries(src, prefix);
 if (subqdict_entries < 0) {
 return subqdict_entries;
 }
-- 
2.4.3

[Qemu-devel] [PATCH 2/2] block/qapi: fix unbounded stack for dump_qdict

2016-03-08 Thread Peter Xu

Using heap instead of stack for better safety.

Signed-off-by: Peter Xu 
---
 block/qapi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/qapi.c b/block/qapi.c
index c4c2115..b798e35 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -636,9 +636,8 @@ static void dump_qdict(fprintf_function func_fprintf, void 
*f, int indentation,
 for (entry = qdict_first(dict); entry; entry = qdict_next(dict, entry)) {
 QType type = qobject_type(entry->value);
 bool composite = (type == QTYPE_QDICT || type == QTYPE_QLIST);
-char key[strlen(entry->key) + 1];
+char *key = g_malloc(strlen(entry->key) + 1);
 int i;
-
 /* replace dashes with spaces in key (variable) names */
 for (i = 0; entry->key[i]; i++) {
 key[i] = entry->key[i] == '-' ? ' ' : entry->key[i];
@@ -650,6 +649,7 @@ static void dump_qdict(fprintf_function func_fprintf, void 
*f, int indentation,
 if (!composite) {
 func_fprintf(f, "\n");
 }
+g_free(key);
 }
 }
 
-- 
2.4.3

[Qemu-devel] [PATCH 1/2] block/qapi: make two printf() formats literal

2016-03-08 Thread Peter Xu

Fix two places to use literal printf format when possible.

Signed-off-by: Peter Xu 
---
 block/qapi.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/block/qapi.c b/block/qapi.c
index db2d3fb..c4c2115 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -619,9 +619,8 @@ static void dump_qlist(fprintf_function func_fprintf, void 
*f, int indentation,
 for (entry = qlist_first(list); entry; entry = qlist_next(entry), i++) {
 QType type = qobject_type(entry->value);
 bool composite = (type == QTYPE_QDICT || type == QTYPE_QLIST);
-const char *format = composite ? "%*s[%i]:\n" : "%*s[%i]: ";
-
-func_fprintf(f, format, indentation * 4, "", i);
+func_fprintf(f, "%*s[%i]:%c", indentation * 4, "", i,
+ composite ? '\n' : ' ');
 dump_qobject(func_fprintf, f, indentation + 1, entry->value);
 if (!composite) {
 func_fprintf(f, "\n");
@@ -637,7 +636,6 @@ static void dump_qdict(fprintf_function func_fprintf, void 
*f, int indentation,
 for (entry = qdict_first(dict); entry; entry = qdict_next(dict, entry)) {
 QType type = qobject_type(entry->value);
 bool composite = (type == QTYPE_QDICT || type == QTYPE_QLIST);
-const char *format = composite ? "%*s%s:\n" : "%*s%s: ";
 char key[strlen(entry->key) + 1];
 int i;
 
@@ -646,8 +644,8 @@ static void dump_qdict(fprintf_function func_fprintf, void 
*f, int indentation,
 key[i] = entry->key[i] == '-' ? ' ' : entry->key[i];
 }
 key[i] = 0;
-
-func_fprintf(f, format, indentation * 4, "", key);
+func_fprintf(f, "%*s%s:%c", indentation * 4, "", key,
+ composite ? '\n' : ' ');
 dump_qobject(func_fprintf, f, indentation + 1, entry->value);
 if (!composite) {
 func_fprintf(f, "\n");
-- 
2.4.3

[Qemu-devel] [PATCH 0/2] block/qapi: trivial fixes

2016-03-08 Thread Peter Xu

One is to use literal printf format when possible.

One is to fix an unbounded usage of stack.

Peter Xu (2):
  block/qapi: make two printf() formats literal
  block/qapi: fix unbounded stack for dump_qdict

 block/qapi.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

-- 
2.4.3

Re: [Qemu-devel] [PATCH v4 04/10] qapi: Emit implicit structs in generated C

2016-03-08 Thread Eric Blake

On 03/08/2016 12:09 PM, Markus Armbruster wrote:

> 
>> I think what would sway me over the fence is looking at some of our
>> constructs: for example, qapi-types.py has gen_object() which it now
>> calls recursively.  When called directly from visit_object_type(), we
>> have all the pieces; when called from visit_alternate_type(), we have to
>> create a 1-element members array; and when called recursively, we have
>> to manually explode base into the four pieces.
> 
> What could be improved if we changed visit_object_type() to take a
> QAPISchemaObjectType instead of name, info, base, members, variants?
> 
> I believe we'd leave gen_object() unchanged to keep
> visit_alternate_type() simple.
> 
> Here's a different one: we could drop visit_object_type_flat().

Indeed. But I'd rather get v5 of this series out sooner rather than
later, so I'll save the change for a later day.


>>>
>>> Uh, these are "always reserved for use as identifiers with file scope"
>>> (C99 7.1.3).  I'm afraid we need to use the q_ prefix.

Seems doable. We already reserved q_ for ourselves (commit 9fb081e0), so
far using it mainly by c_name.  There's still the risk that c_name adds
q_ onto something ticklish which then clashes with our use of q_ on an
implicit type for something else; except that we haven't declared 'obj'
as ticklish.

> I feel awful generating >100KiB of code that gets included pretty much
> every time we compile anything.  Perhaps the proper fix for that is to
> find out *why* we include qapi-types.h so much, then figure out how to
> include it much, much less.
> 
> Here's a guilty one: error.h.  I believe it includes qapi-types.h just
> for QapiErrorClass.  I guess it's only in the QAPI schema to have
> QapiErrorClass_lookup[] generated.  I'd gladly maintain those nine(!)
> lines of code manually if it helps me drop >100KiB of useless code from
> many (most?) compiles.

Commit 13f59ae predates my work on qapi, but I think you're right that
error.h including qapi-types.h is a big offender and appears to do so
solely for ErrorClass.  But how about as a followup series, so that v5
gets out the door sooner.

> 
> Another one are builtin QAPI types.  A few headers want them.  We could
> put them into a separate header instead of generating them into
> qapi-types.h.  Could also enable getting rid of the ugly "spew builtins
> into every header, but guard them with #ifdeffery" trick.

In that same series.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 8/8] hw/i386: fix unbounded stack for load_multiboot

2016-03-08 Thread Peter Xu

On Tue, Mar 08, 2016 at 01:29:21PM +0100, Paolo Bonzini wrote:
> 
> 
> On 08/03/2016 08:00, Peter Xu wrote:
> > @@ -159,6 +159,12 @@ int load_multiboot(FWCfgState *fw_cfg,
> >  uint8_t *mb_bootinfo_data;
> >  uint32_t cmdline_len;
> >  
> > +#define __KERN_FNAME_LEN (1024)
> > +#define __KERN_CMDLINE_LEN (4096)
> > +
> > +assert(strlen(kernel_filename) + 1 >= __KERN_FNAME_LEN);
> > +assert(strlen(kernel_cmdline) + 1 >= __KERN_CMDLINE_LEN);
> > +
> >  /* Ok, let's see if it is a multiboot image.
> > The header is 12x32bit long, so the latest entry may be 8192 - 48. 
> > */
> >  for (i = 0; i < (8192 - 48); i += 4) {
> > @@ -324,7 +330,7 @@ int load_multiboot(FWCfgState *fw_cfg,
> >  }
> >  
> >  /* Commandline support */
> > -char kcmdline[strlen(kernel_filename) + strlen(kernel_cmdline) + 2];
> > +char kcmdline[__KERN_FNAME_LEN + __KERN_CMDLINE_LEN];
> >  snprintf(kcmdline, sizeof(kcmdline), "%s %s",
> >   kernel_filename, kernel_cmdline);
> >  stl_p(bootinfo + MBI_CMDLINE, mb_add_cmdline(&mbs, kcmdline));
> > @@ -370,4 +376,6 @@ int load_multiboot(FWCfgState *fw_cfg,
> >  nb_option_roms++;
> >  
> >  return 1; /* yes, we are multiboot */
> > +#undef __KERN_FNAME_LEN
> > +#undef __KERN_CMDLINE_LEN
> 
> Just put it in the heap using g_strdup_printf.

Will fix and send standalone again. Thanks.

Peter

Re: [Qemu-devel] [PATCH 6/8] usb: fix unbounded stack for usb_mtp_add_str

2016-03-08 Thread Peter Xu

On Tue, Mar 08, 2016 at 09:10:44AM +0100, Gerd Hoffmann wrote:
> >  static void usb_mtp_add_str(MTPData *data, const char *str)
> >  {
> > +#define __WSTR_LEN (256)
> >  uint32_t len = strlen(str)+1;
> > -wchar_t wstr[len];
> > +wchar_t wstr[__WSTR_LEN];
> 
> I think we should g_malloc() here.

Agree. Will fix. Thanks.

Peter

Re: [Qemu-devel] [PATCH 7/8] migration: fix unbounded stack for source_return_path_thread

2016-03-08 Thread Peter Xu

On Tue, Mar 08, 2016 at 01:26:24PM +0100, Paolo Bonzini wrote:
> 
> 
> On 08/03/2016 08:00, Peter Xu wrote:
> > Suggested-by: Paolo Bonzini 
> > CC: Juan Quintela 
> > CC: Amit Shah 
> > Signed-off-by: Peter Xu 
> > ---
> >  migration/migration.c | 7 ---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 0129d9f..f1a3976 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -1265,11 +1265,11 @@ static void 
> > migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname,
> >   */
> >  static void *source_return_path_thread(void *opaque)
> >  {
> > +#define __MAX_LEN (512)
> >  MigrationState *ms = opaque;
> >  QEMUFile *rp = ms->rp_state.from_dst_file;
> >  uint16_t header_len, header_type;
> > -const int max_len = 512;
> > -uint8_t buf[max_len];
> > +uint8_t buf[__MAX_LEN];
> >  uint32_t tmp32, sibling_error;
> >  ram_addr_t start = 0; /* =0 to silence warning */
> >  size_t  len = 0, expected_len;
> > @@ -1292,7 +1292,7 @@ static void *source_return_path_thread(void *opaque)
> >  
> >  if ((rp_cmd_args[header_type].len != -1 &&
> >  header_len != rp_cmd_args[header_type].len) ||
> > -header_len > max_len) {
> > +header_len > __MAX_LEN) {
> >  error_report("RP: Received '%s' message (0x%04x) with"
> >  "incorrect length %d expecting %zu",
> >  rp_cmd_args[header_type].name, header_type, header_len,
> > @@ -1372,6 +1372,7 @@ out:
> >  ms->rp_state.from_dst_file = NULL;
> >  qemu_fclose(rp);
> >  return NULL;
> > +#undef __MAX_LEN
> >  }
> >  
> >  static int open_return_path_on_source(MigrationState *ms)
> > 
> 
> Another compiler false positive that you can fix with ARRAY_SIZE.

Yes, will fix.

Thanks.
Peter

Re: [Qemu-devel] [PATCH 5/8] usb: fix unbounded stack for inotify_watchfn

2016-03-08 Thread Peter Xu

On Tue, Mar 08, 2016 at 01:22:19PM +0100, Paolo Bonzini wrote:
> >  for (;;) {
> > -bytes = read(s->inotifyfd, buf, len);
> > +bytes = read(s->inotifyfd, buf, __BUF_LEN);
> 
> Again, here you can use ARRAY_SIZE(buf) and avoid the macro.

Yes, will fix. Thanks!

Peter

Re: [Qemu-devel] [PULL 00/14] Net patches

2016-03-08 Thread Wen Congyang

On 03/09/2016 12:26 PM, Li Zhijian wrote:
> 
> 
> On 03/09/2016 09:36 AM, Wen Congyang wrote:
>> On 03/08/2016 05:54 PM, Peter Maydell wrote:
>>> On 8 March 2016 at 16:06, Zhang Chen  wrote:
 I found the reason for this problem is that
 unix_connect() have not connect to sock_path before iov_send().
 It need time to establish connection. so can we fix it with usleep()
 like this:

  recv_sock = unix_connect(sock_path, NULL);
  g_assert_cmpint(recv_sock, !=, -1);
 +usleep(1000);

  ret = iov_send(send_sock[0], iov, 2, 0, sizeof(size) +
 sizeof(send_buf));
  g_assert_cmpint(ret, ==, sizeof(send_buf) + sizeof(size));
  close(send_sock[0]);

  ret = qemu_recv(recv_sock, &len, sizeof(len), 0);
>>>
>>> I would prefer it if we could find a way to fix this race
>>> reliably rather than just inserting a delay and hoping it
>>> is sufficient. Otherwise the test is likely to be unreliable
>>> if run on a heavily loaded or slow machine.
>>
>> Yes, but there is no way to know when tcp_chr_accept() is called. Add a event
>> to notify it?
>>
>> Thanks
>> Wen Congyang
>>
> 
> Hi, Jason, PMM
> As Congyang said that this is a bug of testcase instead of filter-mirror.
> Maybe we should re-wrok the testcase, for example
> - using -chardev pipe instead of -chardev socket, because we are
>   intend to test the packet mirror fuction instead of -chardev socket

I think it is OK to change it.

Thanks
Wen Congyang

> 
> How about that ?
> 
> 
>>>
>>> thanks
>>> -- PMM
>>>
>>>
>>>
>>
>>
>>
>>
>>
> .
>

Re: [Qemu-devel] [PATCH 5/8] usb: fix unbounded stack for inotify_watchfn

2016-03-08 Thread Peter Xu

On Tue, Mar 08, 2016 at 01:22:46PM +0100, Paolo Bonzini wrote:
> 
> 
> On 08/03/2016 08:20, Peter Maydell wrote:
> >> > +#define __BUF_LEN (sizeof(struct inotify_event) + NAME_MAX + 1)
> >> >  /* From the man page: atleast one event can be read */
> >> > -int len = sizeof(struct inotify_event) + NAME_MAX + 1;
> >> >  int pos;
> >> > -char buf[len];
> >> > +char buf[__BUF_LEN];
> > The commit message subject says this is fixing an unbounded
> > stack usage, but (a) this array wasn't unbounded in size
> > (b) the change doesn't change the size we allocate.
> > What are you trying to do here?

Sorry. I should be more clear to say "it avoids one warning during
compilation" rather than saying "fix unbounded stack usage", while
it's not.

> 
> I suspect it's just fixing a false positive in the compiler.
> 
> Paolo

Yes. I will avoid touching these kinds of places in the code next
time I guess... only when necessary. Since this one is easy, I'd
like to send another standalone patch, using sizeof(). rather than
macros, to avoid the warning.

Thanks.
Peter

Re: [Qemu-devel] [PULL 0/3] migration: avx2, 'info migrate' updates

2016-03-08 Thread Peter Maydell

On 8 March 2016 at 18:32, Amit Shah  wrote:
> The following changes since commit 97556fe80e4f7252300b3498b3477fb4295153a3:
>
>   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into 
> staging (2016-03-08 04:53:37 +)
>
> are available in the git repository at:
>
>   https://git.kernel.org/pub/scm/virt/qemu/amit/migration.git 
> tags/migration-for-2.6-6
>
> for you to fetch changes up to 28b90d9c19d368645f475e36297ca21c53c38799:
>
>   cutils: add avx2 instruction optimization (2016-03-08 16:53:26 +0530)
>
> 
> migration:
> * add avx2 instruction optimization, speeds up zero-page checking on
>   compatible architectures and compilers (gcc 4.9+)
> * add additional postcopy stats to 'info migrate' output
>
> 


Applied, thanks.

-- PMM

Re: [Qemu-devel] [PATCH 4/8] usb: fix unbounded stack for xhci_dma_write_u32s

2016-03-08 Thread Peter Xu

On Tue, Mar 08, 2016 at 02:26:36PM +0700, Peter Maydell wrote:
> On 8 March 2016 at 14:00, Peter Xu  wrote:
> > First of all, this function cannot be inlined even with always_inline,
> > so removing inline.
> 
> Please don't mix two different changes in one patch.

Sorry. Will follow this.

> 
> > After that, make its stack bounded.
> >
> > Suggested-by: Paolo Bonzini 
> > CC: Gerd Hoffmann 
> > Signed-off-by: Peter Xu 
> > ---
> >  hw/usb/hcd-xhci.c | 12 
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
> > index 44b6f8c..3dd5a02 100644
> > --- a/hw/usb/hcd-xhci.c
> > +++ b/hw/usb/hcd-xhci.c
> > @@ -694,18 +694,22 @@ static inline void xhci_dma_read_u32s(XHCIState 
> > *xhci, dma_addr_t addr,
> >  }
> >  }
> >
> > -static inline void xhci_dma_write_u32s(XHCIState *xhci, dma_addr_t addr,
> > -   uint32_t *buf, size_t len)
> > +static void xhci_dma_write_u32s(XHCIState *xhci, dma_addr_t addr,
> > +uint32_t *buf, size_t len)
> >  {
> >  int i;
> > -uint32_t tmp[len / sizeof(uint32_t)];
> > +uint32_t n = len / sizeof(uint32_t);
> > +#define __BUF_SIZE (12)
> > +uint32_t tmp[__BUF_SIZE];
> >
> > +assert(__BUF_SIZE >= n);
> >  assert((len % sizeof(uint32_t)) == 0);
> >
> > -for (i = 0; i < (len / sizeof(uint32_t)); i++) {
> > +for (i = 0; i < n; i++) {
> >  tmp[i] = cpu_to_le32(buf[i]);
> >  }
> >  pci_dma_write(PCI_DEVICE(xhci), addr, tmp, len);
> > +#undef __BUF_SIZE
> 
> All the patches in this series seem to be following the
> same pattern of #defining an arbitrary fixed length for
> the array. This does not at all seem to me to be an
> improvement. We should be avoiding unbounded stack
> allocations, but we need to do this by either changing
> the code to work correctly without an unbounded allocation
> or by using a heap allocation instead of a stack allocation.
> 
> In some cases, like this one, the original code isn't even
> unbounded -- we always call this function with a length
> parameter which is a small compile time constant, so the
> stack usage is definitely bounded. So this change is making
> the code uglier and less flexible for no benefit that I
> can see.

I was trying to avoid the "stack unlimited" warning. There will be a
warning generated before applying this patch. The patch solved
this. Though I'd say this patch might not be good enough. :(

Then, will drop this one too if I have no further better
idea.

Thanks for review.

Peter

Re: [Qemu-devel] [PATCH 4/8] usb: fix unbounded stack for xhci_dma_write_u32s

2016-03-08 Thread Peter Xu

On Tue, Mar 08, 2016 at 01:21:52PM +0100, Paolo Bonzini wrote:
> 
> 
> On 08/03/2016 08:00, Peter Xu wrote:
> > First of all, this function cannot be inlined even with always_inline,
> > so removing inline.
> 
> Why?  always_inline fixes the error for me.

I tried this patch:

-

diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
index 44b6f8c..961fd78 100644
--- a/hw/usb/hcd-xhci.c
+++ b/hw/usb/hcd-xhci.c
@@ -694,7 +694,7 @@ static inline void xhci_dma_read_u32s(XHCIState *xhci, 
dma_addr_t addr,
 }
 }

-static inline void xhci_dma_write_u32s(XHCIState *xhci, dma_addr_t addr,
+static QEMU_ARTIFICIAL void xhci_dma_write_u32s(XHCIState *xhci, dma_addr_t 
addr,
uint32_t *buf, size_t len)
 {
 int i;

-

What I got is:

/root/git/qemu/hw/usb/hcd-xhci.c:699:1: warning: ‘artificial’ attribute ignored 
[-Wattributes]
 {
 ^
/root/git/qemu/hw/usb/hcd-xhci.c:697:56: warning: always_inline function might 
not be inlinable [-Wattributes]
 static QEMU_ARTIFICIAL void xhci_dma_write_u32s(XHCIState *xhci, dma_addr_t 
addr,
^

GCC version:

pxdev:bin# gcc -v
Using built-in specs.
COLLECT_GCC=/bin/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man 
--infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla 
--enable-bootstrap --enable-shared --enable-threads=posix 
--enable-checking=release --with-system-zlib --enable-__cxa_atexit 
--disable-libunwind-exceptions --enable-gnu-unique-object 
--enable-linker-build-id --with-linker-hash-style=gnu 
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin 
--enable-initfini-array --disable-libgcj 
--with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install
 
--with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install
 --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 
--build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC)

Do you know why "might not be inlinable"? Failed to figure it out
myself as mentioned in cover letter..

> 
> >  int i;
> > -uint32_t tmp[len / sizeof(uint32_t)];
> > +uint32_t n = len / sizeof(uint32_t);
> > +#define __BUF_SIZE (12)
> > +uint32_t tmp[__BUF_SIZE];
> >  
> > +assert(__BUF_SIZE >= n);
> 
> Instead of a #define, you can use ARRAY_SIZE(tmp).

Will do when needed. Thanks!

Peter

Re: [Qemu-devel] [RFC PATCH v1 03/10] cpu: Reclaim vCPU objects

2016-03-08 Thread Bharata B Rao

On Mon, Mar 07, 2016 at 08:05:58PM +0100, Thomas Huth wrote:
> On 04.03.2016 07:54, Bharata B Rao wrote:
> > From: Gu Zheng 
> > 
> > In order to deal well with the kvm vcpus (which can not be removed without 
> > any
> > protection), we do not close KVM vcpu fd, just record and mark it as stopped
> > into a list, so that we can reuse it for the appending cpu hot-add request 
> > if
> > possible. It is also the approach that kvm guys suggested:
> > https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
> > 
> > Signed-off-by: Chen Fan 
> > Signed-off-by: Gu Zheng 
> > Signed-off-by: Zhu Guihua 
> > Signed-off-by: Bharata B Rao 
> >[- Explicit CPU_REMOVE() from qemu_kvm/tcg_destroy_vcpu()
> >   isn't needed as it is done from cpu_exec_exit()
> > - Use iothread mutex instead of global mutex during
> >   destroy
> > - Don't cleanup vCPU object from vCPU thread context
> >   but leave it to the callers (device_add/device_del)]
> > Reviewed-by: David Gibson 
> > ---
> >  cpus.c   | 38 +++
> >  include/qom/cpu.h| 10 +
> >  include/sysemu/kvm.h |  1 +
> >  kvm-all.c| 57 
> > +++-
> >  kvm-stub.c   |  5 +
> >  5 files changed, 110 insertions(+), 1 deletion(-)
> > 
> > diff --git a/cpus.c b/cpus.c
> > index 9592163..07cc054 100644
> > --- a/cpus.c
> > +++ b/cpus.c
> > @@ -953,6 +953,18 @@ void async_run_on_cpu(CPUState *cpu, void (*func)(void 
> > *data), void *data)
> >  qemu_cpu_kick(cpu);
> >  }
> >  
> > +static void qemu_kvm_destroy_vcpu(CPUState *cpu)
> > +{
> > +if (kvm_destroy_vcpu(cpu) < 0) {
> > +error_report("kvm_destroy_vcpu failed");
> > +exit(EXIT_FAILURE);
> > +}
> > +}
> > +
> > +static void qemu_tcg_destroy_vcpu(CPUState *cpu)
> > +{
> > +}
> > +
> >  static void flush_queued_work(CPUState *cpu)
> >  {
> >  struct qemu_work_item *wi;
> > @@ -1053,6 +1065,11 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
> >  }
> >  }
> >  qemu_kvm_wait_io_event(cpu);
> > +if (cpu->exit && !cpu_can_run(cpu)) {
> > +qemu_kvm_destroy_vcpu(cpu);
> > +qemu_mutex_unlock_iothread();
> > +return NULL;
> > +}
> 
> My comment from last time still applies:
> 
> You could increase readability of the code by changing the condition of
> the loop instead - currently it is a "while (1)" ... you could turn that
> into a "do { ... } while (!cpu->exit || cpu_can_run(cpu))" and then
> destroy the cpu after the loop.

Sorry for missing this, will take of this and the other comment in this
thread in the next version.

Regards,
Bharata.

Re: [Qemu-devel] [PATCH 3/8] usb: fix unbounded stack for ohci_td_pkt

2016-03-08 Thread Peter Xu

On Tue, Mar 08, 2016 at 01:20:45PM +0100, Paolo Bonzini wrote:
> 
> 
> On 08/03/2016 08:00, Peter Xu wrote:
> > Suggested-by: Paolo Bonzini 
> > CC: Gerd Hoffmann 
> > Signed-off-by: Peter Xu 
> > ---
> >  hw/usb/hcd-ohci.c | 7 ---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
> > index 17ed461..c3cd4e2 100644
> > --- a/hw/usb/hcd-ohci.c
> > +++ b/hw/usb/hcd-ohci.c
> > @@ -936,11 +936,11 @@ static int ohci_service_iso_td(OHCIState *ohci, 
> > struct ohci_ed *ed,
> >  #ifdef trace_event_get_state
> >  static void ohci_td_pkt(const char *msg, const uint8_t *buf, size_t len)
> >  {
> > +#define __TEMP_WIDTH (16)
> >  bool print16 = !!trace_event_get_state(TRACE_USB_OHCI_TD_PKT_SHORT);
> >  bool printall = !!trace_event_get_state(TRACE_USB_OHCI_TD_PKT_FULL);
> > -const int width = 16;
> >  int i;
> > -char tmp[3 * width + 1];
> > +char tmp[3 * __TEMP_WIDTH + 1];
> >  char *p = tmp;
> >  
> >  if (!printall && !print16) {
> > @@ -948,7 +948,7 @@ static void ohci_td_pkt(const char *msg, const uint8_t 
> > *buf, size_t len)
> >  }
> >  
> >  for (i = 0; ; i++) {
> > -if (i && (!(i % width) || (i == len))) {
> > +if (i && (!(i % __TEMP_WIDTH) || (i == len))) {
> >  if (!printall) {
> >  trace_usb_ohci_td_pkt_short(msg, tmp);
> >  break;
> > @@ -963,6 +963,7 @@ static void ohci_td_pkt(const char *msg, const uint8_t 
> > *buf, size_t len)
> >  
> >  p += sprintf(p, " %.2x", buf[i]);
> >  }
> > +#undef __TEMP_WIDTH
> >  }
> >  #else
> >  static void ohci_td_pkt(const char *msg, const uint8_t *buf, size_t len)
> > 
> 
> This is a compiler false positive.  You can change "i % width" to
> 
>p - tmp == ARRAY_SIZE(tmp) - 1
> 
> if you want to avoid it, but I'd just ignore this one.

Then I'd like to drop this patch for now.

Btw, do you know why the compiler got this false positive? Since we
declared width as constant. Is it a... "bug" or a "feature"?

Thanks.
Peter

Re: [Qemu-devel] [PULL 00/14] Net patches

2016-03-08 Thread Li Zhijian




On 03/09/2016 09:36 AM, Wen Congyang wrote:

On 03/08/2016 05:54 PM, Peter Maydell wrote:

On 8 March 2016 at 16:06, Zhang Chen  wrote:

I found the reason for this problem is that
unix_connect() have not connect to sock_path before iov_send().
It need time to establish connection. so can we fix it with usleep()
like this:

 recv_sock = unix_connect(sock_path, NULL);
 g_assert_cmpint(recv_sock, !=, -1);
+usleep(1000);

 ret = iov_send(send_sock[0], iov, 2, 0, sizeof(size) +
sizeof(send_buf));
 g_assert_cmpint(ret, ==, sizeof(send_buf) + sizeof(size));
 close(send_sock[0]);

 ret = qemu_recv(recv_sock, &len, sizeof(len), 0);


I would prefer it if we could find a way to fix this race
reliably rather than just inserting a delay and hoping it
is sufficient. Otherwise the test is likely to be unreliable
if run on a heavily loaded or slow machine.


Yes, but there is no way to know when tcp_chr_accept() is called. Add a event
to notify it?

Thanks
Wen Congyang



Hi, Jason, PMM
As Congyang said that this is a bug of testcase instead of filter-mirror.
Maybe we should re-wrok the testcase, for example
- using -chardev pipe instead of -chardev socket, because we are
  intend to test the packet mirror fuction instead of -chardev socket

How about that ?




thanks
-- PMM

Re: [Qemu-devel] [PATCH v2 01/10] ipmi: remove IPMI_CHECK_CMD_LEN() macro

2016-03-08 Thread Corey Minyard


On 03/09/2016 12:06 AM, Cédric Le Goater wrote:

On 03/07/2016 11:40 AM, Cédric Le Goater wrote:

On 03/05/2016 12:41 PM, Corey Minyard wrote:

On 03/02/2016 04:14 AM, Cédric Le Goater wrote:

Most IPMI command handlers in the BMC simulator start with a call to
the macro IPMI_CHECK_CMD_LEN() which verifies that a minimal number of
arguments expected by the command are indeed available. To achieve
this task, the macro implicitly uses local variables which is
misleading in the code.

This patch adds a 'cmd_len_min' attribute to the struct IPMICmdHandler
defining the minimal number of arguments expected by the command and
moves this check in the global command handler ipmi_sim_handle_command().

This is much better.  One style comment inline...

Acked-by: Corey Minyard 


Signed-off-by: Cédric Le Goater 
---
   hw/ipmi/ipmi_bmc_sim.c | 137 
++---
   1 file changed, 62 insertions(+), 75 deletions(-)

diff --git a/hw/ipmi/ipmi_bmc_sim.c b/hw/ipmi/ipmi_bmc_sim.c
index 51d234aa1bf2..30b9fb48ea2d 100644
--- a/hw/ipmi/ipmi_bmc_sim.c
+++ b/hw/ipmi/ipmi_bmc_sim.c
@@ -155,10 +155,15 @@ typedef struct IPMISensor {
   typedef struct IPMIBmcSim IPMIBmcSim;

   #define MAX_NETFNS 64
-typedef void (*IPMICmdHandler)(IPMIBmcSim *s,
-   uint8_t *cmd, unsigned int cmd_len,
-   uint8_t *rsp, unsigned int *rsp_len,
-   unsigned int max_rsp_len);
+
+typedef struct IPMICmdHandler {
+void (*cmd_handler)(IPMIBmcSim *s,
+uint8_t *cmd, unsigned int cmd_len,
+uint8_t *rsp, unsigned int *rsp_len,
+unsigned int max_rsp_len);
+unsigned int cmd_len_min;
+} IPMICmdHandler;
+
   typedef struct IPMINetfn {
   unsigned int cmd_nums;
   const IPMICmdHandler *cmd_handlers;
@@ -269,13 +274,6 @@ struct IPMIBmcSim {
   rsp[(*rsp_len)++] = (b);   \
   } while (0)

-/* Verify that the received command is a certain length. */
-#define IPMI_CHECK_CMD_LEN(l) \
-if (cmd_len < l) { \
-rsp[2] = IPMI_CC_REQUEST_DATA_LENGTH_INVALID;  \
-return; \
-}
-
   /* Check that the reservation in the command is valid. */
   #define IPMI_CHECK_RESERVATION(off, r) \
   do {   \
@@ -623,14 +621,19 @@ static void ipmi_sim_handle_command(IPMIBmc *b,

   /* Odd netfns are not valid, make sure the command is registered */
   if ((netfn & 1) || !ibs->netfns[netfn / 2] ||
-(cmd[1] >= ibs->netfns[netfn / 2]->cmd_nums) ||
-(!ibs->netfns[netfn / 2]->cmd_handlers[cmd[1]])) {
+(cmd[1] >= ibs->netfns[netfn / 2]->cmd_nums) ||
+(!ibs->netfns[netfn / 2]->cmd_handlers[cmd[1]].cmd_handler)) {
   rsp[2] = IPMI_CC_INVALID_CMD;
   goto out;

I'm not sure of the qemu style here, but the above change makes the code hard 
to read.

Cleary. I was lazy. I will try to clarify what this code does with a helper
or some intermediate variables.

How's that  ?

@@ -591,6 +589,7 @@ static void ipmi_sim_handle_command(IPMI
  unsigned int rsp_len_holder = 0;
  unsigned int *rsp_len = &rsp_len_holder;
  unsigned int max_rsp_len = sizeof(rsp);
+const IPMICmdHandler *hdl;
  
  /* Set up the response, set the low bit of NETFN. */

  /* Note that max_rsp_len must be at least 3 */
@@ -623,14 +622,23 @@ static void ipmi_sim_handle_command(IPMI
  
  /* Odd netfns are not valid, make sure the command is registered */

  if ((netfn & 1) || !ibs->netfns[netfn / 2] ||
-(cmd[1] >= ibs->netfns[netfn / 2]->cmd_nums) ||
-(!ibs->netfns[netfn / 2]->cmd_handlers[cmd[1]])) {
+(cmd[1] >= ibs->netfns[netfn / 2]->cmd_nums)) {
  rsp[2] = IPMI_CC_INVALID_CMD;
  goto out;
  }
  


This cleans things up a bit.  The curse of a 4 character indent is that
the line that follows a multi-line if statement is at the same indent as
the lines in the if statement, so it's hard to tell where the if statement
ends and where the code in the if statement begins.  This is not bad,
but if you have if statements like the gcc compiler (at least how the
gcc compiler used to be, haven't looked at it recently) it makes your
head hurt.

The Linux indent rule for this is that you indent the second and later
lines of an if statement way over, or you break it up.

I looked through a few of the major qemu files, and I couldn't find
a multi-line if statement, so it's hard to say.  That is perhaps telling :).

-corey


-ibs->netfns[netfn / 2]->cmd_handlers[cmd[1]](ibs, cmd, cmd_len, rsp, 
rsp_len,
-max_rsp_len);
+hdl = &ibs->netfns[netfn / 2]->cmd_handlers[cmd[1]];
+if (!hdl->cmd_handler) {
+rsp[2] = IPMI_CC_INVALID_CMD;
+goto ou

[Qemu-devel] [PATCH v11 3/3] qmp: add monitor command to add/remove a child

2016-03-08 Thread Changlong Xie

From: Wen Congyang 

The new QMP command name is x-blockdev-change. It's just for adding/removing
quorum's child now, and doesn't support all kinds of children, all kinds of
operations, nor all block drivers. So it is experimental now.

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
---
 blockdev.c   | 55 
 qapi/block-core.json | 32 ++
 qmp-commands.hx  | 54 +++
 3 files changed, 141 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 0f20c65..435631e 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -4017,6 +4017,61 @@ out:
 aio_context_release(aio_context);
 }
 
+static BdrvChild *bdrv_find_child(BlockDriverState *parent_bs,
+  const char *child_name)
+{
+BdrvChild *child;
+
+QLIST_FOREACH(child, &parent_bs->children, next) {
+if (strcmp(child->name, child_name) == 0) {
+return child;
+}
+}
+
+return NULL;
+}
+
+void qmp_x_blockdev_change(const char *parent, bool has_child,
+   const char *child, bool has_node,
+   const char *node, Error **errp)
+{
+BlockDriverState *parent_bs, *new_bs = NULL;
+BdrvChild *p_child;
+
+parent_bs = bdrv_lookup_bs(parent, parent, errp);
+if (!parent_bs) {
+return;
+}
+
+if (has_child == has_node) {
+if (has_child) {
+error_setg(errp, "The parameters child and node are in conflict");
+} else {
+error_setg(errp, "Either child or node must be specified");
+}
+return;
+}
+
+if (has_child) {
+p_child = bdrv_find_child(parent_bs, child);
+if (!p_child) {
+error_setg(errp, "Node '%s' does not have child '%s'",
+   parent, child);
+return;
+}
+bdrv_del_child(parent_bs, p_child, errp);
+}
+
+if (has_node) {
+new_bs = bdrv_find_node(node);
+if (!new_bs) {
+error_setg(errp, "Node '%s' not found", node);
+return;
+}
+bdrv_add_child(parent_bs, new_bs, errp);
+}
+}
+
 BlockJobInfoList *qmp_query_block_jobs(Error **errp)
 {
 BlockJobInfoList *head = NULL, **p_next = &head;
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 9bf1b22..bc3fd0b 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2548,3 +2548,35 @@
 ##
 { 'command': 'block-set-write-threshold',
   'data': { 'node-name': 'str', 'write-threshold': 'uint64' } }
+
+##
+# @x-blockdev-change
+#
+# Dynamically reconfigure the block driver state graph. It can be used
+# to add, remove, insert or replace a graph node. Currently only the
+# Quorum driver implements this feature to add or remove its child. This
+# is useful to fix a broken quorum child.
+#
+# If @node is specified, it will be inserted under @parent. @child
+# may not be specified in this case. If both @parent and @child are
+# specified but @node is not, @child will be detached from @parent.
+#
+# @parent: the id or name of the parent node.
+#
+# @child: #optional the name of a child under the given parent node.
+#
+# @node: #optional the name of the node that will be added.
+#
+# Note: this command is experimental, and its API is not stable. It
+# does not support all kinds of operations, all kinds of children, nor
+# all block drivers.
+#
+# Warning: The data in a new quorum child MUST be consistent with that of
+# the rest of the array.
+#
+# Since: 2.6
+##
+{ 'command': 'x-blockdev-change',
+  'data' : { 'parent': 'str',
+ '*child': 'str',
+ '*node': 'str' } }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index b629673..2a55135 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -4398,6 +4398,60 @@ Example:
 EQMP
 
 {
+.name   = "x-blockdev-change",
+.args_type  = "parent:B,child:B?,node:B?",
+.mhandler.cmd_new = qmp_marshal_x_blockdev_change,
+},
+
+SQMP
+x-blockdev-change
+-
+
+Dynamically reconfigure the block driver state graph. It can be used
+to add, remove, insert or replace a graph node. Currently only the
+Quorum driver implements this feature to add or remove its child. This
+is useful to fix a broken quorum child.
+
+If @node is specified, it will be inserted under @parent. @child
+may not be specified in this case. If both @parent and @child are
+specified but @node is not, @child will be detached from @parent.
+
+Arguments:
+- "parent": the id or name of the parent node (json-string)
+- "child": the name of a child under the given parent node (json-string, 
optional)
+- "node": the name of the node that will be added (json-string, optional)
+
+Note: this command is experimental, and not a stable API. It doesn't
+support all kinds of operations, all kinds of children, nor all block
+driv

[Qemu-devel] [PATCH v11 0/3] qapi: child add/delete support

2016-03-08 Thread Changlong Xie

ChangLog:
v10~v11:
1. Rebase to the newest codes
2. Address comment from Max
Don't use contractions in error messages,
p1: Remove R-Bs, and use "BdrvChild *child" in bdrv_del_child
p2: Fix error logic in get_new_child_index/remove_child_index, and prefect
child->name parsing
p3: Make bdrv_find_child return BdrvChild *, and add missing explanation

v9~v10:
1. Rebase to the newest codes
2. Address comments from Berto and Max, update the documents in block-core.json 
   and qmp-commands.hx 
3. Remove redundant codes in quorum_add_child() and quorum_del_child()
v8:
1. Rebase to the newest codes
2. Address the comments from Eric Blake
v7:
1. Remove the qmp command x-blockdev-change's parameter operation according
   to Kevin's comments.
2. Remove the hmp command.
v6:
1. Use a single qmp command x-blockdev-change to replace x-blockdev-child-add
   and x-blockdev-child-delete
v5:
1. Address Eric Blake's comments
v4:
1. drop nbd driver's implementation. We can use human-monitor-command
   to do it.
2. Rename the command name.
v3:
1. Don't open BDS in bdrv_add_child(). Use the existing BDS which is
   created by the QMP command blockdev-add.
2. The driver NBD can support filename, path, host:port now.
v2:
1. Use bdrv_get_device_or_node_name() instead of new function
   bdrv_get_id_or_node_name()
2. Update the error message
3. Update the documents in block-core.json

Wen Congyang (3):
  Add new block driver interface to add/delete a BDS's child
  quorum: implement bdrv_add_child() and bdrv_del_child()
  qmp: add monitor command to add/remove a child

 block.c   |  57 +++--
 block/quorum.c| 123 +-
 blockdev.c|  55 +
 include/block/block.h |   8 +++
 include/block/block_int.h |   5 ++
 qapi/block-core.json  |  32 
 qmp-commands.hx   |  54 
 7 files changed, 328 insertions(+), 6 deletions(-)

-- 
1.9.3

[Qemu-devel] [PATCH v11 1/3] Add new block driver interface to add/delete a BDS's child

2016-03-08 Thread Changlong Xie

From: Wen Congyang 

In some cases, we want to take a quorum child offline, and take
another child online.

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
---
 block.c   | 49 +++
 include/block/block.h |  4 
 include/block/block_int.h |  5 +
 3 files changed, 58 insertions(+)

diff --git a/block.c b/block.c
index ba24b8e..d48f441 100644
--- a/block.c
+++ b/block.c
@@ -4395,3 +4395,52 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 QDECREF(json);
 }
 }
+
+/*
+ * Hot add/remove a BDS's child. So the user can take a child offline when
+ * it is broken and take a new child online
+ */
+void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
+Error **errp)
+{
+
+if (!parent_bs->drv || !parent_bs->drv->bdrv_add_child) {
+error_setg(errp, "The node %s does not support adding a child",
+   bdrv_get_device_or_node_name(parent_bs));
+return;
+}
+
+if (!QLIST_EMPTY(&child_bs->parents)) {
+error_setg(errp, "The node %s already has a parent",
+   child_bs->node_name);
+return;
+}
+
+parent_bs->drv->bdrv_add_child(parent_bs, child_bs, errp);
+}
+
+void bdrv_del_child(BlockDriverState *parent_bs, BdrvChild *child, Error 
**errp)
+{
+BdrvChild *tmp;
+
+if (!parent_bs->drv || !parent_bs->drv->bdrv_del_child) {
+error_setg(errp, "The node %s does not support removing a child",
+   bdrv_get_device_or_node_name(parent_bs));
+return;
+}
+
+QLIST_FOREACH(tmp, &parent_bs->children, next) {
+if (tmp == child) {
+break;
+}
+}
+
+if (!tmp) {
+error_setg(errp, "The node %s does not have child named %s",
+   bdrv_get_device_or_node_name(parent_bs),
+   bdrv_get_device_or_node_name(child->bs));
+return;
+}
+
+parent_bs->drv->bdrv_del_child(parent_bs, child, errp);
+}
diff --git a/include/block/block.h b/include/block/block.h
index 1c4f4d8..7378e74 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -582,4 +582,8 @@ void bdrv_drained_begin(BlockDriverState *bs);
  */
 void bdrv_drained_end(BlockDriverState *bs);
 
+void bdrv_add_child(BlockDriverState *parent, BlockDriverState *child,
+Error **errp);
+void bdrv_del_child(BlockDriverState *parent, BdrvChild *child, Error **errp);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 9ef823a..704efe5 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -305,6 +305,11 @@ struct BlockDriver {
  */
 void (*bdrv_drain)(BlockDriverState *bs);
 
+void (*bdrv_add_child)(BlockDriverState *parent, BlockDriverState *child,
+   Error **errp);
+void (*bdrv_del_child)(BlockDriverState *parent, BdrvChild *child,
+   Error **errp);
+
 QLIST_ENTRY(BlockDriver) list;
 };
 
-- 
1.9.3

[Qemu-devel] [PATCH v11 2/3] quorum: implement bdrv_add_child() and bdrv_del_child()

2016-03-08 Thread Changlong Xie

From: Wen Congyang 

Signed-off-by: Wen Congyang 
Signed-off-by: zhanghailiang 
Signed-off-by: Gonglei 
Signed-off-by: Changlong Xie 
---
 block.c   |   8 ++--
 block/quorum.c| 123 +-
 include/block/block.h |   4 ++
 3 files changed, 129 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index d48f441..66d21af 100644
--- a/block.c
+++ b/block.c
@@ -1194,10 +1194,10 @@ static int bdrv_fill_options(QDict **options, const 
char *filename,
 return 0;
 }
 
-static BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
-BlockDriverState *child_bs,
-const char *child_name,
-const BdrvChildRole *child_role)
+BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
+ BlockDriverState *child_bs,
+ const char *child_name,
+ const BdrvChildRole *child_role)
 {
 BdrvChild *child = g_new(BdrvChild, 1);
 *child = (BdrvChild) {
diff --git a/block/quorum.c b/block/quorum.c
index 11cc60b..469e4a3 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -24,6 +24,7 @@
 #include "qapi/qmp/qstring.h"
 #include "qapi-event.h"
 #include "crypto/hash.h"
+#include "qemu/bitmap.h"
 
 #define HASH_LENGTH 32
 
@@ -81,6 +82,8 @@ typedef struct BDRVQuorumState {
 bool rewrite_corrupted;/* true if the driver must rewrite-on-read corrupted
 * block if Quorum is reached.
 */
+unsigned long *index_bitmap;
+int bsize;
 
 QuorumReadPattern read_pattern;
 } BDRVQuorumState;
@@ -877,9 +880,9 @@ static int quorum_open(BlockDriverState *bs, QDict 
*options, int flags,
 ret = -EINVAL;
 goto exit;
 }
-if (s->num_children < 2) {
+if (s->num_children < 1) {
 error_setg(&local_err,
-   "Number of provided children must be greater than 1");
+   "Number of provided children must be 1 or more");
 ret = -EINVAL;
 goto exit;
 }
@@ -928,6 +931,7 @@ static int quorum_open(BlockDriverState *bs, QDict 
*options, int flags,
 /* allocate the children array */
 s->children = g_new0(BdrvChild *, s->num_children);
 opened = g_new0(bool, s->num_children);
+s->index_bitmap = bitmap_new(s->num_children);
 
 for (i = 0; i < s->num_children; i++) {
 char indexstr[32];
@@ -943,6 +947,8 @@ static int quorum_open(BlockDriverState *bs, QDict 
*options, int flags,
 
 opened[i] = true;
 }
+bitmap_set(s->index_bitmap, 0, s->num_children);
+s->bsize = s->num_children;
 
 g_free(opened);
 goto exit;
@@ -999,6 +1005,116 @@ static void quorum_attach_aio_context(BlockDriverState 
*bs,
 }
 }
 
+static int get_new_child_index(BDRVQuorumState *s)
+{
+int index;
+
+index = find_next_zero_bit(s->index_bitmap, s->bsize, 0);
+if (index < s->bsize) {
+return index;
+}
+
+s->index_bitmap = bitmap_zero_extend(s->index_bitmap, s->bsize,
+ s->bsize + 1);
+return s->bsize++;
+}
+
+static void remove_child_index(BDRVQuorumState *s, int index)
+{
+int last_index, old_bsize;
+size_t new_len;
+
+assert(index < s->bsize);
+
+clear_bit(index, s->index_bitmap);
+if (index < s->bsize - 1) {
+/* The last bit is always set */
+return;
+}
+
+/* Clear last bit */
+old_bsize = s->bsize;
+last_index = find_last_bit(s->index_bitmap, s->bsize);
+assert(last_index < old_bsize);
+s->bsize = last_index + 1;
+
+if (BITS_TO_LONGS(old_bsize) == BITS_TO_LONGS(s->bsize)) {
+return;
+}
+
+new_len = BITS_TO_LONGS(s->bsize) * sizeof(unsigned long);
+s->index_bitmap = g_realloc(s->index_bitmap, new_len);
+}
+
+static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
+ Error **errp)
+{
+BDRVQuorumState *s = bs->opaque;
+BdrvChild *child;
+char indexstr[32];
+int index, ret;
+
+index = get_new_child_index(s);
+ret = snprintf(indexstr, 32, "children.%d", index);
+if (ret < 0 || ret >= 32) {
+error_setg(errp, "cannot generate child name");
+return;
+}
+
+bdrv_drain(bs);
+
+assert(s->num_children <= INT_MAX / sizeof(BdrvChild *));
+if (s->num_children == INT_MAX / sizeof(BdrvChild *)) {
+error_setg(errp, "Too many children");
+return;
+}
+s->children = g_renew(BdrvChild *, s->children, s->num_children + 1);
+
+bdrv_ref(child_bs);
+child = bdrv_attach_child(bs, child_bs, indexstr, &child_format);
+s->children[s->num_children++] = child;
+set_bit(index, s->index_bitmap);
+}
+
+static void quorum_del_child(BlockDriverState *bs, BdrvChild *child,
+ Error **errp)
+{
+BDRVQuorumState *s = bs->opaque;
+in

Re: [Qemu-devel] [PATCH v2 2/5] spapr: convert slot name property to numeric core and links

2016-03-08 Thread David Gibson

On Tue, Mar 08, 2016 at 02:18:12PM +0100, Igor Mammedov wrote:
> it's just a hack to get qiuck swith to numeric core id
> should be split and merged in patches
> introducing modified code.
> 
> Signed-off-by: Igor Mammedov 
> ---
>  hw/cpu/core.c   | 32 +++-
>  hw/ppc/spapr.c  | 39 ++-
>  hw/ppc/spapr_cpu_core.c | 25 ++---
>  include/hw/cpu/core.h   |  4 ++--
>  4 files changed, 29 insertions(+), 71 deletions(-)
> 
> diff --git a/hw/cpu/core.c b/hw/cpu/core.c
> index d8caf37..90a9408 100644
> --- a/hw/cpu/core.c
> +++ b/hw/cpu/core.c
> @@ -7,25 +7,39 @@
>   * See the COPYING file in the top-level directory.
>   */
>  #include "hw/cpu/core.h"
> +#include "qapi/visitor.h"
>  
> -static char *core_prop_get_slot(Object *obj, Error **errp)
> +static void core_prop_get_core(Object *obj, Visitor *v,
> +   const char *name, void *opaque,
> +   Error **errp)
>  {
> -CPUCore *core = CPU_CORE(obj);
> +CPUCore *cc = CPU_CORE(obj);
> +int64_t value = cc->core;
>  
> -return g_strdup(core->slot);
> +visit_type_int(v, name, &value, errp);
>  }
>  
> -static void core_prop_set_slot(Object *obj, const char *val, Error **errp)
> +static void core_prop_set_core(Object *obj, Visitor *v,
> +   const char *name, void *opaque,
> +   Error **errp)
>  {
> -CPUCore *core = CPU_CORE(obj);
> -
> -core->slot = g_strdup(val);
> +CPUCore *cc = CPU_CORE(obj);
> +Error *local_err = NULL;
> +int64_t value;
> +
> +visit_type_int(v, name, &value, &local_err);
> +if (local_err) {
> +error_propagate(errp, local_err);
> +return;
> +}
> +cc->core = value;
>  }
>  
>  static void cpu_core_instance_init(Object *obj)
>  {
> -object_property_add_str(obj, "slot", core_prop_get_slot, 
> core_prop_set_slot,
> -NULL);
> +object_property_add(obj, CPU_CORE_ID_PROP, "int",
> +core_prop_get_core, core_prop_set_core,
> +NULL, NULL, NULL);
>  }

Something we should clarify at some point: is the core property
intended to be globally unique on its own, or just unique in
combination with other information (e.g. socket and/or node).

>  static const TypeInfo cpu_core_type_info = {
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 6173c1b..6890a44 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1755,28 +1755,6 @@ static void spapr_validate_node_memory(MachineState 
> *machine, Error **errp)
>  }
>  }
>  
> -/*
> - * Check to see if core is being hot-plugged into an already populated slot.
> - */
> -static void spapr_cpu_core_allow_set_link(Object *obj, const char *name,
> -  Object *val, Error **errp)
> -{
> -Object *core = object_property_get_link(qdev_get_machine(), name, NULL);
> -
> -/*
> - * Allow the link to be unset when the core is unplugged.
> - */
> -if (!val) {
> -return;
> -}
> -
> -if (core) {
> -char *path = object_get_canonical_path(core);
> -error_setg(errp, "Slot %s already populated with %s", name, path);
> -g_free(path);
> -}
> -}
> -
>  /* pSeries LPAR / sPAPR hardware init */
>  static void ppc_spapr_init(MachineState *machine)
>  {
> @@ -1884,21 +1862,8 @@ static void ppc_spapr_init(MachineState *machine)
>  spapr->cores = g_new0(Object *, spapr_max_cores);
>  
>  for (i = 0; i < spapr_max_cores; i++) {
> -char name[32];
> -
> -/*
> - * Create links from machine objects to all possible cores.
> - */
> -snprintf(name, sizeof(name), "%s[%d]", SPAPR_MACHINE_CPU_CORE_PROP, 
> i);
> -object_property_add_link(OBJECT(spapr), name, TYPE_SPAPR_CPU_CORE,
> - (Object **)&spapr->cores[i],
> - spapr_cpu_core_allow_set_link,
> - OBJ_PROP_LINK_UNREF_ON_RELEASE,
> - &error_fatal);
> -
>  /*
> - * Create cores and set link from machine object to core object for
> - * boot time CPUs and realize them.
> + * Create cores for boot time CPUs and realize them.
>   */
>  if (i < spapr_cores) {
>  Object *core  = object_new(TYPE_SPAPR_CPU_CORE);
> @@ -1907,7 +1872,7 @@ static void ppc_spapr_init(MachineState *machine)
>  &error_fatal);
>  object_property_set_int(core, smp_threads, "nr_threads",
>  &error_fatal);
> -object_property_set_str(core, name, CPU_CORE_SLOT_PROP,
> +object_property_set_int(core, i, CPU_CORE_ID_PROP,
>  &error_fatal);

Not really important for this quick hack, but for spapr

Re: [Qemu-devel] [PATCH v2 1/5] QMP: add query-hotpluggable-cpus

2016-03-08 Thread David Gibson

On Tue, Mar 08, 2016 at 09:46:58AM -0700, Eric Blake wrote:
> On 03/08/2016 06:18 AM, Igor Mammedov wrote:
> > it will allow mgmt to query present and possible to hotplug
> 
> maybe s/possible to hotplug/hotpluggable/
> 
> > CPU objects, it is required from a target platform that
> > wish to support command to implement
> >  qmp_query_hotpluggable_cpus()
> > functioni, which will return a list of possible CPU objects
> 
> s/functioni/function/
> 
> > with options that would be needed for hotplugging possible
> > CPU objects.
> > 
> > For RFC there are:
> > 'type': 'str' - OQOM CPU object type for usage with device_add
> 
> s/OQOM/QOM/ ?
> 
> > 
> > and a set of optional fields that are to used for hotplugging
> > a CPU objects and would allows mgmt tools to know what/where
> > it could be hotplugged;
> > [node],[socket],[core],[thread]
> > 
> > For present CPUs there is a 'qom-path' field which
> > would allow mgmt inspect whatever object/abstraction
> 
> s/inspect/to inspect/
> 
> > the target platform considers as CPU object.
> > 
> > Signed-off-by: Igor Mammedov 
> > ---
> >  qapi-schema.json| 39 
> > +
> >  qmp-commands.hx | 34 
> >  stubs/Makefile.objs |  1 +
> >  stubs/qmp_query_hotpluggable_cpus.c |  9 +
> >  4 files changed, 83 insertions(+)
> >  create mode 100644 stubs/qmp_query_hotpluggable_cpus.c
> > 
> > diff --git a/qapi-schema.json b/qapi-schema.json
> > index 362c9d8..c59840d 100644
> > --- a/qapi-schema.json
> > +++ b/qapi-schema.json
> > @@ -4122,3 +4122,42 @@
> >  ##
> >  { 'enum': 'ReplayMode',
> >'data': [ 'none', 'record', 'play' ] }
> > +
> > +##
> > +# CpuInstanceProps
> 
> Worth spelling this as Properties instead of abbreviating?  But the type
> names aren't ABI (they don't affect introspection), so I'm not insisting.
> 
> > +#
> > +# @node: NUMA node ID the CPU belongs to, optional
> 
> Elsewhere, we use the tag '#optional', not 'optional, so that when we
> finally get Marc-Andre's patches for auto-generating docs, they will
> have a sane string to search for.
> 
> > +# @socket: socket number within node/board the CPU belongs to, optional
> > +# @core: core number within socket the CPU belongs to, optional
> > +# @thread: thread number within core the CPU belongs to, optional
> > +#
> > +# Since: 2.7
> 
> Ah, so you've already conceded that this is too much of a feature too
> late past 2.6 soft freeze.
> 
> > +{ 'struct': 'CpuInstanceProps',
> > +  'data': { '*node': 'int',
> > +'*socket': 'int',
> > +'*core': 'int',
> > +'*thread': 'int'
> > +  }
> > +}
> > +
> > +##
> > +# @HotpluggableCPU
> > +#
> > +# @type: CPU object type for usage with device_add command
> > +# @qom-path: link to existing CPU object if CPU is present or
> > +#omitted if CPU is not present.
> 
> Missing '#optional' marker.
> 
> > +# @props: list of properties to be used for hotplugging CPU
> 
> Is this always going to be present, or should it have an '#optional'
> marker?  Right now, it could be present but content-free, as in
> "props":{}, if that would help users realize that the particular CPU has
> no further tuning available for hotplug purposes.

I think it should be always present, even if it can be empty.

> > +#
> > +# Since: 2.7
> > +{ 'struct': 'HotpluggableCPU',
> > +  'data': { 'type': 'str',
> > +'*qom-path': 'str',
> > +'*props': 'CpuInstanceProps'
> > +  }
> > +}
> > +
> > +##
> > +# @query-hotpluggable-cpus
> > +#
> > +# Since: 2.6
> 
> Inconsistent - how can the command be 2.6 if the structures it uses are 2.7?
> 
> > +{ 'command': 'query-hotpluggable-cpus', 'returns': ['HotpluggableCPU'] }
> 
> > +Example for x86 target started with -smp 
> > 2,sockets=2,cores=1,threads=3,maxcpus=6:
> > +
> > +-> { "execute": "query-hotpluggable-cpus" }
> > +<- {"return": [
> > + {"core": 0, "socket": 1, "thread": 2}, "type": "qemu64-x86_64-cpu"},
> 
> Not valid JSON.  You probably meant:
> 
> {"return": [
>   { "props": {"core"...2}, "type"...},
> 
> > + {"core": 0, "socket": 0, "thread": 1}, "type": "qemu64-x86_64-cpu", 
> > "qom-path": "/machine/unattached/device[3]"},
> 
> It's okay to line-wrap, to keep 80-column lines.
> 
> > + {"core": 0, "socket": 0, "thread": 0}, "type": "qemu64-x86_64-cpu", 
> > "qom-path": "/machine/unattached/device[0]"}
> > +   ]}'
> > +
> > +Example for SPAPR target started with -smp 2,cores=2,maxcpus=4:
> > +
> > +-> { "execute": "query-hotpluggable-cpus" }
> > +<- {"return": [
> > + {"core": 1 }, "type": "spapr-cpu-core"},
> 
> Again, not valid JSON.
> 
> But useful examples.
> 



-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP

Re: [Qemu-devel] [PATCH 07/16] block: change drain to look only at one child at a time

2016-03-08 Thread Fam Zheng

On Tue, 02/16 18:56, Paolo Bonzini wrote:
> bdrv_requests_pending is checking children to also wait until internal
> requests (such as metadata writes) have completed.  However, checking
> children is in general overkill because, apart from this special case,
> the parent's in_flight count will always be incremented by at least one
> for every request in the child.

While the patch looks good, I'm not sure I understand this: aren't children's
requests generated by the child itself, how is parent's in_flight incremented?

> 
> Since internal requests are only generated by the parent in the child,
> instead visit the tree parent first, and then wait for internal I/O in
> the children to complete.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/io.c | 27 +++
>  1 file changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index a9a23a6..e0c9215 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -249,6 +249,23 @@ static void bdrv_drain_recurse(BlockDriverState *bs)
>  }
>  }
>  
> +static bool bdrv_drain_io_recurse(BlockDriverState *bs)
> +{
> +BdrvChild *child;
> +bool waited = false;
> +
> +while (atomic_read(&bs->in_flight) > 0) {
> +aio_poll(bdrv_get_aio_context(bs), true);
> +waited = true;
> +}
> +
> +QLIST_FOREACH(child, &bs->children, next) {
> +waited |= bdrv_drain_io_recurse(child->bs);
> +}
> +
> +return waited;
> +}
> +
>  /*
>   * Wait for pending requests to complete on a single BlockDriverState 
> subtree,
>   * and suspend block driver's internal I/O until next request arrives.
> @@ -265,10 +282,7 @@ void bdrv_drain(BlockDriverState *bs)
>  bdrv_no_throttling_begin(bs);
>  bdrv_io_unplugged_begin(bs);
>  bdrv_drain_recurse(bs);
> -while (bdrv_requests_pending(bs)) {
> -/* Keep iterating */
> - aio_poll(bdrv_get_aio_context(bs), true);
> -}
> +bdrv_drain_io_recurse(bs);
>  bdrv_io_unplugged_end(bs);
>  bdrv_no_throttling_end(bs);
>  }
> @@ -319,10 +333,7 @@ void bdrv_drain_all(void)
>  aio_context_acquire(aio_context);
>  while ((bs = bdrv_next(bs))) {
>  if (aio_context == bdrv_get_aio_context(bs)) {
> -if (bdrv_requests_pending(bs)) {
> -aio_poll(aio_context, true);
> -waited = true;
> -}
> +waited |= bdrv_drain_io_recurse(bs);
>  }
>  }
>  aio_context_release(aio_context);
> -- 
> 2.5.0
> 
> 
>

Re: [Qemu-devel] [PATCH 06/16] block: add BDS field to count in-flight requests

2016-03-08 Thread Fam Zheng

On Tue, 02/16 18:56, Paolo Bonzini wrote:
> Unlike tracked_requests, this field also counts throttled requests,
> and remains non-zero if an AIO operation needs a BH to be "really"
> completed.
> 
> With this change, it is no longer necessary to have a dummy
> BdrvTrackedRequest for requests that are never serialising, and
> it is no longer necessary to poll the AioContext once after
> bdrv_requests_pending(bs) returns false.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/io.c| 71 
> +++
>  block/mirror.c|  2 ++
>  include/block/block_int.h |  8 --
>  3 files changed, 54 insertions(+), 27 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index d8d50b7..a9a23a6 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -224,13 +224,7 @@ bool bdrv_requests_pending(BlockDriverState *bs)
>  {
>  BdrvChild *child;
>  
> -if (!QLIST_EMPTY(&bs->tracked_requests)) {
> -return true;
> -}
> -if (!qemu_co_queue_empty(&bs->throttled_reqs[0])) {
> -return true;
> -}
> -if (!qemu_co_queue_empty(&bs->throttled_reqs[1])) {
> +if (atomic_read(&bs->in_flight)) {
>  return true;
>  }
>  
> @@ -268,15 +262,12 @@ static void bdrv_drain_recurse(BlockDriverState *bs)
>   */
>  void bdrv_drain(BlockDriverState *bs)
>  {
> -bool busy = true;
> -
>  bdrv_no_throttling_begin(bs);
>  bdrv_io_unplugged_begin(bs);
>  bdrv_drain_recurse(bs);
> -while (busy) {
> +while (bdrv_requests_pending(bs)) {
>  /* Keep iterating */
> - busy = bdrv_requests_pending(bs);
> - busy |= aio_poll(bdrv_get_aio_context(bs), busy);
> + aio_poll(bdrv_get_aio_context(bs), true);
>  }
>  bdrv_io_unplugged_end(bs);
>  bdrv_no_throttling_end(bs);
> @@ -291,7 +282,7 @@ void bdrv_drain(BlockDriverState *bs)
>  void bdrv_drain_all(void)
>  {
>  /* Always run first iteration so any pending completion BHs run */
> -bool busy = true;
> +bool waited = true;
>  BlockDriverState *bs = NULL;
>  GSList *aio_ctxs = NULL, *ctx;
>  
> @@ -318,8 +309,8 @@ void bdrv_drain_all(void)
>   * request completion.  Therefore we must keep looping until there was no
>   * more activity rather than simply draining each device independently.
>   */
> -while (busy) {
> -busy = false;
> +while (waited) {
> +waited = false;
>  
>  for (ctx = aio_ctxs; ctx != NULL; ctx = ctx->next) {
>  AioContext *aio_context = ctx->data;
> @@ -329,12 +320,11 @@ void bdrv_drain_all(void)
>  while ((bs = bdrv_next(bs))) {
>  if (aio_context == bdrv_get_aio_context(bs)) {
>  if (bdrv_requests_pending(bs)) {
> -busy = true;
> -aio_poll(aio_context, busy);
> +aio_poll(aio_context, true);
> +waited = true;
>  }
>  }
>  }
> -busy |= aio_poll(aio_context, false);
>  aio_context_release(aio_context);
>  }
>  }
> @@ -457,6 +447,16 @@ static bool tracked_request_overlaps(BdrvTrackedRequest 
> *req,
>  return true;
>  }
>  
> +void bdrv_inc_in_flight(BlockDriverState *bs)
> +{
> +atomic_inc(&bs->in_flight);
> +}
> +
> +void bdrv_dec_in_flight(BlockDriverState *bs)
> +{
> +atomic_dec(&bs->in_flight);
> +}
> +
>  static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
>  {
>  BlockDriverState *bs = self->bs;
> @@ -963,6 +963,8 @@ static int coroutine_fn 
> bdrv_co_do_preadv(BlockDriverState *bs,
>  return ret;
>  }
>  
> +bdrv_inc_in_flight(bs);
> +
>  /* Don't do copy-on-read if we read data before write operation */
>  if (bs->copy_on_read && !(flags & BDRV_REQ_NO_SERIALISING)) {
>  flags |= BDRV_REQ_COPY_ON_READ;
> @@ -1003,6 +1005,7 @@ static int coroutine_fn 
> bdrv_co_do_preadv(BlockDriverState *bs,
>use_local_qiov ? &local_qiov : qiov,
>flags);
>  tracked_request_end(&req);
> +bdrv_dec_in_flight(bs);
>  
>  if (use_local_qiov) {
>  qemu_iovec_destroy(&local_qiov);
> @@ -1310,6 +1313,8 @@ static int coroutine_fn 
> bdrv_co_do_pwritev(BlockDriverState *bs,
>  return ret;
>  }
>  
> +bdrv_inc_in_flight(bs);
> +
>  /* throttling disk I/O */
>  if (bs->throttle_state) {
>  throttle_group_co_io_limits_intercept(bs, bytes, true);
> @@ -1408,6 +1413,7 @@ fail:
>  qemu_vfree(tail_buf);
>  out:
>  tracked_request_end(&req);
> +bdrv_dec_in_flight(bs);
>  return ret;
>  }
>  
> @@ -2065,6 +2071,7 @@ static const AIOCBInfo bdrv_em_aiocb_info = {
>  static void bdrv_aio_bh_cb(void *opaque)
>  {
>  BlockAIOCBSync *acb = opaque;
> +BlockDriverState *bs = acb->common.bs;
>  
>  if (!acb->is_write && acb->ret >= 0) {
>

Re: [Qemu-devel] [PATCH v15 8/9] Implement new driver for block replication

2016-03-08 Thread Changlong Xie


On 03/05/2016 01:39 AM, Stefan Hajnoczi wrote:

On Fri, Feb 05, 2016 at 12:18:07PM +0800, Changlong Xie wrote:

+static void replication_start(ReplicationState *rs, ReplicationMode mode,
+  Error **errp)
+{
+BlockDriverState *bs = rs->opaque;
+BDRVReplicationState *s = bs->opaque;
+int64_t active_length, hidden_length, disk_length;
+AioContext *aio_context;
+Error *local_err = NULL;
+
+if (s->replication_state != BLOCK_REPLICATION_NONE) {


Dereferencing bs and s is not thread-safe since the AioContext isn't
acquired here.  Unless you have other locking rules, the default is that
everything in BlockDriverState and bs->opaque must be accessed with the
AioContext acquired.

Please apply this comment to the rest of the patch series, too.
Functions that are not part of BlockDriver are probably called outside
the AioContext and must acquire it before they are allowed to access
anything else in bs.


+s->top_bs = get_top_bs(bs);
+if (!s->top_bs) {
+error_setg(errp, "Cannot get the top block driver state to do"
+   " internal backup");
+return;
+}


Traversing the BDS graph using the parent pointer is not safe - you
cannot stash the parent pointer because it changes if a parent node is
inserted/removed.

I suggest passing the drive name as an argument so that bdrv_lookup_bs()
can be used when installing the op blocker.  Then the BdrvChild.parent
pointer patch and get_top_bs() can be deleted.

Jeff Cody is currently working on a new op blocker system.  Hopefully
this system will allow you to install blockers on bs instead of on the
drive's top BDS.  But let's not worry about that for now and just use
the drive name as an argument.


+/*
+ * Must protect backup target if backup job was stopped/cancelled
+ * unexpectedly
+ */
+bdrv_ref(s->hidden_disk->bs);


Where is the matching bdrv_unref() call?


Two conditions
1. job is cancelled, so "local_err != 0"

449 if (local_err) {
450 error_propagate(errp, local_err);
451 backup_job_cleanup(s);
452 bdrv_unref(s->hidden_disk->bs);
453 return;
454 }

2. backup completed
backup_complete
bdrv_unref(s->target);

If i'm wrong, pls correct me.

Thanks
-Xie

Re: [Qemu-devel] [PATCH 1/8] qdict: fix unbounded stack for qdict_array_entries

2016-03-08 Thread Peter Xu

On Tue, Mar 08, 2016 at 08:04:50PM -0700, Eric Blake wrote:
> On 03/08/2016 07:57 PM, Peter Xu wrote:
> > diff --git a/qobject/qdict.c b/qobject/qdict.c
> > index 9833bd0..dde99e0 100644
> > --- a/qobject/qdict.c
> > +++ b/qobject/qdict.c
> > @@ -704,17 +704,16 @@ int qdict_array_entries(QDict *src, const char 
> > *subqdict)
> >  for (i = 0; i < INT_MAX; i++) {
> >  QObject *subqobj;
> >  int subqdict_entries;
> > -size_t slen = 32 + subqdict_len;
> > -char indexstr[slen], prefix[slen];
> > +char indexstr[128], prefix[128];
> >  size_t snprintf_ret;
> > 
> > -snprintf_ret = snprintf(indexstr, slen, "%s%u", subqdict, i);
> > -assert(snprintf_ret < slen);
> > +snprintf_ret = snprintf(indexstr, ARRAY_SIZE(indexstr), "%s%u", 
> > subqdict, i);
> > +assert(snprintf_ret < ARRAY_SIZE(indexstr));
> 
> sizeof(indexstr) works, and is a bit nicer than ARRAY_SIZE() when
> dealing with char.

Yes, will use it next time.

> 
> But I'm worried that this can trigger an abort() by someone hammering on
> the command line.  Just because we don't expect any QMP command to
> validate with a key name longer than 128 doesn't mean that we don't have
> to deal with a command line with a garbage key name that long.  What's
> wrong with just using g_strdup_printf() and heap-allocating the result,
> avoiding snprintf() and fixed lengths altogether?

Agreed.. And this should only be called once too when open the
quorum device AFAIK, and not critical path too, correct? If so, I'd
like to use g_strdup_printf() in next spin if np.

Thanks.
Peter

Re: [Qemu-devel] [PATCH 05/16] mirror: use bottom half to re-enter coroutine

2016-03-08 Thread Fam Zheng

On Tue, 02/16 18:56, Paolo Bonzini wrote:
> mirror is calling bdrv_drain from an AIO callback---more precisely,
> the bdrv_drain happens far away from the AIO callback, in the coroutine that
> the AIO callback enters.
> 
> This used to be okay because bdrv_drain more or less tried to guess
> when all AIO callbacks were done; however it will cause a deadlock once
> bdrv_drain really checks for AIO callbacks to be complete.  The situation
> here is admittedly underdefined, and Stefan pointed out that the same
> solution is found in many other places in the QEMU block layer, therefore
> I think this workaround is acceptable.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/mirror.c | 23 +++
>  1 file changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 2c0edfa..793c20c 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -71,6 +71,7 @@ typedef struct MirrorOp {
>  QEMUIOVector qiov;
>  int64_t sector_num;
>  int nb_sectors;
> +QEMUBH *co_enter_bh;
>  } MirrorOp;
>  
>  static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
> @@ -86,6 +87,18 @@ static BlockErrorAction mirror_error_action(MirrorBlockJob 
> *s, bool read,
>  }
>  }
>  
> +static void mirror_bh_cb(void *opaque)
> +{
> +MirrorOp *op = opaque;
> +MirrorBlockJob *s = op->s;
> +
> +qemu_bh_delete(op->co_enter_bh);
> +g_free(op);
> +if (s->waiting_for_io) {
> +qemu_coroutine_enter(s->common.co, NULL);
> +}
> +}
> +
>  static void mirror_iteration_done(MirrorOp *op, int ret)
>  {
>  MirrorBlockJob *s = op->s;
> @@ -116,11 +129,13 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
>  }
>  
>  qemu_iovec_destroy(&op->qiov);
> -g_free(op);
>  
> -if (s->waiting_for_io) {
> -qemu_coroutine_enter(s->common.co, NULL);
> -}
> +/* The I/O operation is not finished until the callback returns.
> + * If we call qemu_coroutine_enter here, there is the possibility
> + * of a deadlock when the coroutine calls bdrv_drained_begin.
> + */
> +op->co_enter_bh = qemu_bh_new(mirror_bh_cb, op);

Shouldn't this be aio_bh_new()?

Fam

> +qemu_bh_schedule(op->co_enter_bh);
>  }
>  
>  static void mirror_write_complete(void *opaque, int ret)
> -- 
> 2.5.0
> 
> 
>

Re: [Qemu-devel] [PATCH 2/8] block: fix unbounded stack for dump_qdict

2016-03-08 Thread Peter Xu

On Tue, Mar 08, 2016 at 01:17:03PM +0100, Paolo Bonzini wrote:
> 
> 
> On 08/03/2016 09:12, Markus Armbruster wrote:
> > I'm afraid this isn't a good idea.  It relies on the non-local argument
> > that nobody will ever put a key longer than 255 into a qdict that gets
> > dumped.  That may even be the case, but you need to *prove* it, not just
> > assert it.  The weakest acceptable proof might be assertions in every
> > place that put keys into a dict that might get dumped.  I suspect that's
> > practical and maintainable only if there's a single place that does it.
> > 
> > If this was a good idea, I'd recommend to avoid the awkward macro:
> > 
> >char key[256];
> >int i;
> >
> >assert(strlen(entry->key) + 1 <= ARRAY_SIZE(key));
> > 
> > There are several other ways to limit the stack usage:
> > 
> > 1. Move the array from stack to heap.  Fine unless it's on a hot path.
> >As far as I can tell, this dumping business is for HMP and qemu-io,
> >i.e. not hot.
> 
> I think this is the best.  You can just g_strdup, modify in place, print
> and free.

g_strdup() will bring one more loop? One to copy the strings, one
for replacing "-" to " ". Though I will first need to replace
g_malloc0() with g_malloc(), which seems more suitable here. :)

Thanks!
Peter

Re: [Qemu-devel] [PATCH 1/8] qdict: fix unbounded stack for qdict_array_entries

2016-03-08 Thread Eric Blake

On 03/08/2016 07:57 PM, Peter Xu wrote:
> On Tue, Mar 08, 2016 at 11:19:44AM +0100, Kevin Wolf wrote:
>> Am 08.03.2016 um 09:22 hat Markus Armbruster geschrieben:
>>> Same arguments as for PATCH 2, except here an argument on the maximum
>>> length of subqdict would probably be easier.
>>
>> Yes, these are constant string literals in all callers, including the
>> one non-test case in quorum.
>>
>> Let's simply assert a reasonable maximum for subqdict_length. The
>> minimum we need to allow with the existing callers is 9, and I expect
>> we'll never get keys longer than 16 characters.
> 
> Hi, Kevin, Markus,
> 
> The patch should be trying to do as mentioned above. To make it
> clearer, how about the following one:
> 
> diff --git a/qobject/qdict.c b/qobject/qdict.c
> index 9833bd0..dde99e0 100644
> --- a/qobject/qdict.c
> +++ b/qobject/qdict.c
> @@ -704,17 +704,16 @@ int qdict_array_entries(QDict *src, const char 
> *subqdict)
>  for (i = 0; i < INT_MAX; i++) {
>  QObject *subqobj;
>  int subqdict_entries;
> -size_t slen = 32 + subqdict_len;
> -char indexstr[slen], prefix[slen];
> +char indexstr[128], prefix[128];
>  size_t snprintf_ret;
> 
> -snprintf_ret = snprintf(indexstr, slen, "%s%u", subqdict, i);
> -assert(snprintf_ret < slen);
> +snprintf_ret = snprintf(indexstr, ARRAY_SIZE(indexstr), "%s%u", 
> subqdict, i);
> +assert(snprintf_ret < ARRAY_SIZE(indexstr));

sizeof(indexstr) works, and is a bit nicer than ARRAY_SIZE() when
dealing with char.

But I'm worried that this can trigger an abort() by someone hammering on
the command line.  Just because we don't expect any QMP command to
validate with a key name longer than 128 doesn't mean that we don't have
to deal with a command line with a garbage key name that long.  What's
wrong with just using g_strdup_printf() and heap-allocating the result,
avoiding snprintf() and fixed lengths altogether?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 2/8] block: fix unbounded stack for dump_qdict

2016-03-08 Thread Peter Xu

On Tue, Mar 08, 2016 at 02:47:31PM +0100, Markus Armbruster wrote:
> Fam Zheng  writes:
> > Also I think the double underscore identifiers are considered reserved in C,
> > no?
> 
> Correct.  C99 7.1.3 Reserved identifiers: All identifiers that begin
> with an underscore and either an uppercase letter or another underscore
> are always reserved for any use.
> 
> [...]

Fam, Markus,

Thanks to point out. Will fix.

Peter

Re: [Qemu-devel] [RFC PATCH v1 05/10] cpu: Abstract CPU core type

2016-03-08 Thread David Gibson

On Tue, Mar 08, 2016 at 10:11:17AM +0100, Igor Mammedov wrote:
> On Tue, 8 Mar 2016 14:57:10 +1100
> David Gibson  wrote:
> 
> > On Mon, Mar 07, 2016 at 11:40:11AM +0100, Igor Mammedov wrote:
> > > On Mon, 7 Mar 2016 14:01:55 +0530
> > > Bharata B Rao  wrote:
> > >   
> > > > On Mon, Mar 07, 2016 at 02:36:55PM +1100, David Gibson wrote:  
> > > > > On Fri, Mar 04, 2016 at 07:07:20PM +0100, Igor Mammedov wrote:
> > > > > > On Fri, 4 Mar 2016 16:32:53 +0530
> > > > > > Bharata B Rao  wrote:
> > > > > > 
> > > > > > > On Fri, Mar 04, 2016 at 11:38:45AM +0100, Igor Mammedov wrote:
> > > > > > > > On Fri,  4 Mar 2016 12:24:16 +0530
> > > > > > > > Bharata B Rao  wrote:
> > > > > > > >   
> > > > > > > > > Add an abstract CPU core type that could be used by machines 
> > > > > > > > > that want
> > > > > > > > > to define and hotplug CPUs in core granularity.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Bharata B Rao 
> > > > > > > > > ---
> > > > > > > > >  hw/cpu/Makefile.objs  |  1 +
> > > > > > > > >  hw/cpu/core.c | 44 
> > > > > > > > > 
> > > > > > > > >  include/hw/cpu/core.h | 30 ++
> > > > > > > > >  3 files changed, 75 insertions(+)
> > > > > > > > >  create mode 100644 hw/cpu/core.c
> > > > > > > > >  create mode 100644 include/hw/cpu/core.h
> > > > > > > > > 
> > > > > > > > > diff --git a/hw/cpu/Makefile.objs b/hw/cpu/Makefile.objs
> > > > > > > > > index 0954a18..942a4bb 100644
> > > > > > > > > --- a/hw/cpu/Makefile.objs
> > > > > > > > > +++ b/hw/cpu/Makefile.objs
> > > > > > > > > @@ -2,4 +2,5 @@ obj-$(CONFIG_ARM11MPCORE) += arm11mpcore.o
> > > > > > > > >  obj-$(CONFIG_REALVIEW) += realview_mpcore.o
> > > > > > > > >  obj-$(CONFIG_A9MPCORE) += a9mpcore.o
> > > > > > > > >  obj-$(CONFIG_A15MPCORE) += a15mpcore.o
> > > > > > > > > +obj-y += core.o
> > > > > > > > >  
> > > > > > > > > diff --git a/hw/cpu/core.c b/hw/cpu/core.c
> > > > > > > > > new file mode 100644
> > > > > > > > > index 000..d8caf37
> > > > > > > > > --- /dev/null
> > > > > > > > > +++ b/hw/cpu/core.c
> > > > > > > > > @@ -0,0 +1,44 @@
> > > > > > > > > +/*
> > > > > > > > > + * CPU core abstract device
> > > > > > > > > + *
> > > > > > > > > + * Copyright (C) 2016 Bharata B Rao 
> > > > > > > > > 
> > > > > > > > > + *
> > > > > > > > > + * This work is licensed under the terms of the GNU GPL, 
> > > > > > > > > version 2 or later.
> > > > > > > > > + * See the COPYING file in the top-level directory.
> > > > > > > > > + */
> > > > > > > > > +#include "hw/cpu/core.h"
> > > > > > > > > +
> > > > > > > > > +static char *core_prop_get_slot(Object *obj, Error **errp)
> > > > > > > > > +{
> > > > > > > > > +CPUCore *core = CPU_CORE(obj);
> > > > > > > > > +
> > > > > > > > > +return g_strdup(core->slot);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static void core_prop_set_slot(Object *obj, const char *val, 
> > > > > > > > > Error **errp)
> > > > > > > > > +{
> > > > > > > > > +CPUCore *core = CPU_CORE(obj);
> > > > > > > > > +
> > > > > > > > > +core->slot = g_strdup(val);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static void cpu_core_instance_init(Object *obj)
> > > > > > > > > +{
> > > > > > > > > +object_property_add_str(obj, "slot", core_prop_get_slot, 
> > > > > > > > > core_prop_set_slot,
> > > > > > > > > +NULL);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static const TypeInfo cpu_core_type_info = {
> > > > > > > > > +.name = TYPE_CPU_CORE,
> > > > > > > > > +.parent = TYPE_DEVICE,
> > > > > > > > > +.abstract = true,
> > > > > > > > > +.instance_size = sizeof(CPUCore),
> > > > > > > > > +.instance_init = cpu_core_instance_init,
> > > > > > > > > +};
> > > > > > > > > +
> > > > > > > > > +static void cpu_core_register_types(void)
> > > > > > > > > +{
> > > > > > > > > +type_register_static(&cpu_core_type_info);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +type_init(cpu_core_register_types)
> > > > > > > > > diff --git a/include/hw/cpu/core.h b/include/hw/cpu/core.h
> > > > > > > > > new file mode 100644
> > > > > > > > > index 000..2daa724
> > > > > > > > > --- /dev/null
> > > > > > > > > +++ b/include/hw/cpu/core.h
> > > > > > > > > @@ -0,0 +1,30 @@
> > > > > > > > > +/*
> > > > > > > > > + * CPU core abstract device
> > > > > > > > > + *
> > > > > > > > > + * Copyright (C) 2016 Bharata B Rao 
> > > > > > > > > 
> > > > > > > > > + *
> > > > > > > > > + * This work is licensed under the terms of the GNU GPL, 
> > > > > > > > > version 2 or later.
> > > > > > > > > + * See the COPYING file in the top-level directory.
> > > > > > > > > + */
> > > > > > > > > +#ifndef HW_CPU_CORE_H
> > > > > > > > > +#define HW_CPU_CORE_H
> > > > > > > > > +
> > > > > > > > > +#include "qemu/osdep.h"
> > > > > > > > > +#include "hw/qdev.h"
> > > > > > > > > +
> > > > > > > > > +#define TYPE_CPU_C

Re: [Qemu-devel] [RFC PATCH v1 08/10] spapr: CPU hotplug support

2016-03-08 Thread David Gibson

On Tue, Mar 08, 2016 at 10:37:08AM +0100, Igor Mammedov wrote:
> On Tue, 8 Mar 2016 15:27:39 +1100
> David Gibson  wrote:
> 
> > On Mon, Mar 07, 2016 at 11:59:42AM +0530, Bharata B Rao wrote:
> > > On Mon, Mar 07, 2016 at 02:49:06PM +1100, David Gibson wrote:  
> > > > On Fri, Mar 04, 2016 at 12:24:19PM +0530, Bharata B Rao wrote:  
> > > > > Set up device tree entries for the hotplugged CPU core and use the
> > > > > exising EPOW event infrastructure to send CPU hotplug notification to
> > > > > the guest.
> > > > > 
> > > > > Signed-off-by: Bharata B Rao 
> > > > > ---
> > > > >  hw/ppc/spapr.c  | 73 
> > > > > -
> > > > >  hw/ppc/spapr_cpu_core.c | 60 
> > > > > +
> > > > >  hw/ppc/spapr_events.c   |  3 ++
> > > > >  hw/ppc/spapr_rtas.c | 24 ++
> > > > >  include/hw/ppc/spapr.h  |  4 +++
> > > > >  include/hw/ppc/spapr_cpu_core.h |  2 ++
> > > > >  6 files changed, 165 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > > > index 5acb612..6c4ac50 100644
> > > > > --- a/hw/ppc/spapr.c
> > > > > +++ b/hw/ppc/spapr.c
> > > > > @@ -603,6 +603,18 @@ static void spapr_populate_cpu_dt(CPUState *cs, 
> > > > > void *fdt, int offset,
> > > > >  size_t page_sizes_prop_size;
> > > > >  uint32_t vcpus_per_socket = smp_threads * smp_cores;
> > > > >  uint32_t pft_size_prop[] = {0, cpu_to_be32(spapr->htab_shift)};
> > > > > +sPAPRMachineClass *smc = 
> > > > > SPAPR_MACHINE_GET_CLASS(qdev_get_machine());
> > > > > +sPAPRDRConnector *drc;
> > > > > +sPAPRDRConnectorClass *drck;
> > > > > +int drc_index;
> > > > > +
> > > > > +if (smc->dr_cpu_enabled) {
> > > > > +drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, 
> > > > > index);
> > > > > +g_assert(drc);
> > > > > +drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > > > +drc_index = drck->get_index(drc);
> > > > > +_FDT((fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", 
> > > > > drc_index)));
> > > > > +}
> > > > >  
> > > > >  /* Note: we keep CI large pages off for now because a 64K 
> > > > > capable guest
> > > > >   * provisioned with large pages might otherwise try to map a qemu
> > > > > @@ -987,6 +999,16 @@ static void spapr_finalize_fdt(sPAPRMachineState 
> > > > > *spapr,
> > > > >  _FDT(spapr_drc_populate_dt(fdt, 0, NULL, 
> > > > > SPAPR_DR_CONNECTOR_TYPE_LMB));
> > > > >  }
> > > > >  
> > > > > +if (smc->dr_cpu_enabled) {
> > > > > +int offset = fdt_path_offset(fdt, "/cpus");
> > > > > +ret = spapr_drc_populate_dt(fdt, offset, NULL,
> > > > > +SPAPR_DR_CONNECTOR_TYPE_CPU);
> > > > > +if (ret < 0) {
> > > > > +error_report("Couldn't set up CPU DR device tree 
> > > > > properties");
> > > > > +exit(1);
> > > > > +}
> > > > > +}
> > > > > +
> > > > >  _FDT((fdt_pack(fdt)));
> > > > >  
> > > > >  if (fdt_totalsize(fdt) > FDT_MAX_SIZE) {
> > > > > @@ -1181,7 +1203,7 @@ static void ppc_spapr_reset(void)
> > > > >  
> > > > >  }
> > > > >  
> > > > > -static void spapr_cpu_reset(void *opaque)
> > > > > +void spapr_cpu_reset(void *opaque)
> > > > >  {
> > > > >  sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> > > > >  PowerPCCPU *cpu = opaque;
> > > > > @@ -1622,6 +1644,8 @@ static void spapr_boot_set(void *opaque, const 
> > > > > char *boot_device,
> > > > >  void spapr_cpu_init(sPAPRMachineState *spapr, PowerPCCPU *cpu, Error 
> > > > > **errp)
> > > > >  {
> > > > >  CPUPPCState *env = &cpu->env;
> > > > > +CPUState *cs = CPU(cpu);
> > > > > +int i;
> > > > >  
> > > > >  /* Set time-base frequency to 512 MHz */
> > > > >  cpu_ppc_tb_init(env, TIMEBASE_FREQ);
> > > > > @@ -1646,6 +1670,14 @@ void spapr_cpu_init(sPAPRMachineState *spapr, 
> > > > > PowerPCCPU *cpu, Error **errp)
> > > > >  }
> > > > >  }
> > > > >  
> > > > > +/* Set NUMA node for the added CPUs  */
> > > > > +for (i = 0; i < nb_numa_nodes; i++) {
> > > > > +if (test_bit(cs->cpu_index, numa_info[i].node_cpu)) {
> > > > > +cs->numa_node = i;
> > > > > +break;
> > > > > +}
> > > > > +}
> > > > > +  
> > > > 
> > > > This hunk seems like it belongs in a different patch.  
> > > 
> > > It appears that this would be needed by other archs also to set the
> > > NUMA node for the hot-plugged CPU. How about make an API out of this
> > > and use this something like below ? Igor ?  
> > 
> > Is there a way we could put this in the the CPU thread initialization
> > itself?  Rather than requiring every platform to call a helper.
> I'd suggest hotplugable CPU entity to have 'node' property, like we have
> in pc-dimm.

Ok.  Do you think that makes sense for the base core type (which will
sometimes be a hotpl

Re: [Qemu-devel] [PATCH 1/8] qdict: fix unbounded stack for qdict_array_entries

2016-03-08 Thread Peter Xu

On Tue, Mar 08, 2016 at 11:19:44AM +0100, Kevin Wolf wrote:
> Am 08.03.2016 um 09:22 hat Markus Armbruster geschrieben:
> > Same arguments as for PATCH 2, except here an argument on the maximum
> > length of subqdict would probably be easier.
> 
> Yes, these are constant string literals in all callers, including the
> one non-test case in quorum.
> 
> Let's simply assert a reasonable maximum for subqdict_length. The
> minimum we need to allow with the existing callers is 9, and I expect
> we'll never get keys longer than 16 characters.

Hi, Kevin, Markus,

The patch should be trying to do as mentioned above. To make it
clearer, how about the following one:

diff --git a/qobject/qdict.c b/qobject/qdict.c
index 9833bd0..dde99e0 100644
--- a/qobject/qdict.c
+++ b/qobject/qdict.c
@@ -704,17 +704,16 @@ int qdict_array_entries(QDict *src, const char *subqdict)
 for (i = 0; i < INT_MAX; i++) {
 QObject *subqobj;
 int subqdict_entries;
-size_t slen = 32 + subqdict_len;
-char indexstr[slen], prefix[slen];
+char indexstr[128], prefix[128];
 size_t snprintf_ret;

-snprintf_ret = snprintf(indexstr, slen, "%s%u", subqdict, i);
-assert(snprintf_ret < slen);
+snprintf_ret = snprintf(indexstr, ARRAY_SIZE(indexstr), "%s%u", 
subqdict, i);
+assert(snprintf_ret < ARRAY_SIZE(indexstr));

 subqobj = qdict_get(src, indexstr);

-snprintf_ret = snprintf(prefix, slen, "%s%u.", subqdict, i);
-assert(snprintf_ret < slen);
+snprintf_ret = snprintf(prefix, ARRAY_SIZE(prefix), "%s%u.", subqdict, 
i);
+assert(snprintf_ret < ARRAY_SIZE(prefix));

 subqdict_entries = qdict_count_prefixed_entries(src, prefix);
 if (subqdict_entries < 0) {

Two assertions on the snprintf_ret should make sure we are safe,
right?

Thanks.
Peter

Re: [Qemu-devel] [PATCH 03/16] block: introduce bdrv_no_throttling_begin/end

2016-03-08 Thread Fam Zheng

On Tue, 02/16 18:56, Paolo Bonzini wrote:
> Extract the handling of throttling from bdrv_flush_io_queue.

Looks good overall. Have two questions below.

> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block.c   |  1 -
>  block/io.c| 56 
> +--
>  block/throttle-groups.c   |  4 
>  include/block/block_int.h |  6 ++---
>  4 files changed, 42 insertions(+), 25 deletions(-)
> 
> diff --git a/block.c b/block.c
> index efc3c43..b4f328f 100644
> --- a/block.c
> +++ b/block.c
> @@ -2314,7 +2314,6 @@ static void swap_feature_fields(BlockDriverState 
> *bs_top,
>  
>  assert(!bs_new->throttle_state);
>  if (bs_top->throttle_state) {
> -assert(bs_top->io_limits_enabled);
>  bdrv_io_limits_enable(bs_new, throttle_group_get_name(bs_top));
>  bdrv_io_limits_disable(bs_top);
>  }
> diff --git a/block/io.c b/block/io.c
> index 5ee5032..272caac 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -69,28 +69,43 @@ void bdrv_set_io_limits(BlockDriverState *bs,
>  throttle_group_config(bs, cfg);
>  }
>  
> -static void bdrv_start_throttled_reqs(BlockDriverState *bs)
> +static void bdrv_no_throttling_begin(BlockDriverState *bs)
>  {
> -bool enabled = bs->io_limits_enabled;
> +BdrvChild *child;
> +
> +QLIST_FOREACH(child, &bs->children, next) {
> +bdrv_no_throttling_begin(child->bs);
> +}
>  
> -bs->io_limits_enabled = false;
> -throttle_group_restart_bs(bs);
> -bs->io_limits_enabled = enabled;
> +if (bs->io_limits_disabled++ == 0) {
> +throttle_group_restart_bs(bs);
> +}
> +}
> +
> +static void bdrv_no_throttling_end(BlockDriverState *bs)
> +{
> +BdrvChild *child;
> +
> +--bs->io_limits_disabled;
> +
> +QLIST_FOREACH(child, &bs->children, next) {
> +bdrv_no_throttling_end(child->bs);
> +}
>  }
>  
>  void bdrv_io_limits_disable(BlockDriverState *bs)
>  {
> -bs->io_limits_enabled = false;
> -bdrv_start_throttled_reqs(bs);
> +assert(bs->throttle_state);
> +bdrv_no_throttling_begin(bs);
>  throttle_group_unregister_bs(bs);
> +bdrv_no_throttling_end(bs);
>  }
>  
>  /* should be called before bdrv_set_io_limits if a limit is set */
>  void bdrv_io_limits_enable(BlockDriverState *bs, const char *group)
>  {
> -assert(!bs->io_limits_enabled);
> +assert(!bs->throttle_state);
>  throttle_group_register_bs(bs, group);
> -bs->io_limits_enabled = true;
>  }
>  
>  void bdrv_io_limits_update_group(BlockDriverState *bs, const char *group)
> @@ -255,6 +270,7 @@ void bdrv_drain(BlockDriverState *bs)
>  {
>  bool busy = true;
>  
> +bdrv_no_throttling_begin(bs);
>  bdrv_drain_recurse(bs);
>  while (busy) {
>  /* Keep iterating */
> @@ -262,6 +278,7 @@ void bdrv_drain(BlockDriverState *bs)
>   busy = bdrv_requests_pending(bs);
>   busy |= aio_poll(bdrv_get_aio_context(bs), busy);
>  }
> +bdrv_no_throttling_end(bs);
>  }
>  
>  /*
> @@ -284,6 +301,7 @@ void bdrv_drain_all(void)
>  if (bs->job) {
>  block_job_pause(bs->job);
>  }
> +bdrv_no_throttling_begin(bs);
>  bdrv_drain_recurse(bs);
>  aio_context_release(aio_context);
>  
> @@ -325,6 +343,7 @@ void bdrv_drain_all(void)
>  AioContext *aio_context = bdrv_get_aio_context(bs);
>  
>  aio_context_acquire(aio_context);
> +bdrv_no_throttling_end(bs);
>  if (bs->job) {
>  block_job_resume(bs->job);
>  }
> @@ -555,11 +574,7 @@ static int bdrv_prwv_co(BlockDriverState *bs, int64_t 
> offset,
>   * will not fire; so the I/O throttling function has to be disabled here
>   * if it has been enabled.
>   */
> -if (bs->io_limits_enabled) {
> -fprintf(stderr, "Disabling I/O throttling on '%s' due "
> -"to synchronous I/O.\n", bdrv_get_device_name(bs));
> -bdrv_io_limits_disable(bs);
> -}
> +bdrv_no_throttling_begin(bs);
>  
>  if (qemu_in_coroutine()) {
>  /* Fast-path if already in coroutine context */
> @@ -573,6 +588,8 @@ static int bdrv_prwv_co(BlockDriverState *bs, int64_t 
> offset,
>  aio_poll(aio_context, true);
>  }
>  }
> +
> +bdrv_no_throttling_end(bs);

Does this change the behavior? There wasn't a bdrv_io_limits_enable() here, and
the throttle doesn't come back automatically. Just want to make sure it's
intended.

>  return rwco.ret;
>  }
>  
> @@ -608,13 +625,11 @@ int bdrv_read(BlockDriverState *bs, int64_t sector_num,
>  int bdrv_read_unthrottled(BlockDriverState *bs, int64_t sector_num,
>uint8_t *buf, int nb_sectors)
>  {
> -bool enabled;
>  int ret;
>  
> -enabled = bs->io_limits_enabled;
> -bs->io_limits_enabled = false;
> +bdrv_no_throttling_begin(bs);
>  ret = bdrv_read(bs, sector_num, buf, nb_sectors);
> -bs->io_limits_enabled = enabled;
> +bdrv_

Re: [Qemu-devel] [PATCHv3 1/7] vfio: Start improving VFIO/EEH interface

2016-03-08 Thread Alex Williamson

On Wed, 9 Mar 2016 11:56:57 +1100
David Gibson  wrote:

> On Tue, Mar 08, 2016 at 11:33:45AM -0700, Alex Williamson wrote:
> > On Tue,  8 Mar 2016 13:10:23 +1100
> > David Gibson  wrote:
> >   
> > > At present the code handling IBM's Enhanced Error Handling (EEH) interface
> > > on VFIO devices operates by bypassing the usual VFIO logic with
> > > vfio_container_ioctl().  That's a poorly designed interface with unclear
> > > semantics about exactly what can be operated on.
> > > 
> > > In particular it operates on a single vfio container internally (hence the
> > > name), but takes an address space and group id, from which it deduces the
> > > container in a rather roundabout way.  groupids are something that code
> > > outside vfio shouldn't even be aware of.
> > > 
> > > This patch creates new interfaces for EEH operations.  Internally we
> > > have vfio_eeh_container_op() which takes a VFIOContainer object
> > > directly.  For external use we have vfio_eeh_as_ok() which determines
> > > if an AddressSpace is usable for EEH (at present this means it has a
> > > single container with exactly one group attached), and vfio_eeh_as_op()
> > > which will perform an operation on an AddressSpace in the unambiguous 
> > > case,
> > > and otherwise returns an error.
> > > 
> > > This interface still isn't great, but it's enough of an improvement to
> > > allow a number of cleanups in other places.
> > > 
> > > Signed-off-by: David Gibson 
> > > Reviewed-by: Alexey Kardashevskiy 
> > > ---  
> > 
> > I'll let you push this through your tree:
> > 
> > Acked-by: Alex Williamson   
> 
> Thanks.  Any guess at when your vGPU series will be pushed?  Mine will
> conflict until that is merged upstream.

It's been out long enough, I'll send a pull request tomorrow.  Thanks,

Alex

Re: [Qemu-devel] [PULL 00/14] Net patches

2016-03-08 Thread Wen Congyang

On 03/08/2016 05:54 PM, Peter Maydell wrote:
> On 8 March 2016 at 16:06, Zhang Chen  wrote:
>> I found the reason for this problem is that
>> unix_connect() have not connect to sock_path before iov_send().
>> It need time to establish connection. so can we fix it with usleep()
>> like this:
>>
>> recv_sock = unix_connect(sock_path, NULL);
>> g_assert_cmpint(recv_sock, !=, -1);
>> +usleep(1000);
>>
>> ret = iov_send(send_sock[0], iov, 2, 0, sizeof(size) +
>> sizeof(send_buf));
>> g_assert_cmpint(ret, ==, sizeof(send_buf) + sizeof(size));
>> close(send_sock[0]);
>>
>> ret = qemu_recv(recv_sock, &len, sizeof(len), 0);
> 
> I would prefer it if we could find a way to fix this race
> reliably rather than just inserting a delay and hoping it
> is sufficient. Otherwise the test is likely to be unreliable
> if run on a heavily loaded or slow machine.

Yes, but there is no way to know when tcp_chr_accept() is called. Add a event
to notify it?

Thanks
Wen Congyang

> 
> thanks
> -- PMM
> 
> 
>

Re: [Qemu-devel] [PATCH v2 05/11] vfio: add check host bus reset is support or not

2016-03-08 Thread Chen Fan



On 03/09/2016 06:55 AM, Alex Williamson wrote:

On Mon, 7 Mar 2016 11:22:58 +0800
Cao jin  wrote:


From: Chen Fan 

when boot up a VM that assigning vfio devices with aer enabled, we
must check the vfio device whether support host bus reset. because
when one error occur. OS driver always recover the device by do a
bus reset, in order to recover the vfio device, qemu must to do a
host bus reset to reset the device to default status. and for all
affected devices by the bus reset. we must check them whether all
are assigned to the VM.

Signed-off-by: Chen Fan 
---
  hw/vfio/pci.c | 218 --
  hw/vfio/pci.h |   1 +
  2 files changed, 212 insertions(+), 7 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 8ec9b25..0898e34 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1868,6 +1868,197 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, 
uint8_t pos)
  return 0;
  }
  
+static bool vfio_pci_host_slot_match(PCIHostDeviceAddress *host1,

+ PCIHostDeviceAddress *host2)
+{
+return (host1->domain == host2->domain && host1->bus == host2->bus &&
+host1->slot == host2->slot);
+}
+
+static bool vfio_pci_host_match(PCIHostDeviceAddress *host1,
+PCIHostDeviceAddress *host2)
+{
+return (vfio_pci_host_slot_match(host1, host2) &&
+host1->function == host2->function);
+}
+
+struct VFIODeviceFind {
+PCIDevice *pdev;
+bool found;
+};
+
+static void vfio_check_device_noreset(PCIBus *bus, PCIDevice *pdev,
+  void *opaque)
+{
+DeviceState *dev = DEVICE(pdev);
+DeviceClass *dc = DEVICE_GET_CLASS(dev);
+VFIOPCIDevice *vdev;
+struct VFIODeviceFind *find = opaque;
+
+if (find->found) {
+return;
+}
+
+if (!object_dynamic_cast(OBJECT(dev), "vfio-pci")) {
+if (!dc->reset) {
+goto found;
+}
+return;
+}
+vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
+if (!(vdev->features & VFIO_FEATURE_ENABLE_AER) &&
+!vdev->vbasedev.reset_works) {
+goto found;
+}
+
+return;
+found:
+find->pdev = pdev;
+find->found = true;
+}
+
+static void vfio_check_host_bus_reset(VFIOPCIDevice *vdev, Error **errp)
+{
+PCIBus *bus = vdev->pdev.bus;
+struct vfio_pci_hot_reset_info *info = NULL;
+struct vfio_pci_dependent_device *devices;
+VFIOGroup *group;
+struct VFIODeviceFind find;
+int ret, i;
+
+ret = vfio_get_hot_reset_info(vdev, &info);
+if (ret) {
+error_setg(errp, "vfio: Cannot enable AER for device %s,"
+   " device does not support hot reset.",
+   vdev->vbasedev.name);
+return;
+}
+
+/* List all affected devices by bus reset */
+devices = &info->devices[0];
+
+/* Verify that we have all the groups required */
+for (i = 0; i < info->count; i++) {
+PCIHostDeviceAddress host;
+VFIOPCIDevice *tmp;
+VFIODevice *vbasedev_iter;
+bool found = false;
+
+host.domain = devices[i].segment;
+host.bus = devices[i].bus;
+host.slot = PCI_SLOT(devices[i].devfn);
+host.function = PCI_FUNC(devices[i].devfn);
+
+/* Skip the current device */
+if (vfio_pci_host_match(&host, &vdev->host)) {
+continue;
+}
+
+/* Ensure we own the group of the affected device */
+QLIST_FOREACH(group, &vfio_group_list, next) {
+if (group->groupid == devices[i].group_id) {
+break;
+}
+}
+
+if (!group) {
+error_setg(errp, "vfio: Cannot enable AER for device %s, "
+   "depends on group %d which is not owned.",
+   vdev->vbasedev.name, devices[i].group_id);
+goto out;
+}
+
+/* Ensure affected devices for reset on the same slot */
+QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+if (vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
+continue;
+}
+tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
+if (vfio_pci_host_match(&host, &tmp->host)) {
+/*
+ * AER errors may be broadcast to all functions of a multi-
+ * function endpoint.  If any of those sibling functions are
+ * also assigned, they need to have AER enabled or else an
+ * error may continue to cause a vm_stop condition.  IOW,
+ * AER setup of this function would be pointless.
+ */
+if (vfio_pci_host_slot_match(&vdev->host, &tmp->host) &&
+!(tmp->features & VFIO_FEATURE_ENABLE_AER)) {
+error_setg(errp, "vfio: Cannot enable AER for device %s, on 
same slot"
+   " the dependent device %s which does

Re: [Qemu-devel] [PATCH v2 04/11] vfio: add aer support for vfio device

2016-03-08 Thread Chen Fan



On 03/09/2016 06:55 AM, Alex Williamson wrote:

On Mon, 7 Mar 2016 11:22:57 +0800
Cao jin  wrote:


From: Chen Fan 

Calling pcie_aer_init to initilize aer related registers for
vfio device, then reload physical related registers to expose
device capability.

Signed-off-by: Chen Fan 
---
  hw/vfio/pci.c | 81 ---
  hw/vfio/pci.h |  3 +++
  2 files changed, 81 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index e64cce3..8ec9b25 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1868,6 +1868,62 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t 
pos)
  return 0;
  }
  
+static int vfio_setup_aer(VFIOPCIDevice *vdev, uint8_t cap_ver,

+  int pos, uint16_t size)
+{
+PCIDevice *pdev = &vdev->pdev;
+PCIDevice *dev_iter;
+uint8_t type;
+uint32_t errcap;
+
+if (!(vdev->features & VFIO_FEATURE_ENABLE_AER)) {
+pcie_add_capability(pdev, PCI_EXT_CAP_ID_ERR,
+cap_ver, pos, size);
+return 0;
+}
+
+dev_iter = pci_bridge_get_device(pdev->bus);
+if (!dev_iter) {
+goto error;
+}
+
+while (dev_iter) {
+type = pcie_cap_get_type(dev_iter);

This asserts if dev_iter doesn't have an express capability so do we
really ever get to the error goto in practice?  I think an average user
is going to get more information from your error_report than from an
assert.  Thanks,

you are right, I will fix it in the next version.

Thanks,
Chen



Alex


+if ((type != PCI_EXP_TYPE_ROOT_PORT &&
+ type != PCI_EXP_TYPE_UPSTREAM &&
+ type != PCI_EXP_TYPE_DOWNSTREAM)) {
+goto error;
+}
+
+if (!dev_iter->exp.aer_cap) {
+goto error;
+}
+
+dev_iter = pci_bridge_get_device(dev_iter->bus);
+}
+
+errcap = vfio_pci_read_config(pdev, pos + PCI_ERR_CAP, 4);
+/*
+ * The ability to record multiple headers is depending on
+ * the state of the Multiple Header Recording Capable bit and
+ * enabled by the Multiple Header Recording Enable bit.
+ */
+if ((errcap & PCI_ERR_CAP_MHRC) &&
+(errcap & PCI_ERR_CAP_MHRE)) {
+pdev->exp.aer_log.log_max = PCIE_AER_LOG_MAX_DEFAULT;
+} else {
+pdev->exp.aer_log.log_max = 0;
+}
+
+pcie_cap_deverr_init(pdev);
+return pcie_aer_init(pdev, pos, size);
+
+error:
+error_report("vfio: Unable to enable AER for device %s, parent bus "
+ "does not support AER signaling", vdev->vbasedev.name);
+return -1;
+}
+
  static int vfio_add_ext_cap(VFIOPCIDevice *vdev)
  {
  PCIDevice *pdev = &vdev->pdev;
@@ -1875,6 +1931,7 @@ static int vfio_add_ext_cap(VFIOPCIDevice *vdev)
  uint16_t cap_id, next, size;
  uint8_t cap_ver;
  uint8_t *config;
+int ret = 0;
  
  /*

   * pcie_add_capability always inserts the new capability at the tail
@@ -1898,16 +1955,29 @@ static int vfio_add_ext_cap(VFIOPCIDevice *vdev)
   */
  size = vfio_ext_cap_max_size(config, next);
  
-pcie_add_capability(pdev, cap_id, cap_ver, next, size);

-pci_set_long(dev->config + next, PCI_EXT_CAP(cap_id, cap_ver, 0));
+switch (cap_id) {
+case PCI_EXT_CAP_ID_ERR:
+ret = vfio_setup_aer(vdev, cap_ver, next, size);
+break;
+default:
+pcie_add_capability(pdev, cap_id, cap_ver, next, size);
+break;
+}
+
+if (ret) {
+goto out;
+}
+
+pci_set_long(pdev->config + next, PCI_EXT_CAP(cap_id, cap_ver, 0));
  
  /* Use emulated next pointer to allow dropping extended caps */

  pci_long_test_and_set_mask(vdev->emulated_config_bits + next,
 PCI_EXT_CAP_NEXT_MASK);
  }
  
+out:

  g_free(config);
-return 0;
+return ret;
  }
  
  static int vfio_add_capabilities(VFIOPCIDevice *vdev)

@@ -2662,6 +2732,11 @@ static int vfio_initfn(PCIDevice *pdev)
  goto out_teardown;
  }
  
+if ((vdev->features & VFIO_FEATURE_ENABLE_AER) &&

+!pdev->exp.aer_cap) {
+goto out_teardown;
+}
+
  /* QEMU emulates all of MSI & MSIX */
  if (pdev->cap_present & QEMU_PCI_CAP_MSIX) {
  memset(vdev->emulated_config_bits + pdev->msix_cap, 0xff,
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 6256587..e0c53f2 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -15,6 +15,7 @@
  #include "qemu-common.h"
  #include "exec/memory.h"
  #include "hw/pci/pci.h"
+#include "hw/pci/pci_bridge.h"
  #include "hw/vfio/vfio-common.h"
  #include "qemu/event_notifier.h"
  #include "qemu/queue.h"
@@ -128,6 +129,8 @@ typedef struct VFIOPCIDevice {
  #define VFIO_FEATURE_ENABLE_VGA (1 << VFIO_FEATURE_ENABLE_VGA_BIT)
  #define VFIO_FEATURE_ENABLE_REQ_BIT 1
  #define VFIO_FEATURE_ENABLE_REQ (1 << VFIO_FEATURE_ENABLE_REQ_BIT)
+#define VFIO_FEATURE_ENABLE_AER_BIT 2
+#

Re: [Qemu-devel] [PATCH 02/16] block: move restarting of throttled reqs to block/throttle-groups.c

2016-03-08 Thread Fam Zheng

On Tue, 02/16 18:56, Paolo Bonzini wrote:
> We want to remove throttled_reqs from block/io.c.  This is the easy
> part---hide the handling of throttled_reqs during disable/enable of
> throttling within throttle-groups.c.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/io.c  | 15 +--
>  block/throttle-groups.c | 15 +++
>  include/block/throttle-groups.h |  1 +
>  3 files changed, 17 insertions(+), 14 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index e58cfe2..5ee5032 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -66,28 +66,15 @@ static int coroutine_fn 
> bdrv_co_do_write_zeroes(BlockDriverState *bs,
>  void bdrv_set_io_limits(BlockDriverState *bs,
>  ThrottleConfig *cfg)
>  {
> -int i;
> -
>  throttle_group_config(bs, cfg);
> -
> -for (i = 0; i < 2; i++) {
> -qemu_co_enter_next(&bs->throttled_reqs[i]);
> -}
>  }
>  
>  static void bdrv_start_throttled_reqs(BlockDriverState *bs)
>  {
>  bool enabled = bs->io_limits_enabled;
> -int i;
>  
>  bs->io_limits_enabled = false;
> -
> -for (i = 0; i < 2; i++) {
> -while (qemu_co_enter_next(&bs->throttled_reqs[i])) {
> -;
> -}
> -}
> -
> +throttle_group_restart_bs(bs);
>  bs->io_limits_enabled = enabled;
>  }
>  
> diff --git a/block/throttle-groups.c b/block/throttle-groups.c
> index 4920e09..eccfc0d 100644
> --- a/block/throttle-groups.c
> +++ b/block/throttle-groups.c
> @@ -313,6 +313,17 @@ void coroutine_fn 
> throttle_group_co_io_limits_intercept(BlockDriverState *bs,
>  qemu_mutex_unlock(&tg->lock);
>  }
>  
> +void throttle_group_restart_bs(BlockDriverState *bs)
> +{
> +int i;
> +
> +for (i = 0; i < 2; i++) {
> +while (qemu_co_enter_next(&bs->throttled_reqs[i])) {
> +;
> +}
> +}
> +}
> +
>  /* Update the throttle configuration for a particular group. Similar
>   * to throttle_config(), but guarantees atomicity within the
>   * throttling group.
> @@ -335,6 +346,10 @@ void throttle_group_config(BlockDriverState *bs, 
> ThrottleConfig *cfg)
>  }
>  throttle_config(ts, tt, cfg);
>  qemu_mutex_unlock(&tg->lock);
> +
> +aio_context_acquire(bdrv_get_aio_context(bs));
> +throttle_group_restart_bs(bs);
> +aio_context_release(bdrv_get_aio_context(bs));

Could you explain why does this hunk belong to this patch?

Otherwise looks good.

Fam

>  }
>  
>  /* Get the throttle configuration from a particular group. Similar to
> diff --git a/include/block/throttle-groups.h b/include/block/throttle-groups.h
> index aba28f3..395f72d 100644
> --- a/include/block/throttle-groups.h
> +++ b/include/block/throttle-groups.h
> @@ -38,6 +38,7 @@ void throttle_group_get_config(BlockDriverState *bs, 
> ThrottleConfig *cfg);
>  
>  void throttle_group_register_bs(BlockDriverState *bs, const char *groupname);
>  void throttle_group_unregister_bs(BlockDriverState *bs);
> +void throttle_group_restart_bs(BlockDriverState *bs);
>  
>  void coroutine_fn throttle_group_co_io_limits_intercept(BlockDriverState *bs,
>  unsigned int bytes,
> -- 
> 2.5.0
> 
> 
>

Re: [Qemu-devel] [PATCH qemu] spapr-pci: Make MMIO spacing a machine property and increase it

2016-03-08 Thread David Gibson

On Tue, Mar 08, 2016 at 10:50:51AM +1100, Alexey Kardashevskiy wrote:
> On 03/04/2016 03:13 PM, Alexey Kardashevskiy wrote:
> >On 03/04/2016 02:39 PM, David Gibson wrote:
> >>On Thu, Mar 03, 2016 at 12:42:53PM +1100, Alexey Kardashevskiy wrote:
> >>>The pseries machine supports multiple PHBs. Each PHB's MMIO/IO space is
> >>>mapped to the CPU address space starting at SPAPR_PCI_WINDOW_BASE plus
> >>>some offset which is calculated from PHB's index and
> >>>SPAPR_PCI_WINDOW_SPACING which is defined now as 64GB.
> >>>
> >>>Since the default 32bit DMA window is using first 2GB of MMIO space,
> >>>the amount of MMIO which the PCI devices can actually use is reduced
> >>>to 62GB. This is a problem if the user wants to use devices with
> >>>huge BARs.
> >>>
> >>>For example, 2 PCI functions of a NVIDIA K80 adapter being passed through
> >>>will exceed this limit as they have 16M + 16G + 32M BARs which
> >>>(when aligned) will need 64GB.
> >>>
> >>>This converts SPAPR_PCI_WINDOW_BASE and SPAPR_PCI_WINDOW_SPACING to
> >>>sPAPRMachineState properties. This uses old values for pseries machine
> >>>before 2.6 and increases the spacing to 128GB so MMIO space becomes 126GB.
> >>>
> >>>This changes the default value of sPAPRPHBState::mem_win_size to -1 for
> >>>pseries-2.6 and adds setup to spapr_phb_realize.
> >>>
> >>>Signed-off-by: Alexey Kardashevskiy 
> >>
> >>So, in theory I dislike the spapr_pci device reaching into the machine
> >>type to get the spacing configuration.  But.. I don't know of a better
> >>way to achieve the desired outcome.
> >
> >
> >We could drop @index and spacing; and request the user to specify the MMIO
> >window start (at least) for every additional PHB.
> 
> So what is the decision? :)

There isn't one.  I really don't know how to handle this, trying to
talk to some people for ideas.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PULL 0/6] acpi: add fw_cfg device node to dsdt

2016-03-08 Thread Peter Maydell

On 8 March 2016 at 18:18, Gerd Hoffmann  wrote:
>   Hi,
>
> Finally going to merge this.  Figured the acpi test issue, I've turned
> off iasl in my build scripts a while ago, probably to workaround
> something, then forgot about it.  So with that fixed I could
> successfully update the acpi test data, which was the last thing
> blocking the merge.
>
> please pull,
>   Gerd
>
> The following changes since commit 97556fe80e4f7252300b3498b3477fb4295153a3:
>
>   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into 
> staging (2016-03-08 04:53:37 +)
>
> are available in the git repository at:
>
>
>   git://git.kraxel.org/qemu tags/pull-fw-cfg-20160308-1
>
> for you to fetch changes up to a60c7856088b75b402671de74bf9c5cfce87dfbb:
>
>   tests: update acpi test data (2016-03-08 12:15:27 +0100)
>
> 
> acpi: add fw_cfg device node to dsdt
>
> 

Applied, thanks.

-- PMM

Re: [Qemu-devel] [PATCHv3 1/7] vfio: Start improving VFIO/EEH interface

2016-03-08 Thread David Gibson

On Tue, Mar 08, 2016 at 11:33:45AM -0700, Alex Williamson wrote:
> On Tue,  8 Mar 2016 13:10:23 +1100
> David Gibson  wrote:
> 
> > At present the code handling IBM's Enhanced Error Handling (EEH) interface
> > on VFIO devices operates by bypassing the usual VFIO logic with
> > vfio_container_ioctl().  That's a poorly designed interface with unclear
> > semantics about exactly what can be operated on.
> > 
> > In particular it operates on a single vfio container internally (hence the
> > name), but takes an address space and group id, from which it deduces the
> > container in a rather roundabout way.  groupids are something that code
> > outside vfio shouldn't even be aware of.
> > 
> > This patch creates new interfaces for EEH operations.  Internally we
> > have vfio_eeh_container_op() which takes a VFIOContainer object
> > directly.  For external use we have vfio_eeh_as_ok() which determines
> > if an AddressSpace is usable for EEH (at present this means it has a
> > single container with exactly one group attached), and vfio_eeh_as_op()
> > which will perform an operation on an AddressSpace in the unambiguous case,
> > and otherwise returns an error.
> > 
> > This interface still isn't great, but it's enough of an improvement to
> > allow a number of cleanups in other places.
> > 
> > Signed-off-by: David Gibson 
> > Reviewed-by: Alexey Kardashevskiy 
> > ---
> 
> I'll let you push this through your tree:
> 
> Acked-by: Alex Williamson 

Thanks.  Any guess at when your vGPU series will be pushed?  Mine will
conflict until that is merged upstream.

> 
> >  hw/vfio/common.c   | 95 
> > ++
> >  include/hw/vfio/vfio.h |  2 ++
> >  2 files changed, 97 insertions(+)
> > 
> > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > index 96ccb79..0636bb1 100644
> > --- a/hw/vfio/common.c
> > +++ b/hw/vfio/common.c
> > @@ -1137,3 +1137,98 @@ int vfio_container_ioctl(AddressSpace *as, int32_t 
> > groupid,
> >  
> >  return vfio_container_do_ioctl(as, groupid, req, param);
> >  }
> > +
> > +/*
> > + * Interfaces for IBM EEH (Enhanced Error Handling)
> > + */
> > +static bool vfio_eeh_container_ok(VFIOContainer *container)
> > +{
> > +/*
> > + * As of 2016-03-04 (linux-4.5) the host kernel EEH/VFIO
> > + * implementation is broken if there are multiple groups in a
> > + * container.  The hardware works in units of Partitionable
> > + * Endpoints (== IOMMU groups) and the EEH operations naively
> > + * iterate across all groups in the container, without any logic
> > + * to make sure the groups have their state synchronized.  For
> > + * certain operations (ENABLE) that might be ok, until an error
> > + * occurs, but for others (GET_STATE) it's clearly broken.
> > + */
> > +
> > +/*
> > + * XXX Once fixed kernels exist, test for them here
> > + */
> > +
> > +if (QLIST_EMPTY(&container->group_list)) {
> > +return false;
> > +}
> > +
> > +if (QLIST_NEXT(QLIST_FIRST(&container->group_list), container_next)) {
> > +return false;
> > +}
> > +
> > +return true;
> > +}
> > +
> > +static int vfio_eeh_container_op(VFIOContainer *container, uint32_t op)
> > +{
> > +struct vfio_eeh_pe_op pe_op = {
> > +.argsz = sizeof(pe_op),
> > +.op = op,
> > +};
> > +int ret;
> > +
> > +if (!vfio_eeh_container_ok(container)) {
> > +error_report("vfio/eeh: EEH_PE_OP 0x%x: "
> > + "kernel requires a container with exactly one group", 
> > op);
> > +return -EPERM;
> > +}
> > +
> > +ret = ioctl(container->fd, VFIO_EEH_PE_OP, &pe_op);
> > +if (ret < 0) {
> > +error_report("vfio/eeh: EEH_PE_OP 0x%x failed: %m", op);
> > +return -errno;
> > +}
> > +
> > +return 0;
> > +}
> > +
> > +static VFIOContainer *vfio_eeh_as_container(AddressSpace *as)
> > +{
> > +VFIOAddressSpace *space = vfio_get_address_space(as);
> > +VFIOContainer *container = NULL;
> > +
> > +if (QLIST_EMPTY(&space->containers)) {
> > +/* No containers to act on */
> > +goto out;
> > +}
> > +
> > +container = QLIST_FIRST(&space->containers);
> > +
> > +if (QLIST_NEXT(container, next)) {
> > +/* We don't yet have logic to synchronize EEH state across
> > + * multiple containers */
> > +container = NULL;
> > +goto out;
> > +}
> > +
> > +out:
> > +vfio_put_address_space(space);
> > +return container;
> > +}
> > +
> > +bool vfio_eeh_as_ok(AddressSpace *as)
> > +{
> > +VFIOContainer *container = vfio_eeh_as_container(as);
> > +
> > +return (container != NULL) && vfio_eeh_container_ok(container);
> > +}
> > +
> > +int vfio_eeh_as_op(AddressSpace *as, uint32_t op)
> > +{
> > +VFIOContainer *container = vfio_eeh_as_container(as);
> > +
> > +if (!container) {
> > +return -ENODEV;
> > +}
> > +return vfio_eeh_container_op(container, op);
>

Re: [Qemu-devel] [PULL 0/1] rng: use simpleq instead of gslist

2016-03-08 Thread Peter Maydell

On 8 March 2016 at 17:54, Amit Shah  wrote:
> The following changes since commit 97556fe80e4f7252300b3498b3477fb4295153a3:
>
>   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into 
> staging (2016-03-08 04:53:37 +)
>
> are available in the git repository at:
>
>   https://git.kernel.org/pub/scm/virt/qemu/amit/virtio-rng.git 
> tags/rng-for-2.6-2
>
> for you to fetch changes up to 443590c2044968a97f5e7cddd35100c6075856a4:
>
>   rng: switch request queue to QSIMPLEQ (2016-03-08 12:54:14 +0530)
>
> 
> rng: use simpleq instead of gslist
>
> 
>
> Ladi Prosek (1):
>   rng: switch request queue to QSIMPLEQ
>

Applied, thanks.

-- PMM

Re: [Qemu-devel] [PATCH v2 3/3] vmdk: Switch to heap arrays for vmdk_parent_open

2016-03-08 Thread Fam Zheng

On Tue, 03/08 16:24, Fam Zheng wrote:
> Signed-off-by: Fam Zheng 
> ---
>  block/vmdk.c | 17 +++--
>  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/block/vmdk.c b/block/vmdk.c
> index c68f456..03be7f0 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -343,15 +343,16 @@ static int vmdk_reopen_prepare(BDRVReopenState *state,
>  static int vmdk_parent_open(BlockDriverState *bs)
>  {
>  char *p_name;
> -char desc[DESC_SIZE + 1];
> +char *desc;
>  BDRVVmdkState *s = bs->opaque;
>  int ret;
>  
> -desc[DESC_SIZE] = '\0';
> +desc = g_malloc0(DESC_SIZE + 1);
>  ret = bdrv_pread(bs->file->bs, s->desc_offset, desc, DESC_SIZE);
>  if (ret < 0) {
> -return ret;
> +goto out;
>  }
> +ret = 0;

Kevin, were you referring to this "ret = 0" in the cover letter? If so, it is
not useless, because ret was set to DESC_SIZE by bdrv_pread. :)

Fam

[Qemu-devel] [PULL] MAINTAINERS: Add Samuel Thibault as slirp maintainer

2016-03-08 Thread Samuel Thibault

The following changes since commit 1464ad45cd6cdeb0b5c1a54d3d3791396e47e52f:

  Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2016-03-04' into 
staging (2016-03-06 11:53:27 +)

are available in the git repository at:

  http://people.debian.org/~sthibault/qemu.git tags/samuel-thibault

for you to fetch changes up to eda509fa0ac7d4870b7ff206fd6c33cdd06c2d55:

  MAINTAINERS: Add Samuel Thibault as slirp maintainer (2016-03-08 21:39:04 
+0100)


Add Samuel Thibault as slirp maintainer


Samuel Thibault (1):
  MAINTAINERS: Add Samuel Thibault as slirp maintainer

 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

Re: [Qemu-devel] [PULL] MAINTAINERS: Add Samuel Thibault as slirp maintainer

2016-03-08 Thread Samuel Thibault

Peter Maydell, on Wed 09 Mar 2016 07:20:44 +0700, wrote:
> Shouldn't there be some Acked-by: lines on this?

Right, sent again.

Samuel

[Qemu-devel] [PULL] MAINTAINERS: Add Samuel Thibault as slirp maintainer

2016-03-08 Thread Samuel Thibault

From: Samuel Thibault 

Signed-off-by: Samuel Thibault 
Acked-by: Jan Kiszka 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2f5a338..a316bc3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1217,6 +1217,7 @@ F: scripts/qmp/
 T: git git://repo.or.cz/qemu/armbru.git qapi-next
 
 SLIRP
+M: Samuel Thibault 
 M: Jan Kiszka 
 S: Maintained
 F: slirp/
-- 
2.7.0

Re: [Qemu-devel] [PULL] MAINTAINERS: Add Samuel Thibault as slirp maintainer

2016-03-08 Thread Peter Maydell

On 8 March 2016 at 17:05, Samuel Thibault  wrote:
> From: Samuel Thibault 
>
> Signed-off-by: Samuel Thibault 
> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 2f5a338..a316bc3 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1217,6 +1217,7 @@ F: scripts/qmp/
>  T: git git://repo.or.cz/qemu/armbru.git qapi-next
>
>  SLIRP
> +M: Samuel Thibault 
>  M: Jan Kiszka 
>  S: Maintained
>  F: slirp/
> --
> 2.7.0

Shouldn't there be some Acked-by: lines on this?

(Sorry I missed that first time around.)

thanks
-- PMM

Re: [Qemu-devel] [PULL V2 00/12] Net patches

2016-03-08 Thread Peter Maydell

On 8 March 2016 at 14:52, Jason Wang  wrote:
> The following changes since commit 97556fe80e4f7252300b3498b3477fb4295153a3:
>
>   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into 
> staging (2016-03-08 04:53:37 +)
>
> are available in the git repository at:
>
>   https://github.com/jasowang/qemu.git tags/net-pull-request
>
> for you to fetch changes up to 362786f14a753d8a5256ef97d7c10ed576d6572b:
>
>   net: check packet payload length (2016-03-08 15:34:18 +0800)
>
> 
>
> - netfilter could be disabled and enabled through qom-set now
> - fix netfilter crash when specifiying wrong parameters
> - rocker switch now can allow user to specifiy world
> - fix OOB access for ne2000
>
> Changes from V1:
> - drop mirror from the pull request
>
> 

Applied, thanks.

-- PMM

Re: [Qemu-devel] [PATCH] loader: Fix incorrect parameter name in load_image_mr() macro

2016-03-08 Thread Laszlo Ersek

On 03/08/16 01:23, Peter Maydell wrote:
> From: Jens Wiklander 
> 
> Fix a typo in the load_image_mr() macro: 'mr' was written when
> the parameter name is '_mr'. (This had no visible effects since
> the single use of the macro used 'mr' as the argument.)
> 
> Fixes 76151cacfe956248a25b38b5e8429465584f47bb "loader: Add
> load_image_mr() to load ROM image to a MemoryRegion"
> 
> Signed-off-by: Jens Wiklander 
> Reviewed-by: Peter Maydell 
> [PMM: tweaked commit message]
> Signed-off-by: Peter Maydell 
> ---
> Oops...
> 
>  include/hw/loader.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/hw/loader.h b/include/hw/loader.h
> index 0ba7808..b3d1358 100644
> --- a/include/hw/loader.h
> +++ b/include/hw/loader.h
> @@ -137,7 +137,7 @@ void hmp_info_roms(Monitor *mon, const QDict *qdict);
>  #define rom_add_blob_fixed(_f, _b, _l, _a)  \
>  rom_add_blob(_f, _b, _l, _l, _a, NULL, NULL, NULL)
>  #define rom_add_file_mr(_f, _mr, _i)\
> -rom_add_file(_f, NULL, 0, _i, false, mr)
> +rom_add_file(_f, NULL, 0, _i, false, _mr)
>  
>  #define PC_ROM_MIN_VGA 0xc
>  #define PC_ROM_MIN_OPTION  0xc8000
> 

Reviewed-by: Laszlo Ersek

Re: [Qemu-devel] [PATCH 1/2] i386: Prepare for interrupt remapping

2016-03-08 Thread Eric Blake

On 03/08/2016 04:07 PM, Eric Blake wrote:
> On 03/08/2016 12:28 PM, Rita Sinha wrote:
>> From: Jan Kiszka 
>>
>> Introduce a DMA default target address space for PCI devices. Catch all
> 
> [meta-comment]: When sending a series, both messages should be
> In-Reply-To a 0/2 cover letter, rather than separate top-level threads.
> 

> I'll leave review of the patch proper to someone more familiar with the
> code.

Oh, and one more comment:

> Signed-off-by: Rita Sinha 

The email claims that 'Jan Kiszka' is the author; if that's true, then
Jan needs to also have an S-o-b.  If you are the author instead, then
you should fix your From: attribution.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 1/2] i386: Prepare for interrupt remapping

2016-03-08 Thread Eric Blake

On 03/08/2016 12:28 PM, Rita Sinha wrote:
> From: Jan Kiszka 
> 
> Introduce a DMA default target address space for PCI devices. Catch all

[meta-comment]: When sending a series, both messages should be
In-Reply-To a 0/2 cover letter, rather than separate top-level threads.

> interrupt requests to the front-side bus via an MSI memory region that
> is part of that address space. Provide separate address spaces for
> IOAPIC and HPET if the IOMMU is active to prepare for adding remapping
> logic later on. Deliver the EDID from the IOAPIC for the same reason.
> 
> Allows to remove the MSI hack from the APIC MMIO write handler.

Awkward grammar, would sound better as:

Allows us to remove the MSI hack...

or

Allows the removal of the MSI hack...

I'll leave review of the patch proper to someone more familiar with the
code.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v2 04/11] vfio: add aer support for vfio device

2016-03-08 Thread Alex Williamson

On Mon, 7 Mar 2016 11:22:57 +0800
Cao jin  wrote:

> From: Chen Fan 
> 
> Calling pcie_aer_init to initilize aer related registers for
> vfio device, then reload physical related registers to expose
> device capability.
> 
> Signed-off-by: Chen Fan 
> ---
>  hw/vfio/pci.c | 81 
> ---
>  hw/vfio/pci.h |  3 +++
>  2 files changed, 81 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index e64cce3..8ec9b25 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -1868,6 +1868,62 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, 
> uint8_t pos)
>  return 0;
>  }
>  
> +static int vfio_setup_aer(VFIOPCIDevice *vdev, uint8_t cap_ver,
> +  int pos, uint16_t size)
> +{
> +PCIDevice *pdev = &vdev->pdev;
> +PCIDevice *dev_iter;
> +uint8_t type;
> +uint32_t errcap;
> +
> +if (!(vdev->features & VFIO_FEATURE_ENABLE_AER)) {
> +pcie_add_capability(pdev, PCI_EXT_CAP_ID_ERR,
> +cap_ver, pos, size);
> +return 0;
> +}
> +
> +dev_iter = pci_bridge_get_device(pdev->bus);
> +if (!dev_iter) {
> +goto error;
> +}
> +
> +while (dev_iter) {
> +type = pcie_cap_get_type(dev_iter);

This asserts if dev_iter doesn't have an express capability so do we
really ever get to the error goto in practice?  I think an average user
is going to get more information from your error_report than from an
assert.  Thanks,

Alex

> +if ((type != PCI_EXP_TYPE_ROOT_PORT &&
> + type != PCI_EXP_TYPE_UPSTREAM &&
> + type != PCI_EXP_TYPE_DOWNSTREAM)) {
> +goto error;
> +}
> +
> +if (!dev_iter->exp.aer_cap) {
> +goto error;
> +}
> +
> +dev_iter = pci_bridge_get_device(dev_iter->bus);
> +}
> +
> +errcap = vfio_pci_read_config(pdev, pos + PCI_ERR_CAP, 4);
> +/*
> + * The ability to record multiple headers is depending on
> + * the state of the Multiple Header Recording Capable bit and
> + * enabled by the Multiple Header Recording Enable bit.
> + */
> +if ((errcap & PCI_ERR_CAP_MHRC) &&
> +(errcap & PCI_ERR_CAP_MHRE)) {
> +pdev->exp.aer_log.log_max = PCIE_AER_LOG_MAX_DEFAULT;
> +} else {
> +pdev->exp.aer_log.log_max = 0;
> +}
> +
> +pcie_cap_deverr_init(pdev);
> +return pcie_aer_init(pdev, pos, size);
> +
> +error:
> +error_report("vfio: Unable to enable AER for device %s, parent bus "
> + "does not support AER signaling", vdev->vbasedev.name);
> +return -1;
> +}
> +
>  static int vfio_add_ext_cap(VFIOPCIDevice *vdev)
>  {
>  PCIDevice *pdev = &vdev->pdev;
> @@ -1875,6 +1931,7 @@ static int vfio_add_ext_cap(VFIOPCIDevice *vdev)
>  uint16_t cap_id, next, size;
>  uint8_t cap_ver;
>  uint8_t *config;
> +int ret = 0;
>  
>  /*
>   * pcie_add_capability always inserts the new capability at the tail
> @@ -1898,16 +1955,29 @@ static int vfio_add_ext_cap(VFIOPCIDevice *vdev)
>   */
>  size = vfio_ext_cap_max_size(config, next);
>  
> -pcie_add_capability(pdev, cap_id, cap_ver, next, size);
> -pci_set_long(dev->config + next, PCI_EXT_CAP(cap_id, cap_ver, 0));
> +switch (cap_id) {
> +case PCI_EXT_CAP_ID_ERR:
> +ret = vfio_setup_aer(vdev, cap_ver, next, size);
> +break;
> +default:
> +pcie_add_capability(pdev, cap_id, cap_ver, next, size);
> +break;
> +}
> +
> +if (ret) {
> +goto out;
> +}
> +
> +pci_set_long(pdev->config + next, PCI_EXT_CAP(cap_id, cap_ver, 0));
>  
>  /* Use emulated next pointer to allow dropping extended caps */
>  pci_long_test_and_set_mask(vdev->emulated_config_bits + next,
> PCI_EXT_CAP_NEXT_MASK);
>  }
>  
> +out:
>  g_free(config);
> -return 0;
> +return ret;
>  }
>  
>  static int vfio_add_capabilities(VFIOPCIDevice *vdev)
> @@ -2662,6 +2732,11 @@ static int vfio_initfn(PCIDevice *pdev)
>  goto out_teardown;
>  }
>  
> +if ((vdev->features & VFIO_FEATURE_ENABLE_AER) &&
> +!pdev->exp.aer_cap) {
> +goto out_teardown;
> +}
> +
>  /* QEMU emulates all of MSI & MSIX */
>  if (pdev->cap_present & QEMU_PCI_CAP_MSIX) {
>  memset(vdev->emulated_config_bits + pdev->msix_cap, 0xff,
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index 6256587..e0c53f2 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -15,6 +15,7 @@
>  #include "qemu-common.h"
>  #include "exec/memory.h"
>  #include "hw/pci/pci.h"
> +#include "hw/pci/pci_bridge.h"
>  #include "hw/vfio/vfio-common.h"
>  #include "qemu/event_notifier.h"
>  #include "qemu/queue.h"
> @@ -128,6 +129,8 @@ typedef struct VFIOPCIDevice {
>  #define VFIO_FEATURE_ENABLE_VGA (1 << VFIO_FEATURE_ENABLE_VGA_BIT)
>  #define VFIO_FEAT

Re: [Qemu-devel] [PATCH v2 05/11] vfio: add check host bus reset is support or not

2016-03-08 Thread Alex Williamson

On Mon, 7 Mar 2016 11:22:58 +0800
Cao jin  wrote:

> From: Chen Fan 
> 
> when boot up a VM that assigning vfio devices with aer enabled, we
> must check the vfio device whether support host bus reset. because
> when one error occur. OS driver always recover the device by do a
> bus reset, in order to recover the vfio device, qemu must to do a
> host bus reset to reset the device to default status. and for all
> affected devices by the bus reset. we must check them whether all
> are assigned to the VM.
> 
> Signed-off-by: Chen Fan 
> ---
>  hw/vfio/pci.c | 218 
> --
>  hw/vfio/pci.h |   1 +
>  2 files changed, 212 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 8ec9b25..0898e34 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -1868,6 +1868,197 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, 
> uint8_t pos)
>  return 0;
>  }
>  
> +static bool vfio_pci_host_slot_match(PCIHostDeviceAddress *host1,
> + PCIHostDeviceAddress *host2)
> +{
> +return (host1->domain == host2->domain && host1->bus == host2->bus &&
> +host1->slot == host2->slot);
> +}
> +
> +static bool vfio_pci_host_match(PCIHostDeviceAddress *host1,
> +PCIHostDeviceAddress *host2)
> +{
> +return (vfio_pci_host_slot_match(host1, host2) &&
> +host1->function == host2->function);
> +}
> +
> +struct VFIODeviceFind {
> +PCIDevice *pdev;
> +bool found;
> +};
> +
> +static void vfio_check_device_noreset(PCIBus *bus, PCIDevice *pdev,
> +  void *opaque)
> +{
> +DeviceState *dev = DEVICE(pdev);
> +DeviceClass *dc = DEVICE_GET_CLASS(dev);
> +VFIOPCIDevice *vdev;
> +struct VFIODeviceFind *find = opaque;
> +
> +if (find->found) {
> +return;
> +}
> +
> +if (!object_dynamic_cast(OBJECT(dev), "vfio-pci")) {
> +if (!dc->reset) {
> +goto found;
> +}
> +return;
> +}
> +vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> +if (!(vdev->features & VFIO_FEATURE_ENABLE_AER) &&
> +!vdev->vbasedev.reset_works) {
> +goto found;
> +}
> +
> +return;
> +found:
> +find->pdev = pdev;
> +find->found = true;
> +}
> +
> +static void vfio_check_host_bus_reset(VFIOPCIDevice *vdev, Error **errp)
> +{
> +PCIBus *bus = vdev->pdev.bus;
> +struct vfio_pci_hot_reset_info *info = NULL;
> +struct vfio_pci_dependent_device *devices;
> +VFIOGroup *group;
> +struct VFIODeviceFind find;
> +int ret, i;
> +
> +ret = vfio_get_hot_reset_info(vdev, &info);
> +if (ret) {
> +error_setg(errp, "vfio: Cannot enable AER for device %s,"
> +   " device does not support hot reset.",
> +   vdev->vbasedev.name);
> +return;
> +}
> +
> +/* List all affected devices by bus reset */
> +devices = &info->devices[0];
> +
> +/* Verify that we have all the groups required */
> +for (i = 0; i < info->count; i++) {
> +PCIHostDeviceAddress host;
> +VFIOPCIDevice *tmp;
> +VFIODevice *vbasedev_iter;
> +bool found = false;
> +
> +host.domain = devices[i].segment;
> +host.bus = devices[i].bus;
> +host.slot = PCI_SLOT(devices[i].devfn);
> +host.function = PCI_FUNC(devices[i].devfn);
> +
> +/* Skip the current device */
> +if (vfio_pci_host_match(&host, &vdev->host)) {
> +continue;
> +}
> +
> +/* Ensure we own the group of the affected device */
> +QLIST_FOREACH(group, &vfio_group_list, next) {
> +if (group->groupid == devices[i].group_id) {
> +break;
> +}
> +}
> +
> +if (!group) {
> +error_setg(errp, "vfio: Cannot enable AER for device %s, "
> +   "depends on group %d which is not owned.",
> +   vdev->vbasedev.name, devices[i].group_id);
> +goto out;
> +}
> +
> +/* Ensure affected devices for reset on the same slot */
> +QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
> +if (vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
> +continue;
> +}
> +tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
> +if (vfio_pci_host_match(&host, &tmp->host)) {
> +/*
> + * AER errors may be broadcast to all functions of a multi-
> + * function endpoint.  If any of those sibling functions are
> + * also assigned, they need to have AER enabled or else an
> + * error may continue to cause a vm_stop condition.  IOW,
> + * AER setup of this function would be pointless.
> + */
> +if (vfio_pci_host_slot_match(&vdev->host, &tmp->host) &&
> +

[Qemu-devel] [PATCH 2/2] i386: Interrupt remapping support for VT-d

2016-03-08 Thread Rita Sinha

From: Jan Kiszka 

Still a bit hacky, unconditionally enabled (must become opt-in, not
available with in-kernel irqchip), not reporting faults properly - but
it works! And revealed a Linux bug [1]

[1] http://thread.gmane.org/gmane.linux.kernel/1766261

Signed-off-by: Rita Sinha 
---
 hw/i386/acpi-build.c   |  28 ++-
 hw/i386/intel_iommu.c  | 162 -
 hw/i386/intel_iommu_internal.h |  27 +++
 hw/intc/apic.c |   1 +
 hw/pci-host/q35.c  |  11 +++
 include/hw/acpi/acpi-defs.h|  22 ++
 include/hw/i386/intel_iommu.h  |   7 ++
 7 files changed, 252 insertions(+), 6 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 52c9470..ef43122 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -68,6 +68,9 @@
 
 #define ACPI_BUILD_TABLE_SIZE 0x2
 
+#define ACPI_BUILD_IOAPIC_ID  0x0
+#define ACPI_BUILD_HPET_ID0x0
+
 /* #define DEBUG_ACPI_BUILD */
 #ifdef DEBUG_ACPI_BUILD
 #define ACPI_BUILD_DPRINTF(fmt, ...)\
@@ -392,7 +395,6 @@ build_madt(GArray *table_data, GArray *linker, AcpiCpuInfo 
*cpu)
 io_apic = acpi_data_push(table_data, sizeof *io_apic);
 io_apic->type = ACPI_APIC_IO;
 io_apic->length = sizeof(*io_apic);
-#define ACPI_BUILD_IOAPIC_ID 0x0
 io_apic->io_apic_id = ACPI_BUILD_IOAPIC_ID;
 io_apic->address = cpu_to_le32(IO_APIC_DEFAULT_ADDRESS);
 io_apic->interrupt = cpu_to_le32(0);
@@ -2302,6 +2304,7 @@ build_hpet(GArray *table_data, GArray *linker)
  */
 hpet->timer_block_id = cpu_to_le32(0x8086a201);
 hpet->addr.address = cpu_to_le64(HPET_BASE);
+hpet->hpet_number = ACPI_BUILD_HPET_ID;
 build_header(linker, table_data,
  (void *)hpet, "HPET", sizeof(*hpet), 1, NULL, NULL);
 }
@@ -2496,19 +2499,38 @@ build_dmar_q35(GArray *table_data, GArray *linker)
 
 AcpiTableDmar *dmar;
 AcpiDmarHardwareUnit *drhd;
+AcpiDmarDeviceScope *dev_scope;
 
 dmar = acpi_data_push(table_data, sizeof(*dmar));
 dmar->host_address_width = VTD_HOST_ADDRESS_WIDTH - 1;
-dmar->flags = 0;/* No intr_remap for now */
+dmar->flags = ACPI_DMAR_INTR_REMAP;
 
 /* DMAR Remapping Hardware Unit Definition structure */
 drhd = acpi_data_push(table_data, sizeof(*drhd));
 drhd->type = cpu_to_le16(ACPI_DMAR_TYPE_HARDWARE_UNIT);
-drhd->length = cpu_to_le16(sizeof(*drhd));   /* No device scope now */
+drhd->length = cpu_to_le16(sizeof(*drhd) + (sizeof(*dev_scope) + 2) * 2);
 drhd->flags = ACPI_DMAR_INCLUDE_PCI_ALL;
 drhd->pci_segment = cpu_to_le16(0);
 drhd->address = cpu_to_le64(Q35_HOST_BRIDGE_IOMMU_ADDR);
 
+/* Device Scope structures for IOAPIC */
+dev_scope = acpi_data_push(table_data, sizeof(*dev_scope) + 2);
+dev_scope->type = ACPI_DMAR_SCOPE_TYPE_IOAPIC;
+dev_scope->length = sizeof(*dev_scope) + 2;
+dev_scope->enumeration_id = ACPI_BUILD_IOAPIC_ID;
+dev_scope->start_bus_number = Q35_PSEUDO_BUS_PLATFORM;
+dev_scope->path[0] = PCI_SLOT(Q35_PSEUDO_DEVFN_IOAPIC);
+dev_scope->path[1] = PCI_FUNC(Q35_PSEUDO_DEVFN_IOAPIC);
+
+/* Device Scope structures for HPET */
+dev_scope = acpi_data_push(table_data, sizeof(*dev_scope) + 2);
+dev_scope->type = ACPI_DMAR_SCOPE_TYPE_HPET;
+dev_scope->length = sizeof(*dev_scope) + 2;
+dev_scope->enumeration_id = ACPI_BUILD_HPET_ID;
+dev_scope->start_bus_number = Q35_PSEUDO_BUS_PLATFORM;
+dev_scope->path[0] = PCI_SLOT(Q35_PSEUDO_DEVFN_HPET);
+dev_scope->path[1] = PCI_FUNC(Q35_PSEUDO_DEVFN_HPET);
+
 build_header(linker, table_data, (void *)(table_data->data + dmar_start),
  "DMAR", table_data->len - dmar_start, 1, NULL, NULL);
 }
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index c371588..2ea642c 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -20,6 +20,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "hw/i386/apic-msidef.h"
 #include "hw/sysbus.h"
 #include "exec/address-spaces.h"
 #include "intel_iommu_internal.h"
@@ -30,10 +31,11 @@
 #ifdef DEBUG_INTEL_IOMMU
 enum {
 DEBUG_GENERAL, DEBUG_CSR, DEBUG_INV, DEBUG_MMU, DEBUG_FLOG,
-DEBUG_CACHE,
+DEBUG_CACHE, DEBUG_IR
 };
 #define VTD_DBGBIT(x)   (1 << DEBUG_##x)
-static int vtd_dbgflags = VTD_DBGBIT(GENERAL) | VTD_DBGBIT(CSR);
+static int vtd_dbgflags = VTD_DBGBIT(GENERAL) | VTD_DBGBIT(CSR) |
+  VTD_DBGBIT(IR);
 
 #define VTD_DPRINTF(what, fmt, ...) do { \
 if (vtd_dbgflags & VTD_DBGBIT(what)) { \
@@ -1134,6 +1136,31 @@ static void vtd_handle_gcmd_qie(IntelIOMMUState *s, bool 
en)
 }
 
 /* Set Root Table Pointer */
+static void vtd_handle_gcmd_sirtp(IntelIOMMUState *s)
+{
+VTD_DPRINTF(CSR, "set Interrupt Remap Table Pointer");
+
+s->irta = vtd_get_quad_raw(s, DMAR_IRTA_REG);
+s->irt_size = 2 << (s->irta & VTD_IRTA_SIZE_MASK);
+s->irta &= VTD_IRTA_ADDR_MASK;
+/* Ok - report back to driver */
+vtd_set_cle

[Qemu-devel] [PATCH 1/2] i386: Prepare for interrupt remapping

2016-03-08 Thread Rita Sinha

From: Jan Kiszka 

Introduce a DMA default target address space for PCI devices. Catch all
interrupt requests to the front-side bus via an MSI memory region that
is part of that address space. Provide separate address spaces for
IOAPIC and HPET if the IOMMU is active to prepare for adding remapping
logic later on. Deliver the EDID from the IOAPIC for the same reason.

Allows to remove the MSI hack from the APIC MMIO write handler.

Signed-off-by: Rita Sinha 
---
 hw/i386/intel_iommu.c | 22 ++--
 hw/i386/kvm/apic.c| 24 +++---
 hw/i386/pc.c  | 17 +
 hw/i386/pc_piix.c |  1 -
 hw/i386/pc_q35.c  | 11 +---
 hw/i386/xen/xen_apic.c| 24 +++---
 hw/intc/apic.c| 53 +++
 hw/intc/apic_common.c |  2 ++
 hw/intc/ioapic.c  | 32 ---
 hw/pci-host/q35.c |  7 ++
 hw/pci/pci.c  |  8 +-
 hw/timer/hpet.c   |  6 ++---
 include/hw/i386/apic-msidef.h |  4 +++
 include/hw/i386/apic_internal.h   |  1 +
 include/hw/i386/ioapic_internal.h |  1 +
 include/hw/i386/pc.h  |  6 +
 include/hw/pci-host/q35.h |  4 +++
 include/hw/pci/pci.h  |  2 ++
 target-i386/cpu.c |  7 ++
 target-i386/cpu.h |  2 +-
 20 files changed, 156 insertions(+), 78 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 347718f..c371588 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -24,6 +24,7 @@
 #include "exec/address-spaces.h"
 #include "intel_iommu_internal.h"
 #include "hw/pci/pci.h"
+#include "hw/i386/pc.h"
 
 /*#define DEBUG_INTEL_IOMMU*/
 #ifdef DEBUG_INTEL_IOMMU
@@ -266,6 +267,11 @@ static void vtd_update_iotlb(IntelIOMMUState *s, uint16_t 
source_id,
 g_hash_table_replace(s->iotlb, key, entry);
 }
 
+static AddressSpace *get_dma_address_space(void)
+{
+return &PC_MACHINE(qdev_get_machine())->dma_address_space;
+}
+
 /* Given the reg addr of both the message data and address, generate an
  * interrupt via MSI.
  */
@@ -282,7 +288,7 @@ static void vtd_generate_interrupt(IntelIOMMUState *s, 
hwaddr mesg_addr_reg,
 data = vtd_get_long_raw(s, mesg_data_reg);
 
 VTD_DPRINTF(FLOG, "msi: addr 0x%"PRIx64 " data 0x%"PRIx32, addr, data);
-address_space_stl_le(&address_space_memory, addr, data,
+address_space_stl_le(get_dma_address_space(), addr, data,
  MEMTXATTRS_UNSPECIFIED, NULL);
 }
 
@@ -496,7 +502,7 @@ static int vtd_get_root_entry(IntelIOMMUState *s, uint8_t 
index,
 dma_addr_t addr;
 
 addr = s->root + index * sizeof(*re);
-if (dma_memory_read(&address_space_memory, addr, re, sizeof(*re))) {
+if (dma_memory_read(get_dma_address_space(), addr, re, sizeof(*re))) {
 VTD_DPRINTF(GENERAL, "error: fail to access root-entry at 0x%"PRIx64
 " + %"PRIu8, s->root, index);
 re->val = 0;
@@ -521,7 +527,7 @@ static int vtd_get_context_entry_from_root(VTDRootEntry 
*root, uint8_t index,
 return -VTD_FR_ROOT_ENTRY_P;
 }
 addr = (root->val & VTD_ROOT_ENTRY_CTP) + index * sizeof(*ce);
-if (dma_memory_read(&address_space_memory, addr, ce, sizeof(*ce))) {
+if (dma_memory_read(get_dma_address_space(), addr, ce, sizeof(*ce))) {
 VTD_DPRINTF(GENERAL, "error: fail to access context-entry at 0x%"PRIx64
 " + %"PRIu8,
 (uint64_t)(root->val & VTD_ROOT_ENTRY_CTP), index);
@@ -555,7 +561,7 @@ static uint64_t vtd_get_slpte(dma_addr_t base_addr, 
uint32_t index)
 
 assert(index < VTD_SL_PT_ENTRY_NR);
 
-if (dma_memory_read(&address_space_memory,
+if (dma_memory_read(get_dma_address_space(),
 base_addr + index * sizeof(slpte), &slpte,
 sizeof(slpte))) {
 slpte = (uint64_t)-1;
@@ -1227,7 +1233,7 @@ static bool vtd_get_inv_desc(dma_addr_t base_addr, 
uint32_t offset,
  VTDInvDesc *inv_desc)
 {
 dma_addr_t addr = base_addr + offset * sizeof(*inv_desc);
-if (dma_memory_read(&address_space_memory, addr, inv_desc,
+if (dma_memory_read(get_dma_address_space(), addr, inv_desc,
 sizeof(*inv_desc))) {
 VTD_DPRINTF(GENERAL, "error: fail to fetch Invalidation Descriptor "
 "base_addr 0x%"PRIx64 " offset %"PRIu32, base_addr, 
offset);
@@ -1262,8 +1268,8 @@ static bool vtd_process_wait_desc(IntelIOMMUState *s, 
VTDInvDesc *inv_desc)
 VTD_DPRINTF(INV, "status data 0x%x, status addr 0x%"PRIx64,
 status_data, status_addr);
 status_data = cpu_to_le32(status_data);
-if (dma_memory_write(&address_space_memory, status_addr, &status_data,
- sizeof(status_data))) {
+if (dma_memory_write(get_dma_address_space

[Qemu-devel] [PATCH v5 13/15] register: Add GPIO API

2016-03-08 Thread Alistair Francis

Add GPIO functionality to the register API. This allows association
and automatic connection of GPIOs to bits in registers. GPIO inputs
will attach to handlers that automatically set read-only bits in
registers. GPIO outputs will be updated to reflect their field value
when their respective registers are written (or reset). Supports
active low GPIOs.

This is particularly effective for implementing system level
controllers, where heterogenous collections of control signals are
placed is a SoC specific peripheral then propagated all over the
system.

Signed-off-by: Peter Crosthwaite 
[ EI Changes:
  * register: Add a polarity field to GPIO connections
  Makes it possible to directly connect active low signals
  to generic interrupt pins.
]
Signed-off-by: Edgar E. Iglesias 
Signed-off-by: Alistair Francis 
---
V5
 - Remove RegisterAccessError struct

 hw/core/register.c| 96 +++
 include/hw/register.h | 26 ++
 2 files changed, 122 insertions(+)

diff --git a/hw/core/register.c b/hw/core/register.c
index 5db8f62..4201373 100644
--- a/hw/core/register.c
+++ b/hw/core/register.c
@@ -100,6 +100,7 @@ void register_write(RegisterInfo *reg, uint64_t val, 
uint64_t we)
 }
 
 register_write_val(reg, new_val);
+register_refresh_gpios(reg, old_val);
 
 if (ac->post_write) {
 ac->post_write(reg, new_val);
@@ -139,23 +140,117 @@ uint64_t register_read(RegisterInfo *reg)
 void register_reset(RegisterInfo *reg)
 {
 g_assert(reg);
+uint64_t old_val;
 
 if (!reg->data || !reg->access) {
 return;
 }
 
+old_val = register_read_val(reg);
+
 register_write_val(reg, reg->access->reset);
+register_refresh_gpios(reg, old_val);
+}
+
+void register_refresh_gpios(RegisterInfo *reg, uint64_t old_value)
+{
+const RegisterAccessInfo *ac;
+const RegisterGPIOMapping *gpio;
+
+ac = reg->access;
+for (gpio = ac->gpios; gpio && gpio->name; gpio++) {
+int i;
+
+if (gpio->input) {
+continue;
+}
+
+for (i = 0; i < gpio->num; ++i) {
+uint64_t gpio_value, gpio_value_old;
+
+qemu_irq gpo = qdev_get_gpio_out_named(DEVICE(reg), gpio->name, i);
+gpio_value_old = extract64(old_value,
+   gpio->bit_pos + i * gpio->width,
+   gpio->width) ^ gpio->polarity;
+gpio_value = extract64(register_read_val(reg),
+   gpio->bit_pos + i * gpio->width,
+   gpio->width) ^ gpio->polarity;
+if (!(gpio_value_old ^ gpio_value)) {
+continue;
+}
+if (reg->debug && gpo) {
+qemu_log("refreshing gpio out %s to %" PRIx64 "\n",
+ gpio->name, gpio_value);
+}
+qemu_set_irq(gpo, gpio_value);
+}
+}
+}
+
+typedef struct DeviceNamedGPIOHandlerOpaque {
+DeviceState *dev;
+const char *name;
+} DeviceNamedGPIOHandlerOpaque;
+
+static void register_gpio_handler(void *opaque, int n, int level)
+{
+DeviceNamedGPIOHandlerOpaque *gho = opaque;
+RegisterInfo *reg = REGISTER(gho->dev);
+
+const RegisterAccessInfo *ac;
+const RegisterGPIOMapping *gpio;
+
+ac = reg->access;
+for (gpio = ac->gpios; gpio && gpio->name; gpio++) {
+if (gpio->input && !strcmp(gho->name, gpio->name)) {
+register_write_val(reg, deposit64(register_read_val(reg),
+  gpio->bit_pos + n * gpio->width,
+  gpio->width,
+  level ^ gpio->polarity));
+return;
+}
+}
+
+abort();
 }
 
 void register_init(RegisterInfo *reg)
 {
 assert(reg);
+const RegisterAccessInfo *ac;
+const RegisterGPIOMapping *gpio;
 
 if (!reg->data || !reg->access) {
 return;
 }
 
 object_initialize((void *)reg, sizeof(*reg), TYPE_REGISTER);
+
+ac = reg->access;
+for (gpio = ac->gpios; gpio && gpio->name; gpio++) {
+if (!gpio->num) {
+((RegisterGPIOMapping *)gpio)->num = 1;
+}
+if (!gpio->width) {
+((RegisterGPIOMapping *)gpio)->width = 1;
+}
+if (gpio->input) {
+DeviceNamedGPIOHandlerOpaque gho = {
+.name = gpio->name,
+.dev = DEVICE(reg),
+};
+qemu_irq irq;
+
+qdev_init_gpio_in_named(DEVICE(reg), register_gpio_handler,
+gpio->name, gpio->num);
+irq = qdev_get_gpio_in_named(DEVICE(reg), gpio->name, gpio->num);
+qemu_irq_set_opaque(irq, g_memdup(&gho, sizeof(gho)));
+} else {
+qemu_irq *gpos = g_new0(qemu_irq, gpio->num);
+
+qdev_init_gpio_out_named(DEVICE(reg), gpos, gpio

[Qemu-devel] [PATCH v5 10/15] qdev: Define qdev_get_gpio_out

2016-03-08 Thread Alistair Francis

From: Peter Crosthwaite 

An API similar to the existing qdev_get_gpio_in() except gets outputs.
Useful for:

1: Implementing lightweight devices that don't want to keep pointers
to their own GPIOs. They can get their GPIO pointers at runtime from
QOM using this API.

2: testing or debugging code which may wish to override the
hardware generated value of of a GPIO with a user specified value
(E.G. interrupt injection).

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---

 hw/core/qdev.c | 12 
 include/hw/qdev-core.h |  2 ++
 2 files changed, 14 insertions(+)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index db41aa1..e3015d2 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -489,6 +489,18 @@ qemu_irq qdev_get_gpio_in(DeviceState *dev, int n)
 return qdev_get_gpio_in_named(dev, NULL, n);
 }
 
+qemu_irq qdev_get_gpio_out_named(DeviceState *dev, const char *name, int n)
+{
+char *propname = g_strdup_printf("%s[%d]",
+ name ? name : "unnamed-gpio-out", n);
+return (qemu_irq)object_property_get_link(OBJECT(dev), propname, NULL);
+}
+
+qemu_irq qdev_get_gpio_out(DeviceState *dev, int n)
+{
+return qdev_get_gpio_out_named(dev, NULL, n);
+}
+
 void qdev_connect_gpio_out_named(DeviceState *dev, const char *name, int n,
  qemu_irq pin)
 {
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index c3ff99f..25aa9e9 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -286,6 +286,8 @@ bool qdev_machine_modified(void);
 
 qemu_irq qdev_get_gpio_in(DeviceState *dev, int n);
 qemu_irq qdev_get_gpio_in_named(DeviceState *dev, const char *name, int n);
+qemu_irq qdev_get_gpio_out(DeviceState *dev, int n);
+qemu_irq qdev_get_gpio_out_named(DeviceState *dev, const char *name, int n);
 
 void qdev_connect_gpio_out(DeviceState *dev, int n, qemu_irq pin);
 void qdev_connect_gpio_out_named(DeviceState *dev, const char *name, int n,
-- 
2.5.0

[Qemu-devel] [PATCH v5 14/15] misc: Introduce ZynqMP IOU SLCR

2016-03-08 Thread Alistair Francis

From: Peter Crosthwaite 

IOU = I/O Unit
SLCR = System Level Control Registers

This IP is a misc collections of control registers that switch various
properties of system IPs. Currently the only thing implemented is the
SD_SLOTTYPE control (implemented as a GPIO output).

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---

 hw/misc/Makefile.objs  |   1 +
 hw/misc/xlnx-zynqmp-iou-slcr.c | 115 +
 include/hw/misc/xlnx-zynqmp-iou-slcr.h |  47 ++
 3 files changed, 163 insertions(+)
 create mode 100644 hw/misc/xlnx-zynqmp-iou-slcr.c
 create mode 100644 include/hw/misc/xlnx-zynqmp-iou-slcr.h

diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index ea6cd3c..eef3e40 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -41,6 +41,7 @@ obj-$(CONFIG_RASPI) += bcm2835_property.o
 obj-$(CONFIG_SLAVIO) += slavio_misc.o
 obj-$(CONFIG_ZYNQ) += zynq_slcr.o
 obj-$(CONFIG_ZYNQ) += zynq-xadc.o
+obj-$(CONFIG_ZYNQ) += xlnx-zynqmp-iou-slcr.o
 obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
 
 obj-$(CONFIG_PVPANIC) += pvpanic.o
diff --git a/hw/misc/xlnx-zynqmp-iou-slcr.c b/hw/misc/xlnx-zynqmp-iou-slcr.c
new file mode 100644
index 000..00b617e
--- /dev/null
+++ b/hw/misc/xlnx-zynqmp-iou-slcr.c
@@ -0,0 +1,115 @@
+/*
+ * Xilinx ZynqMP IOU System Level Control Registers (SLCR)
+ *
+ * Copyright (c) 2013 Xilinx Inc
+ * Copyright (c) 2013 Peter Crosthwaite 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/misc/xlnx-zynqmp-iou-slcr.h"
+
+#ifndef XLNX_ZYNQMP_IOU_SLCR_ERR_DEBUG
+#define XLNX_ZYNQMP_IOU_SLCR_ERR_DEBUG 0
+#endif
+
+REG32(SD_SLOTTYPE, 0x310)
+#define R_SD_SLOTTYPE_RSVD   0x7ffe
+
+static const RegisterAccessInfo xlnx_zynqmp_iou_slcr_regs_info[] = {
+{   .name = "SD Slot TYPE", .decode.addr = A_SD_SLOTTYPE,
+.rsvd = R_SD_SLOTTYPE_RSVD,
+.gpios = (RegisterGPIOMapping []) {
+{ .name = "SD0_SLOTTYPE",   .bit_pos = 0  },
+{ .name = "SD1_SLOTTYPE",   .bit_pos = 15 },
+{},
+}
+}
+/* FIXME: Complete device model */
+};
+
+static void xlnx_zynqmp_iou_slcr_reset(DeviceState *dev)
+{
+XlnxZynqMPIOUSLCR *s = XLNX_ZYNQMP_IOU_SLCR(dev);
+int i;
+
+for (i = 0; i < XLNX_ZYNQ_MP_IOU_SLCR_R_MAX; ++i) {
+register_reset(&s->regs_info[i]);
+}
+}
+
+static const MemoryRegionOps xlnx_zynqmp_iou_slcr_ops = {
+.read = register_read_memory_le,
+.write = register_write_memory_le,
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 4,
+.max_access_size = 4,
+}
+};
+
+static void xlnx_zynqmp_iou_slcr_init(Object *obj)
+{
+XlnxZynqMPIOUSLCR *s = XLNX_ZYNQMP_IOU_SLCR(obj);
+
+memory_region_init(&s->iomem, obj, "MMIO", XLNX_ZYNQ_MP_IOU_SLCR_R_MAX * 
4);
+register_init_block32(DEVICE(obj), xlnx_zynqmp_iou_slcr_regs_info,
+  ARRAY_SIZE(xlnx_zynqmp_iou_slcr_regs_info),
+  s->regs_info, s->regs, &s->iomem,
+  &xlnx_zynqmp_iou_slcr_ops,
+  XLNX_ZYNQMP_IOU_SLCR_ERR_DEBUG,
+  XLNX_ZYNQ_MP_IOU_SLCR_R_MAX);
+sysbus_init_mmio(SYS_BUS_DEVICE(obj), &s->iomem);
+}
+
+static const VMStateDescription vmstate_xlnx_zynqmp_iou_slcr = {
+.name = "xlnx_zynqmp_iou_slcr",
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32_ARRAY(regs, XlnxZynqMPIOUSLCR,
+ XLNX_ZYNQ_MP_IOU_SLCR_R_MAX),
+VMSTATE_END_OF_LIST(),
+}
+};
+
+static void xlnx_zynqmp_iou_slcr_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+
+dc->reset = xlnx_zynqmp_iou_slcr_reset;
+dc->vmsd = &vmstate_xlnx_zynqmp_iou_slcr;

[Qemu-devel] [PATCH v5 07/15] register: Add block initialise helper

2016-03-08 Thread Alistair Francis

From: Peter Crosthwaite 

Add a helper that will scan a static RegisterAccessInfo Array
and populate a container MemoryRegion with registers as defined.

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---
V3:
 - Fix typo
V2:
 - Use memory_region_add_subregion_no_print()

 hw/core/register.c| 39 +++
 include/hw/register.h | 20 
 2 files changed, 59 insertions(+)

diff --git a/hw/core/register.c b/hw/core/register.c
index 28f3776..5db8f62 100644
--- a/hw/core/register.c
+++ b/hw/core/register.c
@@ -228,6 +228,45 @@ uint64_t register_read_memory_le(void *opaque, hwaddr 
addr, unsigned size)
 return register_read_memory(opaque, addr, size, false);
 }
 
+void register_init_block32(DeviceState *owner, const RegisterAccessInfo *rae,
+   int num, RegisterInfo *ri, uint32_t *data,
+   MemoryRegion *container, const MemoryRegionOps *ops,
+   bool debug_enabled, uint64_t memory_size)
+{
+const char *device_prefix = object_get_typename(OBJECT(owner));
+RegisterInfoArray *r_array = g_malloc(sizeof(RegisterInfoArray));
+int i;
+
+r_array->num_elements = 0;
+r_array->r = g_malloc_n(num, sizeof(RegisterInfo *));
+
+for (i = 0; i < num; i++) {
+int index = rae[i].decode.addr / 4;
+RegisterInfo *r = &ri[index];
+
+*r = (RegisterInfo) {
+.data = &data[index],
+.data_size = sizeof(uint32_t),
+.access = &rae[i],
+.debug = debug_enabled,
+.prefix = device_prefix,
+.opaque = owner,
+};
+register_init(r);
+
+r_array->r[r_array->num_elements] = r;
+r_array->num_elements++;
+}
+
+r_array->num_elements--;
+
+memory_region_init_io(&r_array->mem, OBJECT(owner), ops, r_array,
+  device_prefix, memory_size);
+memory_region_add_subregion(container,
+r_array->r[0]->access->decode.addr,
+&r_array->mem);
+}
+
 static const TypeInfo register_info = {
 .name  = TYPE_REGISTER,
 .parent = TYPE_DEVICE,
diff --git a/include/hw/register.h b/include/hw/register.h
index d732f55..00df7d5 100644
--- a/include/hw/register.h
+++ b/include/hw/register.h
@@ -176,6 +176,26 @@ void register_write_memory_le(void *opaque, hwaddr addr, 
uint64_t value,
 uint64_t register_read_memory_be(void *opaque, hwaddr addr, unsigned size);
 uint64_t register_read_memory_le(void *opaque, hwaddr addr, unsigned size);
 
+/**
+ * Init a block of consecutive registers into a container MemoryRegion. A
+ * number of constant register definitions are parsed to create a corresponding
+ * array of RegisterInfo's.
+ *
+ * @owner: device owning the registers
+ * @rae: Register definitions to init
+ * @num: number of registers to init (length of @rae)
+ * @ri: Register array to init
+ * @data: Array to use for register data
+ * @container: Memory region to contain new registers
+ * @ops: Memory region ops to access registers.
+ * @debug enabled: turn on/off verbose debug information
+ */
+
+void register_init_block32(DeviceState *owner, const RegisterAccessInfo *rae,
+   int num, RegisterInfo *ri, uint32_t *data,
+   MemoryRegion *container, const MemoryRegionOps *ops,
+   bool debug_enabled, uint64_t memory_size);
+
 /* Define constants for a 32 bit register */
 #define REG32(reg, addr)  \
 enum { A_ ## reg = (addr) };  \
-- 
2.5.0

[Qemu-devel] [PATCH v5 11/15] qdev: Add qdev_pass_all_gpios API

2016-03-08 Thread Alistair Francis

From: Peter Crosthwaite 

For passing all GPIOs of all names from a contained device to a
container.

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---

 hw/core/qdev.c | 9 +
 include/hw/qdev-core.h | 1 +
 2 files changed, 10 insertions(+)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index e3015d2..371fba0 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -589,6 +589,15 @@ void qdev_pass_gpios(DeviceState *dev, DeviceState 
*container,
 QLIST_INSERT_HEAD(&container->gpios, ngl, node);
 }
 
+void qdev_pass_all_gpios(DeviceState *dev, DeviceState *container)
+{
+NamedGPIOList *ngl;
+
+QLIST_FOREACH(ngl, &dev->gpios, node) {
+qdev_pass_gpios(dev, container, ngl->name);
+}
+}
+
 BusState *qdev_get_child_bus(DeviceState *dev, const char *name)
 {
 BusState *bus;
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 25aa9e9..ce64fa1 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -311,6 +311,7 @@ void qdev_init_gpio_out_named(DeviceState *dev, qemu_irq 
*pins,
 
 void qdev_pass_gpios(DeviceState *dev, DeviceState *container,
  const char *name);
+void qdev_pass_all_gpios(DeviceState *dev, DeviceState *container);
 
 BusState *qdev_get_parent_bus(DeviceState *dev);
 
-- 
2.5.0

[Qemu-devel] [PATCH v5 09/15] xilinx_zynq: Connect devcfg to the Zynq machine model

2016-03-08 Thread Alistair Francis

From: Peter Crosthwaite 

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---
V4:
 - Small corrections to the device model logic

 hw/arm/xilinx_zynq.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
index a35983a..d10b0ef 100644
--- a/hw/arm/xilinx_zynq.c
+++ b/hw/arm/xilinx_zynq.c
@@ -290,6 +290,14 @@ static void zynq_init(MachineState *machine)
 sysbus_connect_irq(busdev, n + 1, pic[dma_irqs[n] - IRQ_OFFSET]);
 }
 
+dev = qdev_create(NULL, "xlnx.ps7-dev-cfg");
+object_property_add_child(qdev_get_machine(), "xlnx-devcfg", OBJECT(dev),
+  NULL);
+qdev_init_nofail(dev);
+busdev = SYS_BUS_DEVICE(dev);
+sysbus_connect_irq(busdev, 0, pic[40 - IRQ_OFFSET]);
+sysbus_mmio_map(busdev, 0, 0xF8007000);
+
 zynq_binfo.ram_size = ram_size;
 zynq_binfo.kernel_filename = kernel_filename;
 zynq_binfo.kernel_cmdline = kernel_cmdline;
-- 
2.5.0

[Qemu-devel] [PATCH v5 05/15] register: Define REG and FIELD macros

2016-03-08 Thread Alistair Francis

From: Peter Crosthwaite 

Define some macros that can be used for defining registers and fields.

The REG32 macro will define A_FOO, for the byte address of a register
as well as R_FOO for the uint32_t[] register number (A_FOO / 4).

The FIELD macro will define FOO_BAR_MASK, FOO_BAR_SHIFT and
FOO_BAR_LENGTH constants for field BAR in register FOO.

Finally, there are some shorthand helpers for extracting/depositing
fields from registers based on these naming schemes.

Usage can greatly reduce the verbosity of device code.

The deposit and extract macros (eg F_EX32, AF_DP32 etc.) can be used
to generate extract and deposits without any repetition of the name
stems.

Signed-off-by: Peter Crosthwaite 
[ EI Changes:
  * Add Deposit macros
]
Signed-off-by: Edgar E. Iglesias 
Signed-off-by: Alistair Francis 
---
E.g. Currently you have to define something like:

\#define R_FOOREG (0x84/4)
\#define R_FOOREG_BARFIELD_SHIFT 10
\#define R_FOOREG_BARFIELD_LENGTH 5

uint32_t foobar_val = extract32(s->regs[R_FOOREG],
R_FOOREG_BARFIELD_SHIFT,
R_FOOREG_BARFIELD_LENGTH);

Which has:
2 macro definitions per field
3 register names ("FOOREG") per extract
2 field names ("BARFIELD") per extract

With these macros this becomes:

REG32(FOOREG, 0x84)
FIELD(FOOREG, BARFIELD, 10, 5)

uint32_t foobar_val = AF_EX32(s->regs, FOOREG, BARFIELD)

Which has:
1 macro definition per field
1 register name per extract
1 field name per extract

If you are not using arrays for the register data you can just use the
non-array "F_" variants and still save 2 name stems:

uint32_t foobar_val = F_EX32(s->fooreg, FOOREG, BARFIELD)

Deposit is similar for depositing values. Deposit has compile-time
overflow checking for literals.
For example:

REG32(XYZ1, 0x84)
FIELD(XYZ1, TRC, 0, 4)

/* Correctly set XYZ1.TRC = 5.  */
AF_DP32(s->regs, XYZ1, TRC, 5);

/* Incorrectly set XYZ1.TRC = 16.  */
AF_DP32(s->regs, XYZ1, TRC, 16);

The latter assignment results in:
warning: large integer implicitly truncated to unsigned type [-Woverflow]


 include/hw/register.h | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/include/hw/register.h b/include/hw/register.h
index bc2c96a..b105d76 100644
--- a/include/hw/register.h
+++ b/include/hw/register.h
@@ -161,4 +161,42 @@ void register_write_memory_le(void *opaque, hwaddr addr, 
uint64_t value,
 uint64_t register_read_memory_be(void *opaque, hwaddr addr, unsigned size);
 uint64_t register_read_memory_le(void *opaque, hwaddr addr, unsigned size);
 
+/* Define constants for a 32 bit register */
+#define REG32(reg, addr)  \
+enum { A_ ## reg = (addr) };  \
+enum { R_ ## reg = (addr) / 4 };
+
+/* Define SHIFT, LEGTH and MASK constants for a field within a register */
+#define FIELD(reg, field, shift, length)  \
+enum { R_ ## reg ## _ ## field ## _SHIFT = (shift)};  \
+enum { R_ ## reg ## _ ## field ## _LENGTH = (length)};\
+enum { R_ ## reg ## _ ## field ## _MASK = (((1ULL << (length)) - 1)   \
+  << (shift)) };
+
+/* Extract a field from a register */
+
+#define F_EX32(storage, reg, field)   \
+extract32((storage), R_ ## reg ## _ ## field ## _SHIFT,   \
+  R_ ## reg ## _ ## field ## _LENGTH)
+
+/* Extract a field from an array of registers */
+
+#define AF_EX32(regs, reg, field) \
+F_EX32((regs)[R_ ## reg], reg, field)
+
+/* Deposit a register field.  */
+
+#define F_DP32(storage, reg, field, val) ({   \
+struct {  \
+unsigned int v:R_ ## reg ## _ ## field ## _LENGTH;\
+} v = { .v = val };   \
+uint32_t d;   \
+d = deposit32((storage), R_ ## reg ## _ ## field ## _SHIFT,   \
+  R_ ## reg ## _ ## field ## _LENGTH, v.v);   \
+d; })
+
+/* Deposit a field to array of registers.  */
+
+#define AF_DP32(regs, reg, field, val)\
+(regs)[R_ ## reg] = F_DP32((regs)[R_ ## reg], reg, field, val);
 #endif
-- 
2.5.0

[Qemu-devel] [PATCH v5 08/15] dma: Add Xilinx Zynq devcfg device model

2016-03-08 Thread Alistair Francis

Add a minimal model for the devcfg device which is part of Zynq.
This model supports DMA capabilities and interrupt generation.

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---
V5:
 - Corrections to the device model logic

 default-configs/arm-softmmu.mak   |   1 +
 hw/dma/Makefile.objs  |   1 +
 hw/dma/xlnx-zynq-devcfg.c | 394 ++
 include/hw/dma/xlnx-zynq-devcfg.h |  62 ++
 4 files changed, 458 insertions(+)
 create mode 100644 hw/dma/xlnx-zynq-devcfg.c
 create mode 100644 include/hw/dma/xlnx-zynq-devcfg.h

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index a9f82a1..d524584 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -66,6 +66,7 @@ CONFIG_PXA2XX=y
 CONFIG_BITBANG_I2C=y
 CONFIG_FRAMEBUFFER=y
 CONFIG_XILINX_SPIPS=y
+CONFIG_ZYNQ_DEVCFG=y
 
 CONFIG_ARM11SCU=y
 CONFIG_A9SCU=y
diff --git a/hw/dma/Makefile.objs b/hw/dma/Makefile.objs
index 0e65ed0..eaf0a81 100644
--- a/hw/dma/Makefile.objs
+++ b/hw/dma/Makefile.objs
@@ -5,6 +5,7 @@ common-obj-$(CONFIG_PL330) += pl330.o
 common-obj-$(CONFIG_I82374) += i82374.o
 common-obj-$(CONFIG_I8257) += i8257.o
 common-obj-$(CONFIG_XILINX_AXI) += xilinx_axidma.o
+common-obj-$(CONFIG_ZYNQ_DEVCFG) += xlnx-zynq-devcfg.o
 common-obj-$(CONFIG_ETRAXFS) += etraxfs_dma.o
 common-obj-$(CONFIG_STP2000) += sparc32_dma.o
 common-obj-$(CONFIG_SUN4M) += sun4m_iommu.o
diff --git a/hw/dma/xlnx-zynq-devcfg.c b/hw/dma/xlnx-zynq-devcfg.c
new file mode 100644
index 000..63e97a7
--- /dev/null
+++ b/hw/dma/xlnx-zynq-devcfg.c
@@ -0,0 +1,394 @@
+/*
+ * QEMU model of the Xilinx Zynq Devcfg Interface
+ *
+ * (C) 2011 PetaLogix Pty Ltd
+ * (C) 2014 Xilinx Inc.
+ * Written by Peter Crosthwaite 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/dma/xlnx-zynq-devcfg.h"
+#include "qemu/bitops.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/dma.h"
+
+#define FREQ_HZ 9
+
+#define BTT_MAX 0x400
+
+#ifndef XLNX_ZYNQ_DEVCFG_ERR_DEBUG
+#define XLNX_ZYNQ_DEVCFG_ERR_DEBUG 0
+#endif
+
+#define DB_PRINT(fmt, args...) do { \
+if (XLNX_ZYNQ_DEVCFG_ERR_DEBUG) { \
+qemu_log("%s: " fmt, __func__, ## args); \
+} \
+} while (0);
+
+REG32(CTRL, 0x00)
+FIELD(CTRL, FORCE_RST,  31,  1) /* Not supported, wr ignored */
+FIELD(CTRL, PCAP_PR,27,  1) /* Forced to 0 on bad unlock */
+FIELD(CTRL, PCAP_MODE,  26,  1)
+FIELD(CTRL, MULTIBOOT_EN,   24,  1)
+FIELD(CTRL, USER_MODE,  15,  1)
+FIELD(CTRL, PCFG_AES_FUSE,  12,  1)
+FIELD(CTRL, PCFG_AES_EN, 9,  3)
+FIELD(CTRL, SEU_EN,  8,  1)
+FIELD(CTRL, SEC_EN,  7,  1)
+FIELD(CTRL, SPNIDEN, 6,  1)
+FIELD(CTRL, SPIDEN,  5,  1)
+FIELD(CTRL, NIDEN,   4,  1)
+FIELD(CTRL, DBGEN,   3,  1)
+FIELD(CTRL, DAP_EN,  0,  3)
+
+REG32(LOCK, 0x04)
+#define AES_FUSE_LOCK4
+#define AES_EN_LOCK  3
+#define SEU_LOCK 2
+#define SEC_LOCK 1
+#define DBG_LOCK 0
+
+/* mapping bits in R_LOCK to what they lock in R_CTRL */
+static const uint32_t lock_ctrl_map[] = {
+[AES_FUSE_LOCK] = R_CTRL_PCFG_AES_FUSE_MASK,
+[AES_EN_LOCK]   = R_CTRL_PCFG_AES_EN_MASK,
+[SEU_LOCK]  = R_CTRL_SEU_EN_MASK,
+[SEC_LOCK]  = R_CTRL_SEC_EN_MASK,
+[DBG_LOCK]  = R_CTRL_SPNIDEN_MASK | R_CTRL_SPIDEN_MASK |
+  R_CTRL_NIDEN_MASK   | R_CTRL_DBGEN_MASK  |
+  R_CTRL_DAP_EN_MASK,
+};
+
+REG32(CFG, 0x08)
+FIELD(CFG,  RFIFO_TH,   10,  2)
+FIELD(CFG,  WFIFO_TH,8,  2)
+FIELD(CFG,  RCLK_EDGE,   7,  1)
+FIELD(CFG,  WCLK_EDGE,   6,  1)
+FIELD(CFG,

[Qemu-devel] [PATCH v5 06/15] register: QOMify

2016-03-08 Thread Alistair Francis

From: Peter Crosthwaite 

QOMify registers as a child of TYPE_DEVICE. This allows registers to
define GPIOs.

Define an init helper that will do QOM initialisation.

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
Reviewed-by: KONRAD Frederic 
---
V5:
 - Convert to using only one memory region

 hw/core/register.c| 23 +++
 include/hw/register.h | 15 +++
 2 files changed, 38 insertions(+)

diff --git a/hw/core/register.c b/hw/core/register.c
index e1cd0c4..28f3776 100644
--- a/hw/core/register.c
+++ b/hw/core/register.c
@@ -147,6 +147,17 @@ void register_reset(RegisterInfo *reg)
 register_write_val(reg, reg->access->reset);
 }
 
+void register_init(RegisterInfo *reg)
+{
+assert(reg);
+
+if (!reg->data || !reg->access) {
+return;
+}
+
+object_initialize((void *)reg, sizeof(*reg), TYPE_REGISTER);
+}
+
 static inline void register_write_memory(void *opaque, hwaddr addr,
  uint64_t value, unsigned size, bool 
be)
 {
@@ -216,3 +227,15 @@ uint64_t register_read_memory_le(void *opaque, hwaddr 
addr, unsigned size)
 {
 return register_read_memory(opaque, addr, size, false);
 }
+
+static const TypeInfo register_info = {
+.name  = TYPE_REGISTER,
+.parent = TYPE_DEVICE,
+};
+
+static void register_register_types(void)
+{
+type_register_static(®ister_info);
+}
+
+type_init(register_register_types)
diff --git a/include/hw/register.h b/include/hw/register.h
index b105d76..d732f55 100644
--- a/include/hw/register.h
+++ b/include/hw/register.h
@@ -11,6 +11,7 @@
 #ifndef REGISTER_H
 #define REGISTER_H
 
+#include "hw/qdev-core.h"
 #include "exec/memory.h"
 
 typedef struct RegisterInfo RegisterInfo;
@@ -79,6 +80,9 @@ struct RegisterAccessInfo {
  */
 
 struct RegisterInfo {
+/*  */
+DeviceState parent_obj;
+
 /*  */
 void *data;
 int data_size;
@@ -91,6 +95,9 @@ struct RegisterInfo {
 void *opaque;
 };
 
+#define TYPE_REGISTER "qemu,register"
+#define REGISTER(obj) OBJECT_CHECK(RegisterInfo, (obj), TYPE_REGISTER)
+
 /**
  * This structure is used to group all of the individual registers which are
  * modeled using the RegisterInfo strucutre.
@@ -136,6 +143,14 @@ uint64_t register_read(RegisterInfo *reg);
 void register_reset(RegisterInfo *reg);
 
 /**
+ * Initialize a register. GPIO's are setup as IOs to the specified device.
+ * Fast paths for eligible registers are enabled.
+ * @reg: Register to initialize
+ */
+
+void register_init(RegisterInfo *reg);
+
+/**
  * Memory API MMIO write handler that will write to a Register API register.
  *  _be for big endian variant and _le for little endian.
  * @opaque: RegisterInfo to write to
-- 
2.5.0

[Qemu-devel] [PATCH v5 12/15] irq: Add opaque setter routine

2016-03-08 Thread Alistair Francis

From: Peter Crosthwaite 

Add a routine to set or override the opaque data of an IRQ.

Qdev currently always initialises IRQ opaque as the device itself.
This allows you to override to a custom opaque in the case where
there is extra or different data needed.

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---

 hw/core/irq.c| 5 +
 include/hw/irq.h | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/hw/core/irq.c b/hw/core/irq.c
index 49ff2e6..9d125fb 100644
--- a/hw/core/irq.c
+++ b/hw/core/irq.c
@@ -77,6 +77,11 @@ qemu_irq qemu_allocate_irq(qemu_irq_handler handler, void 
*opaque, int n)
 return irq;
 }
 
+void qemu_irq_set_opaque(qemu_irq irq, void *opaque)
+{
+irq->opaque = opaque;
+}
+
 void qemu_free_irqs(qemu_irq *s, int n)
 {
 int i;
diff --git a/include/hw/irq.h b/include/hw/irq.h
index 4c4c2ea..edad0fc 100644
--- a/include/hw/irq.h
+++ b/include/hw/irq.h
@@ -44,6 +44,8 @@ qemu_irq qemu_allocate_irq(qemu_irq_handler handler, void 
*opaque, int n);
 qemu_irq *qemu_extend_irqs(qemu_irq *old, int n_old, qemu_irq_handler handler,
 void *opaque, int n);
 
+void qemu_irq_set_opaque(qemu_irq irq, void *opaque);
+
 void qemu_free_irqs(qemu_irq *s, int n);
 void qemu_free_irq(qemu_irq irq);
 
-- 
2.5.0

[Qemu-devel] [PATCH v5 02/15] register: Add Register API

2016-03-08 Thread Alistair Francis

This API provides some encapsulation of registers and factors our some
common functionality to common code. Bits of device state (usually MMIO
registers), often have all sorts of access restrictions and semantics
associated with them. This API allow you to define what those
restrictions are on a bit-by-bit basis.

Helper functions are then used to access the register which observe the
semantics defined by the RegisterAccessInfo struct.

Some features:
Bits can be marked as read_only (ro field)
Bits can be marked as write-1-clear (w1c field)
Bits can be marked as reserved (rsvd field)
Reset values can be defined (reset)
Bits can be marked clear on read (cor)
Pre and post action callbacks can be added to read and write ops
Verbose debugging info can be enabled/disabled

Useful for defining device register spaces in a data driven way. Cuts
down on a lot of the verbosity and repetition in the switch-case blocks
in the standard foo_mmio_read/write functions.

Also useful for automated generation of device models from hardware
design sources.

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---
V5:
 - Convert to using only one memory region
V4:
 - Rebase
 - Remove the guest error masking
 - Simplify the unimplemented masking
 - Use the reserved value in the write calculations
 - Remove read_lite and write_lite
 - General fixes to asserts and log printing
V3:
 - Address some comments from Fred

 hw/core/Makefile.objs |   1 +
 hw/core/register.c| 148 ++
 include/hw/register.h | 108 
 3 files changed, 257 insertions(+)
 create mode 100644 hw/core/register.c
 create mode 100644 include/hw/register.h

diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs
index abb3560..bf95db5 100644
--- a/hw/core/Makefile.objs
+++ b/hw/core/Makefile.objs
@@ -14,4 +14,5 @@ common-obj-$(CONFIG_SOFTMMU) += machine.o
 common-obj-$(CONFIG_SOFTMMU) += null-machine.o
 common-obj-$(CONFIG_SOFTMMU) += loader.o
 common-obj-$(CONFIG_SOFTMMU) += qdev-properties-system.o
+common-obj-$(CONFIG_SOFTMMU) += register.o
 common-obj-$(CONFIG_PLATFORM_BUS) += platform-bus.o
diff --git a/hw/core/register.c b/hw/core/register.c
new file mode 100644
index 000..1656f71
--- /dev/null
+++ b/hw/core/register.c
@@ -0,0 +1,148 @@
+/*
+ * Register Definition API
+ *
+ * Copyright (c) 2016 Xilinx Inc.
+ * Copyright (c) 2013 Peter Crosthwaite 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/register.h"
+#include "hw/qdev.h"
+#include "qemu/log.h"
+
+static inline void register_write_val(RegisterInfo *reg, uint64_t val)
+{
+g_assert(reg->data);
+
+switch (reg->data_size) {
+case 1:
+*(uint8_t *)reg->data = val;
+break;
+case 2:
+*(uint16_t *)reg->data = val;
+break;
+case 4:
+*(uint32_t *)reg->data = val;
+break;
+case 8:
+*(uint64_t *)reg->data = val;
+break;
+default:
+g_assert_not_reached();
+}
+}
+
+static inline uint64_t register_read_val(RegisterInfo *reg)
+{
+switch (reg->data_size) {
+case 1:
+return *(uint8_t *)reg->data;
+case 2:
+return *(uint16_t *)reg->data;
+case 4:
+return *(uint32_t *)reg->data;
+case 8:
+return *(uint64_t *)reg->data;
+default:
+g_assert_not_reached();
+}
+return 0; /* unreachable */
+}
+
+void register_write(RegisterInfo *reg, uint64_t val, uint64_t we)
+{
+uint64_t old_val, new_val, test, no_w_mask;
+const RegisterAccessInfo *ac;
+
+assert(reg);
+
+ac = reg->access;
+
+if (!ac || !ac->name) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: write to undefined device state "
+  "(written value: %#" PRIx64 ")\n", reg->prefix, val);
+return;
+}
+
+old_val = reg->data ? register_read_val(reg) : ac->reset;
+
+test = (old_val ^ val) & ac->rsvd;
+if (test) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: change of value in reserved bit"
+  "fields: %#" PRIx64 ")\n", reg->prefix, test);
+}
+
+test = val & ac->unimp;
+if (test) {
+qemu_log_mask(LOG_UNIMP,
+  "%s:%s writing %#" PRIx64 " to unimplemented bits:" \
+  " %#" PRIx64 "",
+  reg->prefix, reg->access->name, val, ac->unimp);
+}
+
+/* Create the no write mask based on the read only, write to clear and
+ * reserved bit masks.
+ */
+no_w_mask = ac->ro | ac->w1c | ac->rsvd | ~we;
+new_val = (val & ~no_w_mask) | (old_val & no_w_mask);
+new_val &= ~(val & ac->w1c);
+
+if (ac->pre_write) {
+new_val = ac->pre_write(reg, new_val);
+}
+
+if (reg->debug) {
+qemu_log("%s:%s: write of value %#" PRIx64 "\n", reg->prefix, ac->name,
+ new_val);
+}
+
+regis

[Qemu-devel] [PATCH v5 03/15] register: Add Memory API glue

2016-03-08 Thread Alistair Francis

Add memory io handlers that glue the register API to the memory API.
Just translation functions at this stage. Although it does allow for
devices to be created without all-in-one mmio r/w handlers.

This patch also adds the RegisterInfoArray struct, which allows all of
the individual RegisterInfo structs to be grouped into a single memory
region.

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---
V5:
 - Convert to using only one memory region

 hw/core/register.c| 70 +++
 include/hw/register.h | 51 +
 2 files changed, 121 insertions(+)

diff --git a/hw/core/register.c b/hw/core/register.c
index 1656f71..e1cd0c4 100644
--- a/hw/core/register.c
+++ b/hw/core/register.c
@@ -146,3 +146,73 @@ void register_reset(RegisterInfo *reg)
 
 register_write_val(reg, reg->access->reset);
 }
+
+static inline void register_write_memory(void *opaque, hwaddr addr,
+ uint64_t value, unsigned size, bool 
be)
+{
+RegisterInfoArray *reg_array = opaque;
+RegisterInfo *reg = NULL;
+uint64_t we = ~0;
+int i, shift = 0;
+
+for (i = 0; i < reg_array->num_elements; i++) {
+if (reg_array->r[i]->access->decode.addr == addr) {
+reg = reg_array->r[i];
+break;
+}
+}
+assert(reg);
+
+/* Generate appropriate write enable mask and shift values */
+if (reg->data_size < size) {
+we = MAKE_64BIT_MASK(0, reg->data_size * 8);
+shift = 8 * (be ? reg->data_size - size : 0);
+} else if (reg->data_size >= size) {
+we = MAKE_64BIT_MASK(0, size * 8);
+}
+
+register_write(reg, value << shift, we << shift);
+}
+
+void register_write_memory_be(void *opaque, hwaddr addr, uint64_t value,
+  unsigned size)
+{
+register_write_memory(opaque, addr, value, size, true);
+}
+
+
+void register_write_memory_le(void *opaque, hwaddr addr, uint64_t value,
+  unsigned size)
+{
+register_write_memory(opaque, addr, value, size, false);
+}
+
+static inline uint64_t register_read_memory(void *opaque, hwaddr addr,
+unsigned size, bool be)
+{
+RegisterInfoArray *reg_array = opaque;
+RegisterInfo *reg = NULL;
+int i, shift;
+
+for (i = 0; i < reg_array->num_elements; i++) {
+if (reg_array->r[i]->access->decode.addr == addr) {
+reg = reg_array->r[i];
+break;
+}
+}
+assert(reg);
+
+shift = 8 * (be ? reg->data_size - size : 0);
+
+return (register_read(reg) >> shift) & MAKE_64BIT_MASK(0, size * 8);
+}
+
+uint64_t register_read_memory_be(void *opaque, hwaddr addr, unsigned size)
+{
+return register_read_memory(opaque, addr, size, true);
+}
+
+uint64_t register_read_memory_le(void *opaque, hwaddr addr, unsigned size)
+{
+return register_read_memory(opaque, addr, size, false);
+}
diff --git a/include/hw/register.h b/include/hw/register.h
index baa08f5..726a914 100644
--- a/include/hw/register.h
+++ b/include/hw/register.h
@@ -15,6 +15,7 @@
 
 typedef struct RegisterInfo RegisterInfo;
 typedef struct RegisterAccessInfo RegisterAccessInfo;
+typedef struct RegisterInfoArray RegisterInfoArray;
 
 /**
  * Access description for a register that is part of guest accessible device
@@ -51,6 +52,10 @@ struct RegisterAccessInfo {
 void (*post_write)(RegisterInfo *reg, uint64_t val);
 
 uint64_t (*post_read)(RegisterInfo *reg, uint64_t val);
+
+struct {
+hwaddr addr;
+} decode;
 };
 
 /**
@@ -82,6 +87,26 @@ struct RegisterInfo {
 };
 
 /**
+ * This structure is used to group all of the individual registers which are
+ * modeled using the RegisterInfo strucutre.
+ *
+ * @r is an aray containing of all the relevent RegisterInfo structures.
+ *
+ * @num_elements is the number of elements in the array r
+ *
+ * @mem: optional Memory region for the register
+ */
+
+struct RegisterInfoArray {
+/*  */
+MemoryRegion mem;
+
+/*  */
+int num_elements;
+RegisterInfo **r;
+};
+
+/**
  * write a value to a register, subject to its restrictions
  * @reg: register to write to
  * @val: value to write
@@ -105,4 +130,30 @@ uint64_t register_read(RegisterInfo *reg);
 
 void register_reset(RegisterInfo *reg);
 
+/**
+ * Memory API MMIO write handler that will write to a Register API register.
+ *  _be for big endian variant and _le for little endian.
+ * @opaque: RegisterInfo to write to
+ * @addr: Address to write
+ * @value: Value to write
+ * @size: Number of bytes to write
+ */
+
+void register_write_memory_be(void *opaque, hwaddr addr, uint64_t value,
+  unsigned size);
+void register_write_memory_le(void *opaque, hwaddr addr, uint64_t value,
+  unsigned size);
+
+/**
+ * Memory API MMIO read handler that will read from a Register API register.
+ *  _be for big endia

[Qemu-devel] [PATCH v5 04/15] register: Add support for decoding information

2016-03-08 Thread Alistair Francis

Allow defining of optional address decoding information in register
definitions. This is useful for clients that want to associate
registers with specific addresses.

Signed-off-by: Peter Crosthwaite 
Signed-off-by: Alistair Francis 
---
V3:
 - Remove unused flags option

 include/hw/register.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/hw/register.h b/include/hw/register.h
index 726a914..bc2c96a 100644
--- a/include/hw/register.h
+++ b/include/hw/register.h
@@ -39,6 +39,11 @@ typedef struct RegisterInfoArray RegisterInfoArray;
  * allowing this function to modify the value before return to the client.
  */
 
+#define REG_DECODE_READ (1 << 0)
+#define REG_DECODE_WRITE (1 << 1)
+#define REG_DECODE_EXECUTE (1 << 2)
+#define REG_DECODE_RW (REG_DECODE_READ | REG_DECODE_WRITE)
+
 struct RegisterAccessInfo {
 const char *name;
 uint64_t ro;
-- 
2.5.0

[Qemu-devel] [PATCH v5 01/15] bitops: Add MAKE_64BIT_MASK macro

2016-03-08 Thread Alistair Francis

Add a macro that creates a 64bit value which has length number of ones
shifted acrros by the value of shift.

Signed-off-by: Alistair Francis 
---
V5:
 - Re-write to a 64-bit mask instead of ONES()
 - Re-order this patch in the series

 include/qemu/bitops.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h
index 755fdd1..3c45791 100644
--- a/include/qemu/bitops.h
+++ b/include/qemu/bitops.h
@@ -24,6 +24,9 @@
 #define BIT_WORD(nr)((nr) / BITS_PER_LONG)
 #define BITS_TO_LONGS(nr)   DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))
 
+#define MAKE_64BIT_MASK(shift, length) \
+(((1ull << (length)) - 1) << shift)
+
 /**
  * set_bit - Set a bit in memory
  * @nr: the bit to set
-- 
2.5.0

[Qemu-devel] [PATCH v5 00/15] data-driven device registers

2016-03-08 Thread Alistair Francis

This patch series is based on Peter C's original register API. His
original cover letter is below.

Future work: Allow support for memory attributes.

V5:
 - Only create a single memory region instead of a memory region for
   each register
 - General tidyups based on Alex's comments
V4:
 - Rebase and fix build issue
 - Simplify the register write logic
 - Other small fixes suggested by Alex Bennee
V3:
 - Small changes reported by Fred
V2:
 - Rebase
 - Fix up IOU SLCR connections
 - Add the memory_region_add_subregion_no_print() function and use it
   for the registers
Changes since RFC:
 - Connect the ZynqMP IOU SLCR device
 - Rebase

Original cover letter From Peter:
Hi All. This is a new scheme I've come up with handling device registers in a
data driven way. My motivation for this is to factor out a lot of the access
checking that seems to be replicated in every device. See P1 commit message for
further discussion.

P1 is the main patch, adds the register definition functionality
P2-3,6 add helpers that glue the register API to the Memory API
P4 Defines a set of macros that minimise register and field definitions
P5 is QOMfication
P7 is a trivial
P10-13 Work up to GPIO support
P8,9,14 add new devices (the Xilinx Zynq devcfg & ZynqMP SLCR) that use this
scheme.
P15: Connect the ZynqMP SLCR device

This Zynq devcfg device was particularly finnicky with per-bit restrictions.
I'm also looking for a higher-than-usual modelling fidelity
on the register space, with semantics defined for random reserved bits
in-between otherwise consistent fields.

Here's an example of the qemu_log output for the devcfg device. This is produced
by now generic sharable code:

/machine/unattached/device[44]:Addr 0x08:CFG: write of value 0508
/machine/unattached/device[44]:Addr 0x80:MCTRL: write of value 00800010
/machine/unattached/device[44]:Addr 0x10:INT_MASK: write of value 
/machine/unattached/device[44]:Addr :CTRL: write of value 0c00607f

And an example of a rogue guest banging on a bad bit:

/machine/unattached/device[44]:Addr 0x14:STATUS bits 0x01 may not be \
written to 1

A future feature I am interested in is implementing TCG optimisation of
side-effectless registers. The register API allows clear definition of
what registers have txn side effects and which ones don't. You could even
go a step further and translate such side-effectless accesses based on the
data pointer for the register.


Alistair Francis (7):
  bitops: Add MAKE_64BIT_MASK macro
  register: Add Register API
  register: Add Memory API glue
  register: Add support for decoding information
  dma: Add Xilinx Zynq devcfg device model
  register: Add GPIO API
  xlnx-zynqmp: Connect the ZynqMP IOU SLCR

Peter Crosthwaite (8):
  register: Define REG and FIELD macros
  register: QOMify
  register: Add block initialise helper
  xilinx_zynq: Connect devcfg to the Zynq machine model
  qdev: Define qdev_get_gpio_out
  qdev: Add qdev_pass_all_gpios API
  irq: Add opaque setter routine
  misc: Introduce ZynqMP IOU SLCR

 default-configs/arm-softmmu.mak|   1 +
 hw/arm/xilinx_zynq.c   |   8 +
 hw/arm/xlnx-zynqmp.c   |  13 ++
 hw/core/Makefile.objs  |   1 +
 hw/core/irq.c  |   5 +
 hw/core/qdev.c |  21 ++
 hw/core/register.c | 376 +++
 hw/dma/Makefile.objs   |   1 +
 hw/dma/xlnx-zynq-devcfg.c  | 394 +
 hw/misc/Makefile.objs  |   1 +
 hw/misc/xlnx-zynqmp-iou-slcr.c | 115 ++
 include/hw/arm/xlnx-zynqmp.h   |   2 +
 include/hw/dma/xlnx-zynq-devcfg.h  |  62 ++
 include/hw/irq.h   |   2 +
 include/hw/misc/xlnx-zynqmp-iou-slcr.h |  47 
 include/hw/qdev-core.h |   3 +
 include/hw/register.h  | 263 ++
 include/qemu/bitops.h  |   3 +
 18 files changed, 1318 insertions(+)
 create mode 100644 hw/core/register.c
 create mode 100644 hw/dma/xlnx-zynq-devcfg.c
 create mode 100644 hw/misc/xlnx-zynqmp-iou-slcr.c
 create mode 100644 include/hw/dma/xlnx-zynq-devcfg.h
 create mode 100644 include/hw/misc/xlnx-zynqmp-iou-slcr.h
 create mode 100644 include/hw/register.h

-- 
2.5.0

Re: [Qemu-devel] [PATCH v12 8/9] hw/ptimer: Perform delayed tick instead of immediate if delta = 0

2016-03-08 Thread Peter Crosthwaite

On Sat, Jan 30, 2016 at 8:43 AM, Dmitry Osipenko  wrote:
> It might be necessary by some emulated HW to perform the tick after one
> period if delta = 0. Given that it is much less churny to implement immediate
> tick by the ptimer user itself, let's make ptimer do the delayed tick.
>

Isn't this related to previous patch? It is kind of a rounding problem
that will vary from timer to timer. Some timers may interpret the
"tick" as the wrap-around back to the load value (even if that is 0)
while others interpret the tick as the transition to zero (makes more
sense for a one-shot). It's hard to set a universal policy here.

But is this a non-issue when we consider that event latency (usually
interrupt latency) is undefined anyway?

Regards,
Peter

> Signed-off-by: Dmitry Osipenko 
> ---
>  hw/core/ptimer.c | 34 +++---
>  1 file changed, 15 insertions(+), 19 deletions(-)
>
> diff --git a/hw/core/ptimer.c b/hw/core/ptimer.c
> index b2044fb..bcd090c 100644
> --- a/hw/core/ptimer.c
> +++ b/hw/core/ptimer.c
> @@ -36,19 +36,7 @@ static void ptimer_reload(ptimer_state *s)
>  {
>  uint32_t period_frac = s->period_frac;
>  uint64_t period = s->period;
> -
> -if (s->delta == 0) {
> -ptimer_trigger(s);
> -}
> -
> -if (s->delta == 0 && s->enabled == 1) {
> -s->delta = s->limit;
> -}
> -
> -if (s->delta == 0) {
> -ptimer_stop(s);
> -return;
> -}
> +uint64_t delta = MAX(1, s->delta);
>
>  /*
>   * Artificially limit timeout rate to something
> @@ -59,15 +47,15 @@ static void ptimer_reload(ptimer_state *s)
>   * on the current generation of host machines.
>   */
>
> -if (s->enabled == 1 && (s->delta * period < 1) && !use_icount) {
> -period = 1 / s->delta;
> +if (s->enabled == 1 && (delta * period < 1) && !use_icount) {
> +period = 1 / delta;
>  period_frac = 0;
>  }
>
>  s->last_event = s->next_event;
> -s->next_event = s->last_event + s->delta * period;
> +s->next_event = s->last_event + delta * period;
>  if (period_frac) {
> -s->next_event += ((int64_t)period_frac * s->delta) >> 32;
> +s->next_event += ((int64_t)period_frac * delta) >> 32;
>  }
>  timer_mod(s->timer, s->next_event);
>  }
> @@ -75,8 +63,16 @@ static void ptimer_reload(ptimer_state *s)
>  static void ptimer_tick(void *opaque)
>  {
>  ptimer_state *s = (ptimer_state *)opaque;
> -s->delta = 0;
> -ptimer_reload(s);
> +
> +s->delta = (s->enabled == 1) ? s->limit : 0;
> +
> +if (s->delta == 0) {
> +s->enabled = 0;
> +} else {
> +ptimer_reload(s);
> +}
> +
> +ptimer_trigger(s);
>  }
>
>  uint64_t ptimer_get_count(ptimer_state *s)
> --
> 2.7.0
>

Re: [Qemu-devel] [PATCH v12 9/9] arm_mptimer: Convert to use ptimer

2016-03-08 Thread Peter Crosthwaite

On Sat, Jan 30, 2016 at 8:43 AM, Dmitry Osipenko  wrote:
> Current ARM MPTimer implementation uses QEMUTimer for the actual timer,
> this implementation isn't complete and mostly tries to duplicate of what
> generic ptimer is already doing fine.
>
> Conversion to ptimer brings the following benefits and fixes:
> - Simple timer pausing implementation
> - Fixes counter value preservation after stopping the timer
> - Correctly handles prescaler != 0 / counter = 0 / load = 0 cases
> - Code simplification and reduction
>
> Bump VMSD to version 3, since VMState is changed and is not compatible
> with the previous implementation.
>
> Signed-off-by: Dmitry Osipenko 

Im a little rusty on some of the corner cases that have come out of
your test suite, but the general idea is solid, and it adds those
missing features that got the whole discussion started.

Reviewed-by: Peter Crosthwaite 

> ---
>  hw/timer/arm_mptimer.c | 133 
> +
>  include/hw/timer/arm_mptimer.h |   5 +-
>  2 files changed, 70 insertions(+), 68 deletions(-)
>
> diff --git a/hw/timer/arm_mptimer.c b/hw/timer/arm_mptimer.c
> index 5dfab66..f002458 100644
> --- a/hw/timer/arm_mptimer.c
> +++ b/hw/timer/arm_mptimer.c
> @@ -19,8 +19,9 @@
>   * with this program; if not, see .
>   */
>
> +#include "hw/ptimer.h"
>  #include "hw/timer/arm_mptimer.h"
> -#include "qemu/timer.h"
> +#include "qemu/main-loop.h"
>  #include "qom/cpu.h"
>
>  /* This device implements the per-cpu private timer and watchdog block
> @@ -42,55 +43,54 @@ static inline void timerblock_update_irq(TimerBlock *tb)
>  }
>
>  /* Return conversion factor from mpcore timer ticks to qemu timer ticks.  */
> -static inline uint32_t timerblock_scale(TimerBlock *tb)
> +static inline uint32_t timerblock_scale(uint32_t control)
>  {
> -return (((tb->control >> 8) & 0xff) + 1) * 10;
> +return (((control >> 8) & 0xff) + 1) * 10;
>  }
>
> -static void timerblock_reload(TimerBlock *tb, int restart)
> +static inline void timerblock_set_count(struct ptimer_state *timer,
> +uint32_t control, uint64_t *count)
>  {
> -if (tb->count == 0) {
> -return;
> +/* PTimer would immediately trigger interrupt for periodic timer
> + * when counter set to 0, MPtimer under certain condition only.
> + */
> +if ((control & 3) == 3 && (control & 0xff00) == 0 && *count == 0) {
> +*count = ptimer_get_limit(timer);
>  }
> -if (restart) {
> -tb->tick = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
> +ptimer_set_count(timer, *count);
> +}
> +
> +static inline void timerblock_run(struct ptimer_state *timer,
> +  uint32_t control, uint32_t load)
> +{
> +if ((control & 1) && ((control & 0xff00) || load != 0)) {
> +ptimer_run(timer, !(control & 2));
>  }
> -tb->tick += (int64_t)tb->count * timerblock_scale(tb);
> -timer_mod(tb->timer, tb->tick);
>  }
>
>  static void timerblock_tick(void *opaque)
>  {
>  TimerBlock *tb = (TimerBlock *)opaque;
>  tb->status = 1;
> -if (tb->control & 2) {
> -tb->count = tb->load;
> -timerblock_reload(tb, 0);
> -} else {
> -tb->count = 0;
> -}
>  timerblock_update_irq(tb);
> +/* Periodic timer with load = 0 and prescaler != 0 would re-trigger
> + * IRQ after one period, otherwise it either stops or wraps around.
> + */
> +if ((tb->control & 2) && (tb->control & 0xff00) &&
> +ptimer_get_limit(tb->timer) == 0) {
> +ptimer_run(tb->timer, 0);
> +}
>  }
>
>  static uint64_t timerblock_read(void *opaque, hwaddr addr,
>  unsigned size)
>  {
>  TimerBlock *tb = (TimerBlock *)opaque;
> -int64_t val;
>  switch (addr) {
>  case 0: /* Load */
> -return tb->load;
> +return ptimer_get_limit(tb->timer);
>  case 4: /* Counter.  */
> -if (((tb->control & 1) == 0) || (tb->count == 0)) {
> -return 0;
> -}
> -/* Slow and ugly, but hopefully won't happen too often.  */
> -val = tb->tick - qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
> -val /= timerblock_scale(tb);
> -if (val < 0) {
> -val = 0;
> -}
> -return val;
> +return ptimer_get_count(tb->timer);
>  case 8: /* Control.  */
>  return tb->control;
>  case 12: /* Interrupt status.  */
> @@ -104,37 +104,45 @@ static void timerblock_write(void *opaque, hwaddr addr,
>   uint64_t value, unsigned size)
>  {
>  TimerBlock *tb = (TimerBlock *)opaque;
> -int64_t old;
> +uint32_t control = tb->control;
>  switch (addr) {
>  case 0: /* Load */
> -tb->load = value;
> -/* Fall through.  */
> -case 4: /* Counter.  */
> -if ((tb->control & 1) && tb->count) {
> -/* Cancel the previous

[Qemu-devel] [PATCH v3 4/5] bcm2835_property: implement framebuffer control/configuration properties

2016-03-08 Thread Andrew Baumann

From: Grégory ESTRADE 

The property channel driver now interfaces with the framebuffer device
to query and set framebuffer parameters. As a result of this, the "get
ARM RAM size" query now correctly returns the video RAM base address
(not total RAM size), and the ram-size property is no longer relevant
here.

Signed-off-by: Grégory ESTRADE 
[AB: cleanup/refactoring for upstream submission]
Signed-off-by: Andrew Baumann 
Reviewed-by: Peter Maydell 
---

Notes:
v2:
 * avoid ldl/stl_phys
 * move code to increase default pi2 memory size from preceding patch here
   (it was incorrect without the property channel implementation changes)

 hw/arm/bcm2835_peripherals.c   |   8 +--
 hw/arm/raspi.c |   7 +-
 hw/misc/bcm2835_property.c | 139 -
 include/hw/misc/bcm2835_property.h |   5 +-
 4 files changed, 144 insertions(+), 15 deletions(-)

diff --git a/hw/arm/bcm2835_peripherals.c b/hw/arm/bcm2835_peripherals.c
index c2fe6b7..4d74a18 100644
--- a/hw/arm/bcm2835_peripherals.c
+++ b/hw/arm/bcm2835_peripherals.c
@@ -79,6 +79,8 @@ static void bcm2835_peripherals_init(Object *obj)
   "board-rev", &error_abort);
 qdev_set_parent_bus(DEVICE(&s->property), sysbus_get_default());
 
+object_property_add_const_link(OBJECT(&s->property), "fb",
+   OBJECT(&s->fb), &error_abort);
 object_property_add_const_link(OBJECT(&s->property), "dma-mr",
OBJECT(&s->gpu_bus_mr), &error_abort);
 
@@ -211,12 +213,6 @@ static void bcm2835_peripherals_realize(DeviceState *dev, 
Error **errp)
qdev_get_gpio_in(DEVICE(&s->mboxes), MBOX_CHAN_FB));
 
 /* Property channel */
-object_property_set_int(OBJECT(&s->property), ram_size, "ram-size", &err);
-if (err) {
-error_propagate(errp, err);
-return;
-}
-
 object_property_set_bool(OBJECT(&s->property), true, "realized", &err);
 if (err) {
 error_propagate(errp, err);
diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index 5498209..83fe809 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -164,11 +164,6 @@ static void raspi2_machine_init(MachineClass *mc)
 mc->no_floppy = 1;
 mc->no_cdrom = 1;
 mc->max_cpus = BCM2836_NCPUS;
-
-/* XXX: Temporary restriction in RAM size from the full 1GB. Since
- * we do not yet support the framebuffer / GPU, we need to limit
- * RAM usable by the OS to sit below the peripherals.
- */
-mc->default_ram_size = 0x3F00; /* BCM2836_PERI_BASE */
+mc->default_ram_size = 1024 * 1024 * 1024;
 };
 DEFINE_MACHINE("raspi2", raspi2_machine_init)
diff --git a/hw/misc/bcm2835_property.c b/hw/misc/bcm2835_property.c
index 41fbbe3..15dcc02 100644
--- a/hw/misc/bcm2835_property.c
+++ b/hw/misc/bcm2835_property.c
@@ -17,6 +17,11 @@ static void bcm2835_property_mbox_push(BCM2835PropertyState 
*s, uint32_t value)
 uint32_t tot_len;
 size_t resplen;
 uint32_t tmp;
+int n;
+uint32_t offset, length, color;
+uint32_t xres, yres, xoffset, yoffset, bpp, pixo, alpha;
+uint32_t *newxres = NULL, *newyres = NULL, *newxoffset = NULL,
+*newyoffset = NULL, *newbpp = NULL, *newpixo = NULL, *newalpha = NULL;
 
 value &= ~0xf;
 
@@ -60,7 +65,14 @@ static void bcm2835_property_mbox_push(BCM2835PropertyState 
*s, uint32_t value)
 /* base */
 stl_le_phys(&s->dma_as, value + 12, 0);
 /* size */
-stl_le_phys(&s->dma_as, value + 16, s->ram_size);
+stl_le_phys(&s->dma_as, value + 16, s->fbdev->vcram_base);
+resplen = 8;
+break;
+case 0x00010006: /* Get VC memory */
+/* base */
+stl_le_phys(&s->dma_as, value + 12, s->fbdev->vcram_base);
+/* size */
+stl_le_phys(&s->dma_as, value + 16, s->fbdev->vcram_size);
 resplen = 8;
 break;
 case 0x00028001: /* Set power state */
@@ -122,6 +134,114 @@ static void 
bcm2835_property_mbox_push(BCM2835PropertyState *s, uint32_t value)
 resplen = 8;
 break;
 
+/* Frame buffer */
+
+case 0x00040001: /* Allocate buffer */
+stl_le_phys(&s->dma_as, value + 12, s->fbdev->base);
+stl_le_phys(&s->dma_as, value + 16, s->fbdev->size);
+resplen = 8;
+break;
+case 0x00048001: /* Release buffer */
+resplen = 0;
+break;
+case 0x00040002: /* Blank screen */
+resplen = 4;
+break;
+case 0x00040003: /* Get display width/height */
+case 0x00040004:
+stl_le_phys(&s->dma_as, value + 12, s->fbdev->xres);
+stl_le_phys(&s->dma_as, value + 16, s->fbdev->yres);
+resplen = 8;
+break;
+case 0x00044003: /* Test display width/height */
+case 0x00044004:
+resplen = 8;
+

[Qemu-devel] [PATCH v3 5/5] bcm2835_dma: add emulation of Raspberry Pi DMA controller

2016-03-08 Thread Andrew Baumann

From: Grégory ESTRADE 

At present, all DMA transfers complete inline (so a looping descriptor
queue will lock up the device). We also do not model pause/abort,
arbitrarion/priority, or debug features.

Signed-off-by: Grégory ESTRADE 
[AB: implement 2D mode, cleanup/refactoring for upstream submission]
Signed-off-by: Andrew Baumann 
Reviewed-by: Peter Maydell 
---

Notes:
v2:
 * avoid ldl_phys/stl_phys
 * compute address of channel structure only after asserting its validity
 * correctly implement per-channel reset and abort bits
 * set channel paused bit when completing DMA

 hw/arm/bcm2835_peripherals.c |  26 +++
 hw/dma/Makefile.objs |   1 +
 hw/dma/bcm2835_dma.c | 408 +++
 include/hw/arm/bcm2835_peripherals.h |   2 +
 include/hw/dma/bcm2835_dma.h |  47 
 5 files changed, 484 insertions(+)
 create mode 100644 hw/dma/bcm2835_dma.c
 create mode 100644 include/hw/dma/bcm2835_dma.h

diff --git a/hw/arm/bcm2835_peripherals.c b/hw/arm/bcm2835_peripherals.c
index 4d74a18..8099a8a 100644
--- a/hw/arm/bcm2835_peripherals.c
+++ b/hw/arm/bcm2835_peripherals.c
@@ -88,6 +88,14 @@ static void bcm2835_peripherals_init(Object *obj)
 object_initialize(&s->sdhci, sizeof(s->sdhci), TYPE_SYSBUS_SDHCI);
 object_property_add_child(obj, "sdhci", OBJECT(&s->sdhci), NULL);
 qdev_set_parent_bus(DEVICE(&s->sdhci), sysbus_get_default());
+
+/* DMA Channels */
+object_initialize(&s->dma, sizeof(s->dma), TYPE_BCM2835_DMA);
+object_property_add_child(obj, "dma", OBJECT(&s->dma), NULL);
+qdev_set_parent_bus(DEVICE(&s->dma), sysbus_get_default());
+
+object_property_add_const_link(OBJECT(&s->dma), "dma-mr",
+   OBJECT(&s->gpu_bus_mr), &error_abort);
 }
 
 static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
@@ -258,6 +266,24 @@ static void bcm2835_peripherals_realize(DeviceState *dev, 
Error **errp)
 return;
 }
 
+/* DMA Channels */
+object_property_set_bool(OBJECT(&s->dma), true, "realized", &err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+
+memory_region_add_subregion(&s->peri_mr, DMA_OFFSET,
+sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->dma), 0));
+memory_region_add_subregion(&s->peri_mr, DMA15_OFFSET,
+sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->dma), 1));
+
+for (n = 0; n <= 12; n++) {
+sysbus_connect_irq(SYS_BUS_DEVICE(&s->dma), n,
+   qdev_get_gpio_in_named(DEVICE(&s->ic),
+  BCM2835_IC_GPU_IRQ,
+  INTERRUPT_DMA0 + n));
+}
 }
 
 static void bcm2835_peripherals_class_init(ObjectClass *oc, void *data)
diff --git a/hw/dma/Makefile.objs b/hw/dma/Makefile.objs
index 0e65ed0..a1abbcf 100644
--- a/hw/dma/Makefile.objs
+++ b/hw/dma/Makefile.objs
@@ -11,3 +11,4 @@ common-obj-$(CONFIG_SUN4M) += sun4m_iommu.o
 
 obj-$(CONFIG_OMAP) += omap_dma.o soc_dma.o
 obj-$(CONFIG_PXA2XX) += pxa2xx_dma.o
+obj-$(CONFIG_RASPI) += bcm2835_dma.o
diff --git a/hw/dma/bcm2835_dma.c b/hw/dma/bcm2835_dma.c
new file mode 100644
index 000..c7ce4e4
--- /dev/null
+++ b/hw/dma/bcm2835_dma.c
@@ -0,0 +1,408 @@
+/*
+ * Raspberry Pi emulation (c) 2012 Gregory Estrade
+ * This code is licensed under the GNU GPLv2 and later.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/dma/bcm2835_dma.h"
+
+/* DMA CS Control and Status bits */
+#define BCM2708_DMA_ACTIVE  (1 << 0)
+#define BCM2708_DMA_END (1 << 1) /* GE */
+#define BCM2708_DMA_INT (1 << 2)
+#define BCM2708_DMA_ISPAUSED(1 << 4)  /* Pause requested or not active */
+#define BCM2708_DMA_ISHELD  (1 << 5)  /* Is held by DREQ flow control */
+#define BCM2708_DMA_ERR (1 << 8)
+#define BCM2708_DMA_ABORT   (1 << 30) /* stop current CB, go to next, WO */
+#define BCM2708_DMA_RESET   (1 << 31) /* WO, self clearing */
+
+/* DMA control block "info" field bits */
+#define BCM2708_DMA_INT_EN  (1 << 0)
+#define BCM2708_DMA_TDMODE  (1 << 1)
+#define BCM2708_DMA_WAIT_RESP   (1 << 3)
+#define BCM2708_DMA_D_INC   (1 << 4)
+#define BCM2708_DMA_D_WIDTH (1 << 5)
+#define BCM2708_DMA_D_DREQ  (1 << 6)
+#define BCM2708_DMA_D_IGNORE(1 << 7)
+#define BCM2708_DMA_S_INC   (1 << 8)
+#define BCM2708_DMA_S_WIDTH (1 << 9)
+#define BCM2708_DMA_S_DREQ  (1 << 10)
+#define BCM2708_DMA_S_IGNORE(1 << 11)
+
+/* Register offsets */
+#define BCM2708_DMA_CS  0x00 /* Control and Status */
+#define BCM2708_DMA_ADDR0x04 /* Control block address */
+/* the current control block appears in the following registers - read only */
+#define BCM2708_DMA_INFO0x08
+#define BCM2708_DMA_SOURCE_AD   0x0c
+#define BCM2708_DMA_DEST_AD 0x10
+#define BCM2708_DMA_TXFR_LEN0x14
+#define BCM2708_DMA_STRIDE  0x18
+#define BCM2708_DMA_NEXTCB  0x1C
+#define

[Qemu-devel] [PATCH v3 3/5] bcm2835_fb: add framebuffer device for Raspberry Pi

2016-03-08 Thread Andrew Baumann

From: Grégory ESTRADE 

The framebuffer occupies the upper portion of memory (64MiB by
default), but it can only be controlled/configured via a system
mailbox or property channel (to be added by a subsequent patch).

Signed-off-by: Grégory ESTRADE 
[AB: added Windows (BGR) support and cleanup/refactoring for upstream 
submission]
Signed-off-by: Andrew Baumann 
Reviewed-by: Peter Maydell 
---

Notes:
v2:
 * avoid ldl_phys
 * move code to increase default pi2 memory size back to the final patch

 hw/arm/bcm2835_peripherals.c |  38 +++-
 hw/arm/bcm2836.c |   2 +
 hw/arm/raspi.c   |   5 +-
 hw/display/Makefile.objs |   1 +
 hw/display/bcm2835_fb.c  | 424 +++
 include/hw/arm/bcm2835_peripherals.h |   2 +
 include/hw/display/bcm2835_fb.h  |  47 
 7 files changed, 517 insertions(+), 2 deletions(-)
 create mode 100644 hw/display/bcm2835_fb.c
 create mode 100644 include/hw/display/bcm2835_fb.h

diff --git a/hw/arm/bcm2835_peripherals.c b/hw/arm/bcm2835_peripherals.c
index d6453cc..c2fe6b7 100644
--- a/hw/arm/bcm2835_peripherals.c
+++ b/hw/arm/bcm2835_peripherals.c
@@ -62,6 +62,16 @@ static void bcm2835_peripherals_init(Object *obj)
 object_property_add_const_link(OBJECT(&s->mboxes), "mbox-mr",
OBJECT(&s->mbox_mr), &error_abort);
 
+/* Framebuffer */
+object_initialize(&s->fb, sizeof(s->fb), TYPE_BCM2835_FB);
+object_property_add_child(obj, "fb", OBJECT(&s->fb), NULL);
+object_property_add_alias(obj, "vcram-size", OBJECT(&s->fb), "vcram-size",
+  &error_abort);
+qdev_set_parent_bus(DEVICE(&s->fb), sysbus_get_default());
+
+object_property_add_const_link(OBJECT(&s->fb), "dma-mr",
+   OBJECT(&s->gpu_bus_mr), &error_abort);
+
 /* Property channel */
 object_initialize(&s->property, sizeof(s->property), 
TYPE_BCM2835_PROPERTY);
 object_property_add_child(obj, "property", OBJECT(&s->property), NULL);
@@ -84,7 +94,7 @@ static void bcm2835_peripherals_realize(DeviceState *dev, 
Error **errp)
 Object *obj;
 MemoryRegion *ram;
 Error *err = NULL;
-uint32_t ram_size;
+uint32_t ram_size, vcram_size;
 CharDriverState *chr;
 int n;
 
@@ -174,6 +184,32 @@ static void bcm2835_peripherals_realize(DeviceState *dev, 
Error **errp)
 qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_ARM_IRQ,
INTERRUPT_ARM_MAILBOX));
 
+/* Framebuffer */
+vcram_size = (uint32_t)object_property_get_int(OBJECT(s), "vcram-size",
+   &err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+
+object_property_set_int(OBJECT(&s->fb), ram_size - vcram_size,
+"vcram-base", &err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+
+object_property_set_bool(OBJECT(&s->fb), true, "realized", &err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+
+memory_region_add_subregion(&s->mbox_mr, MBOX_CHAN_FB << 
MBOX_AS_CHAN_SHIFT,
+sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->fb), 0));
+sysbus_connect_irq(SYS_BUS_DEVICE(&s->fb), 0,
+   qdev_get_gpio_in(DEVICE(&s->mboxes), MBOX_CHAN_FB));
+
 /* Property channel */
 object_property_set_int(OBJECT(&s->property), ram_size, "ram-size", &err);
 if (err) {
diff --git a/hw/arm/bcm2836.c b/hw/arm/bcm2836.c
index 0321439..89a6b35 100644
--- a/hw/arm/bcm2836.c
+++ b/hw/arm/bcm2836.c
@@ -42,6 +42,8 @@ static void bcm2836_init(Object *obj)
   &error_abort);
 object_property_add_alias(obj, "board-rev", OBJECT(&s->peripherals),
   "board-rev", &error_abort);
+object_property_add_alias(obj, "vcram-size", OBJECT(&s->peripherals),
+  "vcram-size", &error_abort);
 qdev_set_parent_bus(DEVICE(&s->peripherals), sysbus_get_default());
 }
 
diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index 6582279..5498209 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -113,6 +113,7 @@ static void setup_boot(MachineState *machine, int version, 
size_t ram_size)
 static void raspi2_init(MachineState *machine)
 {
 RasPiState *s = g_new0(RasPiState, 1);
+uint32_t vcram_size;
 DriveInfo *di;
 BlockBackend *blk;
 BusState *bus;
@@ -149,7 +150,9 @@ static void raspi2_init(MachineState *machine)
 qdev_prop_set_drive(carddev, "drive", blk, &error_fatal);
 object_property_set_bool(OBJECT(carddev), true, "realized", &error_fatal);
 
-setup_boot(machine, 2, machine->ram_size);
+vcram_size = object_property_get_int(OBJECT(&s->soc), "vcram-size",
+ &error_abort);
+setup_boot(machine, 2, machine->ram_size - vcram_size);
 }
 
 static void raspi2_mach

[Qemu-devel] [PATCH v3 2/5] bcm2835_aux: add emulation of BCM2835 AUX (aka UART1) block

2016-03-08 Thread Andrew Baumann

At present only the core UART functions (data path for tx/rx) are
implemented, which is enough for UEFI to boot. The following
features/registers are unimplemented:
  * Line/modem control
  * Scratch register
  * Extra control
  * Baudrate
  * SPI interfaces

Signed-off-by: Andrew Baumann 
Reviewed-by: Peter Maydell 
---

Notes:
v3:
 * use qemu_char_get_next_serial() to configure the chardev prop in
   bcm2835_peripherals, rather than reaching into serial_hds[] directly

v2:
 * document unimplemented features, log unimplemented register accesses
 * model read path as 8-bit
 * drop incorrect event (break detection) functionality
 * implement AUX_IRQ register
 * model interrupt enables as a uint8 rather than 2 bools
 * corrected bugs in implementation of IIR bits 0-2
 * use chardev prop rather than qemu_char_get_next_serial(); the
   soc-level (bcm2835_peripherals) handling is still a little messy
   however, because uart0 is a pl011 device which still calls
   qemu_char_get_next_serial()

 hw/arm/bcm2835_peripherals.c |  32 
 hw/char/Makefile.objs|   1 +
 hw/char/bcm2835_aux.c| 316 +++
 include/hw/arm/bcm2835_peripherals.h |   2 +
 include/hw/char/bcm2835_aux.h|  33 
 5 files changed, 384 insertions(+)
 create mode 100644 hw/char/bcm2835_aux.c
 create mode 100644 include/hw/char/bcm2835_aux.h

diff --git a/hw/arm/bcm2835_peripherals.c b/hw/arm/bcm2835_peripherals.c
index 6ce9cd1..d6453cc 100644
--- a/hw/arm/bcm2835_peripherals.c
+++ b/hw/arm/bcm2835_peripherals.c
@@ -12,6 +12,7 @@
 #include "hw/arm/bcm2835_peripherals.h"
 #include "hw/misc/bcm2835_mbox_defs.h"
 #include "hw/arm/raspi_platform.h"
+#include "sysemu/char.h"
 
 /* Peripheral base address on the VC (GPU) system bus */
 #define BCM2835_VC_PERI_BASE 0x7e00
@@ -48,6 +49,11 @@ static void bcm2835_peripherals_init(Object *obj)
 object_property_add_child(obj, "uart0", OBJECT(s->uart0), NULL);
 qdev_set_parent_bus(DEVICE(s->uart0), sysbus_get_default());
 
+/* AUX / UART1 */
+object_initialize(&s->aux, sizeof(s->aux), TYPE_BCM2835_AUX);
+object_property_add_child(obj, "aux", OBJECT(&s->aux), NULL);
+qdev_set_parent_bus(DEVICE(&s->aux), sysbus_get_default());
+
 /* Mailboxes */
 object_initialize(&s->mboxes, sizeof(s->mboxes), TYPE_BCM2835_MBOX);
 object_property_add_child(obj, "mbox", OBJECT(&s->mboxes), NULL);
@@ -79,6 +85,7 @@ static void bcm2835_peripherals_realize(DeviceState *dev, 
Error **errp)
 MemoryRegion *ram;
 Error *err = NULL;
 uint32_t ram_size;
+CharDriverState *chr;
 int n;
 
 obj = object_property_get_link(OBJECT(dev), "ram", &err);
@@ -131,6 +138,29 @@ static void bcm2835_peripherals_realize(DeviceState *dev, 
Error **errp)
 qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_GPU_IRQ,
INTERRUPT_UART));
 
+/* AUX / UART1 */
+/* TODO: don't call qemu_char_get_next_serial() here, instead set
+ * chardev properties for each uart at the board level, once pl011
+ * (uart0) has been updated to avoid qemu_char_get_next_serial()
+ */
+chr = qemu_char_get_next_serial();
+if (chr == NULL) {
+chr = qemu_chr_new("bcm2835.uart1", "null", NULL);
+}
+qdev_prop_set_chr(DEVICE(&s->aux), "chardev", chr);
+
+object_property_set_bool(OBJECT(&s->aux), true, "realized", &err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+
+memory_region_add_subregion(&s->peri_mr, UART1_OFFSET,
+sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->aux), 0));
+sysbus_connect_irq(SYS_BUS_DEVICE(&s->aux), 0,
+qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_GPU_IRQ,
+   INTERRUPT_AUX));
+
 /* Mailboxes */
 object_property_set_bool(OBJECT(&s->mboxes), true, "realized", &err);
 if (err) {
@@ -203,6 +233,8 @@ static void bcm2835_peripherals_class_init(ObjectClass *oc, 
void *data)
 DeviceClass *dc = DEVICE_CLASS(oc);
 
 dc->realize = bcm2835_peripherals_realize;
+/* Reason: realize() method uses qemu_char_get_next_serial() */
+dc->cannot_instantiate_with_device_add_yet = true;
 }
 
 static const TypeInfo bcm2835_peripherals_type_info = {
diff --git a/hw/char/Makefile.objs b/hw/char/Makefile.objs
index 5931cc8..69a553c 100644
--- a/hw/char/Makefile.objs
+++ b/hw/char/Makefile.objs
@@ -16,6 +16,7 @@ obj-$(CONFIG_SH4) += sh_serial.o
 obj-$(CONFIG_PSERIES) += spapr_vty.o
 obj-$(CONFIG_DIGIC) += digic-uart.o
 obj-$(CONFIG_STM32F2XX_USART) += stm32f2xx_usart.o
+obj-$(CONFIG_RASPI) += bcm2835_aux.o
 
 common-obj-$(CONFIG_ETRAXFS) += etraxfs_ser.o
 common-obj-$(CONFIG_ISA_DEBUG) += debugcon.o
diff --git a/hw/char/bcm2835_aux.c b/hw/char/bcm2835_aux.c
new file mode 100644
index 000..0394d11
--- /dev/null
+++ b/hw/char/bcm2835_aux.c
@@ -0,0 +1,316 @@
+/*
+ * BCM2835 (Raspberry Pi / Pi

[Qemu-devel] [PATCH v3 0/5] Raspberry Pi framebuffer, DMA and Windows support

2016-03-08 Thread Andrew Baumann

This patch series adds support for the AUX (second UART), framebuffer
and DMA controller on Raspberry Pi 2, and enables booting Windows on
this device. As with the previous series, it is heavily based on the
original (out of tree) work of Gregory Estrade, Stefan Weil and others
to support Raspberry Pi 1.

After this series, it is possible to boot Windows by following the
instructions at https://github.com/0xabu/qemu/wiki. You also boot
Raspbian to the GUI using a command such as:

  qemu-system-arm -M raspi2 -kernel raspbian-boot/kernel7.img -sd
  2015-09-24-raspbian-jessie.img -append "rw earlyprintk loglevel=8
  console=ttyAMA0 dwc_otg.lpm_enable=0 root=/dev/mmcblk0p2 rootwait"
  -dtb raspbian-boot/bcm2709-rpi-2-b.dtb -serial stdio

I plan to add USB, and remaining timers / system devices in
future patch series, along with support for pi1 (bcm2835). In the
meantime, the complete code is available at https://github.com/0xabu/qemu

v2:
 * added DMA controller
 * revised per PMM's review feedback
 * rebased on top of the patch for fixing ldl_phys/stl_phys in raspi
   devices, presently in target-arm.next

v3:
 * tweaked instantiation of uart1 char device
 * identified Gregory as primary author of framebuffer and DMA emulation

Cheers,
Andrew

Andrew Baumann (2):
  bcm2835_peripherals: enable sdhci pending-insert quirk for raspberry
pi
  bcm2835_aux: add emulation of BCM2835 AUX (aka UART1) block

Grégory ESTRADE (3):
  bcm2835_fb: add framebuffer device for Raspberry Pi
  bcm2835_property: implement framebuffer control/configuration
properties
  bcm2835_dma: add emulation of Raspberry Pi DMA controller

 hw/arm/bcm2835_peripherals.c | 103 -
 hw/arm/bcm2836.c |   2 +
 hw/arm/raspi.c   |  12 +-
 hw/char/Makefile.objs|   1 +
 hw/char/bcm2835_aux.c| 316 ++
 hw/display/Makefile.objs |   1 +
 hw/display/bcm2835_fb.c  | 424 +++
 hw/dma/Makefile.objs |   1 +
 hw/dma/bcm2835_dma.c | 408 +
 hw/misc/bcm2835_property.c   | 139 +++-
 include/hw/arm/bcm2835_peripherals.h |   6 +
 include/hw/char/bcm2835_aux.h|  33 +++
 include/hw/display/bcm2835_fb.h  |  47 
 include/hw/dma/bcm2835_dma.h |  47 
 include/hw/misc/bcm2835_property.h   |   5 +-
 15 files changed, 1532 insertions(+), 13 deletions(-)
 create mode 100644 hw/char/bcm2835_aux.c
 create mode 100644 hw/display/bcm2835_fb.c
 create mode 100644 hw/dma/bcm2835_dma.c
 create mode 100644 include/hw/char/bcm2835_aux.h
 create mode 100644 include/hw/display/bcm2835_fb.h
 create mode 100644 include/hw/dma/bcm2835_dma.h

-- 
2.7.0

1 2 3 4 >

1 - 100 of 311 matches

Mail list logo