Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 1/3] VFIO: Clear stale MSIx table during EEH reset

2015-03-23 Thread David Gibson
On Tue, Mar 24, 2015 at 05:24:55PM +1100, Gavin Shan wrote:
> On Tue, Mar 24, 2015 at 04:41:21PM +1100, David Gibson wrote:
> >On Mon, Mar 23, 2015 at 04:25:10PM +1100, Gavin Shan wrote:
> >> On Mon, Mar 23, 2015 at 04:06:56PM +1100, David Gibson wrote:
> >> >On Fri, Mar 20, 2015 at 05:27:29PM +1100, Gavin Shan wrote:
> >> >> On Fri, Mar 20, 2015 at 05:04:01PM +1100, David Gibson wrote:
> >> >> >On Tue, Mar 17, 2015 at 03:31:24AM +1100, Gavin Shan wrote:
> >> >> >> The PCI device MSIx table is cleaned out in hardware after EEH PE
> >> >> >> reset. However, we still hold the stale MSIx entries in QEMU, which
> >> >> >> should be cleared accordingly. Otherwise, we will run into another
> >> >> >> (recursive) EEH error and the PCI devices contained in the PE have
> >> >> >> to be offlined exceptionally.
> >> >> >> 
> >> >> >> The patch clears stale MSIx table before EEH PE reset so that MSIx
> >> >> >> table could be restored properly after EEH PE reset.
> >> >> >> 
> >> >> >> Signed-off-by: Gavin Shan 
> >> >> >> ---
> >> >> >> v2: vfio_container_eeh_event() stub for !CONFIG_PCI and separate
> >> >> >> error message for this function. Dropped vfio_put_group()
> >> >> >> on NULL group
> >> >> >> ---
> >> >> >>  hw/vfio/Makefile.objs  |  6 +-
> >> >> >>  hw/vfio/common.c   |  7 +++
> >> >> >>  hw/vfio/pci-stub.c | 17 +
> >> >> >>  hw/vfio/pci.c  | 38 ++
> >> >> >>  include/hw/vfio/vfio.h |  2 ++
> >> >> >>  5 files changed, 69 insertions(+), 1 deletion(-)
> >> >> >>  create mode 100644 hw/vfio/pci-stub.c
> >> >> >> 
> >> >> >> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> >> >> >> index e31f30e..1b8a065 100644
> >> >> >> --- a/hw/vfio/Makefile.objs
> >> >> >> +++ b/hw/vfio/Makefile.objs
> >> >> >> @@ -1,4 +1,8 @@
> >> >> >>  ifeq ($(CONFIG_LINUX), y)
> >> >> >>  obj-$(CONFIG_SOFTMMU) += common.o
> >> >> >> -obj-$(CONFIG_PCI) += pci.o
> >> >> >> +ifeq ($(CONFIG_PCI), y)
> >> >> >> +obj-y += pci.o
> >> >> >> +else
> >> >> >> +obj-y += pci-stub.o
> >> >> >> +endif
> >> >> >>  endif
> >> >> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >> >> >> index 148eb53..ed07814 100644
> >> >> >> --- a/hw/vfio/common.c
> >> >> >> +++ b/hw/vfio/common.c
> >> >> >> @@ -949,7 +949,14 @@ int vfio_container_ioctl(AddressSpace *as, 
> >> >> >> int32_t groupid,
> >> >> >>  switch (req) {
> >> >> >>  case VFIO_CHECK_EXTENSION:
> >> >> >>  case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
> >> >> >> +break;
> >> >> >>  case VFIO_EEH_PE_OP:
> >> >> >> +if (vfio_container_eeh_event(as, groupid, param) != 0) {
> >> >> >
> >> >> >I really dislike the idea of having an arbitrarily complex side effect
> >> >> >from a function whose name suggest's it's just a trivial wrapper
> >> >> >around the ioctl().
> >> >> >
> >> >> 
> >> >> Ok. I guess you would like putting the complex in the callers of
> >> >> vfio_container_ioctl().
> >> >
> >> >Well.. maybe.  I'd also be happy if helper functions were implemeneted
> >> >which both called the ioctl() and did the other necessary pieces.
> >> >They should just be called something that indicates their full
> >> >function, not a name which suggests they're just an ioctl wrapper.
> >> >
> >> 
> >> Indeed, vfio_container_ioctl() isn't indicating what the function is doing.
> >> How about renaming it to vfio_container_event_and_ioctl()? I'm always bad
> >> at giving a good function name :)
> >
> >Well, I don't think your wrapper should be multiplexed.  The multiplex
> >works for the simple ioctl() wrapper, because there really is nothing
> >that varies apart from the exact ioctl number called.
> >
> >But now that you have different operations here, I think you want
> >wrappers for each one - each one will call the ioctl(), then do the
> >specific extra steps necessary for that operation.  So
> >vfio_container_event() will go away as well, split into various other
> >functions.
> >
> 
> It wouldn't a good idea if I understand your proposal correctly. Currnetly,
> the global function vfio_container_ioctl() can be called from sPAPR platform
> for any ioctl commands handled in kernel source file vfio_iommu_spapr_tce.c,
> which means the function isn't called for EEH only. Other sPAPR TCE container
> ioctl commands are also routed by this function. There will be lots if having
> one global function for each ioctl commands, which just improve the cost to
> maintain the code.

I don't really follow your objection.  I'm only suggesting separate
wrappers for things which require extra actions currently implemented
in vfio_container_event().  Things which only ned the plain ioctl()
can still use the simple vfio_container_ioctl() wrapper.

> Alternatively, we might expose another function vfio_container_eeh_ioctl(),
> which calls vfio_container_ioctl() after doing what we did in 
> vfio_container_event()
> if necessary.
> 
> Thanks,
> Gavin
> 
> 
> 

-- 
David Gibson| I'll h

Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 1/3] VFIO: Clear stale MSIx table during EEH reset

2015-03-23 Thread Gavin Shan
On Tue, Mar 24, 2015 at 04:41:21PM +1100, David Gibson wrote:
>On Mon, Mar 23, 2015 at 04:25:10PM +1100, Gavin Shan wrote:
>> On Mon, Mar 23, 2015 at 04:06:56PM +1100, David Gibson wrote:
>> >On Fri, Mar 20, 2015 at 05:27:29PM +1100, Gavin Shan wrote:
>> >> On Fri, Mar 20, 2015 at 05:04:01PM +1100, David Gibson wrote:
>> >> >On Tue, Mar 17, 2015 at 03:31:24AM +1100, Gavin Shan wrote:
>> >> >> The PCI device MSIx table is cleaned out in hardware after EEH PE
>> >> >> reset. However, we still hold the stale MSIx entries in QEMU, which
>> >> >> should be cleared accordingly. Otherwise, we will run into another
>> >> >> (recursive) EEH error and the PCI devices contained in the PE have
>> >> >> to be offlined exceptionally.
>> >> >> 
>> >> >> The patch clears stale MSIx table before EEH PE reset so that MSIx
>> >> >> table could be restored properly after EEH PE reset.
>> >> >> 
>> >> >> Signed-off-by: Gavin Shan 
>> >> >> ---
>> >> >> v2: vfio_container_eeh_event() stub for !CONFIG_PCI and separate
>> >> >> error message for this function. Dropped vfio_put_group()
>> >> >> on NULL group
>> >> >> ---
>> >> >>  hw/vfio/Makefile.objs  |  6 +-
>> >> >>  hw/vfio/common.c   |  7 +++
>> >> >>  hw/vfio/pci-stub.c | 17 +
>> >> >>  hw/vfio/pci.c  | 38 ++
>> >> >>  include/hw/vfio/vfio.h |  2 ++
>> >> >>  5 files changed, 69 insertions(+), 1 deletion(-)
>> >> >>  create mode 100644 hw/vfio/pci-stub.c
>> >> >> 
>> >> >> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
>> >> >> index e31f30e..1b8a065 100644
>> >> >> --- a/hw/vfio/Makefile.objs
>> >> >> +++ b/hw/vfio/Makefile.objs
>> >> >> @@ -1,4 +1,8 @@
>> >> >>  ifeq ($(CONFIG_LINUX), y)
>> >> >>  obj-$(CONFIG_SOFTMMU) += common.o
>> >> >> -obj-$(CONFIG_PCI) += pci.o
>> >> >> +ifeq ($(CONFIG_PCI), y)
>> >> >> +obj-y += pci.o
>> >> >> +else
>> >> >> +obj-y += pci-stub.o
>> >> >> +endif
>> >> >>  endif
>> >> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> >> >> index 148eb53..ed07814 100644
>> >> >> --- a/hw/vfio/common.c
>> >> >> +++ b/hw/vfio/common.c
>> >> >> @@ -949,7 +949,14 @@ int vfio_container_ioctl(AddressSpace *as, 
>> >> >> int32_t groupid,
>> >> >>  switch (req) {
>> >> >>  case VFIO_CHECK_EXTENSION:
>> >> >>  case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
>> >> >> +break;
>> >> >>  case VFIO_EEH_PE_OP:
>> >> >> +if (vfio_container_eeh_event(as, groupid, param) != 0) {
>> >> >
>> >> >I really dislike the idea of having an arbitrarily complex side effect
>> >> >from a function whose name suggest's it's just a trivial wrapper
>> >> >around the ioctl().
>> >> >
>> >> 
>> >> Ok. I guess you would like putting the complex in the callers of
>> >> vfio_container_ioctl().
>> >
>> >Well.. maybe.  I'd also be happy if helper functions were implemeneted
>> >which both called the ioctl() and did the other necessary pieces.
>> >They should just be called something that indicates their full
>> >function, not a name which suggests they're just an ioctl wrapper.
>> >
>> 
>> Indeed, vfio_container_ioctl() isn't indicating what the function is doing.
>> How about renaming it to vfio_container_event_and_ioctl()? I'm always bad
>> at giving a good function name :)
>
>Well, I don't think your wrapper should be multiplexed.  The multiplex
>works for the simple ioctl() wrapper, because there really is nothing
>that varies apart from the exact ioctl number called.
>
>But now that you have different operations here, I think you want
>wrappers for each one - each one will call the ioctl(), then do the
>specific extra steps necessary for that operation.  So
>vfio_container_event() will go away as well, split into various other
>functions.
>

It wouldn't a good idea if I understand your proposal correctly. Currnetly,
the global function vfio_container_ioctl() can be called from sPAPR platform
for any ioctl commands handled in kernel source file vfio_iommu_spapr_tce.c,
which means the function isn't called for EEH only. Other sPAPR TCE container
ioctl commands are also routed by this function. There will be lots if having
one global function for each ioctl commands, which just improve the cost to
maintain the code.

Alternatively, we might expose another function vfio_container_eeh_ioctl(),
which calls vfio_container_ioctl() after doing what we did in 
vfio_container_event()
if necessary.

Thanks,
Gavin

>-- 
>David Gibson   | I'll have my music baroque, and my code
>david AT gibson.dropbear.id.au | minimalist, thank you.  NOT _the_ _other_
>   | _way_ _around_!
>http://www.ozlabs.org/~dgibson





Re: [Qemu-devel] [PATCH 00/10] spapr: Small patches to prepare for Dynamic DMA windows

2015-03-23 Thread David Gibson
On Tue, Mar 17, 2015 at 10:03:49PM +1100, Alexey Kardashevskiy wrote:
> On 03/10/2015 02:51 PM, Alexey Kardashevskiy wrote:
> >On 02/23/2015 07:33 PM, Alexey Kardashevskiy wrote:
> >>These I have in my DDW working tree for quite a while, while I am polishing
> >>others, there could go to some tree already.
> >>
> >>Please comment. Thanks!
> >
> >Alex, ping.
> >
> >"spapr_pci: Make find_phb()/find_dev() public" won't apply after Gavin's
> >EEH patches but the others would, should I repost them together with DDW
> >patches or separately or any other suggestion about these? Thanks.
> 
> 
> Ping?
> Do I need to repost those when I'll be posting next respin of DDW for QEMU
> or you can take them into your ppc-next-2.4? Thanks

I've merged these into my new spapr-next tree (branch 'spapr-next' at
git://github.com/dgibson/qemu.git).  I'll be pushing that tree to Alex
in future.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpPLgX6XXkGp.pgp
Description: PGP signature


Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 1/3] VFIO: Clear stale MSIx table during EEH reset

2015-03-23 Thread David Gibson
On Mon, Mar 23, 2015 at 04:25:10PM +1100, Gavin Shan wrote:
> On Mon, Mar 23, 2015 at 04:06:56PM +1100, David Gibson wrote:
> >On Fri, Mar 20, 2015 at 05:27:29PM +1100, Gavin Shan wrote:
> >> On Fri, Mar 20, 2015 at 05:04:01PM +1100, David Gibson wrote:
> >> >On Tue, Mar 17, 2015 at 03:31:24AM +1100, Gavin Shan wrote:
> >> >> The PCI device MSIx table is cleaned out in hardware after EEH PE
> >> >> reset. However, we still hold the stale MSIx entries in QEMU, which
> >> >> should be cleared accordingly. Otherwise, we will run into another
> >> >> (recursive) EEH error and the PCI devices contained in the PE have
> >> >> to be offlined exceptionally.
> >> >> 
> >> >> The patch clears stale MSIx table before EEH PE reset so that MSIx
> >> >> table could be restored properly after EEH PE reset.
> >> >> 
> >> >> Signed-off-by: Gavin Shan 
> >> >> ---
> >> >> v2: vfio_container_eeh_event() stub for !CONFIG_PCI and separate
> >> >> error message for this function. Dropped vfio_put_group()
> >> >> on NULL group
> >> >> ---
> >> >>  hw/vfio/Makefile.objs  |  6 +-
> >> >>  hw/vfio/common.c   |  7 +++
> >> >>  hw/vfio/pci-stub.c | 17 +
> >> >>  hw/vfio/pci.c  | 38 ++
> >> >>  include/hw/vfio/vfio.h |  2 ++
> >> >>  5 files changed, 69 insertions(+), 1 deletion(-)
> >> >>  create mode 100644 hw/vfio/pci-stub.c
> >> >> 
> >> >> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> >> >> index e31f30e..1b8a065 100644
> >> >> --- a/hw/vfio/Makefile.objs
> >> >> +++ b/hw/vfio/Makefile.objs
> >> >> @@ -1,4 +1,8 @@
> >> >>  ifeq ($(CONFIG_LINUX), y)
> >> >>  obj-$(CONFIG_SOFTMMU) += common.o
> >> >> -obj-$(CONFIG_PCI) += pci.o
> >> >> +ifeq ($(CONFIG_PCI), y)
> >> >> +obj-y += pci.o
> >> >> +else
> >> >> +obj-y += pci-stub.o
> >> >> +endif
> >> >>  endif
> >> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >> >> index 148eb53..ed07814 100644
> >> >> --- a/hw/vfio/common.c
> >> >> +++ b/hw/vfio/common.c
> >> >> @@ -949,7 +949,14 @@ int vfio_container_ioctl(AddressSpace *as, int32_t 
> >> >> groupid,
> >> >>  switch (req) {
> >> >>  case VFIO_CHECK_EXTENSION:
> >> >>  case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
> >> >> +break;
> >> >>  case VFIO_EEH_PE_OP:
> >> >> +if (vfio_container_eeh_event(as, groupid, param) != 0) {
> >> >
> >> >I really dislike the idea of having an arbitrarily complex side effect
> >> >from a function whose name suggest's it's just a trivial wrapper
> >> >around the ioctl().
> >> >
> >> 
> >> Ok. I guess you would like putting the complex in the callers of
> >> vfio_container_ioctl().
> >
> >Well.. maybe.  I'd also be happy if helper functions were implemeneted
> >which both called the ioctl() and did the other necessary pieces.
> >They should just be called something that indicates their full
> >function, not a name which suggests they're just an ioctl wrapper.
> >
> 
> Indeed, vfio_container_ioctl() isn't indicating what the function is doing.
> How about renaming it to vfio_container_event_and_ioctl()? I'm always bad
> at giving a good function name :)

Well, I don't think your wrapper should be multiplexed.  The multiplex
works for the simple ioctl() wrapper, because there really is nothing
that varies apart from the exact ioctl number called.

But now that you have different operations here, I think you want
wrappers for each one - each one will call the ioctl(), then do the
specific extra steps necessary for that operation.  So
vfio_container_event() will go away as well, split into various other
functions.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpbc44cWjlCc.pgp
Description: PGP signature


Re: [Qemu-devel] [PATCH v5 40/45] Postcopy; Handle userfault requests

2015-03-23 Thread David Gibson
On Wed, Feb 25, 2015 at 04:52:03PM +, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> userfaultfd is a Linux syscall that gives an fd that receives a stream
> of notifications of accesses to pages registered with it and allows
> the program to acknowledge those stalls and tell the accessing
> thread to carry on.
> 
> Signed-off-by: Dr. David Alan Gilbert 
> ---
>  include/migration/migration.h |   4 +
>  migration/postcopy-ram.c  | 217 
> --
>  trace-events  |  12 +++
>  3 files changed, 223 insertions(+), 10 deletions(-)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 139bb1b..cec064f 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -86,11 +86,15 @@ struct MigrationIncomingState {
>  
>  PostcopyState postcopy_state;
>  
> +bool   have_fault_thread;
>  QemuThread fault_thread;
>  QemuSemaphore  fault_thread_sem;
>  
>  /* For the kernel to send us notifications */
>  intuserfault_fd;
> +/* To tell the fault_thread to quit */
> +intuserfault_quit_fd;
> +
>  QEMUFile *return_path;
>  QemuMutex  rp_mutex;/* We send replies from multiple threads */
>  PostcopyPMIpostcopy_pmi;
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 86fa5a0..abc039e 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -47,6 +47,8 @@ struct PostcopyDiscardState {
>   */
>  #if defined(__linux__)
>  
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -264,7 +266,7 @@ void postcopy_pmi_dump(MigrationIncomingState *mis)
>  void postcopy_hook_early_receive(MigrationIncomingState *mis,
>   size_t bitmap_index)
>  {
> -if (mis->postcopy_state == POSTCOPY_INCOMING_ADVISE) {
> +if (postcopy_state_get(mis) == POSTCOPY_INCOMING_ADVISE) {

It kind of looks like that's a fix which should be folded into an
earlier patch.

>  /*
>   * If we're in precopy-advise mode we need to track received pages 
> even
>   * though we don't need to place pages atomically yet.
> @@ -489,15 +491,40 @@ int postcopy_ram_incoming_init(MigrationIncomingState 
> *mis, size_t ram_pages)
>   */
>  int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
>  {
> -/* TODO: Join the fault thread once we're sure it will exit */
> -if (qemu_ram_foreach_block(cleanup_area, mis)) {
> -return -1;
> +trace_postcopy_ram_incoming_cleanup_entry();
> +
> +if (mis->have_fault_thread) {
> +uint64_t tmp64;
> +
> +if (qemu_ram_foreach_block(cleanup_area, mis)) {
> +return -1;
> +}
> +/*
> + * Tell the fault_thread to exit, it's an eventfd that should
> + * currently be at 0, we're going to inc it to 1
> + */
> +tmp64 = 1;
> +if (write(mis->userfault_quit_fd, &tmp64, 8) == 8) {
> +trace_postcopy_ram_incoming_cleanup_join();
> +qemu_thread_join(&mis->fault_thread);
> +} else {
> +/* Not much we can do here, but may as well report it */
> +perror("incing userfault_quit_fd");
> +}
> +trace_postcopy_ram_incoming_cleanup_closeuf();
> +close(mis->userfault_fd);
> +close(mis->userfault_quit_fd);
> +mis->have_fault_thread = false;
>  }
>  
> +postcopy_state_set(mis, POSTCOPY_INCOMING_END);
> +migrate_send_rp_shut(mis, qemu_file_get_error(mis->file) != 0);
> +
>  if (mis->postcopy_tmp_page) {
>  munmap(mis->postcopy_tmp_page, getpagesize());
>  mis->postcopy_tmp_page = NULL;
>  }
> +trace_postcopy_ram_incoming_cleanup_exit();
>  return 0;
>  }
>  
> @@ -531,36 +558,206 @@ static int ram_block_enable_notify(const char 
> *block_name, void *host_addr,
>  }
>  
>  /*
> + * Tell the kernel that we've now got some memory it previously asked for.
> + */
> +static int ack_userfault(MigrationIncomingState *mis, void *start, size_t 
> len)
> +{
> +struct uffdio_range range_struct;
> +
> +range_struct.start = (uint64_t)(uintptr_t)start;
> +range_struct.len = (uint64_t)len;
> +
> +errno = 0;
> +if (ioctl(mis->userfault_fd, UFFDIO_WAKE, &range_struct)) {
> +int e = errno;
> +
> +if (e == ENOENT) {
> +/* Kernel said it wasn't waiting - one case where this can
> + * happen is where two threads triggered the userfault
> + * and we receive the page and ack it just after we received
> + * the 2nd request and that ends up deciding it should ack it
> + * We could optimise it out, but it's rare.
> + */
> +/*fprintf(stderr, "ack_userfault: %p/%zx ENOENT\n", start, len); 
> */
> +return 0;
> +}
> +error_report("postcopy_ram: Failed to not

Re: [Qemu-devel] [PATCH v5 39/45] Host page!=target page: Cleanup bitmaps

2015-03-23 Thread David Gibson
On Wed, Feb 25, 2015 at 04:52:02PM +, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> Prior to the start of postcopy, ensure that everything that will
> be transferred later is a whole host-page in size.
> 
> This is accomplished by discarding partially transferred host pages
> and marking any that are partially dirty as fully dirty.

Again, I wonder if this might be a bit more obvious with
send/receive_host_page() helpers.  Rather than jiggering the basic
data structures, you make the code only do the transmission in terms
of host page sized chunks, doing the dirty check against all the
necessary target page bits.

Or better yet, a migration_chunk_size variable, rather than host page
size.  Initially that would be initialized to host page size, but
gives easier flexibility to improve future handling of cases where
source hps != dest hps.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgp6biHxJmJ3E.pgp
Description: PGP signature


Re: [Qemu-devel] [PATCH v5 37/45] qemu_ram_block_from_host

2015-03-23 Thread David Gibson
On Wed, Feb 25, 2015 at 04:52:00PM +, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> Postcopy sends RAMBlock names and offsets over the wire (since it can't
> rely on the order of ramaddr being the same), and it starts out with
> HVA fault addresses from the kernel.

Though as noted earlier, I do wonder if GPA might be a better idea.


> 
> qemu_ram_block_from_host translates a HVA into a RAMBlock, an offset
> in the RAMBlock, the global ram_addr_t value and it's bitmap position.
> 
> Rewrite qemu_ram_addr_from_host to use qemu_ram_block_from_host.
> 
> Provide qemu_ram_get_idstr since its the actual name text sent on the
> wire.
> 
> Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: David Gibson 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpRDcospTOSj.pgp
Description: PGP signature


Re: [Qemu-devel] [PATCH v5 38/45] Don't sync dirty bitmaps in postcopy

2015-03-23 Thread David Gibson
On Wed, Feb 25, 2015 at 04:52:01PM +, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> Once we're in postcopy the source processors are stopped and memory
> shouldn't change any more, so there's no need to look at the dirty
> map.
> 
> There are two notes to this:
>   1) If we do resync and a page had changed then the page would get
>  sent again, which the destination wouldn't allow (since it might
>  have also modified the page)

I'm not quite sure what you mean by "resync" in this context.

>   2) Before disabling this I'd seen very rare cases where a page had been
>  marked dirtied although the memory contents are apparently identical
> 
> Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: David Gibson 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpHMUQpT1LRl.pgp
Description: PGP signature


Re: [Qemu-devel] [PATCH v5 36/45] Postcopy: Use helpers to map pages during migration

2015-03-23 Thread David Gibson
On Wed, Feb 25, 2015 at 04:51:59PM +, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> In postcopy, the destination guest is running at the same time
> as it's receiving pages; as we receive new pages we must put
> them into the guests address space atomically to avoid a running
> CPU accessing a partially written page.
> 
> Use the helpers in postcopy-ram.c to map these pages.
> 
> Note, gcc 4.9.2 is giving me false uninitialized warnings in ram_load's
> switch, so anything conditionally set at the start of the switch needs
> initializing; filed as
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64614
> 
> Signed-off-by: Dr. David Alan Gilbert 
> ---
>  arch_init.c | 137 
> 
>  1 file changed, 111 insertions(+), 26 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index acf65e1..7a1c9ea 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -1479,9 +1479,39 @@ static int load_xbzrle(QEMUFile *f, ram_addr_t addr, 
> void *host)
>  return 0;
>  }
>  
> +/*
> + * Helper for host_from_stream_offset in the succesful case.
> + * Returns the host pointer for the given block and offset
> + *   calling the postcopy hook and filling in *rb.
> + */
> +static void *host_with_postcopy_hook(MigrationIncomingState *mis,
> + ram_addr_t offset,
> + RAMBlock *block,
> + RAMBlock **rb)
> +{
> +if (rb) {
> +*rb = block;
> +}
> +
> +postcopy_hook_early_receive(mis,
> +(offset + block->offset) >> TARGET_PAGE_BITS);

What does postcopy_hook_early_receive() do?

> +return memory_region_get_ram_ptr(block->mr) + offset;
> +}
> +
> +/*
> + * Read a RAMBlock ID from the stream f, find the host address of the
> + * start of that block and add on 'offset'
> + *
> + * f: Stream to read from
> + * mis: MigrationIncomingState
> + * offset: Offset within the block
> + * flags: Page flags (mostly to see if it's a continuation of previous block)
> + * rb: Pointer to RAMBlock* that gets filled in with the RB we find
> + */
>  static inline void *host_from_stream_offset(QEMUFile *f,
> +MigrationIncomingState *mis,
>  ram_addr_t offset,
> -int flags)
> +int flags, RAMBlock **rb)
>  {
>  static RAMBlock *block = NULL;
>  char id[256];
> @@ -1492,8 +1522,7 @@ static inline void *host_from_stream_offset(QEMUFile *f,
>  error_report("Ack, bad migration stream!");
>  return NULL;
>  }
> -
> -return memory_region_get_ram_ptr(block->mr) + offset;
> +return host_with_postcopy_hook(mis, offset, block, rb);
>  }
>  
>  len = qemu_get_byte(f);
> @@ -1503,7 +1532,7 @@ static inline void *host_from_stream_offset(QEMUFile *f,
>  QTAILQ_FOREACH(block, &ram_list.blocks, next) {
>  if (!strncmp(id, block->idstr, sizeof(id)) &&
>  block->max_length > offset) {
> -return memory_region_get_ram_ptr(block->mr) + offset;
> +return host_with_postcopy_hook(mis, offset, block, rb);
>  }
>  }
>  
> @@ -1537,6 +1566,15 @@ static int ram_load(QEMUFile *f, void *opaque, int 
> version_id)
>  {
>  int flags = 0, ret = 0;
>  static uint64_t seq_iter;
> +/*
> + * System is running in postcopy mode, page inserts to host memory must 
> be
> + * atomic
> + */
> +MigrationIncomingState *mis = migration_incoming_get_current();
> +bool postcopy_running = postcopy_state_get(mis) >=
> +POSTCOPY_INCOMING_LISTENING;
> +void *postcopy_host_page = NULL;
> +bool postcopy_place_needed = false;
>  
>  seq_iter++;
>  
> @@ -1545,14 +1583,57 @@ static int ram_load(QEMUFile *f, void *opaque, int 
> version_id)
>  }
>  
>  while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
> +RAMBlock *rb = 0; /* =0 needed to silence compiler */
>  ram_addr_t addr, total_ram_bytes;
> -void *host;
> +void *host = 0;
> +void *page_buffer = 0;
>  uint8_t ch;
> +bool all_zero = false;
>  
>  addr = qemu_get_be64(f);
>  flags = addr & ~TARGET_PAGE_MASK;
>  addr &= TARGET_PAGE_MASK;
>  
> +if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE |
> + RAM_SAVE_FLAG_XBZRLE)) {
> +host = host_from_stream_offset(f, mis, addr, flags, &rb);
> +if (!host) {
> +error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
> +ret = -EINVAL;
> +break;
> +}
> +if (!postcopy_running) {
> +page_buffer = host;
> +} else {
> +/*
> + * Postcopy requires that we place whole ho

Re: [Qemu-devel] [PATCH v5 35/45] postcopy_ram.c: place_page and helpers

2015-03-23 Thread David Gibson
On Wed, Feb 25, 2015 at 04:51:58PM +, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> postcopy_place_page (etc) provide a way for postcopy to place a page
> into guests memory atomically (using the copy ioctl on the ufd).
> 
> Signed-off-by: Dr. David Alan Gilbert 
> ---
>  include/migration/migration.h|   2 +
>  include/migration/postcopy-ram.h |  16 ++
>  migration/postcopy-ram.c | 113 
> ++-
>  trace-events |   1 +
>  4 files changed, 130 insertions(+), 2 deletions(-)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index b1c7cad..139bb1b 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -94,6 +94,8 @@ struct MigrationIncomingState {
>  QEMUFile *return_path;
>  QemuMutex  rp_mutex;/* We send replies from multiple threads */
>  PostcopyPMIpostcopy_pmi;
> +void  *postcopy_tmp_page;
> +long   postcopy_place_skipped; /* Check for incorrect place ops 
> */
>  };
>  
>  MigrationIncomingState *migration_incoming_get_current(void);
> diff --git a/include/migration/postcopy-ram.h 
> b/include/migration/postcopy-ram.h
> index fbb2a93..3d30280 100644
> --- a/include/migration/postcopy-ram.h
> +++ b/include/migration/postcopy-ram.h
> @@ -80,4 +80,20 @@ void postcopy_discard_send_chunk(MigrationState *ms, 
> PostcopyDiscardState *pds,
>  void postcopy_discard_send_finish(MigrationState *ms,
>PostcopyDiscardState *pds);
>  
> +/*
> + * Place a page (from) at (host) efficiently
> + *There are restrictions on how 'from' must be mapped, in general best
> + *to use other postcopy_ routines to allocate.
> + * returns 0 on success
> + */
> +int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
> +long bitmap_offset, bool all_zero);
> +
> +/*
> + * Allocate a page of memory that can be mapped at a later point in time
> + * using postcopy_place_page
> + * Returns: Pointer to allocated page
> + */
> +void *postcopy_get_tmp_page(MigrationIncomingState *mis);
> +
>  #endif
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 33dd332..86fa5a0 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -197,7 +197,6 @@ static PostcopyPMIState postcopy_pmi_get_state_nolock(
>  }
>  
>  /* Retrieve the state of the given page */
> -__attribute__ (( unused )) /* Until later in patch series */
>  static PostcopyPMIState postcopy_pmi_get_state(MigrationIncomingState *mis,
> size_t bitmap_index)
>  {
> @@ -213,7 +212,6 @@ static PostcopyPMIState 
> postcopy_pmi_get_state(MigrationIncomingState *mis,
>   * Set the page state to the given state if the previous state was as 
> expected
>   * Return the actual previous state.
>   */
> -__attribute__ (( unused )) /* Until later in patch series */
>  static PostcopyPMIState postcopy_pmi_change_state(MigrationIncomingState 
> *mis,
> size_t bitmap_index,
> PostcopyPMIState expected_state,
> @@ -477,6 +475,7 @@ static int cleanup_area(const char *block_name, void 
> *host_addr,
>  int postcopy_ram_incoming_init(MigrationIncomingState *mis, size_t ram_pages)
>  {
>  postcopy_pmi_init(mis, ram_pages);
> +mis->postcopy_place_skipped = -1;
>  
>  if (qemu_ram_foreach_block(init_area, mis)) {
>  return -1;
> @@ -495,6 +494,10 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState 
> *mis)
>  return -1;
>  }
>  
> +if (mis->postcopy_tmp_page) {
> +munmap(mis->postcopy_tmp_page, getpagesize());
> +mis->postcopy_tmp_page = NULL;
> +}
>  return 0;
>  }
>  
> @@ -561,6 +564,100 @@ int postcopy_ram_enable_notify(MigrationIncomingState 
> *mis)
>  return 0;
>  }
>  
> +/*
> + * Place a host page (from) at (host) tomically

s/tomically/atomically/

> + *There are restrictions on how 'from' must be mapped, in general best
> + *to use other postcopy_ routines to allocate.
> + * all_zero: Hint that the page being placed is 0 throughout
> + * returns 0 on success
> + * bitmap_offset: Index into the migration bitmaps
> + *
> + * State changes:
> + *   none -> received
> + *   requested -> received (ack)
> + *
> + * Note the UF thread is also updating the state, and maybe none->requested
> + * at the same time.

Hrm.. these facts do tend me towards thinking that separate and
explicit requested and received bits will be clearer than treating it
as an enum state variable

> + */
> +int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
> +long bitmap_offset, bool all_zero)
> +{
> +PostcopyPMIState old_state, tmp_state, new_state;
> +
> +if (!all_zero) {
> +struct uffdio_copy copy_struct;
> +
> + 

Re: [Qemu-devel] GSoC Proposal: ARM Virtualization Extensions

2015-03-23 Thread Edgar E. Iglesias
On Sun, Mar 22, 2015 at 07:30:07AM -0700, Sergey Fedorov wrote:
> On 20.03.2015 21:49, Edgar E. Iglesias wrote:
> > Hi all,
> >
> > Sergey, that's good to hear!
> >
> > Peter, Yes I have quite a bit of patch material but unfortunately a lot of
> > it is not in a state for upstreaming. I know I've promised to clean
> > it up and submit more but I have not been able to find time for it,
> > sorry about that...
> >
> > Patches that I planned to send are around hyp timers, misc bugfixes
> > here and there and maybe the GIC virt extensions.
> >
> > The 2 stage MMU code I have is shamefully ugly but it kind of works
> > so it is useful for understanding some of the problems that need to be
> > solved. (works well enough to boot XEN and KVM guests).
> >
> > Sergey and myself communicated a bit off-list a while back ago
> > and he expressed interest in AArch64 EL2 and my code. Sergey, it would be
> > very interesting to hear more details on how you are doing on that?
> >
> > Cheers,
> > Edgar
> 
> Hi Edgar,
> 
> I successfully reproduced your results with running KVM on your patches.
> Then I decided to start implementing the features without looking at
> your code so that I can get somewhat different view on how to implement
> that. Then I plan to compare my and your code and make a final version.
> 
> Now I've got patches that allows to boot Linux kernel with KVM enabled.
> So KVM initialization is happy. Basically, I added some EL2 registers
> required by KVM and adjusted page table walk to support EL2 translation
> regime. A colleague of mine has prepared a patch to support
> virtualization extensions in generic timer.
> 
> Before sending patches here, I will do my best to carefully compare my
> final code against yours and credit you appropriately in that patches
> for any derivatives found. Anyway, do not hesitate to ask questions or
> point out any misses in respect of this matter.


Thanks Sergey!

Cheers,
Edgar



Re: [Qemu-devel] [Xen-devel] [PATCH] SeaBios/vTPM: Enable Xen stubdom vTPM for HVM virtual machine

2015-03-23 Thread Xu, Quan


> -Original Message-
> From: Stefan Berger [mailto:stef...@linux.vnet.ibm.com]
> Sent: Tuesday, March 24, 2015 4:01 AM
> To: Xu, Quan; Ian Campbell
> Cc: ke...@koconnor.net; qemu-devel@nongnu.org;
> stefano.stabell...@eu.citrix.com; xen-de...@lists.xen.org
> Subject: Re: [Qemu-devel] [Xen-devel] [PATCH] SeaBios/vTPM: Enable Xen
> stubdom vTPM for HVM virtual machine
> 
> On 03/23/2015 08:03 AM, Xu, Quan wrote:
> >
> >> -Original Message-
> >> From: Stefan Berger [mailto:stef...@linux.vnet.ibm.com]
> >> Sent: Monday, March 23, 2015 6:57 PM
> >> To: Xu, Quan; Ian Campbell
> >> Cc: ke...@koconnor.net; xen-de...@lists.xen.org;
> >> qemu-devel@nongnu.org; stefano.stabell...@eu.citrix.com
> >> Subject: Re: [Xen-devel] [PATCH] SeaBios/vTPM: Enable Xen stubdom
> >> vTPM for HVM virtual machine
> >>
> >> On 03/22/2015 09:47 PM, Xu, Quan wrote:
>  -Original Message-
>  From: Stefan Berger [mailto:stef...@linux.vnet.ibm.com]
>  Sent: Friday, March 20, 2015 7:44 PM
>  To: Ian Campbell; Xu, Quan
>  Cc: ke...@koconnor.net; xen-de...@lists.xen.org;
>  qemu-devel@nongnu.org; stefano.stabell...@eu.citrix.com
>  Subject: Re: [Xen-devel] [PATCH] SeaBios/vTPM: Enable Xen stubdom
>  vTPM for HVM virtual machine
> 
>  On 03/19/2015 08:56 AM, Ian Campbell wrote:
> > On Tue, 2015-03-10 at 08:16 -0400, Quan Xu wrote:
> >> @@ -151,6 +152,8 @@ device_hardware_setup(void)
> >> esp_scsi_setup();
> >> megasas_setup();
> >> pvscsi_setup();
> >> +if (runningOnXen())
> >> +vtpm4hvm_setup();
> > Is there anything which is actually Xen specific about the driver
> > in tpm.[ch]? Would it be better to just probe for it, perhaps
> > gates by a Kconfig option which enables TPM support.
>  I also think the probing should be done. That code can also be
>  recycled from what I posted earlier. It's gated by a Kconfig
>  option, so it doesn't
> >> fill up the 128k ROM.
> Stefan
> 
> >>> Agree, I will do it ASAP.
> >> I reposted v9 of my series of patches. I will probably post v10 today.
> >> Please try that one then since these patches should cover Xen, QEMU
> >> (using a driver that only I can test at the moment), and to some
> >> extent bare metal system.
> >>
> >>
> >>  Stefan
> >>
> > Great!  Could you also archive v10 to your github?
> > then I can also test it and go through these source code.
> 
> I put it here now:
> 
> https://github.com/stefanberger/seabios-tpm
> 
>  Stefan

Thanks.
MS windows guest VM are maybe tricky issues. In my early-stage SeaBios patch, 
I deal with TPM TCPA and SSDT in SeaBios, but MS windows guest VM is blue 
screens(Linux guest virtual machines are working). 
It works when I deal with TPM TCPA and SSDT in hvmloader for Windows guest VM.

CCed Jan/Wei.


Quan



Re: [Qemu-devel] [PATCH v5 33/45] Page request: Process incoming page request

2015-03-23 Thread David Gibson
On Wed, Feb 25, 2015 at 04:51:56PM +, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> On receiving MIG_RPCOMM_REQ_PAGES look up the address and
> queue the page.
> 
> Signed-off-by: Dr. David Alan Gilbert 
> ---
>  arch_init.c   | 55 
> +++
>  include/exec/cpu-all.h|  2 --
>  include/migration/migration.h | 21 +
>  include/qemu/typedefs.h   |  1 +
>  migration/migration.c | 33 +-
>  trace-events  |  3 ++-
>  6 files changed, 111 insertions(+), 4 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index d2c4457..9d8fc6b 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -669,6 +669,61 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, 
> ram_addr_t offset,
>  }
>  
>  /*
> + * Queue the pages for transmission, e.g. a request from postcopy destination
> + *   ms: MigrationStatus in which the queue is held
> + *   rbname: The RAMBlock the request is for - may be NULL (to mean reuse 
> last)
> + *   start: Offset from the start of the RAMBlock
> + *   len: Length (in bytes) to send
> + *   Return: 0 on success
> + */
> +int ram_save_queue_pages(MigrationState *ms, const char *rbname,
> + ram_addr_t start, ram_addr_t len)
> +{
> +RAMBlock *ramblock;
> +
> +if (!rbname) {
> +/* Reuse last RAMBlock */
> +ramblock = ms->last_req_rb;
> +
> +if (!ramblock) {
> +/*
> + * Shouldn't happen, we can't reuse the last RAMBlock if
> + * it's the 1st request.
> + */
> +error_report("ram_save_queue_pages no previous block");
> +return -1;
> +}
> +} else {
> +ramblock = ram_find_block(rbname);
> +
> +if (!ramblock) {
> +/* We shouldn't be asked for a non-existent RAMBlock */
> +error_report("ram_save_queue_pages no block '%s'", rbname);
> +return -1;
> +}
> +}
> +trace_ram_save_queue_pages(ramblock->idstr, start, len);
> +if (start+len > ramblock->used_length) {
> +error_report("%s request overrun start=%zx len=%zx blocklen=%zx",
> + __func__, start, len, ramblock->used_length);
> +return -1;
> +}
> +
> +struct MigrationSrcPageRequest *new_entry =
> +g_malloc0(sizeof(struct MigrationSrcPageRequest));
> +new_entry->rb = ramblock;
> +new_entry->offset = start;
> +new_entry->len = len;
> +ms->last_req_rb = ramblock;
> +
> +qemu_mutex_lock(&ms->src_page_req_mutex);
> +QSIMPLEQ_INSERT_TAIL(&ms->src_page_requests, new_entry, next_req);
> +qemu_mutex_unlock(&ms->src_page_req_mutex);
> +
> +return 0;
> +}
> +
> +/*
>   * ram_find_and_save_block: Finds a page to send and sends it to f
>   *
>   * Returns:  The number of bytes written.
> diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
> index 2c48286..3088000 100644
> --- a/include/exec/cpu-all.h
> +++ b/include/exec/cpu-all.h
> @@ -265,8 +265,6 @@ CPUArchState *cpu_copy(CPUArchState *env);
>  
>  /* memory API */
>  
> -typedef struct RAMBlock RAMBlock;
> -
>  struct RAMBlock {
>  struct MemoryRegion *mr;
>  uint8_t *host;
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 2c15d63..b1c7cad 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -100,6 +100,18 @@ MigrationIncomingState 
> *migration_incoming_get_current(void);
>  MigrationIncomingState *migration_incoming_state_new(QEMUFile *f);
>  void migration_incoming_state_destroy(void);
>  
> +/*
> + * An outstanding page request, on the source, having been received
> + * and queued
> + */
> +struct MigrationSrcPageRequest {
> +RAMBlock *rb;
> +hwaddroffset;
> +hwaddrlen;
> +
> +QSIMPLEQ_ENTRY(MigrationSrcPageRequest) next_req;
> +};
> +
>  struct MigrationState
>  {
>  int64_t bandwidth_limit;
> @@ -142,6 +154,12 @@ struct MigrationState
>   * of the postcopy phase
>   */
>  unsigned long *sentmap;
> +
> +/* Queue of outstanding page requests from the destination */
> +QemuMutex src_page_req_mutex;
> +QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) 
> src_page_requests;
> +/* The RAMBlock used in the last src_page_request */
> +RAMBlock *last_req_rb;
>  };
>  
>  void process_incoming_migration(QEMUFile *f);
> @@ -276,6 +294,9 @@ size_t ram_control_save_page(QEMUFile *f, ram_addr_t 
> block_offset,
>   ram_addr_t offset, size_t size,
>   int *bytes_sent);
>  
> +int ram_save_queue_pages(MigrationState *ms, const char *rbname,
> + ram_addr_t start, ram_addr_t len);
> +
>  PostcopyState postcopy_state_get(MigrationIncomingState *mis);
>  
>  /* Set the state and return the old state */
> diff --git a/include/qemu/typedefs.h b/include/

Re: [Qemu-devel] [PATCH v5 34/45] Page request: Consume pages off the post-copy queue

2015-03-23 Thread David Gibson
On Wed, Feb 25, 2015 at 04:51:57PM +, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> When transmitting RAM pages, consume pages that have been queued by
> MIG_RPCOMM_REQPAGE commands and send them ahead of normal page scanning.
> 
> Note:
>   a) After a queued page the linear walk carries on from after the
> unqueued page; there is a reasonable chance that the destination
> was about to ask for other closeby pages anyway.
> 
>   b) We have to be careful of any assumptions that the page walking
> code makes, in particular it does some short cuts on its first linear
> walk that break as soon as we do a queued page.
> 
> Signed-off-by: Dr. David Alan Gilbert 
> ---
>  arch_init.c  | 154 
> +--
>  trace-events |   2 +
>  2 files changed, 131 insertions(+), 25 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index 9d8fc6b..acf65e1 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -328,6 +328,7 @@ static RAMBlock *last_seen_block;
>  /* This is the last block from where we have sent data */
>  static RAMBlock *last_sent_block;
>  static ram_addr_t last_offset;
> +static bool last_was_from_queue;
>  static unsigned long *migration_bitmap;
>  static uint64_t migration_dirty_pages;
>  static uint32_t last_version;
> @@ -461,6 +462,19 @@ static inline bool migration_bitmap_set_dirty(ram_addr_t 
> addr)
>  return ret;
>  }
>  
> +static inline bool migration_bitmap_clear_dirty(ram_addr_t addr)
> +{
> +bool ret;
> +int nr = addr >> TARGET_PAGE_BITS;
> +
> +ret = test_and_clear_bit(nr, migration_bitmap);
> +
> +if (ret) {
> +migration_dirty_pages--;
> +}
> +return ret;
> +}
> +
>  static void migration_bitmap_sync_range(ram_addr_t start, ram_addr_t length)
>  {
>  ram_addr_t addr;
> @@ -669,6 +683,39 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, 
> ram_addr_t offset,
>  }
>  
>  /*
> + * Unqueue a page from the queue fed by postcopy page requests
> + *
> + * Returns:  The RAMBlock* to transmit from (or NULL if the queue is 
> empty)
> + *  ms:  MigrationState in
> + *  offset:  the byte offset within the RAMBlock for the start of the 
> page
> + * ram_addr_abs: global offset in the dirty/sent bitmaps
> + */
> +static RAMBlock *ram_save_unqueue_page(MigrationState *ms, ram_addr_t 
> *offset,
> +   ram_addr_t *ram_addr_abs)
> +{
> +RAMBlock *result = NULL;
> +qemu_mutex_lock(&ms->src_page_req_mutex);
> +if (!QSIMPLEQ_EMPTY(&ms->src_page_requests)) {
> +struct MigrationSrcPageRequest *entry =
> +QSIMPLEQ_FIRST(&ms->src_page_requests);
> +result = entry->rb;
> +*offset = entry->offset;
> +*ram_addr_abs = (entry->offset + entry->rb->offset) & 
> TARGET_PAGE_MASK;
> +
> +if (entry->len > TARGET_PAGE_SIZE) {
> +entry->len -= TARGET_PAGE_SIZE;
> +entry->offset += TARGET_PAGE_SIZE;
> +} else {
> +QSIMPLEQ_REMOVE_HEAD(&ms->src_page_requests, next_req);
> +g_free(entry);
> +}
> +}
> +qemu_mutex_unlock(&ms->src_page_req_mutex);
> +
> +return result;
> +}
> +
> +/*
>   * Queue the pages for transmission, e.g. a request from postcopy destination
>   *   ms: MigrationStatus in which the queue is held
>   *   rbname: The RAMBlock the request is for - may be NULL (to mean reuse 
> last)
> @@ -732,46 +779,102 @@ int ram_save_queue_pages(MigrationState *ms, const 
> char *rbname,
>  
>  static int ram_find_and_save_block(QEMUFile *f, bool last_stage)
>  {
> +MigrationState *ms = migrate_get_current();
>  RAMBlock *block = last_seen_block;
> +RAMBlock *tmpblock;
>  ram_addr_t offset = last_offset;
> +ram_addr_t tmpoffset;
>  bool complete_round = false;
>  int bytes_sent = 0;
> -MemoryRegion *mr;
>  ram_addr_t dirty_ram_abs; /* Address of the start of the dirty page in
>   ram_addr_t space */
> +unsigned long hps = sysconf(_SC_PAGESIZE);
>  
> -if (!block)
> +if (!block) {
>  block = QTAILQ_FIRST(&ram_list.blocks);
> +last_was_from_queue = false;
> +}
>  
> -while (true) {
> -mr = block->mr;
> -offset = migration_bitmap_find_and_reset_dirty(mr, offset,
> -   &dirty_ram_abs);
> -if (complete_round && block == last_seen_block &&
> -offset >= last_offset) {
> -break;
> +while (true) { /* Until we send a block or run out of stuff to send */
> +tmpblock = NULL;
> +
> +/*
> + * Don't break host-page chunks up with queue items
> + * so only unqueue if,
> + *   a) The last item came from the queue anyway
> + *   b) The last sent item was the last target-page in a host page
> + */
> +if (last_was_from_queue || !last_sen

Re: [Qemu-devel] [PATCH v5 45/45] Inhibit ballooning during postcopy

2015-03-23 Thread David Gibson
On Mon, Mar 23, 2015 at 12:21:50PM +, Dr. David Alan Gilbert wrote:
> * David Gibson (da...@gibson.dropbear.id.au) wrote:
> > On Wed, Feb 25, 2015 at 04:52:08PM +, Dr. David Alan Gilbert (git) 
> > wrote:
> > > From: "Dr. David Alan Gilbert" 
> > > 
> > > The userfault mechanism used for postcopy generates faults
> > > for us on pages that are 'not present', inflating a balloon in
> > > the guest causes host pages to be marked as 'not present'; doing
> > > this during a postcopy, as potentially the same pages were being
> > > received from the source, would confuse the state of the received
> > > page -> disable ballooning during postcopy.
> > 
> > That is a ludicrously long sentence, which I have great difficulty parsing.
> 
> OK, how about:
> 
> -
> Postcopy detects accesses to pages that haven't been transferred yet
> using userfaultfd, and it causes exceptions on pages that are 'not present'.
> Ballooning also causes pages to be marked as 'not present' when the guest
> inflates the balloon.
> Potentially a balloon could be inflated to discard pages that are currently
> inflight during postcopy and that may be arriving at about the same time.
> 
> To avoid this confusion, disable ballooning during postcopy.

Better, thanks.

> -
> 
> > > When disabled we drop balloon requests from the guest.  Since ballooning
> > > is generally initiated by the host, the management system should avoid
> > > initiating any balloon instructions to the guest during migration,
> > > although it's not possible to know how long it would take a guest to
> > > process a request made prior to the start of migration.
> > 
> > Yeah :/.  It would be nice if it could queue the guest actions,
> > instead of dropping them.
> 
> Yes, I did look at that briefly; it's not trivial; for
> example consider the situation where the guest discards some pages
> by inflating, and then later deflates, it expects to lose that data
> but then starts accessing that physical page again.  
> If you replay that sequence at the end then you've lost newly accessed pages.
> So you have to filter out inflates that have been deflated later,
> and have to order those correctly with the sense of changes made to those
> pages after the deflation occurs.

Ah, yes, I see.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpeT_3V24cQP.pgp
Description: PGP signature


[Qemu-devel] [PATCH v2 0/2] block: Fix unaligned zero write

2015-03-23 Thread Fam Zheng
This fixes a segfault when doing unaligned zero write to an image that is 4k
aligned.

v2: Don't drop the unset of UNMAP flag. [Stefan, Peter]

Reproducer:

$ (echo "open -o file.align=4k blkdebug::img"; echo "write -z 512 1024") | 
qemu-io

Fam Zheng (2):
  block: Fix unaligned zero write
  qemu-iotests: Test unaligned 4k zero write

 block.c| 45 ++--
 tests/qemu-iotests/033 | 47 +-
 tests/qemu-iotests/033.out | 26 +
 3 files changed, 95 insertions(+), 23 deletions(-)

-- 
1.9.3




[Qemu-devel] [PATCH v2 1/2] block: Fix unaligned zero write

2015-03-23 Thread Fam Zheng
If the zero write is not aligned, bdrv_co_do_pwritev will segfault
because of accessing to the NULL qiov passed in by bdrv_co_write_zeroes.
Fix this by allocating a local qiov in bdrv_co_do_pwritev if the request
is not aligned. (In this case the padding iovs are necessary anyway, so
it doesn't hurt.)

Also add a check at the end of bdrv_co_do_pwritev to clear the zero flag
if padding is involved.

Signed-off-by: Fam Zheng 
---
 block.c | 45 +++--
 1 file changed, 39 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index 0fe97de..f2f8ae7 100644
--- a/block.c
+++ b/block.c
@@ -3118,6 +3118,19 @@ out:
 return ret;
 }
 
+static inline uint64_t bdrv_get_align(BlockDriverState *bs)
+{
+/* TODO Lift BDRV_SECTOR_SIZE restriction in BlockDriver interface */
+return MAX(BDRV_SECTOR_SIZE, bs->request_alignment);
+}
+
+static inline bool bdrv_req_is_aligned(BlockDriverState *bs,
+   int64_t offset, size_t bytes)
+{
+int64_t align = bdrv_get_align(bs);
+return !(offset & (align - 1) || (bytes & (align - 1)));
+}
+
 /*
  * Handle a read request in coroutine context
  */
@@ -3128,8 +3141,7 @@ static int coroutine_fn 
bdrv_co_do_preadv(BlockDriverState *bs,
 BlockDriver *drv = bs->drv;
 BdrvTrackedRequest req;
 
-/* TODO Lift BDRV_SECTOR_SIZE restriction in BlockDriver interface */
-uint64_t align = MAX(BDRV_SECTOR_SIZE, bs->request_alignment);
+uint64_t align = bdrv_get_align(bs);
 uint8_t *head_buf = NULL;
 uint8_t *tail_buf = NULL;
 QEMUIOVector local_qiov;
@@ -3371,8 +3383,7 @@ static int coroutine_fn 
bdrv_co_do_pwritev(BlockDriverState *bs,
 BdrvRequestFlags flags)
 {
 BdrvTrackedRequest req;
-/* TODO Lift BDRV_SECTOR_SIZE restriction in BlockDriver interface */
-uint64_t align = MAX(BDRV_SECTOR_SIZE, bs->request_alignment);
+uint64_t align = bdrv_get_align(bs);
 uint8_t *head_buf = NULL;
 uint8_t *tail_buf = NULL;
 QEMUIOVector local_qiov;
@@ -3471,6 +3482,10 @@ static int coroutine_fn 
bdrv_co_do_pwritev(BlockDriverState *bs,
 bytes = ROUND_UP(bytes, align);
 }
 
+if (use_local_qiov) {
+/* Local buffer may have non-zero data. */
+flags &= ~BDRV_REQ_ZERO_WRITE;
+}
 ret = bdrv_aligned_pwritev(bs, &req, offset, bytes,
use_local_qiov ? &local_qiov : qiov,
flags);
@@ -3511,14 +3526,32 @@ int coroutine_fn bdrv_co_write_zeroes(BlockDriverState 
*bs,
   int64_t sector_num, int nb_sectors,
   BdrvRequestFlags flags)
 {
+int ret;
+
 trace_bdrv_co_write_zeroes(bs, sector_num, nb_sectors, flags);
 
 if (!(bs->open_flags & BDRV_O_UNMAP)) {
 flags &= ~BDRV_REQ_MAY_UNMAP;
 }
+if (bdrv_req_is_aligned(bs, sector_num << BDRV_SECTOR_BITS,
+nb_sectors << BDRV_SECTOR_BITS)) {
+ret = bdrv_co_do_writev(bs, sector_num, nb_sectors, NULL,
+BDRV_REQ_ZERO_WRITE | flags);
+} else {
+uint8_t *buf;
+QEMUIOVector local_qiov;
+size_t bytes = nb_sectors << BDRV_SECTOR_BITS;
 
-return bdrv_co_do_writev(bs, sector_num, nb_sectors, NULL,
- BDRV_REQ_ZERO_WRITE | flags);
+buf = qemu_memalign(bdrv_opt_mem_align(bs), bytes);
+memset(buf, 0, bytes);
+qemu_iovec_init(&local_qiov, 1);
+qemu_iovec_add(&local_qiov, buf, bytes);
+
+ret = bdrv_co_do_writev(bs, sector_num, nb_sectors, &local_qiov,
+BDRV_REQ_ZERO_WRITE | flags);
+qemu_vfree(buf);
+}
+return ret;
 }
 
 /**
-- 
1.9.3




[Qemu-devel] [PATCH v2 2/2] qemu-iotests: Test unaligned 4k zero write

2015-03-23 Thread Fam Zheng
Signed-off-by: Fam Zheng 
---
 tests/qemu-iotests/033 | 47 +-
 tests/qemu-iotests/033.out | 26 +
 2 files changed, 56 insertions(+), 17 deletions(-)

diff --git a/tests/qemu-iotests/033 b/tests/qemu-iotests/033
index ea3351c..4008f10 100755
--- a/tests/qemu-iotests/033
+++ b/tests/qemu-iotests/033
@@ -46,26 +46,39 @@ _supported_os Linux
 size=128M
 _make_test_img $size
 
-echo
-echo "== preparing image =="
-$QEMU_IO -c "write -P 0xa 0x200 0x400" "$TEST_IMG" | _filter_qemu_io
-$QEMU_IO -c "write -P 0xa 0x2 0x600" "$TEST_IMG" | _filter_qemu_io
-$QEMU_IO -c "write -z 0x400 0x2" "$TEST_IMG" | _filter_qemu_io
+do_test()
+{
+   local align=$1
+   local iocmd=$2
+   local img=$3
+   {
+   echo "open -o driver=$IMGFMT,file.align=$align blkdebug::$img"
+   echo $iocmd
+   } | $QEMU_IO
+}
 
-echo
-echo "== verifying patterns (1) =="
-$QEMU_IO -c "read -P 0xa 0x200 0x200" "$TEST_IMG" | _filter_qemu_io
-$QEMU_IO -c "read -P 0x0 0x400 0x2" "$TEST_IMG" | _filter_qemu_io
-$QEMU_IO -c "read -P 0xa 0x20400 0x200" "$TEST_IMG" | _filter_qemu_io
+for align in 512 4k; do
+   echo
+   echo "== preparing image =="
+   do_test $align "write -P 0xa 0x200 0x400" "$TEST_IMG" | _filter_qemu_io
+   do_test $align "write -P 0xa 0x2 0x600" "$TEST_IMG" | 
_filter_qemu_io
+   do_test $align "write -z 0x400 0x2" "$TEST_IMG" | _filter_qemu_io
 
-echo
-echo "== rewriting zeroes =="
-$QEMU_IO -c "write -P 0xb 0x1 0x1" "$TEST_IMG" | _filter_qemu_io
-$QEMU_IO -c "write -z 0x1 0x1" "$TEST_IMG" | _filter_qemu_io
+   echo
+   echo "== verifying patterns (1) =="
+   do_test $align "read -P 0xa 0x200 0x200" "$TEST_IMG" | _filter_qemu_io
+   do_test $align "read -P 0x0 0x400 0x2" "$TEST_IMG" | _filter_qemu_io
+   do_test $align "read -P 0xa 0x20400 0x200" "$TEST_IMG" | _filter_qemu_io
 
-echo
-echo "== verifying patterns (2) =="
-$QEMU_IO -c "read -P 0x0 0x400 0x2" "$TEST_IMG" | _filter_qemu_io
+   echo
+   echo "== rewriting zeroes =="
+   do_test $align "write -P 0xb 0x1 0x1" "$TEST_IMG" | 
_filter_qemu_io
+   do_test $align "write -z 0x1 0x1" "$TEST_IMG" | _filter_qemu_io
+
+   echo
+   echo "== verifying patterns (2) =="
+   do_test $align "read -P 0x0 0x400 0x2" "$TEST_IMG" | _filter_qemu_io
+done
 
 # success, all done
 echo "*** done"
diff --git a/tests/qemu-iotests/033.out b/tests/qemu-iotests/033.out
index 41475ee..305949f 100644
--- a/tests/qemu-iotests/033.out
+++ b/tests/qemu-iotests/033.out
@@ -26,4 +26,30 @@ wrote 65536/65536 bytes at offset 65536
 == verifying patterns (2) ==
 read 131072/131072 bytes at offset 1024
 128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== preparing image ==
+wrote 1024/1024 bytes at offset 512
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 1536/1536 bytes at offset 131072
+1.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 131072/131072 bytes at offset 1024
+128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== verifying patterns (1) ==
+read 512/512 bytes at offset 512
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 131072/131072 bytes at offset 1024
+128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 132096
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== rewriting zeroes ==
+wrote 65536/65536 bytes at offset 65536
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 65536/65536 bytes at offset 65536
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== verifying patterns (2) ==
+read 131072/131072 bytes at offset 1024
+128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 *** done
-- 
1.9.3




Re: [Qemu-devel] [PATCH 1/2] block: Fix unaligned zero write

2015-03-23 Thread Fam Zheng
On Mon, 03/23 14:14, Stefan Hajnoczi wrote:
> On Mon, Mar 23, 2015 at 12:46:09PM +0800, Fam Zheng wrote:
> > @@ -3435,6 +3446,10 @@ static int coroutine_fn 
> > bdrv_co_do_pwritev(BlockDriverState *bs,
> >  bytes = ROUND_UP(bytes, align);
> >  }
> >  
> > +if (use_local_qiov) {
> > +/* Local buffer may have non-zero data. */
> > +flags &= ~BDRV_REQ_ZERO_WRITE;
> > +}
> >  ret = bdrv_aligned_pwritev(bs, &req, offset, bytes,
> > use_local_qiov ? &local_qiov : qiov,
> > flags);
> > @@ -3475,14 +3490,32 @@ int coroutine_fn 
> > bdrv_co_write_zeroes(BlockDriverState *bs,
> >int64_t sector_num, int nb_sectors,
> >BdrvRequestFlags flags)
> >  {
> > +int ret;
> > +
> >  trace_bdrv_co_write_zeroes(bs, sector_num, nb_sectors, flags);
> >  
> > -if (!(bs->open_flags & BDRV_O_UNMAP)) {
> > -flags &= ~BDRV_REQ_MAY_UNMAP;
> > -}
> 
> Why is it okay to drop this when the request is aligned?

It's not. I will restore it.

> 
> > +if (bdrv_req_is_aligned(bs, sector_num << BDRV_SECTOR_BITS,
> > +nb_sectors << BDRV_SECTOR_BITS)) {
> > +ret = bdrv_co_do_writev(bs, sector_num, nb_sectors, NULL,
> > +BDRV_REQ_ZERO_WRITE | flags);
> > +} else {





Re: [Qemu-devel] [PATCH v2] block: Switch to host monotonic clock for IO throttling

2015-03-23 Thread Fam Zheng
On Mon, 03/23 14:28, Stefan Hajnoczi wrote:
> On Mon, Mar 23, 2015 at 1:58 PM, Paolo Bonzini  wrote:
> > On 23/03/2015 14:56, Stefan Hajnoczi wrote:
> >> On Mon, Mar 23, 2015 at 01:24:15PM +0800, Fam Zheng wrote:
> >>> Currently, throttle timers won't make any progress when VCPU is not
> >>> running, which would stall the request queue in utils, qtest, vm
> >>> suspending, and live migration without special handling.
> >>>
> >>> For example in bdrv_drain_all, all requests are resumed immediately
> >>> without taking throttling limit into account. This means whenever it is
> >>> called, IO throttling goes ineffective (examples: system reset,
> >>> migration and many block job operations.).
> >>>
> >>> This might be some loophole that guest could exploit.
> >>>
> >>> If we use the host clock, we can later just trust the nested poll when
> >>> waiting for requests.
> >>>
> >>> Note that for qemu-iotests case 093, which sets up qtest when running
> >>> QEMU, we still use vm clock so the script can control the clock stepping
> >>> in order to be deterministic.
> >>>
> >>> Signed-off-by: Fam Zheng 
> >>>
> >>> ---
> >>> v2: Don't break qemu-iotests 093.
> >>> ---
> >>>  block.c | 9 -
> >>>  1 file changed, 8 insertions(+), 1 deletion(-)
> >>
> >> Should we make an exception for live migration to reduce downtime?
> >>
> >> I'm concerned that now vm_stop() can take even longer since we'll wait
> >> for throttling.
> >
> > What would have prevented the wait before?  (In fact I'm not sure why we
> > would have even terminated bdrv_drain_all, since we run it when
> > QEMU_CLOCK_VIRTUAL is not advancing anymore!).
> 
> Hang on, I just noticed that bdrv_drain_one() disables throttling:
> 
> bs->io_limits_enabled = false;
> 
> for (i = 0; i < 2; i++) {
> while (qemu_co_enter_next(&bs->throttled_reqs[i])) {
> drained = true;
> }
> }
> 
> bs->io_limits_enabled = enabled;
> 
> So this patch does not make bdrv_drain_all() honor I/O throttling
> limits.  I find the commit description confusing when it mentions
> bdrv_drain_all().
> 
> What this patch really does is to all throttled I/O requests after
> vm_stop() or when the CPU was never running in the first place.

This patch has no effect even on that - before vm_stop we have
bdrv_drain_all(), right :)

See below for the reason of this change.

> 
> The reason why the virtual clock was chosen for throttling is to
> guarantee that we never violate the assumption that device backends
> are stopped when vcpus are stopped.

I don't get this. Doesn't bdrv_drain_all() in do_vm_stop guarantees that
already?

> 
> I'd like to know what problems exactly this patch fixes.

This patch is necessary, though not enough, to make bdrv_drain_all honor IO
throttling. It currently disables throttling, which is bad.

A malicious guest could use a loop that keep triggering bdrv_drain_all() so
that many requests can go beyond throttling, for example live snapshot.

In order to fix that we should remove the throttling disabling code above, but
it's not possible without this patch. Otherwise vm_stop and many other things
would not work.

Alberto saw that this patch also fixes the odd behavior: block jobs, which need
to R/W a throttled BDS, will not make progress if VCPU is not running. If we
don't consider this as a bug, we should document the inconsistency (confusion):
if no throttling, they DO make progress after VCPU stopped.

Fam




Re: [Qemu-devel] [PATCH] tcg: pack TCGTemp to reduce size by 8 bytes

2015-03-23 Thread Richard Henderson
On 03/23/2015 02:42 PM, Stefan Weil wrote:
> Further optimizations are possible. TCGTemp can be reduced to 32 bytes as the
> output
> of pahole shows:
> 
> struct TCGTemp {
> TCGTempVal val_type:8; /* 0:24  4 */

Need only be 2 bits.

> unsigned int   reg:8; /* 0:16  4 */
> unsigned int   mem_reg:8; /* 0: 8  4 */

Need only be  6 (ia64) bits, but an aligned 8-bit slot probably performs best.

> 
> /* Bitfield combined with next fields */
> 
> _Bool  fixed_reg:1; /* 3: 7  1 */
> _Bool  mem_coherent:1; /* 3: 6  1 */
> _Bool  mem_allocated:1; /* 3: 5  1 */
> _Bool  temp_local:1; /* 3: 4  1 */
> _Bool  temp_allocated:1; /* 3: 3  1 */
> 
> /* XXX 3 bits hole, try to pack */
> 
> TCGTypebase_type:16; /* 4:16  4 */
> TCGTypetype:16; /* 4: 0  4 */

Need only be 1 bit, honestly, but 2 bits might be easier to arrange.  Anyway,
you're down to 23 bits from the word, or 16 bytes on a 32-bit host.  It's no
better than the 32 bytes you got for a 64-bit host though.


> tcg_target_longval; /* 8 8 */
> intptr_t   mem_offset; /*16 8 */
> const char  *  name; /*24 8 */
> 
> /* size: 32, cachelines: 1, members: 13 */
> /* bit holes: 1, sum bit holes: 3 bits */
> /* last cacheline: 32 bytes */
> };
> 
> Here I used a new enum type for val_type and reduced some values to 8 or 16 
> bit.
> I also put the two most often used values at the beginning, so they can be
> addressed without or with a small offset ("often" in the code, no runtime
> data available).
> 
> Are such optimizations useful?

Yes, I think so.  Especially because of the rather large arrays we build.


r~



Re: [Qemu-devel] [Xen-devel] Question about scsi emulation

2015-03-23 Thread Yaoli Zheng
Thank you for the advice!
It will be grateful if someone can provide any guide how to config  MegaSAS HBA 
or how to use xen pvscsi in XEN.
There seems no option in XEN for these two driver emulation and no document 
found online.

Thanks!

-Original Message-
From: Stefano Stabellini [mailto:stefano.stabell...@eu.citrix.com] 
Sent: Monday, March 23, 2015 11:18 AM
To: Stefan Hajnoczi
Cc: Yaoli Zheng; qemu-devel@nongnu.org; xen-de...@lists.xen.org
Subject: Re: [Xen-devel] [Qemu-devel] Question about scsi emulation

On Mon, 23 Mar 2015, Stefan Hajnoczi wrote:
> On Wed, Mar 18, 2015 at 02:16:47PM -0700, Yaoli Zheng wrote:
> > We have problem using qemu emulated scsi driver(the old lsi). Wonder if any 
> > of other device model we can try for emulating scsi, and how we can get and 
> > config it in Xen? Having been told virtio-scsi is alternative one, but have 
> > no idea how to get it work in Xen.
> 
> The MegaSAS HBA might be another option.  Not sure whether MegaSAS or 
> virtio-scsi are supported under Xen though.
 
Making MegaSAS HBA work on Xen should be easy.

Rather than virtio-scsi I would consider the new Xen pvscsi frontend/backend 
drivers, now upstream in Linux (drivers/scsi/xen-scsifront.c and 
drivers/xen/xen-scsiback.c). Usually Xen PV drivers perform much better than 
virtio devices on Xen.

That said, probably most things would be better than the old lsi.



Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 2/2] i6300esb: Fix signed integer overflow

2015-03-23 Thread David Gibson
On Mon, Mar 23, 2015 at 10:54:39AM +0100, BALATON Zoltan wrote:
> On Mon, 23 Mar 2015, David Gibson wrote:
> >If the guest programs a sufficiently large timeout value an integer
> >overflow can occur in i6300esb_restart_timer().  e.g. if the maximum
> >possible timer preload value of 0xf is programmed then we end up with
> >the calculation:
> >
> >timeout = get_ticks_per_sec() * (0xf << 15) / 3300;
> >
> >get_ticks_per_sec() returns 10 (10^9) giving:
> >
> >10^9 * (0xf * 2^15) == 0x1dcd632329b00 (65 bits)
> >
> >Obviously the division by 33MHz brings it back under 64-bits, but the
> >overflow has already occurred.
> >
> >Since signed integer overflow has undefined behaviour in C, in theory this
> >could be arbitrarily bad.  In practice, the overflowed value wraps around
> >to something negative, causing the watchdog to immediately expire, killing
> >the guest, which is still fairly bad.
> >
> >The bug can be triggered by running a Linux guest, loading the i6300esb
> >driver with parameter "heartbeat=2046" and opening /dev/watchdog.  The
> >watchdog will trigger as soon as the device is opened.
> >
> >This patch corrects the problem by using muldiv64(), which effectively
> >allows a 128-bit intermediate value between the multiplication and
> >division.
> >
> >Signed-off-by: David Gibson 
> >---
> >hw/watchdog/wdt_i6300esb.c | 10 --
> >1 file changed, 8 insertions(+), 2 deletions(-)
> >
> >diff --git a/hw/watchdog/wdt_i6300esb.c b/hw/watchdog/wdt_i6300esb.c
> >index e694fa9..c7316f5 100644
> >--- a/hw/watchdog/wdt_i6300esb.c
> >+++ b/hw/watchdog/wdt_i6300esb.c
> >@@ -125,8 +125,14 @@ static void i6300esb_restart_timer(I6300State *d, int 
> >stage)
> >else
> >timeout <<= 5;
> >
> >-/* Get the timeout in units of ticks_per_sec. */
> >-timeout = get_ticks_per_sec() * timeout / 3300;
> >+/* Get the timeout in units of ticks_per_sec.
> >+ *
> >+ * ticks_per_sec is typically 10^9 == 0x3B9ACA00 (30 bits), with
> >+ * 20 bits of user supplied preload, and 15 bits of scale, the
> >+ * multiply here can exceed 64-bits, before we divide by 33MHz, so
> >+ * we use a 128-bit temporary
> >+ */
> 
> Is the comment still correct saying "we use a 128-bit temporary" when the
> code does not do that explicitely any more?

Bother.  I fixed the commit message, but not this comment.  It's still
kind of correct, in that muldiv64 does effectively have a 128-bit
temporary internally.  Not quite what I meant, and a little misleading
though.

Paolo, worth fixing?

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpeS2eXPMuK0.pgp
Description: PGP signature


Re: [Qemu-devel] [PATCH v3] rcu tests: fix compilation on 32-bit ppc

2015-03-23 Thread Andreas Färber
Am 22.03.2015 um 09:26 schrieb Paolo Bonzini:
> 32-bit PPC cannot do atomic operations on long long.  Inside the loops,
> we are already using local counters that are summed at the end of
> the run---with some exceptions (rcu_stress_count for rcutorture,
> n_nodes for test-rcu-list): fix them to use the same technique.
> For test-rcu-list, remove the mostly unused member "val" from the
> list.  Then, use a mutex to protect the global counts.
> 
> Performance does not matter there because every thread will only enter
> the critical section once.
> 
> Remaining uses of atomic instructions are for ints or pointers.
> 
> Reported-by: Andreas Faerber 
> Signed-off-by: Paolo Bonzini 
> ---
>  tests/rcutorture.c| 20 
>  tests/test-rcu-list.c | 50 --
>  2 files changed, 44 insertions(+), 26 deletions(-)

Tested-by: Andreas Färber 

Passing these tests now, but running into unrelated qtest failures.

Thanks,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu,
Graham Norton; HRB 21284 (AG Nürnberg)



Re: [Qemu-devel] [PATCH 06/12 v8] target-tilegx: Add TILE-Gx building files

2015-03-23 Thread Chen Gang

On 3/23/15 23:06, Richard Henderson wrote:
> On 03/21/2015 03:23 AM, Chen Gang wrote:
>> Add related configuration, make files for tilegx.
>>
>> Signed-off-by: Chen Gang 
>> ---
>>  configure | 3 +++
>>  default-configs/tilegx-linux-user.mak | 1 +
>>  target-tilegx/Makefile.objs   | 1 +
> 
> 
> You need to move this patch after 10/12, so that every patch is buildable.
> Otherwise this is a fairly good split of patches.
>

OK, thanks. I shall send patch v9 for it within this month (2015-03-31).
And welcome any additional ideas, suggestions and completions for all
patches in v8 by any members.

Thanks.
-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed



[Qemu-devel] [PATCH 1/2] CVE-2015-1779: incrementally decode websocket frames

2015-03-23 Thread Daniel P. Berrange
The logic for decoding websocket frames wants to fully
decode the frame header and payload, before allowing the
VNC server to see any of the payload data. There is no
size limit on websocket payloads, so this allows a
malicious network client to consume 2^64 bytes in memory
in QEMU. It can trigger this denial of service before
the VNC server even performs any authentication.

The fix is to decode the header, and then incrementally
decode the payload data as it is needed. With this fix
the websocket decoder will allow at most 4k of data to
be buffered before decoding and processing payload.

Signed-off-by: Daniel P. Berrange 
---
 ui/vnc-ws.c | 105 
 ui/vnc-ws.h |   9 --
 ui/vnc.h|   2 ++
 3 files changed, 80 insertions(+), 36 deletions(-)

diff --git a/ui/vnc-ws.c b/ui/vnc-ws.c
index 85dbb7e..e8146d0 100644
--- a/ui/vnc-ws.c
+++ b/ui/vnc-ws.c
@@ -107,7 +107,7 @@ long vnc_client_read_ws(VncState *vs)
 {
 int ret, err;
 uint8_t *payload;
-size_t payload_size, frame_size;
+size_t payload_size, header_size;
 VNC_DEBUG("Read websocket %p size %zd offset %zd\n", vs->ws_input.buffer,
 vs->ws_input.capacity, vs->ws_input.offset);
 buffer_reserve(&vs->ws_input, 4096);
@@ -117,18 +117,39 @@ long vnc_client_read_ws(VncState *vs)
 }
 vs->ws_input.offset += ret;
 
-/* make sure that nothing is left in the ws_input buffer */
+ret = 0;
+/* consume as much of ws_input buffer as possible */
 do {
-err = vncws_decode_frame(&vs->ws_input, &payload,
-  &payload_size, &frame_size);
-if (err <= 0) {
-return err;
+if (vs->ws_payload_remain == 0) {
+err = vncws_decode_frame_header(&vs->ws_input,
+&header_size,
+&vs->ws_payload_remain,
+&vs->ws_payload_mask);
+if (err <= 0) {
+return err;
+}
+
+buffer_advance(&vs->ws_input, header_size);
 }
+if (vs->ws_payload_remain != 0) {
+err = vncws_decode_frame_payload(&vs->ws_input,
+ &vs->ws_payload_remain,
+ &vs->ws_payload_mask,
+ &payload,
+ &payload_size);
+if (err < 0) {
+return err;
+}
+if (err == 0) {
+return ret;
+}
+ret += err;
 
-buffer_reserve(&vs->input, payload_size);
-buffer_append(&vs->input, payload, payload_size);
+buffer_reserve(&vs->input, payload_size);
+buffer_append(&vs->input, payload, payload_size);
 
-buffer_advance(&vs->ws_input, frame_size);
+buffer_advance(&vs->ws_input, payload_size);
+}
 } while (vs->ws_input.offset > 0);
 
 return ret;
@@ -265,15 +286,14 @@ void vncws_encode_frame(Buffer *output, const void 
*payload,
 buffer_append(output, payload, payload_size);
 }
 
-int vncws_decode_frame(Buffer *input, uint8_t **payload,
-   size_t *payload_size, size_t *frame_size)
+int vncws_decode_frame_header(Buffer *input,
+  size_t *header_size,
+  size_t *payload_remain,
+  WsMask *payload_mask)
 {
 unsigned char opcode = 0, fin = 0, has_mask = 0;
-size_t header_size = 0;
-uint32_t *payload32;
+size_t payload_len;
 WsHeader *header = (WsHeader *)input->buffer;
-WsMask mask;
-int i;
 
 if (input->offset < WS_HEAD_MIN_LEN + 4) {
 /* header not complete */
@@ -283,7 +303,7 @@ int vncws_decode_frame(Buffer *input, uint8_t **payload,
 fin = (header->b0 & 0x80) >> 7;
 opcode = header->b0 & 0x0f;
 has_mask = (header->b1 & 0x80) >> 7;
-*payload_size = header->b1 & 0x7f;
+payload_len = header->b1 & 0x7f;
 
 if (opcode == WS_OPCODE_CLOSE) {
 /* disconnect */
@@ -300,40 +320,57 @@ int vncws_decode_frame(Buffer *input, uint8_t **payload,
 return -2;
 }
 
-if (*payload_size < 126) {
-header_size = 6;
-mask = header->u.m;
-} else if (*payload_size == 126 && input->offset >= 8) {
-*payload_size = be16_to_cpu(header->u.s16.l16);
-header_size = 8;
-mask = header->u.s16.m16;
-} else if (*payload_size == 127 && input->offset >= 14) {
-*payload_size = be64_to_cpu(header->u.s64.l64);
-header_size = 14;
-mask = header->u.s64.m64;
+if (payload_len < 126) {
+*payload_remain = payload_len;
+*header_size = 6;
+*payload_mask = header->u.m;
+} else if (payload_len == 126 && input->offset >= 8) {
+*payload_remain = be16_to_cpu(header->u.s16.

[Qemu-devel] [PATCH 2/2] CVE-2015-1779: limit size of HTTP headers from websockets clients

2015-03-23 Thread Daniel P. Berrange
The VNC server websockets decoder will read and buffer data from
websockets clients until it sees the end of the HTTP headers,
as indicated by \r\n\r\n. In theory this allows a malicious to
trick QEMU into consuming an arbitrary amount of RAM. In practice,
because QEMU runs g_strstr_len() across the buffered header data,
it will spend increasingly long burning CPU time searching for
the substring match and less & less time reading data. So while
this does cause arbitrary memory growth, the bigger problem is
that QEMU will be burning 100% of available CPU time.

A novnc websockets client typically sends headers of around
512 bytes in length. As such it is reasonable to place a 4096
byte limit on the amount of data buffered while searching for
the end of HTTP headers.

Signed-off-by: Daniel P. Berrange 
---
 ui/vnc-ws.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/ui/vnc-ws.c b/ui/vnc-ws.c
index e8146d0..525e2d7 100644
--- a/ui/vnc-ws.c
+++ b/ui/vnc-ws.c
@@ -81,8 +81,11 @@ void vncws_handshake_read(void *opaque)
 VncState *vs = opaque;
 uint8_t *handshake_end;
 long ret;
-buffer_reserve(&vs->ws_input, 4096);
-ret = vnc_client_read_buf(vs, buffer_end(&vs->ws_input), 4096);
+/* Typical HTTP headers from novnc are 512 bytes, so limiting
+ * total header size to 4096 is easily enough. */
+size_t want = 4096 - vs->ws_input.offset;
+buffer_reserve(&vs->ws_input, want);
+ret = vnc_client_read_buf(vs, buffer_end(&vs->ws_input), want);
 
 if (!ret) {
 if (vs->csock == -1) {
@@ -99,6 +102,9 @@ void vncws_handshake_read(void *opaque)
 vncws_process_handshake(vs, vs->ws_input.buffer, vs->ws_input.offset);
 buffer_advance(&vs->ws_input, handshake_end - vs->ws_input.buffer +
 strlen(WS_HANDSHAKE_END));
+} else if (vs->ws_input.offset >= 4096) {
+VNC_DEBUG("End of headers not found in first 4096 bytes\n");
+vnc_client_error(vs);
 }
 }
 
-- 
2.1.0




[Qemu-devel] [PATCH 0/2] CVE-2015-1779: fix denial of service in VNC websockets

2015-03-23 Thread Daniel P. Berrange
The VNC websockets protocol decoder has two places where it did
not correctly limit its resource usage when processing data from
the client. This can be abused by a malicious client to cause QEMU
to consume all system memory, unless it is otherwise limited by
ulimits and/or cgroups. These problems can be triggered in the
websockets layer before the VNC protocol actually starts, so no
client authentication will have taken place at this point.

Daniel P. Berrange (2):
  CVE-2015-1779: incrementally decode websocket frames
  CVE-2015-1779: limit size of HTTP headers from websockets clients

 ui/vnc-ws.c | 115 +---
 ui/vnc-ws.h |   9 +++--
 ui/vnc.h|   2 ++
 3 files changed, 88 insertions(+), 38 deletions(-)

-- 
2.1.0




Re: [Qemu-devel] [PATCH for-2.3 0/1] block: New command line option --misc format-probing=off

2015-03-23 Thread Peter Maydell
On 23 March 2015 at 10:04, Markus Armbruster  wrote:
> First of all, my apologies for being so late with this.  I realized
> part way through the current development cycle that I couldn't do both
> the error work and my half of the block probing work we discussed back
> in November, so I punted the latter to the next cycle, missing the one
> little feature I quite obviously could do.
>
> Why 2.3?
>
> 1. libvirt wants it, the sooner, the better.  See the libvirt RFC PATCH
>https://lists.nongnu.org/archive/html/qemu-devel/2015-03/msg04457.html
>
> 2. The patch is simple, and quite obviously does nothing unless you
>run with --misc format-probing=off.

I'm really dubious about adding new commandline ABI at this point
in the release cycle, especially a whole new option group...

-- PMM



Re: [Qemu-devel] [PATCH for-2.3] powerpc: fix -machine usb=no for newworld and pseries machines

2015-03-23 Thread Marcel Apfelbaum

On 03/23/2015 10:11 PM, Paolo Bonzini wrote:



On 23/03/2015 19:21, Alexander Graf wrote:

On 03/23/2015 07:20 PM, Marcel Apfelbaum wrote:

On 03/23/2015 07:05 PM, Paolo Bonzini wrote:

Capture the explicit setting of "usb=no" into a separate bool, and
use it to skip the update of machine->usb in the board init function.

Signed-off-by: Paolo Bonzini 
---
   hw/core/machine.c | 1 +
   hw/ppc/mac_newworld.c | 2 +-
   hw/ppc/spapr.c| 2 +-
   include/hw/boards.h   | 1 +
   4 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index cb1185a..25c45e6 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -223,6 +223,7 @@ static void machine_set_usb(Object *obj, bool
value, Error **errp)
   MachineState *ms = MACHINE(obj);

   ms->usb = value;
+ms->usb_disabled = !value;

Maybe is too late now, but I really not like this pollution of
MachineState
with 'usb_disabled'. (Imagine we have this kind of fields for lots of
objects and lots
of corner cases...)
I know it comes to solve a bug, but we talked about it in another mail
thread and
this change in semantics was approved.

Let me explain *why* I don't like it.
1. We add an "usb_disabled" field to a base class (actually object)
of all the machines and the only place it is interesting is
for 2 machines on ppc.


So we do for kernel_irqchip_requested/allowed.  Both approaches could be
replaced by a tri-state on/off/auto.

Personally I prefer this one, but out of the scope of this patch.




2. Even for these 2 machines, the scenario of defaults=on and usb=off
is not practical.


Why?  For example you could add a virtio-input device instead of a USB
keyboard and mouse.

You got me there :)
From what I understood for those boards there is no need for this
combination but I don't know them enough (OK.. at all).




I'm personally fine either way, but I assumed that Paolo had a good
reason for writing the patch?


Actually, two.

One good reason is that no matter how you look at it, it's at least
surprising and at worst a bug that "-machine usb=no" includes a default
USB controller.

The second good reason is that it's a guest ABI change for the versioned
pSeries machines, and as such it breaks migration.

Always migration wins.

Bottom line, of course I don't have anything against fixing this bug,
my problem was only with the way we add those fields (usb_disabled), maybe a 
three state
QOM property (and variable behind it) is a solution, but not for now of course.

I also didn't like the required/allowed fields and I added them anyway...

Thanks,
Marcel




Paolo






Re: [Qemu-devel] [PATCH] tcg: pack TCGTemp to reduce size by 8 bytes

2015-03-23 Thread Stefan Weil

Am 21.03.2015 um 07:27 schrieb Emilio G. Cota:

This brings down the size of the struct from 56 to 48 bytes.

Signed-off-by: Emilio G. Cota 
---
  tcg/tcg.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index add7f75..3276924 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -429,8 +429,8 @@ typedef struct TCGTemp {
  int val_type;
  int reg;
  tcg_target_long val;
-int mem_reg;
  intptr_t mem_offset;
+int mem_reg;
  unsigned int fixed_reg:1;
  unsigned int mem_coherent:1;
  unsigned int mem_allocated:1;


Reviewed-by: Stefan Weil 

TCGContext includes an array of TCGTemp, so it is even reduced by 4 KiB 
(good for caching),

and tcg.o now uses 55364 instead of 56116 bytes (maybe faster, too).

Further optimizations are possible. TCGTemp can be reduced to 32 bytes 
as the output

of pahole shows:

struct TCGTemp {
TCGTempVal val_type:8; /* 0:24  4 */
unsigned int   reg:8; /* 0:16  4 */
unsigned int   mem_reg:8; /* 0: 8  4 */

/* Bitfield combined with next fields */

_Bool  fixed_reg:1; /* 3: 7  1 */
_Bool  mem_coherent:1; /* 3: 6  1 */
_Bool  mem_allocated:1; /* 3: 5  1 */
_Bool  temp_local:1; /* 3: 4  1 */
_Bool  temp_allocated:1; /* 3: 3  1 */

/* XXX 3 bits hole, try to pack */

TCGTypebase_type:16; /* 4:16  4 */
TCGTypetype:16; /* 4: 0  4 */
tcg_target_longval; /* 8 8 */
intptr_t   mem_offset; /*16 8 */
const char  *  name; /*24 8 */

/* size: 32, cachelines: 1, members: 13 */
/* bit holes: 1, sum bit holes: 3 bits */
/* last cacheline: 32 bytes */
};

Here I used a new enum type for val_type and reduced some values to 8 or 
16 bit.

I also put the two most often used values at the beginning, so they can be
addressed without or with a small offset ("often" in the code, no runtime
data available).

Are such optimizations useful?

Stefan




Re: [Qemu-devel] [PATCH for-2.3 2/3] sdhci: Make device "sdhci-pci" unavailable with -device

2015-03-23 Thread Markus Armbruster
Peter Maydell  writes:

> On 23 March 2015 at 19:09, Markus Armbruster  wrote:
>> Due to misuse of drive_get_next(), -device sdhci can connect to a
>> -drive if=sd,...  If you can't see what's wrong here, check out the
>> FIXME in sdhci_initfn() and the commit that added it.
>>
>> We can't fix this in time for the release, but since the device is new
>> in 2.3, we can disable it before this mistake becomes ABI: set
>> cannot_instantiate_with_device_add_yet.
>>
>> Signed-off-by: Markus Armbruster 
>> ---
>>  hw/sd/sdhci.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
>> index f056c52..ab13505 100644
>> --- a/hw/sd/sdhci.c
>> +++ b/hw/sd/sdhci.c
>> @@ -1254,6 +1254,8 @@ static void sdhci_pci_class_init(ObjectClass *klass, 
>> void *data)
>>  set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
>>  dc->vmsd = &sdhci_vmstate;
>>  dc->props = sdhci_properties;
>> +/* Reason: realize() method uses drive_get_next() */
>> +dc->cannot_instantiate_with_device_add_yet = true;
>>  }
>
> This is basically saying "this device won't work in this
> release", right?

Yes, because with this patch, you can use it only from code, and code
isn't using it right now.



Re: [Qemu-devel] [PATCH for-2.3 1/1] block: New command line option --misc format-probing=off

2015-03-23 Thread Markus Armbruster
Paolo Bonzini  writes:

> On 23/03/2015 11:04, Markus Armbruster wrote:
>> Probing is convenient, but probing untrusted raw images is insecure
>> (CVE-2008-2004).  To avoid it, users should always specify raw format
>> explicitly.  This isn't trivial, and even sophisticated users have
>> gotten it wrong (libvirt CVE-2010-2237, CVE-2010-2238, CVE-2010-2239,
>> plus more recent variations of the theme that didn't get CVEs because
>> they were caught before they could hurt users).
>> 
>> Disabling probing entirely is a (hamfisted) way to ensure you always
>> specify the format.
>> 
>> Instead of creating yet another simple option that doesn't work with
>> -readconfig, create a "misc" option group and --misc command line
>> option.  We're out of space in vm_config_groups[], so double it.
>> 
>> This will let us make existing miscellaneous non-QemeOpts options
>> sugar for --misc, so they become available with -readconfig.  Left for
>> another day.
>
> Which exactly?  Could they fit into another scheme?  (See how
> -mem-prealloc was replaced and generalized by memory-backend-* objects).
>
> For example, -win2k-install-hack should really be an IDE disk property
> that can be set with -global, and many other options could be machine or
> display options.
>
> I don't think it's the right solution.  Libvirt knows where to add a
> format=raw option, and it can do it without waiting for QEMU to
> implement this.  Direct command-line users are not going to use the
> option anyway.

Two separate bones of contention here:

1. Do we want to give libvirt the bug insurance it wants?

2. Is --misc sane?

We're discussing 1. elsewhere already.

Regarding 2.: if anyone has a better idea on how to do the command line
switch, I'm all ears.

Eyeballing vl.c, I suspect these options don't use QemuOpts, thus don't
support -readconfig:

nodefconfig
nouserconfig
cpu
snapshot
display
nographic
curses
portrait
rotate
no-fd-bootchk
tftp
bootp
redir
audio_help
soundhw
help
version
mempath
mem-prealloc
d
D
s
L
singlestep
S
k
localtime
vga
g
echr
watchdog
watchdog-action
loadvm
full-screen
no-frame
alt-grab
ctrl-grab
no-quit
sdl
pidfile
win2k-hack
rtc-td-hack
no-kvm-pit-reinjection
no-acpi
no-hpet
no-reboot
no-shutdown
show-cursor
uuid
semihosting
prom-env
startdate
tb-size
incoming
nodefaults
xen-domid
xen-attach
qtest
qtest-log
dump-vmstate
smb
runas
chroot
daemonize
enable-fips

Unless we stop adding more, we'll never get --readconfig reasonably
complete.

>
> So for today we're 1-1 on NACKs. :D

I NACKed something today?

All I remember is advising to disable sdhci-pci instead of changing how
it's hacked up.



Re: [Qemu-devel] [PATCH RFC for-2.3 1/1] block: New command line option --no-format-probing

2015-03-23 Thread Markus Armbruster
Paolo Bonzini  writes:

> On 23/03/2015 18:48, Eric Blake wrote:
>>> Why can't libvirt just add ,format=raw instead of leaving out the
>>> format key altogether?
>> 
>> Libvirt DOES add format=raw.  This patch is an extra insurance
>> policy to guarantee that libvirt does not have any code paths that
>> omit the explicit format (as we have had a couple of CVEs in
>> libvirt over the years where that was the case).
>
> And where's the extra insurance policy to guarantee that QEMU does not
> have any code paths that ignore the new command line option?

Right here (in the non-RFC patch, not this one):

@@ -751,6 +752,11 @@ static int find_image_format(BlockDriverState *bs, const 
char *filename,
 return ret;
 }
 
+if (bdrv_image_probing_disabled) {
+error_setg(errp, "Format not specified and image probing disabled");
+return -EINVAL;
+}
+
 ret = bdrv_pread(bs, 0, buf, sizeof(buf));
 if (ret < 0) {
 error_setg_errno(errp, -ret, "Could not read image for determining its 
"

The option sets bdrv_image_probing_disabled in a straightforward manner,
and bdrv_image_probing_disabled guards the probing code in an equally
straightforward manner.

> This is really borderline security theater.

I don't think so.  It's optional insurance agains libvirt bugs, and
libvirt wants to buy it.

>  Bugs happen, we fix them.

Yes.  Doesn't mean we shouldn't proactively mitigate bugs.

>  Even better, Kevin now has implemented a strong mitigation for CVEs
> like this, that won't allow guests to transmute a probed raw image
> into another format.

Yes.  As we discussed at some length back then, Kevin's mitigation is
not airtight and not without risk, but we decided it's worth having all
the same, in particular since it provides a good amount of protection
for users unwilling or unable to always specify formats.

How to get p0wned anyway:

1. Use your raw image with format=raw.  Malicious guest writes qcow2
header.

2. Use the image again without a format.  Probe returns qcow2, no
warning is printed.

Plausible attack?  Maybe not.  Worth stopping cold anyway?  I think so,
as long as it's easy.

My new option is a different kind of mitigation.  It's for users who
want more airtight protection, prefer writes to sector 0 just work,
funny or not, and are willing to always specify the format.  At least
one such user exists: libvirt.

Without this patch:

* Alternate use of raw images with and without format is still insecure;
  Kevin's mitigation can't protect you then.

* If you somehow miss specifying a format, and probing detects raw, you
  get a warning on stderr.  If your guest writes something funny to
  sector 0, the write fails.

With this patch:

* If you somehow miss specifying a format, open fails.  This is what
  libvirt wants.

>   There certainly hasn't been enough discussion
> for this to get into 2.3.

Isn't that what were doing now?  :)



Re: [Qemu-devel] [PATCH 0/9] add virtio-gpu with 2d support

2015-03-23 Thread Christopher Covington
Hi Gerd,

This looks great, thanks for your work on this. Unfortunately, I'm seeing the
segfault below.

Alex, do you have any thoughts on how well this might be able to replace the
Goldfish Framebuffer on the Android Emulator?

On 03/13/2015 05:47 AM, Gerd Hoffmann wrote:
>   Hi,
> 
> Next round of virtio-gpu patches.  Patches 1-8 are meant to be merged,
> patch 9 is a hack to simplify testing with libvirt and will not be
> merged.
> 
> Changes since the RfC submission earlier this month are a bunch of
> sanity checks being added (mostly pointed out by max) and the
> virtio-1.0 adaptions are squashed in now.
> 
> This series depends on virtio 1.0 patches still not merged.
> 
> This series is also available via git:
>   git://git.kraxel.org/qemu tags/virtio-gpu-2015-03-13
> 
> The virtio patches are here (mst's virtio-1.0 branch, rebased to master):
>   git://git.kraxel.org/qemu tags/virtio-mst-rebased-2015-03-13
> 
> Guest kernel driver is here:
>   git://git.kraxel.org/linux virtio-gpu
> 
> Usage:
>   qemu-system-x86_64 -vga virtio [ ... ]
>   qemu-system-x86_64 -device virtio-vga [ ... ]
>   qemu-system-ppc64 -M pseries -device virtio-gpu-pci [ ... ]
>   qemu-system-arm -M virt -device virtio-gpu-device [ ... ]

git remote add kraxel git://git.kraxel.org/qemu
git fetch kraxel
git checkout tags/virtio-gpu-2015-03-13
./configure --target-list=aarch64-softmmu
make -j 12
gdb --args aarch64-softmmu/qemu-system-aarch64 -M virt -cpu cortex-a57 \
  -device virtio-gpu-device
[...]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x77fb0ac0 (LWP 26568)]
gd_switch (dcl=0x563285d0, surface=0x0) at ui/gtk.c:565
565 trace_gd_switch(vc->label, surface_width(surface), surface_heigh
t(surface));
(gdb) bt
#0  gd_switch (dcl=0x563285d0, surface=0x0) at ui/gtk.c:565
#1  0x558ba706 in register_displaychangelistener (
dcl=0x563285d0) at ui/console.c:1353
#2  0x558dcd80 in gd_vc_gfx_init (view_menu=0x56bf4220,
group=, idx=, con=0x56333de0,
vc=0x56328598, s=0x56328470) at ui/gtk.c:1705
#3  gd_create_menu_view (s=0x56328470) at ui/gtk.c:1781
#4  gd_create_menus (s=0x56328470) at ui/gtk.c:1807
#5  gtk_display_init (ds=, full_screen=false,
grab_on_hover=false) at ui/gtk.c:1894
#6  0x55609247 in main (argc=,
argv=, envp=) at vl.c:4292

Thanks,
Chris

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [Qemu-devel] [PATCH for-2.3] powerpc: fix -machine usb=no for newworld and pseries machines

2015-03-23 Thread Paolo Bonzini


On 23/03/2015 19:21, Alexander Graf wrote:
> On 03/23/2015 07:20 PM, Marcel Apfelbaum wrote:
>> On 03/23/2015 07:05 PM, Paolo Bonzini wrote:
>>> Capture the explicit setting of "usb=no" into a separate bool, and
>>> use it to skip the update of machine->usb in the board init function.
>>>
>>> Signed-off-by: Paolo Bonzini 
>>> ---
>>>   hw/core/machine.c | 1 +
>>>   hw/ppc/mac_newworld.c | 2 +-
>>>   hw/ppc/spapr.c| 2 +-
>>>   include/hw/boards.h   | 1 +
>>>   4 files changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/core/machine.c b/hw/core/machine.c
>>> index cb1185a..25c45e6 100644
>>> --- a/hw/core/machine.c
>>> +++ b/hw/core/machine.c
>>> @@ -223,6 +223,7 @@ static void machine_set_usb(Object *obj, bool
>>> value, Error **errp)
>>>   MachineState *ms = MACHINE(obj);
>>>
>>>   ms->usb = value;
>>> +ms->usb_disabled = !value;
>> Maybe is too late now, but I really not like this pollution of
>> MachineState
>> with 'usb_disabled'. (Imagine we have this kind of fields for lots of
>> objects and lots
>> of corner cases...)
>> I know it comes to solve a bug, but we talked about it in another mail
>> thread and
>> this change in semantics was approved.
>>
>> Let me explain *why* I don't like it.
>> 1. We add an "usb_disabled" field to a base class (actually object)
>>of all the machines and the only place it is interesting is
>>for 2 machines on ppc.

So we do for kernel_irqchip_requested/allowed.  Both approaches could be
replaced by a tri-state on/off/auto.

>> 2. Even for these 2 machines, the scenario of defaults=on and usb=off
>>is not practical.

Why?  For example you could add a virtio-input device instead of a USB
keyboard and mouse.

> I'm personally fine either way, but I assumed that Paolo had a good
> reason for writing the patch?

Actually, two.

One good reason is that no matter how you look at it, it's at least
surprising and at worst a bug that "-machine usb=no" includes a default
USB controller.

The second good reason is that it's a guest ABI change for the versioned
pSeries machines, and as such it breaks migration.

Paolo



Re: [Qemu-devel] [PATCH v4 4/5] Qemu-Xen-vTPM: Qemu vTPM xenstubdoms backen.

2015-03-23 Thread Stefan Berger

On 03/23/2015 08:44 AM, Xu, Quan wrote:



-Original Message-
From: Stefan Berger [mailto:stef...@linux.vnet.ibm.com]
Sent: Thursday, March 19, 2015 3:17 AM
To: Xu, Quan; stefano.stabell...@eu.citrix.com; qemu-devel@nongnu.org;
arm...@redhat.com; lcapitul...@redhat.com; aligu...@amazon.com;
pbonz...@redhat.com; ebl...@redhat.com; kra...@redhat.com;
meyer...@redhat.com; m...@tls.msk.ru; s...@weilnetz.de; wei.l...@citrix.com
Cc: xen-de...@lists.xen.org
Subject: Re: [PATCH v4 4/5] Qemu-Xen-vTPM: Qemu vTPM xenstubdoms
backen.

On 03/10/2015 08:14 AM, Quan Xu wrote:

This Patch provides the glue for the TPM_TIS(Qemu frontend) to Xen
stubdom vTPM domain that provides the actual TPM functionality. It
sends data and TPM commends with xen_vtpm_frontend. It is similar as
another two vTPM backens:
*vTPM passthrough backen Since QEMU 1.5.
*vTPM libtpms-based backen.

Some details:
This part of the patch provides support for the spawning of a thread
that will interact with stubdom vTPM domain by the xen_vtpm_frontend.
It expects a signal from the frontend to wake and pick up the TPM
command that is supposed to be processed and delivers the response
packet using a callback function provided by the frontend.

The backend connects itself to the frontend by filling out an
interface structure with pointers to the function implementing support
for various operations.

(QEMU) vTPM XenStubdoms backen is initialized by Qemu command line

options,

"-tpmdev xenstubdoms,id=xenvtpm0 -device

tpm-tis,tpmdev=xenvtpm0"

--Changes in v3:
-Call vtpm_send() and vtpm_recv() directly

--Changes in v4:
-Fix the comment style

Signed-off-by: Quan Xu 
---
   hw/tpm/Makefile.objs |   2 +-
   hw/tpm/tpm_xenstubdoms.c | 247

+++

   2 files changed, 248 insertions(+), 1 deletion(-)
   create mode 100644 hw/tpm/tpm_xenstubdoms.c

diff --git a/hw/tpm/Makefile.objs b/hw/tpm/Makefile.objs index
57919fa..190e776 100644
--- a/hw/tpm/Makefile.objs
+++ b/hw/tpm/Makefile.objs
@@ -1,3 +1,3 @@
   common-obj-$(CONFIG_TPM_TIS) += tpm_tis.o
   common-obj-$(CONFIG_TPM_PASSTHROUGH) += tpm_passthrough.o
-common-obj-$(CONFIG_TPM_XENSTUBDOMS) += xen_vtpm_frontend.o
+common-obj-$(CONFIG_TPM_XENSTUBDOMS) += tpm_xenstubdoms.o
+xen_vtpm_frontend.o
diff --git a/hw/tpm/tpm_xenstubdoms.c b/hw/tpm/tpm_xenstubdoms.c new
file mode 100644 index 000..6d0dc32
--- /dev/null
+++ b/hw/tpm/tpm_xenstubdoms.c
@@ -0,0 +1,247 @@
+/*
+ * Xen Stubdom vTPM driver
+ *
+ *  Copyright (c) 2015 Intel Corporation
+ *  Authors:
+ *Quan Xu 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see
+  */
+
+#include 
+#include "qemu-common.h"
+#include "qapi/error.h"
+#include "qemu/sockets.h"
+#include "qemu/log.h"
+#include "sysemu/tpm_backend.h"
+#include "tpm_int.h"
+#include "hw/hw.h"
+#include "hw/i386/pc.h"
+#include "hw/xen/xen_backend.h"
+#include "sysemu/tpm_backend_int.h"
+#include "tpm_tis.h"
+
+#ifdef DEBUG_TPM
+#define DPRINTF(fmt, ...) \
+do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0) #else
+#define DPRINTF(fmt, ...) \
+do { } while (0)
+#endif
+
+#define TYPE_TPM_XENSTUBDOMS "tpm-xenstubdoms"
+#define TPM_XENSTUBDOMS(obj) \
+OBJECT_CHECK(TPMXenstubdomsState, (obj),

TYPE_TPM_XENSTUBDOMS)

+
+static const TPMDriverOps tpm_xenstubdoms_driver;
+
+/* Data structures */
+typedef struct TPMXenstubdomsThreadParams {
+TPMState *tpm_state;
+TPMRecvDataCB *recv_data_callback;
+TPMBackend *tb;
+} TPMXenstubdomsThreadParams;
+
+struct TPMXenstubdomsState {
+TPMBackend parent;
+TPMBackendThread tbt;
+TPMXenstubdomsThreadParams tpm_thread_params;
+bool had_startup_error;
+};
+
+typedef struct TPMXenstubdomsState TPMXenstubdomsState;
+
+/* Functions */
+static void tpm_xenstubdoms_cancel_cmd(TPMBackend *tb);
+
+static int tpm_xenstubdoms_unix_transfer(const TPMLocality
+*locty_data) {
+size_t rlen;
+struct XenDevice *xendev;
+
+xendev = xen_find_xendev("vtpm", xen_domid, xenstore_dev);
+if (xendev == NULL) {
+xen_be_printf(xendev, 0, "Con not find vtpm device\n");

Con not -> Cannot

+return -1;
+}
+vtpm_send(xendev, locty_data->w_buffer.buffer,

locty_data->w_offset);

+vtpm_recv(xendev, locty_data->r_buffer.buffer, &rlen);
+return 0;
+}
+
+static void tpm_xenstubdoms_worker_thread(gpointer data,
+   

Re: [Qemu-devel] [Xen-devel] [PATCH] SeaBios/vTPM: Enable Xen stubdom vTPM for HVM virtual machine

2015-03-23 Thread Stefan Berger

On 03/23/2015 08:03 AM, Xu, Quan wrote:



-Original Message-
From: Stefan Berger [mailto:stef...@linux.vnet.ibm.com]
Sent: Monday, March 23, 2015 6:57 PM
To: Xu, Quan; Ian Campbell
Cc: ke...@koconnor.net; xen-de...@lists.xen.org; qemu-devel@nongnu.org;
stefano.stabell...@eu.citrix.com
Subject: Re: [Xen-devel] [PATCH] SeaBios/vTPM: Enable Xen stubdom vTPM for
HVM virtual machine

On 03/22/2015 09:47 PM, Xu, Quan wrote:

-Original Message-
From: Stefan Berger [mailto:stef...@linux.vnet.ibm.com]
Sent: Friday, March 20, 2015 7:44 PM
To: Ian Campbell; Xu, Quan
Cc: ke...@koconnor.net; xen-de...@lists.xen.org;
qemu-devel@nongnu.org; stefano.stabell...@eu.citrix.com
Subject: Re: [Xen-devel] [PATCH] SeaBios/vTPM: Enable Xen stubdom
vTPM for HVM virtual machine

On 03/19/2015 08:56 AM, Ian Campbell wrote:

On Tue, 2015-03-10 at 08:16 -0400, Quan Xu wrote:

@@ -151,6 +152,8 @@ device_hardware_setup(void)
esp_scsi_setup();
megasas_setup();
pvscsi_setup();
+if (runningOnXen())
+vtpm4hvm_setup();

Is there anything which is actually Xen specific about the driver in
tpm.[ch]? Would it be better to just probe for it, perhaps gates by
a Kconfig option which enables TPM support.

I also think the probing should be done. That code can also be
recycled from what I posted earlier. It's gated by a Kconfig option, so it 
doesn't

fill up the 128k ROM.

   Stefan


Agree, I will do it ASAP.

I reposted v9 of my series of patches. I will probably post v10 today.
Please try that one then since these patches should cover Xen, QEMU (using a
driver that only I can test at the moment), and to some extent bare metal
system.


 Stefan


Great!  Could you also archive v10 to your github?
then I can also test it and go through these source code.


I put it here now:

https://github.com/stefanberger/seabios-tpm

Stefan




Re: [Qemu-devel] Revised patch for QEMU Guest Agent compilaiton

2015-03-23 Thread Stefan Weil

Am 23.03.2015 um 20:29 schrieb Joseph Hindin:

Hi

   About a week ago I've submitted the revised patch for QEMU GA 
Windows cross-compilation. Since then I've not got any responses. 
Would you mind to look into the proposed patch?


Regards,

   Joseph Hindin

http://lists.nongnu.org/archive/html/qemu-devel/2015-03/msg03155.html


Maybe you would have got a response if you had sent that patch to Paolo 
or me, too.
Your patch has formal issues: it should have a short subject (< 80 
characters typically)
and some description. Your patch puts all text into the subject, so this 
becomes
rather long. Please send an updated patch with "git send-email" and use 
scripts/checkpatch.pl

before you send it.

Is there any reason why you filter the -fstack-protector% the way you 
did in your patch

(and not as suggested by Paolo)?

Technically I think your patch fixes the problem.

Thanks,
Stefan





[Qemu-devel] [PATCH for-2.4 v5 7/7] tests: test-x86-cpu: Ensure that -cpu override KVM-specific defaults

2015-03-23 Thread Eduardo Habkost
Add a test case to ensure reordering the feature handling code won't
make accelerator-specific defaults override explicit command-line
options. Explicit command-line options should always override the CPU
model defaults.

Signed-off-by: Eduardo Habkost 
---
 tests/test-x86-cpu.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/tests/test-x86-cpu.c b/tests/test-x86-cpu.c
index b1b04ea..618624b 100644
--- a/tests/test-x86-cpu.c
+++ b/tests/test-x86-cpu.c
@@ -160,6 +160,36 @@ static void test_host_cpu_features_kvm(void)
 all_unsupported = false;
 }
 
+/* Ensure that explicit CPU options always override KVM-specific defaults */
+static void test_kvm_override(void)
+{
+int i;
+kvm_allowed = true;
+
+/* Even KVM defaults change on cpu.c, this test relies on x2apic being
+ * always enabled, and monitor being always disabled by KVM:
+ */
+kvm_default_features[FEAT_1_ECX] |= CPUID_EXT_X2APIC;
+kvm_default_unset_features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
+
+for (i = 0; i < ARRAY_SIZE(builtin_x86_defs); ++i) {
+ObjectClass *oc;
+X86CPU *cpu;
+X86CPUDefinition *def = &builtin_x86_defs[i];
+char features[] = "-x2apic,+monitor";
+
+oc = x86_cpu_class_by_name(def->name);
+cpu = X86_CPU(object_new(object_class_get_name(oc)));
+x86_cpu_parse_featurestr(CPU(cpu), features, &error_abort);
+
+g_assert_cmpint(cpu->env.features[FEAT_1_ECX] & CPUID_EXT_MONITOR, ==,
+CPUID_EXT_MONITOR);
+g_assert_cmpint(cpu->env.features[FEAT_1_ECX] & CPUID_EXT_X2APIC, ==,
+0);
+object_unref(OBJECT(cpu));
+}
+}
+
 int main(int argc, char *argv[])
 {
 module_call_init(MODULE_INIT_QOM);
@@ -170,6 +200,7 @@ int main(int argc, char *argv[])
 g_test_add_func("/cpu/x86/features/tcg", test_cpu_features_tcg);
 g_test_add_func("/cpu/x86/features/kvm", test_cpu_features_kvm);
 g_test_add_func("/cpu/x86/features/host", test_host_cpu_features_kvm);
+g_test_add_func("/cpu/x86/features/kvm-override", test_kvm_override);
 
 g_test_run();
 
-- 
2.1.0




[Qemu-devel] [PATCH for-2.4 v5 6/7] tests: test-x86-cpu: Add KVM feature bit initialization test

2015-03-23 Thread Eduardo Habkost
Signed-off-by: Eduardo Habkost 
---
 tests/test-x86-cpu.c | 84 
 1 file changed, 84 insertions(+)

diff --git a/tests/test-x86-cpu.c b/tests/test-x86-cpu.c
index a0923e7..b1b04ea 100644
--- a/tests/test-x86-cpu.c
+++ b/tests/test-x86-cpu.c
@@ -78,6 +78,88 @@ static void test_cpu_features_tcg(void)
 }
 }
 
+static void test_cpu_features_kvm(void)
+{
+int i;
+kvm_allowed = true;
+
+for (i = 0; i < ARRAY_SIZE(builtin_x86_defs); ++i) {
+FeatureWord w;
+ObjectClass *oc;
+X86CPU *cpu;
+X86CPUDefinition *def = &builtin_x86_defs[i];
+char features[] = "";
+
+oc = x86_cpu_class_by_name(def->name);
+cpu = X86_CPU(object_new(object_class_get_name(oc)));
+x86_cpu_parse_featurestr(CPU(cpu), features, &error_abort);
+
+for (w = 0; w < FEATURE_WORDS; w++) {
+uint32_t expected = (def->features[w] | default_features_all[w] |
+ kvm_default_features[w]) &
+~kvm_default_unset_features[w];
+uint32_t actual = cpu->env.features[w];
+g_assert_cmpint(actual, ==, expected);
+}
+object_unref(OBJECT(cpu));
+}
+}
+
+static void test_host_cpu_features_kvm(void)
+{
+FeatureWord w;
+ObjectClass *oc;
+X86CPU *cpu;
+char features1[] = "migratable=off";
+char features2[] = "migratable=off";
+char features3[] = "migratable=off";
+
+kvm_allowed = true;
+oc = x86_cpu_class_by_name("host");
+
+/* fake_cpuid() test mode: */
+use_fake_cpuid = true;
+cpu = X86_CPU(object_new(object_class_get_name(oc)));
+x86_cpu_parse_featurestr(CPU(cpu), features1, &error_abort);
+
+for (w = 0; w < FEATURE_WORDS; w++) {
+FeatureWordInfo *wi = &feature_word_info[w];
+uint32_t expected = fake_cpuid(wi->cpuid_eax, wi->cpuid_ecx,
+   wi->cpuid_reg);
+uint32_t actual = cpu->env.features[w];
+g_assert_cmpint(cpu->filtered_features[w], ==, 0);
+g_assert_cmpint(actual, ==, expected);
+}
+object_unref(OBJECT(cpu));
+
+/* all-supported test mode: */
+use_fake_cpuid = false;
+cpu = X86_CPU(object_new(object_class_get_name(oc)));
+x86_cpu_parse_featurestr(CPU(cpu), features2, &error_abort);
+
+for (w = 0; w < FEATURE_WORDS; w++) {
+uint32_t expected = ~0;
+uint32_t actual = cpu->env.features[w];
+g_assert_cmpint(cpu->filtered_features[w], ==, 0);
+g_assert_cmpint(actual, ==, expected);
+}
+object_unref(OBJECT(cpu));
+
+/* all-unsupported test mode: */
+all_unsupported = true;
+cpu = X86_CPU(object_new(object_class_get_name(oc)));
+x86_cpu_parse_featurestr(CPU(cpu), features3, &error_abort);
+
+for (w = 0; w < FEATURE_WORDS; w++) {
+uint32_t expected = 0;
+uint32_t actual = cpu->env.features[w];
+g_assert_cmpint(cpu->filtered_features[w], ==, 0);
+g_assert_cmpint(actual, ==, expected);
+}
+object_unref(OBJECT(cpu));
+all_unsupported = false;
+}
+
 int main(int argc, char *argv[])
 {
 module_call_init(MODULE_INIT_QOM);
@@ -86,6 +168,8 @@ int main(int argc, char *argv[])
 
 g_test_add_func("/cpu/x86/creation", test_cpu_creation);
 g_test_add_func("/cpu/x86/features/tcg", test_cpu_features_tcg);
+g_test_add_func("/cpu/x86/features/kvm", test_cpu_features_kvm);
+g_test_add_func("/cpu/x86/features/host", test_host_cpu_features_kvm);
 
 g_test_run();
 
-- 
2.1.0




[Qemu-devel] [PATCH for-2.4 v5 3/7] tests: Add unit test for X86CPU code

2015-03-23 Thread Eduardo Habkost
The unit test includes target-i386/cpu.c instead of simply linking
against cpu.o because the test code will use static variables/functions
from cpu.c.

Reasoning for each object file included in the test binary:
 * qom/cpu.o - for TYPE_CPU. Dependencies:
   * qom/qom-qobject.o
 * qom/qdev.o - for TYPE_DEVICE. Dependencies:
   * qom/container.o
   * migration/vmstate.o. Dependencies:
 * migration/qemu-file.o
 * qjson.o
   * hw/core/hotplug.o
   * hw/core/irq.o
   * hw/core/fw-path-provider.o
   * hw/core/qdev-properties.o
 * qom/object.o - for TYPE_OBJECT
 * x86_64-softmmu/target-i386/machine.o - for vmstate_x86_cpu
 * qemu-log.o - for the logging API, used by target-i386/cpu.c
 * libqemuutil.a - for QAPI visitors, error API, and other symbols
 * libqemustub.a - existing stubs, including: savevm, monitor symbols

The remaining symbols used by target-i386/cpu.c were added as stubs to
either tests/vl-stub.c and tests/x86-stub.c.

Note: I couldn't add dependencies that ensure the target-specific object
files are compiled on demand when building the test binary, but "make
check" already requires "make" to be run first because of the qtest test
cases, so I assume this is OK.

Signed-off-by: Eduardo Habkost 
---
Changes v4 -> v5:
* Rewrote kvm_arch_get_supported_cpuid() to simplify test code

Changes v3 -> v4:
* Keep target_words_bigendian() prototype inside x86-stub.c

Changes v1 -> v2:
* Don't include cpus.o on test binary, making lots of stubs now
  unnecessary
---
 tests/.gitignore |   1 +
 tests/Makefile   |  16 -
 tests/test-x86-cpu.c |  68 +
 tests/vl-stub.c  |  15 +
 tests/x86-stub.c | 165 +++
 5 files changed, 264 insertions(+), 1 deletion(-)
 create mode 100644 tests/test-x86-cpu.c
 create mode 100644 tests/vl-stub.c
 create mode 100644 tests/x86-stub.c

diff --git a/tests/.gitignore b/tests/.gitignore
index 0dcb618..24b5eb4 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -38,5 +38,6 @@ test-vmstate
 test-write-threshold
 test-x86-cpuid
 test-xbzrle
+test-x86-cpu
 *-test
 qapi-schema/*.test.*
diff --git a/tests/Makefile b/tests/Makefile
index 1a78b86..e2b1fcd 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -72,6 +72,7 @@ check-unit-y += tests/test-qemu-opts$(EXESUF)
 gcov-files-test-qemu-opts-y = qom/test-qemu-opts.c
 check-unit-y += tests/test-write-threshold$(EXESUF)
 gcov-files-test-write-threshold-y = block/write-threshold.c
+check-unit-x86_64-softmmu-y += tests/test-x86-cpu$(EXESUF)
 
 check-block-$(CONFIG_POSIX) += tests/qemu-iotests-quick.sh
 
@@ -235,7 +236,8 @@ test-obj-y = tests/check-qint.o tests/check-qstring.o 
tests/check-qdict.o \
tests/test-opts-visitor.o tests/test-qmp-event.o \
tests/rcutorture.o tests/test-rcu-list.o
 
-test-obj-x86_64-softmmu-y = tests/test-x86-cpuid.o
+test-obj-x86_64-softmmu-y = tests/test-x86-cpuid.o \
+   tests/test-x86-cpu.o tests/x86-stub.o
 
 test-qapi-obj-y = tests/test-qapi-visit.o tests/test-qapi-types.o \
  tests/test-qapi-event.o
@@ -371,6 +373,18 @@ tests/vhost-user-test$(EXESUF): tests/vhost-user-test.o 
qemu-char.o qemu-timer.o
 tests/qemu-iotests/socket_scm_helper$(EXESUF): 
tests/qemu-iotests/socket_scm_helper.o
 tests/test-qemu-opts$(EXESUF): tests/test-qemu-opts.o libqemuutil.a 
libqemustub.a
 tests/test-write-threshold$(EXESUF): tests/test-write-threshold.o 
$(block-obj-y) libqemuutil.a libqemustub.a
+tests/test-x86-cpu$(EXESUF): tests/test-x86-cpu.o \
+   x86_64-softmmu/target-i386/machine.o \
+   qom/cpu.o \
+   qom/object.o qom/qom-qobject.o qom/container.o \
+   hw/core/qdev.o hw/core/qdev-properties.o \
+   hw/core/hotplug.o hw/core/irq.o hw/core/fw-path-provider.o \
+   migration/vmstate.o migration/qemu-file.o \
+   qemu-log.o \
+   qjson.o \
+   libqemuutil.a \
+   libqemustub.a \
+   tests/vl-stub.o tests/x86-stub.o
 
 ifeq ($(CONFIG_POSIX),y)
 LIBS += -lutil
diff --git a/tests/test-x86-cpu.c b/tests/test-x86-cpu.c
new file mode 100644
index 000..1ce7e1d
--- /dev/null
+++ b/tests/test-x86-cpu.c
@@ -0,0 +1,68 @@
+#include 
+
+#include "cpu.c"
+
+
+/* stub to avoid requiring hw/intc/apic.o */
+void apic_poll_irq(DeviceState *dev)
+{
+}
+
+/* Special test modes for faking kvm_arch_get_supported_cpuid() results: */
+static bool use_fake_cpuid;  /* Enable fake_cpuid() usage */
+static bool all_unsupported; /* Return all features as unsupported */
+
+/* Fake get_supported_cpuid() which will just return as hash of
+ * function/index/reg. Use a (somewhat poor) hash function to do that.
+ */
+static uint32_t fake_cpuid(uint32_t function, uint32_t index, int reg)
+{
+uint64_t i = (uint64_t)function * 16ULL * 4ULL + (uint64_t)index * 4 + reg;
+return i * 2654435761ULL;
+}
+
+uint32_t kvm_arch_get_supported_cpuid(KVMState *env, uint32_t function,
+  uint32_t index, int reg)
+{
+if (all_unsupported) {
+r

[Qemu-devel] [PATCH for-2.4 v5 4/7] target-i386: Isolate enabled-by-default features to a separate array

2015-03-23 Thread Eduardo Habkost
This will make it easier to write unit tests for the feature
initialization logic.

Signed-off-by: Eduardo Habkost 
---
 target-i386/cpu.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index b2d1c95..8557a98 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -481,6 +481,11 @@ static uint32_t kvm_default_unset_features[FEATURE_WORDS] 
= {
 [FEAT_8000_0001_ECX] = CPUID_EXT3_SVM,
 };
 
+/* Features that are added by default to all CPU models in any accelerator: */
+FeatureWordArray default_features_all = {
+[FEAT_1_ECX] = CPUID_EXT_HYPERVISOR,
+};
+
 void x86_cpu_compat_kvm_no_autoenable(FeatureWord w, uint32_t features)
 {
 kvm_default_features[w] &= ~features;
@@ -2117,15 +2122,14 @@ static void x86_cpu_load_def(X86CPU *cpu, 
X86CPUDefinition *def, Error **errp)
 }
 
 /* Special cases not set in the X86CPUDefinition structs: */
-if (kvm_enabled()) {
-FeatureWord w;
-for (w = 0; w < FEATURE_WORDS; w++) {
+for (w = 0; w < FEATURE_WORDS; w++) {
+if (kvm_enabled()) {
 env->features[w] |= kvm_default_features[w];
 env->features[w] &= ~kvm_default_unset_features[w];
 }
+env->features[w] |= default_features_all[w];
 }
 
-env->features[FEAT_1_ECX] |= CPUID_EXT_HYPERVISOR;
 
 /* sysenter isn't supported in compatibility mode on AMD,
  * syscall isn't supported in compatibility mode on Intel.
-- 
2.1.0




[Qemu-devel] [PATCH for-2.4 v5 2/7] tests: Make test-x86-cpuid target-specific

2015-03-23 Thread Eduardo Habkost
Instead of using a test-specific hack to add -I$(SRC_PATH)/target-i386, add
test-x86-cpuid to $(test-obj-x86_64-softmmu-y).

Signed-off-by: Eduardo Habkost 
---
 tests/Makefile | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tests/Makefile b/tests/Makefile
index 33b74a2..1a78b86 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -45,7 +45,7 @@ check-unit-y += tests/test-thread-pool$(EXESUF)
 gcov-files-test-thread-pool-y = thread-pool.c
 gcov-files-test-hbitmap-y = util/hbitmap.c
 check-unit-y += tests/test-hbitmap$(EXESUF)
-check-unit-y += tests/test-x86-cpuid$(EXESUF)
+check-unit-x86_64-softmmu-y += tests/test-x86-cpuid$(EXESUF)
 # all code tested by test-x86-cpuid is inside topology.h
 gcov-files-test-x86-cpuid-y =
 ifeq ($(CONFIG_SOFTMMU),y)
@@ -235,6 +235,8 @@ test-obj-y = tests/check-qint.o tests/check-qstring.o 
tests/check-qdict.o \
tests/test-opts-visitor.o tests/test-qmp-event.o \
tests/rcutorture.o tests/test-rcu-list.o
 
+test-obj-x86_64-softmmu-y = tests/test-x86-cpuid.o
+
 test-qapi-obj-y = tests/test-qapi-visit.o tests/test-qapi-types.o \
  tests/test-qapi-event.o
 
-- 
2.1.0




[Qemu-devel] [PATCH for-2.4 v5 5/7] tests: test-x86-cpu: Add TCG feature bit initialization test

2015-03-23 Thread Eduardo Habkost
Signed-off-by: Eduardo Habkost 
---
 tests/test-x86-cpu.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/tests/test-x86-cpu.c b/tests/test-x86-cpu.c
index 1ce7e1d..a0923e7 100644
--- a/tests/test-x86-cpu.c
+++ b/tests/test-x86-cpu.c
@@ -54,6 +54,30 @@ static void test_cpu_creation(void)
 }
 }
 
+static void test_cpu_features_tcg(void)
+{
+int i;
+
+for (i = 0; i < ARRAY_SIZE(builtin_x86_defs); ++i) {
+FeatureWord w;
+ObjectClass *oc;
+X86CPU *cpu;
+X86CPUDefinition *def = &builtin_x86_defs[i];
+char features[] = "";
+
+oc = x86_cpu_class_by_name(def->name);
+cpu = X86_CPU(object_new(object_class_get_name(oc)));
+x86_cpu_parse_featurestr(CPU(cpu), features, &error_abort);
+
+for (w = 0; w < FEATURE_WORDS; w++) {
+uint32_t expected = def->features[w] | default_features_all[w];
+uint32_t actual = cpu->env.features[w];
+g_assert_cmpint(actual, ==, expected);
+}
+object_unref(OBJECT(cpu));
+}
+}
+
 int main(int argc, char *argv[])
 {
 module_call_init(MODULE_INIT_QOM);
@@ -61,6 +85,7 @@ int main(int argc, char *argv[])
 g_test_init(&argc, &argv, NULL);
 
 g_test_add_func("/cpu/x86/creation", test_cpu_creation);
+g_test_add_func("/cpu/x86/features/tcg", test_cpu_features_tcg);
 
 g_test_run();
 
-- 
2.1.0




[Qemu-devel] [PATCH for-2.4 v5 0/7] Target-specific unit test support, add unit tests for target-i386/cpu.c code

2015-03-23 Thread Eduardo Habkost
Changes v4 -> v5:
 * Rewrite fake kvm_arch_get_supported_cpuid() code
 * Add new test to ensure KVM defaults won't override explicit command-line
   options

Changes v3 -> v4:
 * Move target_words_bigendian() prototype back inside tests/x86-stub.c.

Changes v2 -> v3:
 * Extra KVM "host" CPU model test cases
 * Move target_words_bigendian() prototype to exec-all.h

Changes v1 -> v2:
 * Make dependency list of test binary much simpler, now that cpus.o
   was removed.

Eduardo Habkost (7):
  tests: Support target-specific unit tests
  tests: Make test-x86-cpuid target-specific
  tests: Add unit test for X86CPU code
  target-i386: Isolate enabled-by-default features to a separate array
  tests: test-x86-cpu: Add TCG feature bit initialization test
  tests: test-x86-cpu: Add KVM feature bit initialization test
  tests: test-x86-cpu: Ensure that -cpu override KVM-specific defaults

 target-i386/cpu.c|  12 ++-
 tests/.gitignore |   1 +
 tests/Makefile   |  49 +---
 tests/test-x86-cpu.c | 208 +++
 tests/vl-stub.c  |  15 
 tests/x86-stub.c | 165 
 6 files changed, 437 insertions(+), 13 deletions(-)
 create mode 100644 tests/test-x86-cpu.c
 create mode 100644 tests/vl-stub.c
 create mode 100644 tests/x86-stub.c

-- 
2.1.0




[Qemu-devel] [PATCH for-2.4 v5 1/7] tests: Support target-specific unit tests

2015-03-23 Thread Eduardo Habkost
To make unit tests that depend on target-specific files, use
check-unit--y and test-obj--y.

Note that the qtest test cases were per-*arch* (e.g. i386, mips, ppc),
not per-*target* (e.g. i386-softmmu, x86_64-linux-user), because they
implicitly apply only to the -softmmu targets. Target-specific unit
tests, on the other hand, may apply to any target (e.g. they may test
*-softmmu and/or *-user code). To clarify this, $(TARGETS) was renamed
to $(QTEST_ARCHES).

Signed-off-by: Eduardo Habkost 
---
 tests/Makefile | 31 +++
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/tests/Makefile b/tests/Makefile
index 55aa745..33b74a2 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -374,12 +374,27 @@ ifeq ($(CONFIG_POSIX),y)
 LIBS += -lutil
 endif
 
+
+SOFTMMU_TARGETS=$(filter %-softmmu,$(TARGET_DIRS))
+SOFTMMU_ARCHES=$(patsubst %-softmmu,%, $(SOFTMMU_TARGETS))
+
+# unit test rules:
+
+# target-specific tests/objs:
+
+test-obj-y += $(foreach TARGET,$(TARGET_DIRS), $(test-obj-$(TARGET)-y))
+check-unit-y += $(foreach TARGET,$(TARGET_DIRS), $(check-unit-$(TARGET)-y))
+
+$(foreach TARGET,$(TARGET_DIRS),$(eval include $(TARGET)/config-target.mak) \
+$(eval $(test-obj-$(TARGET)-y): QEMU_CFLAGS += 
-I$(TARGET) -I$(SRC_PATH)/target-$(TARGET_BASE_ARCH) -DNEED_CPU_H))
+
+$(test-obj-y): QEMU_INCLUDES += -Itests
+
 # QTest rules
 
-TARGETS=$(patsubst %-softmmu,%, $(filter %-softmmu,$(TARGET_DIRS)))
 ifeq ($(CONFIG_POSIX),y)
-QTEST_TARGETS=$(foreach TARGET,$(TARGETS), $(if $(check-qtest-$(TARGET)-y), 
$(TARGET),))
-check-qtest-y=$(foreach TARGET,$(TARGETS), $(check-qtest-$(TARGET)-y))
+QTEST_ARCHES=$(foreach ARCH,$(SOFTMMU_ARCHES), $(if $(check-qtest-$(ARCH)-y), 
$(ARCH),))
+check-qtest-y=$(foreach ARCH,$(QTEST_ARCHES), $(check-qtest-$(ARCH)-y))
 endif
 
 qtest-obj-y = tests/libqtest.o libqemuutil.a libqemustub.a
@@ -411,8 +426,8 @@ GCOV_OPTIONS = -n $(if $(V),-f,)
 
 # gtester tests, possibly with verbose output
 
-.PHONY: $(patsubst %, check-qtest-%, $(QTEST_TARGETS))
-$(patsubst %, check-qtest-%, $(QTEST_TARGETS)): check-qtest-%: $(check-qtest-y)
+.PHONY: $(patsubst %, check-qtest-%, $(QTEST_ARCHES))
+$(patsubst %, check-qtest-%, $(QTEST_ARCHES)): check-qtest-%: $(check-qtest-y)
$(if $(CONFIG_GCOV),@rm -f *.gcda */*.gcda */*/*.gcda */*/*/*.gcda,)
$(call quiet-command,QTEST_QEMU_BINARY=$*-softmmu/qemu-system-$* \
MALLOC_PERTURB_=$${MALLOC_PERTURB_:-$$((RANDOM % 255 + 1))} \
@@ -435,7 +450,7 @@ $(patsubst %, check-%, $(check-unit-y)): check-%: %
 
 # gtester tests with XML output
 
-$(patsubst %, check-report-qtest-%.xml, $(QTEST_TARGETS)): 
check-report-qtest-%.xml: $(check-qtest-y)
+$(patsubst %, check-report-qtest-%.xml, $(QTEST_ARCHES)): 
check-report-qtest-%.xml: $(check-qtest-y)
$(call quiet-command,QTEST_QEMU_BINARY=$*-softmmu/qemu-system-$* \
  gtester -q $(GTESTER_OPTIONS) -o $@ -m=$(SPEED) 
$(check-qtest-$*-y),"GTESTER $@")
 
@@ -444,7 +459,7 @@ check-report-unit.xml: $(check-unit-y)
 
 # Reports and overall runs
 
-check-report.xml: $(patsubst %,check-report-qtest-%.xml, $(QTEST_TARGETS)) 
check-report-unit.xml
+check-report.xml: $(patsubst %,check-report-qtest-%.xml, $(QTEST_ARCHES)) 
check-report-unit.xml
$(call quiet-command,$(SRC_PATH)/scripts/gtester-cat $^ > $@, "  GEN
$@")
 
 check-report.html: check-report.xml
@@ -478,7 +493,7 @@ $(patsubst %, check-%, $(check-qapi-schema-y)): 
check-%.json: $(SRC_PATH)/%.json
 
 .PHONY: check-qapi-schema check-qtest check-unit check check-clean
 check-qapi-schema: $(patsubst %,check-%, $(check-qapi-schema-y))
-check-qtest: $(patsubst %,check-qtest-%, $(QTEST_TARGETS))
+check-qtest: $(patsubst %,check-qtest-%, $(QTEST_ARCHES))
 check-unit: $(patsubst %,check-%, $(check-unit-y))
 check-block: $(patsubst %,check-%, $(check-block-y))
 check: check-qapi-schema check-unit check-qtest
-- 
2.1.0




Re: [Qemu-devel] RFC: 'alternate' qapi top-level expression [was: [PATCH v4 12/19] qapi: Add some type check tests]

2015-03-23 Thread Markus Armbruster
Cc'ing Kevin as fair punishment for is previous work on QAPI code
generation in general, and union types in particular.

Eric Blake  writes:

> [revisiting this series, finally!]
>
> On 09/25/2014 10:19 AM, Markus Armbruster wrote:
>
>>> Return of an anon union isn't used yet, but _might_ make sense (as the
>>> only feasible way of changing existing commands that return an array or
>>> primitive extensible to instead return a dict) - 
>> 
>> Good point.
>> 
>>>  except that back-compat
>>> demands that we can't return a dict in place of a primitive unless the
>>> arguments of the command are also enhanced (that is, older callers are
>>> not expecting a dict, so we can't return a dict unless the caller
>>> witnesses they are new enough by explicitly asking for a dict return).
>> 
>> I think we can keep things simple for now and reject anonymous unions.
>> We can always relax the check when we run into a use.
>
> In trying to code up what it would take to easily reject anonymous
> unions from a return type, I'm realizing that it would be smarter to
> quit mixing anonymous unions in with other unions.
>
> Refresher course: on the wire, both a regular union:
>
> QAPI:
>  { 'type': 'Type1', 'data': { 'value': 'int' } }
>  { 'union': 'Foo', 'data': { 'a': 'Type1', 'b': 'Type2' } }
>  { 'command': 'bar', 'data': 'Foo' }
> Wire:
>  { "execute": "bar", "arguments": { "type": "a",
>"data": { "value": 1 } } }
>
> and a flat union:
>
> QAPI:
>  { 'type': 'Type1', 'data': { 'value': 'int' } }
>  { 'enum': 'Enum', 'data': [ 'a', 'b' ] }
>  { 'type': 'Base', { 'switch': 'Enum' } }
>  { 'union': 'Foo', 'base': 'Base', 'discriminator': 'switch',
>'data': { 'a': 'Type1', 'b': 'Type2' } }
>  { 'command': 'bar', 'data': 'Foo' }
> Wire:
>  { "execute": "bar", "arguments": { "switch": "a",
>"value": 1 } }
>
> happen to guarantee a top-level dictionary (plain unions always have a
> two-element dictionary, flat unions are required to have a base type
> which is itself a dictionary).

Yes.

> But an anonymous union is explicitly
> allowing a multi-type variable, where the determination of which branch
> of the union is made by the type of the variable.  Furthermore, we do
> not allow two branches to have the same type,

Even more severe: the JSON types have to be different!  Thus, at most
one can be a complex or union type, and at most one can be a string or
enum type.

>   so at least one branch
> must be a non-dictionary;

Correct.

>   but as _all_ QMP commands currently take a
> dictionary for the "arguments" key, we do not want to allow:
>
> QAPI:
>  { 'type': 'Type1', 'data': { 'value': 'int' } }
>  { 'union': 'Foo', 'discriminator': {},
>'data': { 'a': 'Type1', 'b': 'int' } }
>  { 'command': 'bar', 'data': 'Foo' }
> Wire:
>  { "execute": "bar", "arguments": 1 }

Yes, let's stay out of the generalization tar pits and stick to named
parameters, i.e. JSON object arguments.

> Tracking all three qapi expressions as union types is making the
> generator code a bit verbose, especially since the code generation for
> all three is so distinct.
>
>
> Proposal: I am proposing that we convert to an alternate syntax for what
> we now term as anonymous unions.  It will not have any impact to the
> wire representation of QMP, only to the qapi code generators.  The
> proposal is simple: instead of using "'union':'Name',
> 'discriminator':{}", we instead use "'alternate': 'Foo'" when declaring
> a type as an anonymous union (which, for obvious reasons, I would then
> update the documentation to call an "alternate" instead of an "anonymous
> union").

This separates tagged and untagged unions more clearly: reserve 'union'
for the two kinds of tagged unions, switch the untagged union to
'alternate'.

I don't mind.  Kevin, any objections?

> I'll go ahead and propose the patches (I've already done the bulk of the
> conversion work, to prove that not many files were affected).

I'll gladly review.



Re: [Qemu-devel] [PATCH 2/3] ppc64-softmmu: Remove unsupported FDC from config

2015-03-23 Thread Maciej W. Rozycki
On Wed, 11 Mar 2015, Alexander Graf wrote:

> > So if you know how to get working floppy disk with qemu-system-ppc64,
> > that would help me a lot in rejecting requests from libvirt folks :)
> > Thanks :)
> 
> I don't think you want floppy disk emulation on -M pseries at all. In
> fact, you only ever want floppy disk emulation on x86. So I'd recommend
> to change the logic in libvirt accordingly and just hard code floppy
> emulation to x86 (and if alpha or some other weird architecture needs it
> later as well, have them extend the list).

 For the record: the MIPS Malta board that we support also has direct PC 
FDD support.  The real board has a SMSC FDC37M817 super I/O controller 
chip wired to Intel 82371EB (PIIX4E) south bridge's LPC interface and a 
suitable IDC header for a FDD data cable soldered onto the PCB as well.  
So it's not only x86 that can make use of floppy disk emulation.  Note 
that there's no real ISA (no slots) on the Malta board, only the LPC and 
PIIX4E on-chip stuff.

  Maciej



Re: [Qemu-devel] [PATCH for-2.3 2/3] sdhci: Make device "sdhci-pci" unavailable with -device

2015-03-23 Thread Peter Maydell
On 23 March 2015 at 19:09, Markus Armbruster  wrote:
> Due to misuse of drive_get_next(), -device sdhci can connect to a
> -drive if=sd,...  If you can't see what's wrong here, check out the
> FIXME in sdhci_initfn() and the commit that added it.
>
> We can't fix this in time for the release, but since the device is new
> in 2.3, we can disable it before this mistake becomes ABI: set
> cannot_instantiate_with_device_add_yet.
>
> Signed-off-by: Markus Armbruster 
> ---
>  hw/sd/sdhci.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
> index f056c52..ab13505 100644
> --- a/hw/sd/sdhci.c
> +++ b/hw/sd/sdhci.c
> @@ -1254,6 +1254,8 @@ static void sdhci_pci_class_init(ObjectClass *klass, 
> void *data)
>  set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
>  dc->vmsd = &sdhci_vmstate;
>  dc->props = sdhci_properties;
> +/* Reason: realize() method uses drive_get_next() */
> +dc->cannot_instantiate_with_device_add_yet = true;
>  }

This is basically saying "this device won't work in this
release", right?

-- PMM



[Qemu-devel] [PATCH for-2.3 2/3] sdhci: Make device "sdhci-pci" unavailable with -device

2015-03-23 Thread Markus Armbruster
Due to misuse of drive_get_next(), -device sdhci can connect to a
-drive if=sd,...  If you can't see what's wrong here, check out the
FIXME in sdhci_initfn() and the commit that added it.

We can't fix this in time for the release, but since the device is new
in 2.3, we can disable it before this mistake becomes ABI: set
cannot_instantiate_with_device_add_yet.

Signed-off-by: Markus Armbruster 
---
 hw/sd/sdhci.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index f056c52..ab13505 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -1254,6 +1254,8 @@ static void sdhci_pci_class_init(ObjectClass *klass, void 
*data)
 set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
 dc->vmsd = &sdhci_vmstate;
 dc->props = sdhci_properties;
+/* Reason: realize() method uses drive_get_next() */
+dc->cannot_instantiate_with_device_add_yet = true;
 }
 
 static const TypeInfo sdhci_pci_info = {
-- 
1.9.3




[Qemu-devel] [PATCH for-2.3 3/3] sysbus: Contain drive_get_next() misuse

2015-03-23 Thread Markus Armbruster
A number of sysbus devices abuse drive_get_next(): sl-nand,
milkymist-memcard, pl181, generic-sdhci.  If you can't see what's
wrong with their use of drive_get_next(), check out the FIXMEs in
their init() methods and the commit that added them.

We're in luck, though: only machines ppce500 and pseries-* support
-device with sysbus devices, and none of the devices above is
available with these machines.

Set cannot_instantiate_with_device_add_yet to preserve our luck.

Signed-off-by: Markus Armbruster 
---
 hw/arm/spitz.c| 2 ++
 hw/sd/milkymist-memcard.c | 2 ++
 hw/sd/pl181.c | 2 ++
 hw/sd/sdhci.c | 3 +++
 4 files changed, 9 insertions(+)

diff --git a/hw/arm/spitz.c b/hw/arm/spitz.c
index da02932..5bf032a 100644
--- a/hw/arm/spitz.c
+++ b/hw/arm/spitz.c
@@ -1036,6 +1036,8 @@ static void sl_nand_class_init(ObjectClass *klass, void 
*data)
 k->init = sl_nand_init;
 dc->vmsd = &vmstate_sl_nand_info;
 dc->props = sl_nand_properties;
+/* Reason: init() method uses drive_get() */
+dc->cannot_instantiate_with_device_add_yet = true;
 }
 
 static const TypeInfo sl_nand_info = {
diff --git a/hw/sd/milkymist-memcard.c b/hw/sd/milkymist-memcard.c
index 0cc53d9..2209ef1 100644
--- a/hw/sd/milkymist-memcard.c
+++ b/hw/sd/milkymist-memcard.c
@@ -297,6 +297,8 @@ static void milkymist_memcard_class_init(ObjectClass 
*klass, void *data)
 k->init = milkymist_memcard_init;
 dc->reset = milkymist_memcard_reset;
 dc->vmsd = &vmstate_milkymist_memcard;
+/* Reason: init() method uses drive_get_next() */
+dc->cannot_instantiate_with_device_add_yet = true;
 }
 
 static const TypeInfo milkymist_memcard_info = {
diff --git a/hw/sd/pl181.c b/hw/sd/pl181.c
index bf37da6..11fcd47 100644
--- a/hw/sd/pl181.c
+++ b/hw/sd/pl181.c
@@ -508,6 +508,8 @@ static void pl181_class_init(ObjectClass *klass, void *data)
 sdc->init = pl181_init;
 k->vmsd = &vmstate_pl181;
 k->reset = pl181_reset;
+/* Reason: init() method uses drive_get_next() */
+k->cannot_instantiate_with_device_add_yet = true;
 }
 
 static const TypeInfo pl181_info = {
diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index ab13505..e4c70b5 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -1297,12 +1297,15 @@ static void sdhci_sysbus_class_init(ObjectClass *klass, 
void *data)
 dc->vmsd = &sdhci_vmstate;
 dc->props = sdhci_properties;
 dc->realize = sdhci_sysbus_realize;
+/* Reason: instance_init() method uses drive_get_next() */
+dc->cannot_instantiate_with_device_add_yet = true;
 }
 
 static const TypeInfo sdhci_sysbus_info = {
 .name = TYPE_SYSBUS_SDHCI,
 .parent = TYPE_SYS_BUS_DEVICE,
 .instance_size = sizeof(SDHCIState),
+/* FIXME SysBusDeviceClass init() instead of .instance_init */
 .instance_init = sdhci_sysbus_init,
 .instance_finalize = sdhci_sysbus_finalize,
 .class_init = sdhci_sysbus_class_init,
-- 
1.9.3




[Qemu-devel] [PATCH for-2.3 1/3] hw: Mark devices misusing drive_get(), drive_get_next() FIXME

2015-03-23 Thread Markus Armbruster
Drives defined with if!=none are for board initialization to wire up.
Board code calls drive_get() or similar to find them, and creates
devices with their qdev drive properties set accordingly.

Except a few devices go on a fishing expedition for a suitable backend
instead of exposing a drive property for board code to set: they call
driver_get() or drive_get_next() in their realize() or init() method.
Wrong.  Mark them with suitable FIXME comments.

Signed-off-by: Markus Armbruster 
---
 hw/arm/spitz.c| 1 +
 hw/block/m25p80.c | 1 +
 hw/isa/pc87312.c  | 2 ++
 hw/sd/milkymist-memcard.c | 1 +
 hw/sd/pl181.c | 1 +
 hw/sd/sdhci.c | 1 +
 hw/sd/ssi-sd.c| 1 +
 7 files changed, 8 insertions(+)

diff --git a/hw/arm/spitz.c b/hw/arm/spitz.c
index a16831c..da02932 100644
--- a/hw/arm/spitz.c
+++ b/hw/arm/spitz.c
@@ -168,6 +168,7 @@ static int sl_nand_init(SysBusDevice *dev)
 DriveInfo *nand;
 
 s->ctl = 0;
+/* FIXME use a qdev drive property instead of drive_get() */
 nand = drive_get(IF_MTD, 0, 0);
 s->nand = nand_init(nand ? blk_by_legacy_dinfo(nand) : NULL,
 s->manf_id, s->chip_id);
diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
index ff1106b..afe243b 100644
--- a/hw/block/m25p80.c
+++ b/hw/block/m25p80.c
@@ -623,6 +623,7 @@ static int m25p80_init(SSISlave *ss)
 s->dirty_page = -1;
 s->storage = blk_blockalign(s->blk, s->size);
 
+/* FIXME use a qdev drive property instead of drive_get_next() */
 dinfo = drive_get_next(IF_MTD);
 
 if (dinfo) {
diff --git a/hw/isa/pc87312.c b/hw/isa/pc87312.c
index 40a1106..2849e8d 100644
--- a/hw/isa/pc87312.c
+++ b/hw/isa/pc87312.c
@@ -319,11 +319,13 @@ static void pc87312_realize(DeviceState *dev, Error 
**errp)
 d = DEVICE(isa);
 qdev_prop_set_uint32(d, "iobase", get_fdc_iobase(s));
 qdev_prop_set_uint32(d, "irq", 6);
+/* FIXME use a qdev drive property instead of drive_get() */
 drive = drive_get(IF_FLOPPY, 0, 0);
 if (drive != NULL) {
 qdev_prop_set_drive_nofail(d, "driveA",
blk_by_legacy_dinfo(drive));
 }
+/* FIXME use a qdev drive property instead of drive_get() */
 drive = drive_get(IF_FLOPPY, 0, 1);
 if (drive != NULL) {
 qdev_prop_set_drive_nofail(d, "driveB",
diff --git a/hw/sd/milkymist-memcard.c b/hw/sd/milkymist-memcard.c
index 9661eaf..0cc53d9 100644
--- a/hw/sd/milkymist-memcard.c
+++ b/hw/sd/milkymist-memcard.c
@@ -255,6 +255,7 @@ static int milkymist_memcard_init(SysBusDevice *dev)
 DriveInfo *dinfo;
 BlockBackend *blk;
 
+/* FIXME use a qdev drive property instead of drive_get_next() */
 dinfo = drive_get_next(IF_SD);
 blk = dinfo ? blk_by_legacy_dinfo(dinfo) : NULL;
 s->card = sd_init(blk, false);
diff --git a/hw/sd/pl181.c b/hw/sd/pl181.c
index e704b6e..bf37da6 100644
--- a/hw/sd/pl181.c
+++ b/hw/sd/pl181.c
@@ -490,6 +490,7 @@ static int pl181_init(SysBusDevice *sbd)
 sysbus_init_irq(sbd, &s->irq[0]);
 sysbus_init_irq(sbd, &s->irq[1]);
 qdev_init_gpio_out(dev, s->cardstatus, 2);
+/* FIXME use a qdev drive property instead of drive_get_next() */
 dinfo = drive_get_next(IF_SD);
 s->card = sd_init(dinfo ? blk_by_legacy_dinfo(dinfo) : NULL, false);
 if (s->card == NULL) {
diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index 27b914a..f056c52 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -1146,6 +1146,7 @@ static void sdhci_initfn(SDHCIState *s)
 {
 DriveInfo *di;
 
+/* FIXME use a qdev drive property instead of drive_get_next() */
 di = drive_get_next(IF_SD);
 s->card = sd_init(di ? blk_by_legacy_dinfo(di) : NULL, false);
 if (s->card == NULL) {
diff --git a/hw/sd/ssi-sd.c b/hw/sd/ssi-sd.c
index a71fbca..e4b2d4f 100644
--- a/hw/sd/ssi-sd.c
+++ b/hw/sd/ssi-sd.c
@@ -255,6 +255,7 @@ static int ssi_sd_init(SSISlave *d)
 DriveInfo *dinfo;
 
 s->mode = SSI_SD_CMD;
+/* FIXME use a qdev drive property instead of drive_get_next() */
 dinfo = drive_get_next(IF_SD);
 s->sd = sd_init(dinfo ? blk_by_legacy_dinfo(dinfo) : NULL, true);
 if (s->sd == NULL) {
-- 
1.9.3




[Qemu-devel] [PATCH for-2.3 0/3] Contain drive_get() misuse

2015-03-23 Thread Markus Armbruster
Drives defined with if!=none are for board initialization to wire up.
Board code calls drive_get() or similar to find them, and creates
devices with their qdev drive properties set accordingly.

Except a few devices go on a fishing expedition for a suitable backend
instead of exposing a drive property for board code to set: they call
driver_get() or drive_get_next() in their realize() or init() method.
Wrong.

We can't fix this in time for the release, so do the next best thing:
contain the mistakes as far as possible so they don't become ABI:

* Mark them all with suitable FIXME comments [PATCH 1]

* sdhci-pci is new, set cannot_instantiate_with_device_add_yet to make
  it unavailable with -device [PATCH 2]

* A few more aren't currently available with -device, set
  cannot_instantiate_with_device_add_yet to ensure they stay
  unavailable [PATCH 3]

* Left alone: m25p80-generic and its derivatives, ssi-sd, pc87312

Markus Armbruster (3):
  hw: Mark devices misusing drive_get(), drive_get_next() FIXME
  sdhci: Make device "sdhci-pci" unavailable with -device
  sysbus: Contain drive_get_next() misuse

 hw/arm/spitz.c| 3 +++
 hw/block/m25p80.c | 1 +
 hw/isa/pc87312.c  | 2 ++
 hw/sd/milkymist-memcard.c | 3 +++
 hw/sd/pl181.c | 3 +++
 hw/sd/sdhci.c | 6 ++
 hw/sd/ssi-sd.c| 1 +
 7 files changed, 19 insertions(+)

-- 
1.9.3




Re: [Qemu-devel] [PATCH for-2.3] powerpc: fix -machine usb=no for newworld and pseries machines

2015-03-23 Thread Alexander Graf

On 03/23/2015 07:20 PM, Marcel Apfelbaum wrote:

On 03/23/2015 07:05 PM, Paolo Bonzini wrote:

Capture the explicit setting of "usb=no" into a separate bool, and
use it to skip the update of machine->usb in the board init function.

Signed-off-by: Paolo Bonzini 
---
  hw/core/machine.c | 1 +
  hw/ppc/mac_newworld.c | 2 +-
  hw/ppc/spapr.c| 2 +-
  include/hw/boards.h   | 1 +
  4 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index cb1185a..25c45e6 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -223,6 +223,7 @@ static void machine_set_usb(Object *obj, bool 
value, Error **errp)

  MachineState *ms = MACHINE(obj);

  ms->usb = value;
+ms->usb_disabled = !value;
Maybe is too late now, but I really not like this pollution of 
MachineState
with 'usb_disabled'. (Imagine we have this kind of fields for lots of 
objects and lots

of corner cases...)
I know it comes to solve a bug, but we talked about it in another mail 
thread and

this change in semantics was approved.

Let me explain *why* I don't like it.
1. We add an "usb_disabled" field to a base class (actually object)
   of all the machines and the only place it is interesting is
   for 2 machines on ppc.
2. Even for these 2 machines, the scenario of defaults=on and usb=off
   is not practical.


I'm personally fine either way, but I assumed that Paolo had a good 
reason for writing the patch?



Alex




Re: [Qemu-devel] [PATCH for-2.3] powerpc: fix -machine usb=no for newworld and pseries machines

2015-03-23 Thread Marcel Apfelbaum

On 03/23/2015 07:05 PM, Paolo Bonzini wrote:

Capture the explicit setting of "usb=no" into a separate bool, and
use it to skip the update of machine->usb in the board init function.

Signed-off-by: Paolo Bonzini 
---
  hw/core/machine.c | 1 +
  hw/ppc/mac_newworld.c | 2 +-
  hw/ppc/spapr.c| 2 +-
  include/hw/boards.h   | 1 +
  4 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index cb1185a..25c45e6 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -223,6 +223,7 @@ static void machine_set_usb(Object *obj, bool value, Error 
**errp)
  MachineState *ms = MACHINE(obj);

  ms->usb = value;
+ms->usb_disabled = !value;

Maybe is too late now, but I really not like this pollution of MachineState
with 'usb_disabled'. (Imagine we have this kind of fields for lots of objects 
and lots
of corner cases...)
I know it comes to solve a bug, but we talked about it in another mail thread 
and
this change in semantics was approved.

Let me explain *why* I don't like it.
1. We add an "usb_disabled" field to a base class (actually object)
   of all the machines and the only place it is interesting is
   for 2 machines on ppc.
2. Even for these 2 machines, the scenario of defaults=on and usb=off
   is not practical.

I hope I helped,
Thanks,
Marcel

P.S. If you still want it there, maybe move it to a ppc base class?


  }

  static char *machine_get_firmware(Object *obj, Error **errp)
diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c
index e0397bc..a365bf9 100644
--- a/hw/ppc/mac_newworld.c
+++ b/hw/ppc/mac_newworld.c
@@ -371,7 +371,7 @@ static void ppc_core99_init(MachineState *machine)
  /* 970 gets a U3 bus */
  pci_bus = pci_pmac_u3_init(pic, get_system_memory(), get_system_io());
  machine_arch = ARCH_MAC99_U3;
-machine->usb |= defaults_enabled();
+machine->usb |= defaults_enabled() && !machine->usb_disabled;
  } else {
  pci_bus = pci_pmac_init(pic, get_system_memory(), get_system_io());
  machine_arch = ARCH_MAC99;
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 0487f52..dd5e101 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1563,7 +1563,7 @@ static void ppc_spapr_init(MachineState *machine)
  /* Graphics */
  if (spapr_vga_init(phb->bus)) {
  spapr->has_graphics = true;
-machine->usb |= defaults_enabled();
+machine->usb |= defaults_enabled() && !machine->usb_disabled;
  }

  if (machine->usb) {
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 1feea2b..be6a0ed 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -141,6 +141,7 @@ struct MachineState {
  bool dump_guest_core;
  bool mem_merge;
  bool usb;
+bool usb_disabled;
  char *firmware;
  bool iommu;
  bool suppress_vmdesc;






Re: [Qemu-devel] [Xen-devel] Question about scsi emulation

2015-03-23 Thread Stefano Stabellini
On Mon, 23 Mar 2015, Stefan Hajnoczi wrote:
> On Wed, Mar 18, 2015 at 02:16:47PM -0700, Yaoli Zheng wrote:
> > We have problem using qemu emulated scsi driver(the old lsi). Wonder if any 
> > of other device model we can try for emulating scsi, and how we can get and 
> > config it in Xen? Having been told virtio-scsi is alternative one, but have 
> > no idea how to get it work in Xen.
> 
> The MegaSAS HBA might be another option.  Not sure whether MegaSAS or
> virtio-scsi are supported under Xen though.
 
Making MegaSAS HBA work on Xen should be easy.

Rather than virtio-scsi I would consider the new Xen pvscsi
frontend/backend drivers, now upstream in Linux
(drivers/scsi/xen-scsifront.c and drivers/xen/xen-scsiback.c). Usually
Xen PV drivers perform much better than virtio devices on Xen.

That said, probably most things would be better than the old lsi.



Re: [Qemu-devel] RFC: memory API changes

2015-03-23 Thread Peter Maydell
On 23 March 2015 at 17:51, Andreas Färber  wrote:
> Am 23.03.2015 um 15:39 schrieb Paolo Bonzini:
>> On 23/03/2015 13:24, Peter Maydell wrote:
>>> The point of indicating failure via MemTxResult is that at
>>> some point we need to correct the current broken handling of
>>> the CPUClass do_unassigned_access hook, because that should
>>> only be invoked if the CPU itself does an access to an unassigned
>>> address, not if some random DMA'ing device does!
>>
>> 100% agreement on this.
>
> Just to clarify, CPUClass::do_unassigned_access() was a simple
> refactoring towards multi-target builds (~two inline functions remain).
>
> If you rather need it on MemoryRegion level or somewhere else, feel free
> to post patches, just please keep the original goal in mind.

Yes, the change here would just be to hoist the callsite
for the hook up to somewhere else, probably io_read*/io_write*
but at any rate somewhere which is in the "definitely only
for CPU generated loads and stores" code path.

-- PMM



Re: [Qemu-devel] [PATCH 0/4] Ide patches for 2.3-rc1

2015-03-23 Thread Peter Maydell
On 23 March 2015 at 16:56, John Snow  wrote:
> The following changes since commit 3c6c9fe034c0c07b77f272e4a53d7735220a16a4:
>
>   Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into 
> staging (2015-03-20 12:26:09 +)
>
> are available in the git repository at:
>
>   https://github.com/jnsnow/qemu.git tags/ide-pull-request
>
> for you to fetch changes up to 54fced034e4d32d8ba6d1e27ecb7e2e2fb2f45d4:
>
>   ahci-test: improve rw buffer patterns (2015-03-23 12:24:16 -0400)
>
> 

Applied, thanks.

-- PMM



Re: [Qemu-devel] RFC: memory API changes

2015-03-23 Thread Andreas Färber
Am 23.03.2015 um 15:39 schrieb Paolo Bonzini:
> On 23/03/2015 13:24, Peter Maydell wrote:
>> The point of indicating failure via MemTxResult is that at
>> some point we need to correct the current broken handling of
>> the CPUClass do_unassigned_access hook, because that should
>> only be invoked if the CPU itself does an access to an unassigned
>> address, not if some random DMA'ing device does!
> 
> 100% agreement on this.

Just to clarify, CPUClass::do_unassigned_access() was a simple
refactoring towards multi-target builds (~two inline functions remain).

If you rather need it on MemoryRegion level or somewhere else, feel free
to post patches, just please keep the original goal in mind.

Regards,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu,
Graham Norton; HRB 21284 (AG Nürnberg)



Re: [Qemu-devel] [PATCH RFC for-2.3 1/1] block: New command line option --no-format-probing

2015-03-23 Thread Paolo Bonzini
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256



On 23/03/2015 18:48, Eric Blake wrote:
>> Why can't libvirt just add ,format=raw instead of leaving out the
>> format key altogether?
> 
> Libvirt DOES add format=raw.  This patch is an extra insurance
> policy to guarantee that libvirt does not have any code paths that
> omit the explicit format (as we have had a couple of CVEs in
> libvirt over the years where that was the case).

And where's the extra insurance policy to guarantee that QEMU does not
have any code paths that ignore the new command line option?

This is really borderline security theater.  Bugs happen, we fix them.
 Even better, Kevin now has implemented a strong mitigation for CVEs
like this, that won't allow guests to transmute a probed raw image
into another format.  There certainly hasn't been enough discussion
for this to get into 2.3.

Paolo
-BEGIN PGP SIGNATURE-
Version: GnuPG v2

iQEcBAEBCAAGBQJVEFJ+AAoJEL/70l94x66D/OEH/1j58fDg1W8XBjtaGQ12YsL6
HLKYaU2ObaxY3m5sX+mlMr1ftn/5kQnwVC7xx88xDCq/UG+GuSBRrT+SbxZtkdl4
SM9d0fATaK3yC0o0q3SWXeURAvi0bVOEoGqdpvwgrgGTcGkZPzsh9TwQySkupa8J
mQns/HTF3b7JWJvoVCseTOP99Hq+6+2DmWFbzyfisah/f2nlgNhPULSj0KZQmWxP
dMHPn9PG3NXV3E/xelTXWsMDuJKnnMu3w5MbULbNYDkwJe2f5bBOl6/AV4zqHZ5U
49Ewb1Mdcw+6r3aro2kCQ3wEYKnEpLb/Mb6Lj/i6OUXbA+0TlBWX906BBze+6SI=
=BWO8
-END PGP SIGNATURE-



Re: [Qemu-devel] [PATCH RFC for-2.3 1/1] block: New command line option --no-format-probing

2015-03-23 Thread Eric Blake
On 03/23/2015 11:23 AM, Paolo Bonzini wrote:
> 
> 
> On 20/03/2015 15:19, Markus Armbruster wrote:
>>> If (a working version of) this makes it in 2.3, libvirt WILL use it in
>>> the next release.  It will take me less than 5 minutes to write up the
>>> libvirt patch, as long as the new option is advertised via
>>> query-command-line-options (which means that QMP introspection of the
>>> new option is a must for v2 :)
> 
> Why can't libvirt just add ,format=raw instead of leaving out the format
> key altogether?

Libvirt DOES add format=raw.  This patch is an extra insurance policy to
guarantee that libvirt does not have any code paths that omit the
explicit format (as we have had a couple of CVEs in libvirt over the
years where that was the case).

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [PATCH] add 1394 support

2015-03-23 Thread Itamar Tal
>From 9c4425d4145ac6b0a2d19a6fdf5803dd13f1fa93 Mon Sep 17 00:00:00 2001
From: Itamar Tal 
Date: Mon, 23 Mar 2015 14:26:47 +0200
Subject: [PATCH] add 1394 support

This patch add 1394 (firewire) support to x86_64 and i386 qemu
softmmu. It allows one virtual machine to be the server side and the
other to be the client side, connected by TCP stream socket (same
thing is possible using serial port). This is very useful in for
allowing legacy devices to communicate over the firewire channel, but
doesn't support USB communication. Especially, it's useful for remote
Windows kernel debugging over qemu for malware analysis and so on...
The patch was tested on major stable version 2.0.0, 2.2.1 and current
master (2.3.0rc?).

Itamar Tal,
Guardicore

---
 hw/1394/Makefile.objs |7 +
 hw/1394/hcd-ohci.c| 1645 +
 hw/1394/hcd-ohci.h|  147 +
 hw/Makefile.objs  |1 +
 4 files changed, 1800 insertions(+)
 create mode 100644 hw/1394/Makefile.objs
 create mode 100644 hw/1394/hcd-ohci.c
 create mode 100644 hw/1394/hcd-ohci.h

diff --git a/hw/1394/Makefile.objs b/hw/1394/Makefile.objs
new file mode 100644
index 000..6c039e6
--- /dev/null
+++ b/hw/1394/Makefile.objs
@@ -0,0 +1,7 @@
+ifeq ($(TARGET_X86_64),y)
+common-obj-y += hcd-ohci.o
+else
+ifeq ($(TARGET_I386),y)
+common-obj-y += hcd-ohci.o
+endif
+endif
diff --git a/hw/1394/hcd-ohci.c b/hw/1394/hcd-ohci.c
new file mode 100644
index 000..cf373e6
--- /dev/null
+++ b/hw/1394/hcd-ohci.c
@@ -0,0 +1,1645 @@
+/*
+ * FireWire (1394) support
+ *
+ * Copyright (c) 2015 Guardicore
+ * - ita...@guardicore.com
+ * Originally Written by James Harper
+ *
+ * This is a `bare-bones' implementation of the Firewire 1394 OHCI
+ * for virtual->virtual firewire connections emulation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the
"Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ *
+ */
+
+#include "hcd-ohci.h"
+#include 
+
+#define OHCI_1394_MMIO_SIZE 0x800
+
+#define HCCONTROL_RESET 16
+#define HCCONTROL_LINK_ENABLE 17
+#define HCCONTROL_LPS 19
+
+#define HCCONTROL_RESET_MASK (1 << (HCCONTROL_RESET))
+#define HCCONTROL_LINK_ENABLE_MASK (1 << (HCCONTROL_LINK_ENABLE))
+#define HCCONTROL_LPS_MASK (1 << (HCCONTROL_LPS))
+
+#define HCD_STATE_UNPLUGGED0 /* no connection */
+#define HCD_STATE_MAGIC1 /* waiting for magic */
+#define HCD_STATE_DISCONNECTED 2 /* waiting for link packet */
+#define HCD_STATE_ARBITRATION1 3 /* send bid */
+#define HCD_STATE_ARBITRATION2 4 /* receive bid and compare */
+#define HCD_STATE_CONNECTED5 /* connected and ready to go */
+
+#define IS_Ax_ACTIVE(n) (s->mmio.regs[((0x180 + ((n) * 0x20)) +
0x000) >> 2] & (1 << 10))
+#define SET_Ax_ACTIVE(n) s->mmio.regs[((0x180 + ((n) * 0x20)) +
0x000) >> 2] |= (1 << 10)
+#define CLR_Ax_ACTIVE(n) s->mmio.regs[((0x180 + ((n) * 0x20)) +
0x000) >> 2] &= ~(1 << 10)
+
+#define IS_Ax_DEAD(n) (s->mmio.regs[((0x180 + ((n) * 0x20)) + 0x000)
>> 2] & (1 << 11))
+#define SET_Ax_DEAD(n) s->mmio.regs[((0x180 + ((n) * 0x20)) + 0x000)
>> 2] |= (1 << 11)
+
+#define IS_Ax_WAKE(n) (s->mmio.regs[((0x180 + ((n) * 0x20)) + 0x000)
>> 2] & (1 << 12))
+#define CLR_Ax_WAKE(n) s->mmio.regs[((0x180 + ((n) * 0x20)) + 0x000)
>> 2] &= ~(1 << 12)
+
+#define IS_Ax_RUN(n) (s->mmio.regs[((0x180 + ((n) * 0x20)) + 0x000)
>> 2] & (1 << 15))
+
+#define SET_Ax_EVENT_CODE(n, e) s->mmio.regs[((0x180 + ((n) * 0x20))
+ 0x000) >> 2] = ((s->mmio.regs[((0x180 + ((n) * 0x20)) + 0x00) >> 2]
& 0xFFE0) | (e))
+
+#define Ax_COMMAND_PTR(n) s->mmio.regs[((0x180 + ((n) * 0x20)) + 0x00C) >> 2]
+#define SET_Ax_COMMAND_PTR(n, c) s->mmio.regs[((0x180 + ((n) * 0x20))
+ 0x00C) >> 2] = c
+
+#define Ax_CONTEXT_CONTROL(n) s->mmio.regs[((0x180 + ((n) * 0x20)) +
0x00) >> 2]
+
+#define DECLARE_PHY \
+union { \
+uint32_t PhyControl; \
+struct { \
+uint32_t PhyControl_wrData:8; \
+uint32_t PhyControl_regAddr:4; \
+uint32_t :2; \
+uint32_t PhyControl_wrReg:1; \
+uint32_t PhyControl_r

[Qemu-devel] [PATCH RFC 2/4] target-i386: Prepare CPU socket/core abstraction

2015-03-23 Thread Andreas Färber
Short of generic recursive device realization, realize cores and threads
recursively.

Signed-off-by: Andreas Färber 
---
 hw/i386/Makefile.objs|  1 +
 hw/i386/cpu-core.c   | 45 
 hw/i386/cpu-socket.c | 45 
 include/hw/i386/cpu-core.h   | 26 +
 include/hw/i386/cpu-socket.h | 29 
 5 files changed, 146 insertions(+)
 create mode 100644 hw/i386/cpu-core.c
 create mode 100644 hw/i386/cpu-socket.c
 create mode 100644 include/hw/i386/cpu-core.h
 create mode 100644 include/hw/i386/cpu-socket.h

diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index e058a39..9f76424 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -1,5 +1,6 @@
 obj-$(CONFIG_KVM) += kvm/
 obj-y += multiboot.o smbios.o
+obj-y += cpu-socket.o cpu-core.o
 obj-y += pc.o pc_piix.o pc_q35.o
 obj-y += pc_sysfw.o
 obj-y += intel_iommu.o
diff --git a/hw/i386/cpu-core.c b/hw/i386/cpu-core.c
new file mode 100644
index 000..d579025
--- /dev/null
+++ b/hw/i386/cpu-core.c
@@ -0,0 +1,45 @@
+/*
+ * x86 CPU core abstraction
+ *
+ * Copyright (c) 2015 SUSE Linux GmbH
+ */
+
+#include "hw/i386/cpu-core.h"
+
+static int x86_cpu_core_realize_child(Object *child, void *opaque)
+{
+Error **errp = opaque;
+Error *local_err = NULL;
+
+object_property_set_bool(child, true, "realized", &local_err);
+error_propagate(errp, local_err);
+
+return local_err != NULL ? 1 : 0;
+}
+
+static void x86_cpu_core_realize(DeviceState *dev, Error **errp)
+{
+/* XXX generic */
+object_child_foreach(OBJECT(dev), x86_cpu_core_realize_child, errp);
+}
+
+static void x86_cpu_core_class_init(ObjectClass *oc, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(oc);
+
+dc->realize = x86_cpu_core_realize;
+}
+
+static const TypeInfo x86_cpu_core_type_info = {
+.name = TYPE_X86_CPU_CORE,
+.parent = TYPE_DEVICE,
+.instance_size = sizeof(X86CPUCore),
+.class_init = x86_cpu_core_class_init,
+};
+
+static void x86_cpu_core_register_types(void)
+{
+type_register_static(&x86_cpu_core_type_info);
+}
+
+type_init(x86_cpu_core_register_types)
diff --git a/hw/i386/cpu-socket.c b/hw/i386/cpu-socket.c
new file mode 100644
index 000..dc27400
--- /dev/null
+++ b/hw/i386/cpu-socket.c
@@ -0,0 +1,45 @@
+/*
+ * x86 CPU socket abstraction
+ *
+ * Copyright (c) 2015 SUSE Linux GmbH
+ */
+
+#include "hw/i386/cpu-socket.h"
+
+static int x86_cpu_socket_realize_child(Object *child, void *opaque)
+{
+Error **errp = opaque;
+Error *local_err = NULL;
+
+object_property_set_bool(child, true, "realized", &local_err);
+error_propagate(errp, local_err);
+
+return local_err != NULL ? 1 : 0;
+}
+
+static void x86_cpu_socket_realize(DeviceState *dev, Error **errp)
+{
+/* XXX generic */
+object_child_foreach(OBJECT(dev), x86_cpu_socket_realize_child, errp);
+}
+
+static void x86_cpu_socket_class_init(ObjectClass *oc, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(oc);
+
+dc->realize = x86_cpu_socket_realize;
+}
+
+static const TypeInfo x86_cpu_socket_type_info = {
+.name = TYPE_X86_CPU_SOCKET,
+.parent = TYPE_CPU_SOCKET,
+.instance_size = sizeof(X86CPUSocket),
+.class_init = x86_cpu_socket_class_init,
+};
+
+static void x86_cpu_socket_register_types(void)
+{
+type_register_static(&x86_cpu_socket_type_info);
+}
+
+type_init(x86_cpu_socket_register_types)
diff --git a/include/hw/i386/cpu-core.h b/include/hw/i386/cpu-core.h
new file mode 100644
index 000..be78f95
--- /dev/null
+++ b/include/hw/i386/cpu-core.h
@@ -0,0 +1,26 @@
+/*
+ * x86 CPU core abstraction
+ *
+ * Copyright (c) 2015 SUSE Linux GmbH
+ */
+#ifndef HW_I386_CPU_CORE_H
+#define HW_I386_CPU_CORE_H
+
+#include "hw/qdev.h"
+
+#ifdef TARGET_X86_64
+#define TYPE_X86_CPU_CORE "x86_64-cpu-core"
+#else
+#define TYPE_X86_CPU_CORE "i386-cpu-core"
+#endif
+
+#define X86_CPU_CORE(obj) \
+OBJECT_CHECK(X86CPUCore, (obj), TYPE_X86_CPU_CORE)
+
+typedef struct X86CPUCore {
+/*< private >*/
+DeviceState parent_obj;
+/*< public >*/
+} X86CPUCore;
+
+#endif
diff --git a/include/hw/i386/cpu-socket.h b/include/hw/i386/cpu-socket.h
new file mode 100644
index 000..9fb3ef8
--- /dev/null
+++ b/include/hw/i386/cpu-socket.h
@@ -0,0 +1,29 @@
+/*
+ * x86 CPU socket abstraction
+ *
+ * Copyright (c) 2015 SUSE Linux GmbH
+ */
+#ifndef HW_I386_CPU_SOCKET_H
+#define HW_I386_CPU_SOCKET_H
+
+#include "hw/cpu/socket.h"
+#include "cpu-core.h"
+
+#ifdef TARGET_X86_64
+#define TYPE_X86_CPU_SOCKET "x86_64-cpu-socket"
+#else
+#define TYPE_X86_CPU_SOCKET "i386-cpu-socket"
+#endif
+
+#define X86_CPU_SOCKET(obj) \
+OBJECT_CHECK(X86CPUSocket, (obj), TYPE_X86_CPU_SOCKET)
+
+typedef struct X86CPUSocket {
+/*< private >*/
+DeviceState parent_obj;
+/*< public >*/
+
+X86CPUCore core[0];
+} X86CPUSocket;
+
+#endif
-- 
2.1.4




[Qemu-devel] [PATCH RFC 4/4] pc: Create initial CPUs in-place

2015-03-23 Thread Andreas Färber
Inline pc_new_cpu() for the initial setup.

Signed-off-by: Andreas Färber 
---
 hw/i386/pc.c   | 39 ++-
 include/hw/i386/cpu-core.h |  3 +++
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 492c262..efc5a23 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -995,13 +995,13 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int 
level)
 
 static inline size_t pc_cpu_core_size(void)
 {
-return sizeof(X86CPUCore);
+return sizeof(X86CPUCore) + smp_threads * sizeof(X86CPU);
 }
 
 static inline X86CPUCore *pc_cpu_socket_get_core(X86CPUSocket *socket,
  unsigned int index)
 {
-return &socket->core[index];
+return (void *)&socket->core[0] + index * pc_cpu_core_size();
 }
 
 static X86CPU *pc_new_cpu(const char *cpu_model, int64_t apic_id,
@@ -1083,7 +1083,12 @@ void pc_cpus_init(const char *cpu_model, DeviceState 
*icc_bridge)
 X86CPUSocket *socket;
 X86CPUCore *core;
 X86CPU *cpu = NULL;
+X86CPUClass *xcc;
+CPUClass *cc;
+ObjectClass *cpu_oc;
 Error *error = NULL;
+gchar **cpu_model_pieces;
+char *cpu_name, *cpu_features;
 unsigned long apic_id_limit;
 int sockets, cpu_index = 0;
 
@@ -1096,6 +1101,23 @@ void pc_cpus_init(const char *cpu_model, DeviceState 
*icc_bridge)
 #endif
 }
 current_cpu_model = cpu_model;
+cpu_model_pieces = g_strsplit(cpu_model, ",", 2);
+cpu_name = cpu_model_pieces[0];
+assert(cpu_name);
+cpu_features = cpu_model_pieces[1];
+
+cpu_oc = cpu_class_by_name(TYPE_X86_CPU, cpu_name);
+if (cpu_oc == NULL) {
+error_report("Unable to find CPU definition: %s", cpu_name);
+exit(1);
+}
+cc = CPU_CLASS(cpu_oc);
+xcc = X86_CPU_CLASS(cpu_oc);
+
+if (xcc->kvm_required && !kvm_enabled()) {
+error_report("CPU model '%s' requires KVM", cpu_name);
+exit(1);
+}
 
 apic_id_limit = pc_apic_id_limit(max_cpus);
 if (apic_id_limit > ACPI_CPU_HOTPLUG_ID_LIMIT) {
@@ -1120,12 +1142,18 @@ void pc_cpus_init(const char *cpu_model, DeviceState 
*icc_bridge)
 }
 
 for (k = 0; k < smp_threads; k++) {
-cpu = pc_new_cpu(cpu_model,
- x86_cpu_apic_id_from_index(cpu_index),
- icc_bridge, &error);
+cpu = &core->thread[k];
+object_initialize(cpu, sizeof(*cpu),
+  object_class_get_name(cpu_oc));
+cc->parse_features(CPU(cpu), cpu_features, &error);
 if (error) {
 goto error;
 }
+qdev_set_parent_bus(DEVICE(cpu),
+qdev_get_child_bus(icc_bridge, "icc"));
+object_property_set_int(OBJECT(cpu),
+x86_cpu_apic_id_from_index(cpu_index),
+"apic-id", &error);
 object_property_add_child(OBJECT(core), "thread[*]",
   OBJECT(cpu), &error);
 object_unref(OBJECT(cpu));
@@ -1152,6 +1180,7 @@ void pc_cpus_init(const char *cpu_model, DeviceState 
*icc_bridge)
 smbios_set_cpuid(cpu->env.cpuid_version, cpu->env.features[FEAT_1_EDX]);
 
 error:
+g_strfreev(cpu_model_pieces);
 if (error) {
 error_report_err(error);
 exit(1);
diff --git a/include/hw/i386/cpu-core.h b/include/hw/i386/cpu-core.h
index be78f95..bb60e8e 100644
--- a/include/hw/i386/cpu-core.h
+++ b/include/hw/i386/cpu-core.h
@@ -7,6 +7,7 @@
 #define HW_I386_CPU_CORE_H
 
 #include "hw/qdev.h"
+#include "cpu.h"
 
 #ifdef TARGET_X86_64
 #define TYPE_X86_CPU_CORE "x86_64-cpu-core"
@@ -21,6 +22,8 @@ typedef struct X86CPUCore {
 /*< private >*/
 DeviceState parent_obj;
 /*< public >*/
+
+X86CPU thread[0];
 } X86CPUCore;
 
 #endif
-- 
2.1.4




[Qemu-devel] [PATCH RFC 1/4] cpu: Prepare Socket container type

2015-03-23 Thread Andreas Färber
Signed-off-by: Andreas Färber 
---
 hw/cpu/Makefile.objs|  2 +-
 hw/cpu/socket.c | 21 +
 include/hw/cpu/socket.h | 14 ++
 3 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 hw/cpu/socket.c
 create mode 100644 include/hw/cpu/socket.h

diff --git a/hw/cpu/Makefile.objs b/hw/cpu/Makefile.objs
index 6381238..e6890cf 100644
--- a/hw/cpu/Makefile.objs
+++ b/hw/cpu/Makefile.objs
@@ -3,4 +3,4 @@ obj-$(CONFIG_REALVIEW) += realview_mpcore.o
 obj-$(CONFIG_A9MPCORE) += a9mpcore.o
 obj-$(CONFIG_A15MPCORE) += a15mpcore.o
 obj-$(CONFIG_ICC_BUS) += icc_bus.o
-
+obj-y += socket.o
diff --git a/hw/cpu/socket.c b/hw/cpu/socket.c
new file mode 100644
index 000..5ca47e9
--- /dev/null
+++ b/hw/cpu/socket.c
@@ -0,0 +1,21 @@
+/*
+ * CPU socket abstraction
+ *
+ * Copyright (c) 2013-2014 SUSE LINUX Products GmbH
+ * Copyright (c) 2015 SUSE Linux GmbH
+ */
+
+#include "hw/cpu/socket.h"
+
+static const TypeInfo cpu_socket_type_info = {
+.name = TYPE_CPU_SOCKET,
+.parent = TYPE_DEVICE,
+.abstract = true,
+};
+
+static void cpu_socket_register_types(void)
+{
+type_register_static(&cpu_socket_type_info);
+}
+
+type_init(cpu_socket_register_types)
diff --git a/include/hw/cpu/socket.h b/include/hw/cpu/socket.h
new file mode 100644
index 000..c8e0c18
--- /dev/null
+++ b/include/hw/cpu/socket.h
@@ -0,0 +1,14 @@
+/*
+ * CPU socket abstraction
+ *
+ * Copyright (c) 2013-2014 SUSE LINUX Products GmbH
+ * Copyright (c) 2015 SUSE Linux GmbH
+ */
+#ifndef HW_CPU_SOCKET_H
+#define HW_CPU_SOCKET_H
+
+#include "hw/qdev.h"
+
+#define TYPE_CPU_SOCKET "cpu-socket"
+
+#endif
-- 
2.1.4




[Qemu-devel] [PATCH RFC 0/4] target-i386: PC socket/core/thread modeling, part 1

2015-03-23 Thread Andreas Färber
Hello,

This long-postponed series proposes a hierarchical QOM model of socket
and core objects for the x86 PC machines.

Background is that due to qdev limitations we had to introduce an ICC bus
to be able to hot-add CPUs and their APICs. By now this limitation could be
resolved via a QOM hotplug handler interface.

However, the QOM hotplug model is associated with having link<> properties.
Given that physically we cannot hot-add hyperthreads but full CPU sockets,
this series prepares the underbelly for having those link properties be of
the new type X86CPUSocket rather than X86CPU.

As final step in this series, the X86CPU allocation model is changed to be
init'ed in-place, as part of an "atomic" socket object. A follow-up will be
to attempt the same in-place allocation model for the APIC; one difficulty
there is that several places do a NULL check for the APIC pointer as quick
way of telling whether an APIC is present or not.

NOT IN THIS SERIES is converting cpu-add to the same socket/core/thread model
and initializing them in-place. The current PoC implementation assumes that
CPUs get added sequentially and that the preceding CPU can be used to obtain
the core via unclean QOM parent accesses.
IIUC that must be changed so that an arbitrary thread added via cpu-add
creates the full socket and core(s). That would work best if indexed link<>
properties could be used. That's an undecided design question of whether
we want to have /machine/cpu-socket[n] or whether it makes sense to integrate
NUMA modeling while at it and rather have /machine/node[n]/socket[m].

Note that this socket modeling is not only PC-specific in the softmmu sense
but also in that not every X86CPU must be on a socket (e.g., Quark X1000).
Therefore it was a concious decision to not label some things target-i386
and to place code in pc.c rather than cpu.c.

Further note that it is ignored here that -smp enforces that AMD CPUs don't
have hyperthreads, i.e. AMD X86CPUs will have only one thread[n] child<>.

Context:

   qemu.git master
   "pc: Ensure non-zero CPU ref count after attaching to ICC bus"
-> this series adding socket/core objects
   cpu-add conversion
   APIC cleanups

Available for testing here:
git://github.com/afaerber/qemu-cpu.git qom-cpu-x86-sockets-1.v1
https://github.com/afaerber/qemu-cpu/commits/qom-cpu-x86-sockets-1.v1

Regards,
Andreas

Cc: Eduardo Habkost 
Cc: Igor Mammedov 
Cc: Paolo Bonzini 
Cc: Peter Maydell 
Cc: Bharata B Rao 
Cc: Christian Borntraeger 

Andreas Färber (4):
  cpu: Prepare Socket container type
  target-i386: Prepare CPU socket/core abstraction
  pc: Create sockets and cores for CPUs
  pc: Create initial CPUs in-place

 hw/cpu/Makefile.objs |  2 +-
 hw/cpu/socket.c  | 21 ++
 hw/i386/Makefile.objs|  1 +
 hw/i386/cpu-core.c   | 45 +
 hw/i386/cpu-socket.c | 45 +
 hw/i386/pc.c | 95 
 include/hw/cpu/socket.h  | 14 +++
 include/hw/i386/cpu-core.h   | 29 ++
 include/hw/i386/cpu-socket.h | 29 ++
 9 files changed, 272 insertions(+), 9 deletions(-)
 create mode 100644 hw/cpu/socket.c
 create mode 100644 hw/i386/cpu-core.c
 create mode 100644 hw/i386/cpu-socket.c
 create mode 100644 include/hw/cpu/socket.h
 create mode 100644 include/hw/i386/cpu-core.h
 create mode 100644 include/hw/i386/cpu-socket.h

-- 
2.1.4




[Qemu-devel] [PATCH RFC 3/4] pc: Create sockets and cores for CPUs

2015-03-23 Thread Andreas Färber
Inline realized=true from pc_new_cpu() so that the realization can be
deferred, as it would otherwise create a device[n] node.

Signed-off-by: Andreas Färber 
---
 hw/i386/pc.c | 66 
 1 file changed, 58 insertions(+), 8 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 2c48277..492c262 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -54,11 +54,14 @@
 #include "exec/memory.h"
 #include "exec/address-spaces.h"
 #include "sysemu/arch_init.h"
+#include "sysemu/cpus.h"
 #include "qemu/bitmap.h"
 #include "qemu/config-file.h"
 #include "hw/acpi/acpi.h"
 #include "hw/acpi/cpu_hotplug.h"
 #include "hw/cpu/icc_bus.h"
+#include "hw/i386/cpu-socket.h"
+#include "hw/i386/cpu-core.h"
 #include "hw/boards.h"
 #include "hw/pci/pci_host.h"
 #include "acpi-build.h"
@@ -990,6 +993,17 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int 
level)
 }
 }
 
+static inline size_t pc_cpu_core_size(void)
+{
+return sizeof(X86CPUCore);
+}
+
+static inline X86CPUCore *pc_cpu_socket_get_core(X86CPUSocket *socket,
+ unsigned int index)
+{
+return &socket->core[index];
+}
+
 static X86CPU *pc_new_cpu(const char *cpu_model, int64_t apic_id,
   DeviceState *icc_bridge, Error **errp)
 {
@@ -1009,7 +1023,6 @@ static X86CPU *pc_new_cpu(const char *cpu_model, int64_t 
apic_id,
 qdev_set_parent_bus(DEVICE(cpu), qdev_get_child_bus(icc_bridge, "icc"));
 
 object_property_set_int(OBJECT(cpu), apic_id, "apic-id", &local_err);
-object_property_set_bool(OBJECT(cpu), true, "realized", &local_err);
 
 out:
 if (local_err) {
@@ -1060,15 +1073,19 @@ void pc_hot_add_cpu(const int64_t id, Error **errp)
 error_propagate(errp, local_err);
 return;
 }
+object_property_set_bool(OBJECT(cpu), true, "realized", errp);
 object_unref(OBJECT(cpu));
 }
 
 void pc_cpus_init(const char *cpu_model, DeviceState *icc_bridge)
 {
-int i;
+int i, j, k;
+X86CPUSocket *socket;
+X86CPUCore *core;
 X86CPU *cpu = NULL;
 Error *error = NULL;
 unsigned long apic_id_limit;
+int sockets, cpu_index = 0;
 
 /* init CPUs */
 if (cpu_model == NULL) {
@@ -1087,14 +1104,41 @@ void pc_cpus_init(const char *cpu_model, DeviceState 
*icc_bridge)
 exit(1);
 }
 
-for (i = 0; i < smp_cpus; i++) {
-cpu = pc_new_cpu(cpu_model, x86_cpu_apic_id_from_index(i),
- icc_bridge, &error);
+sockets = smp_cpus / smp_cores / smp_threads;
+for (i = 0; i < sockets; i++) {
+socket = g_malloc0(sizeof(*socket) + smp_cores * pc_cpu_core_size());
+object_initialize(socket, sizeof(*socket), TYPE_X86_CPU_SOCKET);
+OBJECT(socket)->free = g_free;
+
+for (j = 0; j < smp_cores; j++) {
+core = pc_cpu_socket_get_core(socket, j);
+object_initialize(core, sizeof(*core), TYPE_X86_CPU_CORE);
+object_property_add_child(OBJECT(socket), "core[*]",
+  OBJECT(core), &error);
+if (error) {
+goto error;
+}
+
+for (k = 0; k < smp_threads; k++) {
+cpu = pc_new_cpu(cpu_model,
+ x86_cpu_apic_id_from_index(cpu_index),
+ icc_bridge, &error);
+if (error) {
+goto error;
+}
+object_property_add_child(OBJECT(core), "thread[*]",
+  OBJECT(cpu), &error);
+object_unref(OBJECT(cpu));
+if (error) {
+goto error;
+}
+cpu_index++;
+}
+}
+object_property_set_bool(OBJECT(socket), true, "realized", &error);
 if (error) {
-error_report_err(error);
-exit(1);
+goto error;
 }
-object_unref(OBJECT(cpu));
 }
 
 /* map APIC MMIO area if CPU has APIC */
@@ -1106,6 +1150,12 @@ void pc_cpus_init(const char *cpu_model, DeviceState 
*icc_bridge)
 
 /* tell smbios about cpuid version and features */
 smbios_set_cpuid(cpu->env.cpuid_version, cpu->env.features[FEAT_1_EDX]);
+
+error:
+if (error) {
+error_report_err(error);
+exit(1);
+}
 }
 
 /* pci-info ROM file. Little endian format */
-- 
2.1.4




Re: [Qemu-devel] [PATCH for-2.3] powerpc: fix -machine usb=no for newworld and pseries machines

2015-03-23 Thread Alexander Graf

On 03/23/2015 06:05 PM, Paolo Bonzini wrote:

Capture the explicit setting of "usb=no" into a separate bool, and
use it to skip the update of machine->usb in the board init function.

Signed-off-by: Paolo Bonzini 


Thanks, applied to ppc-next (for 2.3).


Alex




Re: [Qemu-devel] [PATCH RFC for-2.3 1/1] block: New command line option --no-format-probing

2015-03-23 Thread Paolo Bonzini


On 20/03/2015 15:19, Markus Armbruster wrote:
> > If (a working version of) this makes it in 2.3, libvirt WILL use it in
> > the next release.  It will take me less than 5 minutes to write up the
> > libvirt patch, as long as the new option is advertised via
> > query-command-line-options (which means that QMP introspection of the
> > new option is a must for v2 :)

Why can't libvirt just add ,format=raw instead of leaving out the format
key altogether?

Paolo



Re: [Qemu-devel] [PATCH for-2.3 1/1] block: New command line option --misc format-probing=off

2015-03-23 Thread Paolo Bonzini


On 23/03/2015 11:04, Markus Armbruster wrote:
> Probing is convenient, but probing untrusted raw images is insecure
> (CVE-2008-2004).  To avoid it, users should always specify raw format
> explicitly.  This isn't trivial, and even sophisticated users have
> gotten it wrong (libvirt CVE-2010-2237, CVE-2010-2238, CVE-2010-2239,
> plus more recent variations of the theme that didn't get CVEs because
> they were caught before they could hurt users).
> 
> Disabling probing entirely is a (hamfisted) way to ensure you always
> specify the format.
> 
> Instead of creating yet another simple option that doesn't work with
> -readconfig, create a "misc" option group and --misc command line
> option.  We're out of space in vm_config_groups[], so double it.
> 
> This will let us make existing miscellaneous non-QemeOpts options
> sugar for --misc, so they become available with -readconfig.  Left for
> another day.

Which exactly?  Could they fit into another scheme?  (See how
-mem-prealloc was replaced and generalized by memory-backend-* objects).

For example, -win2k-install-hack should really be an IDE disk property
that can be set with -global, and many other options could be machine or
display options.

I don't think it's the right solution.  Libvirt knows where to add a
format=raw option, and it can do it without waiting for QEMU to
implement this.  Direct command-line users are not going to use the
option anyway.

So for today we're 1-1 on NACKs. :D

Paolo



[Qemu-devel] [PATCH for-2.3] powerpc: fix -machine usb=no for newworld and pseries machines

2015-03-23 Thread Paolo Bonzini
Capture the explicit setting of "usb=no" into a separate bool, and
use it to skip the update of machine->usb in the board init function.

Signed-off-by: Paolo Bonzini 
---
 hw/core/machine.c | 1 +
 hw/ppc/mac_newworld.c | 2 +-
 hw/ppc/spapr.c| 2 +-
 include/hw/boards.h   | 1 +
 4 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index cb1185a..25c45e6 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -223,6 +223,7 @@ static void machine_set_usb(Object *obj, bool value, Error 
**errp)
 MachineState *ms = MACHINE(obj);
 
 ms->usb = value;
+ms->usb_disabled = !value;
 }
 
 static char *machine_get_firmware(Object *obj, Error **errp)
diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c
index e0397bc..a365bf9 100644
--- a/hw/ppc/mac_newworld.c
+++ b/hw/ppc/mac_newworld.c
@@ -371,7 +371,7 @@ static void ppc_core99_init(MachineState *machine)
 /* 970 gets a U3 bus */
 pci_bus = pci_pmac_u3_init(pic, get_system_memory(), get_system_io());
 machine_arch = ARCH_MAC99_U3;
-machine->usb |= defaults_enabled();
+machine->usb |= defaults_enabled() && !machine->usb_disabled;
 } else {
 pci_bus = pci_pmac_init(pic, get_system_memory(), get_system_io());
 machine_arch = ARCH_MAC99;
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 0487f52..dd5e101 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1563,7 +1563,7 @@ static void ppc_spapr_init(MachineState *machine)
 /* Graphics */
 if (spapr_vga_init(phb->bus)) {
 spapr->has_graphics = true;
-machine->usb |= defaults_enabled();
+machine->usb |= defaults_enabled() && !machine->usb_disabled;
 }
 
 if (machine->usb) {
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 1feea2b..be6a0ed 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -141,6 +141,7 @@ struct MachineState {
 bool dump_guest_core;
 bool mem_merge;
 bool usb;
+bool usb_disabled;
 char *firmware;
 bool iommu;
 bool suppress_vmdesc;
-- 
2.3.3




[Qemu-devel] [PATCH v5 2/6] target-arm: kvm: save/restore mp state

2015-03-23 Thread Alex Bennée
This adds the saving and restore of the current Multi-Processing state
of the machine. While the KVM_GET/SET_MP_STATE API exposes a number of
potential states for x86 we only use two for ARM. Either the process is
running or not. We then save this state into the cpu_powered TCG state
to avoid changing the serialisation format.

Signed-off-by: Alex Bennée 

---
v2
  - make mpstate field runtime dependant (kvm_enabled())
  - drop initial KVM_CAP_MP_STATE requirement
  - re-use cpu_powered instead of new field

v4
  - s/HALTED/STOPPED/
  - move code from machine.c to kvm.

diff --git a/target-arm/kvm.c b/target-arm/kvm.c
index 72c1fa1..a74832c 100644
--- a/target-arm/kvm.c
+++ b/target-arm/kvm.c
@@ -458,6 +458,46 @@ void kvm_arm_reset_vcpu(ARMCPU *cpu)
 }
 }
 
+/*
+ * Update KVM's MP_STATE based on what QEMU thinks it is
+ */
+int kvm_arm_sync_mpstate_to_kvm(ARMCPU *cpu)
+{
+if (kvm_check_extension(CPU(cpu)->kvm_state, KVM_CAP_MP_STATE)) {
+struct kvm_mp_state mp_state = {
+.mp_state =
+cpu->powered_off ? KVM_MP_STATE_STOPPED : KVM_MP_STATE_RUNNABLE
+};
+int ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MP_STATE, &mp_state);
+if (ret) {
+fprintf(stderr, "%s: failed to set MP_STATE %d/%s\n",
+__func__, ret, strerror(ret));
+return -1;
+}
+}
+
+return 0;
+}
+
+/*
+ * Sync the KVM MP_STATE into QEMU
+ */
+int kvm_arm_sync_mpstate_to_qemu(ARMCPU *cpu)
+{
+if (kvm_check_extension(CPU(cpu)->kvm_state, KVM_CAP_MP_STATE)) {
+struct kvm_mp_state mp_state;
+int ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_MP_STATE, &mp_state);
+if (ret) {
+fprintf(stderr, "%s: failed to get MP_STATE %d/%s\n",
+__func__, ret, strerror(ret));
+abort();
+}
+cpu->powered_off = (mp_state.mp_state == KVM_MP_STATE_STOPPED);
+}
+
+return 0;
+}
+
 void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run)
 {
 }
diff --git a/target-arm/kvm32.c b/target-arm/kvm32.c
index 94030d1..49b6bab 100644
--- a/target-arm/kvm32.c
+++ b/target-arm/kvm32.c
@@ -356,6 +356,8 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 return EINVAL;
 }
 
+kvm_arm_sync_mpstate_to_kvm(cpu);
+
 return ret;
 }
 
@@ -427,5 +429,7 @@ int kvm_arch_get_registers(CPUState *cs)
  */
 write_list_to_cpustate(cpu);
 
+kvm_arm_sync_mpstate_to_qemu(cpu);
+
 return 0;
 }
diff --git a/target-arm/kvm64.c b/target-arm/kvm64.c
index 8cf3a62..fed03f2 100644
--- a/target-arm/kvm64.c
+++ b/target-arm/kvm64.c
@@ -211,6 +211,8 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 return EINVAL;
 }
 
+kvm_arm_sync_mpstate_to_kvm(cpu);
+
 /* TODO:
  * FP state
  */
@@ -310,6 +312,8 @@ int kvm_arch_get_registers(CPUState *cs)
  */
 write_list_to_cpustate(cpu);
 
+kvm_arm_sync_mpstate_to_qemu(cpu);
+
 /* TODO: other registers */
 return ret;
 }
diff --git a/target-arm/kvm_arm.h b/target-arm/kvm_arm.h
index 455dea3..7b75758 100644
--- a/target-arm/kvm_arm.h
+++ b/target-arm/kvm_arm.h
@@ -162,6 +162,24 @@ typedef struct ARMHostCPUClass {
  */
 bool kvm_arm_get_host_cpu_features(ARMHostCPUClass *ahcc);
 
+
+/**
+ * kvm_arm_sync_mpstate_to_kvm
+ * @cpu: ARMCPU
+ *
+ * If supported set the KVM MP_STATE based on QEMUs migration data.
+ */
+int kvm_arm_sync_mpstate_to_kvm(ARMCPU *cpu);
+
+/**
+ * kvm_arm_sync_mpstate_to_qemu
+ * @cpu: ARMCPU
+ *
+ * If supported get the MP_STATE from KVM and store in QEMUs migration
+ * data.
+ */
+int kvm_arm_sync_mpstate_to_qemu(ARMCPU *cpu);
+
 #endif
 
 #endif
-- 
2.3.2




[Qemu-devel] [PATCH v5 5/6] target-arm: kvm64 fix save/restore of SPSR regs

2015-03-23 Thread Alex Bennée
The current code was negatively indexing the cpu state array and not
synchronizing banked spsr register state with the current mode's spsr
state, causing occasional failures with migration.

Some munging is done to take care of the aarch64 mapping and also to
ensure the most current value of the spsr is updated to the banked
registers (relevant for KVM<->TCG migration).

Signed-off-by: Alex Bennée 

---
v2 (ajb)
  - minor tweaks and clarifications
v3
  - Use the correct bank index function for setting/getting env->spsr
  - only deal with spsrs in elevated exception levels
v4
 - try and make commentary clearer
 - ensure env->banked_spsr[0] = env->spsr before we sync
v5
 - fix banking index now banking fixed
 - keep wide spacing on [ ] forms
 - claimed authorship

diff --git a/target-arm/kvm64.c b/target-arm/kvm64.c
index 857e970..5270fa7 100644
--- a/target-arm/kvm64.c
+++ b/target-arm/kvm64.c
@@ -139,6 +139,7 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 uint64_t val;
 int i;
 int ret;
+unsigned int el;
 
 ARMCPU *cpu = ARM_CPU(cs);
 CPUARMState *env = &cpu->env;
@@ -205,9 +206,24 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 return ret;
 }
 
+/* Saved Program State Registers
+ *
+ * Before we restore from the banked_spsr[] array we need to
+ * ensure that any modifications to env->spsr are correctly
+ * reflected in the banks.
+ */
+el = arm_current_el(env);
+if (el > 0) {
+i = is_a64(env) ?
+aarch64_banked_spsr_index(el) :
+bank_number(env->uncached_cpsr & CPSR_M);
+env->banked_spsr[i] = env->spsr;
+}
+
+/* KVM 0-4 map to QEMU banks 1-5 */
 for (i = 0; i < KVM_NR_SPSR; i++) {
 reg.id = AARCH64_CORE_REG(spsr[i]);
-reg.addr = (uintptr_t) &env->banked_spsr[i - 1];
+reg.addr = (uintptr_t) &env->banked_spsr[i + 1];
 ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, ®);
 if (ret) {
 return ret;
@@ -253,11 +269,13 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 return ret;
 }
 
+
 int kvm_arch_get_registers(CPUState *cs)
 {
 struct kvm_one_reg reg;
 uint64_t val;
 uint32_t fpr;
+unsigned int el;
 int i;
 int ret;
 
@@ -330,15 +348,27 @@ int kvm_arch_get_registers(CPUState *cs)
 return ret;
 }
 
+/* Fetch the SPSR registers
+ *
+ * KVM SPSRs 0-4 map to QEMU banks 1-5
+ */
 for (i = 0; i < KVM_NR_SPSR; i++) {
 reg.id = AARCH64_CORE_REG(spsr[i]);
-reg.addr = (uintptr_t) &env->banked_spsr[i - 1];
+reg.addr = (uintptr_t) &env->banked_spsr[i + 1];
 ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, ®);
 if (ret) {
 return ret;
 }
 }
 
+el = arm_current_el(env);
+if (el > 0) {
+i = is_a64(env) ?
+aarch64_banked_spsr_index(el) :
+bank_number(env->uncached_cpsr & CPSR_M);
+env->spsr = env->banked_spsr[i];
+}
+
 /* Advanced SIMD and FP registers
  * We map Qn = regs[2n+1]:regs[2n]
  */
-- 
2.3.2




[Qemu-devel] [PATCH v5 3/6] hw/intc: arm_gic_kvm.c restore config first

2015-03-23 Thread Alex Bennée
As there is logic to deal with the difference between edge and level
triggered interrupts in the kernel we must ensure it knows the
configuration of the IRQs before we restore the pending state.

Signed-off-by: Alex Bennée 
Acked-by: Christoffer Dall 

diff --git a/hw/intc/arm_gic_kvm.c b/hw/intc/arm_gic_kvm.c
index 0d20750..e2512f1 100644
--- a/hw/intc/arm_gic_kvm.c
+++ b/hw/intc/arm_gic_kvm.c
@@ -370,6 +370,11 @@ static void kvm_arm_gic_put(GICState *s)
  * the appropriate CPU interfaces in the kernel) */
 kvm_dist_put(s, 0x800, 8, s->num_irq, translate_targets);
 
+/* irq_state[n].trigger -> GICD_ICFGRn
+ * (restore targets before pending IRQs so we treat level/edge
+ * correctly */
+kvm_dist_put(s, 0xc00, 2, s->num_irq, translate_trigger);
+
 /* irq_state[n].pending + irq_state[n].level -> GICD_ISPENDRn */
 kvm_dist_put(s, 0x280, 1, s->num_irq, translate_clear);
 kvm_dist_put(s, 0x200, 1, s->num_irq, translate_pending);
@@ -378,8 +383,6 @@ static void kvm_arm_gic_put(GICState *s)
 kvm_dist_put(s, 0x380, 1, s->num_irq, translate_clear);
 kvm_dist_put(s, 0x300, 1, s->num_irq, translate_active);
 
-/* irq_state[n].trigger -> GICD_ICFRn */
-kvm_dist_put(s, 0xc00, 2, s->num_irq, translate_trigger);
 
 /* s->priorityX[irq] -> ICD_IPRIORITYRn */
 kvm_dist_put(s, 0x400, 8, s->num_irq, translate_priority);
-- 
2.3.2




[Qemu-devel] [PATCH v5 4/6] target-arm: kvm64 sync FP register state

2015-03-23 Thread Alex Bennée
For migration to work we need to sync all of the register state. This is
especially noticeable when GCC starts using FP registers as spill
registers even with integer programs.

Signed-off-by: Alex Bennée 

---

v4:
  - fixed merge conflicts
  - rm superfluous reg.id++
v5:
  - use interim float128 to deal with endianess
  - correctly map into vfp.regs[]
  - fix spacing around []s

diff --git a/target-arm/kvm64.c b/target-arm/kvm64.c
index fed03f2..857e970 100644
--- a/target-arm/kvm64.c
+++ b/target-arm/kvm64.c
@@ -126,9 +126,16 @@ bool kvm_arm_reg_syncs_via_cpreg_list(uint64_t regidx)
 #define AARCH64_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
  KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
 
+#define AARCH64_SIMD_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U128 | \
+ KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
+
+#define AARCH64_SIMD_CTRL_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U32 | \
+ KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
+
 int kvm_arch_put_registers(CPUState *cs, int level)
 {
 struct kvm_one_reg reg;
+uint32_t fpr;
 uint64_t val;
 int i;
 int ret;
@@ -207,15 +214,42 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 }
 }
 
+/* Advanced SIMD and FP registers
+ * We map Qn = regs[2n+1]:regs[2n]
+ */
+for (i = 0; i < 32; i++) {
+int rd = i << 1;
+float128 fp_val = make_float128(env->vfp.regs[rd + 1],
+env->vfp.regs[rd]);
+reg.id = AARCH64_SIMD_CORE_REG(fp_regs.vregs[i]);
+reg.addr = (uintptr_t)(&fp_val);
+ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, ®);
+if (ret) {
+return ret;
+}
+}
+
+reg.addr = (uintptr_t)(&fpr);
+fpr = vfp_get_fpsr(env);
+reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpsr);
+ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, ®);
+if (ret) {
+return ret;
+}
+
+fpr = vfp_get_fpcr(env);
+reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpcr);
+ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, ®);
+if (ret) {
+return ret;
+}
+
 if (!write_list_to_kvmstate(cpu)) {
 return EINVAL;
 }
 
 kvm_arm_sync_mpstate_to_kvm(cpu);
 
-/* TODO:
- * FP state
- */
 return ret;
 }
 
@@ -223,6 +257,7 @@ int kvm_arch_get_registers(CPUState *cs)
 {
 struct kvm_one_reg reg;
 uint64_t val;
+uint32_t fpr;
 int i;
 int ret;
 
@@ -304,6 +339,38 @@ int kvm_arch_get_registers(CPUState *cs)
 }
 }
 
+/* Advanced SIMD and FP registers
+ * We map Qn = regs[2n+1]:regs[2n]
+ */
+for (i = 0; i < 32; i++) {
+float128 fp_val;
+reg.id = AARCH64_SIMD_CORE_REG(fp_regs.vregs[i]);
+reg.addr = (uintptr_t)(&fp_val);
+ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, ®);
+if (ret) {
+return ret;
+} else {
+int rd = i << 1;
+env->vfp.regs[rd + 1] = fp_val.high;
+env->vfp.regs[rd] = fp_val.low;
+}
+}
+
+reg.addr = (uintptr_t)(&fpr);
+reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpsr);
+ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, ®);
+if (ret) {
+return ret;
+}
+vfp_set_fpsr(env, fpr);
+
+reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpcr);
+ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, ®);
+if (ret) {
+return ret;
+}
+vfp_set_fpcr(env, fpr);
+
 if (!write_kvmstate_to_list(cpu)) {
 return EINVAL;
 }
-- 
2.3.2




[Qemu-devel] [PATCH v5 0/6] QEMU ARM64 Migration Fixes

2015-03-23 Thread Alex Bennée
Hi,

Following some review comments (and a patch) from Peter I've re-spun
this series:

v5
  - Added Peter's SPSR_EL1 state fix for architectural mapping
  - As a result SPSR save/restore no longer does munge
  - FP register save/restore re-done to deal float128 mapping
  - Some minor [ spaces ] added

I submitted the kernel side of this on Friday

Branch: https://github.com/stsquad/qemu/tree/migration/fixes-v5
Kernel: 
https://git.linaro.org/people/alex.bennee/linux.git/shortlog/refs/heads/migration/kvmarm-fixes-for-4.0-v3

Alex Bennée (5):
  target-arm: kvm: save/restore mp state
  hw/intc: arm_gic_kvm.c restore config first
  target-arm: kvm64 sync FP register state
  target-arm: kvm64 fix save/restore of SPSR regs
  target-arm: cpu.h document why env->spsr exists

Peter Maydell (1):
  target-arm: Store SPSR_EL1 state in banked_spsr[1] (SPSR_svc)

 hw/intc/arm_gic_kvm.c   |   7 ++-
 target-arm/cpu.h|   5 +++
 target-arm/helper-a64.c |   2 +-
 target-arm/helper.c |   2 +-
 target-arm/internals.h  |   5 ++-
 target-arm/kvm.c|  40 +
 target-arm/kvm32.c  |   4 ++
 target-arm/kvm64.c  | 111 +---
 target-arm/kvm_arm.h|  18 
 9 files changed, 184 insertions(+), 10 deletions(-)

-- 
2.3.2




[Qemu-devel] [PATCH v5 1/6] target-arm: Store SPSR_EL1 state in banked_spsr[1] (SPSR_svc)

2015-03-23 Thread Alex Bennée
From: Peter Maydell 

The AArch64 SPSR_EL1 register is architecturally mandated to
be mapped to the AArch32 SPSR_svc register. This means its
state should live in QEMU's env->banked_spsr[1] field.
Correct the various places in the code that incorrectly
put it in banked_spsr[0].

Signed-off-by: Peter Maydell 

diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
index 7e0d038..861f6fa 100644
--- a/target-arm/helper-a64.c
+++ b/target-arm/helper-a64.c
@@ -523,7 +523,7 @@ void aarch64_cpu_do_interrupt(CPUState *cs)
 aarch64_save_sp(env, arm_current_el(env));
 env->elr_el[new_el] = env->pc;
 } else {
-env->banked_spsr[0] = cpsr_read(env);
+env->banked_spsr[aarch64_banked_spsr_index(new_el)] = cpsr_read(env);
 if (!env->thumb) {
 env->cp15.esr_el[new_el] |= 1 << 25;
 }
diff --git a/target-arm/helper.c b/target-arm/helper.c
index 10886c5..d77c6de 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -2438,7 +2438,7 @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
 { .name = "SPSR_EL1", .state = ARM_CP_STATE_AA64,
   .type = ARM_CP_ALIAS,
   .opc0 = 3, .opc1 = 0, .crn = 4, .crm = 0, .opc2 = 0,
-  .access = PL1_RW, .fieldoffset = offsetof(CPUARMState, banked_spsr[0]) },
+  .access = PL1_RW, .fieldoffset = offsetof(CPUARMState, banked_spsr[1]) },
 /* We rely on the access checks not allowing the guest to write to the
  * state field when SPSel indicates that it's being used as the stack
  * pointer.
diff --git a/target-arm/internals.h b/target-arm/internals.h
index bb171a7..2cc3017 100644
--- a/target-arm/internals.h
+++ b/target-arm/internals.h
@@ -82,11 +82,14 @@ static inline void arm_log_exception(int idx)
 
 /*
  * For AArch64, map a given EL to an index in the banked_spsr array.
+ * Note that this mapping and the AArch32 mapping defined in bank_number()
+ * must agree such that the AArch64<->AArch32 SPSRs have the architecturally
+ * mandated mapping between each other.
  */
 static inline unsigned int aarch64_banked_spsr_index(unsigned int el)
 {
 static const unsigned int map[4] = {
-[1] = 0, /* EL1.  */
+[1] = 1, /* EL1.  */
 [2] = 6, /* EL2.  */
 [3] = 7, /* EL3.  */
 };
-- 
2.3.2




[Qemu-devel] [PATCH v5 6/6] target-arm: cpu.h document why env->spsr exists

2015-03-23 Thread Alex Bennée
I was getting very confused about the duplication of state so wanted to
make it explicit.

Signed-off-by: Alex Bennée 

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 083211c..6dc1799 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -155,6 +155,11 @@ typedef struct CPUARMState {
This contains all the other bits.  Use cpsr_{read,write} to access
the whole CPSR.  */
 uint32_t uncached_cpsr;
+/* The spsr is a alias for spsr_elN where N is the current
+ * exception level. It is provided for here so the TCG msr/mrs
+ * implementation can access one register. Care needs to be taken
+ * to ensure the banked_spsr[] is also updated.
+ */
 uint32_t spsr;
 
 /* Banked registers.  */
-- 
2.3.2




Re: [Qemu-devel] [PULL 0/3] linux-user patches for 2.3-rc1

2015-03-23 Thread Peter Maydell
On 23 March 2015 at 13:54,   wrote:
> From: Riku Voipio 
>
> The following changes since commit 3c6c9fe034c0c07b77f272e4a53d7735220a16a4:
>
>   Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into 
> staging (2015-03-20 12:26:09 +)
>
> are available in the git repository at:
>
>   git://git.linaro.org/people/riku.voipio/qemu.git 
> tags/pull-linux-user-20150323
>
> for you to fetch changes up to 61c7480fa36775cc2baa2f8141f0c64a15f827b5:
>
>   linux-user: fix broken cpu_copy() (2015-03-23 15:26:42 +0200)
>
> 
> linux-user patches for 2.3-rc1
>
> 

Applied, thanks.

-- PMM



[Qemu-devel] [PATCH 3/4] ahci: Fix sglist offset manipulation for BE machines

2015-03-23 Thread John Snow
This does not bother DMA, because DMA generally transfers
the entire SGList in one shot if it can.

PIO, on the other hand, tries to transfer just one sector
at a time, and will make multiple visits to the sglist
to fetch memory addresses.

Fix the memory address calculaton when we have an offset
by moving the offset addition OUTSIDE of the le64_to_cpu
calculation.

Signed-off-by: John Snow 
Reviewed-by: Stefan Hajnoczi 
Tested-by: Andreas Färber 
Message-id: 1426811056-2202-4-git-send-email-js...@redhat.com
---
 hw/ide/ahci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index e1ae36f..7a223be 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -799,7 +799,7 @@ static int ahci_populate_sglist(AHCIDevice *ad, QEMUSGList 
*sglist,
 
 qemu_sglist_init(sglist, qbus->parent, (sglist_alloc_hint - off_idx),
  ad->hba->as);
-qemu_sglist_add(sglist, le64_to_cpu(tbl[off_idx].addr + off_pos),
+qemu_sglist_add(sglist, le64_to_cpu(tbl[off_idx].addr) + off_pos,
 prdt_tbl_entry_size(&tbl[off_idx]) - off_pos);
 
 for (i = off_idx + 1; i < sglist_alloc_hint; i++) {
-- 
2.1.0




Re: [Qemu-devel] [PATCH 0/4] Ide patches for 2.3-rc1

2015-03-23 Thread John Snow



On 03/23/2015 12:56 PM, John Snow wrote:

The following changes since commit 3c6c9fe034c0c07b77f272e4a53d7735220a16a4:

   Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into 
staging (2015-03-20 12:26:09 +)

are available in the git repository at:

   https://github.com/jnsnow/qemu.git tags/ide-pull-request

for you to fetch changes up to 54fced034e4d32d8ba6d1e27ecb7e2e2fb2f45d4:

   ahci-test: improve rw buffer patterns (2015-03-23 12:24:16 -0400)





John Snow (4):
   ide: fix cmd_write_pio when nsectors > 1
   ide: fix cmd_read_pio when nsectors > 1
   ahci: Fix sglist offset manipulation for BE machines
   ahci-test: improve rw buffer patterns

  hw/ide/ahci.c |  2 +-
  hw/ide/core.c | 10 --
  tests/ahci-test.c | 36 
  3 files changed, 33 insertions(+), 15 deletions(-)



Script failure, sorry :(

Should read [PULL], of course ...



[Qemu-devel] [PATCH 2/4] ide: fix cmd_read_pio when nsectors > 1

2015-03-23 Thread John Snow
Similar to the cmd_write_pio fix, update the nsector count and
ide sector before we invoke ide_transfer_start.

Signed-off-by: John Snow 
Reviewed-by: Stefan Hajnoczi 
Tested-by: Andreas Färber 
Message-id: 1426811056-2202-3-git-send-email-js...@redhat.com
---
 hw/ide/core.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 0e9da64..a895fd8 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -587,14 +587,12 @@ static void ide_sector_read_cb(void *opaque, int ret)
 n = s->req_nb_sectors;
 }
 
+ide_set_sector(s, ide_get_sector(s) + n);
+s->nsector -= n;
 /* Allow the guest to read the io_buffer */
 ide_transfer_start(s, s->io_buffer, n * BDRV_SECTOR_SIZE, ide_sector_read);
-
-ide_set_irq(s->bus);
-
-ide_set_sector(s, ide_get_sector(s) + n);
-s->nsector -= n;
 s->io_buffer_offset += 512 * n;
+ide_set_irq(s->bus);
 }
 
 static void ide_sector_read(IDEState *s)
-- 
2.1.0




[Qemu-devel] [PATCH 4/4] ahci-test: improve rw buffer patterns

2015-03-23 Thread John Snow
My pattern was cyclical every 256 bytes, so it missed a fairly obvious
failure case. Add some rand() pepper into the test pattern, and for large
patterns that exceed 256 sectors, start writing an ID per-sector so that
we never generate identical sector patterns.

Signed-off-by: John Snow 
Reviewed-by: Stefan Hajnoczi 
Tested-by: Andreas Färber 
Message-id: 1426811056-2202-5-git-send-email-js...@redhat.com
---
 tests/ahci-test.c | 36 
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/tests/ahci-test.c b/tests/ahci-test.c
index 169e83b..ea62e24 100644
--- a/tests/ahci-test.c
+++ b/tests/ahci-test.c
@@ -68,6 +68,32 @@ static void string_bswap16(uint16_t *s, size_t bytes)
 }
 }
 
+static void generate_pattern(void *buffer, size_t len, size_t cycle_len)
+{
+int i, j;
+unsigned char *tx = (unsigned char *)buffer;
+unsigned char p;
+size_t *sx;
+
+/* Write an indicative pattern that varies and is unique per-cycle */
+p = rand() % 256;
+for (i = j = 0; i < len; i++, j++) {
+tx[i] = p;
+if (j % cycle_len == 0) {
+p = rand() % 256;
+}
+}
+
+/* force uniqueness by writing an id per-cycle */
+for (i = 0; i < len / cycle_len; i++) {
+j = i * cycle_len;
+if (j + sizeof(*sx) <= len) {
+sx = (size_t *)&tx[j];
+*sx = i;
+}
+}
+}
+
 /*** Test Setup & Teardown ***/
 
 /**
@@ -736,7 +762,6 @@ static void ahci_test_io_rw_simple(AHCIQState *ahci, 
unsigned bufsize,
 {
 uint64_t ptr;
 uint8_t port;
-unsigned i;
 unsigned char *tx = g_malloc(bufsize);
 unsigned char *rx = g_malloc0(bufsize);
 
@@ -752,9 +777,7 @@ static void ahci_test_io_rw_simple(AHCIQState *ahci, 
unsigned bufsize,
 g_assert(ptr);
 
 /* Write some indicative pattern to our buffer. */
-for (i = 0; i < bufsize; i++) {
-tx[i] = (bufsize - i);
-}
+generate_pattern(tx, bufsize, AHCI_SECTOR_SIZE);
 memwrite(ptr, tx, bufsize);
 
 /* Write this buffer to disk, then read it back to the DMA buffer. */
@@ -865,7 +888,6 @@ static void test_dma_fragmented(void)
 size_t bufsize = 4096;
 unsigned char *tx = g_malloc(bufsize);
 unsigned char *rx = g_malloc0(bufsize);
-unsigned i;
 uint64_t ptr;
 
 ahci = ahci_boot_and_enable();
@@ -873,9 +895,7 @@ static void test_dma_fragmented(void)
 ahci_port_clear(ahci, px);
 
 /* create pattern */
-for (i = 0; i < bufsize; i++) {
-tx[i] = (bufsize - i);
-}
+generate_pattern(tx, bufsize, AHCI_SECTOR_SIZE);
 
 /* Create a DMA buffer in guest memory, and write our pattern to it. */
 ptr = guest_alloc(ahci->parent->alloc, bufsize);
-- 
2.1.0




[Qemu-devel] [PATCH 0/4] Ide patches for 2.3-rc1

2015-03-23 Thread John Snow
The following changes since commit 3c6c9fe034c0c07b77f272e4a53d7735220a16a4:

  Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into 
staging (2015-03-20 12:26:09 +)

are available in the git repository at:

  https://github.com/jnsnow/qemu.git tags/ide-pull-request

for you to fetch changes up to 54fced034e4d32d8ba6d1e27ecb7e2e2fb2f45d4:

  ahci-test: improve rw buffer patterns (2015-03-23 12:24:16 -0400)





John Snow (4):
  ide: fix cmd_write_pio when nsectors > 1
  ide: fix cmd_read_pio when nsectors > 1
  ahci: Fix sglist offset manipulation for BE machines
  ahci-test: improve rw buffer patterns

 hw/ide/ahci.c |  2 +-
 hw/ide/core.c | 10 --
 tests/ahci-test.c | 36 
 3 files changed, 33 insertions(+), 15 deletions(-)

-- 
2.1.0




[Qemu-devel] [PATCH 1/4] ide: fix cmd_write_pio when nsectors > 1

2015-03-23 Thread John Snow
We need to adjust the sector being written to
prior to calling ide_transfer_start, otherwise
we'll write to the same sector again.

Signed-off-by: John Snow 
Reviewed-by: Stefan Hajnoczi 
Tested-by: Andreas Färber 
Message-id: 1426811056-2202-2-git-send-email-js...@redhat.com
---
 hw/ide/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index ef52f35..0e9da64 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -846,6 +846,7 @@ static void ide_sector_write_cb(void *opaque, int ret)
 s->nsector -= n;
 s->io_buffer_offset += 512 * n;
 
+ide_set_sector(s, ide_get_sector(s) + n);
 if (s->nsector == 0) {
 /* no more sectors to write */
 ide_transfer_stop(s);
@@ -857,7 +858,6 @@ static void ide_sector_write_cb(void *opaque, int ret)
 ide_transfer_start(s, s->io_buffer, n1 * BDRV_SECTOR_SIZE,
ide_sector_write);
 }
-ide_set_sector(s, ide_get_sector(s) + n);
 
 if (win2k_install_hack && ((++s->irq_count % 16) == 0)) {
 /* It seems there is a bug in the Windows 2000 installer HDD
-- 
2.1.0




Re: [Qemu-devel] RFC: memory API changes

2015-03-23 Thread Peter Maydell
On 23 March 2015 at 16:30, Paolo Bonzini  wrote:
> None of them does, uses of ld*_phys(cs->as, ...) are almost all in
> target-* already.

Oh good :-)

Regardless, I don't see why this means that ld*_phys should be
in cpu-common.h or cpu-all.h rather than memory.h ?

> There's just hw/ppc/spapr_hcall.c and hw/ppc/spapr_iommu.c that are
> using cs->as in hw/, but they are correct because these are hypercalls.

Also hw/ppc/ppc405_uc.c (using stl_be_phys).

> The others use &address_space_memory.  I sent a patch to fix megasas and
> vmware.

Thanks.

-- PMM



Re: [Qemu-devel] [PATCH v2] target-tricore: Fix two helper functions (clang warnings)

2015-03-23 Thread Bastian Koppelmann

Hi Stefan,

On 03/23/2015 03:41 PM, Bastian Koppelmann wrote:

[snip]

The code for dvinit.b and dvinit.h is not correctly calculating the 
overflow bit, but I can't figure out why, since the documentation of 
the real hardware seems to be wrong. I will take your patch for now, 
so the warning is gone and try to come up with a proper fix as soon as 
possible.


Cheers,
Bastian

I have the proper fix now and will send a patch to the list in a few 
minutes. I'll keep your patch and will apply the fix on top of it.


Cheers,
Bastian



[Qemu-devel] [PATCH] target-tricore: properly fix dvinit_b/h_13

2015-03-23 Thread Bastian Koppelmann
The TriCore documentation was wrong on how to calculate ovf bits for those two
instructions, which I confirmed with real hardware (TC1796 chip). An ovf
actually happens, if the result (without remainder) does not fit into 8/16 bits.

Signed-off-by: Bastian Koppelmann 
---
 target-tricore/op_helper.c | 40 ++--
 1 file changed, 10 insertions(+), 30 deletions(-)

diff --git a/target-tricore/op_helper.c b/target-tricore/op_helper.c
index 2cfa95d..220ec4a 100644
--- a/target-tricore/op_helper.c
+++ b/target-tricore/op_helper.c
@@ -1942,29 +1942,19 @@ uint64_t helper_unpack(target_ulong arg1)
 uint64_t helper_dvinit_b_13(CPUTriCoreState *env, uint32_t r1, uint32_t r2)
 {
 uint64_t ret;
-int32_t abs_sig_dividend, abs_base_dividend, abs_divisor;
-int32_t quotient_sign;
+int32_t abs_sig_dividend, abs_divisor;
 
 ret = sextract32(r1, 0, 32);
 ret = ret << 24;
-quotient_sign = 0;
 if (!((r1 & 0x8000) == (r2 & 0x8000))) {
 ret |= 0xff;
-quotient_sign = 1;
 }
 
-abs_sig_dividend = abs((int32_t)r1) >> 7;
-abs_base_dividend = abs((int32_t)r1) & 0x7f;
+abs_sig_dividend = abs((int32_t)r1) >> 8;
 abs_divisor = abs((int32_t)r2);
-/* calc overflow */
-env->PSW_USB_V = 0;
-if ((quotient_sign) && (abs_divisor)) {
-env->PSW_USB_V = (((abs_sig_dividend == abs_divisor) &&
- (abs_base_dividend >= abs_divisor)) ||
- (abs_sig_dividend > abs_divisor));
-} else {
-env->PSW_USB_V = (abs_sig_dividend >= abs_divisor);
-}
+/* calc overflow
+   ofv if (a/b >= 255) <=> (a/255 >= b) */
+env->PSW_USB_V = (abs_sig_dividend >= abs_divisor) << 31;
 env->PSW_USB_V = env->PSW_USB_V << 31;
 env->PSW_USB_SV |= env->PSW_USB_V;
 env->PSW_USB_AV = 0;
@@ -1992,29 +1982,19 @@ uint64_t helper_dvinit_b_131(CPUTriCoreState *env, 
uint32_t r1, uint32_t r2)
 uint64_t helper_dvinit_h_13(CPUTriCoreState *env, uint32_t r1, uint32_t r2)
 {
 uint64_t ret;
-int32_t abs_sig_dividend, abs_base_dividend, abs_divisor;
-int32_t quotient_sign;
+int32_t abs_sig_dividend, abs_divisor;
 
 ret = sextract32(r1, 0, 32);
 ret = ret << 16;
-quotient_sign = 0;
 if (!((r1 & 0x8000) == (r2 & 0x8000))) {
 ret |= 0x;
-quotient_sign = 1;
 }
 
-abs_sig_dividend = abs((int32_t)r1) >> 7;
-abs_base_dividend = abs((int32_t)r1) & 0x7f;
+abs_sig_dividend = abs((int32_t)r1) >> 16;
 abs_divisor = abs((int32_t)r2);
-/* calc overflow */
-env->PSW_USB_V = 0;
-if ((quotient_sign) && (abs_divisor)) {
-env->PSW_USB_V = (((abs_sig_dividend == abs_divisor) &&
- (abs_base_dividend >= abs_divisor)) ||
- (abs_sig_dividend > abs_divisor));
-} else {
-env->PSW_USB_V = (abs_sig_dividend >= abs_divisor);
-}
+/* calc overflow
+   ofv if (a/b >= 0x) <=> (a/0x >= b) */
+env->PSW_USB_V = (abs_sig_dividend >= abs_divisor) << 31;
 env->PSW_USB_V = env->PSW_USB_V << 31;
 env->PSW_USB_SV |= env->PSW_USB_V;
 env->PSW_USB_AV = 0;
-- 
2.3.3




Re: [Qemu-devel] [PATCH v3 4/4] configure: Add workaround for ccache and clang

2015-03-23 Thread John Snow



On 03/23/2015 11:14 AM, Peter Maydell wrote:

On 23 March 2015 at 14:52, John Snow  wrote:

On 03/23/2015 09:11 AM, Peter Maydell wrote:

This is really working around a bug in either ccache or
in the way Fedora has configured ccache, so I kind of
feel it ought to be dealt with there. However I don't
object too much to our including the workaround in our
configure...



I feel like it might be an inescapable consequence of using both ccache and
clang together on any system, not just Fedora.


Right, but it's not *QEMU* specific, so it would be better


OK, true.


if either (a) ccache automatically enabled this if it
noticed it was being run for clang or (b) Fedora automatically
enabled this in their aliases which make 'clang' automatically
be "clang run via ccache".

Is there a bug filed against Fedora for this?

-- PMM



Not that I can see on the RH or ccache/LLVM bugzillas at a quick glance. 
I can try to file something for ccache after I look a little more carefully.


In the meantime, until things improve, I think this workaround is sane 
if we want to deal with clang failures less.


I'll clean up the other minor comments and resubmit.

--js



Re: [Qemu-devel] RFC: memory API changes

2015-03-23 Thread Andreas Färber
Am 23.03.2015 um 16:11 schrieb Peter Maydell:
> On 23 March 2015 at 14:39, Paolo Bonzini  wrote:
>> On 23/03/2015 13:24, Peter Maydell wrote:
>>>  * cpu_physical_memory_rw are obsolete and should be replaced
>>>with uses of the as_* functions -- we should at least note
>>>this in the header file. (Can't do this as an automated change
>>>really because the correct AS to use is callsite dependent.)
>>
>> All users that should _not_ be using address_space_memory have been
>> already changed to address_space_rw, or should have, so it can be done
>> automatically.  Same for cpu_physical_memory_map/unmap, BTW.
> 
> Hmm. Checking a few, I notice that for instance the kvm-all.c
> cpu_physical_memory_rw() should probably be using cpu->as.
[snip]

Igor, does that impact our APIC discussion?

Regards,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu,
Graham Norton; HRB 21284 (AG Nürnberg)



Re: [Qemu-devel] RFC: memory API changes

2015-03-23 Thread Paolo Bonzini


On 23/03/2015 17:00, Peter Maydell wrote:
> On 23 March 2015 at 15:47, Paolo Bonzini  wrote:
>> Ok, so most ld*_phys users are already using cs->as.
> 
> But this is only because when the AddressSpace* argument
> to ld*_phys was introduced we made it be cs->as for
> existing callers, to retain the existing behaviour.
> That doesn't mean that callers using cs->as are doing
> the right thing. In general I would expect that any
> caller that's not part of the CPU itself should not be
> rummaging around inside the CPU struct to find an
> AddressSpace to use...

None of them does, uses of ld*_phys(cs->as, ...) are almost all in
target-* already.

There's just hw/ppc/spapr_hcall.c and hw/ppc/spapr_iommu.c that are
using cs->as in hw/, but they are correct because these are hypercalls.

The others use &address_space_memory.  I sent a patch to fix megasas and
vmware.

Paolo



[Qemu-devel] [PATCH] vmw_pvscsi: use PCI DMA APIs

2015-03-23 Thread Paolo Bonzini
It is wrong to use address_space_memory directly, because there could be an
IOMMU in the middle.  Passing the entire PVSCSIRingInfo to RS_GET_FIELD
and RS_SET_FIELD makes it easy to go back to the PVSCSIState.

Signed-off-by: Paolo Bonzini 
---
 hw/scsi/vmw_pvscsi.c | 42 +-
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/hw/scsi/vmw_pvscsi.c b/hw/scsi/vmw_pvscsi.c
index d3a92fb..c6148d3 100644
--- a/hw/scsi/vmw_pvscsi.c
+++ b/hw/scsi/vmw_pvscsi.c
@@ -42,12 +42,12 @@
 #define PVSCSI_MAX_CMD_DATA_WORDS \
 (sizeof(PVSCSICmdDescSetupRings)/sizeof(uint32_t))
 
-#define RS_GET_FIELD(rs_pa, field) \
-(ldl_le_phys(&address_space_memory, \
- rs_pa + offsetof(struct PVSCSIRingsState, field)))
-#define RS_SET_FIELD(rs_pa, field, val) \
-(stl_le_phys(&address_space_memory, \
- rs_pa + offsetof(struct PVSCSIRingsState, field), val))
+#define RS_GET_FIELD(m, field) \
+(ldl_le_pci_dma(&container_of(m, PVSCSIState, rings)->parent_obj, \
+ (m)->rs_pa + offsetof(struct PVSCSIRingsState, field)))
+#define RS_SET_FIELD(m, field, val) \
+(stl_le_pci_dma(&container_of(m, PVSCSIState, rings)->parent_obj, \
+ (m)->rs_pa + offsetof(struct PVSCSIRingsState, field), val))
 
 #define TYPE_PVSCSI "pvscsi"
 #define PVSCSI(obj) OBJECT_CHECK(PVSCSIState, (obj), TYPE_PVSCSI)
@@ -153,13 +153,13 @@ pvscsi_ring_init_data(PVSCSIRingInfo *m, 
PVSCSICmdDescSetupRings *ri)
 m->cmp_ring_pages_pa[i] = ri->cmpRingPPNs[i] << VMW_PAGE_SHIFT;
 }
 
-RS_SET_FIELD(m->rs_pa, reqProdIdx, 0);
-RS_SET_FIELD(m->rs_pa, reqConsIdx, 0);
-RS_SET_FIELD(m->rs_pa, reqNumEntriesLog2, txr_len_log2);
+RS_SET_FIELD(m, reqProdIdx, 0);
+RS_SET_FIELD(m, reqConsIdx, 0);
+RS_SET_FIELD(m, reqNumEntriesLog2, txr_len_log2);
 
-RS_SET_FIELD(m->rs_pa, cmpProdIdx, 0);
-RS_SET_FIELD(m->rs_pa, cmpConsIdx, 0);
-RS_SET_FIELD(m->rs_pa, cmpNumEntriesLog2, rxr_len_log2);
+RS_SET_FIELD(m, cmpProdIdx, 0);
+RS_SET_FIELD(m, cmpConsIdx, 0);
+RS_SET_FIELD(m, cmpNumEntriesLog2, rxr_len_log2);
 
 trace_pvscsi_ring_init_data(txr_len_log2, rxr_len_log2);
 
@@ -185,9 +185,9 @@ pvscsi_ring_init_msg(PVSCSIRingInfo *m, 
PVSCSICmdDescSetupMsgRing *ri)
 m->msg_ring_pages_pa[i] = ri->ringPPNs[i] << VMW_PAGE_SHIFT;
 }
 
-RS_SET_FIELD(m->rs_pa, msgProdIdx, 0);
-RS_SET_FIELD(m->rs_pa, msgConsIdx, 0);
-RS_SET_FIELD(m->rs_pa, msgNumEntriesLog2, len_log2);
+RS_SET_FIELD(m, msgProdIdx, 0);
+RS_SET_FIELD(m, msgConsIdx, 0);
+RS_SET_FIELD(m, msgNumEntriesLog2, len_log2);
 
 trace_pvscsi_ring_init_msg(len_log2);
 
@@ -213,7 +213,7 @@ pvscsi_ring_cleanup(PVSCSIRingInfo *mgr)
 static hwaddr
 pvscsi_ring_pop_req_descr(PVSCSIRingInfo *mgr)
 {
-uint32_t ready_ptr = RS_GET_FIELD(mgr->rs_pa, reqProdIdx);
+uint32_t ready_ptr = RS_GET_FIELD(mgr, reqProdIdx);
 
 if (ready_ptr != mgr->consumed_ptr) {
 uint32_t next_ready_ptr =
@@ -233,7 +233,7 @@ pvscsi_ring_pop_req_descr(PVSCSIRingInfo *mgr)
 static void
 pvscsi_ring_flush_req(PVSCSIRingInfo *mgr)
 {
-RS_SET_FIELD(mgr->rs_pa, reqConsIdx, mgr->consumed_ptr);
+RS_SET_FIELD(mgr, reqConsIdx, mgr->consumed_ptr);
 }
 
 static hwaddr
@@ -278,14 +278,14 @@ pvscsi_ring_flush_cmp(PVSCSIRingInfo *mgr)
 
 trace_pvscsi_ring_flush_cmp(mgr->filled_cmp_ptr);
 
-RS_SET_FIELD(mgr->rs_pa, cmpProdIdx, mgr->filled_cmp_ptr);
+RS_SET_FIELD(mgr, cmpProdIdx, mgr->filled_cmp_ptr);
 }
 
 static bool
 pvscsi_ring_msg_has_room(PVSCSIRingInfo *mgr)
 {
-uint32_t prodIdx = RS_GET_FIELD(mgr->rs_pa, msgProdIdx);
-uint32_t consIdx = RS_GET_FIELD(mgr->rs_pa, msgConsIdx);
+uint32_t prodIdx = RS_GET_FIELD(mgr, msgProdIdx);
+uint32_t consIdx = RS_GET_FIELD(mgr, msgConsIdx);
 
 return (prodIdx - consIdx) < (mgr->msg_len_mask + 1);
 }
@@ -298,7 +298,7 @@ pvscsi_ring_flush_msg(PVSCSIRingInfo *mgr)
 
 trace_pvscsi_ring_flush_msg(mgr->filled_msg_ptr);
 
-RS_SET_FIELD(mgr->rs_pa, msgProdIdx, mgr->filled_msg_ptr);
+RS_SET_FIELD(mgr, msgProdIdx, mgr->filled_msg_ptr);
 }
 
 static void
-- 
2.3.3




[Qemu-devel] [PATCH] megasas: use PCI DMA APIs

2015-03-23 Thread Paolo Bonzini
It is wrong to use address_space_memory directly, because there could be an
IOMMU in the middle.

Signed-off-by: Paolo Bonzini 
---
 hw/scsi/megasas.c | 51 ---
 1 file changed, 24 insertions(+), 27 deletions(-)

diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index bf83b65..ad7317b 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -171,26 +171,29 @@ static bool megasas_is_jbod(MegasasState *s)
 return s->flags & MEGASAS_MASK_USE_JBOD;
 }
 
-static void megasas_frame_set_cmd_status(unsigned long frame, uint8_t v)
+static void megasas_frame_set_cmd_status(MegasasState *s,
+ unsigned long frame, uint8_t v)
 {
-stb_phys(&address_space_memory,
- frame + offsetof(struct mfi_frame_header, cmd_status), v);
+PCIDevice *pci = &s->parent_obj;
+stb_pci_dma(pci, frame + offsetof(struct mfi_frame_header, cmd_status), v);
 }
 
-static void megasas_frame_set_scsi_status(unsigned long frame, uint8_t v)
+static void megasas_frame_set_scsi_status(MegasasState *s,
+  unsigned long frame, uint8_t v)
 {
-stb_phys(&address_space_memory,
- frame + offsetof(struct mfi_frame_header, scsi_status), v);
+PCIDevice *pci = &s->parent_obj;
+stb_pci_dma(pci, frame + offsetof(struct mfi_frame_header, scsi_status), 
v);
 }
 
 /*
  * Context is considered opaque, but the HBA firmware is running
  * in little endian mode. So convert it to little endian, too.
  */
-static uint64_t megasas_frame_get_context(unsigned long frame)
+static uint64_t megasas_frame_get_context(MegasasState *s,
+  unsigned long frame)
 {
-return ldq_le_phys(&address_space_memory,
-   frame + offsetof(struct mfi_frame_header, context));
+PCIDevice *pci = &s->parent_obj;
+return ldq_le_pci_dma(pci, frame + offsetof(struct mfi_frame_header, 
context));
 }
 
 static bool megasas_frame_is_ieee_sgl(MegasasCmd *cmd)
@@ -523,8 +526,7 @@ static MegasasCmd *megasas_enqueue_frame(MegasasState *s,
 s->busy++;
 
 if (s->consumer_pa) {
-s->reply_queue_tail = ldl_le_phys(&address_space_memory,
-  s->consumer_pa);
+s->reply_queue_tail = ldl_le_pci_dma(pcid, s->consumer_pa);
 }
 trace_megasas_qf_enqueue(cmd->index, cmd->count, cmd->context,
  s->reply_queue_head, s->reply_queue_tail, 
s->busy);
@@ -547,29 +549,24 @@ static void megasas_complete_frame(MegasasState *s, 
uint64_t context)
  */
 if (megasas_use_queue64(s)) {
 queue_offset = s->reply_queue_head * sizeof(uint64_t);
-stq_le_phys(&address_space_memory,
-s->reply_queue_pa + queue_offset, context);
+stq_le_pci_dma(pci_dev, s->reply_queue_pa + queue_offset, context);
 } else {
 queue_offset = s->reply_queue_head * sizeof(uint32_t);
-stl_le_phys(&address_space_memory,
-s->reply_queue_pa + queue_offset, context);
+stl_le_pci_dma(pci_dev, s->reply_queue_pa + queue_offset, context);
 }
-s->reply_queue_tail = ldl_le_phys(&address_space_memory,
-  s->consumer_pa);
+s->reply_queue_tail = ldl_le_pci_dma(pci_dev, s->consumer_pa);
 trace_megasas_qf_complete(context, s->reply_queue_head,
   s->reply_queue_tail, s->busy);
 }
 
 if (megasas_intr_enabled(s)) {
 /* Update reply queue pointer */
-s->reply_queue_tail = ldl_le_phys(&address_space_memory,
-  s->consumer_pa);
+s->reply_queue_tail = ldl_le_pci_dma(pci_dev, s->consumer_pa);
 tail = s->reply_queue_head;
 s->reply_queue_head = megasas_next_index(s, tail, s->fw_cmds);
 trace_megasas_qf_update(s->reply_queue_head, s->reply_queue_tail,
 s->busy);
-stl_le_phys(&address_space_memory,
-s->producer_pa, s->reply_queue_head);
+stl_le_pci_dma(pci_dev, s->producer_pa, s->reply_queue_head);
 /* Notify HBA */
 if (msix_enabled(pci_dev)) {
 trace_megasas_msix_raise(0);
@@ -651,8 +648,8 @@ static int megasas_init_firmware(MegasasState *s, 
MegasasCmd *cmd)
 pa_lo = le32_to_cpu(initq->pi_addr_lo);
 pa_hi = le32_to_cpu(initq->pi_addr_hi);
 s->producer_pa = ((uint64_t) pa_hi << 32) | pa_lo;
-s->reply_queue_head = ldl_le_phys(&address_space_memory, s->producer_pa);
-s->reply_queue_tail = ldl_le_phys(&address_space_memory, s->consumer_pa);
+s->reply_queue_head = ldl_le_pci_dma(pcid, s->producer_pa);
+s->reply_queue_tail = ldl_le_pci_dma(pcid, s->consumer_pa);
 flags = le32_to_cpu(initq->flags);
 if (flags & MFI_QUEUE_FLAG_CONTEXT64) {
 s->flags |= MEGASAS_MASK_USE_QUEUE64;
@@

Re: [Qemu-devel] RFC: memory API changes

2015-03-23 Thread Peter Maydell
On 23 March 2015 at 15:47, Paolo Bonzini  wrote:
> Ok, so most ld*_phys users are already using cs->as.

But this is only because when the AddressSpace* argument
to ld*_phys was introduced we made it be cs->as for
existing callers, to retain the existing behaviour.
That doesn't mean that callers using cs->as are doing
the right thing. In general I would expect that any
caller that's not part of the CPU itself should not be
rummaging around inside the CPU struct to find an
AddressSpace to use...

-- PMM



Re: [Qemu-devel] [PATCH for-2.3] sdhci: add "drive" property

2015-03-23 Thread Paolo Bonzini


On 23/03/2015 16:01, Peter Crosthwaite wrote:
> Note that SD has a SPI mode, every SD card is in theory a valid SSI
> device. We could unify SD under SSI to achieve both QOMification and
> busification.

That wouldn't really be the way most SD cards work, anyway.  The
protocol doesn't do simultaneous bidirectional transfers like SSI does.

With your proposal the right way to invoke sdhci-pci would be "-device
sdhci-pci,id=sdhci -drive if=none,...,id=sd -device
sd,drive=sd,bus=sdhci.0".

Similarly, instantiating an SD card on an SSI bus would be "-device
ssi-sd,id=ssisd -drive if=none,...,id=sd -device sd,drive=sd,bus=ssisd.0".

> I would then expect the block setup of sd.c to be very similar to
> hw/block/m25p80.c (SPI flash).

Absolutely not, hw/block/m25p80.c has the same issue of doing
drive_get_next in the realize function.

Paolo



Re: [Qemu-devel] [PATCH v3 5/5] fw_cfg: insert fw_cfg file blobs via qemu cmdline

2015-03-23 Thread Paolo Bonzini


On 23/03/2015 15:11, Laszlo Ersek wrote:
> Depends on the exact maintainer picking up your series.
> 
> ... Now good luck figuring out who that's going to be :)
> 
> (I ran "scripts/get_maintainer.pl" on a squashed diff for this series.
> It named only Paolo, but for the wrong reason: "vl.c" matched "Main
> loop" from the MAINTAINERS file. fw_cfg has no dedicated owner, apparently.)

Gerd replied with "Looks good to me", so I think it should go through
him or through mst.

Paolo



Re: [Qemu-devel] [PATCH v3 7/9] omap_intc: convert ffs(3) to ctz32() in omap_inth_sir_update()

2015-03-23 Thread Paolo Bonzini


On 23/03/2015 16:29, Stefan Hajnoczi wrote:
> From: Paolo Bonzini 
> 
> The loop previously terminated on ffs(0) == 0, now it terminates on
> ctz32(0) + 1 == 33.

... now it terminates on level == 0.

Paolo



Re: [Qemu-devel] RFC: memory API changes

2015-03-23 Thread Paolo Bonzini


On 23/03/2015 16:39, Peter Maydell wrote:
> > Not necessarily, for example PCI devices can certainly
> > use the short name.
>
> PCI devices should all be using ld*_pci_dma and st*_pci_dma
> so they go through the correct address space for the PCIDevice.
> Any PCI device making direct calls to ld*_phys/st*_phys
> is broken...

Ok, so most ld*_phys users are already using cs->as.  The remaining ones
are:

hw/alpha/typhoon.c
hw/display/sm501_template.h
hw/dma/pl080.c
hw/dma/sun4m_iommu.c
hw/net/vmware_utils.h
hw/pci-host/apb.c
hw/s390x/css.c
hw/s390x/s390-pci-bus.c
hw/s390x/s390-virtio-bus.c
hw/s390x/virtio-ccw.c
hw/scsi/megasas.c
hw/scsi/vmw_pvscsi.c

So let's keep ld*_phys in cpu-common.h, fix the uses above, and then
move it to cpu-all.h (cpu-common.h probably should die).

Paolo



Re: [Qemu-devel] [PATCH 1/2] object: Add can_be_deleted callback to TypeInfo and TypeImpl

2015-03-23 Thread Lin Ma


在 2015年03月23日 20:06, Paolo Bonzini 写道:


On 23/03/2015 11:36, Peter Crosthwaite wrote:

I don't think TypeInfo is the right place for this. You can however
define function hooks for Object in ObjectClass. See the unparent
field of ObjectClass for a precedent.

In this case, the right place could be UserCreatable.  Alternatively...
Well... Because I'm not familiar with qom very much, Let me dig more 
details about UserCreatable first :-)



But is a better way to do this to add error handling to
object_unparent API and override object_unparent for your device in
question to throw the error? Then your change doesn't have to be
limited to QMP.

... this is also a good choice.

Paolo






  1   2   3   >