Re: [Qemu-devel] [PATCH] spapr: Don't accidentally advertise HTM support on POWER9

2017-05-21 Thread Sam Bobroff
On Tue, May 09, 2017 at 03:04:58PM +1000, David Gibson wrote:
> Logic in spapr_populate_pa_features() enables the bit advertising
> Hardware Transactional Memory (HTM) in the guest's device tree only when
> KVM advertises its availability with the KVM_CAP_PPC_HTM feature.
> 
> However, this assumes that the HTM bit is off in the base template used for
> the device tree value.  That is true for POWER8, but not for POWER9.
> 
> It looks like that was accidentally changed in 9fb4541 "spapr: Enable ISA
> 3.0 MMU mode selection via CAS".
> 
> Fixes: 9fb4541f5803f8d2ba116b12113386e26482ba30
> 
> Signed-off-by: David Gibson 
> ---
>  hw/ppc/spapr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index e2dc77c..1b7cada 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -219,7 +219,7 @@ static void spapr_populate_pa_features(CPUPPCState *env, 
> void *fdt, int offset,
>  /* 16: Vector */
>  0x00, 0x00, 0x00, 0x00, 0x80, 0x00, /* 12 - 17 */
>  /* 18: Vec. Scalar, 20: Vec. XOR, 22: HTM */
> -0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 18 - 23 */
> +0x80, 0x00, 0x80, 0x00, 0x00, 0x00, /* 18 - 23 */
>  /* 24: Ext. Dec, 26: 64 bit ftrs, 28: PM ftrs */
>  0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 24 - 29 */
>  /* 30: MMR, 32: LE atomic, 34: EBB + ext EBB */
> -- 
> 2.9.3
Looks good to me.

Reviewed-by: Sam Bobroff 




Re: [Qemu-devel] [Qemu-arm] [PATCH 1/2] hw/adc: Add basic Aspeed ADC model

2017-05-21 Thread Andrew Jeffery
On Mon, 2017-05-22 at 03:15 +, Ryan Chen wrote:
> In ASPEED SoC chip, all register access have following rule. 
> Most of controller write access is only support 32bit access. 
> Read is support 8bits/16bits/32bits. 

Thanks for clearing that up Ryan.

Phil: I'll rework the model so the reads are 16-bits.

Thanks,

Andrew

> 
> 
> 
> Best Regards,
> Ryan
> 
> 信驊科技股份有限公司
> ASPEED Technology Inc.
> 2F,No.15,Industry East Road 4.,Hsinchu Science Park, Hsinchu City 30077, 
> Taiwan
> Tel: 886-3-578-9568  #857
> Fax: 886-3-578-9586
> * Email Confidentiality Notice 
> DISCLAIMER:
> This message (and any attachments) may contain legally privileged and/or 
> other confidential information. If you have received it in error, please 
> notify the sender by reply e-mail and immediately delete the e-mail and any 
> attachments without copying or disclosing the contents. Thank you.
> 
> -Original Message-
> > From: Andrew Jeffery [mailto:and...@aj.id.au] 
> Sent: Monday, May 22, 2017 8:24 AM
> > To: Philippe Mathieu-Daudé ; qemu-...@nongnu.org
> > > > Cc: Peter Maydell ; Ryan Chen 
> > > > ; Alistair Francis ; 
> > > > qemu-devel@nongnu.org; Cédric Le Goater ; Joel Stanley 
> > > > 
> Subject: Re: [Qemu-arm] [PATCH 1/2] hw/adc: Add basic Aspeed ADC model
> 
> Hi Phil,
> 
> On Sat, 2017-05-20 at 00:21 -0300, Philippe Mathieu-Daudé wrote:
> > Hi Andrew,
> > 
> > On 05/19/2017 09:26 PM, Andrew Jeffery wrote:
> > > This model implements enough behaviour to do basic functionality 
> > > tests such as device initialisation and read out of dummy sample 
> > > values. The sample value generation strategy is similar to the STM 
> > > ADC already in the tree.
> > > 
> > > > > Signed-off-by: Andrew Jeffery 
> > > 
> > > ---
> > >  hw/adc/Makefile.objs|   1 +
> > >  hw/adc/aspeed_adc.c | 246 
> > > 
> > >  include/hw/adc/aspeed_adc.h |  33 ++
> > >  3 files changed, 280 insertions(+)
> > >  create mode 100644 hw/adc/aspeed_adc.c
> > >  create mode 100644 include/hw/adc/aspeed_adc.h
> > > 
> > > diff --git a/hw/adc/Makefile.objs b/hw/adc/Makefile.objs index 
> > > 3f6dfdedaec7..2bf9362ba3c4 100644
> > > --- a/hw/adc/Makefile.objs
> > > +++ b/hw/adc/Makefile.objs
> > > @@ -1 +1,2 @@
> > >  obj-$(CONFIG_STM32F2XX_ADC) += stm32f2xx_adc.o
> > > +obj-$(CONFIG_ASPEED_SOC) += aspeed_adc.o
> > > diff --git a/hw/adc/aspeed_adc.c b/hw/adc/aspeed_adc.c new file mode 
> > > 100644 index ..d08f1684f7bc
> > > --- /dev/null
> > > +++ b/hw/adc/aspeed_adc.c
> > > @@ -0,0 +1,246 @@
> > > +/*
> > > + * Aspeed ADC
> > > + *
> > > > > + * Andrew Jeffery 
> > > 
> > > + *
> > > + * Copyright 2017 IBM Corp.
> > > + *
> > > + * This code is licensed under the GPL version 2 or later.  See
> > > + * the COPYING file in the top-level directory.
> > > + */
> > > +
> > > +#include "qemu/osdep.h"
> > > +#include "hw/adc/aspeed_adc.h"
> > > +#include "qapi/error.h"
> > > +#include "qemu/log.h"
> > > +
> > > +#define ASPEED_ADC_ENGINE_CTRL  0x00 #define  
> > > +ASPEED_ADC_ENGINE_CH_EN_MASK   0x #define   
> > > +ASPEED_ADC_ENGINE_CH_EN(x)((BIT(x)) << 16) #define  
> > > +ASPEED_ADC_ENGINE_INIT BIT(8) #define  
> > > +ASPEED_ADC_ENGINE_AUTO_COMPBIT(5) #define  
> > > +ASPEED_ADC_ENGINE_COMP BIT(4) #define  
> > > +ASPEED_ADC_ENGINE_MODE_MASK0x000e #define   
> > > +ASPEED_ADC_ENGINE_MODE_OFF(0b000 << 1) #define   
> > > +ASPEED_ADC_ENGINE_MODE_STANDBY(0b001 << 1) #define   
> > > +ASPEED_ADC_ENGINE_MODE_NORMAL (0b111 << 1) #define  
> > > +ASPEED_ADC_ENGINE_EN   BIT(0)
> > > +
> > > +#define ASPEED_ADC_L_MASK   ((1 << 10) - 1) #define 
> > > +ASPEED_ADC_L(x) ((x) & ASPEED_ADC_L_MASK) #define 
> > > +ASPEED_ADC_H(x) (((x) >> 16) & ASPEED_ADC_L_MASK) #define 
> > > +ASPEED_ADC_LH_MASK  (ASPEED_ADC_L_MASK << 16 | 
> > > +ASPEED_ADC_L_MASK)
> > > +
> > > +static inline uint32_t update_channels(uint32_t current) {
> > > +const uint32_t next = (current + 7) & 0x3ff;
> > > +
> > > +return (next << 16) | next;
> > > +}
> > > +
> > > +static bool breaks_threshold(AspeedADCState *s, int ch_off) {
> > > +const uint32_t a = ASPEED_ADC_L(s->channels[ch_off]);
> > > +const uint32_t a_lower = ASPEED_ADC_L(s->bounds[2 * ch_off]);
> > > +const uint32_t a_upper = ASPEED_ADC_H(s->bounds[2 * ch_off]);
> > > +const uint32_t b = ASPEED_ADC_H(s->channels[ch_off]);
> > > +const uint32_t b_lower = ASPEED_ADC_L(s->bounds[2 * ch_off + 
> > > +1]);
> > > +const uint32_t b_upper = ASPEED_ADC_H(s->bounds[2 * ch_off + 
> > > +1]);
> > > +
> > > +return ((a < a_lower || a > a_upper)) ||
> > > +   ((b < b_lower 

Re: [Qemu-devel] [PATCH v4 3/3] net/rocker: Convert to realize()

2017-05-21 Thread Mao Zhongyi

Hi, Markus

On 05/19/2017 03:20 PM, Markus Armbruster wrote:

Mao Zhongyi  writes:


The rocker device still implements the old PCIDeviceClass .init()
instead of the new .realize(). All devices need to be converted to
.realize().

.init() reports errors with fprintf() and return 0 on success, negative
number on failure. Meanwhile, when -device rocker fails, it first report
a specific error, then a generic one, like this:

$ x86_64-softmmu/qemu-system-x86_64 -device rocker,name=qemu-rocker
rocker: name too long; please shorten to at most 9 chars
qemu-system-x86_64: -device rocker,name=qemu-rocker: Device initialization 
failed

Now, convert it to .realize() that passes errors to its callers via its
errp argument. Also avoid the superfluous second error message. After
the patch, effect like this:

$ x86_64-softmmu/qemu-system-x86_64 -device rocker,name=qemu-rocker
qemu-system-x86_64: -device rocker,name=qemu-rocker: name too long; please 
shorten to at most 9 chars

Cc: jasow...@redhat.com
Cc: j...@resnulli.us
Cc: f4...@amsat.org
Cc: arm...@redhat.com
Signed-off-by: Mao Zhongyi 


The conversion to realize() looks good to me, therefore
Reviewed-by: Markus Armbruster 

However, I spotted a few issues not related to this patch.

1. Unusual macro names

#define ROCKER "rocker"

#define to_rocker(obj) \
OBJECT_CHECK(Rocker, (obj), ROCKER)

   Please clean up to

#define TYPE_ROCKER "rocker"

#define ROCKER(obj) \
OBJECT_CHECK(Rocker, (obj), TYPE_ROCKER)

   in a separate patch.


Thanks, will fix it in the next version.



2. Memory leaks on error and unplug

   Explained inline below.


---
 hw/net/rocker/rocker.c | 27 +++
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/hw/net/rocker/rocker.c b/hw/net/rocker/rocker.c
index a382a6f..b9a77f1 100644
--- a/hw/net/rocker/rocker.c
+++ b/hw/net/rocker/rocker.c
@@ -1252,20 +1252,18 @@ rollback:
 return err;
 }

-static int rocker_msix_init(Rocker *r)
+static int rocker_msix_init(Rocker *r, Error **errp)
 {
 PCIDevice *dev = PCI_DEVICE(r);
 int err;
-Error *local_err = NULL;

 err = msix_init(dev, ROCKER_MSIX_VEC_COUNT(r->fp_ports),
 >msix_bar,
 ROCKER_PCI_MSIX_BAR_IDX, ROCKER_PCI_MSIX_TABLE_OFFSET,
 >msix_bar,
 ROCKER_PCI_MSIX_BAR_IDX, ROCKER_PCI_MSIX_PBA_OFFSET,
-0, _err);
+0, errp);
 if (err) {
-error_report_err(local_err);
 return err;
 }

@@ -1301,7 +1299,7 @@ static World *rocker_world_type_by_name(Rocker *r, const 
char *name)
 return NULL;
 }

-static int pci_rocker_init(PCIDevice *dev)
+static void pci_rocker_realize(PCIDevice *dev, Error **errp)
 {
 Rocker *r = to_rocker(dev);
 const MACAddr zero = { .a = { 0, 0, 0, 0, 0, 0 } };
@@ -1319,10 +1317,9 @@ static int pci_rocker_init(PCIDevice *dev)

 r->world_dflt = rocker_world_type_by_name(r, r->world_name);
 if (!r->world_dflt) {
-fprintf(stderr,
-"rocker: requested world \"%s\" does not exist\n",
+error_setg(errp,
+"invalid argument requested world %s does not exist",
 r->world_name);
-err = -EINVAL;
 goto err_world_type_by_name;
 }

@@ -1342,7 +1339,7 @@ static int pci_rocker_init(PCIDevice *dev)

 /* MSI-X init */

-err = rocker_msix_init(r);
+err = rocker_msix_init(r, errp);
 if (err) {
 goto err_msix_init;
 }
@@ -1354,7 +1351,7 @@ static int pci_rocker_init(PCIDevice *dev)
 }

 if (rocker_find(r->name)) {
-err = -EEXIST;
+error_setg(errp, "%s already exists", r->name);
 goto err_duplicate;
 }

@@ -1368,10 +1365,9 @@ static int pci_rocker_init(PCIDevice *dev)
 #define ROCKER_IFNAMSIZ 16
 #define MAX_ROCKER_NAME_LEN  (ROCKER_IFNAMSIZ - 1 - 3 - 3)
 if (strlen(r->name) > MAX_ROCKER_NAME_LEN) {
-fprintf(stderr,
-"rocker: name too long; please shorten to at most %d chars\n",
+error_setg(errp,
+"name too long; please shorten to at most %d chars",
 MAX_ROCKER_NAME_LEN);
-err = -EINVAL;
 goto err_name_too_long;
 }

@@ -1429,7 +1425,7 @@ static int pci_rocker_init(PCIDevice *dev)

 QLIST_INSERT_HEAD(, r, next);

-return 0;
+return;

 err_name_too_long:
 err_duplicate:

   rocker_msix_uninit(r);
   err_msix_init:
   object_unparent(OBJECT(>msix_bar));
   object_unparent(OBJECT(>mmio));
   err_world_type_by_name:
   for (i = 0; i < ROCKER_WORLD_TYPE_MAX; i++) {
   if (r->worlds[i]) {

@@ -1443,7 +1439,6 @@ err_world_type_by_name:
 world_free(r->worlds[i]);
 }
 }
-return err;
 }



Does this leak r->world_name and r->name?


I think it was leaked neither r->world_name nor r->name, 

Re: [Qemu-devel] [RFC PATCH v2 4/4] spapr: Fix migration of Radix guests

2017-05-21 Thread Bharata B Rao
On Mon, May 22, 2017 at 12:44:48PM +1000, David Gibson wrote:
> On Fri, May 19, 2017 at 12:06:14PM +0530, Bharata B Rao wrote:
> > On Fri, May 19, 2017 at 11:10:39AM +0530, Bharata B Rao wrote:
> > > Fix migration of radix guests by ensuring that we issue
> > > KVM_PPC_CONFIGURE_V3_MMU for radix case post migration.
> > > 
> > > Reported-by: Nageswara R Sastry 
> > > Signed-off-by: Bharata B Rao 
> > > ---
> > >  hw/ppc/spapr.c | 12 
> > >  1 file changed, 12 insertions(+)
> > > 
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index daf335c..8f20f14 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -1400,6 +1400,18 @@ static int spapr_post_load(void *opaque, int 
> > > version_id)
> > >  err = spapr_rtc_import_offset(>rtc, spapr->rtc_offset);
> > >  }
> > > 
> > > +if (spapr->patb_entry) {
> > > +PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
> > > +if (kvmppc_has_cap_mmu_radix() && kvm_enabled()) {
> > > +err = kvmppc_configure_v3_mmu(cpu, SPAPR_PROC_TABLE_RADIX,
> > > +((cpu->env.spr[SPR_LPCR] & LPCR_GTSE) ? 
> > > SPAPR_PROC_TABLE_GTSE :
> > > +0), spapr->patb_entry);
> > 
> > Better to use explicit 'true' and 'false' in the above call. Here is
> > the updated patch:
> 
> Or just !!(cpu->env.spr[SPR_LPCR] & LPCR_GTSE) and avoid the ?:
> entirely.
> 
> With this version you no longer need patch 3/4 AFAICT.

Ah yes, will send the updated version next.

Regards,
Bharata.




Re: [Qemu-devel] [PATCH v4 1/3] net/rocker: Remove the dead error handling

2017-05-21 Thread Mao Zhongyi

Hi, Markus
Thanks for review and sorry for replying late, I was on the weekend.

On 05/19/2017 02:24 PM, Markus Armbruster wrote:

Mao Zhongyi  writes:


The function of_dpa_world_alloc is a wrapper around world_alloc(), which
returns null only when g_malloc0(size_t size) does. But g_malloc0() also
is a wrapper around g_malloc0_n(1, size) that ignore the fact that
g_malloc0() of 0 bytes, it doesn't returns null. So the error handling is
dead. Similar like desc_ring_alloc(), fp_port_alloc() etc. Remove these
entirely.

Cc: jasow...@redhat.com
Cc: j...@resnulli.us
Cc: f4...@amsat.org
Cc: arm...@redhat.com
Signed-off-by: Mao Zhongyi 
---
 hw/net/rocker/rocker.c  | 32 
 hw/net/rocker/rocker_desc.c |  3 ---
 hw/net/rocker/rocker_fp.c   |  4 
 3 files changed, 39 deletions(-)

diff --git a/hw/net/rocker/rocker.c b/hw/net/rocker/rocker.c
index 6e70fdd..b2b6dc7 100644
--- a/hw/net/rocker/rocker.c
+++ b/hw/net/rocker/rocker.c
@@ -1313,13 +1313,6 @@ static int pci_rocker_init(PCIDevice *dev)

 r->worlds[ROCKER_WORLD_TYPE_OF_DPA] = of_dpa_world_alloc(r);

-for (i = 0; i < ROCKER_WORLD_TYPE_MAX; i++) {
-if (!r->worlds[i]) {
-err = -ENOMEM;
-goto err_world_alloc;
-}
-}
-
 if (!r->world_name) {
 r->world_name = 
g_strdup(world_name(r->worlds[ROCKER_WORLD_TYPE_OF_DPA]));
 }


Please also simplify world_alloc()

   World *world_alloc(Rocker *r, size_t sizeof_private,
  enum rocker_world_type type, WorldOps *ops)
   {
   World *w = g_malloc0(sizeof(World) + sizeof_private);

  -if (w) {
   w->r = r;
   w->type = type;
   w->ops = ops;
   if (w->ops->init) {
   w->ops->init(w);
   }
   }

  -return w;
   }


OK, I will fix it and search for the similar error handling from others 
allocation
functions.



and reindent.


@@ -1396,9 +1389,6 @@ static int pci_rocker_init(PCIDevice *dev)
 }

 r->rings = g_new(DescRing *, rocker_pci_ring_count(r));
-if (!r->rings) {
-goto err_rings_alloc;
-}

 /* Rings are ordered like this:
  * - command ring
@@ -1410,14 +1400,9 @@ static int pci_rocker_init(PCIDevice *dev)
  * .
  */

-err = -ENOMEM;
 for (i = 0; i < rocker_pci_ring_count(r); i++) {
 DescRing *ring = desc_ring_alloc(r, i);

-if (!ring) {
-goto err_ring_alloc;
-}
-
 if (i == ROCKER_RING_CMD) {
 desc_ring_set_consume(ring, cmd_consume, ROCKER_MSIX_VEC_CMD);
 } else if (i == ROCKER_RING_EVENT) {
@@ -1437,10 +1422,6 @@ static int pci_rocker_init(PCIDevice *dev)
 fp_port_alloc(r, r->name, >fp_start_macaddr,
   i, >fp_ports_peers[i]);

-if (!port) {
-goto err_port_alloc;
-}
-
 r->fp_port[i] = port;
 fp_port_set_world(port, r->world_dflt);
 }
@@ -1449,25 +1430,12 @@ static int pci_rocker_init(PCIDevice *dev)

 return 0;

-err_port_alloc:
-for (--i; i >= 0; i--) {
-FpPort *port = r->fp_port[i];
-fp_port_free(port);
-}
-i = rocker_pci_ring_count(r);
-err_ring_alloc:
-for (--i; i >= 0; i--) {
-desc_ring_free(r->rings[i]);
-}
-g_free(r->rings);
-err_rings_alloc:
 err_duplicate:
 rocker_msix_uninit(r);
 err_msix_init:
 object_unparent(OBJECT(>msix_bar));
 object_unparent(OBJECT(>mmio));
 err_world_type_by_name:
-err_world_alloc:
 for (i = 0; i < ROCKER_WORLD_TYPE_MAX; i++) {
 if (r->worlds[i]) {
 world_free(r->worlds[i]);
diff --git a/hw/net/rocker/rocker_desc.c b/hw/net/rocker/rocker_desc.c
index ac02797..bd7e364 100644
--- a/hw/net/rocker/rocker_desc.c
+++ b/hw/net/rocker/rocker_desc.c
@@ -347,9 +347,6 @@ DescRing *desc_ring_alloc(Rocker *r, int index)
 DescRing *ring;

 ring = g_new0(DescRing, 1);
-if (!ring) {
-return NULL;
-}

 ring->r = r;
 ring->index = index;
diff --git a/hw/net/rocker/rocker_fp.c b/hw/net/rocker/rocker_fp.c
index 1305ac3..4b3c984 100644
--- a/hw/net/rocker/rocker_fp.c
+++ b/hw/net/rocker/rocker_fp.c
@@ -226,10 +226,6 @@ FpPort *fp_port_alloc(Rocker *r, char *sw_name,
 {
 FpPort *port = g_new0(FpPort, 1);

-if (!port) {
-return NULL;
-}
-
 port->r = r;
 port->index = index;
 port->pport = index + 1;


There's more!

In tx_consume():

   }
   iov[iovcnt].iov_len = frag_len;
   iov[iovcnt].iov_base = g_malloc(frag_len);
  -if (!iov[iovcnt].iov_base) {
  -err = -ROCKER_ENOMEM;
  -goto err_no_mem;
  -}

   if (pci_dma_read(dev, frag_addr, iov[iovcnt].iov_base,

In rx_produce():

   data = g_malloc(data_size);
  -if (!data) {
  -err = -ROCKER_ENOMEM;
  -goto out;
  -}
   iov_to_buf(iov, iovcnt, 0, data, data_size);
   

Re: [Qemu-devel] [RFC PATCH v2 1/4] migration: Introduce unregister_savevm_live()

2017-05-21 Thread David Gibson
On Fri, May 19, 2017 at 03:14:35PM +0200, Laurent Vivier wrote:
> On 19/05/2017 09:33, David Gibson wrote:
> > On Fri, May 19, 2017 at 11:10:36AM +0530, Bharata B Rao wrote:
> >> Introduce a new function unregister_savevm_live() to unregister the vmstate
> >> handlers registered via register_savevm_live().
> >>
> >> register_savevm() allocates SaveVMHandlers while register_savevm_live()
> >> gets passed with SaveVMHandlers. During unregistration, we  want to
> >> free SaveVMHandlers in the former case but not free in the latter case.
> >> Hence this new API is needed to differentiate this.
> >>
> >> This new API will be needed by PowerPC to unregister the HTAB savevm
> >> handlers.
> >>
> >> Signed-off-by: Bharata B Rao 
> > 
> > Reviewed-by: David Gibson 
> > 
> > I could take this through my tree, but it would need an ACK from Dave
> > Gilbert or Juan Quintela.
> 
> I cc: them for that.

Dave, Juan: his is fairly urgent, since it's a prereq for (sanely)
fixing a migration bug.  Please let me know what I can do to expedite
review.

> 
> Just a comment on the patch.
> 
> Instead of introducing a new function, perhaps we can homogenize the use
> of register_savevm() by always providing a SaveVMHandlers pointer and
> never a couple of (SaveStateHandler, LoadStateHandler) so the
> unregister_save() has never to free se->ops?

Sounds reasonable to me.  Again, Dave?  Juan?

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [RFC PATCH v2 4/4] spapr: Fix migration of Radix guests

2017-05-21 Thread David Gibson
On Fri, May 19, 2017 at 12:06:14PM +0530, Bharata B Rao wrote:
> On Fri, May 19, 2017 at 11:10:39AM +0530, Bharata B Rao wrote:
> > Fix migration of radix guests by ensuring that we issue
> > KVM_PPC_CONFIGURE_V3_MMU for radix case post migration.
> > 
> > Reported-by: Nageswara R Sastry 
> > Signed-off-by: Bharata B Rao 
> > ---
> >  hw/ppc/spapr.c | 12 
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index daf335c..8f20f14 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -1400,6 +1400,18 @@ static int spapr_post_load(void *opaque, int 
> > version_id)
> >  err = spapr_rtc_import_offset(>rtc, spapr->rtc_offset);
> >  }
> > 
> > +if (spapr->patb_entry) {
> > +PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
> > +if (kvmppc_has_cap_mmu_radix() && kvm_enabled()) {
> > +err = kvmppc_configure_v3_mmu(cpu, SPAPR_PROC_TABLE_RADIX,
> > +((cpu->env.spr[SPR_LPCR] & LPCR_GTSE) ? 
> > SPAPR_PROC_TABLE_GTSE :
> > +0), spapr->patb_entry);
> 
> Better to use explicit 'true' and 'false' in the above call. Here is
> the updated patch:

Or just !!(cpu->env.spr[SPR_LPCR] & LPCR_GTSE) and avoid the ?:
entirely.

With this version you no longer need patch 3/4 AFAICT.

> 
> >From 937c51cac73b4211ef153c1f5940215960383494 Mon Sep 17 00:00:00 2001
> From: Bharata B Rao 
> Date: Tue, 16 May 2017 12:19:54 +0530
> Subject: [RFC PATCH v2.1 4/4] spapr: Fix migration of Radix guests
> 
> Fix migration of radix guests by ensuring that we issue
> KVM_PPC_CONFIGURE_V3_MMU for radix case post migration.
> 
> Reported-by: Nageswara R Sastry 
> Signed-off-by: Bharata B Rao 
> ---
>  hw/ppc/spapr.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index daf335c..69e184b 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1400,6 +1400,18 @@ static int spapr_post_load(void *opaque, int 
> version_id)
>  err = spapr_rtc_import_offset(>rtc, spapr->rtc_offset);
>  }
>  
> +if (spapr->patb_entry) {
> +PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
> +if (kvmppc_has_cap_mmu_radix() && kvm_enabled()) {
> +err = kvmppc_configure_v3_mmu(cpu, true,
> +((cpu->env.spr[SPR_LPCR] & LPCR_GTSE) ? true : false),
> +spapr->patb_entry);
> +} else {
> +error_report("Radix guest is unsupported by the host");
> +return -EINVAL;
> +}
> +}
> +
>  return err;
>  }
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [RFC PATCH v2 2/4] spapr: Unregister HPT savevm handlers for radix guests

2017-05-21 Thread David Gibson
On Fri, May 19, 2017 at 11:10:37AM +0530, Bharata B Rao wrote:
> HPT gets created by default for TCG guests and later when the guest turns
> out to be a radix guest, the HPT is destroyed when guest does
> H_REGISTER_PROC_TBL hcall. Let HTAB savevm handlers registration and
> unregistration follow the same model so that we don't end up having
> unrequired HTAB savevm handlers for radix guests.
> 
> This also ensures that HTAB savevm handlers seemlessly get destroyed and
> recreated like HTAB itself when hash guest reboots.
> 
> HTAB savevm handlers registration/unregistration is now done from
> spapr_reallocate_hpt() which itself is called from one of the
> savevm_htab_handlers.htab_load(). To cater to this circular dependency
> spapr_reallocate_hpt() is made global.
> 
> Signed-off-by: Bharata B Rao 

Reviewed-by: David Gibson 


Weirdly, diff seems to have done a terrible job at minimizing the
patch below.  AFAICT this is really just a one line addition to
spapr_free_hpt() and spapr_reallocate_hpt().

> ---
>  hw/ppc/spapr.c | 99 
> +-
>  include/hw/ppc/spapr.h |  2 +
>  2 files changed, 52 insertions(+), 49 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 91f7434..daf335c 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1233,53 +1233,7 @@ void spapr_free_hpt(sPAPRMachineState *spapr)
>  spapr->htab = NULL;
>  spapr->htab_shift = 0;
>  close_htab_fd(spapr);
> -}
> -
> -static void spapr_reallocate_hpt(sPAPRMachineState *spapr, int shift,
> - Error **errp)
> -{
> -long rc;
> -
> -/* Clean up any HPT info from a previous boot */
> -spapr_free_hpt(spapr);
> -
> -rc = kvmppc_reset_htab(shift);
> -if (rc < 0) {
> -/* kernel-side HPT needed, but couldn't allocate one */
> -error_setg_errno(errp, errno,
> - "Failed to allocate KVM HPT of order %d (try 
> smaller maxmem?)",
> - shift);
> -/* This is almost certainly fatal, but if the caller really
> - * wants to carry on with shift == 0, it's welcome to try */
> -} else if (rc > 0) {
> -/* kernel-side HPT allocated */
> -if (rc != shift) {
> -error_setg(errp,
> -   "Requested order %d HPT, but kernel allocated order 
> %ld (try smaller maxmem?)",
> -   shift, rc);
> -}
> -
> -spapr->htab_shift = shift;
> -spapr->htab = NULL;
> -} else {
> -/* kernel-side HPT not needed, allocate in userspace instead */
> -size_t size = 1ULL << shift;
> -int i;
> -
> -spapr->htab = qemu_memalign(size, size);
> -if (!spapr->htab) {
> -error_setg_errno(errp, errno,
> - "Could not allocate HPT of order %d", shift);
> -return;
> -}
> -
> -memset(spapr->htab, 0, size);
> -spapr->htab_shift = shift;
> -
> -for (i = 0; i < size / HASH_PTE_SIZE_64; i++) {
> -DIRTY_HPTE(HPTE(spapr->htab, i));
> -}
> -}
> +unregister_savevm_live(NULL, "spapr/htab", spapr);
>  }
>  
>  void spapr_setup_hpt_and_vrma(sPAPRMachineState *spapr)
> @@ -1879,6 +1833,55 @@ static SaveVMHandlers savevm_htab_handlers = {
>  .load_state = htab_load,
>  };
>  
> +void spapr_reallocate_hpt(sPAPRMachineState *spapr, int shift,
> + Error **errp)
> +{
> +long rc;
> +
> +/* Clean up any HPT info from a previous boot */
> +spapr_free_hpt(spapr);
> +
> +rc = kvmppc_reset_htab(shift);
> +if (rc < 0) {
> +/* kernel-side HPT needed, but couldn't allocate one */
> +error_setg_errno(errp, errno,
> + "Failed to allocate KVM HPT of order %d (try 
> smaller maxmem?)",
> + shift);
> +/* This is almost certainly fatal, but if the caller really
> + * wants to carry on with shift == 0, it's welcome to try */
> +} else if (rc > 0) {
> +/* kernel-side HPT allocated */
> +if (rc != shift) {
> +error_setg(errp,
> +   "Requested order %d HPT, but kernel allocated order 
> %ld (try smaller maxmem?)",
> +   shift, rc);
> +}
> +
> +spapr->htab_shift = shift;
> +spapr->htab = NULL;
> +} else {
> +/* kernel-side HPT not needed, allocate in userspace instead */
> +size_t size = 1ULL << shift;
> +int i;
> +
> +spapr->htab = qemu_memalign(size, size);
> +if (!spapr->htab) {
> +error_setg_errno(errp, errno,
> + "Could not allocate HPT of order %d", shift);
> +return;
> +}
> +
> +memset(spapr->htab, 0, size);
> +spapr->htab_shift = shift;
> +
> +for (i = 0; i < size / 

Re: [Qemu-devel] [RFC PATCH v2 3/4] spapr: Make h_register_process_table hcall flags global

2017-05-21 Thread David Gibson
On Sun, May 21, 2017 at 08:48:42PM +0200, Laurent Vivier wrote:
> On 19/05/2017 07:40, Bharata B Rao wrote:
> > The flags used in h_register_process_table hcall are needed in spapr.c
> > and hence move them to a header file. While doing so, give them
> > slightly specific names.
> > 
> > Signed-off-by: Bharata B Rao 
> > Reviewed-by: David Gibson 
> > ---
> >  hw/ppc/spapr_hcall.c   | 31 ++-
> >  include/hw/ppc/spapr.h | 10 ++
> >  2 files changed, 24 insertions(+), 17 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> > index cea5d99..3915e6f 100644
> > --- a/hw/ppc/spapr_hcall.c
> > +++ b/hw/ppc/spapr_hcall.c
> > @@ -921,13 +921,6 @@ static void 
> > spapr_check_setup_free_hpt(sPAPRMachineState *spapr,
> >  return;
> >  }
> >  
> > -#define FLAGS_MASK  0x01FULL
> > -#define FLAG_MODIFY 0x10
> > -#define FLAG_REGISTER   0x08
> > -#define FLAG_RADIX  0x04
> > -#define FLAG_HASH_PROC_TBL  0x02
> > -#define FLAG_GTSE   0x01
> > -
> ...
> > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > index e581c4a..588872a 100644
> > --- a/include/hw/ppc/spapr.h
> > +++ b/include/hw/ppc/spapr.h
> > @@ -685,4 +685,14 @@ int spapr_rng_populate_dt(void *fdt);
> >  
> >  void spapr_do_system_reset_on_cpu(CPUState *cs, run_on_cpu_data arg);
> >  
> > +/*
> > + * Defines for flag value used in H_REGISTER_PROC_TBL hcall.
> > + */
> > +#define SPAPR_PROC_TABLE_MASK0x01FULL
> > +#define SPAPR_PROC_TABLE_MODIFY  0x10
> > +#define SPAPR_PROC_TABLE_REGISTER0x08
> > +#define SPAPR_PROC_TABLE_RADIX   0x04
> > +#define SPAPR_PROC_TABLE_HPT_PT  0x02
> > +#define SPAPR_PROC_TABLE_GTSE0x01
> 
> I think it should be cleaner if you use 0x1fULL

I agree, but looking at the new version of the final patch, I don't
think this patch will actually be necessary any more.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [Qemu-ppc] [RESEND PATCH v10 1/5] hw/ppc/spapr.c: adding pending_dimm_unplugs to sPAPRMachineState

2017-05-21 Thread David Gibson
On Fri, May 19, 2017 at 08:19:41AM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 05/19/2017 01:26 AM, David Gibson wrote:
> > On Thu, May 18, 2017 at 06:54:12PM -0300, Daniel Henrique Barboza wrote:
> > > The LMB DRC release callback, spapr_lmb_release(), uses an opaque
> > > parameter, a sPAPRDIMMState struct that stores the current LMBs that
> > > are allocated to a DIMM (nr_lmbs). After each call to this callback,
> > > the nr_lmbs is decremented by one and, when it reaches zero, the callback
> > > proceeds with the qdev calls to hot unplug the LMB.
> > > 
> > > Using drc->detach_cb_opaque is problematic because it can't be migrated in
> > > the future DRC migration work. This patch makes the following changes to
> > > eliminate the usage of this opaque callback inside spapr_lmb_release:
> > > 
> > > - sPAPRDIMMState was moved from spapr.c and added to spapr.h. A new
> > > attribute called 'addr' was added to it. This is used as an unique
> > > identifier to associate a sPAPRDIMMState to a PCDIMM element.
> > > 
> > > - sPAPRMachineState now hosts a new QTAILQ called 'pending_dimm_unplugs'.
> > > This queue of sPAPRDIMMState elements will store the DIMM state of DIMMs
> > > that are currently going under an unplug process.
> > > 
> > > - spapr_lmb_release() will now retrieve the nr_lmbs value by getting the
> > > correspondent sPAPRDIMMState. A helper function called 
> > > spapr_dimm_get_address
> > > was created to fetch the address of a PCDIMM device inside 
> > > spapr_lmb_release.
> > > When nr_lmbs reaches zero and the callback proceeds with the qdev hot 
> > > unplug
> > > calls, the sPAPRDIMMState struct is removed from 
> > > spapr->pending_dimm_unplugs.
> > > 
> > > After these changes, the opaque argument for spapr_lmb_release is now
> > > unused and is passed as NULL inside spapr_del_lmbs. This and the other
> > > opaque arguments can now be safely removed from the code.
> > > 
> > > Signed-off-by: Daniel Henrique Barboza 
> > > ---
> > >   hw/ppc/spapr.c | 57 
> > > +-
> > >   include/hw/ppc/spapr.h |  4 
> > >   2 files changed, 56 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index 0980d73..b05abe5 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -2050,6 +2050,7 @@ static void ppc_spapr_init(MachineState *machine)
> > >   msi_nonbroken = true;
> > >   QLIST_INIT(>phbs);
> > > +QTAILQ_INIT(>pending_dimm_unplugs);
> > >   /* Allocate RMA if necessary */
> > >   rma_alloc_size = kvmppc_alloc_rma();
> > > @@ -2603,20 +2604,63 @@ out:
> > >   error_propagate(errp, local_err);
> > >   }
> > > -typedef struct sPAPRDIMMState {
> > > +struct sPAPRDIMMState {
> > > +uint64_t addr;
> > Since you're not trying to migrate this any more, you can index the
> > list by an actual PCDIMMDevice *, rather than the base address.
> > You're already passing the DeviceState * for the DIMM around, so this
> > will actually remove the address parameter from some functions.
> Good idea.
> 
> > 
> > I think that could actually be done as a preliminary cleanup.  It also
> > probably makes sense to merge spapr_del_lmbs() with
> > spapr_memory_unplug_request(), they're both very small.
> 
> Ok.
> 
> > 
> > 
> > >   uint32_t nr_lmbs;
> > > -} sPAPRDIMMState;
> > > +QTAILQ_ENTRY(sPAPRDIMMState) next;
> > > +};
> > > +
> > > +static sPAPRDIMMState *spapr_pending_dimm_unplugs_find(sPAPRMachineState 
> > > *s,
> > > +   uint64_t addr)
> > > +{
> > > +sPAPRDIMMState *dimm_state = NULL;
> > > +QTAILQ_FOREACH(dimm_state, >pending_dimm_unplugs, next) {
> > > +if (dimm_state->addr == addr) {
> > > +break;
> > > +}
> > > +}
> > > +return dimm_state;
> > > +}
> > > +
> > > +static void spapr_pending_dimm_unplugs_add(sPAPRMachineState *spapr,
> > > +   sPAPRDIMMState *dimm_state)
> > > +{
> > > +g_assert(!spapr_pending_dimm_unplugs_find(spapr, dimm_state->addr));
> > > +QTAILQ_INSERT_HEAD(>pending_dimm_unplugs, dimm_state, next);
> > > +}
> > > +
> > > +static void spapr_pending_dimm_unplugs_remove(sPAPRMachineState *spapr,
> > > +  sPAPRDIMMState *dimm_state)
> > > +{
> > > +QTAILQ_REMOVE(>pending_dimm_unplugs, dimm_state, next);
> > > +g_free(dimm_state);
> > > +}
> > > +
> > > +static uint64_t spapr_dimm_get_address(PCDIMMDevice *dimm)
> > > +{
> > > +Error *local_err = NULL;
> > > +uint64_t addr;
> > > +addr = object_property_get_int(OBJECT(dimm), PC_DIMM_ADDR_PROP,
> > > +   _err);
> > > +if (local_err) {
> > > +error_propagate(_abort, local_err);
> > > +return 0;
> > > +}
> > > +return addr;
> > > +}
> > >   static void spapr_lmb_release(DeviceState *dev, void *opaque)

Re: [Qemu-devel] [RFC PATCH V2 2/2] msi: Handle remappable format interrupt request

2017-05-21 Thread Lan Tianyu
Hi Anthony:
Thanks for your review.

On 2017年05月19日 19:57, Anthony PERARD wrote:
> On Thu, May 18, 2017 at 01:33:00AM -0400, Lan Tianyu wrote:
>> From: Chao Gao 
>>
>> According to VT-d spec Interrupt Remapping and Interrupt Posting ->
>> Interrupt Remapping -> Interrupt Request Formats On Intel 64
>> Platforms, fields of MSI data register have changed. This patch
>> avoids wrongly regarding a remappable format interrupt request as
>> an interrupt binded with an event channel.
>>
>> Signed-off-by: Chao Gao 
>> Signed-off-by: Lan Tianyu 
>> ---
>>  hw/pci/msi.c | 5 +++--
>>  hw/pci/msix.c| 4 +++-
>>  hw/xen/xen_pt_msi.c  | 2 +-
>>  include/hw/xen/xen.h | 2 +-
>>  xen-hvm-stub.c   | 2 +-
>>  xen-hvm.c| 7 ++-
>>  6 files changed, 15 insertions(+), 7 deletions(-)
>>
>> diff --git a/hw/pci/msi.c b/hw/pci/msi.c
>> index a87b227..199cb47 100644
>> --- a/hw/pci/msi.c
>> +++ b/hw/pci/msi.c
>> @@ -289,7 +289,7 @@ void msi_reset(PCIDevice *dev)
>>  static bool msi_is_masked(const PCIDevice *dev, unsigned int vector)
>>  {
>>  uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
>> -uint32_t mask, data;
>> +uint32_t mask, data, addr_lo;
>>  bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
>>  assert(vector < PCI_MSI_VECTORS_MAX);
>>  
>> @@ -298,7 +298,8 @@ static bool msi_is_masked(const PCIDevice *dev, unsigned 
>> int vector)
>>  }
>>  
>>  data = pci_get_word(dev->config + msi_data_off(dev, msi64bit));
>> -if (xen_is_pirq_msi(data)) {
>> +addr_lo = pci_get_long(dev->config + msi_address_lo_off(dev));
>> +if (xen_is_pirq_msi(data, addr_lo)) {
>>  return false;
>>  }
>>  
>> diff --git a/hw/pci/msix.c b/hw/pci/msix.c
>> index bb54e8b..efe2982 100644
>> --- a/hw/pci/msix.c
>> +++ b/hw/pci/msix.c
>> @@ -82,9 +82,11 @@ static bool msix_vector_masked(PCIDevice *dev, unsigned 
>> int vector, bool fmask)
>>  {
>>  unsigned offset = vector * PCI_MSIX_ENTRY_SIZE;
>>  uint8_t *data = >msix_table[offset + PCI_MSIX_ENTRY_DATA];
>> +uint8_t *addr_lo = >msix_table[offset + PCI_MSIX_ENTRY_LOWER_ADDR];
>>  /* MSIs on Xen can be remapped into pirqs. In those cases, masking
>>   * and unmasking go through the PV evtchn path. */
>> -if (xen_enabled() && xen_is_pirq_msi(pci_get_long(data))) {
>> +if (xen_enabled() && xen_is_pirq_msi(pci_get_long(data),
>> + pci_get_long(addr_lo))) {
>>  return false;
>>  }
>>  return fmask || dev->msix_table[offset + PCI_MSIX_ENTRY_VECTOR_CTRL] &
>> diff --git a/hw/xen/xen_pt_msi.c b/hw/xen/xen_pt_msi.c
>> index 5fab95e..45a9e9f 100644
>> --- a/hw/xen/xen_pt_msi.c
>> +++ b/hw/xen/xen_pt_msi.c
>> @@ -114,7 +114,7 @@ static int msi_msix_setup(XenPCIPassthroughState *s,
>>  
>>  assert((!is_msix && msix_entry == 0) || is_msix);
>>  
>> -if (xen_is_pirq_msi(data)) {
>> +if (xen_is_pirq_msi(data, addr)) {
>>  *ppirq = msi_ext_dest_id(addr >> 32) | msi_dest_id(addr);
>>  if (!*ppirq) {
>>  /* this probably identifies an misconfiguration of the guest,
>> diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
>> index 09c2ce5..af759bc 100644
>> --- a/include/hw/xen/xen.h
>> +++ b/include/hw/xen/xen.h
>> @@ -33,7 +33,7 @@ int xen_pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num);
>>  void xen_piix3_set_irq(void *opaque, int irq_num, int level);
>>  void xen_piix_pci_write_config_client(uint32_t address, uint32_t val, int 
>> len);
>>  void xen_hvm_inject_msi(uint64_t addr, uint32_t data);
>> -int xen_is_pirq_msi(uint32_t msi_data);
>> +int xen_is_pirq_msi(uint32_t msi_data, uint32_t msi_addr_lo);
> 
> Maybe inverting the arguments would be better, so the arguments would be
> the address first, then the data, like I think it is often the case.
> What do you think?


That make sense. Will update.

> 
>>  
>>  qemu_irq *xen_interrupt_controller_init(void);
>>  
>> diff --git a/xen-hvm-stub.c b/xen-hvm-stub.c
>> index c500325..dae421c 100644
>> --- a/xen-hvm-stub.c
>> +++ b/xen-hvm-stub.c
>> @@ -31,7 +31,7 @@ void xen_hvm_inject_msi(uint64_t addr, uint32_t data)
>>  {
>>  }
>>  
>> -int xen_is_pirq_msi(uint32_t msi_data)
>> +int xen_is_pirq_msi(uint32_t msi_data, uint32_t msi_addr_lo)
>>  {
>>  return 0;
>>  }
>> diff --git a/xen-hvm.c b/xen-hvm.c
>> index 5043beb..db29121 100644
>> --- a/xen-hvm.c
>> +++ b/xen-hvm.c
>> @@ -146,8 +146,13 @@ void xen_piix_pci_write_config_client(uint32_t address, 
>> uint32_t val, int len)
>>  }
>>  }
>>  
>> -int xen_is_pirq_msi(uint32_t msi_data)
>> +int xen_is_pirq_msi(uint32_t msi_data, uint32_t msi_addr_lo)
>>  {
>> +/* If msi address is configurate to remapping format, the msi will not
>> + * remapped into a pirq.
> 
> What do you think of: "If the MSI address is configured in remappable
> format, the MSI will not be remapped into a pirq." ?

Will update.

> 
>> + */

Re: [Qemu-devel] [PATCH] e1000e: Fix a bug where guest hangs upon migration

2017-05-21 Thread Jason Wang



On 2017年05月19日 22:04, Sameeh Jubran wrote:

On Fri, May 19, 2017 at 9:25 AM, Jason Wang  wrote:



On 2017年05月17日 19:46, Sameeh Jubran wrote:


The bug was caused by the "receive overrun" (bit #6 of the ICR register)
interrupt
which would be triggered post migration in a heavy traffic environment.
Even though the
"receive overrun" bit (#6) is masked out by the IMS register (refer to
the log below)
the driver still receives an interrupt as the "receive overrun" bit (#6)
causes the
"Other" - bit #24 of the ICR register - bit to be set as documented
below. The driver
handles the interrupt and clears the "Other" bit (#24) but doesn't clear
the
"receive overrun" bit (#6) which leads to an infinite loop. Apparently
the Windows
driver expects that the "receive overrun" bit and other ones - documented
below - to be
cleared when the "Other" bit (#24) is cleared.

So to sum that up:
1. Bit #6 of the ICR register is set by heavy traffic
2. As a results of setting bit #6, bit #24 is set
3. The driver receives an interrupt for bit 24 (it doesn't receieve an
interrupt for bit #6 as it is masked out by IMS)
4. The driver handles and clears the interrupt of bit #24
5. Bit #6 is still set.
6. 2 happens all over again

The Interrupt Cause Read - ICR register:

The ICR has the "Other" bit - bit #24 - that is set when one or more of
the following
ICR register's bits are set:

LSC - bit #2, RXO - bit #6, MDAC - bit #9, SRPD - bit #16, ACK - bit #17,
MNG - bit #18

Log sample of the storm:

27563@1494850819.411877:e1000e_irq_pending_interrupts ICR PENDING:
0x100 (ICR: 0x815000c2, IMS: 0x1a4)
27563@1494850819.411900:e1000e_irq_pending_interrupts ICR PENDING: 0x0
(ICR: 0x815000c2, IMS: 0xa4)
27563@1494850819.411915:e1000e_irq_pending_interrupts ICR PENDING: 0x0
(ICR: 0x815000c2, IMS: 0xa4)
27563@1494850819.412380:e1000e_irq_pending_interrupts ICR PENDING: 0x0
(ICR: 0x815000c2, IMS: 0xa4)
27563@1494850819.412395:e1000e_irq_pending_interrupts ICR PENDING: 0x0
(ICR: 0x815000c2, IMS: 0xa4)
27563@1494850819.412436:e1000e_irq_pending_interrupts ICR PENDING: 0x0
(ICR: 0x815000c2, IMS: 0xa4)
27563@1494850819.412441:e1000e_irq_pending_interrupts ICR PENDING: 0x0
(ICR: 0x815000c2, IMS: 0xa4)
27563@1494850819.412998:e1000e_irq_pending_interrupts ICR PENDING:
0x100 (ICR: 0x815000c2, IMS: 0x1a4)

This commit solves:
https://bugzilla.redhat.com/show_bug.cgi?id=1447935
https://bugzilla.redhat.com/show_bug.cgi?id=1449490

Signed-off-by: Sameeh Jubran 
---
   hw/net/e1000e_core.c | 7 +--
   1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 28c5be1..8174b53 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -2454,14 +2454,17 @@ e1000e_set_ics(E1000ECore *core, int index,
uint32_t val)
   static void
   e1000e_set_icr(E1000ECore *core, int index, uint32_t val)
   {
+uint32_t icr = 0;
   if ((core->mac[ICR] & E1000_ICR_ASSERTED) &&
   (core->mac[CTRL_EXT] & E1000_CTRL_EXT_IAME)) {
   trace_e1000e_irq_icr_process_iame();
   e1000e_clear_ims_bits(core, core->mac[IAM]);
   }
   -trace_e1000e_irq_icr_write(val, core->mac[ICR], core->mac[ICR] &
~val);
-core->mac[ICR] &= ~val;
+icr = core->mac[ICR] & ~val;
+icr = (val & E1000_ICR_OTHER) ? (icr & ~E1000_ICR_OTHER_CAUSES) :
icr;
+trace_e1000e_irq_icr_write(val, core->mac[ICR], icr);
+core->mac[ICR] = icr;
   e1000e_update_interrupt_state(core);
   }



Thanks for the patch.

So this is an undocumented behavior, we must be careful on this. Several
question below:

- have you verified this on real hardware?


No I haven't


Any chance to verify the behavior on real hardware?




- is MSIX enabled in this case?


Yes it is, I have tested the patch with msi disabled too.


Does the problem exist in no msi case. If not, is this better to hack 
EIAC behavior?





- according to the steps you've summed up above, it's not specific to
migration?


True


Then we probably need tweak the title.

Thanks.




Thanks









Re: [Qemu-devel] [PATCH v4 10/10] vhost: iommu: cache static mapping if there is

2017-05-21 Thread Peter Xu
On Fri, May 19, 2017 at 07:55:26PM +0300, Michael S. Tsirkin wrote:
> On Fri, May 19, 2017 at 11:19:49AM +0800, Peter Xu wrote:
> > This patch pre-heat vhost iotlb cache when passthrough mode enabled.
> > 
> > Sometimes, even if user specified iommu_platform for vhost devices,
> > IOMMU might still be disabled. One case is passthrough mode in VT-d
> > implementation. We can detect this by observing iommu_list. If it's
> > empty, it means IOMMU translation is disabled, then we can actually
> > pre-heat the translation (it'll be static mapping then) by first
> > invalidating all IOTLB, then cache existing memory ranges into vhost
> > backend iotlb using 1:1 mapping.
> > 
> > Signed-off-by: Peter Xu 
> 
> I don't really understand. Is this a performance optimization?
> Can you post some #s please?

Yes, it is. Vhost can work even without this patch, but it should be
faster when with this patch.

As mentioned in the commit message and below comment [1], this patch
pre-heat the cache for vhost. Currently the cache entries depends on
the system memory ranges (dev->mem->nregions), and it should be far
smaller than vhost's cache count (currently it is statically defined
as max_iotlb_entries=2048 in kernel). If with current patch, these
cache entries can cover the whole possible DMA ranges that PT mode
would allow, so we won't have any cache miss then.

For the comments, do you have any better suggestion besides commit
message and [1]?

> 
> Also, if it's PT, can't we bypass iommu altogether? That would be
> even faster ...

Yes, but I don't yet know a good way to do it... Any suggestion is
welcomed as well.

Btw, do you have any comment on other patches besides this one? Since
this patch can really be isolated from the whole PT support series.

Thanks,

> 
> > ---
> >  hw/virtio/trace-events |  4 
> >  hw/virtio/vhost.c  | 49 
> > +
> >  2 files changed, 53 insertions(+)
> > 
> > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > index 1f7a7c1..54dcbb3 100644
> > --- a/hw/virtio/trace-events
> > +++ b/hw/virtio/trace-events
> > @@ -24,3 +24,7 @@ virtio_balloon_handle_output(const char *name, uint64_t 
> > gpa) "section name: %s g
> >  virtio_balloon_get_config(uint32_t num_pages, uint32_t actual) "num_pages: 
> > %d actual: %d"
> >  virtio_balloon_set_config(uint32_t actual, uint32_t oldactual) "actual: %d 
> > oldactual: %d"
> >  virtio_balloon_to_target(uint64_t target, uint32_t num_pages) "balloon 
> > target: %"PRIx64" num_pages: %d"
> > +
> > +# hw/virtio/vhost.c
> > +vhost_iommu_commit(void) ""
> > +vhost_iommu_static_preheat(void) ""
> > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > index 03a46a7..8069135 100644
> > --- a/hw/virtio/vhost.c
> > +++ b/hw/virtio/vhost.c
> > @@ -27,6 +27,7 @@
> >  #include "hw/virtio/virtio-access.h"
> >  #include "migration/blocker.h"
> >  #include "sysemu/dma.h"
> > +#include "trace.h"
> >  
> >  /* enabled until disconnected backend stabilizes */
> >  #define _VHOST_DEBUG 1
> > @@ -730,6 +731,11 @@ static void vhost_iommu_unmap_notify(IOMMUNotifier *n, 
> > IOMMUTLBEntry *iotlb)
> >  }
> >  }
> >  
> > +static bool vhost_iommu_mr_enabled(struct vhost_dev *dev)
> > +{
> > +return !QLIST_EMPTY(>iommu_list);
> > +}
> > +
> >  static void vhost_iommu_region_add(MemoryListener *listener,
> > MemoryRegionSection *section)
> >  {
> > @@ -782,6 +788,48 @@ static void vhost_iommu_region_del(MemoryListener 
> > *listener,
> >  }
> >  }
> >  
> > +static void vhost_iommu_commit(MemoryListener *listener)
> > +{
> > +struct vhost_dev *dev = container_of(listener, struct vhost_dev,
> > + iommu_listener);
> > +struct vhost_memory_region *r;
> > +int i;
> > +
> > +trace_vhost_iommu_commit();
> > +
> > +if (!vhost_iommu_mr_enabled(dev)) {
> > +/*
> > +* This means iommu_platform is enabled, however iommu memory
> > +* region is disabled, e.g., when device passthrough is setup.
> > +* Then, no translation is needed any more.
> > +*
> > +* Let's first invalidate the whole IOTLB, then pre-heat the
> > +* static mapping by looping over vhost memory ranges.
> > +*/

[1]

> > +
> > +if (dev->vhost_ops->vhost_invalidate_device_iotlb(dev, 0,
> > +  UINT64_MAX)) {
> > +error_report("%s: flush existing IOTLB failed", __func__);
> > +return;
> > +}
> > +
> > +for (i = 0; i < dev->mem->nregions; i++) {
> > +r = >mem->regions[i];
> > +/* Vhost regions are writable RAM, so IOMMU_RW suites. */
> > +if (dev->vhost_ops->vhost_update_device_iotlb(dev,
> > +  
> > r->guest_phys_addr,
> > +  
> > 

Re: [Qemu-devel] [PATCH v2 4/4] spapr: fix migration of ICP objects from/to older QEMU

2017-05-21 Thread David Gibson
On Fri, May 19, 2017 at 12:32:27PM +0200, Greg Kurz wrote:
> Commit 5bc8d26de20c ("spapr: allocate the ICPState object from under
> sPAPRCPUCore") moved ICP objects from the machine to CPU cores. This
> is an improvement since we no longer allocate ICP objects that will
> never be used. But it has the side-effect of breaking migration of
> older machine types from older QEMU versions.
> 
> This patch introduces a compat flag in the sPAPR machine class so
> that all pseries machine up to 2.9 go on with the previous behavior
> of pre-allocating ICP objects.
> 
> Signed-off-by: Greg Kurz 
> ---
> v2: - s/void* /void * in xics_system_init()
> - don't use "[*]" in the ICP object name
> - use pre_2_10_ prefix in field names
> - added xics_nr_servers() helper
> ---
>  hw/ppc/spapr.c  |   40 +++-
>  hw/ppc/spapr_cpu_core.c |   29 -
>  include/hw/ppc/spapr.h  |2 ++
>  3 files changed, 61 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 1bb05a9a6b07..182262257c60 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -123,9 +123,15 @@ error:
>  return NULL;
>  }
>  
> +static inline int xics_nr_servers(void)
> +{
> +return ppc_cpu_dt_id_from_index(max_cpus);
> +}
> +
>  static void xics_system_init(MachineState *machine, int nr_irqs, Error 
> **errp)
>  {
>  sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
> +sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
>  
>  if (kvm_enabled()) {
>  if (machine_kernel_irqchip_allowed(machine) &&
> @@ -147,6 +153,35 @@ static void xics_system_init(MachineState *machine, int 
> nr_irqs, Error **errp)
>  return;
>  }
>  }
> +
> +if (smc->pre_2_10_icp_allocation) {
> +int nr_servers = xics_nr_servers();
> +Error *local_err = NULL;
> +int i;
> +
> +spapr->pre_2_10_icps = g_malloc0(nr_servers * sizeof(ICPState));
> +
> +for (i = 0; i < nr_servers; i++) {
> +void *obj = >pre_2_10_icps[i];
> +char *name = g_strdup_printf("icp[%d]", i);
> +
> +object_initialize(obj, sizeof(ICPState), spapr->icp_type);
> +object_property_add_child(OBJECT(spapr), name, obj, 
> _abort);
> +g_free(name);
> +object_unref(obj);
> +object_property_add_const_link(obj, "xics", OBJECT(spapr),
> +   _abort);
> +object_property_set_bool(obj, true, "realized", _err);
> +if (local_err) {
> +while (i--) {
> +object_unparent(obj);
> +}
> +g_free(spapr->pre_2_10_icps);
> +error_propagate(errp, local_err);
> +break;
> +}
> +}
> +}
>  }
>  
>  static int spapr_fixup_cpu_smt_dt(void *fdt, int offset, PowerPCCPU *cpu,
> @@ -1020,7 +1055,7 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
>  _FDT(fdt_setprop_cell(fdt, 0, "#size-cells", 2));
>  
>  /* /interrupt controller */
> -spapr_dt_xics(ppc_cpu_dt_id_from_index(max_cpus), fdt, PHANDLE_XICP);
> +spapr_dt_xics(xics_nr_servers(), fdt, PHANDLE_XICP);
>  
>  ret = spapr_populate_memory(spapr, fdt);
>  if (ret < 0) {
> @@ -3286,9 +3321,12 @@ static void 
> spapr_machine_2_9_instance_options(MachineState *machine)
>  
>  static void spapr_machine_2_9_class_options(MachineClass *mc)
>  {
> +sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
> +
>  spapr_machine_2_10_class_options(mc);
>  SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_9);
>  mc->numa_auto_assign_ram = numa_legacy_auto_assign_ram;
> +smc->pre_2_10_icp_allocation = true;
>  }
>  
>  DEFINE_SPAPR_MACHINE(2_9, "2.9", false);
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index ff7058ecc00e..13c4916aa5e6 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -119,6 +119,7 @@ static void spapr_cpu_core_unrealizefn(DeviceState *dev, 
> Error **errp)
>  size_t size = object_type_get_instance_size(typename);
>  CPUCore *cc = CPU_CORE(dev);
>  int i;
> +sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>  
>  for (i = 0; i < cc->nr_threads; i++) {
>  void *obj = sc->threads + i * size;
> @@ -127,7 +128,9 @@ static void spapr_cpu_core_unrealizefn(DeviceState *dev, 
> Error **errp)
>  PowerPCCPU *cpu = POWERPC_CPU(cs);
>  
>  spapr_cpu_destroy(cpu);
> -object_unparent(cpu->intc);
> +if (!spapr->pre_2_10_icps) {

Hrm.  I dislike code for the core object directly reaching into the
machine to check the compat flag here (and a bunch of other places
below).  I can think of a few possible ways of avoiding this:

1) The most direct is to make another compat flag in the cpu core
object, set by the machine.  Straightforward, but ugly.

2) Use a property to 

Re: [Qemu-devel] [PATCH v4 10/10] vhost: iommu: cache static mapping if there is

2017-05-21 Thread Jason Wang



On 2017年05月20日 00:55, Michael S. Tsirkin wrote:

On Fri, May 19, 2017 at 11:19:49AM +0800, Peter Xu wrote:

This patch pre-heat vhost iotlb cache when passthrough mode enabled.

Sometimes, even if user specified iommu_platform for vhost devices,
IOMMU might still be disabled. One case is passthrough mode in VT-d
implementation. We can detect this by observing iommu_list. If it's
empty, it means IOMMU translation is disabled, then we can actually
pre-heat the translation (it'll be static mapping then) by first
invalidating all IOTLB, then cache existing memory ranges into vhost
backend iotlb using 1:1 mapping.

Signed-off-by: Peter Xu

I don't really understand. Is this a performance optimization?
Can you post some #s please?

Also, if it's PT, can't we bypass iommu altogether?


The problem is, since device could be moved between domains, which means 
we need new notifier to notify vhost to enable or disable IOMMU_PLATFORM.



That would be
even faster ...



Should be the same (except for the first access in no CM mode), we pass 
and use vhost_memory_regions as what we've used for non iommu case.


Thanks



Re: [Qemu-devel] [virtio-dev] Re: [virtio-dev] Re: [PATCH v2 00/16] Vhost-pci for inter-VM communication

2017-05-21 Thread Jason Wang



On 2017年05月19日 23:33, Stefan Hajnoczi wrote:

On Fri, May 19, 2017 at 11:10:33AM +0800, Jason Wang wrote:

On 2017年05月18日 11:03, Wei Wang wrote:

On 05/17/2017 02:22 PM, Jason Wang wrote:

On 2017年05月17日 14:16, Jason Wang wrote:

On 2017年05月16日 15:12, Wei Wang wrote:

Hi:

Care to post the driver codes too?


OK. It may take some time to clean up the driver code before
post it out. You can first
have a check of the draft at the repo here:
https://github.com/wei-w-wang/vhost-pci-driver

Best,
Wei

Interesting, looks like there's one copy on tx side. We used to
have zerocopy support for tun for VM2VM traffic. Could you
please try to compare it with your vhost-pci-net by:


We can analyze from the whole data path - from VM1's network stack to
send packets -> VM2's
network stack to receive packets. The number of copies are actually the
same for both.

That's why I'm asking you to compare the performance. The only reason for
vhost-pci is performance. You should prove it.

There is another reason for vhost-pci besides maximum performance:

vhost-pci makes it possible for end-users to run networking or storage
appliances in compute clouds.  Cloud providers do not allow end-users to
run custom vhost-user processes on the host so you need vhost-pci.

Stefan


Then it has non NFV use cases and the question goes back to the 
performance comparing between vhost-pci and zerocopy vhost_net. If it 
does not perform better, it was less interesting at least in this case.


Thanks



Re: [Qemu-devel] [virtio-dev] Re: [virtio-dev] Re: [PATCH v2 00/16] Vhost-pci for inter-VM communication

2017-05-21 Thread Jason Wang



On 2017年05月20日 00:49, Michael S. Tsirkin wrote:

On Fri, May 19, 2017 at 11:10:33AM +0800, Jason Wang wrote:


On 2017年05月18日 11:03, Wei Wang wrote:

On 05/17/2017 02:22 PM, Jason Wang wrote:


On 2017年05月17日 14:16, Jason Wang wrote:


On 2017年05月16日 15:12, Wei Wang wrote:

Hi:

Care to post the driver codes too?


OK. It may take some time to clean up the driver code before
post it out. You can first
have a check of the draft at the repo here:
https://github.com/wei-w-wang/vhost-pci-driver

Best,
Wei

Interesting, looks like there's one copy on tx side. We used to
have zerocopy support for tun for VM2VM traffic. Could you
please try to compare it with your vhost-pci-net by:


We can analyze from the whole data path - from VM1's network stack to
send packets -> VM2's
network stack to receive packets. The number of copies are actually the
same for both.

That's why I'm asking you to compare the performance. The only reason for
vhost-pci is performance. You should prove it.


vhost-pci: 1-copy happen in VM1's driver xmit(), which copes packets
from its network stack to VM2's
RX ring buffer. (we call it "zerocopy" because there is no intermediate
copy between VMs)
zerocopy enabled vhost-net: 1-copy happen in tun's recvmsg, which copies
packets from VM1's TX ring
buffer to VM2's RX ring buffer.

Actually, there's a major difference here. You do copy in guest which
consumes time slice of vcpu thread on host. Vhost_net do this in its own
thread. So I feel vhost_net is even faster here, maybe I was wrong.

Yes but only if you have enough CPUs. The point of vhost-pci
is to put the switch in a VM and scale better with # of VMs.


Does the overall performance really increase? I suspect the only thing 
vhost-pci gains here is probably scheduling cost and copying in guest 
should be slower than doing it in host.





That being said, we compared to vhost-user, instead of vhost_net,
because vhost-user is the one
that is used in NFV, which we think is a major use case for vhost-pci.

If this is true, why not draft a pmd driver instead of a kernel one? And do
you use virtio-net kernel driver to compare the performance? If yes, has OVS
dpdk optimized for kernel driver (I think not)?

What's more important, if vhost-pci is faster, I think its kernel driver
should be also faster than virtio-net, no?

If you have a vhost CPU per VCPU and can give a host CPU to each using
that will be faster.  But not everyone has so many host CPUs.


If the major use case is NFV, we should have sufficient CPU resources I 
believe?


Thanks







- make sure zerocopy is enabled for vhost_net
- comment skb_orphan_frags() in tun_net_xmit()

Thanks


You can even enable tx batching for tun by ethtool -C tap0 rx-frames
N. This will greatly improve the performance according to my test.


Thanks, but would this hurt latency?

Best,
Wei

I don't see this in my test.

Thanks





Re: [Qemu-devel] [PATCH v2 3/4] target/ppc: consolidate CPU device-tree id computation in helper

2017-05-21 Thread David Gibson
On Fri, May 19, 2017 at 12:32:20PM +0200, Greg Kurz wrote:
> For historical reasons, we compute CPU device-tree ids with a non-trivial
> logic. This patch consolidate the logic in a single helper to be used
> in various places where it is currently open-coded.
> 
> It is okay to get rid of DIV_ROUND_UP() because we're sure that the number
> of threads per core in the guest cannot exceed the number of threads per
> core in the host.

However, your new logic still gives different answers in some cases.
In particular when max_cpus is not a multiple of smp_threads.  Which
is generally a bad idea, but allowed for older machine types for
compatibility.   e.g. smp_threads=4, max_cpus=6 smt=8

Old logic:
 DIV_ROUND_UP(6 * 8, 4)
   = ⌈48 / 4⌉ = 12

New logic gives: ⌊6 / 4⌋ * 8 + (6 % 4)
   = 1 * 8 + 2
   = 10

In any case the DIV_ROUND_UP() isn't to handle the case where guest
threads-per-core is bigger than host threads-per-core, it's (IIRC) for
the case where guest threads-per-core is not a factor of host
threads-per-core.  Again, a bad idea, but I think allowed in some old
cases.

> 
> Signed-off-by: Greg Kurz 
> ---
>  hw/ppc/spapr.c  |6 ++
>  target/ppc/cpu.h|   17 +
>  target/ppc/translate_init.c |3 +--
>  3 files changed, 20 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 75e298b4c6be..1bb05a9a6b07 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -981,7 +981,6 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
>  void *fdt;
>  sPAPRPHBState *phb;
>  char *buf;
> -int smt = kvmppc_smt_threads();
>  
>  fdt = g_malloc0(FDT_MAX_SIZE);
>  _FDT((fdt_create_empty_tree(fdt, FDT_MAX_SIZE)));
> @@ -1021,7 +1020,7 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
>  _FDT(fdt_setprop_cell(fdt, 0, "#size-cells", 2));
>  
>  /* /interrupt controller */
> -spapr_dt_xics(DIV_ROUND_UP(max_cpus * smt, smp_threads), fdt, 
> PHANDLE_XICP);
> +spapr_dt_xics(ppc_cpu_dt_id_from_index(max_cpus), fdt, PHANDLE_XICP);
>  
>  ret = spapr_populate_memory(spapr, fdt);
>  if (ret < 0) {
> @@ -1977,7 +1976,6 @@ static void spapr_init_cpus(sPAPRMachineState *spapr)
>  MachineState *machine = MACHINE(spapr);
>  MachineClass *mc = MACHINE_GET_CLASS(machine);
>  char *type = spapr_get_cpu_core_type(machine->cpu_model);
> -int smt = kvmppc_smt_threads();
>  const CPUArchIdList *possible_cpus;
>  int boot_cores_nr = smp_cpus / smp_threads;
>  int i;
> @@ -2014,7 +2012,7 @@ static void spapr_init_cpus(sPAPRMachineState *spapr)
>  sPAPRDRConnector *drc =
>  spapr_dr_connector_new(OBJECT(spapr),
> SPAPR_DR_CONNECTOR_TYPE_CPU,
> -   (core_id / smp_threads) * smt);
> +   ppc_cpu_dt_id_from_index(core_id));
>  
>  qemu_register_reset(spapr_drc_reset, drc);
>  }
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index 401e10e7dad8..47fe6c64698f 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -2529,4 +2529,21 @@ int ppc_get_vcpu_dt_id(PowerPCCPU *cpu);
>  PowerPCCPU *ppc_get_vcpu_by_dt_id(int cpu_dt_id);
>  
>  void ppc_maybe_bswap_register(CPUPPCState *env, uint8_t *mem_buf, int len);
> +
> +#if !defined(CONFIG_USER_ONLY)
> +#include "sysemu/cpus.h"
> +#include "target/ppc/kvm_ppc.h"
> +
> +static inline int ppc_cpu_dt_id_from_index(int cpu_index)
> +{
> +/* POWER HV support has an historical limitation that different threads
> + * on a single core cannot be in different guests at the same time. In
> + * order to allow KVM to assign guest threads to host cores accordingly,
> + * CPU device tree ids are spaced by the number of threads per host 
> cores.
> + */
> +return (cpu_index / smp_threads) * kvmppc_smt_threads()
> ++ (cpu_index % smp_threads);
> +}
> +#endif
> +
>  #endif /* PPC_CPU_H */
> diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
> index 56a0ab22cfbe..837a9a496a65 100644
> --- a/target/ppc/translate_init.c
> +++ b/target/ppc/translate_init.c
> @@ -9851,8 +9851,7 @@ static void ppc_cpu_realizefn(DeviceState *dev, Error 
> **errp)
>  }
>  
>  #if !defined(CONFIG_USER_ONLY)
> -cpu->cpu_dt_id = (cs->cpu_index / smp_threads) * max_smt
> -+ (cs->cpu_index % smp_threads);
> +cpu->cpu_dt_id = ppc_cpu_dt_id_from_index(cs->cpu_index);
>  
>  if (kvm_enabled() && !kvm_vcpu_id_is_valid(cpu->cpu_dt_id)) {
>  error_setg(errp, "Can't create CPU with id %d in KVM", 
> cpu->cpu_dt_id);
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc

Re: [Qemu-devel] [PATCH v2 2/4] spapr: fix error reporting in xics_system_init()

2017-05-21 Thread David Gibson
On Sun, May 21, 2017 at 07:03:33PM +0200, Greg Kurz wrote:
> On Sat, 20 May 2017 16:45:09 +1000
> David Gibson  wrote:
> 
> > On Fri, May 19, 2017 at 12:32:12PM +0200, Greg Kurz wrote:
> > > If the user explicitely asked for kernel-irqchip support and "xics-kvm"
> > > initialization fails, we shouldn't fallback to emulated "xics" as we
> > > do now. It is also awkward to print an error message when we have an
> > > errp pointer argument.
> > > 
> > > Let's use the errp argument to report the error and let the caller decide.
> > > This simplifies the code as we don't need a local Error * here.
> > > 
> > > Signed-off-by: Greg Kurz   
> > 
> > Concept looks good, but..
> > 
> > > ---
> > > v2: - total rewrite
> > > ---
> > >  hw/ppc/spapr.c |   13 ++---
> > >  1 file changed, 6 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index 91f7434861a8..75e298b4c6be 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -128,18 +128,14 @@ static void xics_system_init(MachineState *machine, 
> > > int nr_irqs, Error **errp)
> > >  sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
> > >  
> > >  if (kvm_enabled()) {
> > > -Error *err = NULL;
> > > -
> > >  if (machine_kernel_irqchip_allowed(machine) &&
> > >  !xics_kvm_init(spapr, errp)) {
> > >  spapr->icp_type = TYPE_KVM_ICP;
> > > -spapr->ics = spapr_ics_create(spapr, TYPE_ICS_KVM, nr_irqs, 
> > > );
> > > +spapr->ics = spapr_ics_create(spapr, TYPE_ICS_KVM, nr_irqs, 
> > > errp);  
> > 
> > I believe there are reasons you're not supposed to just pass an errp
> > through to a subordinate function.  Instead you're supposed to have a
> > local Error * and use error_propagate().
> > 
> 
> You only need to have a local Error * if it is used to check the return status
> of the function (ie, you cannot check *errp because errp could be NULL) as
> described in error.h. This isn't the case here but...

Fair point; patch applied to ppc-for-2.10.

> 
> > >  }
> > >  if (machine_kernel_irqchip_required(machine) && !spapr->ics) {
> > > -error_reportf_err(err,
> > > -  "kernel_irqchip requested but unavailable: 
> > > ");
> > > -} else {
> > > -error_free(err);
> > > +error_prepend(errp, "kernel_irqchip requested but 
> > > unavailable: ");
> > > +return;
> > >  }
> > >  }
> > >  
> > > @@ -147,6 +143,9 @@ static void xics_system_init(MachineState *machine, 
> > > int nr_irqs, Error **errp)
> > >  xics_spapr_init(spapr);
> > >  spapr->icp_type = TYPE_ICP;
> > >  spapr->ics = spapr_ics_create(spapr, TYPE_ICS_SIMPLE, nr_irqs, 
> > > errp);
> > > +if (!spapr->ics) {  
> > 
> > It would also be more standard to check the returned error, rather
> > than the other result.
> > 
> 
> ... if you prefer to use a local Error *, I'll gladly do that. :)
> 
> > > +return;
> > > +}
> > >  }
> > >  }
> > >  
> > >   
> > 
> 



-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH v2 3/4] target/ppc: consolidate CPU device-tree id computation in helper

2017-05-21 Thread David Gibson
On Mon, May 22, 2017 at 12:04:13PM +1000, David Gibson wrote:
> On Fri, May 19, 2017 at 12:32:20PM +0200, Greg Kurz wrote:
> > For historical reasons, we compute CPU device-tree ids with a non-trivial
> > logic. This patch consolidate the logic in a single helper to be used
> > in various places where it is currently open-coded.
> > 
> > It is okay to get rid of DIV_ROUND_UP() because we're sure that the number
> > of threads per core in the guest cannot exceed the number of threads per
> > core in the host.
> 
> However, your new logic still gives different answers in some cases.
> In particular when max_cpus is not a multiple of smp_threads.  Which
> is generally a bad idea, but allowed for older machine types for
> compatibility.   e.g. smp_threads=4, max_cpus=6 smt=8
> 
> Old logic:
>DIV_ROUND_UP(6 * 8, 4)
>  = ⌈48 / 4⌉ = 12
> 
> New logic gives: ⌊6 / 4⌋ * 8 + (6 % 4)
>= 1 * 8 + 2
>  = 10
> 
> In any case the DIV_ROUND_UP() isn't to handle the case where guest
> threads-per-core is bigger than host threads-per-core, it's (IIRC) for
> the case where guest threads-per-core is not a factor of host
> threads-per-core.  Again, a bad idea, but I think allowed in some old
> cases.

Oh, so, the other more general point here is that I actually want to
get rid of dt_id from the cpu structure.  It's basically an abuse of
the cpu stuff to include what's really an spapr concept - dt IDs for
powernv are based on the PIR and not allocate the same way.

That said, I'm still ok with a fixed version of this patch as an
interim step.

> > Signed-off-by: Greg Kurz 
> > ---
> >  hw/ppc/spapr.c  |6 ++
> >  target/ppc/cpu.h|   17 +
> >  target/ppc/translate_init.c |3 +--
> >  3 files changed, 20 insertions(+), 6 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 75e298b4c6be..1bb05a9a6b07 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -981,7 +981,6 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
> >  void *fdt;
> >  sPAPRPHBState *phb;
> >  char *buf;
> > -int smt = kvmppc_smt_threads();
> >  
> >  fdt = g_malloc0(FDT_MAX_SIZE);
> >  _FDT((fdt_create_empty_tree(fdt, FDT_MAX_SIZE)));
> > @@ -1021,7 +1020,7 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
> >  _FDT(fdt_setprop_cell(fdt, 0, "#size-cells", 2));
> >  
> >  /* /interrupt controller */
> > -spapr_dt_xics(DIV_ROUND_UP(max_cpus * smt, smp_threads), fdt, 
> > PHANDLE_XICP);
> > +spapr_dt_xics(ppc_cpu_dt_id_from_index(max_cpus), fdt, PHANDLE_XICP);
> >  
> >  ret = spapr_populate_memory(spapr, fdt);
> >  if (ret < 0) {
> > @@ -1977,7 +1976,6 @@ static void spapr_init_cpus(sPAPRMachineState *spapr)
> >  MachineState *machine = MACHINE(spapr);
> >  MachineClass *mc = MACHINE_GET_CLASS(machine);
> >  char *type = spapr_get_cpu_core_type(machine->cpu_model);
> > -int smt = kvmppc_smt_threads();
> >  const CPUArchIdList *possible_cpus;
> >  int boot_cores_nr = smp_cpus / smp_threads;
> >  int i;
> > @@ -2014,7 +2012,7 @@ static void spapr_init_cpus(sPAPRMachineState *spapr)
> >  sPAPRDRConnector *drc =
> >  spapr_dr_connector_new(OBJECT(spapr),
> > SPAPR_DR_CONNECTOR_TYPE_CPU,
> > -   (core_id / smp_threads) * smt);
> > +   ppc_cpu_dt_id_from_index(core_id));
> >  
> >  qemu_register_reset(spapr_drc_reset, drc);
> >  }
> > diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> > index 401e10e7dad8..47fe6c64698f 100644
> > --- a/target/ppc/cpu.h
> > +++ b/target/ppc/cpu.h
> > @@ -2529,4 +2529,21 @@ int ppc_get_vcpu_dt_id(PowerPCCPU *cpu);
> >  PowerPCCPU *ppc_get_vcpu_by_dt_id(int cpu_dt_id);
> >  
> >  void ppc_maybe_bswap_register(CPUPPCState *env, uint8_t *mem_buf, int len);
> > +
> > +#if !defined(CONFIG_USER_ONLY)
> > +#include "sysemu/cpus.h"
> > +#include "target/ppc/kvm_ppc.h"
> > +
> > +static inline int ppc_cpu_dt_id_from_index(int cpu_index)
> > +{
> > +/* POWER HV support has an historical limitation that different threads
> > + * on a single core cannot be in different guests at the same time. In
> > + * order to allow KVM to assign guest threads to host cores 
> > accordingly,
> > + * CPU device tree ids are spaced by the number of threads per host 
> > cores.
> > + */
> > +return (cpu_index / smp_threads) * kvmppc_smt_threads()
> > ++ (cpu_index % smp_threads);
> > +}
> > +#endif
> > +
> >  #endif /* PPC_CPU_H */
> > diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
> > index 56a0ab22cfbe..837a9a496a65 100644
> > --- a/target/ppc/translate_init.c
> > +++ b/target/ppc/translate_init.c
> > @@ -9851,8 +9851,7 @@ static void ppc_cpu_realizefn(DeviceState *dev, Error 
> > **errp)
> >  }
> >  
> >  

Re: [Qemu-devel] [Qemu-arm] [PATCH 1/2] hw/adc: Add basic Aspeed ADC model

2017-05-21 Thread Andrew Jeffery
Hi Phil,

On Sat, 2017-05-20 at 00:21 -0300, Philippe Mathieu-Daudé wrote:
> Hi Andrew,
> 
> On 05/19/2017 09:26 PM, Andrew Jeffery wrote:
> > This model implements enough behaviour to do basic functionality tests
> > such as device initialisation and read out of dummy sample values. The
> > sample value generation strategy is similar to the STM ADC already in
> > the tree.
> > 
> > > > Signed-off-by: Andrew Jeffery 
> > ---
> >  hw/adc/Makefile.objs|   1 +
> >  hw/adc/aspeed_adc.c | 246 
> > 
> >  include/hw/adc/aspeed_adc.h |  33 ++
> >  3 files changed, 280 insertions(+)
> >  create mode 100644 hw/adc/aspeed_adc.c
> >  create mode 100644 include/hw/adc/aspeed_adc.h
> > 
> > diff --git a/hw/adc/Makefile.objs b/hw/adc/Makefile.objs
> > index 3f6dfdedaec7..2bf9362ba3c4 100644
> > --- a/hw/adc/Makefile.objs
> > +++ b/hw/adc/Makefile.objs
> > @@ -1 +1,2 @@
> >  obj-$(CONFIG_STM32F2XX_ADC) += stm32f2xx_adc.o
> > +obj-$(CONFIG_ASPEED_SOC) += aspeed_adc.o
> > diff --git a/hw/adc/aspeed_adc.c b/hw/adc/aspeed_adc.c
> > new file mode 100644
> > index ..d08f1684f7bc
> > --- /dev/null
> > +++ b/hw/adc/aspeed_adc.c
> > @@ -0,0 +1,246 @@
> > +/*
> > + * Aspeed ADC
> > + *
> > > > + * Andrew Jeffery 
> > + *
> > + * Copyright 2017 IBM Corp.
> > + *
> > + * This code is licensed under the GPL version 2 or later.  See
> > + * the COPYING file in the top-level directory.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "hw/adc/aspeed_adc.h"
> > +#include "qapi/error.h"
> > +#include "qemu/log.h"
> > +
> > +#define ASPEED_ADC_ENGINE_CTRL  0x00
> > +#define  ASPEED_ADC_ENGINE_CH_EN_MASK   0x
> > +#define   ASPEED_ADC_ENGINE_CH_EN(x)((BIT(x)) << 16)
> > +#define  ASPEED_ADC_ENGINE_INIT BIT(8)
> > +#define  ASPEED_ADC_ENGINE_AUTO_COMPBIT(5)
> > +#define  ASPEED_ADC_ENGINE_COMP BIT(4)
> > +#define  ASPEED_ADC_ENGINE_MODE_MASK0x000e
> > +#define   ASPEED_ADC_ENGINE_MODE_OFF(0b000 << 1)
> > +#define   ASPEED_ADC_ENGINE_MODE_STANDBY(0b001 << 1)
> > +#define   ASPEED_ADC_ENGINE_MODE_NORMAL (0b111 << 1)
> > +#define  ASPEED_ADC_ENGINE_EN   BIT(0)
> > +
> > +#define ASPEED_ADC_L_MASK   ((1 << 10) - 1)
> > +#define ASPEED_ADC_L(x) ((x) & ASPEED_ADC_L_MASK)
> > +#define ASPEED_ADC_H(x) (((x) >> 16) & ASPEED_ADC_L_MASK)
> > +#define ASPEED_ADC_LH_MASK  (ASPEED_ADC_L_MASK << 16 | 
> > ASPEED_ADC_L_MASK)
> > +
> > +static inline uint32_t update_channels(uint32_t current)
> > +{
> > +const uint32_t next = (current + 7) & 0x3ff;
> > +
> > +return (next << 16) | next;
> > +}
> > +
> > +static bool breaks_threshold(AspeedADCState *s, int ch_off)
> > +{
> > +const uint32_t a = ASPEED_ADC_L(s->channels[ch_off]);
> > +const uint32_t a_lower = ASPEED_ADC_L(s->bounds[2 * ch_off]);
> > +const uint32_t a_upper = ASPEED_ADC_H(s->bounds[2 * ch_off]);
> > +const uint32_t b = ASPEED_ADC_H(s->channels[ch_off]);
> > +const uint32_t b_lower = ASPEED_ADC_L(s->bounds[2 * ch_off + 1]);
> > +const uint32_t b_upper = ASPEED_ADC_H(s->bounds[2 * ch_off + 1]);
> > +
> > +return ((a < a_lower || a > a_upper)) ||
> > +   ((b < b_lower || b > b_upper));
> > +}
> > +
> > +static uint32_t read_channel_sample(AspeedADCState *s, int ch_off)
> > +{
> > +uint32_t ret;
> > +
> > +/* Poor man's sampling */
> > +ret = s->channels[ch_off];
> > +s->channels[ch_off] = update_channels(s->channels[ch_off]);
> > +
> > +if (breaks_threshold(s, ch_off)) {
> > +qemu_irq_raise(s->irq);
> > +}
> > +
> > +return ret;
> > +}
> > +
> > +#define TO_INDEX(addr, base) (((addr) - (base)) >> 2)
> > +
> > +static uint64_t aspeed_adc_read(void *opaque, hwaddr addr,
> > + unsigned int size)
> > +{
> > +AspeedADCState *s = ASPEED_ADC(opaque);
> > +uint64_t ret;
> > +
> > +switch (addr) {
> > +case 0x00:
> > +ret = s->engine_ctrl;
> > +break;
> > +case 0x04:
> > +ret = s->irq_ctrl;
> > +break;
> > +case 0x08:
> > +ret = s->vga_detect_ctrl;
> > +break;
> > +case 0x0c:
> > +ret = s->adc_clk_ctrl;
> > +break;
> > +case 0x10 ... 0x2e:
> > +ret = read_channel_sample(s, TO_INDEX(addr, 0x10));
> > +break;
> > +case 0x30 ... 0x6e:
> > +ret = s->bounds[TO_INDEX(addr, 0x30)];
> > +break;
> > +case 0x70 ... 0xae:
> > +ret = s->hysteresis[TO_INDEX(addr, 0x70)];
> > +break;
> > +case 0xc0:
> > +ret = s->irq_src;
> > +break;
> > +case 0xc4:
> > +ret = s->comp_trim;
> > +break;
> > +default:
> > +qemu_log_mask(LOG_UNIMP, "%s: addr: 0x%lx, size: %u\n", __func__, 
> > addr,
> > +size);
> > +

Re: [Qemu-devel] [RFC PATCH v2 3/4] spapr: Make h_register_process_table hcall flags global

2017-05-21 Thread Laurent Vivier
On 19/05/2017 07:40, Bharata B Rao wrote:
> The flags used in h_register_process_table hcall are needed in spapr.c
> and hence move them to a header file. While doing so, give them
> slightly specific names.
> 
> Signed-off-by: Bharata B Rao 
> Reviewed-by: David Gibson 
> ---
>  hw/ppc/spapr_hcall.c   | 31 ++-
>  include/hw/ppc/spapr.h | 10 ++
>  2 files changed, 24 insertions(+), 17 deletions(-)
> 
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index cea5d99..3915e6f 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -921,13 +921,6 @@ static void spapr_check_setup_free_hpt(sPAPRMachineState 
> *spapr,
>  return;
>  }
>  
> -#define FLAGS_MASK  0x01FULL
> -#define FLAG_MODIFY 0x10
> -#define FLAG_REGISTER   0x08
> -#define FLAG_RADIX  0x04
> -#define FLAG_HASH_PROC_TBL  0x02
> -#define FLAG_GTSE   0x01
> -
...
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index e581c4a..588872a 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -685,4 +685,14 @@ int spapr_rng_populate_dt(void *fdt);
>  
>  void spapr_do_system_reset_on_cpu(CPUState *cs, run_on_cpu_data arg);
>  
> +/*
> + * Defines for flag value used in H_REGISTER_PROC_TBL hcall.
> + */
> +#define SPAPR_PROC_TABLE_MASK0x01FULL
> +#define SPAPR_PROC_TABLE_MODIFY  0x10
> +#define SPAPR_PROC_TABLE_REGISTER0x08
> +#define SPAPR_PROC_TABLE_RADIX   0x04
> +#define SPAPR_PROC_TABLE_HPT_PT  0x02
> +#define SPAPR_PROC_TABLE_GTSE0x01

I think it should be cleaner if you use 0x1fULL

Laurent



Re: [Qemu-devel] [PATCH v2 2/4] spapr: fix error reporting in xics_system_init()

2017-05-21 Thread Greg Kurz
On Sat, 20 May 2017 16:45:09 +1000
David Gibson  wrote:

> On Fri, May 19, 2017 at 12:32:12PM +0200, Greg Kurz wrote:
> > If the user explicitely asked for kernel-irqchip support and "xics-kvm"
> > initialization fails, we shouldn't fallback to emulated "xics" as we
> > do now. It is also awkward to print an error message when we have an
> > errp pointer argument.
> > 
> > Let's use the errp argument to report the error and let the caller decide.
> > This simplifies the code as we don't need a local Error * here.
> > 
> > Signed-off-by: Greg Kurz   
> 
> Concept looks good, but..
> 
> > ---
> > v2: - total rewrite
> > ---
> >  hw/ppc/spapr.c |   13 ++---
> >  1 file changed, 6 insertions(+), 7 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 91f7434861a8..75e298b4c6be 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -128,18 +128,14 @@ static void xics_system_init(MachineState *machine, 
> > int nr_irqs, Error **errp)
> >  sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
> >  
> >  if (kvm_enabled()) {
> > -Error *err = NULL;
> > -
> >  if (machine_kernel_irqchip_allowed(machine) &&
> >  !xics_kvm_init(spapr, errp)) {
> >  spapr->icp_type = TYPE_KVM_ICP;
> > -spapr->ics = spapr_ics_create(spapr, TYPE_ICS_KVM, nr_irqs, 
> > );
> > +spapr->ics = spapr_ics_create(spapr, TYPE_ICS_KVM, nr_irqs, 
> > errp);  
> 
> I believe there are reasons you're not supposed to just pass an errp
> through to a subordinate function.  Instead you're supposed to have a
> local Error * and use error_propagate().
> 

You only need to have a local Error * if it is used to check the return status
of the function (ie, you cannot check *errp because errp could be NULL) as
described in error.h. This isn't the case here but...

> >  }
> >  if (machine_kernel_irqchip_required(machine) && !spapr->ics) {
> > -error_reportf_err(err,
> > -  "kernel_irqchip requested but unavailable: 
> > ");
> > -} else {
> > -error_free(err);
> > +error_prepend(errp, "kernel_irqchip requested but unavailable: 
> > ");
> > +return;
> >  }
> >  }
> >  
> > @@ -147,6 +143,9 @@ static void xics_system_init(MachineState *machine, int 
> > nr_irqs, Error **errp)
> >  xics_spapr_init(spapr);
> >  spapr->icp_type = TYPE_ICP;
> >  spapr->ics = spapr_ics_create(spapr, TYPE_ICS_SIMPLE, nr_irqs, 
> > errp);
> > +if (!spapr->ics) {  
> 
> It would also be more standard to check the returned error, rather
> than the other result.
> 

... if you prefer to use a local Error *, I'll gladly do that. :)

> > +return;
> > +}
> >  }
> >  }
> >  
> >   
> 



pgprnKK2ei3Zq.pgp
Description: OpenPGP digital signature


[Qemu-devel] [PATCH v3 06/11] hw/arm/exynos: Move DRAM initialization next boards

2017-05-21 Thread Krzysztof Kozlowski
Before QOM-ifying the Exynos4 SoC model, move the DRAM initialization
from exynos4210.c to exynos4_boards.c because DRAM is board specific,
not SoC.

Signed-off-by: Krzysztof Kozlowski 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/arm/exynos4210.c | 20 +-
 hw/arm/exynos4_boards.c | 50 ++---
 include/hw/arm/exynos4210.h |  5 +
 3 files changed, 45 insertions(+), 30 deletions(-)

diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index 960f27e45a36..0da877f8db0a 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -160,13 +160,11 @@ static uint64_t exynos4210_calc_affinity(int cpu)
 return mp_affinity;
 }
 
-Exynos4210State *exynos4210_init(MemoryRegion *system_mem,
-unsigned long ram_size)
+Exynos4210State *exynos4210_init(MemoryRegion *system_mem)
 {
 int i, n;
 Exynos4210State *s = g_new(Exynos4210State, 1);
 qemu_irq gate_irq[EXYNOS4210_NCPUS][EXYNOS4210_IRQ_GATE_NINPUTS];
-unsigned long mem_size;
 DeviceState *dev;
 SysBusDevice *busdev;
 ObjectClass *cpu_oc;
@@ -299,22 +297,6 @@ Exynos4210State *exynos4210_init(MemoryRegion *system_mem,
 memory_region_add_subregion(system_mem, EXYNOS4210_IRAM_BASE_ADDR,
 >iram_mem);
 
-/* DRAM */
-mem_size = ram_size;
-if (mem_size > EXYNOS4210_DRAM_MAX_SIZE) {
-memory_region_init_ram(>dram1_mem, NULL, "exynos4210.dram1",
-mem_size - EXYNOS4210_DRAM_MAX_SIZE, _fatal);
-vmstate_register_ram_global(>dram1_mem);
-memory_region_add_subregion(system_mem, EXYNOS4210_DRAM1_BASE_ADDR,
->dram1_mem);
-mem_size = EXYNOS4210_DRAM_MAX_SIZE;
-}
-memory_region_init_ram(>dram0_mem, NULL, "exynos4210.dram0", mem_size,
-   _fatal);
-vmstate_register_ram_global(>dram0_mem);
-memory_region_add_subregion(system_mem, EXYNOS4210_DRAM0_BASE_ADDR,
->dram0_mem);
-
/* PMU.
 * The only reason of existence at the moment is that secondary CPU boot
 * loader uses PMU INFORM5 register as a holding pen.
diff --git a/hw/arm/exynos4_boards.c b/hw/arm/exynos4_boards.c
index 4853c318023c..6240b26839cd 100644
--- a/hw/arm/exynos4_boards.c
+++ b/hw/arm/exynos4_boards.c
@@ -22,6 +22,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qapi/error.h"
 #include "qemu/error-report.h"
 #include "qemu-common.h"
 #include "cpu.h"
@@ -56,6 +57,12 @@ typedef enum Exynos4BoardType {
 EXYNOS4_NUM_OF_BOARDS
 } Exynos4BoardType;
 
+typedef struct Exynos4BoardState {
+Exynos4210State *soc;
+MemoryRegion dram0_mem;
+MemoryRegion dram1_mem;
+} Exynos4BoardState;
+
 static int exynos4_board_id[EXYNOS4_NUM_OF_BOARDS] = {
 [EXYNOS4_BOARD_NURI] = 0xD33,
 [EXYNOS4_BOARD_SMDKC210] = 0xB16,
@@ -96,9 +103,34 @@ static void lan9215_init(uint32_t base, qemu_irq irq)
 }
 }
 
-static Exynos4210State *exynos4_boards_init_common(MachineState *machine,
-   Exynos4BoardType board_type)
+static void exynos4_boards_init_ram(Exynos4BoardState *s,
+MemoryRegion *system_mem,
+unsigned long ram_size)
+{
+unsigned long mem_size = ram_size;
+
+if (mem_size > EXYNOS4210_DRAM_MAX_SIZE) {
+memory_region_init_ram(>dram1_mem, NULL, "exynos4210.dram1",
+   mem_size - EXYNOS4210_DRAM_MAX_SIZE,
+   _fatal);
+vmstate_register_ram_global(>dram1_mem);
+memory_region_add_subregion(system_mem, EXYNOS4210_DRAM1_BASE_ADDR,
+>dram1_mem);
+mem_size = EXYNOS4210_DRAM_MAX_SIZE;
+}
+
+memory_region_init_ram(>dram0_mem, NULL, "exynos4210.dram0", mem_size,
+   _fatal);
+vmstate_register_ram_global(>dram0_mem);
+memory_region_add_subregion(system_mem, EXYNOS4210_DRAM0_BASE_ADDR,
+>dram0_mem);
+}
+
+static Exynos4BoardState *
+exynos4_boards_init_common(MachineState *machine,
+   Exynos4BoardType board_type)
 {
+Exynos4BoardState *s = g_new(Exynos4BoardState, 1);
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 
 if (smp_cpus != EXYNOS4210_NCPUS && !qtest_enabled()) {
@@ -127,8 +159,12 @@ static Exynos4210State 
*exynos4_boards_init_common(MachineState *machine,
 machine->kernel_cmdline,
 machine->initrd_filename);
 
-return exynos4210_init(get_system_memory(),
-exynos4_board_ram_size[board_type]);
+exynos4_boards_init_ram(s, get_system_memory(),
+exynos4_board_ram_size[board_type]);
+
+s->soc = exynos4210_init(get_system_memory());
+
+return s;
 }
 
 static void nuri_init(MachineState *machine)
@@ -140,11 +176,11 @@ static void nuri_init(MachineState *machine)
 
 static void 

[Qemu-devel] [PATCH v3 10/11] hw/intc/exynos4210_gic: Constify array of combiner interrupts

2017-05-21 Thread Krzysztof Kozlowski
The static array of interrupt combiner mappings is not modified so it
can be made const for code safeness.

Signed-off-by: Krzysztof Kozlowski 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/intc/exynos4210_gic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/intc/exynos4210_gic.c b/hw/intc/exynos4210_gic.c
index 9a2254f0b13c..d79b4cfcc7c9 100644
--- a/hw/intc/exynos4210_gic.c
+++ b/hw/intc/exynos4210_gic.c
@@ -116,7 +116,7 @@ enum ExtInt {
  * which is INTG16 in Internal Interrupt Combiner.
  */
 
-static uint32_t
+static const uint32_t
 combiner_grp_to_gic_id[64-EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][8] = {
 /* int combiner groups 16-19 */
 { }, { }, { }, { },
-- 
2.9.3




[Qemu-devel] [PATCH v3 04/11] hw/timer/exynos4210_mct: Cleanup indentation and empty new lines

2017-05-21 Thread Krzysztof Kozlowski
Statements under 'case' were in some places wrongly indented bringing
confusion and making the code less readable.  Remove also few unneeded
blank lines.  No functional changes.

Signed-off-by: Krzysztof Kozlowski 
Reviewed-by: Peter Maydell 
---
 hw/timer/exynos4210_mct.c | 45 -
 1 file changed, 20 insertions(+), 25 deletions(-)

diff --git a/hw/timer/exynos4210_mct.c b/hw/timer/exynos4210_mct.c
index 2404fb737ac4..ea5f99d9a41b 100644
--- a/hw/timer/exynos4210_mct.c
+++ b/hw/timer/exynos4210_mct.c
@@ -1016,9 +1016,9 @@ static uint64_t exynos4210_mct_read(void *opaque, hwaddr 
offset,
 
 case G_COMP_L(0): case G_COMP_L(1): case G_COMP_L(2): case G_COMP_L(3):
 case G_COMP_U(0): case G_COMP_U(1): case G_COMP_U(2): case G_COMP_U(3):
-index = GET_G_COMP_IDX(offset);
-shift = 8 * (offset & 0x4);
-value = UINT32_MAX & (s->g_timer.reg.comp[index] >> shift);
+index = GET_G_COMP_IDX(offset);
+shift = 8 * (offset & 0x4);
+value = UINT32_MAX & (s->g_timer.reg.comp[index] >> shift);
 break;
 
 case G_TCON:
@@ -1067,7 +1067,6 @@ static uint64_t exynos4210_mct_read(void *opaque, hwaddr 
offset,
 lt_i = GET_L_TIMER_IDX(offset);
 
 value = exynos4210_lfrc_get_count(>l_timer[lt_i]);
-
 break;
 
 case L0_TCON: case L1_TCON:
@@ -1153,23 +1152,23 @@ static void exynos4210_mct_write(void *opaque, hwaddr 
offset,
 
 case G_COMP_L(0): case G_COMP_L(1): case G_COMP_L(2): case G_COMP_L(3):
 case G_COMP_U(0): case G_COMP_U(1): case G_COMP_U(2): case G_COMP_U(3):
-index = GET_G_COMP_IDX(offset);
-shift = 8 * (offset & 0x4);
-s->g_timer.reg.comp[index] =
-(s->g_timer.reg.comp[index] &
-(((uint64_t)UINT32_MAX << 32) >> shift)) +
-(value << shift);
+index = GET_G_COMP_IDX(offset);
+shift = 8 * (offset & 0x4);
+s->g_timer.reg.comp[index] =
+(s->g_timer.reg.comp[index] &
+(((uint64_t)UINT32_MAX << 32) >> shift)) +
+(value << shift);
 
-DPRINTF("comparator %d write 0x%llx val << %d\n", index, value, shift);
+DPRINTF("comparator %d write 0x%llx val << %d\n", index, value, shift);
 
-if (offset & 0x4) {
-s->g_timer.reg.wstat |= G_WSTAT_COMP_U(index);
-} else {
-s->g_timer.reg.wstat |= G_WSTAT_COMP_L(index);
-}
+if (offset & 0x4) {
+s->g_timer.reg.wstat |= G_WSTAT_COMP_U(index);
+} else {
+s->g_timer.reg.wstat |= G_WSTAT_COMP_L(index);
+}
 
-exynos4210_gfrc_restart(s);
-break;
+exynos4210_gfrc_restart(s);
+break;
 
 case G_TCON:
 old_val = s->g_timer.reg.tcon;
@@ -1207,7 +1206,6 @@ static void exynos4210_mct_write(void *opaque, hwaddr 
offset,
 break;
 
 case G_INT_ENB:
-
 /* Raise IRQ if transition from disabled to enabled and CSTAT pending 
*/
 for (i = 0; i < MCT_GT_CMP_NUM; i++) {
 if ((value & G_INT_ENABLE(i)) > (s->g_timer.reg.tcon &
@@ -1288,7 +1286,6 @@ static void exynos4210_mct_write(void *opaque, hwaddr 
offset,
 break;
 
 case L0_TCNTB: case L1_TCNTB:
-
 lt_i = GET_L_TIMER_IDX(offset);
 index = GET_L_TIMER_CNT_REG_IDX(offset, lt_i);
 
@@ -1316,7 +1313,6 @@ static void exynos4210_mct_write(void *opaque, hwaddr 
offset,
 break;
 
 case L0_ICNTB: case L1_ICNTB:
-
 lt_i = GET_L_TIMER_IDX(offset);
 index = GET_L_TIMER_CNT_REG_IDX(offset, lt_i);
 
@@ -1353,13 +1349,12 @@ static void exynos4210_mct_write(void *opaque, hwaddr 
offset,
 if (icntb_max[lt_i] < value) {
 icntb_max[lt_i] = value;
 }
-DPRINTF("local timer[%d] ICNTB write %llx; max=%x, min=%x\n\n",
-lt_i, value, icntb_max[lt_i], icntb_min[lt_i]);
+DPRINTF("local timer[%d] ICNTB write %llx; max=%x, min=%x\n\n",
+lt_i, value, icntb_max[lt_i], icntb_min[lt_i]);
 #endif
-break;
+break;
 
 case L0_FRCNTB: case L1_FRCNTB:
-
 lt_i = GET_L_TIMER_IDX(offset);
 index = GET_L_TIMER_CNT_REG_IDX(offset, lt_i);
 
-- 
2.9.3




[Qemu-devel] [PATCH v3 09/11] hw/arm/exynos: Use type define instead of hard-coded a9mpcore_priv string

2017-05-21 Thread Krzysztof Kozlowski
Use a define for a9mpcore_priv device type name instead of hard-coded
string.

Signed-off-by: Krzysztof Kozlowski 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/arm/exynos4210.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index 034fc8be9d76..a9e221c5b7fe 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -26,6 +26,7 @@
 #include "qemu-common.h"
 #include "qemu/log.h"
 #include "cpu.h"
+#include "hw/cpu/a9mpcore.h"
 #include "hw/boards.h"
 #include "sysemu/sysemu.h"
 #include "hw/sysbus.h"
@@ -212,7 +213,7 @@ static void exynos4210_init(Object *obj)
 }
 
 /* Private memory region and Internal GIC */
-dev = qdev_create(NULL, "a9mpcore_priv");
+dev = qdev_create(NULL, TYPE_A9MPCORE_PRIV);
 qdev_prop_set_uint32(dev, "num-cpu", EXYNOS4210_NCPUS);
 qdev_init_nofail(dev);
 busdev = SYS_BUS_DEVICE(dev);
-- 
2.9.3




[Qemu-devel] [PATCH v3 03/11] hw/timer/exynos4210_mct: Fix checkpatch style errors

2017-05-21 Thread Krzysztof Kozlowski
Fix checkpatch errors:
1. ERROR: spaces required around that '+' (ctx:VxV)
2. ERROR: spaces required around that '&' (ctx:VxV)

No functional changes.

Signed-off-by: Krzysztof Kozlowski 
Reviewed-by: Peter Maydell 
---
 hw/timer/exynos4210_mct.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/timer/exynos4210_mct.c b/hw/timer/exynos4210_mct.c
index a2ec3920f82e..2404fb737ac4 100644
--- a/hw/timer/exynos4210_mct.c
+++ b/hw/timer/exynos4210_mct.c
@@ -937,7 +937,7 @@ static void exynos4210_mct_update_freq(Exynos4210MCTState 
*s)
 {
 uint32_t freq = s->freq;
 s->freq = 2400 /
-((MCT_CFG_GET_PRESCALER(s->reg_mct_cfg)+1) *
+((MCT_CFG_GET_PRESCALER(s->reg_mct_cfg) + 1) *
 MCT_CFG_GET_DIVIDER(s->reg_mct_cfg));
 
 if (freq != s->freq) {
@@ -1162,7 +1162,7 @@ static void exynos4210_mct_write(void *opaque, hwaddr 
offset,
 
 DPRINTF("comparator %d write 0x%llx val << %d\n", index, value, shift);
 
-if (offset&0x4) {
+if (offset & 0x4) {
 s->g_timer.reg.wstat |= G_WSTAT_COMP_U(index);
 } else {
 s->g_timer.reg.wstat |= G_WSTAT_COMP_L(index);
-- 
2.9.3




[Qemu-devel] [PATCH v3 11/11] hw/misc/exynos4210_pmu: Add support for system poweroff

2017-05-21 Thread Krzysztof Kozlowski
On all Exynos-based boards, the system powers down itself by driving
PS_HOLD signal low - eight bit in PS_HOLD_CONTROL register of PMU.
Handle writing to respective PMU register to fix power off failure:

reboot: Power down
Unable to poweroff system
shutdown: 31 output lines suppressed due to ratelimiting
Kernel panic - not syncing: Attempted to kill init! exitcode=0x

CPU: 0 PID: 1 Comm: shutdown Not tainted 4.11.0-rc8 #846
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[] (unwind_backtrace) from [] (show_stack+0x10/0x14)
[] (show_stack) from [] (dump_stack+0x88/0x9c)
[] (dump_stack) from [] (panic+0xdc/0x268)
[] (panic) from [] (do_exit+0xa90/0xab4)
[] (do_exit) from [] (SyS_reboot+0x164/0x1d0)
[] (SyS_reboot) from [] (ret_fast_syscall+0x0/0x3c)

Additionally the initial value of PS_HOLD has to be changed because
recent Linux kernel (v4.12-rc1) uses regmap cache for this access.
When the register is kept at reset value, the kernel will not issue a
write to it.  Usually the bootloader sets the eight bit of PS_HOLD high
so mimic its existence here.

Signed-off-by: Krzysztof Kozlowski 
---
 hw/misc/exynos4210_pmu.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/hw/misc/exynos4210_pmu.c b/hw/misc/exynos4210_pmu.c
index 63a8ccd35559..f3f96b1f4889 100644
--- a/hw/misc/exynos4210_pmu.c
+++ b/hw/misc/exynos4210_pmu.c
@@ -26,6 +26,7 @@
 
 #include "qemu/osdep.h"
 #include "hw/sysbus.h"
+#include "sysemu/sysemu.h"
 
 #ifndef DEBUG_PMU
 #define DEBUG_PMU   0
@@ -350,7 +351,11 @@ static const Exynos4210PmuReg exynos4210_pmu_regs[] = {
 {"PAD_RETENTION_MMCB_OPTION", PAD_RETENTION_MMCB_OPTION, 0x},
 {"PAD_RETENTION_EBIA_OPTION", PAD_RETENTION_EBIA_OPTION, 0x},
 {"PAD_RETENTION_EBIB_OPTION", PAD_RETENTION_EBIB_OPTION, 0x},
-{"PS_HOLD_CONTROL", PS_HOLD_CONTROL, 0x5200},
+/*
+ * PS_HOLD_CONTROL: reset value and manually toggle high the DATA bit.
+ * DATA bit high, set usually by bootloader, keeps system on.
+ */
+{"PS_HOLD_CONTROL", PS_HOLD_CONTROL, 0x5200 | BIT(8)},
 {"XUSBXTI_CONFIGURATION", XUSBXTI_CONFIGURATION, 0x0001},
 {"XUSBXTI_STATUS", XUSBXTI_STATUS, 0x0001},
 {"XUSBXTI_DURATION", XUSBXTI_DURATION, 0xFFF0},
@@ -397,6 +402,12 @@ typedef struct Exynos4210PmuState {
 uint32_t reg[PMU_NUM_OF_REGISTERS];
 } Exynos4210PmuState;
 
+static void exynos4210_pmu_poweroff(void)
+{
+PRINT_DEBUG("QEMU PMU: PS_HOLD bit down, powering off\n");
+qemu_system_shutdown_request();
+}
+
 static uint64_t exynos4210_pmu_read(void *opaque, hwaddr offset,
 unsigned size)
 {
@@ -428,6 +439,13 @@ static void exynos4210_pmu_write(void *opaque, hwaddr 
offset,
 PRINT_DEBUG_EXTEND("%s <0x%04x> <- 0x%04x\n", reg_p->name,
 (uint32_t)offset, (uint32_t)val);
 s->reg[i] = val;
+if ((offset == PS_HOLD_CONTROL) && ((val & BIT(8)) == 0)) {
+/*
+ * We are interested only in setting data bit
+ * of PS_HOLD_CONTROL register to indicate power off request.
+ */
+exynos4210_pmu_poweroff();
+}
 return;
 }
 reg_p++;
-- 
2.9.3




[Qemu-devel] [PATCH v3 05/11] hw/timer/exynos4210_mct: Remove unused defines

2017-05-21 Thread Krzysztof Kozlowski
Remove defines not used anywhere.

Signed-off-by: Krzysztof Kozlowski 
Reviewed-by: Peter Maydell 
---
 hw/timer/exynos4210_mct.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/hw/timer/exynos4210_mct.c b/hw/timer/exynos4210_mct.c
index ea5f99d9a41b..e4ef4cfd3625 100644
--- a/hw/timer/exynos4210_mct.c
+++ b/hw/timer/exynos4210_mct.c
@@ -173,13 +173,10 @@ enum LocalTimerRegCntIndexes {
 L_REG_CNT_AMOUNT
 };
 
-#define MCT_NIRQ6
 #define MCT_SFR_SIZE0x444
 
 #define MCT_GT_CMP_NUM  4
 
-#define MCT_GT_MAX_VAL  UINT64_MAX
-
 #define MCT_GT_COUNTER_STEP 0x1ULL
 #define MCT_LT_COUNTER_STEP 0x1ULL
 #define MCT_LT_CNT_LOW_LIMIT0x100
-- 
2.9.3




[Qemu-devel] [PATCH v3 00/11] hw: arm: exynos: Bring up secondary CPU, QOM-ify Soc, other improvements

2017-05-21 Thread Krzysztof Kozlowski

Hi,

This is a collection of three already sent patchsets [1] [2] [3].


Changes since previous versions (v2):
1. Patch 11/11: set BIT(8) in PS_HOLD to fix power off on Linux v4.12-rc1.
2. Add reviewed-by tags.


The first patch in set is a bugfix which also exposes problem of cpuidle.
Kernel compiled with disabled CPU_IDLE works fine (two CPUs, coming SGIs,
poweroff working) but with CPU_IDLE it behaves bad due to low-power
mode (AFTR). This was discussed at [1].

There are no external dependencies.

Patchset is also available here:
https://github.com/krzk/qemu/tree/for-next/exynos4210-gic-qom-soc-poweroff-v3


Best regards,
Krzysztof

[1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg436902.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg01562.html
[3] https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg01553.html


Krzysztof Kozlowski (11):
  hw/intc/exynos4210_gic: Fix GIC memory mappings for secondary CPU
  hw/intc/exynos4210_gic: Use more meaningful name for local variable
  hw/timer/exynos4210_mct: Fix checkpatch style errors
  hw/timer/exynos4210_mct: Cleanup indentation and empty new lines
  hw/timer/exynos4210_mct: Remove unused defines
  hw/arm/exynos: Move DRAM initialization next boards
  hw/arm/exynos: Declare local variables in some order
  hw/arm/exynos: QOM-ify the SoC
  hw/arm/exynos: Use type define instead of hard-coded a9mpcore_priv
string
  hw/intc/exynos4210_gic: Constify array of combiner interrupts
  hw/misc/exynos4210_pmu: Add support for system poweroff

 hw/arm/exynos4210.c | 43 
 hw/arm/exynos4_boards.c | 53 +++--
 hw/intc/exynos4210_gic.c| 35 ++
 hw/misc/exynos4210_pmu.c| 20 -
 hw/timer/exynos4210_mct.c   | 50 ++
 include/hw/arm/exynos4210.h | 11 +-
 6 files changed, 132 insertions(+), 80 deletions(-)

-- 
2.9.3




[Qemu-devel] [PATCH v3 08/11] hw/arm/exynos: QOM-ify the SoC

2017-05-21 Thread Krzysztof Kozlowski
Convert the Exynos4210 SoC code into a QOM model which is a preferred
approach instead of directly initializing SoC-related devices from the
board file.

Signed-off-by: Krzysztof Kozlowski 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/arm/exynos4210.c | 18 +++---
 hw/arm/exynos4_boards.c |  9 ++---
 include/hw/arm/exynos4210.h |  8 ++--
 3 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index 27a7bf28a5a9..034fc8be9d76 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -160,9 +160,10 @@ static uint64_t exynos4210_calc_affinity(int cpu)
 return mp_affinity;
 }
 
-Exynos4210State *exynos4210_init(MemoryRegion *system_mem)
+static void exynos4210_init(Object *obj)
 {
-Exynos4210State *s = g_new(Exynos4210State, 1);
+MemoryRegion *system_mem = get_system_memory();
+Exynos4210State *s = EXYNOS4210(obj);
 qemu_irq gate_irq[EXYNOS4210_NCPUS][EXYNOS4210_IRQ_GATE_NINPUTS];
 SysBusDevice *busdev;
 ObjectClass *cpu_oc;
@@ -402,6 +403,17 @@ Exynos4210State *exynos4210_init(MemoryRegion *system_mem)
 
 sysbus_create_simple(TYPE_EXYNOS4210_EHCI, EXYNOS4210_EHCI_BASE_ADDR,
 s->irq_table[exynos4210_get_irq(28, 3)]);
+}
+
+static const TypeInfo exynos4210_type_info = {
+.name = TYPE_EXYNOS4210,
+.parent = TYPE_DEVICE,
+.instance_size = sizeof(Exynos4210State),
+.instance_init = exynos4210_init,
+};
 
-return s;
+static void exynos4210_register_types(void)
+{
+type_register_static(_type_info);
 }
+type_init(exynos4210_register_types)
diff --git a/hw/arm/exynos4_boards.c b/hw/arm/exynos4_boards.c
index 6240b26839cd..5e7c6b562ae2 100644
--- a/hw/arm/exynos4_boards.c
+++ b/hw/arm/exynos4_boards.c
@@ -58,7 +58,7 @@ typedef enum Exynos4BoardType {
 } Exynos4BoardType;
 
 typedef struct Exynos4BoardState {
-Exynos4210State *soc;
+Exynos4210State soc;
 MemoryRegion dram0_mem;
 MemoryRegion dram1_mem;
 } Exynos4BoardState;
@@ -162,7 +162,10 @@ exynos4_boards_init_common(MachineState *machine,
 exynos4_boards_init_ram(s, get_system_memory(),
 exynos4_board_ram_size[board_type]);
 
-s->soc = exynos4210_init(get_system_memory());
+object_initialize(>soc, sizeof(s->soc), TYPE_EXYNOS4210);
+object_property_add_child(OBJECT(machine), "soc", OBJECT(>soc),
+  _abort);
+object_property_set_bool(OBJECT(>soc), true, "realized", _fatal);
 
 return s;
 }
@@ -180,7 +183,7 @@ static void smdkc210_init(MachineState *machine)
   EXYNOS4_BOARD_SMDKC210);
 
 lan9215_init(SMDK_LAN9118_BASE_ADDR,
-qemu_irq_invert(s->soc->irq_table[exynos4210_get_irq(37, 1)]));
+qemu_irq_invert(s->soc.irq_table[exynos4210_get_irq(37, 1)]));
 arm_load_kernel(ARM_CPU(first_cpu), _board_binfo);
 }
 
diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
index 098a69ec73d3..116eae62756b 100644
--- a/include/hw/arm/exynos4210.h
+++ b/include/hw/arm/exynos4210.h
@@ -29,6 +29,10 @@
 #include "exec/memory.h"
 #include "target/arm/cpu-qom.h"
 
+#define TYPE_EXYNOS4210 "exynos4210"
+#define EXYNOS4210(obj) \
+OBJECT_CHECK(Exynos4210State, (obj), TYPE_EXYNOS4210)
+
 #define EXYNOS4210_NCPUS2
 
 #define EXYNOS4210_DRAM0_BASE_ADDR  0x4000
@@ -85,6 +89,8 @@ typedef struct Exynos4210Irq {
 } Exynos4210Irq;
 
 typedef struct Exynos4210State {
+DeviceState parent_obj;
+
 ARMCPU *cpu[EXYNOS4210_NCPUS];
 Exynos4210Irq irqs;
 qemu_irq *irq_table;
@@ -101,8 +107,6 @@ typedef struct Exynos4210State {
 void exynos4210_write_secondary(ARMCPU *cpu,
 const struct arm_boot_info *info);
 
-Exynos4210State *exynos4210_init(MemoryRegion *system_mem);
-
 /* Initialize exynos4210 IRQ subsystem stub */
 qemu_irq *exynos4210_init_irq(Exynos4210Irq *env);
 
-- 
2.9.3




[Qemu-devel] [PATCH v3 07/11] hw/arm/exynos: Declare local variables in some order

2017-05-21 Thread Krzysztof Kozlowski
Bring some more readability by declaring local function variables: first
initialized ones and then the rest (with reversed-christmas-tree order).

Signed-off-by: Krzysztof Kozlowski 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/arm/exynos4210.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index 0da877f8db0a..27a7bf28a5a9 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -162,12 +162,12 @@ static uint64_t exynos4210_calc_affinity(int cpu)
 
 Exynos4210State *exynos4210_init(MemoryRegion *system_mem)
 {
-int i, n;
 Exynos4210State *s = g_new(Exynos4210State, 1);
 qemu_irq gate_irq[EXYNOS4210_NCPUS][EXYNOS4210_IRQ_GATE_NINPUTS];
-DeviceState *dev;
 SysBusDevice *busdev;
 ObjectClass *cpu_oc;
+DeviceState *dev;
+int i, n;
 
 cpu_oc = cpu_class_by_name(TYPE_ARM_CPU, "cortex-a9");
 assert(cpu_oc);
-- 
2.9.3




[Qemu-devel] [PATCH v3 02/11] hw/intc/exynos4210_gic: Use more meaningful name for local variable

2017-05-21 Thread Krzysztof Kozlowski
There are to SysBusDevice variables in exynos4210_gic_realize()
function: one for the device itself and second for arm_gic device.  Add
a prefix "gic" to the second one so it will be easier to understand the
code.

While at it, put local uninitialized 'i' variable at the end, next to
other uninitialized ones.

Signed-off-by: Krzysztof Kozlowski 
Reviewed-by: Peter Maydell 
---
 hw/intc/exynos4210_gic.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/intc/exynos4210_gic.c b/hw/intc/exynos4210_gic.c
index 222cfd6c6387..9a2254f0b13c 100644
--- a/hw/intc/exynos4210_gic.c
+++ b/hw/intc/exynos4210_gic.c
@@ -297,21 +297,21 @@ static void exynos4210_gic_realize(DeviceState *dev, 
Error **errp)
 Exynos4210GicState *s = EXYNOS4210_GIC(dev);
 SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
 Object *obj = OBJECT(dev);
-uint32_t i;
 const char cpu_prefix[] = "exynos4210-gic-alias_cpu";
 const char dist_prefix[] = "exynos4210-gic-alias_dist";
 char cpu_alias_name[sizeof(cpu_prefix) + 3];
 char dist_alias_name[sizeof(cpu_prefix) + 3];
-SysBusDevice *busdev;
+SysBusDevice *gicbusdev;
+uint32_t i;
 
 s->gic = qdev_create(NULL, "arm_gic");
 qdev_prop_set_uint32(s->gic, "num-cpu", s->num_cpu);
 qdev_prop_set_uint32(s->gic, "num-irq", EXYNOS4210_GIC_NIRQ);
 qdev_init_nofail(s->gic);
-busdev = SYS_BUS_DEVICE(s->gic);
+gicbusdev = SYS_BUS_DEVICE(s->gic);
 
 /* Pass through outbound IRQ lines from the GIC */
-sysbus_pass_irq(sbd, busdev);
+sysbus_pass_irq(sbd, gicbusdev);
 
 /* Pass through inbound GPIO lines to the GIC */
 qdev_init_gpio_in(dev, exynos4210_gic_set_irq,
@@ -322,7 +322,7 @@ static void exynos4210_gic_realize(DeviceState *dev, Error 
**errp)
 sprintf(cpu_alias_name, "%s%x", cpu_prefix, i);
 memory_region_init_alias(>cpu_alias[i], obj,
  cpu_alias_name,
- sysbus_mmio_get_region(busdev, 1),
+ sysbus_mmio_get_region(gicbusdev, 1),
  0,
  EXYNOS4210_GIC_CPU_REGION_SIZE);
 memory_region_add_subregion(>cpu_container,
@@ -332,7 +332,7 @@ static void exynos4210_gic_realize(DeviceState *dev, Error 
**errp)
 sprintf(dist_alias_name, "%s%x", dist_prefix, i);
 memory_region_init_alias(>dist_alias[i], obj,
  dist_alias_name,
- sysbus_mmio_get_region(busdev, 0),
+ sysbus_mmio_get_region(gicbusdev, 0),
  0,
  EXYNOS4210_GIC_DIST_REGION_SIZE);
 memory_region_add_subregion(>dist_container,
-- 
2.9.3




[Qemu-devel] [PATCH v3 01/11] hw/intc/exynos4210_gic: Fix GIC memory mappings for secondary CPU

2017-05-21 Thread Krzysztof Kozlowski
Recent Linux kernel (tested next-20170224) was complaining about missing
GIC mask and was unable to bring up secondary CPU:

[    0.00] NR_IRQS:16 nr_irqs:16 16
[0.00] GIC CPU mask not found - kernel will fail to boot.
...
[    0.400492] smp: Bringing up secondary CPUs ...
[    1.413184] CPU1: failed to boot: -110
[    1.423981] smp: Brought up 1 node, 1 CPU

In its instance_init() call, the Exynos GIC driver was setting GIC
memory mappings for each CPU, from 1 up to "num-cpu" property.  The
Exynos4210 machine init call on the other hand, first created Exynos GIC
device and then set the "num-cpu" property which was too late.  The init
already happened with default "num-cpu" value of 1 thus GIC mappings
were created only for the first CPU.

Split the Exynos GIC init code into realize function so the code will
see updated "num-cpu" property.  This fixes the warning and brings
second CPU:
[0.435780] CPU1: thread -1, cpu 1, socket 9, mpidr 8901
[0.451838] smp: Brought up 1 node, 2 CPUs

Additionally this fixes missing Software Generated Interrupts (except
CPU wakeup, non of SGIs are coming) which are needed for example for IRQ
work.  Lack of IRQ work causes kernel to hang during system power off
because cpufreq_dbs_governor_stop() waits for completion with
irq_work_sync().

Signed-off-by: Krzysztof Kozlowski 
Reviewed-by: Peter Maydell 
---
 hw/intc/exynos4210_gic.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/hw/intc/exynos4210_gic.c b/hw/intc/exynos4210_gic.c
index 2a55817b7660..222cfd6c6387 100644
--- a/hw/intc/exynos4210_gic.c
+++ b/hw/intc/exynos4210_gic.c
@@ -283,9 +283,20 @@ static void exynos4210_gic_set_irq(void *opaque, int irq, 
int level)
 
 static void exynos4210_gic_init(Object *obj)
 {
-DeviceState *dev = DEVICE(obj);
 Exynos4210GicState *s = EXYNOS4210_GIC(obj);
-SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
+
+memory_region_init(>cpu_container, obj, "exynos4210-cpu-container",
+EXYNOS4210_EXT_GIC_CPU_REGION_SIZE);
+memory_region_init(>dist_container, obj, "exynos4210-dist-container",
+EXYNOS4210_EXT_GIC_DIST_REGION_SIZE);
+
+}
+
+static void exynos4210_gic_realize(DeviceState *dev, Error **errp)
+{
+Exynos4210GicState *s = EXYNOS4210_GIC(dev);
+SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
+Object *obj = OBJECT(dev);
 uint32_t i;
 const char cpu_prefix[] = "exynos4210-gic-alias_cpu";
 const char dist_prefix[] = "exynos4210-gic-alias_dist";
@@ -306,11 +317,6 @@ static void exynos4210_gic_init(Object *obj)
 qdev_init_gpio_in(dev, exynos4210_gic_set_irq,
   EXYNOS4210_GIC_NIRQ - 32);
 
-memory_region_init(>cpu_container, obj, "exynos4210-cpu-container",
-EXYNOS4210_EXT_GIC_CPU_REGION_SIZE);
-memory_region_init(>dist_container, obj, "exynos4210-dist-container",
-EXYNOS4210_EXT_GIC_DIST_REGION_SIZE);
-
 for (i = 0; i < s->num_cpu; i++) {
 /* Map CPU interface per SMP Core */
 sprintf(cpu_alias_name, "%s%x", cpu_prefix, i);
@@ -346,6 +352,7 @@ static void exynos4210_gic_class_init(ObjectClass *klass, 
void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
 
+dc->realize = exynos4210_gic_realize;
 dc->props = exynos4210_gic_properties;
 }
 
-- 
2.9.3




[Qemu-devel] Precopy VM Live Migration

2017-05-21 Thread ali saeedi
Hello
I have understood from code that In Precopy, there is no Priority for
sending dirty pages and qemu  sends a page as soon as it is found in a
block(by linear searching in blocks). in other word, qemu does not consider
to pages that dirties more than others until sends them later than other
pages.
is my understanding true?
thanks a lot


[Qemu-devel] [PATCH 1/1] qcow2: handle cluster leak happening with a guest TRIM command

2017-05-21 Thread Denis V. Lunev
  qemu-img create -f qcow2 1.img 64G
  qemu-io -c "write -P 0x32 0 64k" 1.img
results in
  324 -rw-r--r-- 1 den den 393216 May 21 16:48 1.img
Subsequent
  qemu-io -c "write -z 0 64k" 1.img
  qemu-io -c "write -P 0x32 0 64k" 1.img
results in
  388 -rw-r--r-- 1 den den 458752 May 21 16:50 1.img
which looks like we have 1 cluster leaked.

Indeed, qcow2_co_pwrite_zeroes calls qcow2_zero_clusters/zero_single_l2,
which does not update refcount for the host cluster and keep the offset
as used. Later on handle_copied() does not take into account
QCOW2_CLUSTER_ZERO type of the cluster.

For now we comes into a very bad situation after qcow2_zero_cluster.
We have a hole in the host file and we have the offset allocated for
that guest cluster. Now there are 2 options:
1) allocate new offset once the write will come into this guest
   cluster (actually happens, but original cluster offset is leaked)
2) re-use host offset, i.e. fix handle_copied() to allow to reuse
   offset not only for QCOW2_CLUSTER_NORMAL but for QCOW2_CLUSTER_ZERO
   too
Option 2) seems worse to me as in this case we can easily have host
fragmentation in that cluster if writes will come in small pieces.

This is not a very big deal if we have filesystem with PUCH_HOLE support,
but without this feature the cluster is actually leaked forever.

The patch replaces zero_single_l2 with discard_single_l2 and removes
now unused zero_single_l2 to fix the situation.

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Max Reitz 
---
 block/qcow2-cluster.c | 46 ++
 1 file changed, 2 insertions(+), 44 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 100398c..1e53a7c 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1548,49 +1548,6 @@ fail:
 return ret;
 }
 
-/*
- * This zeroes as many clusters of nb_clusters as possible at once (i.e.
- * all clusters in the same L2 table) and returns the number of zeroed
- * clusters.
- */
-static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
-  uint64_t nb_clusters, int flags)
-{
-BDRVQcow2State *s = bs->opaque;
-uint64_t *l2_table;
-int l2_index;
-int ret;
-int i;
-
-ret = get_cluster_table(bs, offset, _table, _index);
-if (ret < 0) {
-return ret;
-}
-
-/* Limit nb_clusters to one L2 table */
-nb_clusters = MIN(nb_clusters, s->l2_size - l2_index);
-assert(nb_clusters <= INT_MAX);
-
-for (i = 0; i < nb_clusters; i++) {
-uint64_t old_offset;
-
-old_offset = be64_to_cpu(l2_table[l2_index + i]);
-
-/* Update L2 entries */
-qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table);
-if (old_offset & QCOW_OFLAG_COMPRESSED || flags & BDRV_REQ_MAY_UNMAP) {
-l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
-qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
-} else {
-l2_table[l2_index + i] |= cpu_to_be64(QCOW_OFLAG_ZERO);
-}
-}
-
-qcow2_cache_put(bs, s->l2_table_cache, (void **) _table);
-
-return nb_clusters;
-}
-
 int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
 int flags)
 {
@@ -1609,7 +1566,8 @@ int qcow2_zero_clusters(BlockDriverState *bs, uint64_t 
offset, int nb_sectors,
 s->cache_discards = true;
 
 while (nb_clusters > 0) {
-ret = zero_single_l2(bs, offset, nb_clusters, flags);
+ret = discard_single_l2(bs, offset, nb_clusters,
+QCOW2_DISCARD_REQUEST, false);
 if (ret < 0) {
 goto fail;
 }
-- 
2.7.4




[Qemu-devel] [Bug 877498] Re: qemu does not pass sector size from physical devices to virtual devices

2017-05-21 Thread Paolo Bonzini
Recent QEMU is able to do read-modify-write operations when using 512
byte logical sectors in the VM and 4K logical sectors in the host.

** Changed in: qemu
   Status: Expired => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/877498

Title:
  qemu does not pass sector size from physical devices to virtual
  devices

Status in QEMU:
  Fix Released

Bug description:
  When passing a physical disk (i.e. a multipathed fcal volume in my
  case) with a 4k sector size as raw image to qemu (-drive
  file=/dev/mapper/hartebeest-sys,if=none,id=drive-virtio-
  disk0,boot=on,format=raw), the resulting virtual device has a sector
  size of 512b, rendering the partition table unusable.

  Versions used: QEMU 0.12.5 (qemu-kvm-0.12.5) from debian unstable

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/877498/+subscriptions



Re: [Qemu-devel] [PATCH v2 00/11] Preparation for block job mutex

2017-05-21 Thread Paolo Bonzini
Ping?

On 08/05/2017 16:12, Paolo Bonzini wrote:
> These are a bunch of cleanups and patches extracted from the AioContext
> lock removal series.  As a general theme, the patches reorganize
> blockjob.c to follow the blockjob.h/blockjob_int.h separation more
> closely.  For this reason, a lot of the patches are just moving functions
> around.
> 
> The blockjob.h/blockjob_int.h split later will correspond to different
> locking rules, but the patches are independent from this change, and
> can be applied/reviewed separately.
> 
> There is no code change from v1, but all patches now have Reviewed-by
> from at least one of John and Stefan.
> 
> Thanks,
> 
> Paolo
> 
> 
> Paolo Bonzini (11):
>   blockjob: remove unnecessary check
>   blockjob: remove iostatus_reset callback
>   blockjob: introduce block_job_early_fail
>   blockjob: introduce block_job_pause/resume_all
>   blockjob: separate monitor and blockjob APIs
>   blockjob: move iostatus reset inside block_job_user_resume
>   blockjob: introduce block_job_cancel_async, check iostatus invariants
>   blockjob: group BlockJob transaction functions together
>   blockjob: strengthen a bit test-blockjob-txn
>   blockjob: reorganize block_job_completed_txn_abort
>   blockjob: use deferred_to_main_loop to indicate the coroutine has
> ended
> 
>  block/backup.c   |   2 +-
>  block/commit.c   |   2 +-
>  block/io.c   |  19 +-
>  block/mirror.c   |   2 +-
>  blockdev.c   |   1 -
>  blockjob.c   | 900 
> +++
>  include/block/blockjob.h |  16 -
>  include/block/blockjob_int.h |  27 +-
>  tests/test-blockjob-txn.c|   7 +-
>  tests/test-blockjob.c|  10 +-
>  10 files changed, 522 insertions(+), 464 deletions(-)
> 



[Qemu-devel] get_queued_page

2017-05-21 Thread ali saeedi
Hello
Is "get_queued_page" function (which is called in line 1383 of ram.c) only
used in postcopy mode and has no usage in precopy mode?
thanks a lot


Re: [Qemu-devel] [PATCH v12 1/2] hw/ppc/spapr_events.c: removing 'exception' from sPAPREventLogEntry

2017-05-21 Thread David Gibson
On Fri, May 19, 2017 at 11:27:49AM -0300, Daniel Henrique Barboza wrote:
> Currenty we do not have any RTAS event that is reported by the
> event-scan interface. The existing events, RTAS_LOG_TYPE_EPOW and
> RTAS_LOG_TYPE_HOTPLUG, are being reported by the check-exception
> interface and, as such, marked as 'exception=true'.
> 
> Commit 79853e18d9, 'spapr_events: event-scan RTAS interface', added
> the event_scan interface because the guest kernel requires it to
> initialize other required interfaces. It is acting since then as
> a stub because no events that would be reported by it were added
> since then. However, the existence of the 'exception' boolean adds
> an unnecessary load in the future migration of the pending_events,
> sPAPREventLogEntry QTAILQ that hosts the pending RTAS events.
> 
> To make the code cleaner and ease the future migration changes, this
> patch makes the following changes:
> 
> - remove the 'exception' boolean that filter these events. There is
> nothing to filter since all events are reported by check-exception;
> 
> - functions rtas_event_log_queue, rtas_event_log_dequeue and
> rtas_event_log_contains don't receive the 'exception' boolean
> as parameter;
> 
> - event_scan function was simplified. It was calling
> 'rtas_event_log_dequeue(mask, false)' that was always returning
> 'NULL' because we have no events that are created with
> exception=false, thus in the end it would execute a jump to
> 'out_no_events' all the time. The function now assumes that
> this will always be the case and all the remaining logic were
> deleted.
> 
> In the future, when or if we add new RTAS events that should
> be reported with the event_scan interface, we can refer to
> the changes made in this patch to add the event_scan logic
> back.
> 
> Signed-off-by: Daniel Henrique Barboza 

Applied to ppc-for-2.10.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH v2 1/4] spapr_cpu_core: drop reference on ICP object during CPU realization

2017-05-21 Thread David Gibson
On Fri, May 19, 2017 at 12:32:04PM +0200, Greg Kurz wrote:
> When a piece of code allocates an object, it implicitely gets a reference
> on it. If it then makes that object a child property of another object, it
> should drop its own reference at some point otherwise the child object can
> never be finalized. The current code hence leaks one ICP object per CPU
> when hot-removing a core.
> 
> Failing to add a newly allocated ICP object to the CPU is a bug. While here,
> let's ensure QEMU aborts if this ever happens.
> 
> Signed-off-by: Greg Kurz 

Applied to ppc-for-2.10.

> ---
>  hw/ppc/spapr_cpu_core.c |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index 1df1404ea52d..ff7058ecc00e 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -143,7 +143,8 @@ static void spapr_cpu_core_realize_child(Object *child, 
> Error **errp)
>  Object *obj;
>  
>  obj = object_new(spapr->icp_type);
> -object_property_add_child(OBJECT(cpu), "icp", obj, NULL);
> +object_property_add_child(OBJECT(cpu), "icp", obj, _abort);
> +object_unref(obj);
>  object_property_add_const_link(obj, "xics", OBJECT(spapr), _abort);
>  object_property_set_bool(obj, true, "realized", _err);
>  if (local_err) {
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH v2 2/4] spapr: fix error reporting in xics_system_init()

2017-05-21 Thread David Gibson
On Fri, May 19, 2017 at 12:32:12PM +0200, Greg Kurz wrote:
> If the user explicitely asked for kernel-irqchip support and "xics-kvm"
> initialization fails, we shouldn't fallback to emulated "xics" as we
> do now. It is also awkward to print an error message when we have an
> errp pointer argument.
> 
> Let's use the errp argument to report the error and let the caller decide.
> This simplifies the code as we don't need a local Error * here.
> 
> Signed-off-by: Greg Kurz 

Concept looks good, but..

> ---
> v2: - total rewrite
> ---
>  hw/ppc/spapr.c |   13 ++---
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 91f7434861a8..75e298b4c6be 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -128,18 +128,14 @@ static void xics_system_init(MachineState *machine, int 
> nr_irqs, Error **errp)
>  sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
>  
>  if (kvm_enabled()) {
> -Error *err = NULL;
> -
>  if (machine_kernel_irqchip_allowed(machine) &&
>  !xics_kvm_init(spapr, errp)) {
>  spapr->icp_type = TYPE_KVM_ICP;
> -spapr->ics = spapr_ics_create(spapr, TYPE_ICS_KVM, nr_irqs, 
> );
> +spapr->ics = spapr_ics_create(spapr, TYPE_ICS_KVM, nr_irqs, 
> errp);

I believe there are reasons you're not supposed to just pass an errp
through to a subordinate function.  Instead you're supposed to have a
local Error * and use error_propagate().

>  }
>  if (machine_kernel_irqchip_required(machine) && !spapr->ics) {
> -error_reportf_err(err,
> -  "kernel_irqchip requested but unavailable: ");
> -} else {
> -error_free(err);
> +error_prepend(errp, "kernel_irqchip requested but unavailable: 
> ");
> +return;
>  }
>  }
>  
> @@ -147,6 +143,9 @@ static void xics_system_init(MachineState *machine, int 
> nr_irqs, Error **errp)
>  xics_spapr_init(spapr);
>  spapr->icp_type = TYPE_ICP;
>  spapr->ics = spapr_ics_create(spapr, TYPE_ICS_SIMPLE, nr_irqs, errp);
> +if (!spapr->ics) {

It would also be more standard to check the returned error, rather
than the other result.

> +return;
> +}
>  }
>  }
>  
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH v12 2/2] migration: spapr: migrate pending_events of spapr state

2017-05-21 Thread David Gibson
On Fri, May 19, 2017 at 11:27:50AM -0300, Daniel Henrique Barboza wrote:
> From: Jianjun Duan 
> 
> In racing situations between hotplug events and migration operation,
> a rtas hotplug event could have not yet be delivered to the source
> guest when migration is started. In this case the pending_events of
> spapr state need be transmitted to the target so that the hotplug
> event can be finished on the target.
> 
> All the different fields of the events are encoded as defined by
> PAPR. We can migrate them as a binary stream inside VBUFFER without
> any concerns about data padding or endianess.
> 
> pending_events is put in a subsection in the spapr state VMSD to make
> sure migration across different versions is not broken.
> 
> Signed-off-by: Jianjun Duan 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/ppc/spapr.c | 32 
>  hw/ppc/spapr_events.c  | 12 
>  include/hw/ppc/spapr.h |  3 ++-
>  3 files changed, 46 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 0980d73..5afd328 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1444,6 +1444,37 @@ static bool version_before_3(void *opaque, int 
> version_id)
>  return version_id < 3;
>  }
>  
> +static bool spapr_pending_events_needed(void *opaque)
> +{
> +sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
> +return !QTAILQ_EMPTY(>pending_events);
> +}
> +
> +static const VMStateDescription vmstate_spapr_event_entry = {
> +.name = "spapr_event_log_entry",
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.fields = (VMStateField[]) {
> +VMSTATE_INT32(log_type, sPAPREventLogEntry),
> +VMSTATE_UINT32(data_size, sPAPREventLogEntry),
> +VMSTATE_VBUFFER_ALLOC_UINT32(data, sPAPREventLogEntry, 0,
> + NULL, data_size),
> +VMSTATE_END_OF_LIST()
> +},
> +};
> +
> +static const VMStateDescription vmstate_spapr_pending_events = {
> +.name = "spapr_pending_events",
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.needed = spapr_pending_events_needed,
> +.fields = (VMStateField[]) {
> +VMSTATE_QTAILQ_V(pending_events, sPAPRMachineState, 1,
> + vmstate_spapr_event_entry, sPAPREventLogEntry, 
> next),
> +VMSTATE_END_OF_LIST()
> +},
> +};
> +
>  static bool spapr_ov5_cas_needed(void *opaque)
>  {
>  sPAPRMachineState *spapr = opaque;
> @@ -1542,6 +1573,7 @@ static const VMStateDescription vmstate_spapr = {
>  .subsections = (const VMStateDescription*[]) {
>  _spapr_ov5_cas,
>  _spapr_patb_entry,
> +_spapr_pending_events,
>  NULL
>  }
>  };
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index 73e2a18..a509c46 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -350,6 +350,18 @@ static void rtas_event_log_queue(int log_type, void 
> *data)
>  g_assert(data);
>  entry->log_type = log_type;
>  entry->data = data;
> +
> +switch (log_type) {
> +case RTAS_LOG_TYPE_EPOW:
> +entry->data_size = sizeof(struct epow_log_full);
> +break;
> +case RTAS_LOG_TYPE_HOTPLUG:
> +entry->data_size = sizeof(struct hp_log_full);
> +break;
> +default:
> +g_assert(false);
> +}

You can do better than this.  The header of the event has an
'extended_length' field which is set (correctly) by the existing
functions creating the events.  You'll need to add the base/common
header size back in, but that's easy.

That's how the guest determines the size of the events once they
arrive, and that's what you should use here as well.

> +
>  QTAILQ_INSERT_TAIL(>pending_events, entry, next);
>  }
>  
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 02239a5..0554e11 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -597,8 +597,9 @@ struct sPAPRTCETable {
>  sPAPRTCETable *spapr_tce_find_by_liobn(target_ulong liobn);
>  
>  struct sPAPREventLogEntry {
> -int log_type;
> +int32_t log_type;
>  void *data;
> +uint32_t data_size;
>  QTAILQ_ENTRY(sPAPREventLogEntry) next;
>  };
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH v3 2/3] arm64: kvm: inject SError with virtual syndrome

2017-05-21 Thread gengdongjiu
2017-05-13 1:24 GMT+08:00, James Morse :
> Hi gengdongjiu,
>
> On 05/05/17 14:19, gengdongjiu wrote:
>> On 2017/5/2 23:37, James Morse wrote:
>> > ... I think you expect an SError to arrive at EL2 and have its ESR
>> > recorded in
>> > vcpu->arch.fault.vsesr_el2. Some time later KVM decides to inject an
>> > SError into
>> > the guest, and this ESR is reused...
>> >
>> > We shouldn't do this. Qemu/kvmtool may want to inject a virtual-SError
>> > that
>> > never started as a physical-SError. Qemu/kvmtool may choose to notify
>> > the guest
>> > of RAS events via another mechanism, or not at all.
>> >
>> > KVM should not give the guest an ESR value of its choice. For SError the
>> > ESR
>> > describes whether the error is corrected, correctable or fatal.
>> > Qemu/kvmtool
>> > must choose this.
>
>> Below is my previous solution:
>> For the SError, CPU will firstly trap to EL3 firmware and records the
>> syndrome to ESR_EL3.
>> Before jumping to El2 hypervisors, it will copy the esr_el3 to esr_el2.
>
> (Copying the ESR value won't always be the right thing to do.)
  James,  thanks for your kindly explanation, understand you  thought.

>
>
>> so in order to pass this syndrome to vsesr_el2, using the esr_el2 value to
>> assign it.
>
>> If Qemu/kvmtool chooses the ESR value and ESR only describes whether the
>> error is corrected/correctable/fatal,
>> whether the information is not enough for the guest?
>
> So the API should specify which of these three severities to use? I think
> this
> is too specific. The API should be useful for anything the VSE/VSESR
> hardware
> can do.
>
> VSESR_EL2 is described in the RAS spec: 4.4.12 [0], its a 64 bit register.
> I
> think we should let Qemu/kvmtool specify any 64bit value here, but KVM
> should
> reject values that try to set bits described as RES0.
>
> This would let Qemu/kvmtool specify any SError ISS, either setting
> ESR_ELx.IDS
> and some virtual-machine specific value, or encoding any severity in AET
> and
> choosing the DFSC/EA bits appropriately.
   it sounds reasonable

>
>
>>> > I think we need an API that allows Qemu/kvmtool to inject SError into a
>>> > guest,
>>> > but that isn't quite what you have here.
>
>> KVM provides APIs to inject the SError, Qemu/kvmtool call the API though
>> IOCTL, may be OK?
>
> (just the one API call), yes.
   Ok, have added.

>
>
> Thanks,
>
> James
>
> [0]
> https://static.docs.arm.com/ddi0587/a/RAS%20Extension-release%20candidate_march_29.pdf
>
> ___
> kvmarm mailing list
> kvm...@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
>