Re: [PATCH v2 06/15] ppc/vof: Fix unaligned FDT property access

2024-07-09 Thread David Gibson
On Mon, Jul 08, 2024 at 04:59:30PM +0100, Peter Maydell wrote:
> On Mon, 8 Jul 2024 at 08:49, Nicholas Piggin wrote:
> >
> > On Sun Jul 7, 2024 at 9:46 AM AEST, David Gibson wrote:
> > > On Sat, Jul 06, 2024 at 11:37:08AM +0100, Peter Maydell wrote:
> > > > > On Fri, 5 Jul 2024 at 06:13, David Gibson wrote:
> > > > > Huh.. well I'm getting different impressions of what the problem
> > > > > actually is from what I initially read versus Peter Maydell's
> > > > > comments, so I don't really know what to think.
> > > > >
> > > > > If it's just the load then fdt32_ld() etc. already exist.  Or is it
> > > > > really such a hot path that unconditionally handling unaligned
> > > > > accesses isn't tenable?
> > > >
> > > > The specific problem here is that the code as written tries to
> > > > cast a not-aligned-enough pointer to uint64_t* to do the load,
> > > > which is UB.
> > >
> > > Ah... and I'm assuming it's the cast itself which triggers the UB, not
> > > just dereferencing it.
> >
> > Oh it's just the cast itself that is UB? Looks like that's true.
> > Interesting gcc and clang don't flag it, I guess they care about
> > warning on practical breakage first.
> 
> Er, I was speaking a bit vaguely there, don't take my word for
> it without going and looking at the text of the C standard.

Sure.

> What I *meant* was that the practical problem here is that we
> really do dereference a pointer for a 64-bit load when the
> pointer isn't necessarily 64-bit-aligned.

From the qemu point of view, yes.  And theoretically, the fix is easy,
since libfdt provides fdt32_ld() etc. for exactly this use case.  But..

> As it happens, C99 says that it is the cast that is UB:
> section 6.3.2.3 para 7 says:
>  "A pointer to an object or incomplete type may be converted to
>   a pointer to a different object or incomplete type. If the
>   resulting pointer is not correctly aligned for the pointed-to
>   type, the behavior is undefined. Otherwise, when converted back
>   again, the result shall compare equal to the original pointer."

.. this makes fdt32_ld() etc. unusable by design.

> Presumably this is envisaging the possibility of a pointer cast
> being a destructive operation somehow, such that e.g. a uint64_t*
> can only represent 64-bit-aligned values. But I bet QEMU does
> a lot of casting pointers around that might fall foul of this
> rule, so I'm not particularly worried about trying to clean up
> that kind of thing (until/unless analysers start warning about
> it, in which case we have a specific set of things to clean up).

Fair enough from the qemu point of view.  However, this unusable by
design interface was written by me as part of a library I maintain, so
it certainly worries *me*.
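
A minimal sketch of a load that sidesteps the pointer conversion discussed
above, assuming the caller only has a byte address inside the property.
The name load_be64() is hypothetical and is not part of libfdt or QEMU.
The idea is that taking a const void * and reading through unsigned char *
never forms a uint64_t * to a merely 4-byte-aligned address:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a libfdt or QEMU function): read a big-endian
 * 64-bit value from any address, regardless of its alignment.
 */
static inline uint64_t load_be64(const void *p)
{
    /* Conversion to a character pointer is valid for any object pointer. */
    const unsigned char *bp = p;
    uint64_t v = 0;

    assert(p != NULL);
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | bp[i];   /* assemble most significant byte first */
    }
    return v;
}
```

Because the parameter is untyped, the caller can pass (const char *)prop +
offset directly and no intermediate uint64_t * or fdt64_t * is needed.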

> What I care about from the point of view of this patch
> is that we fix the actually-broken-on-some-real-hardware problem
> of doing the load as a misaligned access. My vote would be for
> "take Akihiko's patch as-is, rather than gating fixing the bug
> on deciding on an improvement/change to the fdt API or our
> wrappers of it".
> 
> thanks
> -- PMM
> 

-- 
David Gibson (he or they)   | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson




Re: [PATCH v2 06/15] ppc/vof: Fix unaligned FDT property access

2024-07-09 Thread David Gibson
On Mon, Jul 08, 2024 at 05:49:32PM +1000, Nicholas Piggin wrote:
> On Sun Jul 7, 2024 at 9:46 AM AEST, David Gibson wrote:
> > On Sat, Jul 06, 2024 at 11:37:08AM +0100, Peter Maydell wrote:
> > > > On Fri, 5 Jul 2024 at 06:13, David Gibson wrote:
> > > >
> > > > On Fri, Jul 05, 2024 at 02:40:19PM +1000, Nicholas Piggin wrote:
> > > > > On Fri Jul 5, 2024 at 11:41 AM AEST, David Gibson wrote:
> > > > > > On Fri, Jul 05, 2024 at 11:18:47AM +1000, Nicholas Piggin wrote:
> > > > > > > On Thu Jul 4, 2024 at 10:15 PM AEST, Peter Maydell wrote:
> > > > > > > > > On Sat, 29 Jun 2024 at 04:17, David Gibson wrote:
> > > > > > > > >
> > > > > > > > > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote:
> > > > > > > > > > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki wrote:
> > > > > > > > > > >
> > > > > > > > > > > FDT properties are aligned by 4 bytes, not 8 bytes.
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Akihiko Odaki 
> > > > > > > > > > > ---
> > > > > > > > > > >  hw/ppc/vof.c | 2 +-
> > > > > > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> > > > > > > > > > > index e3b430a81f4f..b5b6514d79fc 100644
> > > > > > > > > > > --- a/hw/ppc/vof.c
> > > > > > > > > > > +++ b/hw/ppc/vof.c
> > > > > > > > > > > @@ -646,7 +646,7 @@ static void 
> > > > > > > > > > > vof_dt_memory_available(void *fdt, GArray *claimed, 
> > > > > > > > > > > uint64_t base)
> > > > > > > > > > > >  mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
> > > > > > > > > > >  g_assert(mem0_reg && proplen == sizeof(uint32_t) * 
> > > > > > > > > > > (ac + sc));
> > > > > > > > > > >  if (sc == 2) {
> > > > > > > > > > > -mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + 
> > > > > > > > > > > sizeof(uint32_t) * ac));
> > > > > > > > > > > +mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) 
> > > > > > > > > > > * ac);
> > > > > > > > > > >  } else {
> > > > > > > > > > >  mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + 
> > > > > > > > > > > sizeof(uint32_t) * ac));
> > > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > I did wonder if there was a better way to do what this is 
> > > > > > > > > > doing,
> > > > > > > > > > but neither we (in system/device_tree.c) nor libfdt seem to
> > > > > > > > > > provide one.
> > > > > > > > >
> > > > > > > > > libfdt does provide unaligned access helpers (fdt32_ld() 
> > > > > > > > > etc.), but
> > > > > > > > > not an automatic aligned-or-unaligned helper.   Maybe we 
> > > > > > > > > should add that?
> > > > > > > >
> > > > > > > > fdt32_ld() and friends only do the "load from this bit of 
> > > > > > > > memory"
> > > > > > > > part, which we already have QEMU utility functions for (and 
> > > > > > > > which
> > > > > > > > this patch uses).
> > > > > > > >
> > > > > > > > This particular bit of code is dealing with an fdt property 
> > > > > > > > ("memory")
> > > > > > > > that is an array of (address, size) tuples where address and 
> > > > > > > > size
> > > > > > > > can independently be either 32 or 64 bits, and it wants the

Re: [PATCH v2 06/15] ppc/vof: Fix unaligned FDT property access

2024-07-06 Thread David Gibson
On Sat, Jul 06, 2024 at 11:37:08AM +0100, Peter Maydell wrote:
> On Fri, 5 Jul 2024 at 06:13, David Gibson wrote:
> >
> > On Fri, Jul 05, 2024 at 02:40:19PM +1000, Nicholas Piggin wrote:
> > > On Fri Jul 5, 2024 at 11:41 AM AEST, David Gibson wrote:
> > > > On Fri, Jul 05, 2024 at 11:18:47AM +1000, Nicholas Piggin wrote:
> > > > > On Thu Jul 4, 2024 at 10:15 PM AEST, Peter Maydell wrote:
> > > > > > > On Sat, 29 Jun 2024 at 04:17, David Gibson wrote:
> > > > > > >
> > > > > > > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote:
> > > > > > > > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki wrote:
> > > > > > > > >
> > > > > > > > > FDT properties are aligned by 4 bytes, not 8 bytes.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Akihiko Odaki 
> > > > > > > > > ---
> > > > > > > > >  hw/ppc/vof.c | 2 +-
> > > > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > >
> > > > > > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> > > > > > > > > index e3b430a81f4f..b5b6514d79fc 100644
> > > > > > > > > --- a/hw/ppc/vof.c
> > > > > > > > > +++ b/hw/ppc/vof.c
> > > > > > > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void 
> > > > > > > > > *fdt, GArray *claimed, uint64_t base)
> > > > > > > > > >  mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
> > > > > > > > >  g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + 
> > > > > > > > > sc));
> > > > > > > > >  if (sc == 2) {
> > > > > > > > > -mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + 
> > > > > > > > > sizeof(uint32_t) * ac));
> > > > > > > > > +mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * 
> > > > > > > > > ac);
> > > > > > > > >  } else {
> > > > > > > > >  mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + 
> > > > > > > > > sizeof(uint32_t) * ac));
> > > > > > > > >  }
> > > > > > > >
> > > > > > > > I did wonder if there was a better way to do what this is doing,
> > > > > > > > but neither we (in system/device_tree.c) nor libfdt seem to
> > > > > > > > provide one.
> > > > > > >
> > > > > > > libfdt does provide unaligned access helpers (fdt32_ld() etc.), 
> > > > > > > but
> > > > > > > not an automatic aligned-or-unaligned helper.   Maybe we should 
> > > > > > > add that?
> > > > > >
> > > > > > fdt32_ld() and friends only do the "load from this bit of memory"
> > > > > > part, which we already have QEMU utility functions for (and which
> > > > > > this patch uses).
> > > > > >
> > > > > > This particular bit of code is dealing with an fdt property 
> > > > > > ("memory")
> > > > > > that is an array of (address, size) tuples where address and size
> > > > > > can independently be either 32 or 64 bits, and it wants the
> > > > > > size value of tuple 0. So the missing functionality is something at
> > > > > > a higher level than fdt32_ld() which would let you say "give me
> > > > > > tuple N field X" with some way to specify the tuple layout. (Which
> > > > > > is an awkward kind of API to write in C.)
> > > > > >
> > > > > > Slightly less general, but for this case we could perhaps have
> > > > > > something like the getprop equivalent of 
> > > > > > qemu_fdt_setprop_sized_cells():
> > > > > >
> > > > > >   uint64_t value_array[2];
> > > > > >   qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", 
> > > > > > > &value_array,
> > > > > >ac, sc);
> > > > > >   /*
> >

Re: [PATCH v2 06/15] ppc/vof: Fix unaligned FDT property access

2024-07-04 Thread David Gibson
On Fri, Jul 05, 2024 at 02:40:19PM +1000, Nicholas Piggin wrote:
> On Fri Jul 5, 2024 at 11:41 AM AEST, David Gibson wrote:
> > On Fri, Jul 05, 2024 at 11:18:47AM +1000, Nicholas Piggin wrote:
> > > On Thu Jul 4, 2024 at 10:15 PM AEST, Peter Maydell wrote:
> > > > > On Sat, 29 Jun 2024 at 04:17, David Gibson wrote:
> > > > >
> > > > > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote:
> > > > > > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki wrote:
> > > > > > >
> > > > > > > FDT properties are aligned by 4 bytes, not 8 bytes.
> > > > > > >
> > > > > > > Signed-off-by: Akihiko Odaki 
> > > > > > > ---
> > > > > > >  hw/ppc/vof.c | 2 +-
> > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> > > > > > > index e3b430a81f4f..b5b6514d79fc 100644
> > > > > > > --- a/hw/ppc/vof.c
> > > > > > > +++ b/hw/ppc/vof.c
> > > > > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void 
> > > > > > > *fdt, GArray *claimed, uint64_t base)
> > > > > > > >  mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
> > > > > > >  g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + 
> > > > > > > sc));
> > > > > > >  if (sc == 2) {
> > > > > > > -mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + 
> > > > > > > sizeof(uint32_t) * ac));
> > > > > > > +mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac);
> > > > > > >  } else {
> > > > > > >  mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + 
> > > > > > > sizeof(uint32_t) * ac));
> > > > > > >  }
> > > > > >
> > > > > > I did wonder if there was a better way to do what this is doing,
> > > > > > but neither we (in system/device_tree.c) nor libfdt seem to
> > > > > > provide one.
> > > > >
> > > > > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but
> > > > > not an automatic aligned-or-unaligned helper.   Maybe we should add 
> > > > > that?
> > > >
> > > > fdt32_ld() and friends only do the "load from this bit of memory"
> > > > part, which we already have QEMU utility functions for (and which
> > > > this patch uses).
> > > >
> > > > This particular bit of code is dealing with an fdt property ("memory")
> > > > that is an array of (address, size) tuples where address and size
> > > > can independently be either 32 or 64 bits, and it wants the
> > > > size value of tuple 0. So the missing functionality is something at
> > > > a higher level than fdt32_ld() which would let you say "give me
> > > > tuple N field X" with some way to specify the tuple layout. (Which
> > > > is an awkward kind of API to write in C.)
> > > >
> > > > Slightly less general, but for this case we could perhaps have
> > > > something like the getprop equivalent of qemu_fdt_setprop_sized_cells():
> > > >
> > > >   uint64_t value_array[2];
> > > > >   qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array,
> > > >ac, sc);
> > > >   /*
> > > >* fills in value_array[0] with address, value_array[1] with size,
> > > >* probably barfs if the varargs-list of cell-sizes doesn't
> > > >* cover the whole property, similar to the current assert on
> > > >* proplen.
> > > >*/
> > > >   mem0_end = value_array[0];
> > > 
> > > Since 4/8 byte cells are most common and size is probably
> > > normally known, what about something simpler to start with?
> >
> > Hrm, I don't think this helps much.  As Peter points out the actual
> > load isn't really the issue, it's locating the right spot for it.
> 
> I don't really see why that's a problem, it's just a pointer
> addition - base + fdt_address_cells * 4. The problem was in

This is harder if #address-cells and #size-cells are different, or if
you're parsing ranges and #address-cells is different between parent
and child node.
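
As an illustration of the "locating the right spot" part, here is a small
sketch that computes the byte offset of a field inside a "reg"-style
property. The function name reg_field_offset() and its parameters are
hypothetical; it assumes tuples of ac address cells followed by sc size
cells, each cell being 4 bytes, as in the quoted vof.c code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: byte offset of the address or size field of
 * tuple n in a property laid out as n * (ac + sc) 32-bit cells.
 */
static size_t reg_field_offset(int ac, int sc, size_t n, bool want_size)
{
    assert(ac >= 0 && sc >= 0);

    size_t tuple_bytes = ((size_t)ac + (size_t)sc) * sizeof(uint32_t);
    size_t field_bytes = want_size ? (size_t)ac * sizeof(uint32_t) : 0;

    return n * tuple_bytes + field_bytes;
}
```

With ac == 1 and sc == 2 the size field of tuple 0 sits at offset 4, which
is 4-byte aligned but not 8-byte aligned. That is the layout in which a
direct 64-bit load through a uint64_t * becomes a misaligned access.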

> the memory access (yes it's fixed with the patch but you could
> add a general libfdt way to do it).

Huh.. well I'm getting different impressions of what the problem
actually is from what I initially read versus Peter Maydell's
comments, so I don't really know what to think.

If it's just the load then fdt32_ld() etc. already exist.  Or is it
really such a hot path that unconditionally handling unaligned
accesses isn't tenable?

> Some fancy function like above could be used, but is it really
> worth implementing such a thing for this?
> 
> Thanks,
> Nick
> 

-- 
David Gibson (he or they)   | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson




Re: [PATCH v2 06/15] ppc/vof: Fix unaligned FDT property access

2024-07-04 Thread David Gibson
On Fri, Jul 05, 2024 at 11:18:47AM +1000, Nicholas Piggin wrote:
> On Thu Jul 4, 2024 at 10:15 PM AEST, Peter Maydell wrote:
> > > On Sat, 29 Jun 2024 at 04:17, David Gibson wrote:
> > >
> > > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote:
> > > > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki wrote:
> > > > >
> > > > > FDT properties are aligned by 4 bytes, not 8 bytes.
> > > > >
> > > > > Signed-off-by: Akihiko Odaki 
> > > > > ---
> > > > >  hw/ppc/vof.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> > > > > index e3b430a81f4f..b5b6514d79fc 100644
> > > > > --- a/hw/ppc/vof.c
> > > > > +++ b/hw/ppc/vof.c
> > > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, 
> > > > > GArray *claimed, uint64_t base)
> > > > > >  mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
> > > > >  g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc));
> > > > >  if (sc == 2) {
> > > > > -mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + 
> > > > > sizeof(uint32_t) * ac));
> > > > > +mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac);
> > > > >  } else {
> > > > >  mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + 
> > > > > sizeof(uint32_t) * ac));
> > > > >  }
> > > >
> > > > I did wonder if there was a better way to do what this is doing,
> > > > but neither we (in system/device_tree.c) nor libfdt seem to
> > > > provide one.
> > >
> > > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but
> > > not an automatic aligned-or-unaligned helper.   Maybe we should add that?
> >
> > fdt32_ld() and friends only do the "load from this bit of memory"
> > part, which we already have QEMU utility functions for (and which
> > this patch uses).
> >
> > This particular bit of code is dealing with an fdt property ("memory")
> > that is an array of (address, size) tuples where address and size
> > can independently be either 32 or 64 bits, and it wants the
> > size value of tuple 0. So the missing functionality is something at
> > a higher level than fdt32_ld() which would let you say "give me
> > tuple N field X" with some way to specify the tuple layout. (Which
> > is an awkward kind of API to write in C.)
> >
> > Slightly less general, but for this case we could perhaps have
> > something like the getprop equivalent of qemu_fdt_setprop_sized_cells():
> >
> >   uint64_t value_array[2];
> > >   qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array,
> >ac, sc);
> >   /*
> >* fills in value_array[0] with address, value_array[1] with size,
> >* probably barfs if the varargs-list of cell-sizes doesn't
> >* cover the whole property, similar to the current assert on
> >* proplen.
> >*/
> >   mem0_end = value_array[0];
> 
> Since 4/8 byte cells are most common and size is probably
> normally known, what about something simpler to start with?

Hrm, I don't think this helps much.  As Peter points out the actual
load isn't really the issue, it's locating the right spot for it.

> 
> Thanks,
> Nick
> 
> ---
> diff --git a/libfdt/libfdt.h b/libfdt/libfdt.h
> index 0677fea..c4b6355 100644
> --- a/libfdt/libfdt.h
> +++ b/libfdt/libfdt.h
> @@ -148,6 +148,15 @@ static inline uint32_t fdt32_ld(const fdt32_t *p)
>   | bp[3];
>  }
>  
> +/*
> + * Load the value from a 32-bit cell of a property. Cells are 32-bit aligned
> + * so can use a single load.
> + */
> +static inline uint32_t fdt32_ld_prop(const fdt32_t *p)
> +{
> + return fdt32_to_cpu(*p);
> +}
> +
>  static inline void fdt32_st(void *property, uint32_t value)
>  {
>   uint8_t *bp = (uint8_t *)property;
> @@ -172,6 +181,18 @@ static inline uint64_t fdt64_ld(const fdt64_t *p)
>   | bp[7];
>  }
>  
> +/*
> + * Load the value from a 64-bit cell of a property. Cells are 32-bit aligned
> + * so can use two loads.
> + */
> +static inline uint64_t fdt64_ld_prop(const fdt64_t *p)
> +{
> + const fdt32_t *_p = (const fdt32_t *)p;
> +
> + return ((uint64_t)fdt32_to_cpu(_p[0]) << 32)
> + | fdt32_to_cpu(_p[1]);
> +}
> +
>  static inline void fdt64_st(void *property, uint64_t value)
>  {
>   uint8_t *bp = (uint8_t *)property;
> 

-- 
David Gibson (he or they)   | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson




Re: [PATCH v2 06/15] ppc/vof: Fix unaligned FDT property access

2024-07-04 Thread David Gibson
On Thu, Jul 04, 2024 at 01:15:57PM +0100, Peter Maydell wrote:
> On Sat, 29 Jun 2024 at 04:17, David Gibson wrote:
> >
> > On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote:
> > > > On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki wrote:
> > > >
> > > > FDT properties are aligned by 4 bytes, not 8 bytes.
> > > >
> > > > Signed-off-by: Akihiko Odaki 
> > > > ---
> > > >  hw/ppc/vof.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> > > > index e3b430a81f4f..b5b6514d79fc 100644
> > > > --- a/hw/ppc/vof.c
> > > > +++ b/hw/ppc/vof.c
> > > > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, 
> > > > GArray *claimed, uint64_t base)
> > > >  mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
> > > >  g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc));
> > > >  if (sc == 2) {
> > > > -mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + 
> > > > sizeof(uint32_t) * ac));
> > > > +mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac);
> > > >  } else {
> > > >  mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + 
> > > > sizeof(uint32_t) * ac));
> > > >  }
> > >
> > > I did wonder if there was a better way to do what this is doing,
> > > but neither we (in system/device_tree.c) nor libfdt seem to
> > > provide one.
> >
> > libfdt does provide unaligned access helpers (fdt32_ld() etc.), but
> > not an automatic aligned-or-unaligned helper.   Maybe we should add that?
> 
> fdt32_ld() and friends only do the "load from this bit of memory"
> part, which we already have QEMU utility functions for (and which
> this patch uses).
> 
> This particular bit of code is dealing with an fdt property ("memory")
> that is an array of (address, size) tuples where address and size
> can independently be either 32 or 64 bits, and it wants the
> size value of tuple 0. So the missing functionality is something at
> a higher level than fdt32_ld() which would let you say "give me
> tuple N field X" with some way to specify the tuple layout. (Which
> is an awkward kind of API to write in C.)

Ah, right.  Yeah.. that's a pretty awkward API in C.

> Slightly less general, but for this case we could perhaps have
> something like the getprop equivalent of qemu_fdt_setprop_sized_cells():
> 
>   uint64_t value_array[2];
>   qemu_fdt_getprop_sized_cells(fdt, nodename, "memory", &value_array,
>                                ac, sc);
>   /*
>    * fills in value_array[0] with address, value_array[1] with size,
>    * probably barfs if the varargs-list of cell-sizes doesn't
>    * cover the whole property, similar to the current assert on
>    * proplen.
>    */
>   mem0_end = value_array[0];

Seems reasonable to me.  The only other thought I had was something
like Python's struct.unpack() [0].  But your suggestion is probably
more natural in C.

[0] https://docs.python.org/3/library/struct.html#struct.unpack
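
For comparison, a rough sketch of what a struct.unpack()-like helper might
look like in C. Everything here is an assumption made for illustration:
unpack_sized_cells() is a hypothetical name, not an existing QEMU or libfdt
function, and the layout is passed as an array of cell counts rather than
as varargs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: unpack nfields big-endian fields from a property.
 * cells[i] gives the width of field i in 32-bit cells (1 or 2).
 * Returns false if the layout does not cover the property exactly.
 */
static bool unpack_sized_cells(const void *prop, size_t proplen,
                               const int *cells, size_t nfields,
                               uint64_t *out)
{
    const unsigned char *p = prop;
    size_t need = 0;

    assert(prop != NULL && cells != NULL && out != NULL);

    /* Validate the layout before touching the output array. */
    for (size_t i = 0; i < nfields; i++) {
        if (cells[i] != 1 && cells[i] != 2) {
            return false;
        }
        need += (size_t)cells[i] * sizeof(uint32_t);
    }
    if (need != proplen) {
        return false;
    }

    /* Byte-wise loads, so no alignment requirement on prop. */
    for (size_t i = 0; i < nfields; i++) {
        uint64_t v = 0;
        for (int b = 0; b < cells[i] * 4; b++) {
            v = (v << 8) | *p++;
        }
        out[i] = v;
    }
    return true;
}
```

A caller that wants the (address, size) pair of a single-tuple property
would pass a two-element layout such as { ac, sc } and read the size from
out[1]. Returning false on a length mismatch plays the role of the current
assert on proplen.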

-- 
David Gibson (he or they)   | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson




Re: [PATCH v2 06/15] ppc/vof: Fix unaligned FDT property access

2024-06-28 Thread David Gibson
On Fri, Jun 28, 2024 at 04:20:02PM +0100, Peter Maydell wrote:
> On Thu, 27 Jun 2024 at 14:39, Akihiko Odaki wrote:
> >
> > FDT properties are aligned by 4 bytes, not 8 bytes.
> >
> > Signed-off-by: Akihiko Odaki 
> > ---
> >  hw/ppc/vof.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
> > index e3b430a81f4f..b5b6514d79fc 100644
> > --- a/hw/ppc/vof.c
> > +++ b/hw/ppc/vof.c
> > @@ -646,7 +646,7 @@ static void vof_dt_memory_available(void *fdt, GArray *claimed, uint64_t base)
> >      mem0_reg = fdt_getprop(fdt, offset, "reg", &proplen);
> >      g_assert(mem0_reg && proplen == sizeof(uint32_t) * (ac + sc));
> >      if (sc == 2) {
> > -        mem0_end = be64_to_cpu(*(uint64_t *)(mem0_reg + sizeof(uint32_t) * ac));
> > +        mem0_end = ldq_be_p(mem0_reg + sizeof(uint32_t) * ac);
> >      } else {
> >          mem0_end = be32_to_cpu(*(uint32_t *)(mem0_reg + sizeof(uint32_t) * ac));
> >      }
> 
> I did wonder if there was a better way to do what this is doing,
> but neither we (in system/device_tree.c) nor libfdt seem to
> provide one.

libfdt does provide unaligned access helpers (fdt32_ld() etc.), but
not an automatic aligned-or-unaligned helper.   Maybe we should add that?

-- 
David Gibson (he or they)   | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson




Re: [PATCH] hw/net: prevent potential NULL dereference

2024-05-30 Thread David Gibson
On Thu, May 30, 2024 at 10:03:51AM +0100, Peter Maydell wrote:
> On Thu, 30 May 2024 at 01:52, David Gibson wrote:
> >
> > On Wed, May 29, 2024 at 02:07:18PM +0300, Oleg Sviridov wrote:
> > > Pointer, returned from function 'spapr_vio_find_by_reg', may be NULL and 
> > > is dereferenced immediately after.
> > >
> > > Found by Linux Verification Center (linuxtesting.org) with SVACE.
> > >
> > > Signed-off-by: Oleg Sviridov 
> > > ---
> > >  hw/net/spapr_llan.c | 4 
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/hw/net/spapr_llan.c b/hw/net/spapr_llan.c
> > > index ecb30b7c76..f40b733229 100644
> > > --- a/hw/net/spapr_llan.c
> > > +++ b/hw/net/spapr_llan.c
> > > @@ -770,6 +770,10 @@ static target_ulong 
> > > h_change_logical_lan_mac(PowerPCCPU *cpu,
> > >  SpaprVioVlan *dev = VIO_SPAPR_VLAN_DEVICE(sdev);
> >
> > Hmm... I thought VIO_SPAPR_VLAN_DEVICE() was supposed to abort if sdev
> > was NULL or not of the right type.  Or have the rules for qom helpers
> > changed since I wrote this?
> 
> QOM casts abort if the type is wrong, but a NULL pointer is
> passed through as a NULL pointer.

Ah, my mistake.  LGTM, then.
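
A toy model of the behaviour Peter describes, to show why the explicit
NULL check in the patch is still needed after the cast. This is not QEMU's
QOM implementation; ToyObject and toy_checked_cast() are hypothetical names
invented for this sketch:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an object that knows its own type name. */
typedef struct ToyObject {
    const char *type_name;
} ToyObject;

/*
 * Toy checked cast: aborts on a type mismatch, but lets NULL through
 * unchanged, so the caller must still test the result before use.
 */
static ToyObject *toy_checked_cast(ToyObject *obj, const char *type_name)
{
    if (obj == NULL) {
        return NULL;            /* NULL is passed through, not rejected */
    }
    if (strcmp(obj->type_name, type_name) != 0) {
        abort();                /* wrong type is a programming error */
    }
    return obj;
}
```

Under this model a lookup that returns NULL yields a NULL result from the
cast, and the first dereference of that result is where things go wrong
unless the caller checks it first.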

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH] hw/net: prevent potential NULL dereference

2024-05-29 Thread David Gibson
On Wed, May 29, 2024 at 02:07:18PM +0300, Oleg Sviridov wrote:
> Pointer, returned from function 'spapr_vio_find_by_reg', may be NULL and is 
> dereferenced immediately after.
> 
> Found by Linux Verification Center (linuxtesting.org) with SVACE.
> 
> Signed-off-by: Oleg Sviridov 
> ---
>  hw/net/spapr_llan.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/hw/net/spapr_llan.c b/hw/net/spapr_llan.c
> index ecb30b7c76..f40b733229 100644
> --- a/hw/net/spapr_llan.c
> +++ b/hw/net/spapr_llan.c
> @@ -770,6 +770,10 @@ static target_ulong h_change_logical_lan_mac(PowerPCCPU 
> *cpu,
>  SpaprVioVlan *dev = VIO_SPAPR_VLAN_DEVICE(sdev);

Hmm... I thought VIO_SPAPR_VLAN_DEVICE() was supposed to abort if sdev
was NULL or not of the right type.  Or have the rules for qom helpers
changed since I wrote this?

>  int i;
>  
> +if (!dev) {
> +return H_PARAMETER;
> +}
> +
>  for (i = 0; i < ETH_ALEN; i++) {
>  dev->nicconf.macaddr.a[ETH_ALEN - i - 1] = macaddr & 0xff;
>  macaddr >>= 8;

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v2 09/10] ppc: Make Power11 as default cpu type for 'pseries' and 'powernv'

2024-04-28 Thread David Gibson
On Fri, Apr 26, 2024 at 04:32:18PM +0200, Cédric le Goater wrote:
> On 4/26/24 13:00, Aditya Gupta wrote:
> > Make Power11 as default cpu type for 'pseries' and 'powernv' machine type,
> > with Power11 being the newest supported Power processor in QEMU.
> 
> This is too early. We should merge Power11 support first, possibly in 9.1,
> and then change default in a future release, 9.2, 10.0

Additionally, changes to defaults in pseries must be versioned, so
that the behaviour of existing machine types won't change.

> 
> Thanks,
> 
> C.
> 
> 
> 
> > 
> > Cc: Cédric Le Goater 
> > Cc: Daniel Henrique Barboza 
> > Cc: David Gibson 
> > Cc: Frédéric Barrat 
> > Cc: Harsh Prateek Bora 
> > Cc: Mahesh J Salgaonkar 
> > Cc: Madhavan Srinivasan 
> > Cc: Nicholas Piggin 
> > Signed-off-by: Aditya Gupta 
> > ---
> >   hw/ppc/pnv.c   | 4 ++--
> >   hw/ppc/spapr.c | 2 +-
> >   2 files changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
> > index 06e272f3bdd3..0c5a6bc424af 100644
> > --- a/hw/ppc/pnv.c
> > +++ b/hw/ppc/pnv.c
> > @@ -2531,8 +2531,6 @@ static void 
> > pnv_machine_p10_common_class_init(ObjectClass *oc, void *data)
> >   mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power10_v2.0");
> >   compat_props_add(mc->compat_props, phb_compat, 
> > G_N_ELEMENTS(phb_compat));
> > -mc->alias = "powernv";
> > -
> >   pmc->compat = compat;
> >   pmc->compat_size = sizeof(compat);
> >   pmc->dt_power_mgt = pnv_dt_power_mgt;
> > @@ -2569,6 +2567,8 @@ static void 
> > pnv_machine_power11_class_init(ObjectClass *oc, void *data)
> >   /* do power10_class_init as p11 core is same as p10 */
> >   pnv_machine_p10_common_class_init(oc, data);
> > +mc->alias = "powernv";
> > +
> >   mc->desc = "IBM PowerNV (Non-Virtualized) POWER11";
> >   mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power11");
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index d2d1e310a3be..1c3e2da8e9e4 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -4698,7 +4698,7 @@ static void spapr_machine_class_init(ObjectClass *oc, 
> > void *data)
> >   smc->dr_lmb_enabled = true;
> >   smc->update_dt_enabled = true;
> > -mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power10_v2.0");
> > +mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power11");
> >   mc->has_hotpluggable_cpus = true;
> >   mc->nvdimm_supported = true;
> >   smc->resize_hpt_default = SPAPR_RESIZE_HPT_ENABLED;
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v2 04/10] ppc/spapr: Remove copy-paste from pa-features

2024-03-13 Thread David Gibson
On Tue, Mar 12, 2024 at 11:14:13PM +1000, Nicholas Piggin wrote:
> TCG does not support copy/paste instructions. Remove it from
> ibm,pa-features. This has never been implemented under TCG or
> practically usable under KVM, so it won't be missed.

As with the previous patch, the specific circumstances here justify
breaking the general rule.

> 
> Reviewed-by: Harsh Prateek Bora 
> Signed-off-by: Nicholas Piggin 
> ---
>  hw/ppc/spapr.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 3108d7c532..4192cd8d6c 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -237,6 +237,10 @@ static void spapr_dt_pa_features(SpaprMachineState 
> *spapr,
>   * SSO (SAO) ordering is supported on KVM and thread=single hosts,
>   * but not MTTCG, so disable it. To advertise it, a cap would have
>   * to be added, or support implemented for MTTCG.
> + *
> + * Copy/paste is not supported by TCG, so it is not advertised. KVM
> + * can execute them but it has no accelerator drivers which are usable,
> + * so there isn't much need for it anyway.
>   */
>  
>  uint8_t pa_features_206[] = { 6, 0,
> @@ -260,8 +264,8 @@ static void spapr_dt_pa_features(SpaprMachineState *spapr,
>  0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 24 - 29 */
>  /* 30: MMR, 32: LE atomic, 34: EBB + ext EBB */
>  0x80, 0x00, 0x80, 0x00, 0xC0, 0x00, /* 30 - 35 */
> -/* 36: SPR SO, 38: Copy/Paste, 40: Radix MMU */
> -0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 36 - 41 */
> +/* 36: SPR SO, 40: Radix MMU */
> +0x80, 0x00, 0x00, 0x00, 0x80, 0x00, /* 36 - 41 */
>  /* 42: PM, 44: PC RA, 46: SC vec'd */
>      0x80, 0x00, 0x80, 0x00, 0x80, 0x00, /* 42 - 47 */
>  /* 48: SIMD, 50: QP BFP, 52: String */

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v2 03/10] ppc/spapr|pnv: Remove SAO from pa-features

2024-03-13 Thread David Gibson
On Tue, Mar 12, 2024 at 11:14:12PM +1000, Nicholas Piggin wrote:
> SAO is a page table attribute that strengthens the memory ordering of
> accesses. QEMU with MTTCG does not implement this, so clear it in
> ibm,pa-features. This is an obscure feature that has been removed from
> POWER10 ISA v3.1, there isn't much concern with removing it.
> 
> Reviewed-by: Harsh Prateek Bora 
> Signed-off-by: Nicholas Piggin 

Usually altering a user visible feature like this without versioning
would be a no-no.  However, I think it's probably ok here: AFAICT the
feature was basically never used, it didn't work in some cases anyway,
and it's now gone away.
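
For anyone reading along, the change itself is a single bit flip in each table. A minimal sketch, assuming (as the `pa_features[4 + 2]` indexing in the earlier version of this patch suggests) that the first two array entries are a header and the attribute bytes follow; the helper name `pa_features_clear_bit` is made up for illustration and is not a QEMU function:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: two header entries come before the attribute bytes. */
#define PA_FEATURES_HDR 2

/*
 * Hypothetical helper: clear the bits in 'mask' in attribute byte 'byte'
 * of an ibm,pa-features style array.
 */
static void pa_features_clear_bit(uint8_t *pa_features, size_t byte,
                                  uint8_t mask)
{
    pa_features[PA_FEATURES_HDR + byte] &= (uint8_t)~mask;
}
```

Calling `pa_features_clear_bit(pa_features_206, 4, 0x80)` on the old table turns the `0x80` in attribute byte 4 into `0x00`, which is the edit the diff below makes by hand.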

> ---
>  hw/ppc/pnv.c   |  2 +-
>  hw/ppc/spapr.c | 14 ++
>  2 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
> index 0b47b92baa..aa9786e970 100644
> --- a/hw/ppc/pnv.c
> +++ b/hw/ppc/pnv.c
> @@ -150,7 +150,7 @@ static void pnv_dt_core(PnvChip *chip, PnvCore *pc, void 
> *fdt)
>  uint32_t page_sizes_prop[64];
>  size_t page_sizes_prop_size;
>  const uint8_t pa_features[] = { 24, 0,
> -0xf6, 0x3f, 0xc7, 0xc0, 0x80, 0xf0,
> +0xf6, 0x3f, 0xc7, 0xc0, 0x00, 0xf0,
>  0x80, 0x00, 0x00, 0x00, 0x00, 0x00,
>  0x00, 0x00, 0x00, 0x00, 0x80, 0x00,
>  0x80, 0x00, 0x80, 0x00, 0x80, 0x00 };
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 55263f0815..3108d7c532 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -233,17 +233,23 @@ static void spapr_dt_pa_features(SpaprMachineState 
> *spapr,
>   PowerPCCPU *cpu,
>   void *fdt, int offset)
>  {
> +/*
> + * SSO (SAO) ordering is supported on KVM and thread=single hosts,
> + * but not MTTCG, so disable it. To advertise it, a cap would have
> + * to be added, or support implemented for MTTCG.
> + */
> +
>  uint8_t pa_features_206[] = { 6, 0,
> -0xf6, 0x1f, 0xc7, 0x00, 0x80, 0xc0 };
> +0xf6, 0x1f, 0xc7, 0x00, 0x00, 0xc0 };
>  uint8_t pa_features_207[] = { 24, 0,
> -0xf6, 0x1f, 0xc7, 0xc0, 0x80, 0xf0,
> +0xf6, 0x1f, 0xc7, 0xc0, 0x00, 0xf0,
>  0x80, 0x00, 0x00, 0x00, 0x00, 0x00,
>  0x00, 0x00, 0x00, 0x00, 0x80, 0x00,
>  0x80, 0x00, 0x80, 0x00, 0x00, 0x00 };
>  uint8_t pa_features_300[] = { 66, 0,
>  /* 0: MMU|FPU|SLB|RUN|DABR|NX, 1: fri[nzpm]|DABRX|SPRG3|SLB0|PP110 */
> -/* 2: VPM|DS205|PPR|DS202|DS206, 3: LSD|URG, SSO, 5: LE|CFAR|EB|LSQ 
> */
> -0xf6, 0x1f, 0xc7, 0xc0, 0x80, 0xf0, /* 0 - 5 */
> +/* 2: VPM|DS205|PPR|DS202|DS206, 3: LSD|URG, 5: LE|CFAR|EB|LSQ */
> +0xf6, 0x1f, 0xc7, 0xc0, 0x00, 0xf0, /* 0 - 5 */
>  /* 6: DS207 */
>  0x80, 0x00, 0x00, 0x00, 0x00, 0x00, /* 6 - 11 */
>  /* 16: Vector */

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v8 2/2] ppc: spapr: Enable 2nd DAWR on Power10 pSeries machine

2024-02-27 Thread David Gibson
 > +
> >  SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
> >  [SPAPR_CAP_HTM] = {
> >  .name = "htm",
> > @@ -781,6 +807,15 @@ SpaprCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
> >  .type = "bool",
> >  .apply = cap_ail_mode_3_apply,
> >  },
> > +[SPAPR_CAP_DAWR1] = {
> > +.name = "dawr1",
> > +.description = "Allow 2nd Data Address Watchpoint Register 
> > (DAWR1)",
> > +.index = SPAPR_CAP_DAWR1,
> > +.get = spapr_cap_get_bool,
> > +.set = spapr_cap_set_bool,
> > +.type = "bool",
> > +.apply = cap_dawr1_apply,
> > +},
> >  };
> >  
> >  static SpaprCapabilities default_caps_with_cpu(SpaprMachineState *spapr,
> > @@ -923,6 +958,7 @@ SPAPR_CAP_MIG_STATE(large_decr, 
> > SPAPR_CAP_LARGE_DECREMENTER);
> >  SPAPR_CAP_MIG_STATE(ccf_assist, SPAPR_CAP_CCF_ASSIST);
> >  SPAPR_CAP_MIG_STATE(fwnmi, SPAPR_CAP_FWNMI);
> >  SPAPR_CAP_MIG_STATE(rpt_invalidate, SPAPR_CAP_RPT_INVALIDATE);
> > +SPAPR_CAP_MIG_STATE(dawr1, SPAPR_CAP_DAWR1);
> >  
> >  void spapr_caps_init(SpaprMachineState *spapr)
> >  {
> > diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> > index fcefd1d1c7..34c1c77c95 100644
> > --- a/hw/ppc/spapr_hcall.c
> > +++ b/hw/ppc/spapr_hcall.c
> > @@ -814,11 +814,12 @@ static target_ulong 
> > h_set_mode_resource_set_ciabr(PowerPCCPU *cpu,
> >  return H_SUCCESS;
> >  }
> >  
> > -static target_ulong h_set_mode_resource_set_dawr0(PowerPCCPU *cpu,
> > -  SpaprMachineState *spapr,
> > -  target_ulong mflags,
> > -  target_ulong value1,
> > -  target_ulong value2)
> > +static target_ulong h_set_mode_resource_set_dawr(PowerPCCPU *cpu,
> > + SpaprMachineState 
> > *spapr,
> > + target_ulong mflags,
> > + target_ulong resource,
> > + target_ulong value1,
> > + target_ulong value2)
> 
> Did the text alignment go wrong here?
> 
> Aside from those things,
> 
> Reviewed-by: Nicholas Piggin 
> 
> Thanks,
> Nick
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH] spapr: avoid overhead of finding vhyp class in critical operations

2024-02-25 Thread David Gibson
set_c(PowerPCCPU *cpu, hwaddr 
> ptex, uint64_t pte1)
>  hwaddr base, offset = ptex * HASH_PTE_SIZE_64 + HPTE64_DW1_C;
>  
>      if (cpu->vhyp) {
> -PPCVirtualHypervisorClass *vhc =
> -PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
> -vhc->hpte_set_c(cpu->vhyp, ptex, pte1);
> +cpu->vhyp_class->hpte_set_c(cpu->vhyp, ptex, pte1);
>  return;
>  }
>  base = ppc_hash64_hpt_base(cpu);
> diff --git a/target/ppc/mmu-radix64.c b/target/ppc/mmu-radix64.c
> index 5823e039e6..496ba87a95 100644
> --- a/target/ppc/mmu-radix64.c
> +++ b/target/ppc/mmu-radix64.c
> @@ -677,9 +677,7 @@ static bool ppc_radix64_xlate_impl(PowerPCCPU *cpu, vaddr 
> eaddr,
>  
>  /* Get Partition Table */
>  if (cpu->vhyp) {
> -PPCVirtualHypervisorClass *vhc;
> -vhc = PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
> -if (!vhc->get_pate(cpu->vhyp, cpu, lpid, &pate)) {
> +if (!cpu->vhyp_class->get_pate(cpu->vhyp, cpu, lpid, &pate)) {
>  if (guest_visible) {
>  ppc_radix64_raise_hsi(cpu, access_type, eaddr, eaddr,
>DSISR_R_BADCONFIG);

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH] MAINTAINERS: Remove myself as reviewer from PPC

2024-02-20 Thread David Gibson
On Tue, Feb 20, 2024 at 09:09:56AM +0100, Cédric le Goater wrote:
> PPC maintainership has been a side activity for the last 2 years and
> it is time to let go some of it now that Nick has taken over.
> 
> Signed-off-by: Cédric Le Goater 

Thanks for all your contributions Cédric.

> ---
>  MAINTAINERS | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index a74d73960c0a..f5a4e4745c92 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -316,7 +316,6 @@ F: tests/tcg/openrisc/
>  PowerPC TCG CPUs
>  M: Nicholas Piggin 
>  M: Daniel Henrique Barboza 
> -R: Cédric Le Goater 
>  L: qemu-...@nongnu.org
>  S: Odd Fixes
>  F: target/ppc/
> @@ -468,7 +467,6 @@ F: target/mips/sysemu/
>  PPC KVM CPUs
>  M: Nicholas Piggin 
>  R: Daniel Henrique Barboza 
> -R: Cédric Le Goater 
>  S: Odd Fixes
>  F: target/ppc/kvm.c
>  
> @@ -1502,7 +1500,6 @@ F: tests/avocado/ppc_prep_40p.py
>  sPAPR (pseries)
>  M: Nicholas Piggin 
>  R: Daniel Henrique Barboza 
> -R: Cédric Le Goater 
>  R: David Gibson 
>  R: Harsh Prateek Bora 
>  L: qemu-...@nongnu.org

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: spapr watchdog vs watchdog_perform_action() / QMP watchdog-set-action

2024-01-28 Thread David Gibson
On Sat, Jan 27, 2024 at 01:08:02PM +0000, Peter Maydell wrote:
> On Fri, 26 Jan 2024 at 20:49, Markus Armbruster  wrote:
> >
> > Peter Maydell  writes:
> >
> > > Hi; one of the "bitesized tasks" we have listed is to convert
> > > watchdog timers which directly call qemu_system_reset_request() on
> > > watchdog timeout to call watchdog_perform_action() instead. This
> > > means they honour the QMP commands that let the user specify
> > > the behaviour on watchdog expiry:
> > > https://www.qemu.org/docs/master/interop/qemu-qmp-ref.html#qapidoc-141
> > > https://www.qemu.org/docs/master/interop/qemu-qmp-ref.html#qapidoc-129
> > > (choices include reset, power off the system, do nothing, etc).
> > >
> > > There are only a few remaining watchdogs that don't use the
> > > watchdog_perform_action() function. In most cases the change
> > > is obvious and easy: just make them do that instead of calling
> > > qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET).
> > >
> > > However, the hw/watchdog/spapr_watchdog.c case is trickier. As
> > > far as I can tell from the sources, this is a watchdog set up via
> > > a hypercall, and the guest makes a choice of "power off, restart,
> > > or dump and restart" for its on-expiry action.
> > >
> > > What should this watchdog's interaction with the watchdog-set-action
> > > QMP command be? If the user says "do X" and the guest says "do Y",
> > > which do we do? (With the current code, we always honour what
> > > the guest asks for and ignore what the user asks for.)
> >
> > Gut reaction: when the user says "do X", the guest should not get a say.
> > But one of the values of X could be "whatever the guest says".

That would also be my inclination.

> Mmm. Slightly awkwardly, we don't currently distinguish between
> "action is reset because the user never expressed a preference"
> and "action is reset because the user specifically asked for that",
> but I guess in theory we could make that distinction. (Conveniently
> there is no QMP action for "query current watchdog-action state",
> so we don't need to worry about reflecting that distinction in the
> QMP interface if we make it.)

I think that change is necessary in order to accommodate this sort of
watchdog with guest-programmable behaviour (which is part of the PAPR
spec, so we shouldn't just ignore it).
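
To make the distinction concrete, here is a rough sketch of the policy being discussed: an explicit user choice wins, and the guest's requested action applies only while the user has expressed no preference. All names here (`Action`, `UserPolicy`, `effective_action`) are hypothetical and are not QEMU's watchdog API:

```c
/* Hypothetical types for illustration only. */
typedef enum {
    ACT_RESET,
    ACT_POWEROFF,
    ACT_NONE,
} Action;

typedef struct {
    int user_set;        /* non-zero once the user expressed a preference */
    Action user_action;  /* only meaningful when user_set is non-zero */
} UserPolicy;

/*
 * Pick the action to perform on expiry: the user's explicit choice if
 * there is one, otherwise whatever the guest asked for.
 */
static Action effective_action(const UserPolicy *user, Action guest_action)
{
    return user->user_set ? user->user_action : guest_action;
}
```

The `user_set` flag is the extra piece of state: it separates "reset because nobody said otherwise" from "reset because the user asked for it".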

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH 2/8] ppc/spapr|pnv: Remove SAO from pa-features when running MTTCG

2024-01-24 Thread David Gibson
On Tue, Jan 23, 2024 at 11:57:56AM +1000, Nicholas Piggin wrote:
> On Fri Jan 19, 2024 at 10:23 AM AEST, David Gibson wrote:
> > On Fri, Jan 19, 2024 at 12:09:36AM +1000, Nicholas Piggin wrote:
> > > SAO is a page table attribute that strengthens the memory ordering of
> > > accesses. QEMU with MTTCG does not implement this, so clear it in
> > > ibm,pa-features. There is a complication with spapr migration that is
> > > addressed with comments, it is not a new problem here.
> > > 
> > > Signed-off-by: Nicholas Piggin 
> > > ---
> > >  hw/ppc/pnv.c   |  5 +
> > >  hw/ppc/spapr.c | 15 +++
> > >  2 files changed, 20 insertions(+)
> > > 
> > > diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
> > > index b949398689..4969fbdb05 100644
> > > --- a/hw/ppc/pnv.c
> > > +++ b/hw/ppc/pnv.c
> > > @@ -158,6 +158,11 @@ static void pnv_dt_core(PnvChip *chip, PnvCore *pc, 
> > > void *fdt)
> > >  char *nodename;
> > >  int cpus_offset = get_cpus_node(fdt);
> > >  
> > > +if (qemu_tcg_mttcg_enabled()) {
> > > +/* SSO (SAO) ordering is not supported under MTTCG. */
> > > +pa_features[4 + 2] &= ~0x80;
> > > +}
> > > +
> > >  nodename = g_strdup_printf("%s@%x", dc->fw_name, pc->pir);
> > >  offset = fdt_add_subnode(fdt, cpus_offset, nodename);
> > >  _FDT(offset);
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index 021b1a00e1..1c79d5670d 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -284,6 +284,21 @@ static void spapr_dt_pa_features(SpaprMachineState 
> > > *spapr,
> > >  return;
> > >  }
> > >  
> > > +if (qemu_tcg_mttcg_enabled()) {
> > > +/*
> > > + * SSO (SAO) ordering is not supported under MTTCG, so disable 
> > > it.
> > > + * There is no cap for this, so there is a migration bug here.
> > > + * However don't disable it entirely, to allow it to be used 
> > > under
> > > + * KVM. This is a minor concern because:
> > > > + * - SAO is an obscure and rarely (if ever) used feature.
> > > + * - SAO is removed from POWER10 / v3.1, so there is already a
> > > + *   migration problem today.
> > > + * - Linux does not test this pa-features bit today anyway, so 
> > > it's
> > > + *   academic.
> > > + */
> > > +pa_features[4 + 2] &= ~0x80;
> >
> > Oof.. I see the reasoning but modifying guest visible parameters based
> > on host capabilities without a cap really worries me nonetheless.
> 
> Yeah :( It's not a new problem, but changing it based on host
> does make it look uglier I guess.

It's not really about whether it looks uglier, it's the fact that any
dependency of guest visible aspects of the VM on host properties is a
potential landmine for migration.

The qemu migration model is - pretty fundamentally - that the VM
should look and behave, from the point of view of the guest, the same
before and after migration.  If the behaviour of the VM changes based
on host properties it breaks that assumption, and it does so in a way
that the user can't control or even easily predict.  Tools such as
libvirt, or even qemu itself, can't verify that the migration is valid
if there are effectively invisible parameters to the VM configuration
that come from the host instead of the command line.

> Other option could be to just disable it always. I don't mind
> but someone did mention experimenting with it when I asked
> about removing support from Linux. They could still test with
> bare metal, and if ever started actually being used then we
> could add a cap for it.

I think that's a better idea.
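
A toy sketch of the migration concern, under made-up types (`GuestConfig`, `HostState`) that do not exist in QEMU: the first function yields different guest-visible bytes for the same configuration on different hosts, while the second depends only on what was configured. Always disabling the bit is the simplest case of the second form, since the result is then a constant.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: what the user configured (e.g. via a cap). */
typedef struct {
    bool cap_sao;
} GuestConfig;

/* Hypothetical: a property of the host we happen to run on. */
typedef struct {
    bool mttcg;
} HostState;

/* Problematic: the guest-visible byte changes with the host. */
static uint8_t feature_byte_host_dependent(const GuestConfig *cfg,
                                           const HostState *host)
{
    (void)cfg; /* the configuration has no say here */
    return host->mttcg ? 0x00 : 0x80;
}

/* Preferred: the guest-visible byte is a function of the config only. */
static uint8_t feature_byte_from_config(const GuestConfig *cfg)
{
    return cfg->cap_sao ? 0x80 : 0x00;
}
```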

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH 2/8] ppc/spapr|pnv: Remove SAO from pa-features when running MTTCG

2024-01-18 Thread David Gibson
On Fri, Jan 19, 2024 at 12:09:36AM +1000, Nicholas Piggin wrote:
> SAO is a page table attribute that strengthens the memory ordering of
> accesses. QEMU with MTTCG does not implement this, so clear it in
> ibm,pa-features. There is a complication with spapr migration that is
> addressed with comments, it is not a new problem here.
> 
> Signed-off-by: Nicholas Piggin 
> ---
>  hw/ppc/pnv.c   |  5 +
>  hw/ppc/spapr.c | 15 +++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
> index b949398689..4969fbdb05 100644
> --- a/hw/ppc/pnv.c
> +++ b/hw/ppc/pnv.c
> @@ -158,6 +158,11 @@ static void pnv_dt_core(PnvChip *chip, PnvCore *pc, void 
> *fdt)
>  char *nodename;
>  int cpus_offset = get_cpus_node(fdt);
>  
> +if (qemu_tcg_mttcg_enabled()) {
> +/* SSO (SAO) ordering is not supported under MTTCG. */
> +pa_features[4 + 2] &= ~0x80;
> +}
> +
>  nodename = g_strdup_printf("%s@%x", dc->fw_name, pc->pir);
>  offset = fdt_add_subnode(fdt, cpus_offset, nodename);
>  _FDT(offset);
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 021b1a00e1..1c79d5670d 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -284,6 +284,21 @@ static void spapr_dt_pa_features(SpaprMachineState 
> *spapr,
>  return;
>  }
>  
> +if (qemu_tcg_mttcg_enabled()) {
> +/*
> + * SSO (SAO) ordering is not supported under MTTCG, so disable it.
> + * There is no cap for this, so there is a migration bug here.
> + * However don't disable it entirely, to allow it to be used under
> + * KVM. This is a minor concern because:
> + * - SAO is an obscure and rarely (if ever) used feature.
> + * - SAO is removed from POWER10 / v3.1, so there is already a
> + *   migration problem today.
> + * - Linux does not test this pa-features bit today anyway, so it's
> + *   academic.
> + */
> +pa_features[4 + 2] &= ~0x80;

Oof.. I see the reasoning but modifying guest visible parameters based
on host capabilities without a cap really worries me nonetheless.

> +    }
> +
>  if (ppc_hash64_has(cpu, PPC_HASH64_CI_LARGEPAGE)) {
>  /*
>   * Note: we keep CI large pages off by default because a 64K capable

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH] ppc: qtest already exports qtest_rtas_call()

2023-10-30 Thread David Gibson
On Mon, Oct 30, 2023 at 05:41:36PM +0100, Cédric le Goater wrote:
> On 10/30/23 17:38, Juan Quintela wrote:
> > Having two functions with the same name is a bad idea.  As spapr only
> > uses the function locally, make it static.
> > 
> > When you compile with clang, you get this compilation error:
> > 
> > /usr/bin/ld: tests/qtest/libqos/libqos.fa.p/.._libqtest.c.o: in function 
> > `qtest_rtas_call':
> > /scratch/qemu/clang/full/all/../../../../../mnt/code/qemu/full/tests/qtest/libqtest.c:1195:
> >  multiple definition of `qtest_rtas_call'; 
> > libqemu-ppc64-softmmu.fa.p/hw_ppc_spapr_rtas.c.o:/scratch/qemu/clang/full/all/../../../../../mnt/code/qemu/full/hw/ppc/spapr_rtas.c:536:
> >  first defined here
> > clang-16: error: linker command failed with exit code 1 (use -v to see 
> > invocation)
> > ninja: build stopped: subcommand failed.
> > make: *** [Makefile:162: run-ninja] Error 1
> > 
> > Signed-off-by: Juan Quintela 
> 
> 
> Reviewed-by: Cédric Le Goater 

I think changing the name of one of the functions would be even
better.  Making it static means it won't confuse the compiler, but it
can still confuse people.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH 4/4] hw/ppc/spapr: Rename 'softmmu' -> 'tcg'

2023-10-02 Thread David Gibson
On Mon, Oct 02, 2023 at 04:38:54PM +0200, Philippe Mathieu-Daudé wrote:
> spapr_softmmu.c isn't related to having a soft MMU, but having
> the TCG accelerator. Rename it using the 'tcg' suffix.

That's not really accurate.  The functions in there absolutely are
about the emulated MMU.  They're not needed for KVM, because KVM has
its own MMU emulation, but they're not strictly speaking related to TCG.

> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/ppc/{spapr_softmmu.c => spapr_tcg.c} | 0
>  hw/ppc/meson.build  | 2 +-
>  2 files changed, 1 insertion(+), 1 deletion(-)
>  rename hw/ppc/{spapr_softmmu.c => spapr_tcg.c} (100%)
> 
> diff --git a/hw/ppc/spapr_softmmu.c b/hw/ppc/spapr_tcg.c
> similarity index 100%
> rename from hw/ppc/spapr_softmmu.c
> rename to hw/ppc/spapr_tcg.c
> diff --git a/hw/ppc/meson.build b/hw/ppc/meson.build
> index 7c2c52434a..281100a58d 100644
> --- a/hw/ppc/meson.build
> +++ b/hw/ppc/meson.build
> @@ -31,7 +31,7 @@ ppc_ss.add(when: 'CONFIG_PSERIES', if_true: files(
>'pef.c',
>  ))
>  ppc_ss.add(when: ['CONFIG_PSERIES', 'CONFIG_TCG'], if_true: files(
> -  'spapr_softmmu.c',
> +  'spapr_tcg.c',
>  ))
>  ppc_ss.add(when: 'CONFIG_SPAPR_RNG', if_true: files('spapr_rng.c'))
>  ppc_ss.add(when: ['CONFIG_PSERIES', 'CONFIG_LINUX'], if_true: files(

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH 3/4] hw/ppc/spapr_hcall: Rename {softmmu -> tcgppc}_resize_hpt_prepare/commit

2023-10-02 Thread David Gibson
On Mon, Oct 02, 2023 at 04:38:53PM +0200, Philippe Mathieu-Daudé wrote:
> We use the 'kvmppc' prefix for KVM specific functions:
> 
>   $ git grep \ kvmppc_ | wc -l
>402
> 
> Following the same pattern for TCG specific functions,
> use the 'tcgppc' prefix (which is clearer than 'softmmu').

In this specific case, I think "softmmu" is more accurate than "tcg".
These are specifically related to the emulated MMU, and not really to
instruction emulation per se.

> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  include/hw/ppc/spapr.h | 8 
>  hw/ppc/spapr_hcall.c   | 4 ++--
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index e91791a1a9..160a5823fb 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -634,10 +634,10 @@ void spapr_register_hypercall(target_ulong opcode, 
> spapr_hcall_fn fn);
>  target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>   target_ulong *args);
>  
> -target_ulong softmmu_resize_hpt_prepare(PowerPCCPU *cpu, SpaprMachineState 
> *spapr,
> - target_ulong shift);
> -target_ulong softmmu_resize_hpt_commit(PowerPCCPU *cpu, SpaprMachineState 
> *spapr,
> -target_ulong flags, target_ulong 
> shift);
> +target_ulong tcgppc_resize_hpt_prepare(PowerPCCPU *cpu, SpaprMachineState 
> *spapr,
> +   target_ulong shift);
> +target_ulong tcgppc_resize_hpt_commit(PowerPCCPU *cpu, SpaprMachineState 
> *spapr,
> +  target_ulong flags, target_ulong 
> shift);
>  bool is_ram_address(SpaprMachineState *spapr, hwaddr addr);
>  void push_sregs_to_kvm_pr(SpaprMachineState *spapr);
>  
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index a860c626b7..7b0f2e2e1c 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -125,7 +125,7 @@ static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
>  if (kvm_enabled()) {
>  return H_HARDWARE;
>  } else if (tcg_enabled()) {
> -return softmmu_resize_hpt_prepare(cpu, spapr, shift);
> +return tcgppc_resize_hpt_prepare(cpu, spapr, shift);
>  } else {
>  g_assert_not_reached();
>  }
> @@ -195,7 +195,7 @@ static target_ulong h_resize_hpt_commit(PowerPCCPU *cpu,
>  if (kvm_enabled()) {
>  return H_HARDWARE;
>  } else if (tcg_enabled()) {
> -return softmmu_resize_hpt_commit(cpu, spapr, flags, shift);
> +return tcgppc_resize_hpt_commit(cpu, spapr, flags, shift);
>  } else {
>  g_assert_not_reached();
>  }

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH] MAINTAINERS: Nick Piggin PPC maintainer, other PPC changes

2023-09-15 Thread David Gibson
On Fri, Sep 15, 2023 at 08:05:07AM -0300, Daniel Henrique Barboza wrote:
> Update all relevant PowerPC entries as follows:
> 
> - Nick Piggin is promoted to Maintainer in all qemu-ppc subsystems.
>   Nick has been a solid contributor for the last couple of years and
>   has the required knowledge and motivation to drive the boat.
> 
> - Greg Kurz is being removed from all qemu-ppc entries. Greg has moved
>   to other areas of interest and will retire from qemu-ppc.  Thanks Mr
>   Kurz for all the years of service.
> 
> - David Gibson was removed as 'Reviewer' from PowerPC TCG CPUs and PPC
>   KVM CPUs. Change done per his request.
> 
> - Daniel Barboza downgraded from 'Maintainer' to 'Reviewer' in sPAPR and
>   PPC KVM CPUs. It has been a long time since I last touched those areas and
>   it's not justified to be kept as maintainer in them.
> 
> - Cedric Le Goater and Daniel Barboza removed as 'Reviewer' in VOF. We
>   don't have the required knowledge to justify it.
> 
> - VOF support downgraded from 'Maintained' to 'Odd Fixes' since it
>   better reflects the current state of the subsystem.
> 
> Acked-by: Cédric Le Goater 
> Signed-off-by: Daniel Henrique Barboza 

Acked-by: David Gibson 

> ---
>  MAINTAINERS | 20 +++-
>  1 file changed, 7 insertions(+), 13 deletions(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 00562f924f..c4aa1c1c9f 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -298,11 +298,9 @@ F: hw/openrisc/
>  F: tests/tcg/openrisc/
>  
>  PowerPC TCG CPUs
> +M: Nicholas Piggin 
>  M: Daniel Henrique Barboza 
>  R: Cédric Le Goater 
> -R: David Gibson 
> -R: Greg Kurz 
> -R: Nicholas Piggin 
>  L: qemu-...@nongnu.org
>  S: Odd Fixes
>  F: target/ppc/
> @@ -438,10 +436,9 @@ F: target/mips/kvm*
>  F: target/mips/sysemu/
>  
>  PPC KVM CPUs
> -M: Daniel Henrique Barboza 
> +M: Nicholas Piggin 
> +R: Daniel Henrique Barboza 
>  R: Cédric Le Goater 
> -R: David Gibson 
> -R: Greg Kurz 
>  S: Odd Fixes
>  F: target/ppc/kvm.c
>  
> @@ -1430,10 +1427,10 @@ F: include/hw/rtc/m48t59.h
>  F: tests/avocado/ppc_prep_40p.py
>  
>  sPAPR (pseries)
> -M: Daniel Henrique Barboza 
> +M: Nicholas Piggin 
> +R: Daniel Henrique Barboza 
>  R: Cédric Le Goater 
>  R: David Gibson 
> -R: Greg Kurz 
>  R: Harsh Prateek Bora 
>  L: qemu-...@nongnu.org
>  S: Odd Fixes
> @@ -1452,8 +1449,8 @@ F: tests/avocado/ppc_pseries.py
>  
>  PowerNV (Non-Virtualized)
>  M: Cédric Le Goater 
> +M: Nicholas Piggin 
>  R: Frédéric Barrat 
> -R: Nicholas Piggin 
>  L: qemu-...@nongnu.org
>  S: Odd Fixes
>  F: docs/system/ppc/powernv.rst
> @@ -1497,12 +1494,9 @@ F: include/hw/pci-host/mv64361.h
>  
>  Virtual Open Firmware (VOF)
>  M: Alexey Kardashevskiy 
> -R: Cédric Le Goater 
> -R: Daniel Henrique Barboza 
>  R: David Gibson 
> -R: Greg Kurz 
>  L: qemu-...@nongnu.org
> -S: Maintained
> +S: Odd Fixes
>  F: hw/ppc/spapr_vof*
>  F: hw/ppc/vof*
>  F: include/hw/ppc/vof*

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH] hw/ppc/spapr: Test whether TCG is enabled with tcg_enabled()

2023-06-20 Thread David Gibson
On Tue, Jun 20, 2023 at 09:48:02AM +0200, Philippe Mathieu-Daudé wrote:
> Although the PPC target only supports the TCG and KVM
> accelerators, QEMU supports more. We cannot assume that
> '!kvm == tcg', so test for the correct accelerator. This
> also eases code review, because here we don't care about
> KVM, we really want to test for TCG.
> 
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: David Gibson 
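
A small sketch of why the two tests differ once there are more than two accelerators. The `Accel` enum and helpers are hypothetical stand-ins, not QEMU's `kvm_enabled()` / `tcg_enabled()`; `ACCEL_OTHER` stands for any third accelerator:

```c
#include <stdbool.h>

/* Hypothetical accelerator enum for illustration. */
typedef enum {
    ACCEL_TCG,
    ACCEL_KVM,
    ACCEL_OTHER,
} Accel;

static bool is_kvm(Accel a) { return a == ACCEL_KVM; }
static bool is_tcg(Accel a) { return a == ACCEL_TCG; }

/* Old condition: treats everything that is not KVM as TCG. */
static bool smt_rejected_old(Accel a, unsigned threads)
{
    return !is_kvm(a) && threads > 1;
}

/* New condition: only rejects when TCG is really in use. */
static bool smt_rejected_new(Accel a, unsigned threads)
{
    return is_tcg(a) && threads > 1;
}
```

The two conditions agree for TCG and KVM and only diverge for the third case, which is the one the commit message is about.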

> ---
>  hw/ppc/spapr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index dcb7f1c70a..c4b666587b 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2524,7 +2524,7 @@ static void spapr_set_vsmt_mode(SpaprMachineState 
> *spapr, Error **errp)
>  int ret;
>  unsigned int smp_threads = ms->smp.threads;
>  
> -if (!kvm_enabled() && (smp_threads > 1)) {
> +if (tcg_enabled() && (smp_threads > 1)) {
>  error_setg(errp, "TCG cannot support more than 1 thread/core "
> "on a pseries machine");
>  return;

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH 3/3] net: socket: remove net_init_socket()

2023-06-14 Thread David Gibson
On Fri,  9 Jun 2023 09:27:48 +0200
Laurent Vivier  wrote:

> Move the file descriptor type checking before doing anything with it.
> If it's not usable, don't close it as it could be in use by another
> part of QEMU, only fail and report an error.
> 
> Signed-off-by: Laurent Vivier 

Reviewed-by: David Gibson 
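
For reference, the pattern described in the commit message (check the descriptor's type first, and on failure report an error without closing it) looks roughly like this in isolation. This is a standalone sketch using plain POSIX calls, not the QEMU function; the name `socket_type_check` is made up:

```c
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Returns SOCK_DGRAM or SOCK_STREAM if fd is a socket of one of those
 * types, or -1 otherwise. The fd is deliberately never closed here: the
 * caller may not own it, so failing is all this function does.
 */
static int socket_type_check(int fd)
{
    int so_type;
    socklen_t optlen = sizeof(so_type);

    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &so_type, &optlen) < 0) {
        return -1; /* not a socket, or not a valid descriptor */
    }
    if (so_type != SOCK_DGRAM && so_type != SOCK_STREAM) {
        return -1; /* a socket, but not a type we can use */
    }
    return so_type;
}
```

The caller can then switch on the returned type and pick the datagram or stream initialiser directly, which is what the patch below does.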

> ---
>  net/socket.c | 43 +--
>  1 file changed, 17 insertions(+), 26 deletions(-)
> 
> diff --git a/net/socket.c b/net/socket.c
> index 6b1f0fec3a10..8e3702e1f3a8 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -463,28 +463,6 @@ static int net_socket_fd_check(int fd, Error **errp)
>  return so_type;
>  }
>  
> -static NetSocketState *net_socket_fd_init(NetClientState *peer,
> -  const char *model, const char 
> *name,
> -  int fd, int is_connected,
> -  const char *mc, Error **errp)
> -{
> -int so_type;
> -
> -so_type = net_socket_fd_check(fd, errp);
> -if (so_type < 0) {
> -close(fd);
> -return NULL;
> -}
> -switch(so_type) {
> -case SOCK_DGRAM:
> -return net_socket_fd_init_dgram(peer, model, name, fd, is_connected,
> -mc, errp);
> -case SOCK_STREAM:
> -return net_socket_fd_init_stream(peer, model, name, fd, 
> is_connected);
> -}
> -return NULL;
> -}
> -
>  static void net_socket_accept(void *opaque)
>  {
>  NetSocketState *s = opaque;
> @@ -728,21 +706,34 @@ int net_init_socket(const Netdev *netdev, const char 
> *name,
>  }
>  
>  if (sock->fd) {
> -int fd, ret;
> +int fd, ret, so_type;
>  
>  fd = monitor_fd_param(monitor_cur(), sock->fd, errp);
>  if (fd == -1) {
>  return -1;
>  }
> +so_type = net_socket_fd_check(fd, errp);
> +if (so_type < 0) {
> +return -1;
> +}
>  ret = qemu_socket_try_set_nonblock(fd);
>  if (ret < 0) {
>  error_setg_errno(errp, -ret, "%s: Can't use file descriptor %d",
>   name, fd);
>  return -1;
>  }
> -if (!net_socket_fd_init(peer, "socket", name, fd, 1, sock->mcast,
> -errp)) {
> -return -1;
> +switch (so_type) {
> +case SOCK_DGRAM:
> +if (!net_socket_fd_init_dgram(peer, "socket", name, fd, 1,
> +  sock->mcast, errp)) {
> +return -1;
> +        }
> +break;
> +case SOCK_STREAM:
> +if (!net_socket_fd_init_stream(peer, "socket", name, fd, 1)) {
> +return -1;
> +}
> +break;
>  }
>  return 0;
>  }
> -- 
> 2.39.2
> 


-- 
David Gibson 
Principal Software Engineer, Virtualization, Red Hat




Re: [PATCH 2/3] net: socket: move fd type checking to its own function

2023-06-14 Thread David Gibson
On Fri,  9 Jun 2023 09:27:47 +0200
Laurent Vivier  wrote:

> Signed-off-by: Laurent Vivier 

Reviewed-by: David Gibson 


> ---
>  net/socket.c | 28 
>  1 file changed, 20 insertions(+), 8 deletions(-)
> 
> diff --git a/net/socket.c b/net/socket.c
> index 24dcaa55bc46..6b1f0fec3a10 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -446,16 +446,32 @@ static NetSocketState 
> *net_socket_fd_init_stream(NetClientState *peer,
>  return s;
>  }
>  
> +static int net_socket_fd_check(int fd, Error **errp)
> +{
> +int so_type, optlen = sizeof(so_type);
> +
> +if (getsockopt(fd, SOL_SOCKET, SO_TYPE, (char *)&so_type,
> +(socklen_t *)&optlen) < 0) {
> +error_setg(errp, "can't get socket option SO_TYPE");
> +return -1;
> +}
> +if (so_type != SOCK_DGRAM && so_type != SOCK_STREAM) {
> +error_setg(errp, "socket type=%d for fd=%d must be either"
> +   " SOCK_DGRAM or SOCK_STREAM", so_type, fd);
> +return -1;
> +}
> +return so_type;
> +}
> +
>  static NetSocketState *net_socket_fd_init(NetClientState *peer,
>const char *model, const char 
> *name,
>int fd, int is_connected,
>const char *mc, Error **errp)
>  {
> -int so_type = -1, optlen=sizeof(so_type);
> +int so_type;
>  
> -if(getsockopt(fd, SOL_SOCKET, SO_TYPE, (char *)&so_type,
> -(socklen_t *)&optlen)< 0) {
> -error_setg(errp, "can't get socket option SO_TYPE");
> +so_type = net_socket_fd_check(fd, errp);
> +if (so_type < 0) {
>  close(fd);
>  return NULL;
>  }
> @@ -465,10 +481,6 @@ static NetSocketState *net_socket_fd_init(NetClientState 
> *peer,
>  mc, errp);
>  case SOCK_STREAM:
>  return net_socket_fd_init_stream(peer, model, name, fd, 
> is_connected);
> -default:
> -error_setg(errp, "socket type=%d for fd=%d must be either"
> -   " SOCK_DGRAM or SOCK_STREAM", so_type, fd);
> -close(fd);
>  }
>  return NULL;
>  }
> -- 
> 2.39.2
> 


-- 
David Gibson 
Principal Software Engineer, Virtualization, Red Hat




Re: [PATCH 1/3] net: socket: prepare to cleanup net_init_socket()

2023-06-14 Thread David Gibson
On Fri,  9 Jun 2023 09:27:46 +0200
Laurent Vivier  wrote:

> Use directly net_socket_fd_init_stream() and net_socket_fd_init_dgram()
> when the socket type is already known.
> 
> Signed-off-by: Laurent Vivier 

This makes sense as a clean up regardless of the rest of the series.

Reviewed-by: David Gibson 


> ---
>  net/socket.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/net/socket.c b/net/socket.c
> index ba6e5b0b0035..24dcaa55bc46 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -587,7 +587,7 @@ static int net_socket_connect_init(NetClientState *peer,
>  break;
>  }
>  }
> -s = net_socket_fd_init(peer, model, name, fd, connected, NULL, errp);
> +s = net_socket_fd_init_stream(peer, model, name, fd, connected);
>  if (!s) {
>  return -1;
>  }
> @@ -629,7 +629,7 @@ static int net_socket_mcast_init(NetClientState *peer,
>  return -1;
>  }
>  
> -s = net_socket_fd_init(peer, model, name, fd, 0, NULL, errp);
> +s = net_socket_fd_init_dgram(peer, model, name, fd, 0, NULL, errp);
>  if (!s) {
>  return -1;
>  }
> @@ -683,7 +683,7 @@ static int net_socket_udp_init(NetClientState *peer,
>  }
>  qemu_socket_set_nonblock(fd);
>  
> -s = net_socket_fd_init(peer, model, name, fd, 0, NULL, errp);
> +s = net_socket_fd_init_dgram(peer, model, name, fd, 0, NULL, errp);
>  if (!s) {
>  return -1;
>  }
> -- 
> 2.39.2
> 


-- 
David Gibson 
Principal Software Engineer, Virtualization, Red Hat




Re: [RFC PATCH-for-8.0 2/3] hw/ppc/spapr: Replace tswap64(HPTE) by cpu_to_be64(HPTE)

2022-12-21 Thread David Gibson
On Wed, Dec 21, 2022 at 10:15:28PM +0000, Peter Maydell wrote:
> On Wed, 21 Dec 2022 at 16:03, Cédric Le Goater  wrote:
> >
> > On 12/21/22 13:33, Peter Maydell wrote:
> > > On Wed, 21 Dec 2022 at 01:35, David Gibson  
> > > wrote:
> > >> On Mon, Dec 19, 2022 at 10:39:40AM +0000, Peter Maydell wrote:
> > >>> OK. I still think we should consistently change all the places that are
> > >>> accessing this data structure, though, not just half of them.
> > >>
> > >> Yes, that makes sense.  Although what exactly constitutes "this data
> > >> structure" is a bit complex here.  If we mean just the spapr specific
> > >> "external HPT", then there are only a few more references to it.  If
> > >> we mean all instances of a powerpc hashed page table, then there are a
> > >> bunch more in the cpu target code.
> > >
> > > I had in mind "places where we write this specific array of bytes
> > > spapr->htab".

Seems a reasonable amount to tackle for now.

> > spapr_store_hpte() seems to be the most annoying part. It is used
> > by hcalls h_enter, h_remove, h_protect. Reworking the interface
> > to present pte0/pte1 as BE variables means reworking the whole
> > hw/ppc/spapr_softmmu.c file. That's feasible but not a small task
> > since the changes will root down in the target hash mmu code which
> > is shared by all platforms ... :/
> 
> Don't you just need to change spapr_store_hpte() to use stq_be_p()
> instead of stq_p() ?

I think Peter is right.  The values passed to the function are "host
endian" (really, they don't have an endianness since they'll be in
registers).

> > spapr_hpte_set_c() and spapr_hpte_set_r() are of a different kind.
> 
> That code seems to suggest we already implicitly assume that
> spapr->htab fields have a given endianness...

Yes, we absolutely do.  We rely on the HPTE always being big-endian.
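
To make the point concrete, here is a minimal standalone sketch of storing register values into a byte-array table with a fixed big-endian layout. The demo_* names are hypothetical stand-ins written for this illustration; they are not QEMU's stq_be_p()/ldq_be_p(), and the 16-byte entry size is simply two 64-bit doublewords as in the HPTE() macro quoted elsewhere in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a big-endian 64-bit store to memory. */
void demo_stq_be(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        /* Most significant byte goes to the lowest address. */
        p[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/* Hypothetical stand-in for a big-endian 64-bit load from memory. */
uint64_t demo_ldq_be(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/*
 * Store one two-doubleword entry at index i. pte0/pte1 are plain values
 * (as if taken from registers); only the memory image has a byte order.
 */
void demo_store_hpte(uint8_t *table, unsigned i, uint64_t pte0, uint64_t pte1)
{
    uint8_t *entry;

    assert(table != NULL);
    entry = table + (size_t)i * 16;
    demo_stq_be(entry, pte0);
    demo_stq_be(entry + 8, pte1);
}
```

Because the shifts operate on values rather than on the host's memory representation, the resulting bytes are the same on a little-endian and a big-endian host.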

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH-for-8.0 4/4] hw/ppc/spapr_ovec: Avoid target_ulong spapr_ovec_parse_vector()

2022-12-21 Thread David Gibson
On Wed, Dec 21, 2022 at 10:26:51AM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 12/21/22 06:46, Cédric Le Goater wrote:
> > On 12/16/22 17:47, Daniel Henrique Barboza wrote:
> > > 
> > > 
> > > On 12/13/22 09:35, Philippe Mathieu-Daudé wrote:
> > > > spapr_ovec.c is a device, but it uses target_ulong which is
> > > > target specific. The hwaddr type (declared in "exec/hwaddr.h")
> > > > better fits hardware addresses.
> > > 
> > > As said by Harsh, spapr_ovec is in fact a data structure that stores 
> > > platform
> > > options that are supported by the guest.
> > > 
> > > That doesn't mean that I oppose the change made here. Aside from 
> > > semantics - which
> > > I also don't have a strong opinion about it - I don't believe it matters 
> > > that
> > > much - spapr is 64 bit only, so hwaddr will always be == target_ulong.
> > > 
> > > Cedric/David/Greg, let me know if you have any restriction/thoughts about 
> > > this.
> > > I'm inclined to accept it as is.
> > 
> > Well, I am not sure.
> > 
> > The vector table variable is the result of a ppc64_phys_to_real() conversion
> > of the CAS hcall parameter, which is a target_ulong, but 
> > ppc64_phys_to_real()
> > returns a uint64_t.
> > 
> > The code is not consistent in other places:
> > 
> >    hw/ppc/spapr_tpm_proxy.c uses a uint64_t
> >    hw/ppc/spapr_hcall.c, a target_ulong
> >    hw/ppc/spapr_rtas.c, a hwaddr
> >    hw/ppc/spapr_drc.c, a hwaddr indirectly
> > 
> > Should we change ppc64_phys_to_real() to return an hwaddr (also) ?
> 
> It makes sense to me a function called ppc64_phys_to_real() returning
> a hwaddr type.

Yes, I also think that makes sense.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [RFC PATCH-for-8.0 2/3] hw/ppc/spapr: Replace tswap64(HPTE) by cpu_to_be64(HPTE)

2022-12-20 Thread David Gibson
On Mon, Dec 19, 2022 at 10:39:40AM +0000, Peter Maydell wrote:
> On Mon, 19 Dec 2022 at 06:35, David Gibson  
> wrote:
> >
> > On Fri, Dec 16, 2022 at 09:39:19PM +0000, Peter Maydell wrote:
> > > On Fri, 16 Dec 2022 at 19:11, Daniel Henrique Barboza
> > >  wrote:
> > > >
> > > >
> > > >
> > > > On 12/13/22 10:51, Peter Maydell wrote:
> > > > Yes, most if not all accesses are being handled as "target endian", even
> > > > though the target is always big endian.
> >
> > So "target is always big endian" is pretty misleading for POWER.  We
> > always define "TARGET_BIG_ENDIAN" in qemu, but for at least 10 years
> > the CPUs have been capable of running in either big endian or little
> > endian mode (selected at runtime).  Some variants can choose
> > endianness on a per-page basis.  Since the creation of the ISA it's
> > had "byte reversed" load and store instructions that let it use little
> > endian for specific memory accesses.
> 
> Yeah, this is like Arm (and for the purposes of this thread
> I meant essentially "TARGET_BIG_ENDIAN is always defined").

Ok.

> > Really the whole notion of an ISA having an "endianness" doesn't make
> > a lot of sense - it's an individual load or store to memory that has
> > an endianness which can depend on a bunch of factors.  When these
> > macros were created, an ISA nearly always used the same endianness,
> > but that's not really true any more - not just for POWER, but for a
> > bunch of targets.  So from that point of view, I think getting rid of
> > tswap() - particularly one that has compile time semantics, rather
> > than behaviour which can depend on cpu mode/state is a good idea.
> 
> I tend to think of the TARGET_BIG_ENDIAN/not setting as being
> something like "CPU bus endianness". At least for Arm, when you
> put the CPU into BE mode it pretty much means "the CPU byteswaps
> the data when it comes in/out", AIUI.

Hmm, I guess.  We're not really modelling down to the level of bus
byte lanes, though, so I'm not really convinced that's a meaningful
definition in the context of qemu.

> > I believe that even when running in little-endian mode, the hash page
> > tables are encoded in big-endian, so I think the proposed change makes
> > sense.
> 
> OK. I still think we should consistently change all the places that are
> accessing this data structure, though, not just half of them.

Yes, that makes sense.  Although what exactly constitutes "this data
structure" is a bit complex here.  If we mean just the spapr specific
"external HPT", then there are only a few more references to it.  If
we mean all instances of a powerpc hashed page table, then there are a
bunch more in the cpu target code.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [RFC PATCH-for-8.0 2/3] hw/ppc/spapr: Replace tswap64(HPTE) by cpu_to_be64(HPTE)

2022-12-18 Thread David Gibson
On Fri, Dec 16, 2022 at 09:39:19PM +0000, Peter Maydell wrote:
> On Fri, 16 Dec 2022 at 19:11, Daniel Henrique Barboza
>  wrote:
> >
> >
> >
> > On 12/13/22 10:51, Peter Maydell wrote:
> > > On Tue, 13 Dec 2022 at 12:52, Philippe Mathieu-Daudé  
> > > wrote:
> > >>
> > >> The tswap64() calls introduced in commit 4be21d561d ("pseries:
> > >> savevm support for pseries machine") are used to store the HTAB
> > >> in the migration stream (see savevm_htab_handlers) and are in
> > >> big-endian format.
> > >
> > > I think they're reading the run-time spapr->htab data structure
> > > (some of which is stuck onto the wire as a stream-of-bytes buffer
> > > and some of which is not). But either way, it's a target-endian
> > > data structure, because the code in hw/ppc/spapr_softmmu.c which
> > > reads and writes entries in it is using ldq_p() and stq_p(),
> > > and the current in-tree version of these macros is doing a
> > > "read host 64-bit and convert to/from target endianness with tswap64".
> > >
> > >>   #define HPTE(_table, _i)   (void *)(((uint64_t *)(_table)) + ((_i) * 
> > >> 2))
> > >> -#define HPTE_VALID(_hpte)  (tswap64(*((uint64_t *)(_hpte))) & 
> > >> HPTE64_V_VALID)
> > >> -#define HPTE_DIRTY(_hpte)  (tswap64(*((uint64_t *)(_hpte))) & 
> > >> HPTE64_V_HPTE_DIRTY)
> > >> -#define CLEAN_HPTE(_hpte)  ((*(uint64_t *)(_hpte)) &= 
> > >> tswap64(~HPTE64_V_HPTE_DIRTY))
> > >> -#define DIRTY_HPTE(_hpte)  ((*(uint64_t *)(_hpte)) |= 
> > >> tswap64(HPTE64_V_HPTE_DIRTY))
> > >> +#define HPTE_VALID(_hpte)  (be64_to_cpu(*((uint64_t *)(_hpte))) & 
> > >> HPTE64_V_VALID)
> > >> +#define HPTE_DIRTY(_hpte)  (be64_to_cpu(*((uint64_t *)(_hpte))) & 
> > >> HPTE64_V_HPTE_DIRTY)
> > >> +#define CLEAN_HPTE(_hpte)  ((*(uint64_t *)(_hpte)) &= 
> > >> cpu_to_be64(~HPTE64_V_HPTE_DIRTY))
> > >> +#define DIRTY_HPTE(_hpte)  ((*(uint64_t *)(_hpte)) |= 
> > >> cpu_to_be64(HPTE64_V_HPTE_DIRTY))
> > >
> > > This means we now have one file that's accessing this data structure
> > > as "this is target-endian", and one file that's accessing it as
> > > "this is big-endian". It happens that that ends up meaning the same
> > > thing because PPC is always TARGET_BIG_ENDIAN, but it seems a bit
> > > inconsistent.
> > >
> > > We should decide whether we're thinking of the data structure
> > > as target-endian or big-endian and change all the accessors
> > > appropriately (or none of them -- currently we're completely
> > > consistent about treating it as "target endian", I think).
> >
> > Yes, most if not all accesses are being handled as "target endian", even
> > though the target is always big endian.

So "target is always big endian" is pretty misleading for POWER.  We
always define "TARGET_BIG_ENDIAN" in qemu, but for at least 10 years
the CPUs have been capable of running in either big endian or little
endian mode (selected at runtime).  Some variants can choose
endianness on a per-page basis.  Since the creation of the ISA it's
had "byte reversed" load and store instructions that let it use little
endian for specific memory accesses.

Really the whole notion of an ISA having an "endianness" doesn't make
a lot of sense - it's an individual load or store to memory that has
an endianness which can depend on a bunch of factors.  When these
macros were created, an ISA nearly always used the same endianness,
but that's not really true any more - not just for POWER, but for a
bunch of targets.  So from that point of view, I think getting rid of
tswap() - particularly one that has compile time semantics, rather
than behaviour which can depend on cpu mode/state is a good idea.

I believe that even when running in little-endian mode, the hash page
tables are encoded in big-endian, so I think the proposed change makes
sense.
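
A small standalone sketch of the distinction, under stated assumptions: the byte order is modelled as a property of each access (a run-time parameter), while the page table accessor always asks for big-endian. All demo_* names and the cpu_le_mode flag are hypothetical and exist only for this illustration; none of this is QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Load 64 bits from memory with the byte order chosen per access. */
uint64_t demo_load64(const uint8_t *p, bool little_endian)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        /* Walk from the most significant byte to the least significant. */
        int idx = little_endian ? 7 - i : i;
        v = (v << 8) | p[idx];
    }
    return v;
}

/* Ordinary guest data access: follows the (hypothetical) current CPU mode. */
uint64_t demo_guest_load(const uint8_t *p, bool cpu_le_mode)
{
    return demo_load64(p, cpu_le_mode);
}

/* Hash page table access: big-endian regardless of the CPU mode. */
uint64_t demo_hpte_load(const uint8_t *p)
{
    return demo_load64(p, false);
}
```

A compile-time tswap() cannot express the first function, since the mode is only known at run time, and it obscures the second, where the layout is fixed and has nothing to do with "the target".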

> > IIUC the idea behind Phil's cleanups is exactly to replace uses of
> > "target-something" if the endianness of the host is irrelevant, which
> > is the case for ppc64. We would then change the semantics of the code
> > gradually to make it consistent again.
> 
> I would be happier if we just did all the functions that read and
> write this byte array at once -- there are not many of them.
> 
> thanks
> -- PMM
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH for-8.0] MAINTAINERS: downgrade PPC KVM/TCG CPUs and pSeries to 'Odd Fixes'

2022-11-17 Thread David Gibson
On Thu, Nov 17, 2022 at 06:06:33PM +0100, Greg Kurz wrote:
> On Thu, 17 Nov 2022 12:32:18 -0300
> Daniel Henrique Barboza  wrote:
> 
> > The maintainer is no longer being paid to maintain these components. All
> > maintainership work is being done in his personal time since the middle
> > of the 7.2 development cycle.
> > 
> 
> Great thanks Daniel for all your contributions over
> the years, and for being the one steering the vessel
> to the dry dock. This is it.

Seconded.

Reviewed-by: David Gibson 

> 
> > Change the status of PPC KVM CPUs, PPC TCG CPUs and the pSeries machine
> > to 'Odd Fixes', reflecting that the maintainer no longer has exclusive
> > time to dedicate to them. It'll also (hopefully) keep expectations under
> > check when/if these components are used in a customer product.
> > 
> > Cc: Cédric Le Goater 
> > Cc: David Gibson 
> > Cc: Greg Kurz 
> > Signed-off-by: Daniel Henrique Barboza 
> > ---
> 
> Reviewed-by: Greg Kurz 
> 
> >  MAINTAINERS | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index be151f0024..1d43153e5f 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -264,7 +264,7 @@ R: Cédric Le Goater 
> >  R: David Gibson 
> >  R: Greg Kurz 
> >  L: qemu-...@nongnu.org
> > -S: Maintained
> > +S: Odd Fixes
> >  F: target/ppc/
> >  F: hw/ppc/ppc.c
> >  F: hw/ppc/ppc_booke.c
> > @@ -389,7 +389,7 @@ M: Daniel Henrique Barboza 
> >  R: Cédric Le Goater 
> >  R: David Gibson 
> >  R: Greg Kurz 
> > -S: Maintained
> > +S: Odd Fixes
> >  F: target/ppc/kvm.c
> >  
> >  S390 KVM CPUs
> > @@ -1367,7 +1367,7 @@ R: Cédric Le Goater 
> >  R: David Gibson 
> >  R: Greg Kurz 
> >  L: qemu-...@nongnu.org
> > -S: Maintained
> > +S: Odd Fixes
> >  F: hw/*/spapr*
> >  F: include/hw/*/spapr*
> >  F: hw/*/xics*
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v10 05/17] net: introduce qemu_set_info_str() function

2022-10-06 Thread David Gibson
On Wed, Oct 05, 2022 at 06:20:39PM +0200, Laurent Vivier wrote:
> Embed the setting of info_str in a function.
> 
> Signed-off-by: Laurent Vivier 

Reviewed-by: David Gibson 
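
For anyone skimming the diff, the pattern is a thin variadic wrapper around vsnprintf() on a fixed-size buffer. A minimal standalone sketch follows; struct demo_client and demo_set_info_str() are hypothetical stand-ins, and the 32-byte buffer size is an arbitrary choice for the example, not the size QEMU uses.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for NetClientState; only the field used here. */
struct demo_client {
    char info_str[32];
};

/* Format into the client's info string, truncating if it does not fit. */
void demo_set_info_str(struct demo_client *c, const char *fmt, ...)
{
    va_list ap;

    assert(c != NULL && fmt != NULL);
    va_start(ap, fmt);
    /* vsnprintf() never writes past the buffer and NUL-terminates it. */
    vsnprintf(c->info_str, sizeof(c->info_str), fmt, ap);
    va_end(ap);
}
```

Note that formatting an empty string only writes the terminating NUL to the first byte, whereas the memset() it replaces in net/socket.c zeroed the whole buffer. For a buffer that is only ever read as a C string, the two are equivalent.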

> ---
>  include/net/net.h |  1 +
>  net/l2tpv3.c  |  3 +--
>  net/net.c | 17 -
>  net/slirp.c   |  5 ++---
>  net/socket.c  | 33 ++---
>  net/tap-win32.c   |  3 +--
>  net/tap.c | 13 +
>  net/vde.c |  3 +--
>  net/vhost-user.c  |  3 +--
>  net/vhost-vdpa.c  |  2 +-
>  10 files changed, 39 insertions(+), 44 deletions(-)
> 
> diff --git a/include/net/net.h b/include/net/net.h
> index 025dbf1e143b..3db75ff841ff 100644
> --- a/include/net/net.h
> +++ b/include/net/net.h
> @@ -177,6 +177,7 @@ ssize_t qemu_send_packet_async(NetClientState *nc, const 
> uint8_t *buf,
>  void qemu_purge_queued_packets(NetClientState *nc);
>  void qemu_flush_queued_packets(NetClientState *nc);
>  void qemu_flush_or_purge_queued_packets(NetClientState *nc, bool purge);
> +void qemu_set_info_str(NetClientState *nc, const char *fmt, ...);
>  void qemu_format_nic_info_str(NetClientState *nc, uint8_t macaddr[6]);
>  bool qemu_has_ufo(NetClientState *nc);
>  bool qemu_has_vnet_hdr(NetClientState *nc);
> diff --git a/net/l2tpv3.c b/net/l2tpv3.c
> index af373e5c300c..350041a0d6c0 100644
> --- a/net/l2tpv3.c
> +++ b/net/l2tpv3.c
> @@ -723,8 +723,7 @@ int net_init_l2tpv3(const Netdev *netdev,
>  
>  l2tpv3_read_poll(s, true);
>  
> -snprintf(s->nc.info_str, sizeof(s->nc.info_str),
> - "l2tpv3: connected");
> +qemu_set_info_str(&s->nc, "l2tpv3: connected");
>  return 0;
>  outerr:
>  qemu_del_net_client(nc);
> diff --git a/net/net.c b/net/net.c
> index ffe3e5a2cf1d..41e05137d431 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -141,13 +141,20 @@ char *qemu_mac_strdup_printf(const uint8_t *macaddr)
> macaddr[3], macaddr[4], macaddr[5]);
>  }
>  
> +void qemu_set_info_str(NetClientState *nc, const char *fmt, ...)
> +{
> +va_list ap;
> +
> +va_start(ap, fmt);
> +vsnprintf(nc->info_str, sizeof(nc->info_str), fmt, ap);
> +va_end(ap);
> +}
> +
>  void qemu_format_nic_info_str(NetClientState *nc, uint8_t macaddr[6])
>  {
> -snprintf(nc->info_str, sizeof(nc->info_str),
> - "model=%s,macaddr=%02x:%02x:%02x:%02x:%02x:%02x",
> - nc->model,
> - macaddr[0], macaddr[1], macaddr[2],
> - macaddr[3], macaddr[4], macaddr[5]);
> +qemu_set_info_str(nc, "model=%s,macaddr=%02x:%02x:%02x:%02x:%02x:%02x",
> +  nc->model, macaddr[0], macaddr[1], macaddr[2],
> +  macaddr[3], macaddr[4], macaddr[5]);
>  }
>  
>  static int mac_table[256] = {0};
> diff --git a/net/slirp.c b/net/slirp.c
> index 8679be644420..14a8d592774c 100644
> --- a/net/slirp.c
> +++ b/net/slirp.c
> @@ -611,9 +611,8 @@ static int net_slirp_init(NetClientState *peer, const 
> char *model,
>  
>  nc = qemu_new_net_client(&net_slirp_info, peer, model, name);
>  
> -snprintf(nc->info_str, sizeof(nc->info_str),
> - "net=%s,restrict=%s", inet_ntoa(net),
> - restricted ? "on" : "off");
> +qemu_set_info_str(nc, "net=%s,restrict=%s", inet_ntoa(net),
> +  restricted ? "on" : "off");
>  
>  s = DO_UPCAST(SlirpState, nc, nc);
>  
> diff --git a/net/socket.c b/net/socket.c
> index bfd8596250c4..ade1ecf38b87 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -179,7 +179,7 @@ static void net_socket_send(void *opaque)
>  s->fd = -1;
>  net_socket_rs_init(&s->rs, net_socket_rs_finalize, false);
>  s->nc.link_down = true;
> -memset(s->nc.info_str, 0, sizeof(s->nc.info_str));
> +qemu_set_info_str(&s->nc, "");
>  
>  return;
>  }
> @@ -387,16 +387,15 @@ static NetSocketState 
> *net_socket_fd_init_dgram(NetClientState *peer,
>  /* mcast: save bound address as dst */
>  if (is_connected && mcast != NULL) {
>  s->dgram_dst = saddr;
> -snprintf(nc->info_str, sizeof(nc->info_str),
> - "socket: fd=%d (cloned mcast=%s:%d)",
> - fd, inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
> +qemu_set_info_str(nc, "socket: fd=%d (cloned mcast=%s:%d)", fd,
> +  inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
>  } else {
>  if (sa_type == SOCKET_ADDRESS_TYPE_UNIX) {
>

Re: [PATCH v10 08/17] net: stream: Don't ignore EINVAL on netdev socket connection

2022-10-06 Thread David Gibson
On Wed, Oct 05, 2022 at 06:20:42PM +0200, Laurent Vivier wrote:
> From: Stefano Brivio 
> 
> Other errors are treated as failure by net_stream_client_init(),
> but if connect() returns EINVAL, we'll fail silently. Remove the
> related exception.
> 
> Signed-off-by: Stefano Brivio 
> [lvivier: applied to net/stream.c]
> Signed-off-by: Laurent Vivier 
> Reviewed-by: Daniel P. Berrangé 

Reviewed-by: David Gibson 
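
The resulting errno handling in the connect loop can be summarised with a small standalone sketch. The enum and function names are hypothetical and introduced only for this illustration; the grouping of errno values follows the hunk below.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcome codes for a failed connect() on a non-blocking fd. */
enum demo_connect_action {
    DEMO_RETRY,    /* call connect() again */
    DEMO_PENDING,  /* connection is being established, stop looping */
    DEMO_FAIL      /* report the error and close the socket */
};

enum demo_connect_action demo_classify_connect_errno(int err)
{
    assert(err > 0); /* only meaningful after connect() returned -1 */

    if (err == EINTR || err == EWOULDBLOCK) {
        return DEMO_RETRY;
    }
    if (err == EINPROGRESS || err == EALREADY) {
        return DEMO_PENDING;
    }
    /* Everything else, now including EINVAL, is a real failure. */
    return DEMO_FAIL;
}
```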

> ---
>  net/stream.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/net/stream.c b/net/stream.c
> index 37965eb74e1a..26e485438718 100644
> --- a/net/stream.c
> +++ b/net/stream.c
> @@ -360,8 +360,7 @@ static int net_stream_client_init(NetClientState *peer,
>  if (errno == EINTR || errno == EWOULDBLOCK) {
>  /* continue */
>  } else if (errno == EINPROGRESS ||
> -   errno == EALREADY ||
> -   errno == EINVAL) {
> +   errno == EALREADY) {
>  break;
>  } else {
>      error_setg_errno(errp, errno, "can't connect socket");

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v10 07/17] net: socket: Don't ignore EINVAL on netdev socket connection

2022-10-06 Thread David Gibson
On Wed, Oct 05, 2022 at 06:20:41PM +0200, Laurent Vivier wrote:
> From: Stefano Brivio 
> 
> Other errors are treated as failure by net_socket_connect_init(),
> but if connect() returns EINVAL, we'll fail silently. Remove the
> related exception.
> 
> Signed-off-by: Stefano Brivio 
> Signed-off-by: Laurent Vivier 

Reviewed-by: David Gibson 

> ---
>  net/socket.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/net/socket.c b/net/socket.c
> index ade1ecf38b87..4944bb70d580 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -577,8 +577,7 @@ static int net_socket_connect_init(NetClientState *peer,
>  if (errno == EINTR || errno == EWOULDBLOCK) {
>  /* continue */
>  } else if (errno == EINPROGRESS ||
> -   errno == EALREADY ||
> -   errno == EINVAL) {
> +   errno == EALREADY) {
>  break;
>  } else {
>      error_setg_errno(errp, errno, "can't connect socket");

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v9 05/16] qapi: net: add stream and dgram netdevs

2022-10-05 Thread David Gibson
On Wed, Oct 05, 2022 at 12:08:27PM +0200, Laurent Vivier wrote:
> On 9/28/22 07:55, David Gibson wrote:
> > > +static int net_stream_server_init(NetClientState *peer,
> > > +  const char *model,
> > > +  const char *name,
> > > +  SocketAddress *addr,
> > > +  Error **errp)
> > > +{
> > > +NetClientState *nc;
> > > +NetStreamState *s;
> > > +int fd, ret;
> > > +
> > > +switch (addr->type) {
> > > +case SOCKET_ADDRESS_TYPE_INET: {
> > > +struct sockaddr_in saddr_in;
> > > +
> > > > +if (convert_host_port(&saddr_in, addr->u.inet.host, 
> > > addr->u.inet.port,
> > > +  errp) < 0) {
> > > +return -1;
> > > +}
> > > +
> > > +fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
> > > +if (fd < 0) {
> > > +error_setg_errno(errp, errno, "can't create stream socket");
> > > +return -1;
> > > +}
> > > +qemu_socket_set_nonblock(fd);
> > > +
> > > +socket_set_fast_reuse(fd);
> > > +
> > > > +ret = bind(fd, (struct sockaddr *)&saddr_in, sizeof(saddr_in));
> > > +if (ret < 0) {
> > > +error_setg_errno(errp, errno, "can't bind ip=%s to socket",
> > > + inet_ntoa(saddr_in.sin_addr));
> > > +closesocket(fd);
> > > +return -1;
> > > +}
> > > +break;
> > > +}
> > > +case SOCKET_ADDRESS_TYPE_FD:
> > > +fd = monitor_fd_param(monitor_cur(), addr->u.fd.str, errp);
> > > +if (fd == -1) {
> > > +return -1;
> > > +}
> > > +ret = qemu_socket_try_set_nonblock(fd);
> > > +if (ret < 0) {
> > > +error_setg_errno(errp, -ret, "%s: Can't use file descriptor 
> > > %d",
> > > + name, fd);
> > > +return -1;
> > > +}
> > > +break;
> > > +default:
> > > +error_setg(errp, "only support inet or fd type");
> > > +return -1;
> > > +}
> > > +
> > > +ret = listen(fd, 0);
> > Does this make sense for a passed in fd?  If someone passes a "server"
> > fd, are they likely to be passing a socket on which bind() but not
> > listen() has been called?  Or one on which both bind() and listen()
> > have been called?
> > 
> 
> Original code in net/socket.c doesn't manage server case with fd.
> 
> So I have checked what is done for QIO (all this code is overwritten by
> patch introducing QIO anyway):
> 
> At the end of the series, we use qio_channel_socket_listen_async() in
> net_stream_server_init(), that in the end calls socket_listen().
> 
> With SOCKET_ADDRESS_TYPE_FD we do the listen() (without bind()) with the 
> following comment:
> 
> case SOCKET_ADDRESS_TYPE_FD:
> fd = socket_get_fd(addr->u.fd.str, errp);
> if (fd < 0) {
> return -1;
> }
> 
> /*
>  * If the socket is not yet in the listen state, then transition it to
>  * the listen state now.
>  *
>  * If it's already listening then this updates the backlog value as
>  * requested.
>  *
>  * If this socket cannot listen because it's already in another state
>  * (e.g. unbound or connected) then we'll catch the error here.
>  */
> if (listen(fd, num) != 0) {
> error_setg_errno(errp, errno, "Failed to listen on fd socket");
> closesocket(fd);
> return -1;
> }
> break;
> 
> So I think we should keep the listen() in our case too.

Ok, that makes sense to me.  Or at least, if it's not correct we
should fix it later for all the places at the same time in the qio
code.
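
A quick standalone sketch of the behaviour that comment describes, using plain POSIX calls rather than the QIO code. The demo_* helpers are hypothetical names for this example. It only covers two of the cases: a bound socket on which listen() is called twice, and a socket that is already connected.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: TCP socket bound to an ephemeral loopback port. */
int demo_bound_socket(void)
{
    struct sockaddr_in sa;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0) {
        return -1;
    }
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sa.sin_port = 0; /* let the kernel pick a port */
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Move fd to the listen state, or just update the backlog if it is
 * already listening. Returns 0 on success, -1 if fd cannot listen.
 */
int demo_ensure_listening(int fd, int backlog)
{
    assert(backlog >= 0);
    return listen(fd, backlog) == 0 ? 0 : -1;
}
```

Calling demo_ensure_listening() twice on the same bound socket is expected to succeed both times, while calling it on one end of a connected socketpair() is expected to fail, which is the error path the quoted comment relies on.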

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v9 08/16] net: stream: add unix socket

2022-10-05 Thread David Gibson
On Wed, Oct 05, 2022 at 03:38:09PM +0200, Laurent Vivier wrote:
> On 9/28/22 08:12, David Gibson wrote:
> > > @@ -253,9 +253,27 @@ static void net_stream_accept(void *opaque)
> > >   s->fd = fd;
> > >   s->nc.link_down = false;
> > >   net_stream_connect(s);
> > > -snprintf(s->nc.info_str, sizeof(s->nc.info_str),
> > > - "connection from %s:%d",
> > > - inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
> > > +switch (saddr.ss_family) {
> > > +case AF_INET: {
> > > > +struct sockaddr_in *saddr_in = (struct sockaddr_in *)&saddr;
> > > +
> > > +snprintf(s->nc.info_str, sizeof(s->nc.info_str),
> > > + "connection from %s:%d",
> > > + inet_ntoa(saddr_in->sin_addr), 
> > > ntohs(saddr_in->sin_port));
> > So, here you print the address from which the connection has come -
> > the remote address.
> > 
> > > +break;
> > > +}
> > > +case AF_UNIX: {
> > > +struct sockaddr_un saddr_un;
> > > +
> > > +len = sizeof(saddr_un);
> > > > +getsockname(s->listen_fd, (struct sockaddr *)&saddr_un, &len);
> > > +snprintf(s->nc.info_str, sizeof(s->nc.info_str),
> > > + "connect from %s", saddr_un.sun_path);
> > Here you print the bound address - the local address.  Does that make
> > sense?  I mean, in almost every occasion the remote Unix socket will
> > be anonymous, so it probably doesn't make sense to display that, but
> > is the bound address actually a useful substitute?
> > 
> > Maybe it should just be "connect from Unix socket".
> > 
> 
> I agree the needed information is "connected" and type "unix".
> 
> But I think more information we can put here can be useful for a debugging 
> purpose.

Fair enough.  I feel like "connect from" is still possibly
misleading.  Maybe "connect via"?  Or even "connection to Unix socket %s"?
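
Roughly what I have in mind, as a standalone sketch rather than a patch against the series: demo_format_info() is a hypothetical helper, with "peer" being the address returned by accept() and "local" the address the listening socket is bound to.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Hypothetical formatter: for inet, name the remote end; for Unix
 * sockets, where the remote end is usually anonymous, name the local
 * socket path and say "to" so the direction is not misread.
 */
void demo_format_info(char *buf, size_t len,
                      const struct sockaddr_storage *peer,
                      const struct sockaddr_storage *local)
{
    assert(buf != NULL && len > 0);

    switch (peer->ss_family) {
    case AF_INET: {
        struct sockaddr_in in;
        char ip[INET_ADDRSTRLEN] = "?";

        /* Copy out instead of casting, to sidestep aliasing questions. */
        memcpy(&in, peer, sizeof(in));
        inet_ntop(AF_INET, &in.sin_addr, ip, sizeof(ip));
        snprintf(buf, len, "connection from %s:%d", ip, ntohs(in.sin_port));
        break;
    }
    case AF_UNIX: {
        struct sockaddr_un un;

        memcpy(&un, local, sizeof(un));
        un.sun_path[sizeof(un.sun_path) - 1] = '\0';
        snprintf(buf, len, "connection to Unix socket %s", un.sun_path);
        break;
    }
    default:
        snprintf(buf, len, "connection from unknown address family");
        break;
    }
}
```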

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH] MAINTAINERS: step back from PPC

2022-09-30 Thread David Gibson
On Thu, Sep 29, 2022 at 08:13:40PM +0200, Greg Kurz wrote:
> On Thu, 29 Sep 2022 20:09:46 +0200
> Cédric Le Goater  wrote:
> 
> > I am not active anymore on the PPC maintainership, degrade my self as
> > standard Reviewer. Also degrade PowerNV and XIVE status since I am not
> > funded for this work.
> > 
> > Signed-off-by: Cédric Le Goater 
> > ---
> 
> End of an era. Thank you for all the dedication and accomplishments !

Seconded.

Reviewed-by: David Gibson 

> 
> Reviewed-by: Greg Kurz 
> 
> >  MAINTAINERS | 10 +-
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 1729c0901cea..40f4984b439b 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -267,8 +267,8 @@ F: hw/openrisc/
> >  F: tests/tcg/openrisc/
> >  
> >  PowerPC TCG CPUs
> > -M: Cédric Le Goater 
> >  M: Daniel Henrique Barboza 
> > +R: Cédric Le Goater 
> >  R: David Gibson 
> >  R: Greg Kurz 
> >  L: qemu-...@nongnu.org
> > @@ -392,8 +392,8 @@ F: target/mips/kvm*
> >  F: target/mips/sysemu/
> >  
> >  PPC KVM CPUs
> > -M: Cédric Le Goater 
> >  M: Daniel Henrique Barboza 
> > +R: Cédric Le Goater 
> >  R: David Gibson 
> >  R: Greg Kurz 
> >  S: Maintained
> > @@ -1365,8 +1365,8 @@ F: include/hw/rtc/m48t59.h
> >  F: tests/avocado/ppc_prep_40p.py
> >  
> >  sPAPR (pseries)
> > -M: Cédric Le Goater 
> >  M: Daniel Henrique Barboza 
> > +R: Cédric Le Goater 
> >  R: David Gibson 
> >  R: Greg Kurz 
> >  L: qemu-...@nongnu.org
> > @@ -1387,7 +1387,7 @@ F: tests/avocado/ppc_pseries.py
> >  PowerNV (Non-Virtualized)
> >  M: Cédric Le Goater 
> >  L: qemu-...@nongnu.org
> > -S: Maintained
> > +S: Odd Fixes
> >  F: docs/system/ppc/powernv.rst
> >  F: hw/ppc/pnv*
> >  F: hw/intc/pnv*
> > @@ -2321,7 +2321,7 @@ T: git https://github.com/philmd/qemu.git fw_cfg-next
> >  XIVE
> >  M: Cédric Le Goater 
> >  L: qemu-...@nongnu.org
> > -S: Supported
> > +S: Odd Fixes
> >  F: hw/*/*xive*
> >  F: include/hw/*/*xive*
> >  F: docs/*/*xive*
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v9 05/16] qapi: net: add stream and dgram netdevs

2022-09-28 Thread David Gibson
tate, nc, nc);
> +s->fd = -1;
> +s->listen_fd = fd;
> +s->nc.link_down = true;
> +net_socket_rs_init(&s->rs, net_stream_rs_finalize, false);
> +
> +qemu_set_fd_handler(s->listen_fd, net_stream_accept, NULL, s);
> +return 0;
> +}
> +
> +static int net_stream_client_init(NetClientState *peer,
> +  const char *model,
> +  const char *name,
> +  SocketAddress *addr,
> +  Error **errp)
> +{
> +NetStreamState *s;
> +int fd, connected, ret;
> +gchar *info_str;
> +
> +switch (addr->type) {
> +case SOCKET_ADDRESS_TYPE_INET: {
> +struct sockaddr_in saddr_in;
> +
> +if (convert_host_port(&saddr_in, addr->u.inet.host, 
> addr->u.inet.port,
> +  errp) < 0) {
> +return -1;
> +}
> +
> +fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
> +if (fd < 0) {
> +error_setg_errno(errp, errno, "can't create stream socket");
> +return -1;
> +}
> +qemu_socket_set_nonblock(fd);
> +
> +connected = 0;
> +for (;;) {
> +ret = connect(fd, (struct sockaddr *)&saddr_in, 
> sizeof(saddr_in));
> +if (ret < 0) {
> +if (errno == EINTR || errno == EWOULDBLOCK) {
> +/* continue */
> +} else if (errno == EINPROGRESS ||
> +   errno == EALREADY ||
> +   errno == EINVAL) {
> +break;
> +} else {
> +error_setg_errno(errp, errno, "can't connect socket");
> +closesocket(fd);
> +return -1;
> +}
> +} else {
> +connected = 1;
> +break;
> +}
> +}
> +info_str = g_strdup_printf("connect to %s:%d",
> +   inet_ntoa(saddr_in.sin_addr),
> +   ntohs(saddr_in.sin_port));
> +break;
> +}
> +case SOCKET_ADDRESS_TYPE_FD:
> +fd = monitor_fd_param(monitor_cur(), addr->u.fd.str, errp);
> +if (fd == -1) {
> +return -1;
> +}
> +ret = qemu_socket_try_set_nonblock(fd);
> +if (ret < 0) {
> +error_setg_errno(errp, -ret, "%s: Can't use file descriptor %d",
> + name, fd);
> +return -1;
> +}
> +connected = 1;
> +info_str = g_strdup_printf("connect to fd %d", fd);
> +break;
> +default:
> +error_setg(errp, "only support inet or fd type");
> +return -1;
> +}
> +
> +s = net_stream_fd_init_stream(peer, model, name, fd, connected);
> +
> +pstrcpy(s->nc.info_str, sizeof(s->nc.info_str), info_str);
> +g_free(info_str);
> +
> +return 0;
> +}
> +
> +int net_init_stream(const Netdev *netdev, const char *name,
> +NetClientState *peer, Error **errp)
> +{
> +const NetdevStreamOptions *sock;
> +
> +assert(netdev->type == NET_CLIENT_DRIVER_STREAM);
> +sock = &netdev->u.stream;
> +
> +if (!sock->has_server || sock->server) {
> +return net_stream_server_init(peer, "stream", name, sock->addr, 
> errp);
> +}
> +return net_stream_client_init(peer, "stream", name, sock->addr, errp);
> +}
> diff --git a/qapi/net.json b/qapi/net.json
> index dd088c09c509..e02e8001a000 100644
> --- a/qapi/net.json
> +++ b/qapi/net.json
> @@ -7,6 +7,7 @@
>  ##
>  
>  { 'include': 'common.json' }
> +{ 'include': 'sockets.json' }
>  
>  ##
>  # @set_link:
> @@ -573,6 +574,61 @@
>  '*isolated':  'bool' },
>'if': 'CONFIG_VMNET' }
>  
> +##
> +# @NetdevStreamOptions:
> +#
> +# Configuration info for stream socket netdev
> +#
> +# @addr: socket address to listen on (server=true)
> +#or connect to (server=false)
> +# @server: create server socket (default: true)
> +#
> +# Only SocketAddress types 'inet' and 'fd' are supported.
> +#
> +# Since: 7.1
> +##
> +{ 'struct': 'NetdevStreamOptions',
> +  'data': {
> +'addr':   'SocketAddress',
> +'*server': 'bool' } }
> +
> +##
> +# @NetdevDgramOptions:
> +#
> +# Configuration info for datagram socket netdev.
> +#
> +# @remote: remote address
> +# @local: local address
> +#
> +# Only SocketAddress types 'inet' and 'fd' are supported.
> +#
> +# The code checks there is at least one of these options and reports an error
> +# if not. If remote address is present and it's a multicast address, local
> +# address is optional. Otherwise local address is required and remote address
> +# is optional.
> +#
> +# .. table:: Valid parameters combination table
> +#    :widths: auto
> +#
> +#    =============  ========  =====
> +#    remote         local     okay?
> +#    =============  ========  =====
> +#    absent         absent    no
> +#    absent         not fd    no
> +#    absent         fd        yes
> +#    multicast      absent    yes
> +#    multicast      present   yes
> +#    not multicast  absent    no
> +#    not multicast  present   yes
> +#    =============  ========  =====
> +#
> +# Since: 7.1
> +##
> +{ 'struct': 'NetdevDgramOptions',
> +  'data': {
> +'*local':  'SocketAddress',
> +'*remote': 'SocketAddress' } }
> +
>  ##
>  # @NetClientDriver:
>  #
> @@ -586,8 +642,9 @@
>  #@vmnet-bridged since 7.1
>  ##
>  { 'enum': 'NetClientDriver',
> -  'data': [ 'none', 'nic', 'user', 'tap', 'l2tpv3', 'socket', 'vde',
> -'bridge', 'hubport', 'netmap', 'vhost-user', 'vhost-vdpa',
> +  'data': [ 'none', 'nic', 'user', 'tap', 'l2tpv3', 'socket', 'stream',
> +'dgram', 'vde', 'bridge', 'hubport', 'netmap', 'vhost-user',
> +'vhost-vdpa',
>  { 'name': 'vmnet-host', 'if': 'CONFIG_VMNET' },
>  { 'name': 'vmnet-shared', 'if': 'CONFIG_VMNET' },
>  { 'name': 'vmnet-bridged', 'if': 'CONFIG_VMNET' }] }
> @@ -617,6 +674,8 @@
>  'tap':  'NetdevTapOptions',
>  'l2tpv3':   'NetdevL2TPv3Options',
>  'socket':   'NetdevSocketOptions',
> +'stream':   'NetdevStreamOptions',
> +'dgram':'NetdevDgramOptions',
>  'vde':  'NetdevVdeOptions',
>  'bridge':   'NetdevBridgeOptions',
>  'hubport':  'NetdevHubPortOptions',
> diff --git a/qemu-options.hx b/qemu-options.hx
> index d8b5ce5b4354..8c765f345da8 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -2734,6 +2734,18 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
>  "-netdev socket,id=str[,fd=h][,udp=host:port][,localaddr=host:port]\n"
>  "configure a network backend to connect to another network\n"
>  "using an UDP tunnel\n"
> +"-netdev stream,id=str[,server=on|off],addr.type=inet,addr.host=host,addr.port=port\n"
> +"-netdev stream,id=str[,server=on|off],addr.type=fd,addr.str=h\n"
> +"configure a network backend to connect to another network\n"
> +"using a socket connection in stream mode.\n"
> +"-netdev dgram,id=str,remote.type=inet,remote.host=maddr,remote.port=port[,local.type=inet,local.host=addr]\n"
> +"-netdev dgram,id=str,remote.type=inet,remote.host=maddr,remote.port=port[,local.type=fd,local.str=h]\n"
> +"configure a network backend to connect to a multicast maddr and port\n"
> +"use ``local.host=addr`` to specify the host address to send packets from\n"
> +"-netdev dgram,id=str,local.type=inet,local.host=addr,local.port=port[,remote.type=inet,remote.host=addr,remote.port=port]\n"
> +"-netdev dgram,id=str,local.type=fd,local.str=h\n"
> +"configure a network backend to connect to another network\n"
> +"using an UDP tunnel\n"
>  #ifdef CONFIG_VDE
>  "-netdev vde,id=str[,sock=socketpath][,port=n][,group=groupname][,mode=octalmode]\n"
>  "configure a network backend to connect to port 'n' of a vde switch\n"
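
For illustration only, and not part of the quoted patch: the valid-combination table in the NetdevDgramOptions documentation above can be read as a small predicate. The enum and function names below are hypothetical, and the sketch assumes the caller has already classified the remote and local addresses; it is not how net/dgram.c is structured.

```c
#include <stdbool.h>

/* Hypothetical classification of the 'remote' option. */
enum remote_kind { REMOTE_ABSENT, REMOTE_MULTICAST, REMOTE_NOT_MULTICAST };

/* Hypothetical classification of the 'local' option. */
enum local_kind { LOCAL_ABSENT, LOCAL_FD, LOCAL_NOT_FD };

/*
 * Return true when the remote/local combination is marked "yes" in the
 * documentation table, false when it is marked "no".
 */
static bool dgram_combination_ok(enum remote_kind remote, enum local_kind local)
{
    switch (remote) {
    case REMOTE_ABSENT:
        /* Without a remote address, only a pre-opened fd is accepted. */
        return local == LOCAL_FD;
    case REMOTE_MULTICAST:
        /* For a multicast remote, the local address is optional. */
        return true;
    case REMOTE_NOT_MULTICAST:
        /* A non-multicast remote requires some local address. */
        return local != LOCAL_ABSENT;
    }
    return false;
}
```

The table's "present" rows cover both LOCAL_FD and LOCAL_NOT_FD in this sketch.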

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH v9 01/16] net: introduce convert_host_port()

2022-09-28 Thread David Gibson
On Mon, Sep 26, 2022 at 09:50:33PM +0200, Laurent Vivier wrote:
> Signed-off-by: Laurent Vivier 
> Reviewed-by: Stefano Brivio 

Reviewed-by: David Gibson 
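
As an aside, and not something the patch itself does: a minimal standalone sketch of strict port parsing with plain strtol(). parse_port() is a hypothetical name, and both the rejection of trailing characters and the 0..65535 range check are assumptions of this sketch, not behaviour taken from convert_host_port().

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a port number from 'str'.
 * Returns 0 and stores the value in *out on success, -1 on failure.
 * Base 0 is used so that decimal, octal and hex spellings are accepted.
 */
static int parse_port(const char *str, unsigned short *out)
{
    char *end;
    long val;

    errno = 0;
    val = strtol(str, &end, 0);

    /* No digits consumed, trailing junk, or overflow in strtol(). */
    if (end == str || *end != '\0' || errno == ERANGE) {
        return -1;
    }
    /* Assumption of this sketch: only 0..65535 is a valid port. */
    if (val < 0 || val > 65535) {
        return -1;
    }
    *out = (unsigned short)val;
    return 0;
}
```

A caller would then pass the result through htons() before storing it in sin_port.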

> ---
>  include/qemu/sockets.h |  2 ++
>  net/net.c  | 62 ++
>  2 files changed, 34 insertions(+), 30 deletions(-)
> 
> diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
> index 038faa157f59..47194b9732f8 100644
> --- a/include/qemu/sockets.h
> +++ b/include/qemu/sockets.h
> @@ -47,6 +47,8 @@ void socket_listen_cleanup(int fd, Error **errp);
>  int socket_dgram(SocketAddress *remote, SocketAddress *local, Error **errp);
>  
>  /* Old, ipv4 only bits.  Don't use for new code. */
> +int convert_host_port(struct sockaddr_in *saddr, const char *host,
> +  const char *port, Error **errp);
>  int parse_host_port(struct sockaddr_in *saddr, const char *str,
>  Error **errp);
>  int socket_init(void);
> diff --git a/net/net.c b/net/net.c
> index 2db160e0634d..d2288bd3a929 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -66,55 +66,57 @@ static QTAILQ_HEAD(, NetClientState) net_clients;
>  /***/
>  /* network device redirectors */
>  
> -int parse_host_port(struct sockaddr_in *saddr, const char *str,
> -Error **errp)
> +int convert_host_port(struct sockaddr_in *saddr, const char *host,
> +  const char *port, Error **errp)
>  {
> -gchar **substrings;
>  struct hostent *he;
> -const char *addr, *p, *r;
> -int port, ret = 0;
> +const char *r;
> +long p;
>  
>  memset(saddr, 0, sizeof(*saddr));
>  
> -substrings = g_strsplit(str, ":", 2);
> -if (!substrings || !substrings[0] || !substrings[1]) {
> -error_setg(errp, "host address '%s' doesn't contain ':' "
> -   "separating host from port", str);
> -ret = -1;
> -goto out;
> -}
> -
> -addr = substrings[0];
> -p = substrings[1];
> -
>  saddr->sin_family = AF_INET;
> -if (addr[0] == '\0') {
> +if (host[0] == '\0') {
>  saddr->sin_addr.s_addr = 0;
>  } else {
> -if (qemu_isdigit(addr[0])) {
> -if (!inet_aton(addr, &saddr->sin_addr)) {
> +if (qemu_isdigit(host[0])) {
> +if (!inet_aton(host, &saddr->sin_addr)) {
>  error_setg(errp, "host address '%s' is not a valid "
> -   "IPv4 address", addr);
> -ret = -1;
> -goto out;
> +   "IPv4 address", host);
> +return -1;
>  }
>  } else {
> -he = gethostbyname(addr);
> +he = gethostbyname(host);
>  if (he == NULL) {
> -error_setg(errp, "can't resolve host address '%s'", addr);
> -ret = -1;
> -goto out;
> +error_setg(errp, "can't resolve host address '%s'", host);
> +return -1;
>  }
>  saddr->sin_addr = *(struct in_addr *)he->h_addr;
>  }
>  }
> -port = strtol(p, (char **)&r, 0);
> -if (r == p) {
> -error_setg(errp, "port number '%s' is invalid", p);
> +if (qemu_strtol(port, &r, 0, &p) != 0) {
> +error_setg(errp, "port number '%s' is invalid", port);
> +return -1;
> +}
> +saddr->sin_port = htons(p);
> +return 0;
> +}
> +
> +int parse_host_port(struct sockaddr_in *saddr, const char *str,
> +Error **errp)
> +{
> +gchar **substrings;
> +int ret;
> +
> +substrings = g_strsplit(str, ":", 2);
> +if (!substrings || !substrings[0] || !substrings[1]) {
> +error_setg(errp, "host address '%s' doesn't contain ':' "
> +   "separating host from port", str);
>  ret = -1;
>  goto out;
>  }
> -saddr->sin_port = htons(port);
> +
> +ret = convert_host_port(saddr, substrings[0], substrings[1], errp);
>  
>  out:
>  g_strfreev(substrings);

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v9 03/16] net: simplify net_client_parse() error management

2022-09-28 Thread David Gibson
On Mon, Sep 26, 2022 at 09:50:35PM +0200, Laurent Vivier wrote:
> All net_client_parse() callers exit in case of error.
> 
> Move exit(1) to net_client_parse() and remove error checking from
> the callers.
> 
> Suggested-by: Markus Armbruster 
> Signed-off-by: Laurent Vivier 
> Reviewed-by: Markus Armbruster 

Reviewed-by: David Gibson 

> ---
>  include/net/net.h |  2 +-
>  net/net.c |  6 ++
>  softmmu/vl.c  | 12 +++-
>  3 files changed, 6 insertions(+), 14 deletions(-)
> 
> diff --git a/include/net/net.h b/include/net/net.h
> index c1c34a58f849..55023e7e9fa9 100644
> --- a/include/net/net.h
> +++ b/include/net/net.h
> @@ -220,7 +220,7 @@ extern NICInfo nd_table[MAX_NICS];
>  extern const char *host_net_devices[];
>  
>  /* from net.c */
> -int net_client_parse(QemuOptsList *opts_list, const char *str);
> +void net_client_parse(QemuOptsList *opts_list, const char *str);
>  void show_netdevs(void);
>  void net_init_clients(void);
>  void net_check_clients(void);
> diff --git a/net/net.c b/net/net.c
> index 15958f881776..f056e8aebfb2 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -1579,13 +1579,11 @@ void net_init_clients(void)
>  &error_fatal);
>  }
>  
> -int net_client_parse(QemuOptsList *opts_list, const char *optarg)
> +void net_client_parse(QemuOptsList *opts_list, const char *optarg)
>  {
>  if (!qemu_opts_parse_noisily(opts_list, optarg, true)) {
> -return -1;
> +exit(1);
>  }
> -
> -return 0;
>  }
>  
>  /* From FreeBSD */
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index b172134a62cb..f71fca2a9f73 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -2809,21 +2809,15 @@ void qemu_init(int argc, char **argv, char **envp)
>  break;
>  case QEMU_OPTION_netdev:
>  default_net = 0;
> -if (net_client_parse(qemu_find_opts("netdev"), optarg) == -1) {
> -exit(1);
> -}
> +net_client_parse(qemu_find_opts("netdev"), optarg);
>  break;
>  case QEMU_OPTION_nic:
>  default_net = 0;
> -if (net_client_parse(qemu_find_opts("nic"), optarg) == -1) {
> -exit(1);
> -}
> +net_client_parse(qemu_find_opts("nic"), optarg);
>  break;
>  case QEMU_OPTION_net:
>  default_net = 0;
> -if (net_client_parse(qemu_find_opts("net"), optarg) == -1) {
> -exit(1);
> -}
> +net_client_parse(qemu_find_opts("net"), optarg);
>  break;
>  #ifdef CONFIG_LIBISCSI
>  case QEMU_OPTION_iscsi:

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v9 13/16] qemu-sockets: update socket_uri() and socket_parse() to be consistent

2022-09-28 Thread David Gibson
On Mon, Sep 26, 2022 at 09:50:45PM +0200, Laurent Vivier wrote:
> To be consistent with socket_uri(), add 'tcp:' prefix for inet type in
> socket_parse(), by default socket_parse() use tcp when no prefix is
> provided (format is host:port).
> 
> In socket_uri(), use 'vsock:' prefix for vsock type rather than 'tcp:'
> because it makes a vsock address look like an inet address with CID
> misinterpreted as host.
> Goes back to commit 9aca82ba31 "migration: Create socket-address parameter"
> 
> Signed-off-by: Laurent Vivier 
> Reviewed-by: Dr. David Alan Gilbert 
> Reviewed-by: Markus Armbruster 

Reviewed-by: David Gibson 
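
For illustration only: a minimal sketch of the prefix handling the commit message describes, where a 'tcp:' prefix and a bare host:port both map to the inet type, and 'vsock:' is kept distinct. The names addr_kind, skip_prefix() and classify_addr() are hypothetical; skip_prefix() stands in for QEMU's strstart(), and the real socket_parse() goes on to fill in a SocketAddress rather than returning an enum.

```c
#include <string.h>

/* Hypothetical stand-in for the SocketAddress type tag. */
enum addr_kind { ADDR_INET, ADDR_UNIX, ADDR_FD, ADDR_VSOCK };

/* Return the text after 'prefix' if 'str' starts with it, else NULL. */
static const char *skip_prefix(const char *str, const char *prefix)
{
    size_t n = strlen(prefix);

    return strncmp(str, prefix, n) == 0 ? str + n : NULL;
}

/*
 * Classify an address string by its prefix and point *rest at the part
 * after the prefix.  Strings without a known prefix default to inet,
 * with *rest pointing at the whole string.
 */
static enum addr_kind classify_addr(const char *str, const char **rest)
{
    static const struct {
        const char *prefix;
        enum addr_kind kind;
    } table[] = {
        { "unix:",  ADDR_UNIX },
        { "fd:",    ADDR_FD },
        { "vsock:", ADDR_VSOCK },
        { "tcp:",   ADDR_INET },
    };
    size_t i;

    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        const char *p = skip_prefix(str, table[i].prefix);

        if (p) {
            *rest = p;
            return table[i].kind;
        }
    }
    *rest = str;
    return ADDR_INET;
}
```

With this shape, a string produced with a 'tcp:' or 'vsock:' prefix classifies back to the type that produced it.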

> ---
>  util/qemu-sockets.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> index 9f6f655fd526..a9926af714c4 100644
> --- a/util/qemu-sockets.c
> +++ b/util/qemu-sockets.c
> @@ -1090,7 +1090,7 @@ char *socket_uri(SocketAddress *addr)
>  case SOCKET_ADDRESS_TYPE_FD:
>  return g_strdup_printf("fd:%s", addr->u.fd.str);
>  case SOCKET_ADDRESS_TYPE_VSOCK:
> -return g_strdup_printf("tcp:%s:%s",
> +return g_strdup_printf("vsock:%s:%s",
> addr->u.vsock.cid,
> addr->u.vsock.port);
>  default:
> @@ -1124,6 +1124,11 @@ SocketAddress *socket_parse(const char *str, Error **errp)
>  if (vsock_parse(&addr->u.vsock, str + strlen("vsock:"), errp)) {
>  goto fail;
>  }
> +} else if (strstart(str, "tcp:", NULL)) {
> +addr->type = SOCKET_ADDRESS_TYPE_INET;
> +if (inet_parse(&addr->u.inet, str + strlen("tcp:"), errp)) {
> +goto fail;
> +}
>  } else {
>  addr->type = SOCKET_ADDRESS_TYPE_INET;
>  if (inet_parse(&addr->u.inet, str, errp)) {

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v9 11/16] net: dgram: add unix socket

2022-09-28 Thread David Gibson
On Mon, Sep 26, 2022 at 09:50:43PM +0200, Laurent Vivier wrote:
> Signed-off-by: Laurent Vivier 
> Reviewed-by: Stefano Brivio 

Reviewed-by: David Gibson 

Although one note below.

> ---
>  net/dgram.c | 65 ++---
>  qapi/net.json   |  2 +-
>  qemu-options.hx |  1 +
>  3 files changed, 64 insertions(+), 4 deletions(-)
> 
> diff --git a/net/dgram.c b/net/dgram.c
> index 9fb01410304e..db631f6e2270 100644
> --- a/net/dgram.c
> +++ b/net/dgram.c
> @@ -84,8 +84,15 @@ static ssize_t net_dgram_receive_dgram(NetClientState *nc,
>  
>  do {
>  if (s->dgram_dst) {
> -ret = sendto(s->fd, buf, size, 0, s->dgram_dst,
> - sizeof(struct sockaddr_in));
> +socklen_t len;
> +
> +if (s->dgram_dst->sa_family == AF_INET) {
> +len = sizeof(struct sockaddr_in);
> +} else {
> +len = sizeof(struct sockaddr_un);
> +}

It really seems like you're going to want a common helper for getting
the socklen, if not now, then pretty soon.
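
One possible shape for such a helper, as a rough sketch only. The name sockaddr_len() is hypothetical, and returning 0 for unknown families is an assumption of this sketch, not something taken from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: return the length to pass to sendto(), bind() or
 * connect() for a socket address, chosen by its address family.
 * Returns 0 for families this sketch does not handle.
 */
static socklen_t sockaddr_len(const struct sockaddr *sa)
{
    assert(sa != NULL);

    switch (sa->sa_family) {
    case AF_INET:
        return sizeof(struct sockaddr_in);
    case AF_UNIX:
        return sizeof(struct sockaddr_un);
    default:
        return 0;
    }
}
```

The sendto() call in the hunk above could then take sockaddr_len(s->dgram_dst) as its last argument instead of open-coding the family check.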

> +ret = sendto(s->fd, buf, size, 0, s->dgram_dst, len);
>  } else {
>  ret = send(s->fd, buf, size, 0);
>  }
> @@ -450,7 +457,7 @@ static int net_dgram_init(NetClientState *peer,
>  }
>  } else {
>  if (local->type != SOCKET_ADDRESS_TYPE_FD) {
> -error_setg(errp, "type=inet requires remote parameter");
> +error_setg(errp, "type=inet or unix require remote parameter");
>  return -1;
>  }
>  }
> @@ -500,6 +507,58 @@ static int net_dgram_init(NetClientState *peer,
>  
>  break;
>  }
> +case SOCKET_ADDRESS_TYPE_UNIX: {
> +struct sockaddr_un laddr_un, raddr_un;
> +
> +ret = unlink(local->u.q_unix.path);
> +if (ret < 0 && errno != ENOENT) {
> +error_setg_errno(errp, errno, "failed to unlink socket %s",
> + local->u.q_unix.path);
> +return -1;
> +}
> +
> +laddr_un.sun_family = PF_UNIX;
> +ret = snprintf(laddr_un.sun_path, sizeof(laddr_un.sun_path), "%s",
> +   local->u.q_unix.path);
> +if (ret < 0 || ret >= sizeof(laddr_un.sun_path)) {
> +error_setg(errp, "UNIX socket path '%s' is too long",
> +   local->u.q_unix.path);
> +error_append_hint(errp, "Path must be less than %zu bytes\n",
> +  sizeof(laddr_un.sun_path));
> +}
> +
> +raddr_un.sun_family = PF_UNIX;
> +ret = snprintf(raddr_un.sun_path, sizeof(raddr_un.sun_path), "%s",
> +   remote->u.q_unix.path);
> +if (ret < 0 || ret >= sizeof(raddr_un.sun_path)) {
> +error_setg(errp, "UNIX socket path '%s' is too long",
> +   remote->u.q_unix.path);
> +error_append_hint(errp, "Path must be less than %zu bytes\n",
> +  sizeof(raddr_un.sun_path));
> +}
> +
> +fd = qemu_socket(PF_UNIX, SOCK_DGRAM, 0);
> +if (fd < 0) {
> +error_setg_errno(errp, errno, "can't create datagram socket");
> +return -1;
> +}
> +
> +ret = bind(fd, (struct sockaddr *)&laddr_un, sizeof(laddr_un));
> +if (ret < 0) {
> +error_setg_errno(errp, errno, "can't bind unix=%s to socket",
> + laddr_un.sun_path);
> +closesocket(fd);
> +return -1;
> +}
> +qemu_socket_set_nonblock(fd);
> +
> +dgram_dst = g_malloc(sizeof(raddr_un));
> +memcpy(dgram_dst, &raddr_un, sizeof(raddr_un));
> +
> +info_str = g_strdup_printf("udp=%s:%s",
> +   laddr_un.sun_path, raddr_un.sun_path);
> +break;
> +}
>  case SOCKET_ADDRESS_TYPE_FD: {
>  SocketAddress *sa;
>  SocketAddressType sa_type;
> diff --git a/qapi/net.json b/qapi/net.json
> index bb96701a49a7..9cc4be7535bb 100644
> --- a/qapi/net.json
> +++ b/qapi/net.json
> @@ -600,7 +600,7 @@
>  # @remote: remote address
>  # @local: local address
>  #
> -# Only SocketAddress types 'inet' and 'fd' are supported.
> +# Only SocketAddress types 'unix', 'inet' and 'fd' are supported.
>  #
>  # The code checks there is at least one of these options and reports an error
>  # if not. If remote address is present and it's a multic

Re: [PATCH v9 08/16] net: stream: add unix socket

2022-09-28 Thread David Gibson
dr_un.sun_path, sizeof(saddr_un.sun_path), "%s",
> +   addr->u.q_unix.path);
> +if (ret < 0 || ret >= sizeof(saddr_un.sun_path)) {
> +error_setg(errp, "UNIX socket path '%s' is too long",
> +   addr->u.q_unix.path);
> +error_append_hint(errp, "Path must be less than %zu bytes\n",
> +  sizeof(saddr_un.sun_path));
> +return -1;
> +}
> +
> +fd = qemu_socket(PF_UNIX, SOCK_STREAM, 0);
> +if (fd < 0) {
> +error_setg_errno(errp, errno, "can't create stream socket");
> +return -1;
> +}
> +qemu_socket_set_nonblock(fd);
> +
> +connected = 0;
> +for (;;) {
> +ret = connect(fd, (struct sockaddr *)&saddr_un, sizeof(saddr_un));
> +if (ret < 0) {
> +if (errno == EINTR || errno == EWOULDBLOCK) {
> +/* continue */
> +} else if (errno == EAGAIN ||
> +   errno == EALREADY) {
> +break;
> +} else {
> +error_setg_errno(errp, errno, "can't connect socket");
> +closesocket(fd);
> +return -1;
> +}
> +} else {
> +connected = 1;
> +break;
> +}
> +}
> +info_str = g_strdup_printf(" connect to %s", saddr_un.sun_path);
> +break;
> +}
>  case SOCKET_ADDRESS_TYPE_FD:
>  fd = monitor_fd_param(monitor_cur(), addr->u.fd.str, errp);
>  if (fd == -1) {
> @@ -395,7 +493,7 @@ static int net_stream_client_init(NetClientState *peer,
>  info_str = g_strdup_printf("connect to fd %d", fd);
>  break;
>  default:
> -error_setg(errp, "only support inet or fd type");
> +error_setg(errp, "only support inet, unix or fd type");
>  return -1;
>  }
>  
> diff --git a/qapi/net.json b/qapi/net.json
> index e02e8001a000..bb96701a49a7 100644
> --- a/qapi/net.json
> +++ b/qapi/net.json
> @@ -583,7 +583,7 @@
>  #or connect to (server=false)
>  # @server: create server socket (default: true)
>  #
> -# Only SocketAddress types 'inet' and 'fd' are supported.
> +# Only SocketAddress types 'unix', 'inet' and 'fd' are supported.
>  #
>  # Since: 7.1
>  ##
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 8c765f345da8..7a34022ac651 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -2735,6 +2735,7 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
>  "configure a network backend to connect to another network\n"
>  "using an UDP tunnel\n"
>  "-netdev stream,id=str[,server=on|off],addr.type=inet,addr.host=host,addr.port=port\n"
> +"-netdev stream,id=str[,server=on|off],addr.type=unix,addr.path=path\n"
>  "-netdev stream,id=str[,server=on|off],addr.type=fd,addr.str=h\n"
>  "configure a network backend to connect to another network\n"
>  "using a socket connection in stream mode.\n"

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v9 09/16] net: dgram: make dgram_dst generic

2022-09-28 Thread David Gibson
urn s;
>  
>  err:
> +g_free(saddr);
>  closesocket(fd);
>  return NULL;
>  }
> @@ -340,21 +340,24 @@ static int net_dgram_mcast_init(NetClientState *peer,
>  {
>  NetDgramState *s;
>  int fd, ret;
> -struct sockaddr_in saddr;
> +struct sockaddr_in *saddr;
>  
>  if (remote->type != SOCKET_ADDRESS_TYPE_INET) {
>  error_setg(errp, "multicast only support inet type");
>  return -1;
>  }
>  
> -if (convert_host_port(&saddr, remote->u.inet.host, remote->u.inet.port,
> +saddr = g_new(struct sockaddr_in, 1);
> +if (convert_host_port(saddr, remote->u.inet.host, remote->u.inet.port,
>errp) < 0) {
> +g_free(saddr);
>  return -1;
>  }
>  
>  if (!local) {
> -fd = net_dgram_mcast_create(&saddr, NULL, errp);
> +fd = net_dgram_mcast_create(saddr, NULL, errp);
>  if (fd < 0) {
> +g_free(saddr);
>  return -1;
>  }
>  } else {
> @@ -363,13 +366,15 @@ static int net_dgram_mcast_init(NetClientState *peer,
>  struct in_addr localaddr;
>  
>  if (inet_aton(local->u.inet.host, &localaddr) == 0) {
> +g_free(saddr);
>  error_setg(errp, "localaddr '%s' is not a valid IPv4 address",
> local->u.inet.host);
>  return -1;
>  }
>  
> -fd = net_dgram_mcast_create(&saddr, &localaddr, errp);
> +fd = net_dgram_mcast_create(saddr, , errp);
>  if (fd < 0) {
> +g_free(saddr);
>  return -1;
>  }
>  break;
> @@ -377,16 +382,19 @@ static int net_dgram_mcast_init(NetClientState *peer,
>  case SOCKET_ADDRESS_TYPE_FD:
>  fd = monitor_fd_param(monitor_cur(), local->u.fd.str, errp);
>  if (fd == -1) {
> +g_free(saddr);
>  return -1;
>  }
>  ret = qemu_socket_try_set_nonblock(fd);
>  if (ret < 0) {
> +g_free(saddr);
>  error_setg_errno(errp, -ret, "%s: Can't use file descriptor %d",
>   name, fd);
>  return -1;
>  }
>  break;
>  default:
> +g_free(saddr);
>  error_setg(errp, "only support inet or fd type for local");
>  return -1;
>  }
> @@ -396,13 +404,16 @@ static int net_dgram_mcast_init(NetClientState *peer,
>   local->type == SOCKET_ADDRESS_TYPE_FD,
>   remote, errp);
>  if (!s) {
> +g_free(saddr);
>  return -1;
>  }
>  
> -s->dgram_dst = saddr;
> +g_assert(s->dgram_dst == NULL);
> +s->dgram_dst = (struct sockaddr *)saddr;
>  
>  snprintf(s->nc.info_str, sizeof(s->nc.info_str), "mcast=%s:%d",
> - inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
> + inet_ntoa(saddr->sin_addr), ntohs(saddr->sin_port));
> +
>  return 0;
>  
>  }
> @@ -416,8 +427,8 @@ static int net_dgram_init(NetClientState *peer,
>  {
>  NetDgramState *s;
>  int fd, ret;
> -struct sockaddr_in raddr_in;
>  gchar *info_str;
> +struct sockaddr *dgram_dst;
>  
>  /* detect multicast address */
>  if (remote && remote->type == SOCKET_ADDRESS_TYPE_INET) {
> @@ -458,7 +469,7 @@ static int net_dgram_init(NetClientState *peer,
>  
>  switch (local->type) {
>  case SOCKET_ADDRESS_TYPE_INET: {
> -struct sockaddr_in laddr_in;
> +struct sockaddr_in laddr_in, raddr_in;
>  
>  if (convert_host_port(&laddr_in, local->u.inet.host, local->u.inet.port,
>errp) < 0) {
> @@ -492,9 +503,12 @@ static int net_dgram_init(NetClientState *peer,
>  }
>  qemu_socket_set_nonblock(fd);
>  
> +dgram_dst = g_malloc(sizeof(raddr_in));
> +memcpy(dgram_dst, &raddr_in, sizeof(raddr_in));
> +
>  info_str = g_strdup_printf("udp=%s:%d/%s:%d",
> - inet_ntoa(laddr_in.sin_addr), ntohs(laddr_in.sin_port),
> - inet_ntoa(raddr_in.sin_addr), ntohs(raddr_in.sin_port));
> +inet_ntoa(laddr_in.sin_addr), ntohs(laddr_in.sin_port),
> +inet_ntoa(raddr_in.sin_addr), ntohs(raddr_in.sin_port));
>  
>  break;
>  }
> @@ -521,7 +535,8 @@ static int net_dgram_init(NetClientState *peer,
>  }
>  
>  if (remote) {
> -s->dgram_dst = raddr_in;
> +g_assert(s->dgram_dst == NULL);
> +s->dgram_dst = dgram_dst;
>  
>  pstrcpy(s->nc.info_str, sizeof(s->nc.info_str), info_str);
>  g_free(info_str);

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v8 06/14] net: stream: Don't ignore EINVAL on netdev socket connection

2022-09-13 Thread David Gibson
On Tue, Sep 13, 2022 at 08:39:52AM +0200, Laurent Vivier wrote:
> From: Stefano Brivio 
> 
> Other errors are treated as failure by net_stream_client_init(),
> but if connect() returns EINVAL, we'll fail silently. Remove the
> related exception.

Is this also a bug in net_socket_connect_init()?  Is there an
equivalent bug in dgram.c?
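
To make the comparison with the other call sites concrete, here is a rough sketch of the errno classification the patched loop ends up with, pulled out into a function. classify_connect_errno() and the enum are hypothetical names; the grouping of error codes mirrors the hunk below, with EINVAL now landing in the error case.

```c
#include <errno.h>

/* Hypothetical outcome of a failed non-blocking connect() attempt. */
enum connect_status {
    CONNECT_RETRY,        /* call connect() again */
    CONNECT_IN_PROGRESS,  /* leave the loop, connection completes later */
    CONNECT_ERROR         /* report the failure to the caller */
};

/*
 * Classify the errno value left by a failed connect(), following the
 * grouping in the patched net_stream_client_init() loop.
 */
static enum connect_status classify_connect_errno(int err)
{
    if (err == EINTR || err == EWOULDBLOCK) {
        return CONNECT_RETRY;
    }
    if (err == EINPROGRESS || err == EALREADY) {
        return CONNECT_IN_PROGRESS;
    }
    /* Everything else, including EINVAL, is a hard failure. */
    return CONNECT_ERROR;
}
```

If the other connect loops shared a helper along these lines, the EINVAL question would only need answering in one place.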

> Signed-off-by: Stefano Brivio 
> [lvivier: applied to net/stream.c]
> Signed-off-by: Laurent Vivier 
> Reviewed-by: Daniel P. Berrangé 
> ---
>  net/stream.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/net/stream.c b/net/stream.c
> index 0851e90becca..e8afbaca50b6 100644
> --- a/net/stream.c
> +++ b/net/stream.c
> @@ -363,8 +363,7 @@ static int net_stream_client_init(NetClientState *peer,
>  if (errno == EINTR || errno == EWOULDBLOCK) {
>  /* continue */
>  } else if (errno == EINPROGRESS ||
> -   errno == EALREADY ||
> -   errno == EINVAL) {
> +   errno == EALREADY) {
>  break;
>  } else {
>      error_setg_errno(errp, errno, "can't connect socket");

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v8 11/14] qemu-sockets: move and rename SocketAddress_to_str()

2022-09-13 Thread David Gibson
On Tue, Sep 13, 2022 at 08:39:57AM +0200, Laurent Vivier wrote:
> Rename SocketAddress_to_str() to socket_uri() and move it to
> util/qemu-sockets.c close to socket_parse().
> 
> socket_uri() generates a string from a SocketAddress while
> socket_parse() generates a SocketAddress from a string.
> 
> Signed-off-by: Laurent Vivier 

Reviewed-by: David Gibson 

> ---
>  include/qemu/sockets.h |  2 +-
>  monitor/hmp-cmds.c | 23 +--
>  util/qemu-sockets.c| 20 
>  3 files changed, 22 insertions(+), 23 deletions(-)
> 
> diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
> index 47194b9732f8..e5a06d2e3729 100644
> --- a/include/qemu/sockets.h
> +++ b/include/qemu/sockets.h
> @@ -40,6 +40,7 @@ NetworkAddressFamily inet_netfamily(int family);
>  int unix_listen(const char *path, Error **errp);
>  int unix_connect(const char *path, Error **errp);
>  
> +char *socket_uri(SocketAddress *addr);
>  SocketAddress *socket_parse(const char *str, Error **errp);
>  int socket_connect(SocketAddress *addr, Error **errp);
>  int socket_listen(SocketAddress *addr, int num, Error **errp);
> @@ -123,5 +124,4 @@ SocketAddress *socket_address_flatten(SocketAddressLegacy *addr);
>   * Return 0 on success.
>   */
>  int socket_address_parse_named_fd(SocketAddress *addr, Error **errp);
> -
>  #endif /* QEMU_SOCKETS_H */
> diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
> index c6cd6f91dde6..cb35059c2d45 100644
> --- a/monitor/hmp-cmds.c
> +++ b/monitor/hmp-cmds.c
> @@ -197,27 +197,6 @@ void hmp_info_mice(Monitor *mon, const QDict *qdict)
>  qapi_free_MouseInfoList(mice_list);
>  }
>  
> -static char *SocketAddress_to_str(SocketAddress *addr)
> -{
> -switch (addr->type) {
> -case SOCKET_ADDRESS_TYPE_INET:
> -return g_strdup_printf("tcp:%s:%s",
> -   addr->u.inet.host,
> -   addr->u.inet.port);
> -case SOCKET_ADDRESS_TYPE_UNIX:
> -return g_strdup_printf("unix:%s",
> -   addr->u.q_unix.path);
> -case SOCKET_ADDRESS_TYPE_FD:
> -return g_strdup_printf("fd:%s", addr->u.fd.str);
> -case SOCKET_ADDRESS_TYPE_VSOCK:
> -return g_strdup_printf("tcp:%s:%s",
> -   addr->u.vsock.cid,
> -   addr->u.vsock.port);
> -default:
> -return g_strdup("unknown address type");
> -}
> -}
> -
>  void hmp_info_migrate(Monitor *mon, const QDict *qdict)
>  {
>  MigrationInfo *info;
> @@ -380,7 +359,7 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
>  monitor_printf(mon, "socket address: [\n");
>  
>  for (addr = info->socket_address; addr; addr = addr->next) {
> -char *s = SocketAddress_to_str(addr->value);
> +char *s = socket_uri(addr->value);
>  monitor_printf(mon, "\t%s\n", s);
>  g_free(s);
>  }
> diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> index 83f4bd6fd211..9f6f655fd526 100644
> --- a/util/qemu-sockets.c
> +++ b/util/qemu-sockets.c
> @@ -1077,6 +1077,26 @@ int unix_connect(const char *path, Error **errp)
>  return sock;
>  }
>  
> +char *socket_uri(SocketAddress *addr)
> +{
> +switch (addr->type) {
> +case SOCKET_ADDRESS_TYPE_INET:
> +return g_strdup_printf("tcp:%s:%s",
> +   addr->u.inet.host,
> +   addr->u.inet.port);
> +case SOCKET_ADDRESS_TYPE_UNIX:
> +return g_strdup_printf("unix:%s",
> +   addr->u.q_unix.path);
> +case SOCKET_ADDRESS_TYPE_FD:
> +    return g_strdup_printf("fd:%s", addr->u.fd.str);
> +case SOCKET_ADDRESS_TYPE_VSOCK:
> +return g_strdup_printf("tcp:%s:%s",
> +   addr->u.vsock.cid,
> +   addr->u.vsock.port);
> +default:
> +return g_strdup("unknown address type");
> +}
> +}
>  
>  SocketAddress *socket_parse(const char *str, Error **errp)
>  {

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v8 07/14] net: stream: add unix socket

2022-09-13 Thread David Gibson
  sizeof(saddr_un.sun_path));
> +return -1;
> +}
> +
> +fd = qemu_socket(PF_UNIX, SOCK_STREAM, 0);
> +if (fd < 0) {
> +error_setg_errno(errp, errno, "can't create stream socket");
> +return -1;
> +}
> +qemu_socket_set_nonblock(fd);
> +
> +connected = 0;
> +for (;;) {
> +ret = connect(fd, (struct sockaddr *)&saddr_un, sizeof(saddr_un));
> +if (ret < 0) {
> +if (errno == EINTR || errno == EWOULDBLOCK) {
> +/* continue */
> +} else if (errno == EAGAIN ||
> +   errno == EALREADY) {
> +break;
> +} else {
> +error_setg_errno(errp, errno, "can't connect socket");
> +closesocket(fd);
> +return -1;
> +}
> +} else {
> +connected = 1;
> +break;
> +}
> +}
> +info_str = g_strdup_printf(" connect to %s", saddr_un.sun_path);
> +break;
> +}
>  case SOCKET_ADDRESS_TYPE_FD:
>  fd = monitor_fd_param(monitor_cur(), addr->u.fd.str, errp);
>  if (fd == -1) {
> @@ -395,7 +493,7 @@ static int net_stream_client_init(NetClientState *peer,
>  info_str = g_strdup_printf("connect to fd %d", fd);
>  break;
>  default:
> -error_setg(errp, "only support inet or fd type");
> +error_setg(errp, "only support inet, unix or fd type");
>  return -1;
>  }
>  
> diff --git a/qapi/net.json b/qapi/net.json
> index e02e8001a000..bb96701a49a7 100644
> --- a/qapi/net.json
> +++ b/qapi/net.json
> @@ -583,7 +583,7 @@
>  #or connect to (server=false)
>  # @server: create server socket (default: true)
>  #
> -# Only SocketAddress types 'inet' and 'fd' are supported.
> +# Only SocketAddress types 'unix', 'inet' and 'fd' are supported.
>  #
>  # Since: 7.1
>  ##
> diff --git a/qemu-options.hx b/qemu-options.hx
> index bb16a61bae8e..8870bcce6bcd 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -2733,6 +2733,7 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
>  "configure a network backend to connect to another network\n"
>  "using an UDP tunnel\n"
>  "-netdev stream,id=str[,server=on|off],addr.type=inet,addr.host=host,addr.port=port\n"
> +"-netdev stream,id=str[,server=on|off],addr.type=unix,addr.path=path\n"
>  "-netdev stream,id=str[,server=on|off],addr.type=fd,addr.str=h\n"
>  "configure a network backend to connect to another network\n"
>  "using a socket connection in stream mode.\n"

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v8 05/14] qapi: net: add stream and dgram netdevs

2022-09-13 Thread David Gibson
uot;,
> +   inet_ntoa(saddr_in.sin_addr),
> +   ntohs(saddr_in.sin_port));
> +break;
> +}
> +case SOCKET_ADDRESS_TYPE_FD:
> +fd = monitor_fd_param(monitor_cur(), addr->u.fd.str, errp);
> +if (fd == -1) {
> +return -1;
> +}
> +ret = qemu_socket_try_set_nonblock(fd);
> +if (ret < 0) {
> +error_setg_errno(errp, -ret, "%s: Can't use file descriptor %d",
> + name, fd);
> +return -1;
> +}
> +connected = 1;
> +info_str = g_strdup_printf("connect to fd %d", fd);
> +break;
> +default:
> +error_setg(errp, "only support inet or fd type");
> +return -1;
> +}
> +
> +s = net_stream_fd_init_stream(peer, model, name, fd, connected);
> +
> +pstrcpy(s->nc.info_str, sizeof(s->nc.info_str), info_str);
> +g_free(info_str);
> +
> +return 0;

Re: [PATCH v8 02/14] net: remove the @errp argument of net_client_inits()

2022-09-13 Thread David Gibson
On Tue, Sep 13, 2022 at 08:39:48AM +0200, Laurent Vivier wrote:
> The only caller passes &error_fatal, so use this directly in the function.
> 
> It's what we do for -blockdev, -device, and -object.
> 
> Suggested-by: Markus Armbruster 
> Signed-off-by: Laurent Vivier 
> Reviewed-by: Markus Armbruster 

Reviewed-by: David Gibson 

> ---
>  include/net/net.h |  2 +-
>  net/net.c | 20 +++-
>  softmmu/vl.c  |  2 +-
>  3 files changed, 9 insertions(+), 15 deletions(-)
> 
> diff --git a/include/net/net.h b/include/net/net.h
> index 81d0b21defce..c1c34a58f849 100644
> --- a/include/net/net.h
> +++ b/include/net/net.h
> @@ -222,7 +222,7 @@ extern const char *host_net_devices[];
>  /* from net.c */
>  int net_client_parse(QemuOptsList *opts_list, const char *str);
>  void show_netdevs(void);
> -int net_init_clients(Error **errp);
> +void net_init_clients(void);
>  void net_check_clients(void);
>  void net_cleanup(void);
>  void hmp_host_net_add(Monitor *mon, const QDict *qdict);
> diff --git a/net/net.c b/net/net.c
> index d2288bd3a929..15958f881776 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -1562,27 +1562,21 @@ out:
>  return ret;
>  }
>  
> -int net_init_clients(Error **errp)
> +void net_init_clients(void)
>  {
>  net_change_state_entry =
>  qemu_add_vm_change_state_handler(net_vm_change_state_handler, NULL);
>  
>  QTAILQ_INIT(&net_clients);
>  
> -if (qemu_opts_foreach(qemu_find_opts("netdev"),
> -  net_init_netdev, NULL, errp)) {
> -return -1;
> -}
> -
> -if (qemu_opts_foreach(qemu_find_opts("nic"), net_param_nic, NULL, errp)) {
> -return -1;
> -}
> +qemu_opts_foreach(qemu_find_opts("netdev"), net_init_netdev, NULL,
> +  &error_fatal);
>  
> -if (qemu_opts_foreach(qemu_find_opts("net"), net_init_client, NULL, errp)) {
> -return -1;
> -}
> +qemu_opts_foreach(qemu_find_opts("nic"), net_param_nic, NULL,
> +  &error_fatal);
>  
> -return 0;
> +qemu_opts_foreach(qemu_find_opts("net"), net_init_client, NULL,
> +  &error_fatal);
>  }
>  
>  int net_client_parse(QemuOptsList *opts_list, const char *optarg)
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index dea4005e4791..1fe8b5c5a120 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -1906,7 +1906,7 @@ static void qemu_create_late_backends(void)
>  qtest_server_init(qtest_chrdev, qtest_log, &error_fatal);
>  }
>  
> -net_init_clients(&error_fatal);
> +net_init_clients();
>  
>  object_option_foreach_add(object_create_late);
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v8 03/14] net: simplify net_client_parse() error management

2022-09-13 Thread David Gibson
On Tue, Sep 13, 2022 at 08:39:49AM +0200, Laurent Vivier wrote:
> All net_client_parse() callers exit in case of error.
> 
> Move exit(1) to net_client_parse() and remove error checking from
> the callers.
> 
> Suggested-by: Markus Armbruster 
> Signed-off-by: Laurent Vivier 
> Reviewed-by: Markus Armbruster 

Reviewed-by: David Gibson 
> ---
>  include/net/net.h |  2 +-
>  net/net.c |  6 ++
>  softmmu/vl.c  | 12 +++-
>  3 files changed, 6 insertions(+), 14 deletions(-)
> 
> diff --git a/include/net/net.h b/include/net/net.h
> index c1c34a58f849..55023e7e9fa9 100644
> --- a/include/net/net.h
> +++ b/include/net/net.h
> @@ -220,7 +220,7 @@ extern NICInfo nd_table[MAX_NICS];
>  extern const char *host_net_devices[];
>  
>  /* from net.c */
> -int net_client_parse(QemuOptsList *opts_list, const char *str);
> +void net_client_parse(QemuOptsList *opts_list, const char *str);
>  void show_netdevs(void);
>  void net_init_clients(void);
>  void net_check_clients(void);
> diff --git a/net/net.c b/net/net.c
> index 15958f881776..f056e8aebfb2 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -1579,13 +1579,11 @@ void net_init_clients(void)
>    &error_fatal);
>  }
>  
> -int net_client_parse(QemuOptsList *opts_list, const char *optarg)
> +void net_client_parse(QemuOptsList *opts_list, const char *optarg)
>  {
>  if (!qemu_opts_parse_noisily(opts_list, optarg, true)) {
> -return -1;
> +exit(1);
>  }
> -
> -return 0;
>  }
>  
>  /* From FreeBSD */
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index 1fe8b5c5a120..55d163475e9e 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -2809,21 +2809,15 @@ void qemu_init(int argc, char **argv, char **envp)
>  break;
>  case QEMU_OPTION_netdev:
>  default_net = 0;
> -if (net_client_parse(qemu_find_opts("netdev"), optarg) == -1) {
> -exit(1);
> -}
> +net_client_parse(qemu_find_opts("netdev"), optarg);
>  break;
>  case QEMU_OPTION_nic:
>  default_net = 0;
> -if (net_client_parse(qemu_find_opts("nic"), optarg) == -1) {
> -exit(1);
> -}
> +net_client_parse(qemu_find_opts("nic"), optarg);
>  break;
>  case QEMU_OPTION_net:
>  default_net = 0;
> -if (net_client_parse(qemu_find_opts("net"), optarg) == -1) {
> -exit(1);
> -}
> +net_client_parse(qemu_find_opts("net"), optarg);
>  break;
>  #ifdef CONFIG_LIBISCSI
>  case QEMU_OPTION_iscsi:

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v8 01/14] net: introduce convert_host_port()

2022-09-13 Thread David Gibson
On Tue, Sep 13, 2022 at 08:39:47AM +0200, Laurent Vivier wrote:
> Signed-off-by: Laurent Vivier 
> Reviewed-by: Stefano Brivio 

Reviewed-by: David Gibson 

Although, if you do respin, an actual commit message would be nice to
have.


> ---
>  include/qemu/sockets.h |  2 ++
>  net/net.c  | 62 ++
>  2 files changed, 34 insertions(+), 30 deletions(-)
> 
> diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
> index 038faa157f59..47194b9732f8 100644
> --- a/include/qemu/sockets.h
> +++ b/include/qemu/sockets.h
> @@ -47,6 +47,8 @@ void socket_listen_cleanup(int fd, Error **errp);
>  int socket_dgram(SocketAddress *remote, SocketAddress *local, Error **errp);
>  
>  /* Old, ipv4 only bits.  Don't use for new code. */
> +int convert_host_port(struct sockaddr_in *saddr, const char *host,
> +  const char *port, Error **errp);
>  int parse_host_port(struct sockaddr_in *saddr, const char *str,
>  Error **errp);
>  int socket_init(void);
> diff --git a/net/net.c b/net/net.c
> index 2db160e0634d..d2288bd3a929 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -66,55 +66,57 @@ static QTAILQ_HEAD(, NetClientState) net_clients;
>  /***/
>  /* network device redirectors */
>  
> -int parse_host_port(struct sockaddr_in *saddr, const char *str,
> -Error **errp)
> +int convert_host_port(struct sockaddr_in *saddr, const char *host,
> +  const char *port, Error **errp)
>  {
> -gchar **substrings;
>  struct hostent *he;
> -const char *addr, *p, *r;
> -int port, ret = 0;
> +const char *r;
> +long p;
>  
>  memset(saddr, 0, sizeof(*saddr));
>  
> -substrings = g_strsplit(str, ":", 2);
> -if (!substrings || !substrings[0] || !substrings[1]) {
> -error_setg(errp, "host address '%s' doesn't contain ':' "
> -   "separating host from port", str);
> -ret = -1;
> -goto out;
> -}
> -
> -addr = substrings[0];
> -p = substrings[1];
> -
>  saddr->sin_family = AF_INET;
> -if (addr[0] == '\0') {
> +if (host[0] == '\0') {
>  saddr->sin_addr.s_addr = 0;
>  } else {
> -if (qemu_isdigit(addr[0])) {
> -if (!inet_aton(addr, &saddr->sin_addr)) {
> +if (qemu_isdigit(host[0])) {
> +if (!inet_aton(host, &saddr->sin_addr)) {
>  error_setg(errp, "host address '%s' is not a valid "
> -   "IPv4 address", addr);
> -ret = -1;
> -goto out;
> +   "IPv4 address", host);
> +return -1;
>  }
>  } else {
> -he = gethostbyname(addr);
> +he = gethostbyname(host);
>  if (he == NULL) {
> -error_setg(errp, "can't resolve host address '%s'", addr);
> -ret = -1;
> -goto out;
> +error_setg(errp, "can't resolve host address '%s'", host);
> +return -1;
>  }
>  saddr->sin_addr = *(struct in_addr *)he->h_addr;
>  }
>  }
> -port = strtol(p, (char **)&r, 0);
> -if (r == p) {
> -error_setg(errp, "port number '%s' is invalid", p);
> +if (qemu_strtol(port, &r, 0, &p) != 0) {
> +error_setg(errp, "port number '%s' is invalid", port);
> +return -1;
> +}
> +saddr->sin_port = htons(p);
> +return 0;
> +}
> +
> +int parse_host_port(struct sockaddr_in *saddr, const char *str,
> +Error **errp)
> +{
> +gchar **substrings;
> +int ret;
> +
> +substrings = g_strsplit(str, ":", 2);
> +if (!substrings || !substrings[0] || !substrings[1]) {
> +error_setg(errp, "host address '%s' doesn't contain ':' "
> +   "separating host from port", str);
>  ret = -1;
>  goto out;
>  }
> -saddr->sin_port = htons(port);
> +
> +ret = convert_host_port(saddr, substrings[0], substrings[1], errp);
>  
>  out:
>  g_strfreev(substrings);
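
For readers skimming the diff, the split can be sketched in plain POSIX C: one helper takes host and port as separate strings, and the old entry point only splits "host:port" and delegates. This is a rough illustration, not the QEMU code. The sketch_* names are hypothetical, inet_pton() stands in for inet_aton(), hostname lookup is left out, and the 0..65535 range check is an assumption that the quoted patch does not make.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the host/port helper: fill an IPv4
 * sockaddr_in from separate host and port strings.  Numeric hosts only.
 * Returns 0 on success, -1 on any parse error.
 */
static int sketch_convert_host_port(struct sockaddr_in *saddr,
                                    const char *host, const char *port)
{
    char *end;
    long p;

    memset(saddr, 0, sizeof(*saddr));
    saddr->sin_family = AF_INET;

    if (host[0] == '\0') {
        /* Empty host means "any address", as in the quoted code. */
        saddr->sin_addr.s_addr = htonl(INADDR_ANY);
    } else if (inet_pton(AF_INET, host, &saddr->sin_addr) != 1) {
        return -1;
    }

    errno = 0;
    p = strtol(port, &end, 0);
    /* Reject empty input, trailing garbage and out-of-range ports. */
    if (errno != 0 || end == port || *end != '\0' || p < 0 || p > 65535) {
        return -1;
    }
    saddr->sin_port = htons((unsigned short)p);
    return 0;
}

/* Split "host:port" at the first ':' and delegate to the helper above. */
static int sketch_parse_host_port(struct sockaddr_in *saddr, const char *str)
{
    char host[256];
    const char *colon = strchr(str, ':');
    size_t len;

    if (!colon) {
        return -1;
    }
    len = (size_t)(colon - str);
    if (len >= sizeof(host)) {
        return -1;
    }
    memcpy(host, str, len);
    host[len] = '\0';
    return sketch_convert_host_port(saddr, host, colon + 1);
}
```

With this shape, a caller that already has host and port as separate fields (as the later stream/dgram netdev patches do) can call the helper directly without gluing a string together first.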

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH for-7.2 v4 15/21] qmp/hmp, device_tree.c: introduce 'info fdt' command

2022-08-31 Thread David Gibson
On Tue, Aug 30, 2022 at 12:43:23PM +0200, Markus Armbruster wrote:
> David Gibson  writes:
> 
> > On Mon, Aug 29, 2022 at 07:00:55PM -0300, Daniel Henrique Barboza wrote:
> >> 
> >> 
> >> On 8/29/22 00:34, David Gibson wrote:
> >> > On Fri, Aug 26, 2022 at 11:11:44AM -0300, Daniel Henrique Barboza wrote:
> >> > > Reading the FDT requires that the user saves the fdt_blob and then use
> >> > > 'dtc' to read the contents. Saving the file and using 'dtc' is a strong
> >> > > use case when we need to compare two FDTs, but it's a lot of steps if
> >> > > you want to do quick check on a certain node or property.
> >> > > 
> >> > > 'info fdt' retrieves FDT nodes (and properties, later on) and print it
> >> > > to the user. This can be used to check the FDT on running machines
> >> > > without having to save the blob and use 'dtc'.
> >> > > 
> >> > > The implementation is based on the premise that the machine has a FDT
> >> > > created using libfdt and pointed by 'machine->fdt'. As long as this
> >> > > pre-requisite is met the machine should be able to support it.
> >> > > 
> >> > > For now we're going to add the required QMP/HMP boilerplate and the
> >> > > capability of printing the name of the properties of a given node. Next
> >> > > patches will extend 'info fdt' to be able to print nodes recursively,
> >> > > and then individual properties.
> >> > > 
> >> > > This command will always be executed in-band (i.e. holding BQL),
> >> > > avoiding potential race conditions with machines that might change the
> >> > > FDT during runtime (e.g. PowerPC 'pseries' machine).
> >> > > 
> >> > > 'info fdt' is not something that we expect to be used aside from debugging,
> >> > > so we're implementing it in QMP as 'x-query-fdt'.
> >> > > 
> >> > > This is an example of 'info fdt' fetching the '/chosen' node of the
> >> > > pSeries machine:
> >> > > 
> >> > > (qemu) info fdt /chosen
> >> > > chosen {
> >> > >  ibm,architecture-vec-5;
> >> > >  rng-seed;
> >> > >  ibm,arch-vec-5-platform-support;
> >> > >  linux,pci-probe-only;
> >> > >  stdout-path;
> >> > >  linux,stdout-path;
> >> > >  qemu,graphic-depth;
> >> > >  qemu,graphic-height;
> >> > >  qemu,graphic-width;
> >> > > };
> >> > > 
> >> > > And the same node for the aarch64 'virt' machine:
> >> > > 
> >> > > (qemu) info fdt /chosen
> >> > > chosen {
> >> > >  stdout-path;
> >> > >  rng-seed;
> >> > >  kaslr-seed;
> >> > > };
> >> > 
> >> > So, I'm reasonably convinced allowing dumping the whole dtb from
> >> > qmp/hmp is useful.  I'm less convinced that info fdt is worth the
> >> > additional complexity it incurs.  Note that as well as being able to
> >> > decompile a whole dtb using dtc, you can also extract and list
> >> > specific properties from a dtb blob using the 'fdtget' tool which is
> >> > part of the dtc tree.
> >> 
> >> What's your opinion on patch 21/21, where 'dumpdtb' can write a formatted
> >> FDT in a file with an extra option? That was possible because of the
> >> format helpers introduced for 'info fdt'. The idea is that since we're
> >> able to format a FDT in DTS format, we can also write the FDT in text
> >> format without relying on DTC to decode it.
> >
> > Since it's mostly the same code, I think it's reasonable to throw in
> > if the info fdt stuff is there, but I don't think it's worth including
> > without that.  As a whole, I remain dubious that (info fdt + dumpdts)
> > is worth the complexity cost.
> 
> How much code does it take, and who's going to maintain it?

It's not especially big, but it's not negligible.  Perhaps the part
that I'm most uncomfortable about is that it requires a bunch of messy
heuristics to guess how to format the output - DT properties are just
bytestrings, any internal interpretation is based on the specific
bindings for them.

dtc already has these and I don't love having a second, potentially
different copy of necessarily imperfect heuristics out in the wild.
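
To make the "messy heuristics" point concrete, here is a rough sketch of the kind of guessing a formatter has to do. It is illustrative only: it is neither dtc's logic nor the code from the series, and the enum and function names are made up for this example.

```c
#include <ctype.h>
#include <stddef.h>

enum prop_kind { PROP_EMPTY, PROP_STRING, PROP_CELLS, PROP_BYTES };

/*
 * Guess how a property value should be printed:
 *  - empty value                              -> no value (boolean-style)
 *  - NUL-terminated runs of printable chars   -> string or string list
 *  - otherwise, length a multiple of 4        -> 32-bit cells
 *  - anything else                            -> raw bytes
 */
static enum prop_kind guess_prop_kind(const unsigned char *data, size_t len)
{
    size_t i, run = 0;

    if (len == 0) {
        return PROP_EMPTY;
    }

    if (data[len - 1] == '\0') {
        int looks_like_text = 1;

        for (i = 0; i < len; i++) {
            if (data[i] == '\0') {
                if (run == 0) {
                    /* An empty string inside the list: probably not text. */
                    looks_like_text = 0;
                    break;
                }
                run = 0;
            } else if (isprint(data[i])) {
                run++;
            } else {
                looks_like_text = 0;
                break;
            }
        }
        if (looks_like_text) {
            return PROP_STRING;
        }
    }

    return (len % 4 == 0) ? PROP_CELLS : PROP_BYTES;
}
```

The four bytes 61 62 63 00 show why this can only ever be a guess: the sketch reports a string ("abc"), but a binding is free to define the same bytes as the single cell 0x61626300.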

> > People with more practical experience debugging the embedded ARM
> > platforms might have a different opinion if they think info fdt would
> > be really useful though.
> 
> They better speak up then :)

Just so.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH for-7.2 v2 10/20] hw/ppc: set machine->fdt in spapr machine

2022-08-31 Thread David Gibson
On Mon, Aug 22, 2022 at 07:30:36AM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 8/22/22 00:29, Alexey Kardashevskiy wrote:
> > 
> > 
> > On 22/08/2022 13:05, David Gibson wrote:
> > > On Fri, Aug 19, 2022 at 06:42:34AM -0300, Daniel Henrique Barboza wrote:
> > > > 
> > > > 
> > > > On 8/18/22 23:11, Alexey Kardashevskiy wrote:
> > > > > 
> > > > > 
> > > > > On 05/08/2022 19:39, Daniel Henrique Barboza wrote:
> > > > > > The pSeries machine never bothered with the common machine->fdt
> > > > > > attribute. We do all the FDT related work using spapr->fdt_blob.
> > > > > > 
> > > > > > We're going to introduce HMP commands to read and save the FDT, which
> > > > > > will rely on setting machine->fdt properly to work across all machine
> > > > > > archs/types.
> > > > > 
> > > > > 
> > > > > Out of curiosity - why new HMP command, is not QOM'ing this ms::fdt
> > > > > property enough?
> > > > 
> > > > I tried to do the minimal changes needed for the commands to work. ms::fdt is
> > > > one of the few MachineState fields that hasn't been QOMified by
> > > > machine_class_init() yet. All pre-existing code that uses ms::fdt are using the
> > > > pointer directly. To make a QOMified use of it would require extra patches
> > > > in machine.c to QOMify the property first.
> > > > 
> > > > There's also the issue with how each machine is creating the FDT. Most are using
> > > > helpers from device_tree.c, some are creating it from scratch, others required
> > > > a .dtb file, most of them are not doing a fdt_pack() and so on. To really QOMify
> > > > the use of ms::fdt we would need some machine hooks that standardize all that.
> > > > I believe it's worth the trouble, but it would be too much to do
> > > > right now.
> > > 
> > > Hmm.. I think this depends on what you mean by "QOM"ify exactly.  If
> > > you're meaning make the full DT representation QOM objects, that you
> > > can look into in detail, then, yes, that's pretty complicated.
> > > 
> > > I suspect what Alexey was suggesting though, was merely to make
> > > ms::fdt accessible as a single bytestring property on the machine QOM
> > > object.  Effectively it's just "dumpdtb" but as a property get.
> > 
> > 
> > Yes, I meant the bytestream, as DTC can easily decompile it onto a DTS.
> > 
> > 
> > > I'm not 100% certain if QOM can safely represent arbitrary bytestrings
> > > as QOM properties, which would need checking.
> > 
> > I am not sure either but rather than adding another command to HMP, I'd
> > explore this option first.
> 
> 
> I'm not sure what you mean by that. The HMP version of 'dumpdtb' is more flexible
> than the current "-machine dumpdtb", an extra machine option that would cause
> the guest to exit after writing the dtb. And 'info fdt' is a new command that
> makes it easier to inspect specific nodes/props.
> 
> I don't see how making ms::fdt retrievable by object_property_get() internally
> (remember that ms::fdt is not fully QOMified, so there's no introspection of its
> value from the QEMU monitor) would make any of these new HMP commands obsolete.

I believe what we were thinking is if the dtb (as a single bytestring) can be
retrieved with a qom-get on a suitable property on the machine, that
might make things marginally simpler than adding a new command.  I'm
not certain if the JSON format of the QMP responses can safely encode
an arbitrary bytestring, though (as opposed to a Unicode string).
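
For what it's worth, the usual way around the JSON concern is to base64-encode the blob so that it travels as plain ASCII text. The sketch below is a self-contained RFC 4648 encoder, given only to illustrate that idea. It is an assumption about how such a property could be exposed, not a description of what QOM or QMP does today, and sketch_base64_encode() is a made-up name (inside QEMU one would more likely reach for GLib's g_base64_encode()).

```c
#include <stddef.h>
#include <stdlib.h>

/* Encode len bytes as NUL-terminated base64 (RFC 4648); caller frees. */
static char *sketch_base64_encode(const unsigned char *in, size_t len)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t out_len = 4 * ((len + 2) / 3);
    char *out = malloc(out_len + 1);
    size_t i, o = 0;

    if (!out) {
        return NULL;
    }

    /* Full 3-byte groups become 4 output characters each. */
    for (i = 0; i + 2 < len; i += 3) {
        unsigned v = ((unsigned)in[i] << 16) | ((unsigned)in[i + 1] << 8) |
                     in[i + 2];
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = tbl[(v >> 6) & 63];
        out[o++] = tbl[v & 63];
    }

    /* One or two leftover bytes are padded with '='. */
    if (len - i == 1) {
        unsigned v = (unsigned)in[i] << 16;
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = '=';
        out[o++] = '=';
    } else if (len - i == 2) {
        unsigned v = ((unsigned)in[i] << 16) | ((unsigned)in[i + 1] << 8);
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = tbl[(v >> 6) & 63];
        out[o++] = '=';
    }

    out[o] = '\0';
    return out;
}
```

The cost is a 4/3 size increase plus a decode step on the client side before the result can be fed to dtc.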

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH for-7.2 v4 15/21] qmp/hmp, device_tree.c: introduce 'info fdt' command

2022-08-29 Thread David Gibson
On Mon, Aug 29, 2022 at 07:00:55PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 8/29/22 00:34, David Gibson wrote:
> > On Fri, Aug 26, 2022 at 11:11:44AM -0300, Daniel Henrique Barboza wrote:
> > > Reading the FDT requires that the user saves the fdt_blob and then use
> > > 'dtc' to read the contents. Saving the file and using 'dtc' is a strong
> > > use case when we need to compare two FDTs, but it's a lot of steps if
> > > you want to do quick check on a certain node or property.
> > > 
> > > 'info fdt' retrieves FDT nodes (and properties, later on) and print it
> > > to the user. This can be used to check the FDT on running machines
> > > without having to save the blob and use 'dtc'.
> > > 
> > > The implementation is based on the premise that the machine has a FDT
> > > created using libfdt and pointed by 'machine->fdt'. As long as this
> > > pre-requisite is met the machine should be able to support it.
> > > 
> > > For now we're going to add the required QMP/HMP boilerplate and the
> > > capability of printing the name of the properties of a given node. Next
> > > patches will extend 'info fdt' to be able to print nodes recursively,
> > > and then individual properties.
> > > 
> > > This command will always be executed in-band (i.e. holding BQL),
> > > avoiding potential race conditions with machines that might change the
> > > FDT during runtime (e.g. PowerPC 'pseries' machine).
> > > 
> > > 'info fdt' is not something that we expect to be used aside from debugging,
> > > so we're implementing it in QMP as 'x-query-fdt'.
> > > 
> > > This is an example of 'info fdt' fetching the '/chosen' node of the
> > > pSeries machine:
> > > 
> > > (qemu) info fdt /chosen
> > > chosen {
> > >  ibm,architecture-vec-5;
> > >  rng-seed;
> > >  ibm,arch-vec-5-platform-support;
> > >  linux,pci-probe-only;
> > >  stdout-path;
> > >  linux,stdout-path;
> > >  qemu,graphic-depth;
> > >  qemu,graphic-height;
> > >  qemu,graphic-width;
> > > };
> > > 
> > > And the same node for the aarch64 'virt' machine:
> > > 
> > > (qemu) info fdt /chosen
> > > chosen {
> > >  stdout-path;
> > >  rng-seed;
> > >  kaslr-seed;
> > > };
> > 
> > So, I'm reasonably convinced allowing dumping the whole dtb from
> > qmp/hmp is useful.  I'm less convinced that info fdt is worth the
> > additional complexity it incurs.  Note that as well as being able to
> > decompile a whole dtb using dtc, you can also extract and list
> > specific properties from a dtb blob using the 'fdtget' tool which is
> > part of the dtc tree.
> 
> What's your opinion on patch 21/21, where 'dumpdtb' can write a formatted
> FDT in a file with an extra option? That was possible because of the
> format helpers introduced for 'info fdt'. The idea is that since we're
> able to format a FDT in DTS format, we can also write the FDT in text
> format without relying on DTC to decode it.

Since it's mostly the same code, I think it's reasonable to throw in
if the info fdt stuff is there, but I don't think it's worth including
without that.  As a whole, I remain dubious that (info fdt + dumpdts)
is worth the complexity cost.

People with more practical experience debugging the embedded ARM
platforms might have a different opinion if they think info fdt would
be really useful though.

> If we think that this 'dumpdtb' capability is worth having, I can respin
> the patches without 'info fdt' but adding these helpers to enable this
> 'dumpdtb' support. If not, then we can just remove patches 15-21 and
> be done with it.
> 
> 
> Thanks,
> 
> 
> Daniel
> 
> > 
> > > 
> > > Cc: Dr. David Alan Gilbert 
> > > Acked-by: Dr. David Alan Gilbert 
> > > Signed-off-by: Daniel Henrique Barboza 
> > > ---
> > >   hmp-commands-info.hx | 13 ++
> > >   include/monitor/hmp.h|  1 +
> > >   include/sysemu/device_tree.h |  4 +++
> > >   monitor/hmp-cmds.c   | 13 ++
> > >   monitor/qmp-cmds.c   | 12 +
> > >   qapi/machine.json| 19 +++
> > >   softmmu/device_tree.c| 47 
> > >   7 files changed, 109 insertions(+)
> > > 
> > > diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
> > > index 188d9ece3b..743b48865d 100644

Re: [PATCH for-7.2 v4 15/21] qmp/hmp, device_tree.c: introduce 'info fdt' command

2022-08-28 Thread David Gibson
>  void qemu_fdt_qmp_dumpdtb(const char *filename, Error **errp);
> +HumanReadableText *qemu_fdt_qmp_query_fdt(const char *nodepath,
> +  Error **errp);
>  
>  /**
>   * qemu_fdt_setprop_sized_cells_from_array:
> diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
> index 1c7bfd3b9d..93a4103afa 100644
> --- a/monitor/hmp-cmds.c
> +++ b/monitor/hmp-cmds.c
> @@ -2484,3 +2484,16 @@ void hmp_dumpdtb(Monitor *mon, const QDict *qdict)
>  hmp_handle_error(mon, local_err);
>  }
>  }
> +
> +void hmp_info_fdt(Monitor *mon, const QDict *qdict)
> +{
> +const char *nodepath = qdict_get_str(qdict, "nodepath");
> +Error *err = NULL;
> +g_autoptr(HumanReadableText) info = qmp_x_query_fdt(nodepath, &err);
> +
> +if (hmp_handle_error(mon, err)) {
> +return;
> +}
> +
> +monitor_printf(mon, "%s", info->human_readable_text);
> +}
> diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
> index 8415aca08c..db2c6aa7da 100644
> --- a/monitor/qmp-cmds.c
> +++ b/monitor/qmp-cmds.c
> @@ -603,9 +603,21 @@ void qmp_dumpdtb(const char *filename, Error **errp)
>  {
>  return qemu_fdt_qmp_dumpdtb(filename, errp);
>  }
> +
> +HumanReadableText *qmp_x_query_fdt(const char *nodepath, Error **errp)
> +{
> +return qemu_fdt_qmp_query_fdt(nodepath, errp);
> +}
>  #else
>  void qmp_dumpdtb(const char *filename, Error **errp)
>  {
>  error_setg(errp, "dumpdtb requires libfdt");
>  }
> +
> +HumanReadableText *qmp_x_query_fdt(const char *nodepath, Error **errp)
> +{
> +error_setg(errp, "this command requires libfdt");
> +
> +return NULL;
> +}
>  #endif
> diff --git a/qapi/machine.json b/qapi/machine.json
> index aeb013f3dd..96cff541ca 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -1681,3 +1681,22 @@
>  ##
>  { 'command': 'dumpdtb',
>'data': { 'filename': 'str' } }
> +
> +##
> +# @x-query-fdt:
> +#
> +# Query for FDT element (node or property). Requires 'libfdt'.
> +#
> +# @nodepath: the path of the FDT node to be retrieved
> +#
> +# Features:
> +# @unstable: This command is meant for debugging.
> +#
> +# Returns: FDT node
> +#
> +# Since: 7.2
> +##
> +{ 'command': 'x-query-fdt',
> +  'data': { 'nodepath': 'str' },
> +  'returns': 'HumanReadableText',
> +  'features': [ 'unstable' ]  }
> diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
> index cd487ddd4d..6b15f6ace2 100644
> --- a/softmmu/device_tree.c
> +++ b/softmmu/device_tree.c
> @@ -18,6 +18,7 @@
>  #endif
>  
>  #include "qapi/error.h"
> +#include "qapi/type-helpers.h"
>  #include "qemu/error-report.h"
>  #include "qemu/option.h"
>  #include "qemu/bswap.h"
> @@ -661,3 +662,49 @@ void qemu_fdt_qmp_dumpdtb(const char *filename, Error **errp)
>  
>  error_setg(errp, "Error when saving machine FDT to file %s", filename);
>  }
> +
> +static void fdt_format_node(GString *buf, int node, int depth)
> +{
> +const struct fdt_property *prop = NULL;
> +const char *propname = NULL;
> +void *fdt = current_machine->fdt;
> +int padding = depth * 4;
> +int property = 0;
> +int prop_size;
> +
> +g_string_append_printf(buf, "%*s%s {\n", padding, "",
> +   fdt_get_name(fdt, node, NULL));
> +
> +padding += 4;
> +
> +fdt_for_each_property_offset(property, fdt, node) {
> +prop = fdt_get_property_by_offset(fdt, property, &prop_size);
> +propname = fdt_string(fdt, fdt32_to_cpu(prop->nameoff));
> +
> +g_string_append_printf(buf, "%*s%s;\n", padding, "", propname);
> +}
> +
> +padding -= 4;
> +g_string_append_printf(buf, "%*s};\n", padding, "");
> +}
> +
> +HumanReadableText *qemu_fdt_qmp_query_fdt(const char *nodepath, Error **errp)
> +{
> +g_autoptr(GString) buf = g_string_new("");
> +int node;
> +
> +if (!current_machine->fdt) {
> +error_setg(errp, "Unable to find the machine FDT");
> +return NULL;
> +}
> +
> +node = fdt_path_offset(current_machine->fdt, nodepath);
> +if (node < 0) {
> +error_setg(errp, "node '%s' not found in FDT", nodepath);
> +return NULL;
> +}
> +
> +fdt_format_node(buf, node, 0);
> +
> +return human_readable_text_from_str(buf);
> +}
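
The output shape produced by fdt_format_node() at this stage (names only, indented with "%*s") can be reproduced without libfdt. The sketch below takes a plain array of property names in place of the libfdt property iteration and writes into a fixed buffer in place of a GString. Both substitutions, and the sketch_format_node() name, are assumptions made to keep the example self-contained.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Write "name {", one "prop;" line per property and a closing "};"
 * into buf, indenting by depth * 4 spaces.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int sketch_format_node(char *buf, size_t size, const char *name,
                              const char *const *props, size_t nprops,
                              int depth)
{
    int padding = depth * 4;
    size_t used = 0;
    size_t i;
    int n;

    /* "%*s" with an empty string emits exactly 'padding' spaces. */
    n = snprintf(buf, size, "%*s%s {\n", padding, "", name);
    if (n < 0 || (size_t)n >= size) {
        return -1;
    }
    used = (size_t)n;

    for (i = 0; i < nprops; i++) {
        n = snprintf(buf + used, size - used, "%*s%s;\n",
                     padding + 4, "", props[i]);
        if (n < 0 || (size_t)n >= size - used) {
            return -1;
        }
        used += (size_t)n;
    }

    n = snprintf(buf + used, size - used, "%*s};\n", padding, "");
    if (n < 0 || (size_t)n >= size - used) {
        return -1;
    }
    used += (size_t)n;

    return (int)used;
}
```

Fed the three names from the aarch64 'virt' example in the commit message, this yields the same listing, which also makes it plain that no property values are printed at this point in the series.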

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH for-7.2 v4 10/21] hw/ppc: set machine->fdt in spapr machine

2022-08-28 Thread David Gibson
On Fri, Aug 26, 2022 at 11:11:39AM -0300, Daniel Henrique Barboza wrote:
> The pSeries machine never bothered with the common machine->fdt
> attribute. We do all the FDT related work using spapr->fdt_blob.
> 
> We're going to introduce HMP commands to read and save the FDT, which
> will rely on setting machine->fdt properly to work across all machine
> archs/types.
> 
> Let's set machine->fdt in two places where we manipulate the FDT:
> spapr_machine_reset() and CAS. There are other places where the FDT is
> manipulated in the pSeries machines, most notably the hotplug/unplug
> path. For now we'll acknowledge that we won't have the most accurate
> representation of the FDT, depending on the current machine state, when
> using these QMP/HMP fdt commands. Making the internal FDT representation
> always match the actual FDT representation that the guest is using is a
> problem for another day.
> 
> spapr->fdt_blob is left untouched for now. To replace it with
> machine->fdt, since we're migrating spapr->fdt_blob, we would need to
> migrate machine->fdt as well. This is something that we would like to
> do to keep our code simpler but it's also a work we'll leave for later.

As discussed elsewhere, this doesn't give a full picture of the
"runtime" device tree, which can get modified later.  For now, I think
that's ok - we can define the fdt property / dumpdtb etc. as
describing specifically the boot time DT before guest firmware or OS
does any further mangling of it.  That's effectively what it means for
all the other embedded cases, though in those cases the firmware
usually doesn't need to do further modifications, unlike a "full OF"
environment like spapr.

Reviewed-by: David Gibson 

> 
> Cc: Cédric Le Goater 
> Cc: qemu-...@nongnu.org
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/ppc/spapr.c   | 6 ++
>  hw/ppc/spapr_hcall.c | 8 
>  2 files changed, 14 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index bc9ba6e6dc..7031cf964a 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1713,6 +1713,12 @@ static void spapr_machine_reset(MachineState *machine)
>  spapr->fdt_initial_size = spapr->fdt_size;
>  spapr->fdt_blob = fdt;
>  
> +/*
> + * Set the common machine->fdt pointer to enable support
> + * for 'dumpdtb' and 'info fdt' QMP/HMP commands.
> + */
> +machine->fdt = fdt;
> +
>  /* Set up the entry state */
>  first_ppc_cpu->env.gpr[5] = 0;
>  
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index a8d4a6bcf0..a53bfd76f4 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1256,6 +1256,14 @@ target_ulong do_client_architecture_support(PowerPCCPU *cpu,
>  spapr->fdt_initial_size = spapr->fdt_size;
>  spapr->fdt_blob = fdt;
>  
> +/*
> + * Set the machine->fdt pointer again since we just freed
> + * it above (by freeing spapr->fdt_blob). We set this
> + * pointer to enable support for 'dumpdtb' and 'info fdt'
> + * QMP/HMP commands.
> + */
> +MACHINE(spapr)->fdt = fdt;
> +
>  return H_SUCCESS;
>  }
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH for-7.2 v2 10/20] hw/ppc: set machine->fdt in spapr machine

2022-08-21 Thread David Gibson
On Fri, Aug 19, 2022 at 06:42:34AM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 8/18/22 23:11, Alexey Kardashevskiy wrote:
> > 
> > 
> > On 05/08/2022 19:39, Daniel Henrique Barboza wrote:
> > > The pSeries machine never bothered with the common machine->fdt
> > > attribute. We do all the FDT related work using spapr->fdt_blob.
> > > 
> > > We're going to introduce HMP commands to read and save the FDT, which
> > > will rely on setting machine->fdt properly to work across all machine
> > > archs/types.
> > 
> > 
> > Out of curiosity - why new HMP command, is not QOM'ing this ms::fdt 
> > property enough?
> 
> I tried to do the minimal changes needed for the commands to work. ms::fdt is
> one of the few MachineState fields that hasn't been QOMified by
> machine_class_init() yet. All pre-existing code that uses ms::fdt are using the
> pointer directly. To make a QOMified use of it would require extra patches
> in machine.c to QOMify the property first.
> 
> There's also the issue with how each machine is creating the FDT. Most are using
> helpers from device_tree.c, some are creating it from scratch, others required
> a .dtb file, most of them are not doing a fdt_pack() and so on. To really QOMify
> the use of ms::fdt we would need some machine hooks that standardize all that.
> I believe it's worth the trouble, but it would be too much to do
> right now.

Hmm.. I think this depends on what you mean by "QOM"ify exactly.  If
you mean turning the full DT representation into QOM objects that you
can look into in detail, then, yes, that's pretty complicated.

I suspect what Alexey was suggesting though, was merely to make
ms::fdt accessible as a single bytestring property on the machine QOM
object.  Effectively it's just "dumpdtb" but as a property get.

I'm not 100% certain if QOM can safely represent arbitrary bytestrings
as QOM properties, which would need checking.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH for-7.2 v2 10/20] hw/ppc: set machine->fdt in spapr machine

2022-08-18 Thread David Gibson
On Fri, Aug 19, 2022 at 12:11:40PM +1000, Alexey Kardashevskiy wrote:
> 
> 
> On 05/08/2022 19:39, Daniel Henrique Barboza wrote:
> > The pSeries machine never bothered with the common machine->fdt
> > attribute. We do all the FDT related work using spapr->fdt_blob.
> > 
> > We're going to introduce HMP commands to read and save the FDT, which
> > will rely on setting machine->fdt properly to work across all machine
> > archs/types.
> 
> Out of curiosity - why new HMP command, is not QOM'ing this ms::fdt property
> enough?

Huh.. I didn't think of that.  For dumpdtb you could be right, that
you might be able to use existing qom commands to extract the
property.  Would need to check that the size is handled properly,
FDTs are a bit weird in having their size "in band".
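
To spell out the "in band" remark: a flattened device tree starts with a header whose first two 32-bit big-endian fields are the magic number 0xd00dfeed and totalsize, so the blob declares its own length (libfdt exposes this as fdt_totalsize()). A minimal sketch of reading it from a raw buffer follows. The sketch_* names are hypothetical, and only those two header fields are examined, which is far less checking than fdt_check_header() does.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FDT_MAGIC 0xd00dfeedu

/* Read a big-endian 32-bit value byte by byte (no alignment assumptions). */
static uint32_t sketch_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Return the size the blob declares for itself, or 0 if the buffer is
 * too short, lacks the FDT magic, or claims more bytes than buf_len.
 */
static size_t sketch_fdt_size(const unsigned char *blob, size_t buf_len)
{
    uint32_t total;

    if (buf_len < 8 || sketch_be32(blob) != SKETCH_FDT_MAGIC) {
        return 0;
    }
    total = sketch_be32(blob + 4);   /* totalsize field at offset 4 */
    if (total < 8 || total > buf_len) {
        return 0;
    }
    return total;
}
```

So a generic "get property" path has two lengths to reconcile: the one it tracks for the buffer and the one inside the blob. The buffer can legitimately be larger than totalsize when the tree was built with spare room and never packed.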

"info fdt" etc. obviously have additional functionality in formatting
the contents more helpfully.


> Another thing is that on every HMP dump I'd probably rebuild the entire FDT
> for the reasons David explained. Thanks,

This would require per-machine hooks, however.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH for-7.2 v2 15/20] qmp/hmp, device_tree.c: introduce 'info fdt' command

2022-08-18 Thread David Gibson
On Mon, Aug 15, 2022 at 07:48:14PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 8/8/22 01:21, David Gibson wrote:
> > On Fri, Aug 05, 2022 at 06:39:43AM -0300, Daniel Henrique Barboza wrote:
> > > Reading the FDT requires that the user saves the fdt_blob and then use
> > > 'dtc' to read the contents. Saving the file and using 'dtc' is a strong
> > > use case when we need to compare two FDTs, but it's a lot of steps if
> > > you want to do quick check on a certain node or property.
> > > 
> > > 'info fdt' retrieves FDT nodes (and properties, later on) and print it
> > > to the user. This can be used to check the FDT on running machines
> > > without having to save the blob and use 'dtc'.
> > > 
> > > > The implementation is based on the premise that the machine has an FDT
> > > created using libfdt and pointed by 'machine->fdt'. As long as this
> > > pre-requisite is met the machine should be able to support it.
> > > 
> > > For now we're going to add the required QMP/HMP boilerplate and the
> > > capability of printing the name of the properties of a given node. Next
> > > patches will extend 'info fdt' to be able to print nodes recursively,
> > > and then individual properties.
> > > 
> > > > 'info fdt' is not something that we expect to be used aside from debugging,
> > > so we're implementing it in QMP as 'x-query-fdt'.
> > > 
> > > This is an example of 'info fdt' fetching the '/chosen' node of the
> > > pSeries machine:
> > > 
> > > (qemu) info fdt /chosen
> > > chosen {
> > >  ibm,architecture-vec-5;
> > >  rng-seed;
> > >  ibm,arch-vec-5-platform-support;
> > >  linux,pci-probe-only;
> > >  stdout-path;
> > >  linux,stdout-path;
> > >  qemu,graphic-depth;
> > >  qemu,graphic-height;
> > >  qemu,graphic-width;
> > > }
> > > 
> > > And the same node for the aarch64 'virt' machine:
> > > 
> > > (qemu) info fdt /chosen
> > > chosen {
> > >  stdout-path;
> > >  rng-seed;
> > >  kaslr-seed;
> > > }
> > 
> > So... it's listing the names of the properties, but not the contents?
> > That seems kind of odd.
> > 
> > > 
> > > Cc: Dr. David Alan Gilbert 
> > > Signed-off-by: Daniel Henrique Barboza 
> > > ---
> > >   hmp-commands-info.hx | 13 ++
> > >   include/monitor/hmp.h|  1 +
> > >   include/sysemu/device_tree.h |  4 +++
> > >   monitor/hmp-cmds.c   | 13 ++
> > >   monitor/qmp-cmds.c   | 12 +
> > >   qapi/machine.json| 19 +++
> > >   softmmu/device_tree.c| 47 
> > >   7 files changed, 109 insertions(+)
> > > 
> > > diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
> > > index 188d9ece3b..743b48865d 100644
> > > --- a/hmp-commands-info.hx
> > > +++ b/hmp-commands-info.hx
> > > @@ -921,3 +921,16 @@ SRST
> > > ``stats``
> > >   Show runtime-collected statistics
> > >   ERST
> > > +
> > > +{
> > > +.name   = "fdt",
> > > +.args_type  = "nodepath:s",
> > > +.params = "nodepath",
> > > +.help   = "show firmware device tree node given its path",
> > > +.cmd= hmp_info_fdt,
> > > +},
> > > +
> > > +SRST
> > > +  ``info fdt``
> > > +Show a firmware device tree node given its path. Requires libfdt.
> > > +ERST
> > > diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
> > > index d7f324da59..c0883dd1e3 100644
> > > --- a/include/monitor/hmp.h
> > > +++ b/include/monitor/hmp.h
> > > @@ -135,6 +135,7 @@ void hmp_set_vcpu_dirty_limit(Monitor *mon, const 
> > > QDict *qdict);
> > >   void hmp_cancel_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
> > >   void hmp_info_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
> > >   void hmp_dumpdtb(Monitor *mon, const QDict *qdict);
> > > +void hmp_info_fdt(Monitor *mon, const QDict *qdict);
> > >   void hmp_human_readable_text_helper(Monitor *mon,
> > >   HumanReadableText 
> > > *(*qmp_handler)(Error **));
> > >   void hmp_info_stats(Monitor *mon, const 

Re: [PATCH 2/2] hw/mips/boston: Pack fdt in fdt filter

2022-08-18 Thread David Gibson
On Tue, Aug 16, 2022 at 12:46:46PM +0100, Jiaxun Yang wrote:
> 
> 
> > 2022年8月16日 01:44,Philippe Mathieu-Daudé  写道:
> > 
> > On 13/8/22 18:27, Jiaxun Yang wrote:
> >> FDT can be awfully fat after series of modifications in fdt
> >> filter. Just pack it up before add to ram.
> >> Signed-off-by: Jiaxun Yang 
> >> ---
> >>  hw/mips/boston.c | 1 +
> >>  1 file changed, 1 insertion(+)
> >> diff --git a/hw/mips/boston.c b/hw/mips/boston.c
> >> index 5145179951..a40f193f78 100644
> >> --- a/hw/mips/boston.c
> >> +++ b/hw/mips/boston.c
> >> @@ -400,6 +400,7 @@ static const void *boston_fdt_filter(void *opaque, 
> >> const void *fdt_orig,
> >>  1, boston_memmap[BOSTON_HIGHDDR].base + 
> >> ram_low_sz,
> >>  1, ram_high_sz);
> >>  +fdt_pack(fdt);
> >>  fdt = g_realloc(fdt, fdt_totalsize(fdt));
> >>  qemu_fdt_dumpdtb(fdt, fdt_sz);
> >>  
> > 
> > Why not pack by default in qemu_fdt_dumpdtb()?
> 
> qemu_fdt_dumpdtb() is explicitly a function for debugging purpose.
> Donno if it’s wise to hijack it.

Agreed.  Having this modify the dtb sounds like a very bad idea.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH for-7.2 v3 20/20] hmp, device_tree.c: add 'info fdt ' support

2022-08-18 Thread David Gibson
lename, Error **errp)
> @@ -614,7 +615,8 @@ void qmp_dumpdtb(const char *filename, Error **errp)
>  error_setg(errp, "dumpdtb requires libfdt");
>  }
>  
> -HumanReadableText *qmp_x_query_fdt(const char *nodepath, Error **errp)
> +HumanReadableText *qmp_x_query_fdt(const char *nodepath, bool has_propname,
> +   const char *propname, Error **errp)
>  {
>  error_setg(errp, "this command requires libfdt");
>  
> diff --git a/qapi/machine.json b/qapi/machine.json
> index 96cff541ca..c15ce60f46 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -1688,6 +1688,7 @@
>  # Query for FDT element (node or property). Requires 'libfdt'.
>  #
>  # @nodepath: the path of the FDT node to be retrieved
> +# @propname: name of the property inside the node
>  #
>  # Features:
>  # @unstable: This command is meant for debugging.
> @@ -1697,6 +1698,7 @@
>  # Since: 7.2
>  ##
>  { 'command': 'x-query-fdt',
> -  'data': { 'nodepath': 'str' },
> +  'data': { 'nodepath': 'str',
> +'*propname': 'str' },
>'returns': 'HumanReadableText',
>'features': [ 'unstable' ]  }
> diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
> index 9e681739bd..523c9b8d4d 100644
> --- a/softmmu/device_tree.c
> +++ b/softmmu/device_tree.c
> @@ -823,23 +823,42 @@ static void fdt_format_node(GString *buf, int node, int depth,
>  g_string_append_printf(buf, "%*s}\n", padding, "");
>  }
>  
> -HumanReadableText *qemu_fdt_qmp_query_fdt(const char *nodepath, Error **errp)
> +HumanReadableText *qemu_fdt_qmp_query_fdt(const char *nodepath,
> +  bool has_propname,
> +  const char *propname,
> +  Error **errp)
>  {
>  g_autoptr(GString) buf = g_string_new("");
> -int node;
> +const struct fdt_property *prop = NULL;
> +void *fdt = current_machine->fdt;
> +int node, prop_size;
>  
> -if (!current_machine->fdt) {
> +if (!fdt) {
>  error_setg(errp, "Unable to find the machine FDT");
>  return NULL;
>  }
>  
> -node = fdt_path_offset(current_machine->fdt, nodepath);
> +node = fdt_path_offset(fdt, nodepath);
>  if (node < 0) {
>  error_setg(errp, "node '%s' not found in FDT", nodepath);
>  return NULL;
>  }
>  
> -fdt_format_node(buf, node, 0, nodepath);
> +if (!has_propname) {
> +fdt_format_node(buf, node, 0, nodepath);
> +} else {
> +g_autofree char *proppath = g_strdup_printf("%s/%s", nodepath,
> +propname);
> +
> +prop = fdt_get_property(fdt, node, propname, &prop_size);
> +if (!prop) {
> +error_setg(errp, "property '%s' not found in node '%s' in FDT",
> +   propname, nodepath);
> +return NULL;
> +}
> +
> +fdt_format_property(buf, proppath, prop->data, prop_size, 0);
> +}
>  
>  return human_readable_text_from_str(buf);
>  }

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [RFC PATCH 01/13] target/ppc: define PPC_INTERRUPT_* values directly

2022-08-18 Thread David Gibson
On Mon, Aug 15, 2022 at 01:20:07PM -0300, Matheus Ferst wrote:
> This enum defines the bit positions in env->pending_interrupts for each
> interrupt. However, except for the comparison in kvmppc_set_interrupt,
> the values are always used as (1 << PPC_INTERRUPT_*). Define them
> directly like that to save some clutter. No functional change intended.
> 
> Signed-off-by: Matheus Ferst 

Good idea.
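
For anyone skimming the thread, the shape of the change is roughly the following.  This is a sketch with made-up names (IRQ_* and set_irq() are not the actual QEMU identifiers); it only illustrates storing the masks directly instead of bit positions.

```c
#include <assert.h>
#include <stdint.h>

/* Made-up names; the values are masks, not bit positions. */
enum {
    IRQ_RESET  = 0x1,
    IRQ_WAKEUP = 0x2,
    IRQ_MCK    = 0x4,
    IRQ_EXT    = 0x8,
};

/* Raise or lower one interrupt in a pending mask and return the result. */
static uint32_t set_irq(uint32_t pending, uint32_t irq, int level)
{
    /* exactly one bit must be set in irq */
    assert(irq != 0 && (irq & (irq - 1)) == 0);

    if (level) {
        pending |= irq;     /* previously: pending |= 1 << n_IRQ */
    } else {
        pending &= ~irq;    /* previously: pending &= ~(1 << n_IRQ) */
    }
    return pending;
}
```

Any caller that still needs the bit position (the commit message mentions the comparison in kvmppc_set_interrupt) has to be checked separately.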

Reviewed-by: David Gibson 

> ---
>  hw/ppc/ppc.c | 10 +++---
>  hw/ppc/trace-events  |  2 +-
>  target/ppc/cpu.h | 40 +++---
>  target/ppc/cpu_init.c| 56 +++---
>  target/ppc/excp_helper.c | 74 
>  target/ppc/misc_helper.c |  6 ++--
>  6 files changed, 94 insertions(+), 94 deletions(-)
> 
> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
> index 690f448cb9..77e611e81c 100644
> --- a/hw/ppc/ppc.c
> +++ b/hw/ppc/ppc.c
> @@ -40,7 +40,7 @@
>  static void cpu_ppc_tb_stop (CPUPPCState *env);
>  static void cpu_ppc_tb_start (CPUPPCState *env);
>  
> -void ppc_set_irq(PowerPCCPU *cpu, int n_IRQ, int level)
> +void ppc_set_irq(PowerPCCPU *cpu, int irq, int level)
>  {
>  CPUState *cs = CPU(cpu);
>  CPUPPCState *env = &cpu->env;
> @@ -56,21 +56,21 @@ void ppc_set_irq(PowerPCCPU *cpu, int n_IRQ, int level)
>  old_pending = env->pending_interrupts;
>  
>  if (level) {
> -env->pending_interrupts |= 1 << n_IRQ;
> +env->pending_interrupts |= irq;
>  cpu_interrupt(cs, CPU_INTERRUPT_HARD);
>  } else {
> -env->pending_interrupts &= ~(1 << n_IRQ);
> +env->pending_interrupts &= ~irq;
>  if (env->pending_interrupts == 0) {
>  cpu_reset_interrupt(cs, CPU_INTERRUPT_HARD);
>  }
>  }
>  
>  if (old_pending != env->pending_interrupts) {
> -kvmppc_set_interrupt(cpu, n_IRQ, level);
> +kvmppc_set_interrupt(cpu, irq, level);
>  }
>  
>  
> -trace_ppc_irq_set_exit(env, n_IRQ, level, env->pending_interrupts,
> +trace_ppc_irq_set_exit(env, irq, level, env->pending_interrupts,
> CPU(cpu)->interrupt_request);
>  
>  if (locked) {
> diff --git a/hw/ppc/trace-events b/hw/ppc/trace-events
> index 5c0a215cad..c9ee1285b8 100644
> --- a/hw/ppc/trace-events
> +++ b/hw/ppc/trace-events
> @@ -116,7 +116,7 @@ ppc40x_set_tb_clk(uint32_t value) "new frequency %" PRIu32
>  ppc40x_timers_init(uint32_t value) "frequency %" PRIu32
>  
>  ppc_irq_set(void *env, uint32_t pin, uint32_t level) "env [%p] pin %d level %d"
> -ppc_irq_set_exit(void *env, uint32_t n_IRQ, uint32_t level, uint32_t pending, uint32_t request) "env [%p] n_IRQ %d level %d => pending 0x%08" PRIx32 " req 0x%08" PRIx32
> +ppc_irq_set_exit(void *env, uint32_t irq, uint32_t level, uint32_t pending, uint32_t request) "env [%p] irq 0x%05" PRIx32 " level %d => pending 0x%08" PRIx32 " req 0x%08" PRIx32
>  ppc_irq_set_state(const char *name, uint32_t level) "\"%s\" level %d"
>  ppc_irq_reset(const char *name) "%s"
>  ppc_irq_cpu(const char *action) "%s"
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index a4c893cfad..c7864bb3b1 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -2418,27 +2418,27 @@ enum {
>  /* Hardware exceptions definitions */
>  enum {
>  /* External hardware exception sources */
> -PPC_INTERRUPT_RESET = 0,  /* Reset exception  */
> -PPC_INTERRUPT_WAKEUP, /* Wakeup exception */
> -PPC_INTERRUPT_MCK,/* Machine check exception  */
> -PPC_INTERRUPT_EXT,/* External interrupt   */
> -PPC_INTERRUPT_SMI,/* System management interrupt  */
> -PPC_INTERRUPT_CEXT,   /* Critical external interrupt  */
> -PPC_INTERRUPT_DEBUG,  /* External debug exception */
> -PPC_INTERRUPT_THERM,  /* Thermal exception*/
> +PPC_INTERRUPT_RESET = 0x1,  /* Reset exception */
> +PPC_INTERRUPT_WAKEUP= 0x2,  /* Wakeup exception */
> +PPC_INTERRUPT_MCK   = 0x4,  /* Machine check exception */
> +PPC_INTERRUPT_EXT   = 0x8,  /* External interrupt */
> +PPC_INTERRUPT_SMI   = 0x00010,  /* System management interrupt */
> +PPC_INTERRUPT_CEXT  = 0x00020,  /* Critical external interrupt */

Re: [PATCH for-7.2 v3 16/20] device_tree.c: support string array prop in fdt_format_node()

2022-08-18 Thread David Gibson
On Tue, Aug 16, 2022 at 02:34:24PM -0300, Daniel Henrique Barboza wrote:
> To support printing string properties in 'info fdt' we need to determine
> whether a void data might contain a string array.
> 
> We do that by casting the void data to a string array and:
> 
> - check if the array finishes with a null character
> - check if there's no empty string in the middle of the array (i.e.
> consecutive \0\0 characters)
> - check if all characters of each substring are printable
> 
> If all conditions are met, we'll consider it to be a string array data
> type and print it accordingly. After this change, 'info fdt' is now able
> to print string arrays. Here's an example of string arrays we're able to
> print in the /rtas node of the ppc64 pSeries machine:
> 
> (qemu) info fdt /rtas
> rtas {
> (...)
> qemu,hypertas-functions = "hcall-memop1";
> ibm,hypertas-functions = "hcall-pft","hcall-term","hcall-dabr",
> "hcall-interrupt","hcall-tce","hcall-vio","hcall-splpar","hcall-join",
> "hcall-bulk","hcall-set-mode","hcall-sprg0","hcall-copy","hcall-debug",
> "hcall-vphn","hcall-multi-tce","hcall-hpt-resize","hcall-watchdog";

nit: typical dts style would have extra spaces:
prop = "foo", "bar";
i.e. separated by ", " not just ",".
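
Roughly what that looks like, as a standalone sketch using plain snprintf() rather than GString.  format_string_list() is a made-up name, and it assumes the data already passed the string-array check (non-empty and NUL-terminated):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Made-up helper: format a NUL-separated string list as
 *     name = "a", "b";
 * into out[] and return the number of characters written (excluding the
 * terminating NUL), or -1 if out[] is too small.
 */
static int format_string_list(char *out, size_t outlen, const char *name,
                              const char *data, int size)
{
    const char *str = data;
    size_t used;
    int n;

    assert(size > 0 && data[size - 1] == '\0');

    n = snprintf(out, outlen, "%s = ", name);
    if (n < 0 || (size_t)n >= outlen) {
        return -1;
    }
    used = (size_t)n;

    while (str < data + size) {
        /* ", " between entries, nothing before the first one */
        n = snprintf(out + used, outlen - used, "%s\"%s\"",
                     str == data ? "" : ", ", str);
        if (n < 0 || (size_t)n >= outlen - used) {
            return -1;
        }
        used += (size_t)n;
        str += strlen(str) + 1;
    }

    n = snprintf(out + used, outlen - used, ";");
    if (n < 0 || (size_t)n >= outlen - used) {
        return -1;
    }
    return (int)(used + (size_t)n);
}
```

In the patch itself the equivalent change would presumably just be appending ", " instead of "," in fdt_prop_format_string_array().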

> }
> 
> 'qemu,hypertas-functions' is a property with a single string while
> 'ibm,hypertas-functions' is a string array.
> 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  softmmu/device_tree.c | 64 ++-
>  1 file changed, 63 insertions(+), 1 deletion(-)
> 
> diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
> index 3fb07b537f..d32d6856da 100644
> --- a/softmmu/device_tree.c
> +++ b/softmmu/device_tree.c
> @@ -663,6 +663,63 @@ void qemu_fdt_qmp_dumpdtb(const char *filename, Error **errp)
>  error_setg(errp, "Error when saving machine FDT to file %s", filename);
>  }
>  
> +static bool fdt_prop_is_string_array(const void *data, int size)
> +{
> +const char *str_arr, *str;
> +int i, str_len;
> +
> +str_arr = str = data;
> +
> +if (size <= 0 || str_arr[size - 1] != '\0') {
> +return false;
> +}
> +
> +while (str < str_arr + size) {
> +str_len = strlen(str);
> +
> +/*
> + * Do not consider empty strings (consecutives \0\0)
> + * as valid.
> + */
> +if (str_len == 0) {
> +return false;
> +}
> +
> +for (i = 0; i < str_len; i++) {
> +if (!g_ascii_isprint(str[i])) {
> +return false;
> +}
> +}
> +
> +str += str_len + 1;
> +}
> +
> +return true;
> +}
> +
> +static void fdt_prop_format_string_array(GString *buf,
> + const char *propname,
> + const char *data,
> + int prop_size, int padding)
> +{
> +const char *str = data;
> +
> +g_string_append_printf(buf, "%*s%s = ", padding, "", propname);
> +
> +while (str < data + prop_size) {
> +/* appends up to the next '\0' */
> +g_string_append_printf(buf, "\"%s\"", str);
> +
> +str += strlen(str) + 1;
> +if (str < data + prop_size) {
> +/* add a comma separator for the next string */
> +g_string_append_printf(buf, ",");
> +}
> +}
> +
> +g_string_append_printf(buf, ";\n");
> +}
> +
>  static void fdt_format_node(GString *buf, int node, int depth)
>  {
>  const struct fdt_property *prop = NULL;
> @@ -681,7 +738,12 @@ static void fdt_format_node(GString *buf, int node, int depth)
>  prop = fdt_get_property_by_offset(fdt, property, &prop_size);
>  propname = fdt_string(fdt, fdt32_to_cpu(prop->nameoff));
>  
> -g_string_append_printf(buf, "%*s%s;\n", padding, "", propname);
> +if (fdt_prop_is_string_array(prop->data, prop_size)) {
> +fdt_prop_format_string_array(buf, propname, prop->data,
> + prop_size, padding);
> +} else {
> +g_string_append_printf(buf, "%*s%s;\n", padding, "", propname);
> +}
>  }
>  
>  padding -= 4;

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH for-7.2 v3 18/20] device_node.c: enable 'info fdt' to print subnodes

2022-08-18 Thread David Gibson
On Tue, Aug 16, 2022 at 02:34:26PM -0300, Daniel Henrique Barboza wrote:
> Printing subnodes of a given node will allow us to show a whole subtree,
> with the additional perk of 'info fdt /' being able to print the whole
> FDT.
> 
> Since we're now printing more than one subnode, change 'fdt_info' to
> print the full path of the first node. This small tweak helps
> identifying which node or subnode are being displayed.
> 
> To demonstrate this capability without printing the whole FDT, the
> '/cpus/cpu-map' node from the ARM 'virt' machine has a lot of subnodes:
> 
> (qemu) info fdt /cpus/cpu-map
> /cpus/cpu-map {
> socket0 {
> cluster0 {
> core0 {
> cpu = <0x8001>
> }
> }
> }
> }

nit: dts format requires a ; after each closing }
foo {
bar {
};
};
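
As a sketch of the recursion with the terminating ';' in place.  This uses a toy tree structure and made-up names purely for illustration (it is not a libfdt structure and not the code from the patch), and it simply asserts that the output buffer is large enough:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy tree purely for illustration; this is not a libfdt structure. */
struct toy_node {
    const char *name;
    const struct toy_node *child;    /* first child, or NULL */
    const struct toy_node *sibling;  /* next sibling, or NULL */
};

/* Append dts-style text for 'node' to out[]; returns the new length. */
static size_t toy_format_node(char *out, size_t outlen, size_t used,
                              const struct toy_node *node, int depth)
{
    const struct toy_node *c;
    int n;

    n = snprintf(out + used, outlen - used, "%*s%s {\n",
                 depth * 4, "", node->name);
    assert(n >= 0 && (size_t)n < outlen - used);  /* sketch: no truncation */
    used += (size_t)n;

    for (c = node->child; c != NULL; c = c->sibling) {
        used = toy_format_node(out, outlen, used, c, depth + 1);
    }

    /* note the ';' after the closing brace */
    n = snprintf(out + used, outlen - used, "%*s};\n", depth * 4, "");
    assert(n >= 0 && (size_t)n < outlen - used);
    used += (size_t)n;

    return used;
}
```

In the patch, the corresponding change would be to the format string that emits the closing brace in fdt_format_node().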

> Signed-off-by: Daniel Henrique Barboza 
> ---
>  softmmu/device_tree.c | 21 +
>  1 file changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
> index 43f96e371b..a6bfbc0617 100644
> --- a/softmmu/device_tree.c
> +++ b/softmmu/device_tree.c
> @@ -766,17 +766,26 @@ static void fdt_prop_format_val(GString *buf, const char *propname,
>  g_string_append_printf(buf, "];\n");
>  }
>  
> -static void fdt_format_node(GString *buf, int node, int depth)
> +
> +static void fdt_format_node(GString *buf, int node, int depth,
> +const char *fullpath)
>  {
>  const struct fdt_property *prop = NULL;
> +const char *nodename = NULL;
>  const char *propname = NULL;
>  void *fdt = current_machine->fdt;
>  int padding = depth * 4;
>  int property = 0;
> +int parent = node;
>  int prop_size;
>  
> -g_string_append_printf(buf, "%*s%s {\n", padding, "",
> -   fdt_get_name(fdt, node, NULL));
> +if (fullpath != NULL) {
> +nodename = fullpath;
> +} else {
> +nodename = fdt_get_name(fdt, node, NULL);
> +}
> +
> +g_string_append_printf(buf, "%*s%s {\n", padding, "", nodename);
>  
>  padding += 4;
>  
> @@ -801,6 +810,10 @@ static void fdt_format_node(GString *buf, int node, int depth)
>  }
>  }
>  
> +fdt_for_each_subnode(node, fdt, parent) {
> +fdt_format_node(buf, node, depth + 1, NULL);
> +}
> +
>  padding -= 4;
>  g_string_append_printf(buf, "%*s}\n", padding, "");
>  }
> @@ -821,7 +834,7 @@ HumanReadableText *qemu_fdt_qmp_query_fdt(const char *nodepath, Error **errp)
>  return NULL;
>  }
>  
> -fdt_format_node(buf, node, 0);
> +fdt_format_node(buf, node, 0, nodepath);
>  
>  return human_readable_text_from_str(buf);
>  }

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH for-7.2 v3 10/20] hw/ppc: set machine->fdt in spapr machine

2022-08-18 Thread David Gibson
On Tue, Aug 16, 2022 at 02:34:18PM -0300, Daniel Henrique Barboza wrote:
> The pSeries machine never bothered with the common machine->fdt
> attribute. We do all the FDT related work using spapr->fdt_blob.
> 
> We're going to introduce HMP commands to read and save the FDT, which
> will rely on setting machine->fdt properly to work across all machine
> archs/types.
> 
> Let's set machine->fdt in the two places where we manipulate the FDT:
> spapr_machine_reset() and CAS.

So, there's a third place where fdt_blob is updated, in h_update_dt();
that happens because SLOF can make some updates to the DT that qemu
needs to be aware of.  It's kinda ugly, and is a consequence of the
fact that qemu and SLOF kind of share the role of "platform firmware"
for spapr.

But.. it's worse than that.  Those are the only 3 places we actually
alter fdt_blob, but not the only places we logically update the device
tree.  Up until now there wasn't a way to introspect the DT, and so we
didn't bother keeping spapr->fdt_blob updated.  Essentially, we
considered maintaining the DT image the job of the guest after CAS.

Specifically, every dynamic reconfiguration event (hotplug or unplug)
alters the device tree.  We generate an fdt fragment for the new
device then stream that as an update to the guest using the PAPR
specified interface (rtas_ibm_configure_connector).  As noted we
currently don't update qemu's global fdt image based on that.  On hot
unplug logically we need to revert those changes, which is actually
pretty tricky, but currently the guest's job.


Really, the trouble is that just dumping or viewing the dt is only
simple in an "embedded" style environment where the fdt is generated
then spit into the guest.  In an actual open firmware environment like
spapr, the DT is logically a dynamic thing maintained by firmware -
but because "firmware"'s responsibility is split between SLOF and
RTAS/qemu, keeping track of that is pretty nasty.  For an environment
like this, the flat tree format isn't really suited either - we'd want
a dynamic representation of the tree.  We get away with flat trees for
now (barely) only because we mostly delegate the responsibility for
managing the tree to SLOF and/or the OS kernel, both of which do use
non-flat representations of the tree.

> spapr->fdt_blob is left untouched for now. To replace it with
> machine->fdt, since we're migrating spapr->fdt_blob, we would need to
> migrate machine->fdt as well. This is something that we would like to
> do to keep our code simpler, but it's work we'll do another day.
> 
> Cc: Cédric Le Goater 
> Cc: qemu-...@nongnu.org
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/ppc/spapr.c   | 6 ++
>  hw/ppc/spapr_hcall.c | 8 
>  2 files changed, 14 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index bc9ba6e6dc..7031cf964a 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1713,6 +1713,12 @@ static void spapr_machine_reset(MachineState *machine)
>  spapr->fdt_initial_size = spapr->fdt_size;
>  spapr->fdt_blob = fdt;
>  
> +/*
> + * Set the common machine->fdt pointer to enable support
> + * for 'dumpdtb' and 'info fdt' QMP/HMP commands.
> + */
> +machine->fdt = fdt;
> +
>  /* Set up the entry state */
>  first_ppc_cpu->env.gpr[5] = 0;
>  
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index a8d4a6bcf0..a53bfd76f4 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1256,6 +1256,14 @@ target_ulong do_client_architecture_support(PowerPCCPU *cpu,
>  spapr->fdt_initial_size = spapr->fdt_size;
>  spapr->fdt_blob = fdt;
>  
> +/*
> + * Set the machine->fdt pointer again since we just freed
> + * it above (by freeing spapr->fdt_blob). We set this
> + * pointer to enable support for 'dumpdtb' and 'info fdt'
> + * QMP/HMP commands.
> + */
> +MACHINE(spapr)->fdt = fdt;
> +
>  return H_SUCCESS;
>  }
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH for-7.2 v2 10/20] hw/ppc: set machine->fdt in spapr machine

2022-08-14 Thread David Gibson
On Fri, Aug 12, 2022 at 07:23:09PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 8/8/22 00:26, David Gibson wrote:
> > On Fri, Aug 05, 2022 at 06:39:38AM -0300, Daniel Henrique Barboza wrote:
> > > The pSeries machine never bothered with the common machine->fdt
> > > attribute. We do all the FDT related work using spapr->fdt_blob.
> > > 
> > > We're going to introduce HMP commands to read and save the FDT, which
> > > will rely on setting machine->fdt properly to work across all machine
> > > archs/types.
> > > 
> > > Let's set machine->fdt in the two places where we manipulate the FDT:
> > > spapr_machine_reset() and CAS. spapr->fdt_blob is left untouched: what
> > > we want is a way to access the FDT from HMP, not replace
> > > spapr->fdt_blob.
> > 
> > Given there is now an fdt field in the generic MACHINE structure, we
> > should be able to remove the one in spapr->fdt_blob, yes?
> 
> I thought about it but I backed down when I realized that spapr->fdt_blob is
> being migrated.
> 
> At first glance it would be a matter of migrating ms->fdt and then removing
> spapr->fdt_blob, but then I got confused about how a migration to an older
> version would occur (e.g. QEMU 7.2 with ms->fdt to a QEMU 7.0 with
> spapr->fdt_blob).
> 
> Migration to a newer QEMU would require us to move the spapr->version_id to 4
> and then handle the old version accordingly in spapr_post_load().
> 
> Probably something to think about after this work is accepted.

Fair enough.  I'm confident the migration semantics can be worked out,
but it will require some care.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH for-7.2 v2 01/20] hw/arm: do not free machine->fdt in arm_load_dtb()

2022-08-14 Thread David Gibson
On Fri, Aug 12, 2022 at 07:03:26PM -0300, Daniel Henrique Barboza wrote:
> David,
> 
> On 8/8/22 00:23, David Gibson wrote:
> > On Fri, Aug 05, 2022 at 06:39:29AM -0300, Daniel Henrique Barboza wrote:
> > > At this moment, arm_load_dtb() can free machine->fdt when
> > > binfo->dtb_filename is NULL. If there's no 'dtb_filename', 'fdt' will be
> > > retrieved by binfo->get_dtb(). If get_dtb() returns machine->fdt, as is
> > > the case of machvirt_dtb() from hw/arm/virt.c, fdt now has a pointer to
> > > machine->fdt. And, in that case, the existing g_free(fdt) at the end of
> > > arm_load_dtb() will make machine->fdt point to an invalid memory region.
> > > 
> > > This is not an issue right now because there's no code that access
> > > > machine->fdt after arm_load_dtb(), but we're going to add a couple of
> > > FDT HMP commands that will rely on machine->fdt being valid.
> > > 
> > > Instead of freeing 'fdt' at the end of arm_load_dtb(), assign it to
> > > machine->fdt. This will allow the FDT of ARM machines that relies on
> > > arm_load_dtb() to be accessed later on.
> > > 
> > > Since all ARM machines allocates the FDT only once, we don't need to
> > > worry about leaking the existing FDT during a machine reset (which is
> > > something that other machines have to look after, e.g. the ppc64 pSeries
> > > machine).
> > > 
> > > Cc: Peter Maydell 
> > > Cc: qemu-...@nongnu.org
> > > Signed-off-by: Daniel Henrique Barboza 
> > > ---
> > >   hw/arm/boot.c | 8 +++-
> > >   1 file changed, 7 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/arm/boot.c b/hw/arm/boot.c
> > > index ada2717f76..9f5ceb62d2 100644
> > > --- a/hw/arm/boot.c
> > > +++ b/hw/arm/boot.c
> > > @@ -684,7 +684,13 @@ int arm_load_dtb(hwaddr addr, const struct 
> > > arm_boot_info *binfo,
> > >*/
> > >   rom_add_blob_fixed_as("dtb", fdt, size, addr, as);
> > > -g_free(fdt);
> > > +/*
> > > + * Update the ms->fdt pointer to enable support for 'dumpdtb'
> > > + * and 'info fdt' commands. Use fdt_pack() to shrink the blob
> > > + * size we're going to store.
> > > + */
> > > +fdt_pack(fdt);
> > > +ms->fdt = fdt;
> > >   return size;
> > 
> > fdt_pack() could change (reduce) the effective size of the dtb blob,
> > so returning a 'size' value from above rather than the new value of
> > > fdt_totalsize(fdt) doesn't seem right.
> 
> After some thought I think executing fdt_pack() like I'm doing here is not
> a good idea. The first problem is that I'm not returning the updated size,
> as you've said.
> 
> But I can't just amend a 'return fdt_totalsize(fdt);' either. I'm packing the
> FDT **after** the machine store it in the guest physical memory. If I return
> the packed size, but the machine isn't packing the FDT before a
> cpu_physical_memory_write(), I'll be under-reporting the FDT size written.

Ah... good point.

> Machines such as e500 (patch 4) uses this returned value to put more stuff in
> the guest memory. In that case, returning a smaller size than what was actually
> written can cause the machine to overwrite the FDT by accident. In fact, only
> a handful of machines (ppc/pseries, ppc/pnv, riscv, openrisc) is doing a
> fdt_pack() before a cpu_physical_memory_write(). So this change would be
> potentially harmful to a lot of people.
> 
> One alternative would be to do a fdt_pack() before the machine writes the
> FDT in the guest memory, but that is too intrusive to do because I can't say
> if each of these machines will be OK with that. I have a hunch that it would
> be OK, but a hunch isn't going to cut it.

Hmm.. I'd be fairly confident that would be ok.  It certainly should
be ok for the fdt content itself, fdt_pack() doesn't change that
semantically.  If the machine is using that size to put stuff after, I
can't really see how that could break, either.  Unless the machine
were using the fdtsize in one place and a fixed size somewhere else to
address the same data, which would be a bug.

> I'll just drop the fdt_pack() for each of these patches. If the machine code
> is already packing it, fine. If not, that's fine too. Each maintainer can
> assess whether packing the FDT is worth it or not.

That's probably reasonable for the time being.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH for-7.2 v2 16/20] device_tree.c: support string props in fdt_format_node()

2022-08-10 Thread David Gibson
On Wed, Aug 10, 2022 at 04:40:18PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 8/8/22 01:36, David Gibson wrote:
> > On Fri, Aug 05, 2022 at 06:39:44AM -0300, Daniel Henrique Barboza wrote:
> > > To support printing string properties in 'info fdt' we need to determine
> > > whether a void data might contain a string.
> > 
> > Oh... sorry, obviously I hadn't read these later patches when I
> > complained about the command not printing property values.
> > 
> > > 
> > > We do that by casting the void data to a string array and:
> > > 
> > > - check if the array finishes with a null character
> > > - check if all characters are printable
> > 
> > This won't handle the case of the "string list" several strings tacked
> > together, separated by their terminating \0 characters.
> 
> > Hmm, how is this printed? Should we concatenate them? Replace the \0
> with a whitespace? Or ignore the zero and concatenate them?
> 
> E.g. this is a\0string\0list
> 
> Should we print it as:
> 
> this is astringlist
> 
> or
> 
> this is a string list ?

Well, if you're going for dts like output, which you seem to be, you
have two options:

1) Escape the medial nulls

"this\0is\0a\0string\0list"

2) Multiple strings:

"this", "is", "a", "string", "list"

Both forms are allowed in dts and will result in an identical
bytestring in the property.
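
A standalone sketch of option 1, with a made-up helper name.  One caveat that is an assumption of this sketch rather than something from the patch: dts string escapes accept octal digits, so spelling the NUL as "\0" is only unambiguous when the next character is not an octal digit; a more careful version would use a longer escape.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Made-up helper: write a NUL-separated string list as one dts string,
 * with each medial NUL spelled as the two characters '\' and '0'.
 * The final terminating NUL is implied by the closing quote.
 * Returns the length written, or -1 if out[] is too small.
 */
static int format_escaped_list(char *out, size_t outlen,
                               const char *data, int size)
{
    size_t used = 0;
    int i;

    assert(size > 0 && data[size - 1] == '\0');

    /* worst case: every byte becomes two chars, plus quotes and NUL */
    if (outlen < (size_t)size * 2 + 3) {
        return -1;
    }

    out[used++] = '"';
    for (i = 0; i < size - 1; i++) {
        if (data[i] == '\0') {
            out[used++] = '\\';
            out[used++] = '0';
        } else {
            out[used++] = data[i];
        }
    }
    out[used++] = '"';
    out[used] = '\0';

    return (int)used;
}
```

Option 2 produces longer but more readable output, which is probably the better fit for a command meant for humans.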

> > > If both conditions are met, we'll consider it to be a string data type
> > > and print it accordingly. After this change, 'info fdt' is now able to
> > > print string properties. Here's an example with the ARM 'virt' machine:
> > > 
> > > (qemu) info fdt /chosen
> > > chosen {
> > >  stdout-path = '/pl011@900'
> > >  rng-seed;
> > >  kaslr-seed;
> > > }
> > > 
> > > Signed-off-by: Daniel Henrique Barboza 
> > > ---
> > >   softmmu/device_tree.c | 25 -
> > >   1 file changed, 24 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
> > > index 3fb07b537f..8691c3ccc0 100644
> > > --- a/softmmu/device_tree.c
> > > +++ b/softmmu/device_tree.c
> > > @@ -663,6 +663,24 @@ void qemu_fdt_qmp_dumpdtb(const char *filename, 
> > > Error **errp)
> > >   error_setg(errp, "Error when saving machine FDT to file %s", 
> > > filename);
> > >   }
> > > +static bool fdt_prop_is_string(const void *data, int size)
> > > +{
> > > +const char *str = data;
> > > +int i;
> > > +
> > > +if (size <= 0 || str[size - 1] != '\0') {
> > > +return false;
> > > +}
> > > +
> > > +for (i = 0; i < size - 1; i++) {
> > > +if (!g_ascii_isprint(str[i])) {
> > > +return false;
> > > +}
> > > +}
> > > +
> > > +return true;
> > > +}
> > > +
> > >   static void fdt_format_node(GString *buf, int node, int depth)
> > >   {
> > >   const struct fdt_property *prop = NULL;
> > > @@ -681,7 +699,12 @@ static void fdt_format_node(GString *buf, int node, 
> > > int depth)
> > >   prop = fdt_get_property_by_offset(fdt, property, _size);
> > >   propname = fdt_string(fdt, fdt32_to_cpu(prop->nameoff));
> > > -g_string_append_printf(buf, "%*s%s;\n", padding, "", propname);
> > > +if (fdt_prop_is_string(prop->data, prop_size)) {
> > > +g_string_append_printf(buf, "%*s%s = '%s'\n",
> > > +   padding, "", propname, prop->data);
> > 
> > If you're going for dts like output, I'd suggest going all the way.
> > That means \" instead of \' and a ';' at the end of the line.
> > 
> > > +} else {
> > > +g_string_append_printf(buf, "%*s%s;\n", padding, "", 
> > > propname);
> > > +}
> > >   }
> > >   padding -= 4;
> > 
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH for-7.2 v2 15/20] qmp/hmp, device_tree.c: introduce 'info fdt' command

2022-08-07 Thread David Gibson
ror_report_err(local_err);
>  }
>  }
> +
> +void hmp_info_fdt(Monitor *mon, const QDict *qdict)
> +{
> +const char *nodepath = qdict_get_str(qdict, "nodepath");
> +Error *err = NULL;
> +g_autoptr(HumanReadableText) info = qmp_x_query_fdt(nodepath, &err);
> +
> +if (hmp_handle_error(mon, err)) {
> +return;
> +}
> +
> +monitor_printf(mon, "%s", info->human_readable_text);
> +}
> diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
> index 8415aca08c..db2c6aa7da 100644
> --- a/monitor/qmp-cmds.c
> +++ b/monitor/qmp-cmds.c
> @@ -603,9 +603,21 @@ void qmp_dumpdtb(const char *filename, Error **errp)
>  {
>  return qemu_fdt_qmp_dumpdtb(filename, errp);
>  }
> +
> +HumanReadableText *qmp_x_query_fdt(const char *nodepath, Error **errp)
> +{
> +return qemu_fdt_qmp_query_fdt(nodepath, errp);
> +}
>  #else
>  void qmp_dumpdtb(const char *filename, Error **errp)
>  {
>  error_setg(errp, "dumpdtb requires libfdt");
>  }
> +
> +HumanReadableText *qmp_x_query_fdt(const char *nodepath, Error **errp)
> +{
> +error_setg(errp, "this command requires libfdt");
> +
> +return NULL;
> +}
>  #endif
> diff --git a/qapi/machine.json b/qapi/machine.json
> index aeb013f3dd..96cff541ca 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -1681,3 +1681,22 @@
>  ##
>  { 'command': 'dumpdtb',
>'data': { 'filename': 'str' } }
> +
> +##
> +# @x-query-fdt:
> +#
> +# Query for FDT element (node or property). Requires 'libfdt'.
> +#
> +# @nodepath: the path of the FDT node to be retrieved
> +#
> +# Features:
> +# @unstable: This command is meant for debugging.
> +#
> +# Returns: FDT node
> +#
> +# Since: 7.2
> +##
> +{ 'command': 'x-query-fdt',
> +  'data': { 'nodepath': 'str' },
> +  'returns': 'HumanReadableText',
> +  'features': [ 'unstable' ]  }


A QMP command that returns human readable text, rather than something
JSON structured seems... odd.

Admittedly, *how* you'd JSON structure the results gets a bit tricky.
Listing nodes or property names would be easy enough, but getting the
property contents is hairy, since JSON strings are supposed to be
Unicode, but DT properties can be arbitrary bytestrings.

> diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
> index cd487ddd4d..3fb07b537f 100644
> --- a/softmmu/device_tree.c
> +++ b/softmmu/device_tree.c
> @@ -18,6 +18,7 @@
>  #endif
>  
>  #include "qapi/error.h"
> +#include "qapi/type-helpers.h"
>  #include "qemu/error-report.h"
>  #include "qemu/option.h"
>  #include "qemu/bswap.h"
> @@ -661,3 +662,49 @@ void qemu_fdt_qmp_dumpdtb(const char *filename, Error 
> **errp)
>  
>  error_setg(errp, "Error when saving machine FDT to file %s", filename);
>  }
> +
> +static void fdt_format_node(GString *buf, int node, int depth)
> +{
> +const struct fdt_property *prop = NULL;
> +const char *propname = NULL;
> +void *fdt = current_machine->fdt;
> +int padding = depth * 4;
> +int property = 0;
> +int prop_size;
> +
> +g_string_append_printf(buf, "%*s%s {\n", padding, "",
> +   fdt_get_name(fdt, node, NULL));
> +
> +padding += 4;
> +
> +fdt_for_each_property_offset(property, fdt, node) {
> +prop = fdt_get_property_by_offset(fdt, property, &prop_size);
> +propname = fdt_string(fdt, fdt32_to_cpu(prop->nameoff));
> +
> +g_string_append_printf(buf, "%*s%s;\n", padding, "", propname);
> +}
> +
> +padding -= 4;
> +g_string_append_printf(buf, "%*s}\n", padding, "");

So, this lists it in dts format... kind of.  Because you don't include
the property values, it makes it look like all properties are binary
(either absent or present-but-empty).  I think that's misleading.  If
you're only going to list properties, I think you'd be better off
using different formatting (and maybe a more specific command name as
well).

> +}
> +
> +HumanReadableText *qemu_fdt_qmp_query_fdt(const char *nodepath, Error **errp)
> +{
> +g_autoptr(GString) buf = g_string_new("");
> +int node;
> +
> +if (!current_machine->fdt) {
> +error_setg(errp, "Unable to find the machine FDT");
> +return NULL;
> +}
> +
> +node = fdt_path_offset(current_machine->fdt, nodepath);
> +if (node < 0) {
> +error_setg(errp, "node '%s' not found in FDT", nodepath);
> +return NULL;
> +}
> +
> +fdt_format_node(buf, node, 0);
> +
> +return human_readable_text_from_str(buf);
> +}

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [PATCH for-7.2 v2 16/20] device_tree.c: support string props in fdt_format_node()

2022-08-07 Thread David Gibson
On Fri, Aug 05, 2022 at 06:39:44AM -0300, Daniel Henrique Barboza wrote:
> To support printing string properties in 'info fdt' we need to determine
> whether a void data might contain a string.

Oh... sorry, obviously I hadn't read these later patches when I
complained about the command not printing property values.

> 
> We do that by casting the void data to a string array and:
> 
> - check if the array finishes with a null character
> - check if all characters are printable

This won't handle the case of the "string list" several strings tacked
together, separated by their terminating \0 characters.
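
To make that concrete, here is a sketch of a check that accepts string
lists as well as single strings.  It is an illustration only:
prop_is_string_list() is a hypothetical name, it uses isprint() from
the C library (in the default "C" locale) where the patch uses
g_ascii_isprint(), and the rule that empty strings are rejected is an
assumption of this sketch, not something the patch specifies.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Hypothetical variant of fdt_prop_is_string() that also accepts
 * string lists.  Rules assumed here:
 *  - the last byte must be '\0';
 *  - every other byte is either printable or a '\0' separator;
 *  - no empty strings: no leading NUL and no two NULs in a row.
 */
static bool prop_is_string_list(const void *data, int size)
{
    const unsigned char *str = data;
    int i;

    if (size <= 0 || str[size - 1] != '\0') {
        return false;
    }

    for (i = 0; i < size - 1; i++) {
        if (str[i] == '\0') {
            /* a separator must terminate a non-empty string */
            if (i == 0 || str[i - 1] == '\0') {
                return false;
            }
        } else if (!isprint(str[i])) {
            return false;
        }
    }

    /* reject a lone NUL and a trailing empty string ("abc\0\0") */
    return size > 1 && str[size - 2] != '\0';
}
```

With this, "this\0is\0a\0list" (15 bytes) is accepted, while a single
NUL byte, a buffer without a terminator, or one containing control
characters is not, so those would still fall through to the cell or
byte-array formats.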

> 
> If both conditions are met, we'll consider it to be a string data type
> and print it accordingly. After this change, 'info fdt' is now able to
> print string properties. Here's an example with the ARM 'virt' machine:
> 
> (qemu) info fdt /chosen
> chosen {
> stdout-path = '/pl011@900'
> rng-seed;
> kaslr-seed;
> }
> 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  softmmu/device_tree.c | 25 -
>  1 file changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
> index 3fb07b537f..8691c3ccc0 100644
> --- a/softmmu/device_tree.c
> +++ b/softmmu/device_tree.c
> @@ -663,6 +663,24 @@ void qemu_fdt_qmp_dumpdtb(const char *filename, Error 
> **errp)
>  error_setg(errp, "Error when saving machine FDT to file %s", filename);
>  }
>  
> +static bool fdt_prop_is_string(const void *data, int size)
> +{
> +const char *str = data;
> +int i;
> +
> +if (size <= 0 || str[size - 1] != '\0') {
> +return false;
> +}
> +
> +for (i = 0; i < size - 1; i++) {
> +if (!g_ascii_isprint(str[i])) {
> +return false;
> +}
> +}
> +
> +return true;
> +}
> +
>  static void fdt_format_node(GString *buf, int node, int depth)
>  {
>  const struct fdt_property *prop = NULL;
> @@ -681,7 +699,12 @@ static void fdt_format_node(GString *buf, int node, int 
> depth)
>  prop = fdt_get_property_by_offset(fdt, property, &prop_size);
>  propname = fdt_string(fdt, fdt32_to_cpu(prop->nameoff));
>  
> -g_string_append_printf(buf, "%*s%s;\n", padding, "", propname);
> +if (fdt_prop_is_string(prop->data, prop_size)) {
> +g_string_append_printf(buf, "%*s%s = '%s'\n",
> +   padding, "", propname, prop->data);

If you're going for dts like output, I'd suggest going all the way.
That means \" instead of \' and a ';' at the end of the line.

> +} else {
> +g_string_append_printf(buf, "%*s%s;\n", padding, "", propname);
> +}
>  }
>  
>  padding -= 4;

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [PATCH for-7.2 v2 17/20] device_tree.c: support remaining FDT prop types

2022-08-07 Thread David Gibson
On Fri, Aug 05, 2022 at 06:39:45AM -0300, Daniel Henrique Barboza wrote:
> When printing a blob with 'dtc' using the '-O dts' option there are 3
> distinct data types being printed: strings, arrays of uint32s and
> regular byte arrays.
> 
> Previous patch added support to print strings. Let's add the remaining
> formats. We want to resemble the format that 'dtc -O dts' uses, so every
> uint32 array uses angle brackets (<>), and regular byte array uses square
> brackets ([]). For properties that have no values we keep printing just
> the name.
> 
> The /chosen FDT node from the pSeries machine gives an example of all
> property types 'info fdt' is now able to display:
> 
> (qemu) info fdt /chosen
> chosen {
> ibm,architecture-vec-5 = [0 0]
> rng-seed = <0x5967a270 0x62b0fb4f 0x8262b46a 0xabf48423 0xcce9615 
> 0xf9daae64 0x66564790 0x357d1604>
> ibm,arch-vec-5-platform-support = <0x178018c0 0x19001a40>
> linux,pci-probe-only = <0x0>
> stdout-path = '/vdevice/vty@7100'
> linux,stdout-path = '/vdevice/vty@7100'
> qemu,graphic-depth = <0x20>
> qemu,graphic-height = <0x258>
> qemu,graphic-width = <0x320>
> }
> 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  softmmu/device_tree.c | 58 ++-
>  1 file changed, 57 insertions(+), 1 deletion(-)
> 
> diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
> index 8691c3ccc0..9d95e4120b 100644
> --- a/softmmu/device_tree.c
> +++ b/softmmu/device_tree.c
> @@ -681,6 +681,53 @@ static bool fdt_prop_is_string(const void *data, int 
> size)
>  return true;
>  }
>  
> +static bool fdt_prop_is_uint32_array(int size)
> +{
> +return size % 4 == 0;
> +}
> +
> +static void fdt_prop_format_uint32_array(GString *buf,
> + const char *propname,
> + const void *data,
> + int prop_size, int padding)
> +{
> +const fdt32_t *array = data;
> +int array_len = prop_size / 4;
> +int i;
> +
> +g_string_append_printf(buf, "%*s%s = <", padding, "", propname);
> +
> +for (i = 0; i < array_len; i++) {
> +g_string_append_printf(buf, "0x%" PRIx32, fdt32_to_cpu(array[i]));
> +
> +if (i < array_len - 1) {
> +g_string_append_printf(buf, " ");
> +}
> +}
> +
> +g_string_append_printf(buf, ">\n");

Add a ';' to match dts.
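
For reference, a standalone sketch of cell formatting with the
terminating ';'.  It is not the patch's code: format_cells() and
be32_load() are hypothetical names, the output goes to a caller
supplied buffer instead of a GString, and the cells are assembled byte
by byte so the load does not depend on the alignment of the property
data.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Load one big-endian 32-bit cell byte by byte; works at any alignment. */
static uint32_t be32_load(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper: render 'size' bytes (a non-zero multiple of 4)
 * as dts cells, e.g. "<0x20 0x258>;".  Returns the length written, or
 * -1 if the input is not cell-sized or 'out' is too small.
 */
static int format_cells(const void *data, int size,
                        char *out, size_t out_len)
{
    const unsigned char *p = data;
    size_t used;
    int i, n;

    if (size <= 0 || size % 4 != 0) {
        return -1;
    }

    n = snprintf(out, out_len, "<");
    if (n < 0 || (size_t)n >= out_len) {
        return -1;
    }
    used = (size_t)n;

    for (i = 0; i < size; i += 4) {
        /* cells are separated by a single space */
        n = snprintf(out + used, out_len - used, "%s0x%" PRIx32,
                     i ? " " : "", be32_load(p + i));
        if (n < 0 || (size_t)n >= out_len - used) {
            return -1;
        }
        used += (size_t)n;
    }

    n = snprintf(out + used, out_len - used, ">;");
    if (n < 0 || (size_t)n >= out_len - used) {
        return -1;
    }

    return (int)(used + (size_t)n);
}
```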

> +}
> +
> +static void fdt_prop_format_val(GString *buf, const char *propname,
> +const void *data, int prop_size,
> +int padding)
> +{
> +const char *val = data;
> +int i;
> +
> +g_string_append_printf(buf, "%*s%s = [", padding, "", propname);
> +
> +for (i = 0; i < prop_size; i++) {
> +g_string_append_printf(buf, "%x", val[i]);

For dts like output you need %02x.  In [] notation, the spaces are
actually optional and ignored, so [0 0] is equivalent to [00], which is
equivalent to "".
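
A standalone sketch of the byte-string form with %02x follows.
format_byte_array() is a hypothetical name and the buffer-based
interface is only there so it can be tried without GLib.  One detail
beyond the padding: the sketch reads the data as unsigned char.  With
a plain 'const char *', as in the hunk above, a byte such as 0xff can
be sign-extended on platforms where char is signed and come out as
ffffffff.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: render a property as a dts byte string,
 * e.g. "[00 0a ff];".  Bytes are read as unsigned char so values
 * >= 0x80 are not sign-extended before being printed with "%02x".
 * Returns the length written, or -1 if 'out' is too small.
 */
static int format_byte_array(const void *data, int size,
                             char *out, size_t out_len)
{
    const unsigned char *val = data;
    size_t used;
    int i, n;

    n = snprintf(out, out_len, "[");
    if (n < 0 || (size_t)n >= out_len) {
        return -1;
    }
    used = (size_t)n;

    for (i = 0; i < size; i++) {
        /* two hex digits per byte, single space between bytes */
        n = snprintf(out + used, out_len - used, "%s%02x",
                     i ? " " : "", (unsigned int)val[i]);
        if (n < 0 || (size_t)n >= out_len - used) {
            return -1;
        }
        used += (size_t)n;
    }

    n = snprintf(out + used, out_len - used, "];");
    if (n < 0 || (size_t)n >= out_len - used) {
        return -1;
    }

    return (int)(used + (size_t)n);
}
```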

> +
> +if (i < prop_size - 1) {
> +g_string_append_printf(buf, " ");
> +}
> +}
> +
> +g_string_append_printf(buf, "]\n");

;

> +}
> +
>  static void fdt_format_node(GString *buf, int node, int depth)
>  {
>  const struct fdt_property *prop = NULL;
> @@ -699,11 +746,20 @@ static void fdt_format_node(GString *buf, int node, int 
> depth)
>  prop = fdt_get_property_by_offset(fdt, property, &prop_size);
>  propname = fdt_string(fdt, fdt32_to_cpu(prop->nameoff));
>  
> +if (prop_size == 0) {
> +g_string_append_printf(buf, "%*s%s;\n", padding, "", propname);
> +continue;
> +}
> +
>  if (fdt_prop_is_string(prop->data, prop_size)) {
>  g_string_append_printf(buf, "%*s%s = '%s'\n",
>         padding, "", propname, prop->data);
> +} else if (fdt_prop_is_uint32_array(prop_size)) {
> +fdt_prop_format_uint32_array(buf, propname, prop->data, 
> prop_size,
> + padding);
>  } else {
> -g_string_append_printf(buf, "%*s%s;\n", padding, "", propname);
> +fdt_prop_format_val(buf, propname, prop->data,
> +prop_size, padding);
>  }
>  }
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [PATCH for-7.2 v2 09/20] hw/ppc: set machine->fdt in pnv_reset()

2022-08-07 Thread David Gibson
On Fri, Aug 05, 2022 at 09:31:11AM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 8/5/22 08:03, Frederic Barrat wrote:
> > 
> > 
> > On 05/08/2022 11:39, Daniel Henrique Barboza wrote:
> > > This will enable support for 'dumpdtb' and 'info fdt' HMP commands for
> > > all powernv machines.
> > > 
> > > Cc: Cédric Le Goater 
> > > Cc: Frederic Barrat 
> > > Signed-off-by: Daniel Henrique Barboza 
> > > ---
> > >   hw/ppc/pnv.c | 6 +-
> > >   1 file changed, 5 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
> > > index d3f77c8367..f5162f8b7b 100644
> > > --- a/hw/ppc/pnv.c
> > > +++ b/hw/ppc/pnv.c
> > > @@ -608,7 +608,11 @@ static void pnv_reset(MachineState *machine)
> > >   qemu_fdt_dumpdtb(fdt, fdt_totalsize(fdt));
> > >   cpu_physical_memory_write(PNV_FDT_ADDR, fdt, fdt_totalsize(fdt));
> > > -    g_free(fdt);
> > > +    /*
> > > + * Update the machine->fdt pointer to enable support for
> > > + * 'dumpdtb' and 'info fdt' commands.
> > > + */
> > > +    machine->fdt = fdt;
> > 
> > 
> > Can pnv_reset() be called several times in the same instance of the qemu 
> > process, in which case we leak memory?
> 
> hmmm I think it's possible if we issue a 'system_reset' via the
> monitor.

Right.  I'm not certain about pnv, but on most platforms there's a way
to trigger system_reset from the guest side as well.
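
Since reset can run more than once, whatever blob the machine holds
from the previous reset has to be released before the pointer is
replaced.  A minimal sketch of that ownership pattern, using plain
malloc()/free() and a stand-in struct instead of MachineState and
g_free(); all names here are made up for illustration:

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for the relevant part of MachineState; an assumption. */
struct machine {
    void *fdt;
};

/*
 * Hypothetical reset path: build a fresh blob and hand ownership to
 * the machine, releasing the blob left over from the previous reset.
 * free(NULL) is a no-op, so the first reset needs no special case.
 */
static int machine_reset(struct machine *m, const void *blob, size_t size)
{
    void *fdt = malloc(size);

    if (!fdt) {
        return -1;
    }
    memcpy(fdt, blob, size);

    free(m->fdt);       /* drop the old blob before overwriting the pointer */
    m->fdt = fdt;
    return 0;
}
```

The same shape works with g_free(), which also accepts NULL.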

> I'll put a g_free(machine->fdt) before the assignment.
> 
> 
> Daniel
> 
> 
> > 
> >    Fred
> > 
> > 
> > >   }
> > >   static ISABus *pnv_chip_power8_isa_create(PnvChip *chip, Error **errp)
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [PATCH for-7.2 v2 01/20] hw/arm: do not free machine->fdt in arm_load_dtb()

2022-08-07 Thread David Gibson
On Fri, Aug 05, 2022 at 06:39:29AM -0300, Daniel Henrique Barboza wrote:
> At this moment, arm_load_dtb() can free machine->fdt when
> binfo->dtb_filename is NULL. If there's no 'dtb_filename', 'fdt' will be
> retrieved by binfo->get_dtb(). If get_dtb() returns machine->fdt, as is
> the case of machvirt_dtb() from hw/arm/virt.c, fdt now has a pointer to
> machine->fdt. And, in that case, the existing g_free(fdt) at the end of
> arm_load_dtb() will make machine->fdt point to an invalid memory region.
> 
> This is not an issue right now because there's no code that accesses
> machine->fdt after arm_load_dtb(), but we're going to add a couple of
> FDT HMP commands that will rely on machine->fdt being valid.
> 
> Instead of freeing 'fdt' at the end of arm_load_dtb(), assign it to
> machine->fdt. This will allow the FDT of ARM machines that rely on
> arm_load_dtb() to be accessed later on.
> 
> Since all ARM machines allocate the FDT only once, we don't need to
> worry about leaking the existing FDT during a machine reset (which is
> something that other machines have to look after, e.g. the ppc64 pSeries
> machine).
> 
> Cc: Peter Maydell 
> Cc: qemu-...@nongnu.org
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/arm/boot.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/arm/boot.c b/hw/arm/boot.c
> index ada2717f76..9f5ceb62d2 100644
> --- a/hw/arm/boot.c
> +++ b/hw/arm/boot.c
> @@ -684,7 +684,13 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info 
> *binfo,
>   */
>  rom_add_blob_fixed_as("dtb", fdt, size, addr, as);
>  
> -g_free(fdt);
> +/*
> + * Update the ms->fdt pointer to enable support for 'dumpdtb'
> + * and 'info fdt' commands. Use fdt_pack() to shrink the blob
> + * size we're going to store.
> + */
> +fdt_pack(fdt);
> +ms->fdt = fdt;
>  
>  return size;

fdt_pack() could change (reduce) the effective size of the dtb blob,
so returning a 'size' value from above rather than the new value of
fdt_totalsize(fdt) doesn't seem right.

I believe some of the other patches in the series have similar concerns.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [PATCH for-7.2 v2 10/20] hw/ppc: set machine->fdt in spapr machine

2022-08-07 Thread David Gibson
On Fri, Aug 05, 2022 at 06:39:38AM -0300, Daniel Henrique Barboza wrote:
> The pSeries machine never bothered with the common machine->fdt
> attribute. We do all the FDT related work using spapr->fdt_blob.
> 
> We're going to introduce HMP commands to read and save the FDT, which
> will rely on setting machine->fdt properly to work across all machine
> archs/types.
> 
> Let's set machine->fdt in the two places where we manipulate the FDT:
> spapr_machine_reset() and CAS. spapr->fdt_blob is left untouched: what
> we want is a way to access the FDT from HMP, not replace
> spapr->fdt_blob.

Given there is now an fdt field in the generic MACHINE structure, we
should be able to remove the one in spapr->fdt_blob, yes?

> 
> Cc: Cédric Le Goater 
> Cc: qemu-...@nongnu.org
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/ppc/spapr.c   | 6 ++
>  hw/ppc/spapr_hcall.c | 8 
>  2 files changed, 14 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index bc9ba6e6dc..94c90f0351 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1713,6 +1713,12 @@ static void spapr_machine_reset(MachineState *machine)
>  spapr->fdt_initial_size = spapr->fdt_size;
>  spapr->fdt_blob = fdt;
>  
> +/*
> + * Set the common machine->fdt pointer to enable support
> + * for 'dumpdtb' and 'info fdt' commands.
> + */
> +machine->fdt = fdt;
> +
>  /* Set up the entry state */
>  first_ppc_cpu->env.gpr[5] = 0;
>  
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index a8d4a6bcf0..0079bc6fdc 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1256,6 +1256,14 @@ target_ulong do_client_architecture_support(PowerPCCPU 
> *cpu,
>  spapr->fdt_initial_size = spapr->fdt_size;
>  spapr->fdt_blob = fdt;
>  
> +/*
> + * Set the machine->fdt pointer again since we just freed
> + * it above (by freeing spapr->fdt_blob). We set this
> + * pointer to enable support for 'dumpdtb' and 'info fdt'
> + * HMP commands.
> + */
> +MACHINE(spapr)->fdt = fdt;
> +
>  return H_SUCCESS;
>  }
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [PATCH for-7.2 v2 14/20] qmp/hmp, device_tree.c: introduce dumpdtb

2022-08-07 Thread David Gibson
**errp)
> +{
> +return qemu_fdt_qmp_dumpdtb(filename, errp);
> +}
> +#else
> +void qmp_dumpdtb(const char *filename, Error **errp)
> +{
> +error_setg(errp, "dumpdtb requires libfdt");
> +}
> +#endif
> diff --git a/qapi/machine.json b/qapi/machine.json
> index 6afd1936b0..aeb013f3dd 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -1664,3 +1664,20 @@
>   '*size': 'size',
>   '*max-size': 'size',
>   '*slots': 'uint64' } }
> +
> +##
> +# @dumpdtb:
> +#
> +# Save the FDT in dtb format. Requires 'libfdt' support.
> +#
> +# @filename: name of the FDT file to be created
> +#
> +# Since: 7.2
> +#
> +# Example:
> +#   {"execute": "dumpdtb",
> +#"arguments": { "filename": "/tmp/fdt.dtb" } }
> +#
> +##
> +{ 'command': 'dumpdtb',
> +  'data': { 'filename': 'str' } }
> diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
> index 6ca3fad285..cd487ddd4d 100644
> --- a/softmmu/device_tree.c
> +++ b/softmmu/device_tree.c
> @@ -643,3 +643,21 @@ out:
>  g_free(propcells);
>  return ret;
>  }
> +
> +void qemu_fdt_qmp_dumpdtb(const char *filename, Error **errp)
> +{
> +int size;
> +
> +if (!current_machine->fdt) {
> +error_setg(errp, "Unable to find the machine FDT");
> +return;
> +}
> +
> +size = fdt_totalsize(current_machine->fdt);
> +
> +if (g_file_set_contents(filename, current_machine->fdt, size, NULL)) {
> +return;
> +}
> +
> +error_setg(errp, "Error when saving machine FDT to file %s", filename);
> +}

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [PATCH v2] target/ppc/kvm: Skip current and parent directories in kvmppc_find_cpu_dt

2022-07-12 Thread David Gibson
On Tue, Jul 12, 2022 at 06:08:10PM -0300, Murilo Opsfelder Araujo wrote:
> Some systems have /proc/device-tree/cpus/../clock-frequency. However,
> this is not the expected path for a CPU device tree directory.
> 
> Signed-off-by: Murilo Opsfelder Araujo 
> Signed-off-by: Fabiano Rosas 

Reviewed-by: David Gibson 

and, I believe, mea culpa.

> ---
> v2:
> - Skip current and parent directories.
> 
> v1: 
> https://lore.kernel.org/qemu-devel/20220711193743.51456-1-muri...@linux.ibm.com/
> 
>  target/ppc/kvm.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index 6eed466f80..466d0d2f4c 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -1877,6 +1877,12 @@ static int kvmppc_find_cpu_dt(char *buf, int buf_len)
>  buf[0] = '\0';
>  while ((dirp = readdir(dp)) != NULL) {
>  FILE *f;
> +
> +/* Don't accidentally read from the current and parent directories */
> +if (strcmp(dirp->d_name, ".") == 0 || strcmp(dirp->d_name, "..") == 
> 0) {
> +continue;
> +}
> +
>  snprintf(buf, buf_len, "%s%s/clock-frequency", PROC_DEVTREE_CPU,
>   dirp->d_name);
>  f = fopen(buf, "r");

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [RFC PATCH] target/ppc: don't print TB in ppc_cpu_dump_state if it's not initialized

2022-07-12 Thread David Gibson
On Tue, Jul 12, 2022 at 06:13:44PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 7/12/22 16:25, Matheus Ferst wrote:
> > When using "-machine none", env->tb_env is not allocated, causing the
> > segmentation fault reported in issue #85 (launchpad bug #811683). To
> > avoid this problem, check if the pointer != NULL before calling the
> > methods to print TBU/TBL/DECR.
> > 
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/85
> > Signed-off-by: Matheus Ferst 
> > ---
> > This patch fixes the reported problem, but may be an incomplete solution
> > since many other places dereference env->tb_env without checking for
> > NULL. AFAICS, "-machine none" is the only way to trigger this problem,
> > and I'm not familiar with the use-cases for this option.
> 
> The "none" machine type is mainly used by libvirt to do introspection
> of the available options/capabilities of the QEMU binary. It starts a QEMU
> process like the following:
> 
> ./x86_64-softmmu/qemu-system-x86_64 -S -no-user-config -nodefaults \
>   -nographic -machine none,accel=kvm:tcg -daemonize
> 
> And then it uses QMP to probe the binary.
> 
> Aside from this libvirt usage I am not aware of anyone else using -machine
> none extensively.

Right.  -machine none basically cannot work as a real machine for
POWER (maybe some other CPUs as well).  At least the more modern POWER
CPUs simply cannot boot without a bunch of supporting board/system
level elements, and there's not really a sane way to encode those into
individual emulated devices at present (maybe ever).

One of those things is that POWER expects the timebases to be
synchronized across all CPUs in the system, which obviously can't be
done locally to a single CPU chip.  It requires system level
operations, which is why it's handled by the machine type

[Example: a typical sequence which might be handled in hardware by
 low-level firmware would be to use machine-specific board-level
 registers to suspend the clock pulse to the CPUs which drives the
 timebase, then write the same value to the TB on each CPU, then
 (atomically) restart the clock pulse using board registers again]
 
> > Should we stop assuming env->tb_env != NULL and add checks everywhere?
> > Or should we find a way to provide Time Base/Decrementer for
> > "-machine none"?
> > ---
> 
> Are there other cases where env->tb_env can be NULL, aside from the case
> reported in the bug?

If there are, I'd say that's a bug in the machine type.  Setting up
(and synchronizing) the timebase is part of the machine's job.

> I don't mind the bug fix, but I'm not fond of the idea of adding additional
> checks because of this particular issue. I mean, the bug is using  the 'prep'
> machine that Thomas removed year ago in b2ce76a0730. If there's no other
> foreseeable problem, that we care about, with env->tb_env being NULL, IMO
> let's fix the bug and move on.
> 
> 
> 
> Thanks,
> 
> 
> Daniel
> 
> 
> 
> 
> >   target/ppc/cpu_init.c | 16 
> >   1 file changed, 8 insertions(+), 8 deletions(-)
> > 
> > diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
> > index 86ad28466a..7e96baac9f 100644
> > --- a/target/ppc/cpu_init.c
> > +++ b/target/ppc/cpu_init.c
> > @@ -7476,18 +7476,18 @@ void ppc_cpu_dump_state(CPUState *cs, FILE *f, int 
> > flags)
> >"%08x iidx %d didx %d\n",
> >env->msr, env->spr[SPR_HID0], env->hflags,
> >cpu_mmu_index(env, true), cpu_mmu_index(env, false));
> > -#if !defined(NO_TIMER_DUMP)
> > -qemu_fprintf(f, "TB %08" PRIu32 " %08" PRIu64
> > +if (env->tb_env) {
> > +qemu_fprintf(f, "TB %08" PRIu32 " %08" PRIu64
> >   #if !defined(CONFIG_USER_ONLY)
> > - " DECR " TARGET_FMT_lu
> > + " DECR " TARGET_FMT_lu
> >   #endif
> > - "\n",
> > - cpu_ppc_load_tbu(env), cpu_ppc_load_tbl(env)
> > + "\n",
> > + cpu_ppc_load_tbu(env), cpu_ppc_load_tbl(env)
> >   #if !defined(CONFIG_USER_ONLY)
> > - , cpu_ppc_load_decr(env)
> > -#endif
> > -);
> > + , cpu_ppc_load_decr(env)
> >   #endif
> > +);
> > +}
> >   for (i = 0; i < 32; i++) {
> >   if ((i & (RGPL - 1)) == 0) {
> >   qemu_fprintf(f, "GPR%02d", i);
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [PATCH] target/ppc/kvm: Skip ".." directory in kvmppc_find_cpu_dt

2022-07-12 Thread David Gibson
On Mon, Jul 11, 2022 at 04:37:43PM -0300, Murilo Opsfelder Araujo wrote:
> Some systems have /proc/device-tree/cpus/../clock-frequency. However,
> this is not the expected path for a CPU device tree directory.
> 
> Signed-off-by: Murilo Opsfelder Araujo 
> Signed-off-by: Fabiano Rosas 
> ---
>  target/ppc/kvm.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index 6eed466f80..c8485a5cc0 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -1877,6 +1877,12 @@ static int kvmppc_find_cpu_dt(char *buf, int buf_len)
>  buf[0] = '\0';
>  while ((dirp = readdir(dp)) != NULL) {
>  FILE *f;
> +
> +/* Don't accidentally read from the upper directory */
> +if (strcmp(dirp->d_name, "..") == 0) {

It might not be causing problems now, but it would be technically more
correct to also skip ".", wouldn't it?
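
Something along these lines would cover both entries; the helper name
is made up, and the loop could just as well keep the two strcmp()
calls inline:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: true for the "." and ".." entries that
 * readdir() returns for every directory.
 */
static bool is_dot_or_dotdot(const char *name)
{
    return strcmp(name, ".") == 0 || strcmp(name, "..") == 0;
}
```

In the loop this would read: if (is_dot_or_dotdot(dirp->d_name)) continue;
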

> +continue;
> +}
> +
>  snprintf(buf, buf_len, "%s%s/clock-frequency", PROC_DEVTREE_CPU,
>   dirp->d_name);
>  f = fopen(buf, "r");

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [RFC 07/18] vfio: Add base object for VFIOContainer

2022-04-29 Thread David Gibson
On Thu, Apr 14, 2022 at 03:46:59AM -0700, Yi Liu wrote:
> Qomify the VFIOContainer object which acts as a base class for a
> container. This base class is derived into the legacy VFIO container
> and later on, into the new iommufd based container.

You certainly need the abstraction, but I'm not sure QOM is the right
way to accomplish it in this case.  The QOM class of things is visible
to the user/config layer via QMP (and sometimes command line).  It
doesn't necessarily correspond to guest visible differences, but it
often does.

AIUI, the idea here is that the back end in use should be an
implementation detail which doesn't affect the interfaces outside the
vfio subsystem itself.  If that's the case QOM may not be a great
fit, even though you can probably make it work.
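
To illustrate what a non-QOM shape could look like: if the back end is
purely internal, a plain table of function pointers chosen at
container creation would give the same dispatch without exposing a
class hierarchy.  The sketch below is only an illustration of that
idea under my own assumptions; none of the names come from the series,
and the toy back end just stands in for the real ioctl-based code.

```c
#include <stddef.h>
#include <stdint.h>

struct container;

/* Back-end specific operations; all names here are hypothetical. */
struct container_ops {
    int (*dma_map)(struct container *c, uint64_t iova, size_t size,
                   void *vaddr);
    int (*dma_unmap)(struct container *c, uint64_t iova, size_t size);
};

struct container {
    const struct container_ops *ops;    /* chosen once, at creation */
    unsigned int nr_mappings;           /* generic state, back-end agnostic */
};

/* Generic wrappers: the only entry points the rest of the code uses. */
static int container_dma_map(struct container *c, uint64_t iova,
                             size_t size, void *vaddr)
{
    int ret = c->ops->dma_map(c, iova, size, vaddr);

    if (ret == 0) {
        c->nr_mappings++;
    }
    return ret;
}

static int container_dma_unmap(struct container *c, uint64_t iova,
                               size_t size)
{
    int ret = c->ops->dma_unmap(c, iova, size);

    if (ret == 0 && c->nr_mappings > 0) {
        c->nr_mappings--;
    }
    return ret;
}

/* Toy "legacy" back end: accepts any non-empty mapping. */
static int legacy_dma_map(struct container *c, uint64_t iova,
                          size_t size, void *vaddr)
{
    (void)c;
    (void)iova;
    (void)vaddr;
    return size ? 0 : -1;
}

static int legacy_dma_unmap(struct container *c, uint64_t iova,
                            size_t size)
{
    (void)c;
    (void)iova;
    (void)size;
    return 0;
}

static const struct container_ops legacy_ops = {
    .dma_map = legacy_dma_map,
    .dma_unmap = legacy_dma_unmap,
};
```

The trade-off is that this gives up QOM's introspection and property
machinery, which is the point if those should not be visible outside
the vfio code in the first place.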

> The base class implements generic code such as code related to
> memory_listener and address space management whereas the derived
> class implements callbacks that depend on the kernel user space
> being used.
> 
> 'as.c' only manipulates the base class object with wrapper functions
> that call the right class functions. Existing 'container.c' code is
> converted to implement the legacy container class functions.
> 
> Existing migration code only works with the legacy container.
> Also 'spapr.c' isn't BE agnostic.
> 
> Below is the object. It's named as VFIOContainer, old VFIOContainer
> is replaced with VFIOLegacyContainer.
> 
> struct VFIOContainer {
> /* private */
> Object parent_obj;
> 
> VFIOAddressSpace *space;
> MemoryListener listener;
> Error *error;
> bool initialized;
> bool dirty_pages_supported;
> uint64_t dirty_pgsizes;
> uint64_t max_dirty_bitmap_size;
> unsigned long pgsizes;
> unsigned int dma_max_mappings;
> QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
> QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
> QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
> QLIST_ENTRY(VFIOContainer) next;
> };
> 
> struct VFIOLegacyContainer {
> VFIOContainer obj;
> int fd; /* /dev/vfio/vfio, empowered by the attached groups */
> MemoryListener prereg_listener;
> unsigned iommu_type;
> QLIST_HEAD(, VFIOGroup) group_list;
> };
> 
> Co-authored-by: Eric Auger 
> Signed-off-by: Eric Auger 
> Signed-off-by: Yi Liu 
> ---
>  hw/vfio/as.c |  48 +++---
>  hw/vfio/container-obj.c  | 195 +++
>  hw/vfio/container.c  | 224 ---
>  hw/vfio/meson.build  |   1 +
>  hw/vfio/migration.c  |   4 +-
>  hw/vfio/pci.c|   4 +-
>  hw/vfio/spapr.c  |  22 +--
>  include/hw/vfio/vfio-common.h|  78 ++
>  include/hw/vfio/vfio-container-obj.h | 154 ++
>  9 files changed, 540 insertions(+), 190 deletions(-)
>  create mode 100644 hw/vfio/container-obj.c
>  create mode 100644 include/hw/vfio/vfio-container-obj.h
> 
> diff --git a/hw/vfio/as.c b/hw/vfio/as.c
> index 4181182808..37423d2c89 100644
> --- a/hw/vfio/as.c
> +++ b/hw/vfio/as.c
> @@ -215,9 +215,9 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, 
> IOMMUTLBEntry *iotlb)
>   * of vaddr will always be there, even if the memory object is
>   * destroyed and its backing memory munmap-ed.
>   */
> -ret = vfio_dma_map(container, iova,
> -   iotlb->addr_mask + 1, vaddr,
> -   read_only);
> +ret = vfio_container_dma_map(container, iova,
> + iotlb->addr_mask + 1, vaddr,
> + read_only);
>  if (ret) {
>  error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
>   "0x%"HWADDR_PRIx", %p) = %d (%m)",
> @@ -225,7 +225,8 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, 
> IOMMUTLBEntry *iotlb)
>   iotlb->addr_mask + 1, vaddr, ret);
>  }
>  } else {
> -ret = vfio_dma_unmap(container, iova, iotlb->addr_mask + 1, iotlb);
> +ret = vfio_container_dma_unmap(container, iova,
> +   iotlb->addr_mask + 1, iotlb);
>  if (ret) {
>  error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
>   "0x%"HWADDR_PRIx") = %d (%m)",
> @@ -242,12 +243,13 @@ static void 
> vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
>  {
>  VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
>  listener);
> +VFIOContainer *container = vrdl->container;
>  const hwaddr size = int128_get64(section->size);
>  const hwaddr iova = section->offset_within_address_space;
>  int ret;
>  
>  /* Unmap with a single call. */
> -ret = vfio_dma_unmap(vrdl->container, iova, size , NULL);
> +ret = vfio_container_dma_unmap(container, iova, size , NULL);
>  if (ret) 

Re: [PATCH for-7.1 1/1] hw/ppc: check if spapr_drc_index() returns NULL in spapr_nvdimm.c

2022-04-06 Thread David Gibson
On Tue, Apr 05, 2022 at 05:34:16PM -0300, Daniel Henrique Barboza wrote:
> spapr_nvdimm_flush_completion_cb() and flush_worker_cb() are using the
> DRC object returned by spapr_drc_index() without checking it for NULL.
> In this case we would be dereferencing a NULL pointer when doing
> SPAPR_NVDIMM(drc->dev) and PC_DIMM(drc->dev).
> 
> This can happen if, during a scm_flush(), the DRC object is wrongly
> freed/released by another part of the code (i.e. hotunplug the device).
> spapr_drc_index() would then return NULL in the callbacks.

I'm not entirely clear if you're saying this would only happen due to
a bug elsewhere in the code, or if there's some unusual race case or
set of guest/user actions that could trigger this.

> 
> Fixes: Coverity CID 1487108, 1487178
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/ppc/spapr_nvdimm.c | 26 ++
>  1 file changed, 22 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/ppc/spapr_nvdimm.c b/hw/ppc/spapr_nvdimm.c
> index c4c97da5de..e92d92fdae 100644
> --- a/hw/ppc/spapr_nvdimm.c
> +++ b/hw/ppc/spapr_nvdimm.c
> @@ -447,9 +447,19 @@ static int flush_worker_cb(void *opaque)
>  {
>  SpaprNVDIMMDeviceFlushState *state = opaque;
>  SpaprDrc *drc = spapr_drc_by_index(state->drcidx);
> -PCDIMMDevice *dimm = PC_DIMM(drc->dev);
> -HostMemoryBackend *backend = MEMORY_BACKEND(dimm->hostmem);
> -int backend_fd = memory_region_get_fd(&backend->mr);
> +PCDIMMDevice *dimm;
> +HostMemoryBackend *backend;
> +int backend_fd;
> +
> +if (!drc) {
> +error_report("papr_scm: Could not find nvdimm device with DRC 0x%u",
> + state->drcidx);
> +return H_HARDWARE;

If this does indicate a bug elsewhere in qemu, this should probably be
an assert rather than an H_HARDWARE.
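
To make the distinction concrete, here is a rough sketch of the two
options.  The types, names and return codes below are made-up stand-ins
for illustration, not the real QEMU definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the DRC and its attached device. */
typedef struct DemoDev { int fd; } DemoDev;
typedef struct DemoDrc { DemoDev *dev; } DemoDrc;

enum { DEMO_OK = 0, DEMO_HW_ERROR = -1 };   /* illustrative return codes */

/*
 * Variant A: a missing DRC is a condition that can legitimately occur
 * (for example a race with unplug), so report it to the caller.
 */
static int demo_flush_report(const DemoDrc *drc)
{
    if (!drc) {
        return DEMO_HW_ERROR;
    }
    return drc->dev->fd >= 0 ? DEMO_OK : DEMO_HW_ERROR;
}

/*
 * Variant B: a missing DRC can only mean a bug elsewhere, so fail
 * loudly instead of handing an error back to the guest.
 */
static int demo_flush_assert(const DemoDrc *drc)
{
    assert(drc != NULL);
    return drc->dev->fd >= 0 ? DEMO_OK : DEMO_HW_ERROR;
}
```

Which variant is right depends on the answer to the question above: if
guest or user actions can get us here, A; if not, B.
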

> +}
> +
> +dimm = PC_DIMM(drc->dev);
> +backend = MEMORY_BACKEND(dimm->hostmem);
> +backend_fd = memory_region_get_fd(&backend->mr);
>  
>  if (object_property_get_bool(OBJECT(backend), "pmem", NULL)) {
>  MemoryRegion *mr = host_memory_backend_get_memory(dimm->hostmem);
> @@ -475,7 +485,15 @@ static void spapr_nvdimm_flush_completion_cb(void 
> *opaque, int hcall_ret)
>  {
>  SpaprNVDIMMDeviceFlushState *state = opaque;
>  SpaprDrc *drc = spapr_drc_by_index(state->drcidx);
> -SpaprNVDIMMDevice *s_nvdimm = SPAPR_NVDIMM(drc->dev);
> +SpaprNVDIMMDevice *s_nvdimm;
> +
> +if (!drc) {
> +error_report("papr_scm: Could not find nvdimm device with DRC 0x%u",
> + state->drcidx);
> +return;
> +}
> +
> +s_nvdimm = SPAPR_NVDIMM(drc->dev);
>  
>  state->hcall_ret = hcall_ret;
>  QLIST_REMOVE(state, node);

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [RFC PATCH 1/1] kvm-all.c: hint Valgrind that kvm_get_one_reg() inits memory

2022-04-06 Thread David Gibson
On Tue, Apr 05, 2022 at 03:30:26PM +0100, Peter Maydell wrote:
> On Tue, 5 Apr 2022 at 14:07, Daniel Henrique Barboza
>  wrote:
> >
> > > There are a lot of Valgrind warnings about conditional jumps depending on
> > > uninitialized values like this one (taken from a pSeries guest):
> >
> >  Conditional jump or move depends on uninitialised value(s)
> > at 0xB011DC: kvmppc_enable_cap_large_decr (kvm.c:2544)
> > by 0x92F28F: cap_large_decr_cpu_apply (spapr_caps.c:523)
> > by 0x930C37: spapr_caps_cpu_apply (spapr_caps.c:921)
> > by 0x955D3B: spapr_reset_vcpu (spapr_cpu_core.c:73)
> > (...)
> >   Uninitialised value was created by a stack allocation
> > at 0xB01150: kvmppc_enable_cap_large_decr (kvm.c:2538)
> >
> > > In this case, the allegedly uninitialized value is the 'lpcr' variable that
> > is written by kvm_get_one_reg() and then used in an if clause:
> >
> > int kvmppc_enable_cap_large_decr(PowerPCCPU *cpu, int enable)
> > {
> > CPUState *cs = CPU(cpu);
> > uint64_t lpcr;
> >
> > > kvm_get_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr);
> > /* Do we need to modify the LPCR? */
> > if (!!(lpcr & LPCR_LD) != !!enable) { < Valgrind warns here
> > (...)
> >
> > A quick fix is to init the variable that kvm_get_one_reg() is going to
> > write ('lpcr' in the example above). Another idea is to convince
> > Valgrind that kvm_get_one_reg() inits the 'void *target' memory in case
> > the ioctl() is successful. This will put some boilerplate in the
> > function but it will bring benefit for its other callers.
> 
> Doesn't Valgrind have a way of modelling ioctls where it
> knows what data is read and written ? In general
> ioctl-using programs don't need to have special case
> "I am running under valgrind" handling, so this seems to
> me like valgrind is missing support for this particular ioctl.

I think that's true, but would obviously have a much larger lead time
- someone would need to figure out how to add support for this
specific ioctl() (handling any ambiguity about what type of fd we're
dealing with), get it merged then we'd need to update to the new
valgrind to get the benefits.

> More generally, how much use is running QEMU with KVM enabled
> under valgrind anyway? Valgrind has no way of knowing about
> writes to memory that the guest vCPUs do...

Those should be limited to the guest memory area though, which as
mmap()ed space would already be considered initialized, I believe.  If
there's some fancy data race checking tool for valgrind then that
might be a problem, but for just the normal memcheck tool, I don't
think it should be an issue.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [RFC PATCH 1/1] kvm-all.c: hint Valgrind that kvm_get_one_reg() inits memory

2022-04-06 Thread David Gibson
On Tue, Apr 05, 2022 at 10:04:39AM -0300, Daniel Henrique Barboza wrote:
> There are a lot of Valgrind warnings about conditional jumps depending on
> uninitialized values like this one (taken from a pSeries guest):
> 
>  Conditional jump or move depends on uninitialised value(s)
> at 0xB011DC: kvmppc_enable_cap_large_decr (kvm.c:2544)
> by 0x92F28F: cap_large_decr_cpu_apply (spapr_caps.c:523)
> by 0x930C37: spapr_caps_cpu_apply (spapr_caps.c:921)
> by 0x955D3B: spapr_reset_vcpu (spapr_cpu_core.c:73)
> (...)
>   Uninitialised value was created by a stack allocation
> at 0xB01150: kvmppc_enable_cap_large_decr (kvm.c:2538)
> 
> In this case, the allegedly uninitialized value is the 'lpcr' variable that
> is written by kvm_get_one_reg() and then used in an if clause:
> 
> int kvmppc_enable_cap_large_decr(PowerPCCPU *cpu, int enable)
> {
> CPUState *cs = CPU(cpu);
> uint64_t lpcr;
> 
> kvm_get_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr);
> /* Do we need to modify the LPCR? */
> if (!!(lpcr & LPCR_LD) != !!enable) { < Valgrind warns here
> (...)
> 
> A quick fix is to init the variable that kvm_get_one_reg() is going to
> write ('lpcr' in the example above). Another idea is to convince
> Valgrind that kvm_get_one_reg() inits the 'void *target' memory in case
> the ioctl() is successful. This will put some boilerplate in the
> function but it will bring benefit for its other callers.
> 
> This patch uses the memcheck VALGRIND_MAKE_MEM_DEFINED() to mark the
> 'target' variable as initialized if the ioctl is successful.
> 
> Cc: Paolo Bonzini 
> Signed-off-by: Daniel Henrique Barboza 

Reviewed-by: David Gibson 

> ---
>  accel/kvm/kvm-all.c | 17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 5f1377ca04..d9acba23c7 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -53,6 +53,10 @@
>  #include 
>  #endif
>  
> +#ifdef CONFIG_VALGRIND_H
> +#include <valgrind/memcheck.h>
> +#endif
> +
>  /* KVM uses PAGE_SIZE in its definition of KVM_COALESCED_MMIO_MAX. We
>   * need to use the real host PAGE_SIZE, as that's what KVM will use.
>   */
> @@ -3504,6 +3508,19 @@ int kvm_get_one_reg(CPUState *cs, uint64_t id, void 
> *target)
>  if (r) {
>  trace_kvm_failed_reg_get(id, strerror(-r));
>  }
> +
> +#ifdef CONFIG_VALGRIND_H
> +if (r == 0) {
> +switch (id & KVM_REG_SIZE_MASK) {
> +case KVM_REG_SIZE_U32:
> +VALGRIND_MAKE_MEM_DEFINED(target, sizeof(uint32_t));
> +    break;
> +case KVM_REG_SIZE_U64:
> +VALGRIND_MAKE_MEM_DEFINED(target, sizeof(uint64_t));
> +break;
> +}
> +}
> +#endif
>  return r;
>  }
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v2 2/4] target/ppc: init 'lpcr' in kvmppc_enable_cap_large_decr()

2022-03-31 Thread David Gibson
On Thu, Mar 31, 2022 at 03:46:57PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 3/31/22 14:36, Richard Henderson wrote:
> > On 3/31/22 11:17, Daniel Henrique Barboza wrote:
> > > > Hmm... this is seeming a bit like whack-a-mole.  Could we instead use
> > > > one of the valgrind hinting mechanisms to inform it that
> > > > kvm_get_one_reg() writes the variable at *target?
> > > 
> > > I didn't find a way of doing that looking in the memcheck helpers
> > > (https://valgrind.org/docs/manual/mc-manual.html section 4.7). That would 
> > > be a
> > > good way of solving this warning because we would put stuff inside a 
> > > specific
> > > function X and all callers of X would be covered by it.
> > > 
> > > What I did find instead is a memcheck macro called 
> > > VALGRIND_MAKE_MEM_DEFINED that
> > > tells Valgrind that the var was initialized.
> > > 
> > > This patch would then be something as follows:
> > > 
> > > 
> > > diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> > > index dc93b99189..b0e22fa283 100644
> > > --- a/target/ppc/kvm.c
> > > +++ b/target/ppc/kvm.c
> > > @@ -56,6 +56,10 @@
> > >   #define DEBUG_RETURN_GUEST 0
> > >   #define DEBUG_RETURN_GDB   1
> > > 
> > > +#ifdef CONFIG_VALGRIND_H
> > > > +#include <valgrind/memcheck.h>
> > > +#endif
> > > +
> > >   const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
> > >   KVM_CAP_LAST_INFO
> > >   };
> > > @@ -2539,6 +2543,10 @@ int kvmppc_enable_cap_large_decr(PowerPCCPU *cpu, 
> > > int enable)
> > >   CPUState *cs = CPU(cpu);
> > >   uint64_t lpcr;
> > > 
> > > +#ifdef CONFIG_VALGRIND_H
> > > +    VALGRIND_MAKE_MEM_DEFINED(lpcr, sizeof(uint64_t));
> > > +#endif
> > > +
> > >   kvm_get_one_reg(cs, KVM_REG_PPC_LPCR_64, );
> > >   /* Do we need to modify the LPCR? */
> > > 
> > > 
> > > > CONFIG_VALGRIND_H needs 'valgrind-devel' installed.
> > > 
> > > I agree that this "Valgrind is complaining about variable initialization" 
> > > is a whack-a-mole
> > > situation that will keep happening in the future if we keep adding this 
> > > same code pattern
> > > (passing as reference an uninitialized var). For now, given that we have 
> > > only 4 instances
> > > to fix it in ppc code (as far as I'm aware of), and we don't have a 
> > > better way of telling
> > > > Valgrind that we know what we're doing, I think we're better off
> > > initializing these vars.
> > 
> > I would instead put this annotation inside kvm_get_one_reg, so that it 
> > covers all kvm hosts.  But it's too late to do this for 7.0.
> 
> I wasn't planning on pushing these changes for 7.0 since they aren't fixing 
> mem
> leaks or anything really bad. It's more of a quality of life improvement when
> using Valgrind.
> 
> I also tried to put this annotation in kvm_get_one_reg() and it didn't solve 
> the
> warning.

That's weird, I'm pretty sure that should work.  I'd double check to
make sure you had all the parameters right (e.g. could you have marked
the pointer itself as initialized, rather than the memory it points
to).

> I didn't find a way of telling Valgrind "consider that every time this
> function is called with parameter X it initializes X". That would be a good 
> solution
> to put in the common KVM files and fix the problem for everybody.
> 
> 
> Daniel
> 
> 
> 
> > 
> > 
> > r~
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v2 2/4] target/ppc: init 'lpcr' in kvmppc_enable_cap_large_decr()

2022-03-31 Thread David Gibson
On Thu, Mar 31, 2022 at 02:17:42PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 3/30/22 22:25, David Gibson wrote:
> > On Wed, Mar 30, 2022 at 09:17:15PM -0300, Daniel Henrique Barboza wrote:
> > > 'lpcr' is used as an input of kvm_get_one_reg(). Valgrind doesn't
> > > understand that and it returns warnings as such for this function:
> > > 
> > > ==55240== Thread 1:
> > > ==55240== Conditional jump or move depends on uninitialised value(s)
> > > ==55240==at 0xB011E4: kvmppc_enable_cap_large_decr (kvm.c:2546)
> > > ==55240==by 0x92F28F: cap_large_decr_cpu_apply (spapr_caps.c:523)
> > > ==55240==by 0x930C37: spapr_caps_cpu_apply (spapr_caps.c:921)
> > > ==55240==by 0x955D3B: spapr_reset_vcpu (spapr_cpu_core.c:73)
> > > ==55240==by 0x95612B: spapr_cpu_core_reset (spapr_cpu_core.c:209)
> > > ==55240==by 0x95619B: spapr_cpu_core_reset_handler 
> > > (spapr_cpu_core.c:218)
> > > ==55240==by 0xD3605F: qemu_devices_reset (reset.c:69)
> > > ==55240==by 0x92112B: spapr_machine_reset (spapr.c:1641)
> > > ==55240==by 0x4FBD63: qemu_system_reset (runstate.c:444)
> > > ==55240==by 0x62812B: qdev_machine_creation_done (machine.c:1247)
> > > ==55240==by 0x5064C3: qemu_machine_creation_done (vl.c:2725)
> > > ==55240==by 0x5065DF: qmp_x_exit_preconfig (vl.c:2748)
> > > ==55240==  Uninitialised value was created by a stack allocation
> > > ==55240==at 0xB01158: kvmppc_enable_cap_large_decr (kvm.c:2540)
> > > 
> > > Init 'lpcr' to avoid this warning.
> > 
> > Hmm... this is seeming a bit like whack-a-mole.  Could we instead use
> > one of the valgrind hinting mechanisms to inform it that
> > kvm_get_one_reg() writes the variable at *target?
> 
> I didn't find a way of doing that looking in the memcheck helpers
> (https://valgrind.org/docs/manual/mc-manual.html section 4.7). That would be a
> good way of solving this warning because we would put stuff inside a specific
> function X and all callers of X would be covered by it.
> 
> What I did find instead is a memcheck macro called VALGRIND_MAKE_MEM_DEFINED 
> that
> tells Valgrind that the var was initialized.

I think that's the one I was thinking of.

> This patch would then be something as follows:
> 
> 
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index dc93b99189..b0e22fa283 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -56,6 +56,10 @@
>  #define DEBUG_RETURN_GUEST 0
>  #define DEBUG_RETURN_GDB   1
> +#ifdef CONFIG_VALGRIND_H
> +#include <valgrind/memcheck.h>
> +#endif
> +
>  const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
>  KVM_CAP_LAST_INFO
>  };
> @@ -2539,6 +2543,10 @@ int kvmppc_enable_cap_large_decr(PowerPCCPU *cpu, int 
> enable)
>  CPUState *cs = CPU(cpu);
>  uint64_t lpcr;
> +#ifdef CONFIG_VALGRIND_H
> +VALGRIND_MAKE_MEM_DEFINED(lpcr, sizeof(uint64_t));
> +#endif
> +
>  kvm_get_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr);
>  /* Do we need to modify the LPCR? */

The macro call should only go after the get_one_reg, of course.

> CONFIG_VALGRIND_H needs 'valgrind-devel' installed.

Right.. better would probably be to make a wrapper macro defined as a
no-op in the !CONFIG_VALGRIND_H case, so you don't need the ifdefs at
the point you use it.
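
Something along these lines, as a rough sketch only.  The macro name
DEMO_MAKE_MEM_DEFINED and the demo_get_reg() helper are made up for
illustration, and I'm assuming CONFIG_VALGRIND_H is only defined when
<valgrind/memcheck.h> is actually available:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifdef CONFIG_VALGRIND_H
#include <valgrind/memcheck.h>
/* Real client request: tell memcheck that [addr, addr + len) is defined. */
#define DEMO_MAKE_MEM_DEFINED(addr, len) \
    ((void)VALGRIND_MAKE_MEM_DEFINED((addr), (len)))
#else
/* No-op fallback, so callers need no #ifdef of their own. */
#define DEMO_MAKE_MEM_DEFINED(addr, len) ((void)(addr), (void)(len))
#endif

/*
 * Stand-in for a helper that fills *target in a way Valgrind cannot
 * see (an ioctl() in the real code; memset() here just so the sketch
 * does something).  Note the annotation comes after the write and
 * names the pointed-to memory, not the pointer variable itself.
 */
static int demo_get_reg(void *target, size_t len)
{
    memset(target, 0, len);
    DEMO_MAKE_MEM_DEFINED(target, len);
    return 0;
}
```

A caller would then just do demo_get_reg(&lpcr, sizeof(lpcr)) with no
Valgrind-specific code at the call site.
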
> 
> I agree that this "Valgrind is complaining about variable initialization" is 
> a whack-a-mole
> situation that will keep happening in the future if we keep adding this same 
> code pattern
> (passing as reference an uninitialized var). For now, given that we have only 
> 4 instances
> to fix it in ppc code (as far as I'm aware of), and we don't have a better 
> way of telling
> Valgrind that we know what we're doing, I think we're better off
> initializing these vars.

Hmm... still feels like it would be better to put the
MAKE_MEM_DEFINED inside kvm_get_one_reg().  I think the difficulty
with that is that it handles both 32-bit and 64-bit registers and I'm
not sure if there's an easy way to work out exactly how many bits
*have* been initialized.
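
One possibility, assuming the register id encodes the access size in
the field covered by KVM_REG_SIZE_MASK (which is how I read the kernel
uapi header).  The constants below are copied under DEMO_ names so the
sketch builds without <linux/kvm.h>; check them against your kernel
headers before relying on them:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror KVM_REG_SIZE_SHIFT / KVM_REG_SIZE_MASK / KVM_REG_SIZE_U*. */
#define DEMO_KVM_REG_SIZE_SHIFT 52
#define DEMO_KVM_REG_SIZE_MASK  0x00f0000000000000ULL
#define DEMO_KVM_REG_SIZE_U32   0x0020000000000000ULL
#define DEMO_KVM_REG_SIZE_U64   0x0030000000000000ULL

/*
 * The size field holds log2 of the register size in bytes, so the
 * number of bytes written for a given id is 1 << field.
 */
static size_t demo_reg_size(uint64_t id)
{
    return (size_t)1 << ((id & DEMO_KVM_REG_SIZE_MASK) >> DEMO_KVM_REG_SIZE_SHIFT);
}
```

That size could then be passed straight to the MAKE_MEM_DEFINED call
instead of hardcoding sizeof(uint64_t).
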

> 
> 
> Thanks,
> 
> 
> Daniel
> 
> 
> 
> > 
> > > Reviewed-by: Philippe Mathieu-Daudé 
> > > Signed-off-by: Daniel Henrique Barboza 
> > > ---
> > >   target/ppc/kvm.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> > > index 858866ecd4..42814e1b97 100644
> > > --- a/target/ppc/kvm.c
> > > +++ b/target/ppc/kvm.c
> > > @@ -2538,7 +2538,7 @@ int kvmppc_get_cap_large_decr(void)
> > >   int kvmpp

Re: [RFC PATCH 1/2] spapr: Report correct GTSE support via ov5

2022-03-30 Thread David Gibson
On Mon, Mar 14, 2022 at 07:10:10PM -0300, Fabiano Rosas wrote:
> David Gibson  writes:
> 
> > On Tue, Mar 08, 2022 at 10:23:59PM -0300, Fabiano Rosas wrote:
> >> QEMU reports MMU support to the guest via the ibm,architecture-vec-5
> >> property of the /chosen node. Byte number 26 specifies Radix Table
> >> Expansions, currently only GTSE (Guest Translation Shootdown
> >> Enable). This feature determines whether the tlbie instruction (and
> >> others) are HV privileged.
> >> 
> >> Up until now, we always reported GTSE=1 to guests. Even after the
> >> support for GTSE=0 was added. As part of that support, a kernel
> >> command line radix_hcall_invalidate=on was introduced that overrides
> >> the GTSE value received via CAS. So a guest can run with GTSE=0 and
> >> use the H_RPT_INVALIDATE hcall instead of tlbie.
> >> 
> >> In this scenario, having GTSE always set to 1 by QEMU leads to a crash
> >> when running nested KVM guests because KVM does not allow a nested
> >> hypervisor to set GTSE support for its nested guests. So a nested
> >> guest always uses the same value for LPCR_GTSE as its HV. Since the
> >> nested HV disabled GTSE, but the L2 QEMU always reports GTSE=1, we run
> >> into a crash when:
> >> 
> >> L1 LPCR_GTSE=0
> >> L2 LPCR_GTSE=0
> >> L2 CAS GTSE=1
> >> 
> >> The nested guest will run 'tlbie' and crash because the HW looks at
> >> LPCR_GTSE, which is clear.
> >> 
> >> Having GTSE disabled in the L1 and enabled in the L2 is not an option
> >> because the whole purpose of GTSE is to disallow access to tlbie and
> >> we cannot allow L1 to spawn L2s that can access features that L1
> >> itself cannot.
> >> 
> >> We also cannot have the guest check the LPCR bit, because LPCR is
> >> HV-privileged.
> >> 
> >> So this patch goes through the most intuitive route which is to have
> >> QEMU ask KVM about GTSE support and advertise the correct value to the
> >> guest. A new KVM_CAP_PPC_GTSE capability is being added.
> >> 
> >> TCG continues to always enable GTSE.
> >> 
> >> Signed-off-by: Fabiano Rosas 
> >> ---
> >>  hw/ppc/spapr.c   | 38 +++---
> >>  target/ppc/kvm.c |  8 
> >>  target/ppc/kvm_ppc.h |  6 ++
> >>  3 files changed, 45 insertions(+), 7 deletions(-)
> >> 
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index 4cc204f90d..3e95a1831f 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -971,7 +971,7 @@ static void 
> >> spapr_dt_ov5_platform_support(SpaprMachineState *spapr, void *fdt,
> >>  23, 0x00, /* XICS / XIVE mode */
> >>  24, 0x00, /* Hash/Radix, filled in below. */
> >>  25, 0x00, /* Hash options: Segment Tables == no, GTSE == no. */
> >> -26, 0x40, /* Radix options: GTSE == yes. */
> >> +26, 0x00, /* Radix options, filled in below. */
> >>  };
> >>  
> >>  if (spapr->irq->xics && spapr->irq->xive) {
> >> @@ -1000,10 +1000,16 @@ static void 
> >> spapr_dt_ov5_platform_support(SpaprMachineState *spapr, void *fdt,
> >>  } else {
> >>  val[3] = 0x00; /* Hash */
> >>  }
> >> +
> >> +if (kvmppc_has_cap_gtse()) {
> >> +val[7] = 0x40 /* OV5_MMU_RADIX_GTSE */;
> >> +}
> >>  } else {
> >>  /* V3 MMU supports both hash and radix in tcg (with dynamic 
> >> switching) */
> >>  val[3] = 0xC0;
> >> +val[7] = 0x40 /* OV5_MMU_RADIX_GTSE */;
> >>  }
> >> +
> >>  _FDT(fdt_setprop(fdt, chosen, "ibm,arch-vec-5-platform-support",
> >>   val, sizeof(val)));
> >>  }
> >> @@ -2824,14 +2830,32 @@ static void spapr_machine_init(MachineState 
> >> *machine)
> >>  /* Init numa_assoc_array */
> >>  spapr_numa_associativity_init(spapr, machine);
> >>  
> >> -if ((!kvm_enabled() || kvmppc_has_cap_mmu_radix()) &&
> >> -ppc_type_check_compat(machine->cpu_type, 
> >> CPU_POWERPC_LOGICAL_3_00, 0,
> >> +if (ppc_type_check_compat(machine->cpu_type, 
> >> CPU_POWERPC_LOGICAL_3_00, 0,
> >>spapr->max_compat_pvr)) {
> >> -spapr_ovec_set(spapr->ov5, OV5_MM

Re: [PATCH v2 2/4] target/ppc: init 'lpcr' in kvmppc_enable_cap_large_decr()

2022-03-30 Thread David Gibson
On Wed, Mar 30, 2022 at 09:17:15PM -0300, Daniel Henrique Barboza wrote:
> 'lpcr' is used as an input of kvm_get_one_reg(). Valgrind doesn't
> understand that and it returns warnings as such for this function:
> 
> ==55240== Thread 1:
> ==55240== Conditional jump or move depends on uninitialised value(s)
> ==55240==at 0xB011E4: kvmppc_enable_cap_large_decr (kvm.c:2546)
> ==55240==by 0x92F28F: cap_large_decr_cpu_apply (spapr_caps.c:523)
> ==55240==by 0x930C37: spapr_caps_cpu_apply (spapr_caps.c:921)
> ==55240==by 0x955D3B: spapr_reset_vcpu (spapr_cpu_core.c:73)
> ==55240==by 0x95612B: spapr_cpu_core_reset (spapr_cpu_core.c:209)
> ==55240==by 0x95619B: spapr_cpu_core_reset_handler (spapr_cpu_core.c:218)
> ==55240==by 0xD3605F: qemu_devices_reset (reset.c:69)
> ==55240==by 0x92112B: spapr_machine_reset (spapr.c:1641)
> ==55240==by 0x4FBD63: qemu_system_reset (runstate.c:444)
> ==55240==by 0x62812B: qdev_machine_creation_done (machine.c:1247)
> ==55240==by 0x5064C3: qemu_machine_creation_done (vl.c:2725)
> ==55240==by 0x5065DF: qmp_x_exit_preconfig (vl.c:2748)
> ==55240==  Uninitialised value was created by a stack allocation
> ==55240==at 0xB01158: kvmppc_enable_cap_large_decr (kvm.c:2540)
> 
> Init 'lpcr' to avoid this warning.

Hmm... this is seeming a bit like whack-a-mole.  Could we instead use
one of the valgrind hinting mechanisms to inform it that
kvm_get_one_reg() writes the variable at *target?

> Reviewed-by: Philippe Mathieu-Daudé 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  target/ppc/kvm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index 858866ecd4..42814e1b97 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -2538,7 +2538,7 @@ int kvmppc_get_cap_large_decr(void)
>  int kvmppc_enable_cap_large_decr(PowerPCCPU *cpu, int enable)
>  {
>  CPUState *cs = CPU(cpu);
> -uint64_t lpcr;
> +uint64_t lpcr = 0;
>  
>  kvm_get_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr);
>  /* Do we need to modify the LPCR? */

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v2 1/4] target/ppc: initialize 'val' union in kvm_get_one_spr()

2022-03-30 Thread David Gibson
On Wed, Mar 30, 2022 at 09:17:14PM -0300, Daniel Henrique Barboza wrote:
> Valgrind isn't convinced that we are initializing the values we assign
> to env->spr[spr] because it doesn't understand that the 'val' union is
> being written by the kvm_vcpu_ioctl() that follows (via struct
> kvm_one_reg).
> 
> This results in Valgrind complaining about uninitialized values every
> time we use env->spr in a conditional, like this instance:
> 
> ==707578== Thread 1:
> ==707578== Conditional jump or move depends on uninitialised value(s)
> ==707578==at 0xA10A40: hreg_compute_hflags_value (helper_regs.c:106)
> ==707578==by 0xA10C9F: hreg_compute_hflags (helper_regs.c:173)
> ==707578==by 0xA110F7: hreg_store_msr (helper_regs.c:262)
> ==707578==by 0xA051A3: ppc_cpu_reset (cpu_init.c:7168)
> ==707578==by 0xD4730F: device_transitional_reset (qdev.c:799)
> ==707578==by 0xD4A11B: resettable_phase_hold (resettable.c:182)
> ==707578==by 0xD49A77: resettable_assert_reset (resettable.c:60)
> ==707578==by 0xD4994B: resettable_reset (resettable.c:45)
> ==707578==by 0xD458BB: device_cold_reset (qdev.c:296)
> ==707578==by 0x48FBC7: cpu_reset (cpu-common.c:114)
> ==707578==by 0x97B5EB: spapr_reset_vcpu (spapr_cpu_core.c:38)
> ==707578==by 0x97BABB: spapr_cpu_core_reset (spapr_cpu_core.c:209)
> ==707578==  Uninitialised value was created by a stack allocation
> ==707578==at 0xB11F08: kvm_get_one_spr (kvm.c:543)
> 
> Initializing 'val' has no impact in the logic and makes Valgrind output
> more bearable.
> 
> Reviewed-by: Philippe Mathieu-Daudé 
> Signed-off-by: Daniel Henrique Barboza 

Reviewed-by: David Gibson 

> ---
>  target/ppc/kvm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index dc93b99189..858866ecd4 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -543,10 +543,11 @@ static void kvm_get_one_spr(CPUState *cs, uint64_t id, 
> int spr)
>  {
>  PowerPCCPU *cpu = POWERPC_CPU(cs);
>  CPUPPCState *env = &cpu->env;
> +/* Init 'val' to avoid "uninitialised value" Valgrind warnings */
>  union {
>  uint32_t u32;
>      uint64_t u64;
> -} val;
> +} val = { };
>  struct kvm_one_reg reg = {
>  .id = id,
>  .addr = (uintptr_t) &val,

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v2 1/1] hw/ppc: free env->tb_env in spapr_unrealize_vcpu()

2022-03-29 Thread David Gibson
On Tue, Mar 29, 2022 at 09:45:45AM -0300, Daniel Henrique Barboza wrote:
> The timebase is allocated during spapr_realize_vcpu() and it's not
> freed. This results in memory leaks when doing vcpu unplugs:
> 
> ==636935==
> ==636935== 144 (96 direct, 48 indirect) bytes in 1 blocks are definitely lost 
> in loss record 6,461 of 8,135
> ==636935==at 0x4897468: calloc (vg_replace_malloc.c:760)
> ==636935==by 0x5077213: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.6400.4)
> ==636935==by 0x507757F: g_malloc0_n (in 
> /usr/lib64/libglib-2.0.so.0.6400.4)
> ==636935==by 0x93C3FB: cpu_ppc_tb_init (ppc.c:1066)
> ==636935==by 0x97BC2B: spapr_realize_vcpu (spapr_cpu_core.c:268)
> ==636935==by 0x97C01F: spapr_cpu_core_realize (spapr_cpu_core.c:337)
> ==636935==by 0xD4626F: device_set_realized (qdev.c:531)
> ==636935==by 0xD55273: property_set_bool (object.c:2273)
> ==636935==by 0xD523DF: object_property_set (object.c:1408)
> ==636935==by 0xD588B7: object_property_set_qobject (qom-qobject.c:28)
> ==636935==by 0xD52897: object_property_set_bool (object.c:1477)
> ==636935==by 0xD4579B: qdev_realize (qdev.c:333)
> ==636935==
> 
> This patch adds a cpu_ppc_tb_free() helper in hw/ppc/ppc.c to allow us
> to free the timebase. This leak is then solved by calling
> cpu_ppc_tb_free() in spapr_unrealize_vcpu().
> 
> Fixes: 6f4b5c3ec590 ("spapr: CPU hot unplug support")
> Signed-off-by: Daniel Henrique Barboza 

Reviewed-by: David Gibson 

> ---
>  hw/ppc/ppc.c| 7 +++
>  hw/ppc/spapr_cpu_core.c | 3 +++
>  include/hw/ppc/ppc.h| 1 +
>  3 files changed, 11 insertions(+)
> 
> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
> index faa02d6710..fea70df45e 100644
> --- a/hw/ppc/ppc.c
> +++ b/hw/ppc/ppc.c
> @@ -1083,6 +1083,13 @@ clk_setup_cb cpu_ppc_tb_init (CPUPPCState *env, 
> uint32_t freq)
>  return &cpu_ppc_set_tb_clk;
>  }
>  
> +void cpu_ppc_tb_free(CPUPPCState *env)
> +{
> +timer_free(env->tb_env->decr_timer);
> +timer_free(env->tb_env->hdecr_timer);
> +g_free(env->tb_env);
> +}
> +
>  /* cpu_ppc_hdecr_init may be used if the timer is not used by HDEC emulation 
> */
>  void cpu_ppc_hdecr_init(CPUPPCState *env)
>  {
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index ed84713960..8a4861f45a 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -189,10 +189,13 @@ static const VMStateDescription vmstate_spapr_cpu_state 
> = {
>  
>  static void spapr_unrealize_vcpu(PowerPCCPU *cpu, SpaprCpuCore *sc)
>  {
> +CPUPPCState *env = &cpu->env;
> +
>  if (!sc->pre_3_0_migration) {
>  vmstate_unregister(NULL, &vmstate_spapr_cpu_state, 
> cpu->machine_data);
>  }
>  spapr_irq_cpu_intc_destroy(SPAPR_MACHINE(qdev_get_machine()), cpu);
> +cpu_ppc_tb_free(env);
>  qdev_unrealize(DEVICE(cpu));
>  }
>  
> diff --git a/include/hw/ppc/ppc.h b/include/hw/ppc/ppc.h
> index b0ba4bd6b9..364f165b4b 100644
> --- a/include/hw/ppc/ppc.h
> +++ b/include/hw/ppc/ppc.h
> @@ -54,6 +54,7 @@ struct ppc_tb_t {
>  
>  uint64_t cpu_ppc_get_tb(ppc_tb_t *tb_env, uint64_t vmclk, int64_t tb_offset);
>  clk_setup_cb cpu_ppc_tb_init (CPUPPCState *env, uint32_t freq);
> +void cpu_ppc_tb_free(CPUPPCState *env);
>  void cpu_ppc_hdecr_init(CPUPPCState *env);
>  void cpu_ppc_hdecr_exit(CPUPPCState *env);
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH 0/2] ppc: fix vcpu hotunplug leak in spapr_realize_vcpu

2022-03-28 Thread David Gibson
On Mon, Mar 28, 2022 at 09:59:16AM -0300, Daniel Henrique Barboza wrote:
> Hi,
> 
> This is a memory leak found by Valgrind when testing vcpu
> hotplug/unplug in pSeries guests.
> 
> Other vcpu hotplug/unplug leaks are still present in the common code
> (one in the KVM thread loop and another in cpu_address_space via
> cpu->cpu_ases) but these are already being handled by Mark Kanda and
> Philippe.

Changes LGTM, but I don't see much reason to split this into two
patches.  They're both small, and are part of the same logical change.

> 
> 
> Daniel Henrique Barboza (2):
>   hw/ppc/ppc.c: add cpu_ppc_tb_free()
>   hw/ppc: free env->tb_env in spapr_unrealize_vcpu()
> 
>  hw/ppc/ppc.c| 7 +++
>  hw/ppc/spapr_cpu_core.c | 3 +++
>  include/hw/ppc/ppc.h    | 1 +
>  3 files changed, 11 insertions(+)
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [RFC PATCH 1/6] target/ppc: Add support for the Processor Attention instruction

2022-03-25 Thread David Gibson
On Fri, Mar 25, 2022 at 12:11:47PM -0300, Fabiano Rosas wrote:
> Leandro Lupori  writes:
> 
> > From: Cédric Le Goater 
> >
> > Check the HID0 bit to send signal, currently modeled as a checkstop.
> > The QEMU implementation adds an exit using the GPR[3] value (that's a
> > hack for tests)
> >
> > Signed-off-by: Cédric Le Goater 
> > Signed-off-by: Leandro Lupori 
> > ---
> >  target/ppc/cpu.h |  8 
> >  target/ppc/excp_helper.c | 27 +++
> >  target/ppc/helper.h  |  1 +
> >  target/ppc/translate.c   | 14 ++
> >  4 files changed, 50 insertions(+)
> >
> > diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> > index 047b24ba50..12f9f3a880 100644
> > --- a/target/ppc/cpu.h
> > +++ b/target/ppc/cpu.h
> > @@ -173,6 +173,12 @@ enum {
> >  POWERPC_EXCP_PRIV_REG  = 0x02,  /* Privileged register exception   
> >   */
> >  /* Trap
> >   */
> >  POWERPC_EXCP_TRAP  = 0x40,
> > +/* Processor Attention 
> >   */
> > +POWERPC_EXCP_ATTN  = 0x100,
> > +/*
> > + * NOTE: POWERPC_EXCP_ATTN uses values from 0x100 to 0x1ff to return
> > + *   error codes.
> > + */
> >  };
> >  
> >  #define PPC_INPUT(env) ((env)->bus_model)
> > @@ -2089,6 +2095,8 @@ void ppc_compat_add_property(Object *obj, const char 
> > *name,
> >  #define HID0_DOZE   (1 << 23)   /* pre-2.06 */
> >  #define HID0_NAP(1 << 22)   /* pre-2.06 */
> >  #define HID0_HILE   PPC_BIT(19) /* POWER8 */
> > +#define HID0_ATTN   PPC_BIT(31) /* Processor Attention */
> > +#define HID0_POWER9_ATTNPPC_BIT(3)
> >  #define HID0_POWER9_HILEPPC_BIT(4)
> >  
> >  
> > /*/
> > diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> > index d3e2cfcd71..b0c629905c 100644
> > --- a/target/ppc/excp_helper.c
> > +++ b/target/ppc/excp_helper.c
> > @@ -1379,6 +1379,9 @@ static void powerpc_excp_books(PowerPCCPU *cpu, int 
> > excp)
> >  }
> >  cs->halted = 1;
> >  cpu_interrupt_exittb(cs);
> > +if ((env->error_code & ~0xff) == POWERPC_EXCP_ATTN) {
> > +exit(env->error_code & 0xff);
> > +}
> >  }
> >  if (env->msr_mask & MSR_HVB) {
> >  /*
> > @@ -1971,6 +1974,30 @@ void helper_pminsn(CPUPPCState *env, 
> > powerpc_pm_insn_t insn)
> >  env->resume_as_sreset = (insn != PPC_PM_STOP) ||
> >  (env->spr[SPR_PSSCR] & PSSCR_EC);
> >  }
> > +
> > +/*
> > + * Processor Attention instruction (Implementation dependent)
> > + */
> > +void helper_attn(CPUPPCState *env, target_ulong r3)
> > +{
> > +bool attn = false;
> > +
> > +if (env->excp_model == POWERPC_EXCP_POWER8) {
> > +attn = !!(env->spr[SPR_HID0] & HID0_ATTN);
> > +} else if (env->excp_model == POWERPC_EXCP_POWER9 ||
> > +   env->excp_model == POWERPC_EXCP_POWER10) {
> > +attn = !!(env->spr[SPR_HID0] & HID0_POWER9_ATTN);
> > +}
> 
> The excp_model is not a CPU identifier. This should ideally be a flag
> set during init_proc. Something like HID0_ATTN_P8/HID0_ATTN_P9.
> 
> Maybe we should consider adding a hid0_mask similar to lpcr_mask.

I don't think that's a good idea.  By definition, the meaning of the
HID registers is model specific - having a hid0_mask would imply it
always has the same meaning, just different bits that are present or
not.  I think you want to explicitly dispatch to cpu family specific
functions for this.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH for-7.1 2/4] hw/ppc: use qdev to register physical DRC vmstates

2022-03-22 Thread David Gibson
On Tue, Mar 22, 2022 at 03:38:52PM -0300, Daniel Henrique Barboza wrote:
> Similar to logical DRCs, let's convert physical DRCs to register their
> vmstates using dc->vmsd.
> 
> The same constraints with instance_id being set to spapr_drc_index()
> also applies in this case. However, since realize_physical() calls
> drc_realize(), qdev_set_legacy_instance_id() is already being set.

Ok, and you've verified that you don't need to set the legacy ID on
both "layers"?  That is, have you tested that you can migrate from
before this change to after?

> 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/ppc/spapr_drc.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index a5ef64d2a2..5a60885876 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -640,9 +640,6 @@ static void realize_physical(DeviceState *d, Error **errp)
>  return;
>  }
>  
> -vmstate_register(VMSTATE_IF(drcp),
> - spapr_drc_index(SPAPR_DR_CONNECTOR(drcp)),
> - &vmstate_spapr_drc_physical, drcp);
>  qemu_register_reset(drc_physical_reset, drcp);
>  }
>  
> @@ -651,7 +648,6 @@ static void unrealize_physical(DeviceState *d)
>  SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(d);
>  
>  drc_unrealize(d);
> -vmstate_unregister(VMSTATE_IF(drcp), &vmstate_spapr_drc_physical, drcp);
>  qemu_unregister_reset(drc_physical_reset, drcp);
>  }
>  
> @@ -662,6 +658,8 @@ static void spapr_drc_physical_class_init(ObjectClass *k, void *data)
>  
>  dk->realize = realize_physical;
>  dk->unrealize = unrealize_physical;
> +dk->vmsd = &vmstate_spapr_drc_physical;
> +
>  drck->dr_entity_sense = physical_entity_sense;
>  drck->isolate = drc_isolate_physical;
>  drck->unisolate = drc_unisolate_physical;

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH for-7.1 0/4] use dc->vmsd with spapr devices vmstate

2022-03-22 Thread David Gibson
On Tue, Mar 22, 2022 at 03:38:50PM -0300, Daniel Henrique Barboza wrote:
> Hi,
> 
> This short series converts some spapr devices to use the dc->vmsd
> interface to register the vmstate. For most of them it was needed
> to use qdev_set_legacy_instance_id() to keep compatibility with the
> instance_id being used for awhile.
> 
> Although no functional changes were made the resulting code is a bit
> shorter and maintainable. After these patches there are only 3 places
> where vmstate_register() APIs are being used.
> 
> No behavior changes were detected when testing migration scenarios with
> hotplug/unplug of devices.

Looks good to me.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH for-7.1 0/9] spapr: add drc->index, remove spapr_drc_index()

2022-03-21 Thread David Gibson
On Mon, Mar 21, 2022 at 04:58:47AM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 3/21/22 00:55, David Gibson wrote:
> > On Fri, Mar 18, 2022 at 02:33:11PM -0300, Daniel Henrique Barboza wrote:
> > > Hi,
> > > 
> > > I decided to make this change after realizing that (1) spapr_drc_index()
> > > always return the same index value for the DRC regardless of machine or
> > > device state and (2) we call spapr_drc_index() a lot throughout the
> > > spapr code.
> > 
> > Hmm.. so, spapr_drc_index() wasn't ever intended as an abstraction
> > point.  Rather, it's just there as a matter of data redundancy.  The
> > index can be derived from the drc->id and the type.  Unless there's a
> > compelling reason otherwise, it's usually a good idea to store data in
> > just one form (if there's more it's an opportunity for bugs to let it
> > get out of sync).
> 
> 
> Hmm what if we store drc->index instead and derive drc->id from it? drc->index
> is read from several places, while drc->id is used just in spapr_drc_name() to
> write the DT (via spapr_dt_drc()).

That could work.  It's still slightly redundant since the type part of
the index can be derived from the class, but I think it's not unreasonable.
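
To make that concrete, here's a rough sketch of storing only the index
and deriving the rest.  The ToyDrc type, the helper names and the
28-bit split between type and id are assumptions for illustration, not
necessarily what the real code uses.

```c
#include <stdint.h>

/* Assumed layout: type in the top bits, id in the low bits. */
#define TOY_INDEX_TYPE_SHIFT 28
#define TOY_INDEX_ID_MASK    ((UINT32_C(1) << TOY_INDEX_TYPE_SHIFT) - 1)

/* Hypothetical DRC object: the index is the only stored form. */
typedef struct ToyDrc {
    uint32_t index;
} ToyDrc;

/* Build the index once, at creation time. */
static uint32_t toy_drc_make_index(uint32_t type, uint32_t id)
{
    return (type << TOY_INDEX_TYPE_SHIFT) | (id & TOY_INDEX_ID_MASK);
}

/* Everything else is derived on demand, so it cannot get out of sync. */
static uint32_t toy_drc_id(const ToyDrc *drc)
{
    return drc->index & TOY_INDEX_ID_MASK;
}

static uint32_t toy_drc_type(const ToyDrc *drc)
{
    return drc->index >> TOY_INDEX_TYPE_SHIFT;
}
```

The type accessor is where the leftover redundancy shows up: the class
already knows the type, so the top bits of the index repeat it.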

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH 1/3] spapr: Ignore nested KVM hypercalls when not running TCG

2022-03-21 Thread David Gibson
On Fri, Mar 18, 2022 at 10:41:19AM -0300, Fabiano Rosas wrote:
> David Gibson  writes:
> 
> > On Thu, Mar 17, 2022 at 02:20:47PM -0300, Fabiano Rosas wrote:
> >> It is possible that nested KVM hypercalls reach QEMU while we're
> >> running KVM. The spapr virtual hypervisor implementation of the nested
> >> KVM API only works when the L1 is running under TCG. So return
> >> H_FUNCTION if we are under KVM.
> >> 
> >> Signed-off-by: Fabiano Rosas 
> >> ---
> >>  hw/ppc/spapr_hcall.c | 10 +-
> >>  1 file changed, 9 insertions(+), 1 deletion(-)
> >> 
> >> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> >> index f008290787..119baa1d2d 100644
> >> --- a/hw/ppc/spapr_hcall.c
> >> +++ b/hw/ppc/spapr_hcall.c
> >> @@ -1508,7 +1508,7 @@ static target_ulong h_set_ptbl(PowerPCCPU *cpu,
> >>  {
> >>  target_ulong ptcr = args[0];
> >>  
> >> -if (!spapr_get_cap(spapr, SPAPR_CAP_NESTED_KVM_HV)) {
> >> +if (!spapr_get_cap(spapr, SPAPR_CAP_NESTED_KVM_HV) || !tcg_enabled()) {
> >
> > I was about to nack this on the grounds that it changes guest visible
> > behaviour based on host properties.  Then I realized that's not the
> > case, because in the KVM + SPAPR_CAP_NESTED_KVM_HV case the hypercall
> > should be caught by KVM first and never reach here.
> >
> > So at the very least I think this needs a comment explaining that.
> 
> Ok.
> 
> > However, I'm still kind of confused how we would get here in the first
> > place.  If SPAPR_CAP_NESTED_KVM_HV is set, but KVM doesn't support it,
> > we should fail outright in cap_nested_kvm_hv_apply().  So how *do* we
> > get here?  Is the kernel not doing what we expect of it?  If so, we
> > should probably abort, rather than just returning H_FUNCTION.
> 
> Indeed, if all parts are functioning this should never happen. I was
> hacking in L0 and accidentally let some hcalls through. So I'm just
> being overly cautious with this patch. If that will end up causing too
> much confusion, we could drop this one.

Ok, having something check that case is reasonable - but as a "can't
happen" it should abort, rather than returning something sensible to
the guest.
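
As a rough sketch of the distinction (toy names and a toy signature,
not the real hcall handler): a missing capability is a legitimate,
guest-visible condition and gets an error code, while a broken
host-side invariant stops the process.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy return codes for illustration only. */
#define TOY_H_SUCCESS   0L
#define TOY_H_FUNCTION  (-2L)

/* Hypothetical handler: the two flags stand in for the real checks. */
static long toy_h_set_ptbl(bool cap_nested_hv, bool tcg)
{
    if (!cap_nested_hv) {
        /* Legitimate: the guest was never offered this feature. */
        return TOY_H_FUNCTION;
    }

    if (!tcg) {
        /*
         * With the cap set under KVM, the kernel should have handled
         * this hcall itself.  Getting here means something is broken
         * on the host side, so don't paper over it for the guest.
         */
        fprintf(stderr, "nested KVM hcall reached userspace under KVM\n");
        abort();
    }

    return TOY_H_SUCCESS;
}
```

The comment in the second branch is also the explanation I was asking
for earlier.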

> 
> >>  return H_FUNCTION;
> >>  }
> >>  
> >> @@ -1532,6 +1532,10 @@ static target_ulong h_tlb_invalidate(PowerPCCPU *cpu,
> >>   * across L1<->L2 transitions, so nothing is required here.
> >>   */
> >>  
> >> +if (!tcg_enabled()) {
> >> +    return H_FUNCTION;
> >> +}
> >> +
> >>  return H_SUCCESS;
> >>  }
> >>  
> >> @@ -1572,6 +1576,10 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
> >>  uint64_t cr;
> >>  int i;
> >>  
> >> +if (!tcg_enabled()) {
> >> +return H_FUNCTION;
> >> +}
> >> +
> >>  if (spapr->nested_ptcr == 0) {
> >>  return H_NOT_AVAILABLE;
> >>  }
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH for-7.1 0/9] spapr: add drc->index, remove spapr_drc_index()

2022-03-20 Thread David Gibson
On Fri, Mar 18, 2022 at 02:33:11PM -0300, Daniel Henrique Barboza wrote:
> Hi,
> 
> I decided to make this change after realizing that (1) spapr_drc_index()
> always return the same index value for the DRC regardless of machine or
> device state and (2) we call spapr_drc_index() a lot throughout the
> spapr code.

Hmm.. so, spapr_drc_index() wasn't ever intended as an abstraction
point.  Rather, it's just there as a matter of data redundancy.  The
index can be derived from the drc->id and the type.  Unless there's a
compelling reason otherwise, it's usually a good idea to store data in
just one form (if there's more it's an opportunity for bugs to let it
get out of sync).

> 
> This means that a new attribute to store the generated index in the DRC
> object time will spare us from calling a function that always returns
> the same value.
> 
> No functional changes were made.
> 
>  
> Daniel Henrique Barboza (9):
>   hw/ppc/spapr_drc.c: add drc->index
>   hw/ppc/spapr_drc.c: redefine 'index' SpaprDRC property
>   hw/ppc/spapr_drc.c: use drc->index in trace functions
>   hw/ppc/spapr_drc.c: use drc->index
>   hw/ppc/spapr.c: use drc->index
>   hw/ppc/spapr_events.c: use drc->index
>   hw/ppc/spapr_nvdimm.c: use drc->index
>   hw/ppc/spapr_pci.c: use drc->index
>   hw/ppc/spapr_drc.c: remove spapr_drc_index()
> 
>  hw/ppc/spapr.c | 18 -
>  hw/ppc/spapr_drc.c | 79 +++---
>  hw/ppc/spapr_events.c  |  4 +-
>  hw/ppc/spapr_nvdimm.c  | 10 ++---
>  hw/ppc/spapr_pci.c |  5 +--
>  include/hw/ppc/spapr_drc.h |  2 +-
>  6 files changed, 48 insertions(+), 70 deletions(-)
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: Question about vmstate_register(), dc->vmsd and instance_id

2022-03-19 Thread David Gibson
On Fri, Mar 18, 2022 at 04:51:10PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 3/18/22 00:43, David Gibson wrote:
> > On Thu, Mar 17, 2022 at 04:29:14PM +0000, Dr. David Alan Gilbert wrote:
> > > * Peter Maydell (peter.mayd...@linaro.org) wrote:
> > > > On Thu, 17 Mar 2022 at 14:03, Daniel Henrique Barboza
> > > >  wrote:
> > > > > > I've been looking into converting some vmstate_register() calls to
> > > > > > use dc->vmsd, using as a base the docs in docs/devel/migration.rst.
> > > > > > This doc mentions that we can either register the vmsd by using
> > > > > > vmstate_register() or we can use dc->vmsd for qdev-based devices.
> > > > > > 
> > > > > > When trying to convert this vmstate() call for the qdev alternative
> > > > > > (hw/ppc/spapr_drc.c, drc_realize()) I found this:
> > > > > > 
> > > > > >   vmstate_register(VMSTATE_IF(drc), spapr_drc_index(drc),
> > > > > >                    &vmstate_spapr_drc, drc);
> > > > > > 
> > > > > > spapr_drc_index() is a unique identifier for these DRC devices and
> > > > > > it's being used as instance_id. It is not clear to me how we can keep
> > > > > > using this same instance_id when using the dc->vmsd alternative. By
> > > > > > looking a bit into migration files I understood that if dc->vmsd is
> > > > > > being used the instance_id is always autogenerated. Is that correct?
> > > > 
> > > > Not entirely. It is the intended common setup, but because changing
> > > > the ID value breaks migration compatibility there is a mechanism
> > > > for saying "my device is special and needs to set the instance ID
> > > > to something else" -- qdev_set_legacy_instance_id().
> > > 
> > > Yes, this is normally only an issue for 'system' or memory mapped
> > > > devices;  for things hung off a bus that has its own device naming,
> > > > then each instance of a device has its own device due to the bus name
> > > so instance_id's aren't used.  Where you've got a few of the
> > > same device with the same name, and no bus for them to be named by, then
> > > the instance_id is used to uniquify them.
> 
> 
> Thanks for the info. qdev_set_legacy_instance_id() was the missing piece I was
> looking for to continue with the dc->vmsd transition I'd like to do.
> 
> 
> > 
> > Thanks for the information.  I remember deciding at the time that just
> > using vmsd wouldn't work for the DRCs because we needed this fixed
> > index.  At the time either qdev_set_legacy_instance_id() didn't exist,
> > or I didn't know about it, hence the explicit vmstate_register() call
> > so that an explicit instance id could be supplied.
> > 
> 
> This is the commit that introduced DRC migration:
> 
> 
> commit a50919dddf148b0a2008db4a0593dbe69e1059c0
> Author: Daniel Henrique Barboza 
> Date:   Mon May 22 16:35:49 2017 -0300
> 
> hw/ppc: migrating the DRC state of hotplugged devices
> 
> 
> I'd say you can cut yourself some slack this time. Blame that guy
> instead.

Man, not that guy again! ;-)

I think I must have done something similar with some other migration
component.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH 3/3] spapr: Move nested KVM hypercalls under a TCG only config.

2022-03-17 Thread David Gibson
On Thu, Mar 17, 2022 at 02:20:49PM -0300, Fabiano Rosas wrote:
> These are the spapr virtual hypervisor implementation of the nested
> KVM API. They only make sense when running with TCG.
> 
> Signed-off-by: Fabiano Rosas 
> ---
>  hw/ppc/spapr_hcall.c | 20 +---
>  1 file changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index c0bfc4bc9c..f2c802c155 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -2,6 +2,7 @@
>  #include "qemu/cutils.h"
>  #include "qapi/error.h"
>  #include "sysemu/hw_accel.h"
> +#include "sysemu/tcg.h"
>  #include "sysemu/runstate.h"
>  #include "qemu/log.h"
>  #include "qemu/main-loop.h"
> @@ -1473,7 +1474,8 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>  return H_FUNCTION;
>  }
>  
> -/* TCG only */
> +#ifdef CONFIG_TCG
> +
>  #define PRTS_MASK  0x1f
>  
>  static target_ulong h_set_ptbl(PowerPCCPU *cpu,
> @@ -1807,6 +1809,12 @@ out_restore_l1:
>  g_free(spapr_cpu->nested_host_state);
>  spapr_cpu->nested_host_state = NULL;
>  }
> +#else
> +void spapr_exit_nested(PowerPCCPU *cpu, int excp)
> +{
> +g_assert_not_reached();
> +}
> +#endif
>  
>  #ifndef CONFIG_TCG
>  static target_ulong h_softmmu(PowerPCCPU *cpu, SpaprMachineState *spapr,
> @@ -1829,7 +1837,10 @@ static void hypercall_register_softmmu(void)
>  #else
>  static void hypercall_register_softmmu(void)
>  {
> -/* DO NOTHING */
> +spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
> +spapr_register_hypercall(KVMPPC_H_ENTER_NESTED, h_enter_nested);
> +spapr_register_hypercall(KVMPPC_H_TLB_INVALIDATE, h_tlb_invalidate);
> +spapr_register_hypercall(KVMPPC_H_COPY_TOFROM_GUEST, h_copy_tofrom_guest);

This doesn't fit.  This is specifically about the MMU hypercalls - if
you want to put other things in there it needs a name change at least.

>  }
>  #endif
>  
> @@ -1888,11 +1899,6 @@ static void hypercall_register_types(void)
>  spapr_register_hypercall(KVMPPC_H_CAS, h_client_architecture_support);
>  
>  spapr_register_hypercall(KVMPPC_H_UPDATE_DT, h_update_dt);
> -
> -spapr_register_hypercall(KVMPPC_H_SET_PARTITION_TABLE, h_set_ptbl);
> -spapr_register_hypercall(KVMPPC_H_ENTER_NESTED, h_enter_nested);
> -spapr_register_hypercall(KVMPPC_H_TLB_INVALIDATE, h_tlb_invalidate);
> -spapr_register_hypercall(KVMPPC_H_COPY_TOFROM_GUEST, h_copy_tofrom_guest);
>  }
>  
>  type_init(hypercall_register_types)

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: Question about vmstate_register(), dc->vmsd and instance_id

2022-03-17 Thread David Gibson
On Thu, Mar 17, 2022 at 04:29:14PM +0000, Dr. David Alan Gilbert wrote:
> * Peter Maydell (peter.mayd...@linaro.org) wrote:
> > On Thu, 17 Mar 2022 at 14:03, Daniel Henrique Barboza
> >  wrote:
> > > I've been looking into converting some vmstate_register() calls to use
> > > dc->vmsd, using as a base the docs in docs/devel/migration.rst. This doc
> > > mentions that we can either register the vmsd by using vmstate_register()
> > > or we can use dc->vmsd for qdev-based devices.
> > >
> > > When trying to convert this vmstate() call for the qdev alternative
> > > (hw/ppc/spapr_drc.c, drc_realize()) I found this:
> > >
> > >  vmstate_register(VMSTATE_IF(drc), spapr_drc_index(drc),
> > >                   &vmstate_spapr_drc, drc);
> > >
> > > spapr_drc_index() is a unique identifier for these DRC devices and it's
> > > being used as instance_id. It is not clear to me how we can keep using
> > > this same instance_id when using the dc->vmsd alternative. By looking a
> > > bit into migration files I understood that if dc->vmsd is being used the
> > > instance_id is always autogenerated. Is that correct?
> > 
> > Not entirely. It is the intended common setup, but because changing
> > the ID value breaks migration compatibility there is a mechanism
> > for saying "my device is special and needs to set the instance ID
> > to something else" -- qdev_set_legacy_instance_id().
> 
> Yes, this is normally only an issue for 'system' or memory mapped
> devices;  for things hung off a bus that has its own device naming,
> then each instance of a device has its own device due to the bus name
> so instance_id's aren't used.  Where you've got a few of the
> same device with the same name, and no bus for them to be named by, then
> the instance_id is used to uniquify them.

Thanks for the information.  I remember deciding at the time that just
using vmsd wouldn't work for the DRCs because we needed this fixed
index.  At the time either qdev_set_legacy_instance_id() didn't exist,
or I didn't know about it, hence the explicit vmstate_register() call
so that an explicit instance id could be supplied.
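
For anyone following along, here's a toy model of why the fixed index
mattered.  It is not the migration code: ToyRegistry and toy_register()
are invented for illustration, and the auto-numbering rule (highest id
so far plus one) is an assumption about the general idea only.

```c
#include <stddef.h>

#define TOY_MAX_ENTRIES 16
#define TOY_ID_AUTO     (-1)   /* ask the registry to pick an id */
#define TOY_ERR         (-2)   /* registry full, or fixed id already taken */

/* Hypothetical registry of instance ids for one device type. */
typedef struct ToyRegistry {
    int ids[TOY_MAX_ENTRIES];
    size_t count;
} ToyRegistry;

/* Register one instance; returns the id actually used, or TOY_ERR. */
static int toy_register(ToyRegistry *r, int requested_id)
{
    int id = requested_id;
    size_t i;

    if (r->count >= TOY_MAX_ENTRIES) {
        return TOY_ERR;
    }

    if (id == TOY_ID_AUTO) {
        /* Auto id: one more than the highest id registered so far. */
        id = 0;
        for (i = 0; i < r->count; i++) {
            if (r->ids[i] >= id) {
                id = r->ids[i] + 1;
            }
        }
    } else {
        /* Fixed id: must not collide with an existing instance. */
        for (i = 0; i < r->count; i++) {
            if (r->ids[i] == id) {
                return TOY_ERR;
            }
        }
    }

    r->ids[r->count++] = id;
    return id;
}
```

An auto id depends on what was registered before it, so source and
destination can disagree if devices were plugged in a different order.
A fixed id such as the DRC index does not have that problem.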

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH 1/3] spapr: Ignore nested KVM hypercalls when not running TCG

2022-03-17 Thread David Gibson
On Thu, Mar 17, 2022 at 02:20:47PM -0300, Fabiano Rosas wrote:
> It is possible that nested KVM hypercalls reach QEMU while we're
> running KVM. The spapr virtual hypervisor implementation of the nested
> KVM API only works when the L1 is running under TCG. So return
> H_FUNCTION if we are under KVM.
> 
> Signed-off-by: Fabiano Rosas 
> ---
>  hw/ppc/spapr_hcall.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index f008290787..119baa1d2d 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1508,7 +1508,7 @@ static target_ulong h_set_ptbl(PowerPCCPU *cpu,
>  {
>  target_ulong ptcr = args[0];
>  
> -if (!spapr_get_cap(spapr, SPAPR_CAP_NESTED_KVM_HV)) {
> +if (!spapr_get_cap(spapr, SPAPR_CAP_NESTED_KVM_HV) || !tcg_enabled()) {

I was about to nack this on the grounds that it changes guest visible
behaviour based on host properties.  Then I realized that's not the
case, because in the KVM + SPAPR_CAP_NESTED_KVM_HV case the hypercall
should be caught by KVM first and never reach here.

So at the very least I think this needs a comment explaining that.

However, I'm still kind of confused how we would get here in the first
place.  If SPAPR_CAP_NESTED_KVM_HV is set, but KVM doesn't support it,
we should fail outright in cap_nested_kvm_hv_apply().  So how *do* we
get here?  Is the kernel not doing what we expect of it?  If so, we
should probably abort, rather than just returning H_FUNCTION.


>  return H_FUNCTION;
>  }
>  
> @@ -1532,6 +1532,10 @@ static target_ulong h_tlb_invalidate(PowerPCCPU *cpu,
>   * across L1<->L2 transitions, so nothing is required here.
>   */
>  
> +if (!tcg_enabled()) {
> +return H_FUNCTION;
> +}
> +
>  return H_SUCCESS;
>  }
>  
> @@ -1572,6 +1576,10 @@ static target_ulong h_enter_nested(PowerPCCPU *cpu,
>  uint64_t cr;
>  int i;
>  
> +if (!tcg_enabled()) {
> +    return H_FUNCTION;
> +}
> +
>  if (spapr->nested_ptcr == 0) {
>  return H_NOT_AVAILABLE;
>  }

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v3 1/4] target/ppc: Fix masked PVR matching

2022-03-12 Thread David Gibson
On Mon, Mar 07, 2022 at 04:55:24PM +1000, Nicholas Piggin wrote:
> The pvr_match for a CPU class is not supposed to just match for any
> CPU in the family, but rather whether this particular CPU class is the
> best match in the family.

Ok... but I don't see how that question can possibly be answered
without reference to all the available options.

> Prior to this fix, e.g., a POWER9 DD2.3 KVM host matches to the
> power9_v1.0 class (because that's first in the list). After the patch,
> it matches the power9_v2.0 class.

.. so, doesn't this indicate a problem in the check order, rather than
a problem with the matching function?
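
To illustrate what I mean, here's a rough sketch with invented names
(ToyCPUClass, toy_first_match, toy_best_match) and illustrative PVR
values; the masks are assumed to mirror the ones in the patch.  The
"best" decision is made by whoever walks the list, since that's the
only place that sees all the candidates.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed masks: family in the top half, major DD level in bits 8-11. */
#define TOY_FAMILY_MASK    UINT32_C(0xFFFF0000)
#define TOY_MAJOR_DD_MASK  UINT32_C(0x00000F00)

/* Hypothetical CPU class: just a name and a reference PVR. */
typedef struct ToyCPUClass {
    const char *name;
    uint32_t pvr;
} ToyCPUClass;

/* Illustrative PVR values, in list order. */
static const ToyCPUClass toy_power9_classes[] = {
    { "power9_v1.0", UINT32_C(0x004E1100) },
    { "power9_v2.0", UINT32_C(0x004E1200) },
};

/* Old behaviour: the first class in the right family wins. */
static const ToyCPUClass *toy_first_match(const ToyCPUClass *classes,
                                          size_t n, uint32_t pvr)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if ((classes[i].pvr & TOY_FAMILY_MASK) == (pvr & TOY_FAMILY_MASK)) {
            return &classes[i];
        }
    }
    return NULL;
}

/* Best match: score every candidate, keep the highest scorer. */
static const ToyCPUClass *toy_best_match(const ToyCPUClass *classes,
                                         size_t n, uint32_t pvr)
{
    const ToyCPUClass *best = NULL;
    int best_score = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        int score;

        if ((classes[i].pvr & TOY_FAMILY_MASK) != (pvr & TOY_FAMILY_MASK)) {
            continue;               /* wrong family, not a candidate */
        }
        score = 1;                  /* same family */
        if ((classes[i].pvr & TOY_MAJOR_DD_MASK) ==
            (pvr & TOY_MAJOR_DD_MASK)) {
            score = 2;              /* same family and same major DD level */
        }
        if (score > best_score) {
            best = &classes[i];
            best_score = score;
        }
    }
    return best;
}
```

With a DD2.3-like host PVR, the first-match walk picks the v1.0 entry
and the scoring walk picks v2.0, without the per-class callback having
to know anything about its siblings.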

> 
> Fixes: 03ae4133ab8 ("target-ppc: Add pvr_match() callback")
> Signed-off-by: Nicholas Piggin 
> ---
>  target/ppc/cpu_init.c | 51 ---
>  1 file changed, 34 insertions(+), 17 deletions(-)
> 
> diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
> index 073fd10168..83ca741bea 100644
> --- a/target/ppc/cpu_init.c
> +++ b/target/ppc/cpu_init.c
> @@ -5910,13 +5910,14 @@ static void init_proc_POWER7(CPUPPCState *env)
>  
>  static bool ppc_pvr_match_power7(PowerPCCPUClass *pcc, uint32_t pvr)
>  {
> -if ((pvr & CPU_POWERPC_POWER_SERVER_MASK) == CPU_POWERPC_POWER7P_BASE) {
> -return true;
> -}
> -if ((pvr & CPU_POWERPC_POWER_SERVER_MASK) == CPU_POWERPC_POWER7_BASE) {
> -return true;
> +uint32_t base = pvr & CPU_POWERPC_POWER_SERVER_MASK;
> +uint32_t pcc_base = pcc->pvr & CPU_POWERPC_POWER_SERVER_MASK;
> +
> +if (base != pcc_base) {
> +return false;
>  }
> -return false;
> +
> +return true;
>  }
>  
>  static bool cpu_has_work_POWER7(CPUState *cs)
> @@ -6070,16 +6071,14 @@ static void init_proc_POWER8(CPUPPCState *env)
>  
>  static bool ppc_pvr_match_power8(PowerPCCPUClass *pcc, uint32_t pvr)
>  {
> -if ((pvr & CPU_POWERPC_POWER_SERVER_MASK) == CPU_POWERPC_POWER8NVL_BASE) {
> -return true;
> -}
> -if ((pvr & CPU_POWERPC_POWER_SERVER_MASK) == CPU_POWERPC_POWER8E_BASE) {
> -return true;
> -}
> -if ((pvr & CPU_POWERPC_POWER_SERVER_MASK) == CPU_POWERPC_POWER8_BASE) {
> -return true;
> +uint32_t base = pvr & CPU_POWERPC_POWER_SERVER_MASK;
> +uint32_t pcc_base = pcc->pvr & CPU_POWERPC_POWER_SERVER_MASK;
> +
> +if (base != pcc_base) {
> +return false;
>  }
> -return false;
> +
> +return true;
>  }
>  
>  static bool cpu_has_work_POWER8(CPUState *cs)
> @@ -6277,9 +6276,18 @@ static void init_proc_POWER9(CPUPPCState *env)
>  
>  static bool ppc_pvr_match_power9(PowerPCCPUClass *pcc, uint32_t pvr)
>  {
> -if ((pvr & CPU_POWERPC_POWER_SERVER_MASK) == CPU_POWERPC_POWER9_BASE) {
> +uint32_t base = pvr & CPU_POWERPC_POWER_SERVER_MASK;
> +uint32_t pcc_base = pcc->pvr & CPU_POWERPC_POWER_SERVER_MASK;
> +
> +if (base != pcc_base) {
> +return false;
> +}
> +
> +if ((pvr & 0x0f00) == (pcc->pvr & 0x0f00)) {
> +/* Major DD version matches to power9_v1.0 and power9_v2.0 */
>  return true;
>  }
> +
>  return false;
>  }
>  
> @@ -6489,9 +6497,18 @@ static void init_proc_POWER10(CPUPPCState *env)
>  
>  static bool ppc_pvr_match_power10(PowerPCCPUClass *pcc, uint32_t pvr)
>  {
> -if ((pvr & CPU_POWERPC_POWER_SERVER_MASK) == CPU_POWERPC_POWER10_BASE) {
> +uint32_t base = pvr & CPU_POWERPC_POWER_SERVER_MASK;
> +uint32_t pcc_base = pcc->pvr & CPU_POWERPC_POWER_SERVER_MASK;
> +
> +if (base != pcc_base) {
> +return false;
> +}
> +
> +if ((pvr & 0x0f00) == (pcc->pvr & 0x0f00)) {
> +/* Major DD version matches to power10_v1.0 and power10_v2.0 */
>  return true;
>  }
> +
>  return false;
>  }
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

