date:20240130

On Tue, Jan 30, 2024 at 12:11:47PM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > On Mon, Jan 29, 2024 at 09:42:24AM -0300, Fabiano Rosas wrote:
> >> Peter Xu  writes:
> >> 
> >> > On Fri, Jan 26, 2024 at 07:19:39PM -0300, Fabiano Rosas wrote:
> >> >> +static MultiFDMethods multifd_socket_ops = {
> >> >> +.send_setup = multifd_socket_send_setup,
> >> >> +.send_cleanup = multifd_socket_send_cleanup,
> >> >> +.send_prepare = multifd_socket_send_prepare,
> >> >
> >> > Here it's named with "socket", however not all socket-based multifd
> >> > migrations will go into this route, e.g., when zstd compression is
> >> > enabled it will not go via this route, even if zstd also uses sockets as
> >> > transport.  From that pov, this may be slightly confusing.  Maybe it
> >> > suits more to be called "socket_plain" / "socket_no_comp"?
> >> >
> >> > One step back, I had a feeling that the current proposal tried to
> >> > provide a single ->ops to cover a model where we may need more than one
> >> > layer of abstraction.
> >> >
> >> > Since it might be helpful to allow multifd send arbitrary data (e.g. for
> >> > VFIO?  Avihai might have an answer there..), I'll try to even consider
> >> > that into the picture.
> >> >
> >> > Let's consider the ultimate goal of multifd, where the simplest model
> >> > could look like this in my mind (I'm only discussing sender side, but
> >> > it'll be similar on recv side):
> >> >
> >> >           prepare()         send()
> >> >   Input -------------> IOVs --------> iochannels
> >> >
> >> > [I used prepare/send, but please think of them as generic terms, not 100%
> >> >  aligned with what we have with existing multifd_ops, or what you
> >> >  proposed later]
> >> >
> >> > Here what is sure, IMHO, is:
> >> >
> >> >   - We always can have some input data to dump; I didn't use "guest
> >> >     pages" just to say we may allow arbitrary data.  For any multifd
> >> >     user that would like to dump arbitrary data, they can already
> >> >     provide IOVs, so here input can be either "MultiFDPages_t" or
> >> >     "IOVs".
> >> 
> >> Or anything else, since the client code also has control over send(),
> >> no? So it could give multifd a pointer to some memory and then use
> >> send() to do whatever it wants with it. Multifd is just providing worker
> >> threads and "scheduling".
> >
> > IOVs contain the case of one single buffer, where n_iovs==1.  Here I
> > mentioned IOVs explicitly because I want to make it part of the protocol so
> > that the interface might be clearer, on what is not changing, and what can
> > change for a multifd client.
> 
> Got it. I agree.
> 
> >> 
> >> Also note that multifd clients currently _do not_ provide IOVs. They
> >> merely provide data to multifd (p->pages) and then convert that data
> >> into IOVs at prepare(). This is different, because multifd currently
> >> holds that p->pages (and turns that into p->normal), which means the
> >> client code does not need to store the data across iterations (in the
> >> case of RAM which is iterative).
> >
> > They provide?  AFAIU that's exactly MultiFDSendParams.iov as of now, while
> > iov_nums is the length.
> 
> Before that, the ram code needs to pass in the p->pages->offset array
> first. Then, that gets put into p->normal. Then, that gets put into
> p->iov at prepare(). So it's not a simple "fill p->iov and pass it to
> multifd".
> 
> Hmm, could we just replace multifd_send_state->pages with a
> multifd_send_state->iov? I don't really understand why do we need to
> carry that pages->offset around.

I am thinking the p->normal is mostly redundant.. at least on the sender
side that I just read.  Since I'll be preparing a new spin of the multifd
cleanup series I posted, maybe I can append one more to try dropping
p->normal[] completely.

> 
> >> 
> >> >
> >> >   - We may always want to have IOVs to represent the buffers at some
> >> >     point, no matter what the input is
> >> >
> >> >   - We always flush the IOVs to iochannels; basically I want to say we
> >> >     can always assume the last layer is connecting to QIOChannel APIs,
> >> >     while I don't think there are outliers here so far, even if the
> >> >     send() may differ.
> >> >
> >> > Then _maybe_ it's clearer that we can have two layers of OPs?
> >> >
> >> >   - prepare(): it tells how the "input" will be converted into a
> >> >     scatter-gather list of buffers.  All compression methods fall into
> >> >     this afaiu.  This has _nothing_ to do with how the buffers will be
> >> >     sent.  For arbitrary-typed input, this can already be a no-op since
> >> >     the IOVs provided can already be passed over to send().
> >> >
> >> >   - send(): how to dump the IOVs to the iochannels.  AFAIU this is mostly
> >> >     only useful for fixed-ram migrations.
> >> >
> >> > Would this be clearer, rather than keep using a single multifd_ops?
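
A minimal C sketch of the two-layer split described above, for illustration only. The type and function names (PrepareOps, SendOps, prepare_passthrough, send_count_bytes, run_pipeline) are hypothetical and are not QEMU's MultiFDMethods; the send layer here only totals byte counts instead of writing to a QIOChannel.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>   /* struct iovec */

#define MAX_IOVS 8

/* Layer 1: turn some opaque input into a scatter-gather list. */
typedef struct PrepareOps {
    /* Returns the number of IOVs written into out[], at most max. */
    size_t (*prepare)(const void *input, size_t input_len,
                      struct iovec *out, size_t max);
} PrepareOps;

/* Layer 2: push an already-built IOV list to the transport. */
typedef struct SendOps {
    /* Returns the number of bytes handed to the transport. */
    size_t (*send)(const struct iovec *iov, size_t n_iovs);
} SendOps;

/* "No compression" case: the input buffer becomes one IOV (n_iovs == 1). */
static size_t prepare_passthrough(const void *input, size_t input_len,
                                  struct iovec *out, size_t max)
{
    assert(max >= 1);
    out[0].iov_base = (void *)input;
    out[0].iov_len = input_len;
    return 1;
}

/* Stand-in transport: only adds up the lengths of the IOVs. */
static size_t send_count_bytes(const struct iovec *iov, size_t n_iovs)
{
    size_t total = 0;

    for (size_t i = 0; i < n_iovs; i++) {
        total += iov[i].iov_len;
    }
    return total;
}

/* The two layers are chosen independently and composed here. */
static size_t run_pipeline(const PrepareOps *p, const SendOps *s,
                           const void *input, size_t input_len)
{
    struct iovec iov[MAX_IOVS];
    size_t n = p->prepare(input, input_len, iov, MAX_IOVS);

    return s->send(iov, n);
}
```

In this shape a compression method would only supply a different prepare(), and a file-based transport would only supply a different send().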
> >> 
> >> Sorry, I d

Re: [PATCH 2/3] ui/gtk: set the ui size to 0 when invisible

2024-01-30 Thread Marc-André Lureau

Hi

On Wed, Jan 31, 2024 at 3:50 AM  wrote:
>
> From: Dongwon Kim 
>
> UI size is set to 0 when the VC is invisible, which will prevent
> the further scanout update by notifying the guest that the display
> is not in active state. Then it is restored to the original size
> whenever the VC becomes visible again.

This can have unwanted results on multi monitor setups, such as moving
windows or icons around on different monitors.

Switching tabs or minimizing the display window shouldn't cause a
guest display reconfiguration.

What is the benefit of disabling the monitor here? Is it for
performance reasons?

>
> Cc: Marc-André Lureau 
> Cc: Gerd Hoffmann 
> Cc: Vivek Kasireddy 
> Signed-off-by: Dongwon Kim 
> ---
>  ui/gtk.c | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/ui/gtk.c b/ui/gtk.c
> index 02eb667d8a..651ed3492f 100644
> --- a/ui/gtk.c
> +++ b/ui/gtk.c
> @@ -1314,10 +1314,12 @@ static void gd_menu_switch_vc(GtkMenuItem *item, void *opaque)
>  GtkDisplayState *s = opaque;
>  VirtualConsole *vc;
>  GtkNotebook *nb = GTK_NOTEBOOK(s->notebook);
> +GdkWindow *window;
>  gint page;
>
>  vc = gd_vc_find_current(s);
>  vc->gfx.visible = false;
> +gd_set_ui_size(vc, 0, 0);
>
>  vc = gd_vc_find_by_menu(s);
>  gtk_release_modifiers(s);
> @@ -1325,6 +1327,9 @@ static void gd_menu_switch_vc(GtkMenuItem *item, void *opaque)
>  page = gtk_notebook_page_num(nb, vc->tab_item);
>  gtk_notebook_set_current_page(nb, page);
>  gtk_widget_grab_focus(vc->focus);
> +window = gtk_widget_get_window(vc->gfx.drawing_area);
> +gd_set_ui_size(vc, gdk_window_get_width(window),
> +   gdk_window_get_height(window));
>  vc->gfx.visible = true;
>  }
>  }
> @@ -1356,6 +1361,7 @@ static gboolean gd_tab_window_close(GtkWidget *widget, GdkEvent *event,
>  GtkDisplayState *s = vc->s;
>
>  vc->gfx.visible = false;
> +gd_set_ui_size(vc, 0, 0);
>  gtk_widget_set_sensitive(vc->menu_item, true);
>  gd_widget_reparent(vc->window, s->notebook, vc->tab_item);
>  gtk_notebook_set_tab_label_text(GTK_NOTEBOOK(s->notebook),
> @@ -1391,6 +1397,7 @@ static gboolean gd_win_grab(void *opaque)
>  static void gd_menu_untabify(GtkMenuItem *item, void *opaque)
>  {
>  GtkDisplayState *s = opaque;
> +GdkWindow *window;
>  VirtualConsole *vc = gd_vc_find_current(s);
>
>  if (vc->type == GD_VC_GFX &&
> @@ -1429,6 +1436,10 @@ static void gd_menu_untabify(GtkMenuItem *item, void *opaque)
>  gd_update_geometry_hints(vc);
>  gd_update_caption(s);
>  }
> +
> +window = gtk_widget_get_window(vc->gfx.drawing_area);
> +gd_set_ui_size(vc, gdk_window_get_width(window),
> +   gdk_window_get_height(window));
>  vc->gfx.visible = true;
>  }
>
> @@ -1753,7 +1764,9 @@ static gboolean gd_configure(GtkWidget *widget,
>  {
>  VirtualConsole *vc = opaque;
>
> -gd_set_ui_size(vc, cfg->width, cfg->height);
> +if (vc->gfx.visible) {
> +gd_set_ui_size(vc, cfg->width, cfg->height);
> +}
>  return FALSE;
>  }
>
> --
> 2.34.1
>
>


-- 
Marc-André Lureau

Re: [PATCH 1/3] ui/gtk: skip drawing guest scanout when associated VC is invisible

2024-01-30 Thread Marc-André Lureau

Hi Dongwon

On Wed, Jan 31, 2024 at 3:50 AM  wrote:
>
> From: Dongwon Kim 
>
> A new flag "visible" is added to show visibility status of the gfx console.
> The flag is set to 'true' when the VC is visible but set to 'false' when
> it is hidden or closed. When the VC is invisible, drawing guest frames
> should be skipped as it will never be completed and it would potentially
> lock up the guest display especially when blob scanout is used.

Can't it skip drawing when the widget is not visible instead?
https://docs.gtk.org/gtk3/method.Widget.is_visible.html

>
> Cc: Marc-André Lureau 
> Cc: Gerd Hoffmann 
> Cc: Vivek Kasireddy 
>
> Signed-off-by: Dongwon Kim 
> ---
>  include/ui/gtk.h |  1 +
>  ui/gtk-egl.c |  8 
>  ui/gtk-gl-area.c |  8 
>  ui/gtk.c | 10 +-
>  4 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/include/ui/gtk.h b/include/ui/gtk.h
> index aa3d637029..2de38e5724 100644
> --- a/include/ui/gtk.h
> +++ b/include/ui/gtk.h
> @@ -57,6 +57,7 @@ typedef struct VirtualGfxConsole {
>  bool y0_top;
>  bool scanout_mode;
>  bool has_dmabuf;
> +bool visible;
>  #endif
>  } VirtualGfxConsole;
>
> diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c
> index 3af5ac5bcf..993c283191 100644
> --- a/ui/gtk-egl.c
> +++ b/ui/gtk-egl.c
> @@ -265,6 +265,10 @@ void gd_egl_scanout_dmabuf(DisplayChangeListener *dcl,
>  #ifdef CONFIG_GBM
>  VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
>
> +if (!vc->gfx.visible) {
> +return;
> +}
> +
>  eglMakeCurrent(qemu_egl_display, vc->gfx.esurface,
> vc->gfx.esurface, vc->gfx.ectx);
>
> @@ -363,6 +367,10 @@ void gd_egl_flush(DisplayChangeListener *dcl,
>  VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
>  GtkWidget *area = vc->gfx.drawing_area;
>
> +if (!vc->gfx.visible) {
> +return;
> +}
> +
>  if (vc->gfx.guest_fb.dmabuf && !vc->gfx.guest_fb.dmabuf->draw_submitted) {
>  graphic_hw_gl_block(vc->gfx.dcl.con, true);
>  vc->gfx.guest_fb.dmabuf->draw_submitted = true;
> diff --git a/ui/gtk-gl-area.c b/ui/gtk-gl-area.c
> index 52dcac161e..04e07bd7ee 100644
> --- a/ui/gtk-gl-area.c
> +++ b/ui/gtk-gl-area.c
> @@ -285,6 +285,10 @@ void gd_gl_area_scanout_flush(DisplayChangeListener *dcl,
>  {
>  VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
>
> +if (!vc->gfx.visible) {
> +return;
> +}
> +
>  if (vc->gfx.guest_fb.dmabuf && !vc->gfx.guest_fb.dmabuf->draw_submitted) {
>  graphic_hw_gl_block(vc->gfx.dcl.con, true);
>  vc->gfx.guest_fb.dmabuf->draw_submitted = true;
> @@ -299,6 +303,10 @@ void gd_gl_area_scanout_dmabuf(DisplayChangeListener *dcl,
>  #ifdef CONFIG_GBM
>  VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
>
> +if (!vc->gfx.visible) {
> +return;
> +}
> +
>  gtk_gl_area_make_current(GTK_GL_AREA(vc->gfx.drawing_area));
>  egl_dmabuf_import_texture(dmabuf);
>  if (!dmabuf->texture) {
> diff --git a/ui/gtk.c b/ui/gtk.c
> index 810d7fc796..02eb667d8a 100644
> --- a/ui/gtk.c
> +++ b/ui/gtk.c
> @@ -1312,15 +1312,20 @@ static void gd_menu_quit(GtkMenuItem *item, void *opaque)
>  static void gd_menu_switch_vc(GtkMenuItem *item, void *opaque)
>  {
>  GtkDisplayState *s = opaque;
> -VirtualConsole *vc = gd_vc_find_by_menu(s);
> +VirtualConsole *vc;
>  GtkNotebook *nb = GTK_NOTEBOOK(s->notebook);
>  gint page;
>
> +vc = gd_vc_find_current(s);
> +vc->gfx.visible = false;
> +
> +vc = gd_vc_find_by_menu(s);
>  gtk_release_modifiers(s);
>  if (vc) {
>  page = gtk_notebook_page_num(nb, vc->tab_item);
>  gtk_notebook_set_current_page(nb, page);
>  gtk_widget_grab_focus(vc->focus);
> +vc->gfx.visible = true;
>  }
>  }
>
> @@ -1350,6 +1355,7 @@ static gboolean gd_tab_window_close(GtkWidget *widget, GdkEvent *event,
>  VirtualConsole *vc = opaque;
>  GtkDisplayState *s = vc->s;
>
> +vc->gfx.visible = false;
>  gtk_widget_set_sensitive(vc->menu_item, true);
>  gd_widget_reparent(vc->window, s->notebook, vc->tab_item);
>  gtk_notebook_set_tab_label_text(GTK_NOTEBOOK(s->notebook),
> @@ -1423,6 +1429,7 @@ static void gd_menu_untabify(GtkMenuItem *item, void *opaque)
>  gd_update_geometry_hints(vc);
>  gd_update_caption(s);
>  }
> +vc->gfx.visible = true;
>  }
>
>  static void gd_menu_show_menubar(GtkMenuItem *item, void *opaque)
> @@ -2471,6 +2478,7 @@ static void gtk_display_init(DisplayState *ds, DisplayOptions *opts)
>  #ifdef CONFIG_GTK_CLIPBOARD
>  gd_clipboard_init(s);
>  #endif /* CONFIG_GTK_CLIPBOARD */
> +vc->gfx.visible = true;
>  }
>
>  static void early_gtk_display_init(DisplayOptions *opts)
> --
> 2.34.1
>
>


-- 
Marc-André Lureau

Re: [PATCH v4 28/47] hw/arm/npcm7xx: use qemu_configure_nic_device, allow emc0/emc1 as aliases


On 26/01/2024 18.25, David Woodhouse wrote:

From: David Woodhouse 

Also update the test to specify which device to attach the test socket
to, and remove the comment lamenting the fact that we can't do so.

Signed-off-by: David Woodhouse 
---
  hw/arm/npcm7xx.c   | 16 +---
  tests/qtest/npcm7xx_emc-test.c | 18 --
  2 files changed, 13 insertions(+), 21 deletions(-)

diff --git a/hw/arm/npcm7xx.c b/hw/arm/npcm7xx.c
index 15ff21d047..ee395864e4 100644
--- a/hw/arm/npcm7xx.c
+++ b/hw/arm/npcm7xx.c
@@ -655,8 +655,9 @@ static void npcm7xx_realize(DeviceState *dev, Error **errp)
  
  /*
   * EMC Modules. Cannot fail.
- * The mapping of the device to its netdev backend works as follows:
- * emc[i] = nd_table[i]
+ * Use the available NIC configurations in order, allowing 'emc0' and
+ * 'emc1' to by used as aliases for the model= parameter to override.
+ *
   * This works around the inability to specify the netdev property for the
   * emc device: it's not pluggable and thus the -device option can't be
   * used.
@@ -664,12 +665,13 @@ static void npcm7xx_realize(DeviceState *dev, Error **errp)
  QEMU_BUILD_BUG_ON(ARRAY_SIZE(npcm7xx_emc_addr) != ARRAY_SIZE(s->emc));
  QEMU_BUILD_BUG_ON(ARRAY_SIZE(s->emc) != 2);
  for (i = 0; i < ARRAY_SIZE(s->emc); i++) {
-s->emc[i].emc_num = i;
  SysBusDevice *sbd = SYS_BUS_DEVICE(&s->emc[i]);
-if (nd_table[i].used) {
-qemu_check_nic_model(&nd_table[i], TYPE_NPCM7XX_EMC);
-qdev_set_nic_properties(DEVICE(sbd), &nd_table[i]);
-}
+char alias[6];
+
+s->emc[i].emc_num = i;
+snprintf(alias, sizeof(alias), "emc%u", i);
+qemu_configure_nic_device(DEVICE(sbd), true, alias);
+
  /*
   * The device exists regardless of whether it's connected to a QEMU
   * netdev backend. So always instantiate it even if there is no
diff --git a/tests/qtest/npcm7xx_emc-test.c b/tests/qtest/npcm7xx_emc-test.c
index b046f1d76a..f7646fae2c 100644
--- a/tests/qtest/npcm7xx_emc-test.c
+++ b/tests/qtest/npcm7xx_emc-test.c
@@ -225,21 +225,11 @@ static int *packet_test_init(int module_num, GString *cmd_line)
  g_assert_cmpint(ret, != , -1);
  
  /*

- * KISS and use -nic. We specify two nics (both emc{0,1}) because there's
- * currently no way to specify only emc1: The driver implicitly relies on
- * emc[i] == nd_table[i].
+ * KISS and use -nic. The driver accepts 'emc0' and 'emc1' as aliases
+ * in the 'model' field to specify the device to match.
   */
-if (module_num == 0) {
-g_string_append_printf(cmd_line,
-   " -nic socket,fd=%d,model=" TYPE_NPCM7XX_EMC " "
-   " -nic user,model=" TYPE_NPCM7XX_EMC " ",
-   test_sockets[1]);
-} else {
-g_string_append_printf(cmd_line,
-   " -nic user,model=" TYPE_NPCM7XX_EMC " "
-   " -nic socket,fd=%d,model=" TYPE_NPCM7XX_EMC " ",
-   test_sockets[1]);
-}
+g_string_append_printf(cmd_line, " -nic socket,fd=%d,model=emc%d ",
+   test_sockets[1], module_num);
  
  g_test_queue_destroy(packet_test_clear, test_sockets);

  return test_sockets;


I like the idea to use the alias to configure a certain on-board NIC :-)
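
For reference, the alias buffer in the patch is sized so that "emc" plus one digit plus the terminating NUL (5 bytes) fits in the 6-byte array. A small standalone sketch of the same formatting step follows; the helper name make_emc_alias is hypothetical and is not part of the patch. It also reports truncation through snprintf's return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Formats "emc<index>" into buf.
 * Returns false if the result did not fit (snprintf reports the length
 * it would have written, excluding the NUL).
 */
static bool make_emc_alias(char *buf, size_t size, unsigned index)
{
    assert(buf != NULL && size > 0);

    int n = snprintf(buf, size, "emc%u", index);

    return n >= 0 && (size_t)n < size;
}
```

With a 6-byte buffer this accepts indexes up to two digits, which is more than the two EMC modules need.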

Reviewed-by: Thomas Huth

Re: [PATCH v4 26/47] hw/net/lan9118: use qemu_configure_nic_device()


On 26/01/2024 18.25, David Woodhouse wrote:

From: David Woodhouse 

Some callers instantiate the device unconditionally, others will do so only
if there is a NICInfo to go with it. This appears to be fairly random, but
preserve the existing behaviour for now.

Signed-off-by: David Woodhouse 
---
  hw/arm/kzm.c | 4 ++--
  hw/arm/mps2.c| 2 +-
  hw/arm/realview.c| 6 ++
  hw/arm/vexpress.c| 4 ++--
  hw/net/lan9118.c | 5 ++---
  include/hw/net/lan9118.h | 2 +-
  6 files changed, 10 insertions(+), 13 deletions(-)


Reviewed-by: Thomas Huth

Re: [PATCH v2] pc: q35: Bump max_cpus to 1856 vcpus

On Wed, Jan 31, 2024 at 10:47:29AM +0530, Ani Sinha wrote:
> Date: Wed, 31 Jan 2024 10:47:29 +0530
> From: Ani Sinha 
> Subject: Re: [PATCH v2] pc: q35: Bump max_cpus to 1856 vcpus
> 
> On Wed, Jan 31, 2024 at 9:27 AM Zhao Liu  wrote:
> >
> > Hi Ani,
> >
> > On Wed, Jan 31, 2024 at 08:19:06AM +0530, Ani Sinha wrote:
> > > Date: Wed, 31 Jan 2024 08:19:06 +0530
> > > From: Ani Sinha 
> > > Subject: [PATCH v2] pc: q35: Bump max_cpus to 1856 vcpus
> > > X-Mailer: git-send-email 2.42.0
> > >
> > > Since commit f10a570b093e6 ("KVM: x86: Add CONFIG_KVM_MAX_NR_VCPUS to allow up to 4096 vCPUs")
> > > Linux kernel can support up to a maximum number of 4096 vCPUS when MAXSMP
> > > is enabled in the kernel. At present, QEMU has been tested to correctly
> > > boot a linux guest with 1856 vcpus and no more both with edk2 and seabios
> > > firmwares.
> >
> > About background, could I ask if there will be Host machines with so
> > many CPUs? What are the benefits of vCPUs that far exceed the number
> > of Host CPUs?
> 
> Yes HPE has SAP HANA host machines with large numbers of physical
> cores and memory. For example QEMU was tested on a system with 3840
> cores.

Thanks! For such a large system, do the vCPUs need CPU affinity, or are
they just left to run freely on the Host's physical cores?

> 
> >
> > Thanks,
> > Zhao
> >
> > > If an additional vcpu is added, that is with 1857 vcpus, edk2 currently
> > > fails with the following error messages:
> > >
> > > AllocatePages failed: No 0x400 Pages is available.
> > > There is only left 0x2BF pages memory resource to be allocated.
> > > ERROR: Out of aligned pages
> > > ASSERT /builddir/build/BUILD/edk2-ba91d0292e/MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c(814): BigPageAddress != 0
> > >
> > > This error exists only with edk2. Seabios currently can boot a linux guest
> > > fine with 4096 vcpus. Since the lowest common denominator for a working
> > > VM for both edk2 and seabios is 1856 vcpus, bump up the value max_cpus to
> > > 1856 for q35 machines versions 9 and newer. Q35 machines versions 8.2 and
> > > older continue to support 1024 maximum vcpus as before for compatibility
> > > reasons.
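
The compatibility scheme described in the commit message can be sketched as follows: the older machine version first applies the newest defaults, then overrides only the fields that must stay as they were. The struct and function names here are simplified stand-ins chosen for illustration, not QEMU's MachineClass API.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the per-machine-type options. */
typedef struct MachineOptsSketch {
    int max_cpus;
} MachineOptsSketch;

/* Newest version: carries the raised limit. */
static void q35_9_0_options_sketch(MachineOptsSketch *m)
{
    assert(m != NULL);
    m->max_cpus = 1856;
}

/* Older version: inherit the newest defaults, then restore the old limit. */
static void q35_8_2_options_sketch(MachineOptsSketch *m)
{
    q35_9_0_options_sketch(m);
    m->max_cpus = 1024;
}
```

Chaining each version to the next newer one means a guest started with an 8.2 machine type keeps seeing the 1024 limit no matter how the newest defaults change later.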
> > >
> > > If KVM is not able to support the specified number of vcpus, QEMU would
> > > return the following error messages:
> > >
> > > $ ./qemu-system-x86_64 -cpu host -accel kvm -machine q35 -smp 1728

In practice, do users need to set the socket level topology and NUMA to
be consistent with Host for this large system?

NUMA settings are also related to topology, and it's better if NUMA is
also covered.

Thanks,
Zhao

> > > qemu-system-x86_64: -accel kvm: warning: Number of SMP cpus requested (1728) exceeds the recommended cpus supported by KVM (12)
> > > qemu-system-x86_64: -accel kvm: warning: Number of hotpluggable cpus requested (1728) exceeds the recommended cpus supported by KVM (12)
> > > Number of SMP cpus requested (1728) exceeds the maximum cpus supported by KVM (1024)
> > >
> > > Cc: Daniel P. Berrangé 
> > > Cc: Igor Mammedov 
> > > Cc: Michael S. Tsirkin 
> > > Cc: Julia Suvorova 
> > > Cc: kra...@redhat.com
> > > Signed-off-by: Ani Sinha 
> > > ---
> > >  hw/i386/pc_q35.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > Changelog:
> > > v2: bump up the vcpu number to 1856. Add failure messages from ekd2 in
> > > the commit description.
> > > See also RH Jira https://issues.redhat.com/browse/RHEL-22202
> > >
> > > diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> > > index f43d5142b8..f9c4b6594d 100644
> > > --- a/hw/i386/pc_q35.c
> > > +++ b/hw/i386/pc_q35.c
> > > @@ -375,7 +375,7 @@ static void pc_q35_machine_options(MachineClass *m)
> > >  m->default_nic = "e1000e";
> > >  m->default_kernel_irqchip_split = false;
> > >  m->no_floppy = 1;
> > > -m->max_cpus = 1024;
> > > +m->max_cpus = 1856;
> > >  m->no_parallel = !module_object_class_by_name(TYPE_ISA_PARALLEL);
> > >  machine_class_allow_dynamic_sysbus_dev(m, TYPE_AMD_IOMMU_DEVICE);
> > >  machine_class_allow_dynamic_sysbus_dev(m, TYPE_INTEL_IOMMU_DEVICE);
> > > @@ -396,6 +396,7 @@ static void pc_q35_8_2_machine_options(MachineClass *m)
> > >  {
> > >  pc_q35_9_0_machine_options(m);
> > >  m->alias = NULL;
> > > +m->max_cpus = 1024;
> > >  compat_props_add(m->compat_props, hw_compat_8_2, hw_compat_8_2_len);
> > >  compat_props_add(m->compat_props, pc_compat_8_2, pc_compat_8_2_len);
> > >  }
> > > --
> > > 2.42.0
> > >
> > >
> >
> 
>

Re: [PATCH 2/2] target/riscv: Support xtheadmaee for thead-c906

2024-01-30 Thread LIU Zhiwei




On 2024/1/31 13:07, Richard Henderson wrote:

On 1/30/24 21:11, LIU Zhiwei wrote:

+riscv_csr_operations th_csr_ops[CSR_TABLE_SIZE] = {
+#if !defined(CONFIG_USER_ONLY)
+    [CSR_TH_MXSTATUS] = { "th_mxstatus", th_maee_check, read_th_mxstatus,
+                          write_th_mxstatus},
+#endif /* !CONFIG_USER_ONLY */
+};


This is clearly the wrong data structure for a single entry in the array.


This array should have the same size as csr_ops so that we can
override custom CSR behavior directly. Besides, it will have other
entries in the near future.


I see that I forgot to wrap th_maee_check, read_th_mxstatus and
write_th_mxstatus in !CONFIG_USER_ONLY.  But I don't understand why it is
wrong to have a single entry in the array; at least the compiler reports
no error.
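
One possible reading of the objection, offered as an assumption rather than as the reviewer's stated intent: a table with CSR_TABLE_SIZE slots spends a full slot on every possible CSR number even though only one is populated, whereas a short list of (number, ops) pairs costs one entry per implemented CSR. The sketch below uses hypothetical names, a made-up table size and a made-up CSR number.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE_SKETCH 4096   /* hypothetical; stands in for CSR_TABLE_SIZE */
#define MY_CSR_SKETCH     0x7c0  /* hypothetical CSR number */

typedef struct CsrOpsSketch {
    const char *name;
    int (*read)(void);
} CsrOpsSketch;

static int read_my_csr(void)
{
    return 42;
}

/* Dense form: one slot per possible CSR number, almost all of them empty. */
static const CsrOpsSketch dense_table[TABLE_SIZE_SKETCH] = {
    [MY_CSR_SKETCH] = { "my_csr", read_my_csr },
};

/* Compact form: one entry per CSR that is actually implemented. */
typedef struct CsrEntrySketch {
    int csrno;
    CsrOpsSketch ops;
} CsrEntrySketch;

static const CsrEntrySketch compact_list[] = {
    { MY_CSR_SKETCH, { "my_csr", read_my_csr } },
};

/* Linear search is fine while the list stays short. */
static const CsrOpsSketch *lookup_compact(int csrno)
{
    assert(csrno >= 0);
    for (size_t i = 0; i < sizeof(compact_list) / sizeof(compact_list[0]); i++) {
        if (compact_list[i].csrno == csrno) {
            return &compact_list[i].ops;
        }
    }
    return NULL;
}
```

The dense form keeps O(1) lookup and mirrors csr_ops, which is the argument made in the reply above; the compact form trades that for a much smaller footprint.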


Thanks,
Zhiwei




r~

Re: [PATCH v3 02/29] hw/core: Declare CPUArchId::cpu as CPUState instead of Object


On 29/01/2024 17.44, Philippe Mathieu-Daudé wrote:

Do not accept any Object for CPUArchId::cpu field,
restrict it to CPUState type.

Signed-off-by: Philippe Mathieu-Daudé 
---
  include/hw/boards.h| 2 +-
  hw/core/machine.c  | 4 ++--
  hw/i386/x86.c  | 2 +-
  hw/loongarch/virt.c| 2 +-
  hw/ppc/spapr.c | 5 ++---
  hw/s390x/s390-virtio-ccw.c | 2 +-
  6 files changed, 8 insertions(+), 9 deletions(-)


Reviewed-by: Thomas Huth

Re: [PATCH 1/2] target/riscv: Register vendors CSR

2024-01-30 Thread LIU Zhiwei




On 2024/1/31 13:06, Richard Henderson wrote:

On 1/30/24 21:11, LIU Zhiwei wrote:

+/* This stub just works for making vendors array not empty */
+riscv_csr_operations stub_csr_ops[CSR_TABLE_SIZE];
+static inline bool never_p(const RISCVCPUConfig *cfg)
+{
+    return false;
+}
+
+void riscv_tcg_cpu_register_vendor_csr(RISCVCPU *cpu)
+{
+    static const struct {
+    bool (*guard_func)(const RISCVCPUConfig *);
+    riscv_csr_operations *csr_ops;
+    } vendors[] = {
+    { never_p, stub_csr_ops },
+    };
+    for (size_t i = 0; i < ARRAY_SIZE(vendors); ++i) {


Presumably you did this to avoid a Werror for "i < 0", since i is 
unsigned.

Yes. That's what gcc complains about.


It would be better to either use "int i"

OK

, or

  for (size_t i = 0, n = ARRAY_SIZE(vendors); i < n; ++i)

either of which will not Werror.

This works.  I don't know why it works, because n is 0 and never changes.
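
A likely explanation, assuming the diagnostic in question is GCC's -Wtype-limits: that warning fires when an unsigned value is compared directly against a bound the compiler sees as the constant 0 in the expression itself. Copying the bound into a local n turns the test into a comparison between two variables, which is not diagnosed even if n happens to be 0 at run time. The sketch below uses a non-empty, hypothetical table (the VendorSketch type and the guard functions are made up) so that it compiles as standard C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

typedef struct VendorSketch {
    bool (*guard_func)(void);
    int id;
} VendorSketch;

static bool always_p(void)
{
    return true;
}

static bool never_p(void)
{
    return false;
}

static const VendorSketch vendors[] = {
    { never_p, 1 },
    { always_p, 2 },
};

/* Returns the id of the first vendor whose guard passes, or -1 if none. */
static int first_matching_vendor(void)
{
    /* The bound is copied into n once; the loop test is then "i < n". */
    for (size_t i = 0, n = ARRAY_SIZE(vendors); i < n; ++i) {
        assert(vendors[i].guard_func != NULL);
        if (vendors[i].guard_func()) {
            return vendors[i].id;
        }
    }
    return -1;
}
```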


Especially considering the size of stub_csr_ops.


Do you mean we should remove the stub_csr_ops? I don't see how your
two suggested fixes relate to stub_csr_ops.


Thanks,
Zhiwei



r~

Re: [PATCH v4 8/8] Add tests for the STM32L4x5_RCC


On 30/01/2024 17.13, Arnaud Minier wrote:

Tests:
- the ability to change the sysclk of the device
- the ability to enable/disable/configure the PLLs
- if the clock multiplexers work
- the register flags and the generation of irqs

Signed-off-by: Arnaud Minier 
Signed-off-by: Inès Varhol 
---
  tests/qtest/meson.build  |   3 +-
  tests/qtest/stm32l4x5_rcc-test.c | 207 +++
  2 files changed, 209 insertions(+), 1 deletion(-)
  create mode 100644 tests/qtest/stm32l4x5_rcc-test.c


Acked-by: Thomas Huth

Re: [PATCH v2] pc: q35: Bump max_cpus to 1856 vcpus

On Wed, Jan 31, 2024 at 9:27 AM Zhao Liu  wrote:
>
> Hi Ani,
>
> On Wed, Jan 31, 2024 at 08:19:06AM +0530, Ani Sinha wrote:
> > Date: Wed, 31 Jan 2024 08:19:06 +0530
> > From: Ani Sinha 
> > Subject: [PATCH v2] pc: q35: Bump max_cpus to 1856 vcpus
> > X-Mailer: git-send-email 2.42.0
> >
> > Since commit f10a570b093e6 ("KVM: x86: Add CONFIG_KVM_MAX_NR_VCPUS to allow up to 4096 vCPUs")
> > Linux kernel can support up to a maximum number of 4096 vCPUS when MAXSMP is
> > enabled in the kernel. At present, QEMU has been tested to correctly boot a
> > linux guest with 1856 vcpus and no more both with edk2 and seabios
> > firmwares.
>
> About background, could I ask if there will be Host machines with so
> many CPUs? What are the benefits of vCPUs that far exceed the number
> of Host CPUs?

Yes HPE has SAP HANA host machines with large numbers of physical
cores and memory. For example QEMU was tested on a system with 3840
cores.

>
> Thanks,
> Zhao
>
> > If an additional vcpu is added, that is with 1857 vcpus, edk2 currently
> > fails with the following error messages:
> >
> > AllocatePages failed: No 0x400 Pages is available.
> > There is only left 0x2BF pages memory resource to be allocated.
> > ERROR: Out of aligned pages
> > ASSERT /builddir/build/BUILD/edk2-ba91d0292e/MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c(814): BigPageAddress != 0
> >
> > This error exists only with edk2. Seabios currently can boot a linux guest
> > fine with 4096 vcpus. Since the lowest common denominator for a working VM
> > for both edk2 and seabios is 1856 vcpus, bump up the value max_cpus to 1856
> > for q35 machines versions 9 and newer. Q35 machines versions 8.2 and older
> > continue to support 1024 maximum vcpus as before for compatibility reasons.
> >
> > If KVM is not able to support the specified number of vcpus, QEMU would
> > return the following error messages:
> >
> > $ ./qemu-system-x86_64 -cpu host -accel kvm -machine q35 -smp 1728
> > qemu-system-x86_64: -accel kvm: warning: Number of SMP cpus requested (1728) exceeds the recommended cpus supported by KVM (12)
> > qemu-system-x86_64: -accel kvm: warning: Number of hotpluggable cpus requested (1728) exceeds the recommended cpus supported by KVM (12)
> > Number of SMP cpus requested (1728) exceeds the maximum cpus supported by KVM (1024)
> >
> > Cc: Daniel P. Berrangé 
> > Cc: Igor Mammedov 
> > Cc: Michael S. Tsirkin 
> > Cc: Julia Suvorova 
> > Cc: kra...@redhat.com
> > Signed-off-by: Ani Sinha 
> > ---
> >  hw/i386/pc_q35.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > Changelog:
> > v2: bump up the vcpu number to 1856. Add failure messages from ekd2 in
> > the commit description.
> > See also RH Jira https://issues.redhat.com/browse/RHEL-22202
> >
> > diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> > index f43d5142b8..f9c4b6594d 100644
> > --- a/hw/i386/pc_q35.c
> > +++ b/hw/i386/pc_q35.c
> > @@ -375,7 +375,7 @@ static void pc_q35_machine_options(MachineClass *m)
> >  m->default_nic = "e1000e";
> >  m->default_kernel_irqchip_split = false;
> >  m->no_floppy = 1;
> > -m->max_cpus = 1024;
> > +m->max_cpus = 1856;
> >  m->no_parallel = !module_object_class_by_name(TYPE_ISA_PARALLEL);
> >  machine_class_allow_dynamic_sysbus_dev(m, TYPE_AMD_IOMMU_DEVICE);
> >  machine_class_allow_dynamic_sysbus_dev(m, TYPE_INTEL_IOMMU_DEVICE);
> > @@ -396,6 +396,7 @@ static void pc_q35_8_2_machine_options(MachineClass *m)
> >  {
> >  pc_q35_9_0_machine_options(m);
> >  m->alias = NULL;
> > +m->max_cpus = 1024;
> >  compat_props_add(m->compat_props, hw_compat_8_2, hw_compat_8_2_len);
> >  compat_props_add(m->compat_props, pc_compat_8_2, pc_compat_8_2_len);
> >  }
> > --
> > 2.42.0
> >
> >
>

Re: [PATCH 2/2] target/riscv: Support xtheadmaee for thead-c906

2024-01-30 Thread Richard Henderson


On 1/30/24 21:11, LIU Zhiwei wrote:

+riscv_csr_operations th_csr_ops[CSR_TABLE_SIZE] = {
+#if !defined(CONFIG_USER_ONLY)
+[CSR_TH_MXSTATUS] = { "th_mxstatus", th_maee_check, read_th_mxstatus,
+write_th_mxstatus},
+#endif /* !CONFIG_USER_ONLY */
+};


This is clearly the wrong data structure for a single entry in the array.


r~

Re: [PATCH 1/2] target/riscv: Register vendors CSR

2024-01-30 Thread Richard Henderson


On 1/30/24 21:11, LIU Zhiwei wrote:

+/* This stub just works for making vendors array not empty */
+riscv_csr_operations stub_csr_ops[CSR_TABLE_SIZE];
+static inline bool never_p(const RISCVCPUConfig *cfg)
+{
+return false;
+}
+
+void riscv_tcg_cpu_register_vendor_csr(RISCVCPU *cpu)
+{
+static const struct {
+bool (*guard_func)(const RISCVCPUConfig *);
+riscv_csr_operations *csr_ops;
+} vendors[] = {
+{ never_p, stub_csr_ops },
+};
+for (size_t i = 0; i < ARRAY_SIZE(vendors); ++i) {


Presumably you did this to avoid a Werror for "i < 0", since i is unsigned.

It would be better to either use "int i", or

  for (size_t i = 0, n = ARRAY_SIZE(vendors); i < n; ++i)

either of which will not Werror.

Especially considering the size of stub_csr_ops.

r~

Re: [PATCH 05/17] migration/multifd: Wait for multifd channels creation before proceeding

On Tue, Jan 30, 2024 at 06:32:21PM -0300, Fabiano Rosas wrote:
> Avihai Horon  writes:
> 
> > On 29/01/2024 16:34, Fabiano Rosas wrote:
> >> Avihai Horon  writes:
> >>
> >>> Currently, multifd channels are created asynchronously without waiting
> >>> for their creation -- migration simply proceeds and may wait in
> >>> multifd_send_sync_main(), which is called by ram_save_setup(). This
> >>> hides some race conditions which can cause unexpected behavior if
> >>> creation of some channels fails.
> >>>
> >>> For example, the following scenario of multifd migration with two
> >>> channels, where the first channel creation fails, will end in a
> >>> segmentation fault (time advances from top to bottom):
> >> Is this reproducible? Or just observable at least.
> >
> > > Yes, though I had to engineer it a bit:
> > > 1. Run migration with two multifd channels and fail creation of the two
> > >    channels (e.g., by changing the address they are connecting to).
> > > 2. Add sleep(3) in multifd_send_sync_main() before we loop through the
> > >    channels and check p->quit.
> > > 3. Add sleep(5) only for the second multifd channel connect thread so
> > >    its connection is delayed and runs last.
> 
> Ok, well, that's something at least. I'll try to reproduce it so we can
> keep track of it.
> 
> >> I acknowledge the situation you describe, but with multifd there's
> >> usually an issue in cleanup paths. Let's make sure we flushed those out
> >> before adding this new semaphore.
> >
> > Indeed, I was not keen on adding yet another semaphore either.
> > I think there are multiple bugs here, some of them overlap and some don't.
> > > There is also your and Peter's previous work that I was not aware of to
> > > fix those and to clean up the code.
> > >
> > > Maybe we can take it one step at a time, pushing your series first,
> > > cleaning the code and fixing some bugs.
> > > Then we can see what bugs are left (if any) and fix them. It might even
> > > be easier to fix after the cleanups.
> >
> >> This is similar to an issue Peter was addressing where we missed calling
> >> multifd_send_terminate_threads() in the multifd_channel_connect() path:
> >>
> >> patch 4 in this
> >> https://lore.kernel.org/r/20231022201211.452861-1-pet...@redhat.com
> >
> > What issue are you referring to here? Can you elaborate?
> 
> Oh, I just realised that series doesn't address any particular bug. But
> my point is that including a call to multifd_send_terminate_threads() at
> new_send_channel_cleanup might be all that's needed because that has
> code to cause the channels and the migration thread to end.

It seems so to me.

One other issue is I hope we can get rid of p->quit before adding more code
to operate on it.

I'll see whether I can respin that series this week.  I'll consider
dropping the last patches and picking only the initial ones that may already
help.  I just noticed that patch 2 is already merged with Avihai's similar
patch; obviously I completely forgot about that series..

> 
> > The main issue I am trying to fix in my patch is that we don't wait for 
> > all multifd channels to be created/error out before tearing down
> > multifd resources in multifd_save_cleanup().
> 
> Ok, let me take a step back and ask why is this not solved by
> multifd_save_cleanup() -> qemu_thread_join()? I see you moved
> p->running=true to *after* the thread creation in patch 4. That will
> always leave a gap where p->running == false but the thread is already
> running.

The whole threading in multifd is currently, IMHO, a mess.  We keep
creating threads but never cared about how they run or how to synchronize
with them.

I do have a plan to track every TID that migration creates (including the
ones of qio tasks, maybe?), so that we can always manage the threads and
make sure all of them are freed by the time migrate_fd_cleanup() finishes,
by join()ing each of them.  I suppose we may also need things like
pthread_cancel(); consider a thread that is blocked somewhere when the
admin invokes a "migrate-cancel" request.

With any dangling thread around, we always face weird risks: either some
migration code gets scheduled even after migration has failed (like this
one), or, worse, that thread only gets scheduled once the 2nd migration has
started after the 1st one was cancelled.  We then need to be prepared to see
extra threads running with no idea where they came from, and those bugs
will be hard to debug.

I haven't started looking into that yet; it would be good if anyone would
like to explore that direction for a full resolution of the multifd
threading issues.
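
To make the idea a bit more concrete, here is a minimal sketch in plain
pthreads.  Everything in it (ThreadRegistry, registry_spawn(),
registry_join_all(), count_worker()) is hypothetical and does not exist in
QEMU; it only illustrates "record every thread at creation time, join all of
them at cleanup", and it does not cover the pthread_cancel() case:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_TRACKED_THREADS 16   /* hypothetical fixed capacity */

/* Hypothetical registry of every thread created on behalf of migration. */
typedef struct {
    pthread_t tids[MAX_TRACKED_THREADS];
    size_t count;
    pthread_mutex_t lock;
} ThreadRegistry;

static void registry_init(ThreadRegistry *r)
{
    r->count = 0;
    pthread_mutex_init(&r->lock, NULL);
}

/* Create a thread and record it under the lock; returns 0 on success. */
static int registry_spawn(ThreadRegistry *r, void *(*fn)(void *), void *arg)
{
    int ret = -1;

    assert(fn != NULL);
    pthread_mutex_lock(&r->lock);
    if (r->count < MAX_TRACKED_THREADS &&
        pthread_create(&r->tids[r->count], NULL, fn, arg) == 0) {
        r->count++;
        ret = 0;
    }
    pthread_mutex_unlock(&r->lock);
    return ret;
}

/* Join every recorded thread; returns how many threads were joined. */
static size_t registry_join_all(ThreadRegistry *r)
{
    size_t joined = 0;

    pthread_mutex_lock(&r->lock);
    while (r->count > 0) {
        pthread_t tid = r->tids[--r->count];

        /* Drop the lock while blocking in join. */
        pthread_mutex_unlock(&r->lock);
        pthread_join(tid, NULL);
        joined++;
        pthread_mutex_lock(&r->lock);
    }
    pthread_mutex_unlock(&r->lock);
    return joined;
}

/* Example worker: bumps a shared counter once and exits. */
static void *count_worker(void *opaque)
{
    atomic_fetch_add((atomic_int *)opaque, 1);
    return NULL;
}
```

With something like this, a cleanup path would call registry_join_all() once
and know that no thread created through registry_spawn() is still running
afterwards.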

> 
> >
> >>> Thread   | Code execution
> >>> 
> >>> Multifd 1|
> >>>   | multifd_new_send_channel_async (errors and quits)
> >>>   |   multifd_new_send_channel_cleanup
> >>>   |
> >>> Migration thread |
> >>>

Re: [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64

On Tue, Jan 30, 2024 at 06:23:10PM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > On Tue, Jan 30, 2024 at 10:18:07AM +, Peter Maydell wrote:
> >> On Mon, 29 Jan 2024 at 23:31, Fabiano Rosas  wrote:
> >> >
> >> > Fabiano Rosas  writes:
> >> >
> >> > > Peter Xu  writes:
> >> > >
> >> > >> On Fri, Jan 26, 2024 at 11:54:32AM -0300, Fabiano Rosas wrote:
> >> > > The issue that occurs to me now is that 'cpu host' will not work with
> >> > > TCG. We might actually need to go poking /dev/kvm for this to work.
> >> >
> >> > Nevermind this last part. There's not going to be a scenario where we
> >> > build with CONFIG_KVM, but run in an environment that does not support
> >> > KVM.
> >> 
> >> Yes, there is. We'll build with CONFIG_KVM on any aarch64 host,
> >> but that doesn't imply that the user running the build and
> >> test has permissions for /dev/kvm.
> >
> > I'm actually pretty confused on why this would be a problem even for
> > neoverse-n1: can we just try to use KVM, if it fails then use TCG?
> > Something like:
> >
> >   (construct qemu cmdline)
> >   ..
> > #ifdef CONFIG_KVM
> >   "-accel kvm "
> > #endif
> >   "-accel tcg "
> >   ..
> >
> > ?
> > IIUC if we specify two "-accel", we'll try the first, then if failed then
> > the 2nd?
> 
> Aside from '-cpu max', there's no -accel and -cpu combination that works
> on all of:
> 
> x86_64 host - TCG-only
> aarch64 host - KVM & TCG
> aarch64 host with --disable-tcg - KVM-only
> aarch64 host without access to /dev/kvm - TCG-only
> 
> And the cpus are:
> host - KVM-only
> neoverse-n1 - TCG-only
> 
> We'll need something like:
> 
> /* covers aarch64 host with --disable-tcg */
> if (qtest_has_accel("kvm") && !qtest_has_accel("tcg")) {
>     if (open("/dev/kvm", O_RDONLY) < 0) {
>         g_test_skip()
>     } else {
>         "-accel kvm -cpu host"
>     }
> }
> 
> /* covers x86_64 host */
> if (!qtest_has_accel("kvm") && qtest_has_accel("tcg")) {
>     "-accel tcg -cpu neoverse-n1"
> }
> 
> /* covers aarch64 host */
> if (qtest_has_accel("kvm") && qtest_has_accel("tcg")) {
>     if (open("/dev/kvm", O_RDONLY) < 0) {
>         "-accel tcg -cpu neoverse-n1"
>     } else {
>         "-accel kvm -cpu host"
>     }
> }

The open("/dev/kvm") logic more or less duplicates what QEMU already does
when initializing accelerators:

    if (!qemu_opts_foreach(qemu_find_opts("accel"),
                           do_configure_accelerator, &init_failed,
                           &error_fatal)) {
        if (!init_failed) {
            error_report("no accelerator found");
        }
        exit(1);
    }

If /dev/kvm is not accessible, I think it'll already fall back to tcg here,
as do_configure_accelerator() for kvm will just silently fail for qtest.  I
hope we can still rely on that for /dev/kvm access issues.
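
As a rough illustration of the fallback behaviour I'm relying on, here is a
small standalone sketch.  It is not QEMU code: Accel, init_fails(),
init_works() and pick_first_working() are made-up names, and the only thing
it models is "walk the -accel list in order, the first one that initializes
wins":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one "-accel" entry on the command line. */
typedef struct {
    const char *name;
    bool (*init)(void);   /* returns true when the accelerator is usable */
} Accel;

/* Models e.g. kvm when /dev/kvm cannot be opened. */
static bool init_fails(void) { return false; }

/* Models an accelerator that initializes fine, e.g. tcg. */
static bool init_works(void) { return true; }

/* Return the name of the first accelerator that initializes, or NULL. */
static const char *pick_first_working(const Accel *list, size_t n)
{
    assert(list != NULL);
    for (size_t i = 0; i < n; i++) {
        if (list[i].init()) {
            return list[i].name;
        }
    }
    return NULL;   /* corresponds to the "no accelerator found" case */
}
```

With the list { kvm (failing), tcg (working) } this picks "tcg", which is the
behaviour expected from "-accel kvm -accel tcg" on a host without /dev/kvm
access.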

Hmm, I just noticed that test_migrate_start() already has this later:

"-accel kvm%s -accel tcg "

So we're actually good in that regard, AFAIU.

Then did I understand it right that in the failure case KVM is properly
initialized, but QEMU then crashed later because neoverse-n1 asks for TCG?
So the logic in the accel code above didn't really do a real fallback?
A backtrace of such a crash would help, maybe; I tried to find it in the
pipeline log but I can only see:

  --- stderr ---
  Broken pipe
  ../tests/qtest/libqtest.c:195: kill_qemu() tried to terminate QEMU process but encountered exit status 1 (expected 0)

Or, is there some aarch64 cpu that has a stable CPU ABI (unlike "max",
which is unstable) and also supports both TCG and KVM?

Another thing I noticed that we may need to be cautious about is that the
gic is currently also using the max version:

machine_opts = "gic-version=max";

We may want to choose a sane version there too, probably together with this
patch?

-- 
Peter Xu

Re: [PATCH v2] pc: q35: Bump max_cpus to 1856 vcpus

Hi Ani,

On Wed, Jan 31, 2024 at 08:19:06AM +0530, Ani Sinha wrote:
> Date: Wed, 31 Jan 2024 08:19:06 +0530
> From: Ani Sinha 
> Subject: [PATCH v2] pc: q35: Bump max_cpus to 1856 vcpus
> X-Mailer: git-send-email 2.42.0
> 
> Since commit f10a570b093e6 ("KVM: x86: Add CONFIG_KVM_MAX_NR_VCPUS to allow 
> up to 4096 vCPUs")
> Linux kernel can support up to a maximum number of 4096 vCPUs when MAXSMP is
> enabled in the kernel. At present, QEMU has been tested to correctly boot a
> linux guest with 1856 vcpus and no more both with edk2 and seabios firmwares.

As background, could I ask whether there will be host machines with so
many CPUs? What are the benefits of having vCPUs that far exceed the number
of host CPUs?

Thanks,
Zhao

> If an additional vcpu is added, that is with 1857 vcpus, edk2 currently fails
> with the following error messages:
> 
> AllocatePages failed: No 0x400 Pages is available.
> There is only left 0x2BF pages memory resource to be allocated.
> ERROR: Out of aligned pages
> ASSERT /builddir/build/BUILD/edk2-ba91d0292e/MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c(814): BigPageAddress != 0
> 
> This error exists only with edk2. Seabios currently can boot a linux guest
> fine with 4096 vcpus. Since the lowest common denominator for a working VM for
> both edk2 and seabios is 1856 vcpus, bump up the value max_cpus to 1856 for
> q35 machines versions 9 and newer. Q35 machines versions 8.2 and older
> continue to support 1024 maximum vcpus as before for compatibility reasons.
> 
> If KVM is not able to support the specified number of vcpus, QEMU would
> return the following error messages:
> 
> $ ./qemu-system-x86_64 -cpu host -accel kvm -machine q35 -smp 1728
> qemu-system-x86_64: -accel kvm: warning: Number of SMP cpus requested (1728) exceeds the recommended cpus supported by KVM (12)
> qemu-system-x86_64: -accel kvm: warning: Number of hotpluggable cpus requested (1728) exceeds the recommended cpus supported by KVM (12)
> Number of SMP cpus requested (1728) exceeds the maximum cpus supported by KVM (1024)
> 
> Cc: Daniel P. Berrangé 
> Cc: Igor Mammedov 
> Cc: Michael S. Tsirkin 
> Cc: Julia Suvorova 
> Cc: kra...@redhat.com
> Signed-off-by: Ani Sinha 
> ---
>  hw/i386/pc_q35.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> Changelog:
> v2: bump up the vcpu number to 1856. Add failure messages from edk2 in
> the commit description.
> See also RH Jira https://issues.redhat.com/browse/RHEL-22202
> 
> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> index f43d5142b8..f9c4b6594d 100644
> --- a/hw/i386/pc_q35.c
> +++ b/hw/i386/pc_q35.c
> @@ -375,7 +375,7 @@ static void pc_q35_machine_options(MachineClass *m)
>  m->default_nic = "e1000e";
>  m->default_kernel_irqchip_split = false;
>  m->no_floppy = 1;
> -m->max_cpus = 1024;
> +m->max_cpus = 1856;
>  m->no_parallel = !module_object_class_by_name(TYPE_ISA_PARALLEL);
>  machine_class_allow_dynamic_sysbus_dev(m, TYPE_AMD_IOMMU_DEVICE);
>  machine_class_allow_dynamic_sysbus_dev(m, TYPE_INTEL_IOMMU_DEVICE);
> @@ -396,6 +396,7 @@ static void pc_q35_8_2_machine_options(MachineClass *m)
>  {
>  pc_q35_9_0_machine_options(m);
>  m->alias = NULL;
> +m->max_cpus = 1024;
>  compat_props_add(m->compat_props, hw_compat_8_2, hw_compat_8_2_len);
>  compat_props_add(m->compat_props, pc_compat_8_2, pc_compat_8_2_len);
>  }
> -- 
> 2.42.0
> 
>
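
For readers less familiar with how versioned machine types inherit their
options, the pattern in the diff above can be sketched standalone as follows.
MachineOpts and the q35_*_options() names are simplified, hypothetical
stand-ins; the assumption that the 9.0 variant simply calls the base options
is mine and is not shown in the quoted hunks:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for MachineClass. */
typedef struct {
    int max_cpus;
} MachineOpts;

/* Options shared by all versions: carries the new default limit. */
static void q35_base_options(MachineOpts *m)
{
    assert(m != NULL);
    m->max_cpus = 1856;
}

/* Newest version: assumed to take the base options unchanged. */
static void q35_9_0_options(MachineOpts *m)
{
    q35_base_options(m);
}

/* Older version: inherit from the newer one, then restore the old limit. */
static void q35_8_2_options(MachineOpts *m)
{
    q35_9_0_options(m);
    m->max_cpus = 1024;
}
```

The point is that each older version starts from the next newer one and only
overrides what changed, so 8.2 and anything chaining from it keeps the
1024 limit.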

Re: [PATCH 5/5] hw/core: Remove transitional infrastructure from BusClass

On Fri, Jan 19, 2024 at 04:35:12PM +, Peter Maydell wrote:
> Date: Fri, 19 Jan 2024 16:35:12 +
> From: Peter Maydell 
> Subject: [PATCH 5/5] hw/core: Remove transitional infrastructure from
>  BusClass
> X-Mailer: git-send-email 2.34.1
> 
> BusClass currently has transitional infrastructure to support
> subclasses which implement the legacy BusClass::reset method rather
> than the Resettable interface.  We have now removed all the users of
> BusClass::reset in the tree, so we can remove the transitional
> infrastructure.
> 
> Signed-off-by: Peter Maydell 
> ---
>  include/hw/qdev-core.h |  2 --
>  hw/core/bus.c  | 67 --
>  2 files changed, 69 deletions(-)

Reviewed-by: Zhao Liu 

It seems a similar cleanup for DeviceClass would need a lot of effort.

> 
> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> index 151d9682380..986c924fa55 100644
> --- a/include/hw/qdev-core.h
> +++ b/include/hw/qdev-core.h
> @@ -329,8 +329,6 @@ struct BusClass {
>   */
>  char *(*get_fw_dev_path)(DeviceState *dev);
>  
> -void (*reset)(BusState *bus);
> -
>  /*
>   * Return whether the device can be added to @bus,
>   * based on the address that was set (via device properties)
> diff --git a/hw/core/bus.c b/hw/core/bus.c
> index c7831b5293b..b9d89495cdf 100644
> --- a/hw/core/bus.c
> +++ b/hw/core/bus.c
> @@ -232,57 +232,6 @@ static char *default_bus_get_fw_dev_path(DeviceState *dev)
>  return g_strdup(object_get_typename(OBJECT(dev)));
>  }
>  
> -/**
> - * bus_phases_reset:
> - * Transition reset method for buses to allow moving
> - * smoothly from legacy reset method to multi-phases
> - */
> -static void bus_phases_reset(BusState *bus)
> -{
> -ResettableClass *rc = RESETTABLE_GET_CLASS(bus);
> -
> -if (rc->phases.enter) {
> -rc->phases.enter(OBJECT(bus), RESET_TYPE_COLD);
> -}
> -if (rc->phases.hold) {
> -rc->phases.hold(OBJECT(bus));
> -}
> -if (rc->phases.exit) {
> -rc->phases.exit(OBJECT(bus));
> -}
> -}
> -
> -static void bus_transitional_reset(Object *obj)
> -{
> -BusClass *bc = BUS_GET_CLASS(obj);
> -
> -/*
> - * This will call either @bus_phases_reset (for multi-phases transitioned
> - * buses) or a bus's specific method for not-yet transitioned buses.
> - * In both case, it does not reset children.
> - */
> -if (bc->reset) {
> -bc->reset(BUS(obj));
> -}
> -}
> -
> -/**
> - * bus_get_transitional_reset:
> - * check if the bus's class is ready for multi-phase
> - */
> -static ResettableTrFunction bus_get_transitional_reset(Object *obj)
> -{
> -BusClass *dc = BUS_GET_CLASS(obj);
> -if (dc->reset != bus_phases_reset) {
> -/*
> - * dc->reset has been overridden by a subclass,
> - * the bus is not ready for multi phase yet.
> - */
> -return bus_transitional_reset;
> -}
> -return NULL;
> -}
> -
>  static void bus_class_init(ObjectClass *class, void *data)
>  {
>  BusClass *bc = BUS_CLASS(class);
> @@ -293,22 +242,6 @@ static void bus_class_init(ObjectClass *class, void *data)
>  
>  rc->get_state = bus_get_reset_state;
>  rc->child_foreach = bus_reset_child_foreach;
> -
> -/*
> - * @bus_phases_reset is put as the default reset method below, allowing
> - * to do the multi-phase transition from base classes to leaf classes. It
> - * allows a legacy-reset Bus class to extend a multi-phases-reset
> - * Bus class for the following reason:
> - * + If a base class B has been moved to multi-phase, then it does not
> - *   override this default reset method and may have defined phase methods.
> - * + A child class C (extending class B) which uses
> - *   bus_class_set_parent_reset() (or similar means) to override the
> - *   reset method will still work as expected. @bus_phases_reset function
> - *   will be registered as the parent reset method and effectively call
> - *   parent reset phases.
> - */
> -bc->reset = bus_phases_reset;
> -rc->get_transitional_function = bus_get_transitional_reset;
>  }
>  
>  static void qbus_finalize(Object *obj)
> -- 
> 2.34.1
> 
>

Re: [PATCH 4/5] hw/s390x/css-bridge: switch virtual-css bus to 3-phase-reset

On Fri, Jan 19, 2024 at 04:35:11PM +, Peter Maydell wrote:
> Date: Fri, 19 Jan 2024 16:35:11 +
> From: Peter Maydell 
> Subject: [PATCH 4/5] hw/s390x/css-bridge: switch virtual-css bus to
>  3-phase-reset
> X-Mailer: git-send-email 2.34.1
> 
> Switch the s390x virtual-css bus from using BusClass::reset to the
> Resettable interface.
> 
> This has no behavioural change, because the BusClass code to support
> subclasses that use the legacy BusClass::reset will call that method
> in the hold phase of 3-phase reset.
> 
> Signed-off-by: Peter Maydell 
> ---
>  hw/s390x/css-bridge.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)

Reviewed-by: Zhao Liu 

> 
> diff --git a/hw/s390x/css-bridge.c b/hw/s390x/css-bridge.c
> index 15d26efc951..34639f21435 100644
> --- a/hw/s390x/css-bridge.c
> +++ b/hw/s390x/css-bridge.c
> @@ -56,7 +56,7 @@ static void ccw_device_unplug(HotplugHandler *hotplug_dev,
>  qdev_unrealize(dev);
>  }
>  
> -static void virtual_css_bus_reset(BusState *qbus)
> +static void virtual_css_bus_reset_hold(Object *obj)
>  {
>  /* This should actually be modelled via the generic css */
>  css_reset();
> @@ -81,8 +81,9 @@ static char *virtual_css_bus_get_dev_path(DeviceState *dev)
>  static void virtual_css_bus_class_init(ObjectClass *klass, void *data)
>  {
>  BusClass *k = BUS_CLASS(klass);
> +ResettableClass *rc = RESETTABLE_CLASS(klass);
>  
> -k->reset = virtual_css_bus_reset;
> +rc->phases.hold = virtual_css_bus_reset_hold;
>  k->get_dev_path = virtual_css_bus_get_dev_path;
>  }
>  
> -- 
> 2.34.1
> 
>
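
As a side note on why moving the old reset body into the hold phase keeps
behaviour unchanged, here is a small standalone sketch of the phase ordering.
Bus, ResetPhases, bus_reset() and the demo_*() callbacks are simplified,
hypothetical names and not the real Resettable API; the real interface also
walks child objects, which is left out here:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Bus Bus;

/* Hypothetical, simplified stand-in for the three reset phases. */
typedef struct {
    void (*enter)(Bus *bus);
    void (*hold)(Bus *bus);
    void (*exit)(Bus *bus);
} ResetPhases;

struct Bus {
    ResetPhases phases;
    char trace[8];   /* records which phases ran, for illustration only */
};

static void trace_add(Bus *bus, char c)
{
    size_t n = strlen(bus->trace);

    assert(n + 1 < sizeof(bus->trace));
    bus->trace[n] = c;
    bus->trace[n + 1] = '\0';
}

static void demo_enter(Bus *bus) { trace_add(bus, 'e'); }
static void demo_hold(Bus *bus)  { trace_add(bus, 'h'); }
static void demo_exit(Bus *bus)  { trace_add(bus, 'x'); }

/* Run enter, hold, exit in order, skipping phases that are not set. */
static void bus_reset(Bus *bus)
{
    if (bus->phases.enter) {
        bus->phases.enter(bus);
    }
    if (bus->phases.hold) {
        bus->phases.hold(bus);
    }
    if (bus->phases.exit) {
        bus->phases.exit(bus);
    }
}
```

A bus converted the way this series does it only sets the hold callback, so
exactly one function runs per reset, just as with the legacy method.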

Re: [PATCH 2/5] vmbus: Switch bus reset to 3-phase-reset

On Fri, Jan 19, 2024 at 04:35:09PM +, Peter Maydell wrote:
> Date: Fri, 19 Jan 2024 16:35:09 +
> From: Peter Maydell 
> Subject: [PATCH 2/5] vmbus: Switch bus reset to 3-phase-reset
> X-Mailer: git-send-email 2.34.1
> 
> Switch vmbus from using BusClass::reset to the Resettable interface.
> 
> This has no behavioural change, because the BusClass code to support
> subclasses that use the legacy BusClass::reset will call that method
> in the hold phase of 3-phase reset.
> 
> Signed-off-by: Peter Maydell 
> ---
>  hw/hyperv/vmbus.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)

Reviewed-by: Zhao Liu 

> 
> diff --git a/hw/hyperv/vmbus.c b/hw/hyperv/vmbus.c
> index c86d1895bae..380239af2c7 100644
> --- a/hw/hyperv/vmbus.c
> +++ b/hw/hyperv/vmbus.c
> @@ -2453,9 +2453,9 @@ static void vmbus_unrealize(BusState *bus)
>  qemu_mutex_destroy(&vmbus->rx_queue_lock);
>  }
>  
> -static void vmbus_reset(BusState *bus)
> +static void vmbus_reset_hold(Object *obj)
>  {
> -vmbus_deinit(VMBUS(bus));
> +vmbus_deinit(VMBUS(obj));
>  }
>  
>  static char *vmbus_get_dev_path(DeviceState *dev)
> @@ -2476,12 +2476,13 @@ static char *vmbus_get_fw_dev_path(DeviceState *dev)
>  static void vmbus_class_init(ObjectClass *klass, void *data)
>  {
>  BusClass *k = BUS_CLASS(klass);
> +ResettableClass *rc = RESETTABLE_CLASS(klass);
>  
>  k->get_dev_path = vmbus_get_dev_path;
>  k->get_fw_dev_path = vmbus_get_fw_dev_path;
>  k->realize = vmbus_realize;
>  k->unrealize = vmbus_unrealize;
> -k->reset = vmbus_reset;
> +rc->phases.hold = vmbus_reset_hold;
>  }
>  
>  static int vmbus_pre_load(void *opaque)
> -- 
> 2.34.1
> 
>

Re: [PATCH 3/5] adb: Switch bus reset to 3-phase-reset

On Fri, Jan 19, 2024 at 04:35:10PM +, Peter Maydell wrote:
> Date: Fri, 19 Jan 2024 16:35:10 +
> From: Peter Maydell 
> Subject: [PATCH 3/5] adb: Switch bus reset to 3-phase-reset
> X-Mailer: git-send-email 2.34.1
> 
> Switch the ADB bus from using BusClass::reset to the Resettable
> interface.
> 
> This has no behavioural change, because the BusClass code to support
> subclasses that use the legacy BusClass::reset will call that method
> in the hold phase of 3-phase reset.
> 
> Signed-off-by: Peter Maydell 
> ---
>  hw/input/adb.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)

Reviewed-by: Zhao Liu 

> 
> diff --git a/hw/input/adb.c b/hw/input/adb.c
> index 0f3c73d6d00..98f39b4281a 100644
> --- a/hw/input/adb.c
> +++ b/hw/input/adb.c
> @@ -231,9 +231,9 @@ static const VMStateDescription vmstate_adb_bus = {
>  }
>  };
>  
> -static void adb_bus_reset(BusState *qbus)
> +static void adb_bus_reset_hold(Object *obj)
>  {
> -ADBBusState *adb_bus = ADB_BUS(qbus);
> +ADBBusState *adb_bus = ADB_BUS(obj);
>  
>  adb_bus->autopoll_enabled = false;
>  adb_bus->autopoll_mask = 0x;
> @@ -262,10 +262,11 @@ static void adb_bus_unrealize(BusState *qbus)
>  static void adb_bus_class_init(ObjectClass *klass, void *data)
>  {
>  BusClass *k = BUS_CLASS(klass);
> +ResettableClass *rc = RESETTABLE_CLASS(klass);
>  
>  k->realize = adb_bus_realize;
>  k->unrealize = adb_bus_unrealize;
> -k->reset = adb_bus_reset;
> +rc->phases.hold = adb_bus_reset_hold;
>  }
>  
>  static const TypeInfo adb_bus_type_info = {
> -- 
> 2.34.1
> 
>

Re: [PATCH 1/5] pci: Switch bus reset to 3-phase-reset

On Fri, Jan 19, 2024 at 04:35:08PM +, Peter Maydell wrote:
> Date: Fri, 19 Jan 2024 16:35:08 +
> From: Peter Maydell 
> Subject: [PATCH 1/5] pci: Switch bus reset to 3-phase-reset
> X-Mailer: git-send-email 2.34.1
> 
> Switch the PCI bus from using BusClass::reset to the Resettable
> interface.
> 
> This has no behavioural change, because the BusClass code to support
> subclasses that use the legacy BusClass::reset will call that method
> in the hold phase of 3-phase reset.
> 
> Signed-off-by: Peter Maydell 
> ---
>  hw/pci/pci.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)

Reviewed-by: Zhao Liu 

> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 76080af580d..05c2e46bda5 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -64,7 +64,7 @@ bool pci_available = true;
>  
>  static char *pcibus_get_dev_path(DeviceState *dev);
>  static char *pcibus_get_fw_dev_path(DeviceState *dev);
> -static void pcibus_reset(BusState *qbus);
> +static void pcibus_reset_hold(Object *obj);
>  static bool pcie_has_upstream_port(PCIDevice *dev);
>  
>  static Property pci_props[] = {
> @@ -202,13 +202,15 @@ static void pci_bus_class_init(ObjectClass *klass, void *data)
>  {
>  BusClass *k = BUS_CLASS(klass);
>  PCIBusClass *pbc = PCI_BUS_CLASS(klass);
> +ResettableClass *rc = RESETTABLE_CLASS(klass);
>  
>  k->print_dev = pcibus_dev_print;
>  k->get_dev_path = pcibus_get_dev_path;
>  k->get_fw_dev_path = pcibus_get_fw_dev_path;
>  k->realize = pci_bus_realize;
>  k->unrealize = pci_bus_unrealize;
> -k->reset = pcibus_reset;
> +
> +rc->phases.hold = pcibus_reset_hold;
>  
>  pbc->bus_num = pcibus_num;
>  pbc->numa_node = pcibus_numa_node;
> @@ -424,9 +426,9 @@ void pci_device_reset(PCIDevice *dev)
>   * Called via bus_cold_reset on RST# assert, after the devices
>   * have been reset device_cold_reset-ed already.
>   */
> -static void pcibus_reset(BusState *qbus)
> +static void pcibus_reset_hold(Object *obj)
>  {
> -PCIBus *bus = DO_UPCAST(PCIBus, qbus, qbus);
> +PCIBus *bus = PCI_BUS(obj);
>  int i;
>  
>  for (i = 0; i < ARRAY_SIZE(bus->devices); ++i) {
> -- 
> 2.34.1
> 
>

Re: [libvirt PATCH V2 0/4] add loongarch support for libvirt

2024-01-30 Thread lixianglai




Hi Philippe:

When developing libvirt support for loongarch, we ran into some problems
related to pflash.

libvirt and QEMU have difficulty coordinating how the UEFI firmware is loaded.

I think we need your suggestions and opinions on the solution.


Anyway, I fetched and installed this. The firmware descriptor looks
like:

{
   "interface-types": [
 "uefi"
   ],
   "mapping": {
 "device": "memory",
 "filename": "/usr/share/edk2/loongarch64/QEMU_EFI.fd"
   },
   "targets": [
 {
   "architecture": "loongarch64",
   "machines": [
 "virt",
 "virt-*"
   ]
 }
   ],
   "features": [
   "acpi"
   ]
 }

This is not what I expected: specifically, it results in libvirt
generating

-bios /usr/share/edk2/loongarch64/QEMU_EFI.fd

So only one of the two files is used, in read-only mode, and there is
no persistent NVRAM storage that the guest can use.

This is what I expected instead:

{
   "interface-types": [
 "uefi"
   ],
   "mapping": {
 "device": "flash",
 "mode": "split",
 "executable": {
   "filename": "/usr/share/edk2/loongarch64/QEMU_EFI.fd",
   "format": "raw"
 },
 "nvram-template": {
   "filename": "/usr/share/edk2/loongarch64/QEMU_VARS.fd",
   "format": "raw"
 }
   },
   "targets": [
 {
   "architecture": "loongarch64",
   "machines": [
 "virt",
 "virt-*"
   ]
 }
   ],
   "features": [
   "acpi"
  ]
}

I've tried installing such a descriptor and libvirt picks it up,
resulting in the following guest configuration:


  hvm
  


  
  /usr/share/edk2/loongarch64/QEMU_EFI.fd
  /var/lib/libvirt/qemu/nvram/guest_VARS.fd
  


which in turn produces the following QEMU command line options:

-blockdev '{"driver":"file","filename":"/usr/share/edk2/loongarch64/QEMU_EFI.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}'
-blockdev '{"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}'
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/guest_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"}'
-blockdev '{"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"}'

Unfortunately, with this configuration the guest fails to start:

qemu-system-loongarch64: Property 'virt-machine.pflash0' not found

This error message looked familiar to me, as it is the same that I
hit when trying out UEFI support on RISC-V roughly a year ago[1]. In
this case, however, it seems that the issue runs deeper: it's not
just that the flash devices are not wired up to work as blockdevs,
but even the old -drive syntax doesn't work.

Looking at the QEMU code, it appears that the loongarch/virt machine
only creates a single pflash device and exposes it via -bios. So it
seems that there is simply no way to achieve the configuration that
we want.

I think that this is something that needs to be addressed as soon as
possible. In the long run, guest-accessible NVRAM storage is a must,
and I'm not sure it would make a lot of sense to merge loongarch
support into libvirt until the firmware situation has been sorted out
in the lower layers.

In the QEMU code, the loongarch virt machine indeed creates only one pflash
device, which is used for NVRAM; the UEFI code is loaded as a ROM.

In summary, the loongarch virt machine can use NVRAM with the following command:

---

qemu-system-loongarch64 \
-m 8G \
-smp 4 \
-cpu la464 \
-blockdev '{"driver":"file","filename":"./QEMU_VARS-pflash.raw","node-name":"libvirt-pflash0-storage","auto-read-only":false,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash0-format","read-only":false,"driver":"raw","file":"libvirt-pflash0-storage"}' \
-machine virt,pflash=libvirt-pflash0-format \
-bios ./QEMU_EFI.fd

---


This is quite different from the following boot method, and it still
looks odd.

---

-blockdev 
'{"driver":"file","filename":"/usr/share/edk2/loongarch64/QEMU_EFI.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}'
-blockdev 
'{"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}'
-blockdev 
'{"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/guest_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"}'
-blockdev 
'{"no

Re: [PATCH v15 0/9] rutabaga_gfx + gfxstream

2024-01-30 Thread Gurchetan Singh

On Fri, Jan 26, 2024 at 6:23 AM Alyssa Ross  wrote:

> Gurchetan Singh  writes:
>
> > On Sat, Jan 20, 2024 at 4:19 AM Alyssa Ross  wrote:
> >
> >> Gurchetan Singh  writes:
> >>
> >> > On Fri, Jan 19, 2024 at 1:13 PM Alyssa Ross  wrote:
> >> >>
> >> >> Hi Gurchetan,
> >> >>
> >> >> > Thanks for the reminder.  I did make a request to create the release
> >> >> > tags, but changes were requested by Fedora packaging effort:
> >> >> >
> >> >> > https://bugzilla.redhat.com/show_bug.cgi?id=2242058
> >> >> > https://bugzilla.redhat.com/show_bug.cgi?id=2241701
> >> >> >
> >> >> > So the request was canceled, but never re-requested.  I'll fire off
> >> >> > another request, with:
> >> >> >
> >> >> > gfxstream: 23d05703b94035ac045df60823fb1fc4be0fdf1c ("gfxstream:
> >> >> > manually add debug logic")
> >> >> > AEMU: dd8b929c247ce9872c775e0e5ddc4300011d0e82 ("aemu: improve
> >> >> > licensing")
> >> >> >
> >> >> > as the commits.  These match the Fedora requests, and the AEMU one has
> >> >> > been merged into Fedora already it seems.
> >> >>
> >> >> These revisions have the problem I mentioned in my previous message:
> >> >>
> >> >> >> The gfxstream ref mentioned here isn't compatible with
> >> >> >> v0.1.2-rutabaga-release, because it no longer provides
> >> >> >> logging_base.pc,
> >> >>
> >> >> rutabaga was not fixed to use the new AEMU package names until after the
> >> >> new Rutabaga release that's compatible with these release versions of
> >> >> gfxstream and AEMU?
> >> >
> >> > Good catch.
> >> >
> >> > One possible workaround is to build gfxstream as a shared library.  I
> >> > think that would avoid rutabaga looking for AEMU package config files.
> >> >
> >> > But if another rutabaga release is desired with support for a static
> >> > library, then we can make that happen too.
> >>
> >> We're exclusively building gfxstream as a shared library.
> >>
> >> Looking at rutabaga's build.rs, it appears to me like pkg-config is
> >> always used for gfxstream unless overridden by GFXSTREAM_PATH.
> >>
> >
> > Hmm, it seems we should be checking pkg-config --static before looking for
> > AEMU in build.rs -- oh well.
> >
> > Would this be a suitable commit for the 0.1.3 release of rutabaga?
> >
> >
> > https://chromium.googlesource.com/crosvm/crosvm/+/5dfd74a0680d317c6edf44138def886f47cb1c7c
> >
> > The gfxstream/AEMU commits would remain unchanged.
>
> That combination works for me.
>

Just FYI, still working on it.  Could take 1-2 more weeks.

Re: [PATCH v3 3/3] virtio-gpu-rutabaga.c: override resource_destroy method

2024-01-30 Thread Gurchetan Singh

On Tue, Jan 30, 2024 at 7:00 AM Manos Pitsidianakis <
manos.pitsidiana...@linaro.org> wrote:

> When the Rutabaga GPU device frees resources, it calls
> rutabaga_resource_unref for that resource_id. However, when the generic
> VirtIOGPU function destroys resources, it only removes the
> virtio_gpu_simple_resource from the device's VirtIOGPU->reslist list.
> The rutabaga resource associated with that resource_id is then leaked.
>
> This commit overrides the resource_destroy class method introduced in
> the previous commit to fix this.
>

Reviewed-by: Gurchetan Singh 


>
> Signed-off-by: Manos Pitsidianakis 
> ---
>  hw/display/virtio-gpu-rutabaga.c | 47 
>  1 file changed, 35 insertions(+), 12 deletions(-)
>
> diff --git a/hw/display/virtio-gpu-rutabaga.c b/hw/display/virtio-gpu-rutabaga.c
> index 9e67f9bd51..17bf701a21 100644
> --- a/hw/display/virtio-gpu-rutabaga.c
> +++ b/hw/display/virtio-gpu-rutabaga.c
> @@ -148,14 +148,38 @@ rutabaga_cmd_create_resource_3d(VirtIOGPU *g,
>  }
>
>  static void
> +virtio_gpu_rutabaga_resource_unref(VirtIOGPU *g,
> +   struct virtio_gpu_simple_resource *res,
> +   Error **errp)
> +{
> +int32_t result;
> +VirtIOGPURutabaga *vr = VIRTIO_GPU_RUTABAGA(g);
> +
> +result = rutabaga_resource_unref(vr->rutabaga, res->resource_id);
> +if (result) {
> +error_setg_errno(errp,
> +(int)result,
> +"%s: rutabaga_resource_unref returned %"PRIi32
> +" for resource_id = %"PRIu32, __func__, result,
> +res->resource_id);
> +}
> +
> +if (res->image) {
> +pixman_image_unref(res->image);
> +}
> +
> +QTAILQ_REMOVE(&g->reslist, res, next);
> +g_free(res);
> +}

> +
> +static void
>  rutabaga_cmd_resource_unref(VirtIOGPU *g,
>  struct virtio_gpu_ctrl_command *cmd)
>  {
> -int32_t result;
> +int32_t result = 0;
>  struct virtio_gpu_simple_resource *res;
>  struct virtio_gpu_resource_unref unref;
> -
> -VirtIOGPURutabaga *vr = VIRTIO_GPU_RUTABAGA(g);
> +Error *local_err = NULL;
>
>  VIRTIO_GPU_FILL_CMD(unref);
>
> @@ -164,15 +188,14 @@ rutabaga_cmd_resource_unref(VirtIOGPU *g,
>  res = virtio_gpu_find_resource(g, unref.resource_id);
>  CHECK(res, cmd);
>
> -result = rutabaga_resource_unref(vr->rutabaga, unref.resource_id);
> -CHECK(!result, cmd);
> -
> -if (res->image) {
> -pixman_image_unref(res->image);
> +virtio_gpu_rutabaga_resource_unref(g, res, &local_err);
> +if (local_err) {
> +error_report_err(local_err);
> +/* local_err was freed, do not reuse it. */
> +local_err = NULL;
> +result = 1;
>  }
> -
> -QTAILQ_REMOVE(&g->reslist, res, next);
> -g_free(res);
> +CHECK(!result, cmd);
>  }
>
>  static void
> @@ -1099,7 +1122,7 @@ static void virtio_gpu_rutabaga_class_init(ObjectClass *klass, void *data)
>  vgc->handle_ctrl = virtio_gpu_rutabaga_handle_ctrl;
>  vgc->process_cmd = virtio_gpu_rutabaga_process_cmd;
>  vgc->update_cursor_data = virtio_gpu_rutabaga_update_cursor;
> -
> +vgc->resource_destroy = virtio_gpu_rutabaga_resource_unref;
>  vdc->realize = virtio_gpu_rutabaga_realize;
>  device_class_set_props(dc, virtio_gpu_rutabaga_properties);
>  }
> --
> γαῖα πυρί μιχθήτω
>
>
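
For context, the leak and its fix can be modelled in a few lines of standalone
C.  The Gpu struct, the two counters and the function names are hypothetical
simplifications, not the virtio-gpu code; the sketch only assumes that a
generic code path destroys resources through an overridable hook:

```c
#include <assert.h>
#include <stddef.h>

typedef struct Gpu Gpu;

struct Gpu {
    /* Class-style hook; a subclass may replace it. */
    void (*resource_destroy)(Gpu *g, unsigned resource_id);
    int listed;        /* entries in the generic resource list */
    int backend_refs;  /* references still held by the backend library */
};

/* Generic behaviour: only drops the entry from the device's list. */
static void generic_resource_destroy(Gpu *g, unsigned resource_id)
{
    (void)resource_id;
    g->listed--;
}

/* Override: release the backend reference, then do the generic part. */
static void backend_resource_destroy(Gpu *g, unsigned resource_id)
{
    g->backend_refs--;
    generic_resource_destroy(g, resource_id);
}

/* Generic code path that destroys every listed resource via the hook. */
static void gpu_destroy_all(Gpu *g, unsigned first_id)
{
    assert(g->resource_destroy != NULL);
    while (g->listed > 0) {
        g->resource_destroy(g, first_id++);
    }
}
```

Without the override the list ends up empty while backend_refs stays at its
old value, which is the leak described in the commit message; with the
override both drop to zero.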

Re: [PATCH] migration/docs: Explain two solutions for VMSD compatibility

On Tue, Jan 30, 2024 at 10:58:24AM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > On Mon, Jan 29, 2024 at 10:44:46AM -0300, Fabiano Rosas wrote:
> >> > Since we're at it, I would also like to know what you think about whether
> >> > we should still suggest people use VMSD versioning, as we know that it
> >> > won't work for backward migrations.
> >> >
> >> > My current thought is that it is still fine, as it's easier to use, and it
> >> > should still be applicable to the cases where strict migration semantics
> >> > are not required.  However it's hard to justify which device needs that
> >> > strictness.
> >> 
> >> I'd prefer if we kept things strict. However I don't think we can do
> >> that without having enough testing and, especially, clear recipes on how
> >> to add compatibility back once it gets lost. Think of that recent thread
> >
> > If it was broken, IMHO we should just fix it and backport to stable.
> 
> (tangent)
> Sure, but I'm talking about how do we instruct device developers on
> fixing migration bugs. We cannot simply yell "regression!" and expect
> people to care.
> 
> Once something breaks there's no easy way to determine what's the right
> fix. It will always involve copying the migration maintainers and some
> back and forth with the device people before we reach an agreement on
> what's even broken.
> 
> When I say "clear recipes" what I mean is we'd have a "catalogue" of
> types of failures that could happen. Those would be both documented in
> plain english and also have some instrumentation in the code to produce
> a clear error/message.
> 
>   E.g.: "Device 'foo' failed to migrate because of error type X: the src
>   machine provided more state than the dst was expecting around the
>   value Y".
> 
> And that "error type X" would come with some docs listing examples of
> other similar errors and what strategies we suggest to deal with them.
> 
> Currently most migration failures are just a completely helpless:
> "blergh, error -5". And the only thing we can say about it upfront is
> "well, something must have changed in the stream".

Yes, IMHO that's because the current VMSD design isn't self-describing.
If some VMSD adds new fields without bumping the version_id, the
destination doesn't know about them: when it reaches the new field's data
it'll assume it is the next thing it expects to read, and that can be a
completely different VMSD for another device.  We just don't have anything
on the wire to describe it.

E.g. for sending a uint32_t field, vmstate_info_uint32 only does
qemu_put_be32s(); we keep pushing data onto the wire without any proper
description of that field.
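
To make the failure mode concrete, here is a minimal sketch of a stream
that is not self-describing.  The Stream type, put_be32()/get_be32() and
the "foo"/"bar" devices are hypothetical stand-ins for illustration only;
this is not QEMU's QEMUFile or VMState API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte stream; NOT QEMU's QEMUFile. */
typedef struct {
    uint8_t buf[64];
    size_t wpos, rpos;
} Stream;

/* Append a raw big-endian 32-bit value: no name, no type, no length. */
static void put_be32(Stream *s, uint32_t v)
{
    s->buf[s->wpos++] = (uint8_t)(v >> 24);
    s->buf[s->wpos++] = (uint8_t)(v >> 16);
    s->buf[s->wpos++] = (uint8_t)(v >> 8);
    s->buf[s->wpos++] = (uint8_t)v;
}

/* Read the next four bytes as a big-endian 32-bit value. */
static uint32_t get_be32(Stream *s)
{
    uint32_t v = (uint32_t)s->buf[s->rpos] << 24 |
                 (uint32_t)s->buf[s->rpos + 1] << 16 |
                 (uint32_t)s->buf[s->rpos + 2] << 8 |
                 (uint32_t)s->buf[s->rpos + 3];
    s->rpos += 4;
    return v;
}

/* Source side: device "foo" gained field b, but its version was not bumped. */
static void src_save(Stream *s)
{
    put_be32(s, 0x11111111); /* foo.a */
    put_be32(s, 0x22222222); /* foo.b: unknown to the destination */
    put_be32(s, 0x33333333); /* bar.x */
}

/*
 * Destination side: still expects one field for "foo", then "bar".
 * Returns what it believes is bar.x.
 */
static uint32_t dst_load_bar_x(Stream *s)
{
    (void)get_be32(s);  /* foo.a */
    return get_be32(s); /* intended to be bar.x, but these bytes are foo.b */
}
```

After src_save(), dst_load_bar_x() returns foo.b's bytes as if they were
bar.x, and nothing in the stream lets the destination notice.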

There was an earlier attempt to describe these fields, which at least makes
debugging easier, see:

commit 8118f0950fc77cce7873002a5021172dd6e040b5
Author: Alexander Graf 
Date:   Thu Jan 22 15:01:39 2015 +0100

migration: Append JSON description of migration stream

But that seems to be only for debugging.  E.g., the JSON chunk comes _after_
all vmstate has been loaded.  So the reported error can be just as
confusing, because a load error happens before the JSON chunk is available.

We may be able to do better in this regard in the future, but that'll take
some thoughts and effort.

> 
> Real migration failures I have seen recently (all fixed already):
> 
> 1- Some feature bit was mistakenly removed from an arm cpu. Migration
>    complains about a 'length' field being different.
> 
> 2- A group of devices was moved from the machine init to the cpu init on
>    pseries. Migration spews some nonsense about an "index".
> 
> 3- Recent (invalid) bug on -cpu max on arm, a couple of bits were set in
>    a register. Migration barfs incomprehensibly with: "error while
>    loading state for instance 0x0 of device 'cpu', Operation not
>    permitted".
> 
> So I bet we could improve these error cases to be a bit more predictable
> and that would help device developers to be able to maintain migration
> compatibility without making it seem like an arbitrary, hard to achieve
> requirement.
> (/tangent)

Right, there can be multiple ways to fail, and we may need to look into
them one by one.  They all take quite some effort.

> 
> >
> > I think Juan used to worry on what happens if someone already used an old
> > version of old release, e.g., someone using 8.2.0 may not be able to
> > migrate to 8.2.1 if we fix that breakage in 9.0 and backport that to 8.2.1.
> > My take is that maybe that's overcomplicated, and maybe we should simply
> > only maintain the latest stable version, rather than all.  In this case,
> > IMHO it will be less burden if we only guarantee 8.2.1 will be working,
> > e.g., when migrating from 8.1.z -> 8.2.1.  Then we should just state a
> > known issue in 8.2.0 that it is broken, and both:
> >
> >   (1) 8.1.z -> 8.2.0, and
> 
> Fair enough.
> 
> >   (2) 8.2.0 -> 8.2.1
> 
> Do you think we may not be able to always ensure that the user can get
> out of the broken version? Or do you simply think that's too

[PATCH v2] pc: q35: Bump max_cpus to 1856 vcpus

Since commit f10a570b093e6 ("KVM: x86: Add CONFIG_KVM_MAX_NR_VCPUS to allow up
to 4096 vCPUs"), the Linux kernel can support up to 4096 vCPUs when MAXSMP is
enabled in the kernel. At present, QEMU has been tested to correctly boot a
Linux guest with 1856 vCPUs, and no more, with both edk2 and SeaBIOS firmware.
If one more vCPU is added, that is with 1857 vCPUs, edk2 currently fails
with the following error messages:

AllocatePages failed: No 0x400 Pages is available.
There is only left 0x2BF pages memory resource to be allocated.
ERROR: Out of aligned pages
ASSERT /builddir/build/BUILD/edk2-ba91d0292e/MdeModulePkg/Core/DxeIplPeim/X64/VirtualMemory.c(814): BigPageAddress != 0

This error exists only with edk2. SeaBIOS can currently boot a Linux guest
fine with 4096 vCPUs. Since the lowest common denominator for a working VM with
both edk2 and SeaBIOS is 1856 vCPUs, bump max_cpus to 1856 for q35 machine
versions 9.0 and newer. Q35 machine versions 8.2 and older continue to support
a maximum of 1024 vCPUs as before, for compatibility reasons.

If KVM is not able to support the specified number of vCPUs, QEMU reports
the following warnings and error:

$ ./qemu-system-x86_64 -cpu host -accel kvm -machine q35 -smp 1728
qemu-system-x86_64: -accel kvm: warning: Number of SMP cpus requested (1728) exceeds the recommended cpus supported by KVM (12)
qemu-system-x86_64: -accel kvm: warning: Number of hotpluggable cpus requested (1728) exceeds the recommended cpus supported by KVM (12)
Number of SMP cpus requested (1728) exceeds the maximum cpus supported by KVM (1024)

Cc: Daniel P. Berrangé 
Cc: Igor Mammedov 
Cc: Michael S. Tsirkin 
Cc: Julia Suvorova 
Cc: kra...@redhat.com
Signed-off-by: Ani Sinha 
---
 hw/i386/pc_q35.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Changelog:
v2: bump up the vcpu number to 1856. Add failure messages from edk2 in
the commit description.
See also RH Jira https://issues.redhat.com/browse/RHEL-22202

diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index f43d5142b8..f9c4b6594d 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -375,7 +375,7 @@ static void pc_q35_machine_options(MachineClass *m)
 m->default_nic = "e1000e";
 m->default_kernel_irqchip_split = false;
 m->no_floppy = 1;
-m->max_cpus = 1024;
+m->max_cpus = 1856;
 m->no_parallel = !module_object_class_by_name(TYPE_ISA_PARALLEL);
 machine_class_allow_dynamic_sysbus_dev(m, TYPE_AMD_IOMMU_DEVICE);
 machine_class_allow_dynamic_sysbus_dev(m, TYPE_INTEL_IOMMU_DEVICE);
@@ -396,6 +396,7 @@ static void pc_q35_8_2_machine_options(MachineClass *m)
 {
 pc_q35_9_0_machine_options(m);
 m->alias = NULL;
+m->max_cpus = 1024;
 compat_props_add(m->compat_props, hw_compat_8_2, hw_compat_8_2_len);
 compat_props_add(m->compat_props, pc_compat_8_2, pc_compat_8_2_len);
 }
-- 
2.42.0
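
For readers unfamiliar with how the versioned machine options chain, the
override in the second hunk can be sketched as below.  MachineOpts and the
q35_*_options() helpers are simplified, hypothetical stand-ins for
MachineClass and the pc_q35_*_machine_options() functions; only the two
max_cpus values come from the patch.

```c
/* Hypothetical, reduced stand-in for QEMU's MachineClass. */
typedef struct {
    unsigned max_cpus;
} MachineOpts;

/* Base q35 options: the new default introduced by the patch. */
static void q35_base_options(MachineOpts *m)
{
    m->max_cpus = 1856;
}

/* Newest machine version: inherits the base options unchanged. */
static void q35_9_0_options(MachineOpts *m)
{
    q35_base_options(m);
}

/* Older machine version: start from the newer one, then restore the old limit. */
static void q35_8_2_options(MachineOpts *m)
{
    q35_9_0_options(m);
    m->max_cpus = 1024;
}

/* Returns 1 if the requested vCPU count fits the machine's limit. */
static int smp_request_ok(const MachineOpts *m, unsigned cpus)
{
    return cpus >= 1 && cpus <= m->max_cpus;
}
```

With this layering, a request for 1856 vCPUs passes the limit for the 9.0
options but not for the 8.2 options, which keeps older machine types
unchanged.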

RE: [PATCH v3 0/4] Live Migration Acceleration with IAA Compression

2024-01-30 Thread Liu, Yuan1

> -Original Message-
> From: Peter Xu 
> Sent: Tuesday, January 30, 2024 6:32 PM
> To: Liu, Yuan1 
> Cc: faro...@suse.de; leob...@redhat.com; qemu-devel@nongnu.org; Zou,
> Nanhai 
> Subject: Re: [PATCH v3 0/4] Live Migration Acceleration with IAA
> Compression
> 
> On Tue, Jan 30, 2024 at 03:56:05AM +, Liu, Yuan1 wrote:
> > > -Original Message-
> > > From: Peter Xu 
> > > Sent: Monday, January 29, 2024 6:43 PM
> > > To: Liu, Yuan1 
> > > Cc: faro...@suse.de; leob...@redhat.com; qemu-devel@nongnu.org; Zou,
> > > Nanhai 
> > > Subject: Re: [PATCH v3 0/4] Live Migration Acceleration with IAA
> > > Compression
> > >
> > > On Wed, Jan 03, 2024 at 07:28:47PM +0800, Yuan Liu wrote:
> > > > Hi,
> > >
> > > Hi, Yuan,
> > >
> > > I have a few comments and questions.  Many of them can be pure
> > > questions as I don't know enough on these new technologies.
> > >
> > > >
> > > > I am writing to submit a code change aimed at enhancing live
> > > > migration acceleration by leveraging the compression capability of
> > > > the Intel In-Memory Analytics Accelerator (IAA).
> > > >
> > > > The implementation of the IAA (de)compression code is based on
> > > > Intel Query Processing Library (QPL), an open-source software
> > > > project designed for IAA high-level software programming.
> > > > https://github.com/intel/qpl
> > > >
> > > > In the last version, there was some discussion about whether to
> > > > introduce a new compression algorithm for IAA. Because the
> > > > compression algorithm of IAA hardware is based on deflate, and QPL
> > > > already supports Zlib, so in this version, I implemented IAA as an
> > > > accelerator for the Zlib compression method. However, for some
> > > > reasons, QPL is currently not compatible with the existing Zlib
> > > > method: Zlib-compressed data cannot be decompressed by QPL and
> > > > vice versa.
> > > >
> > > > I have some concerns about the existing Zlib compression
> > > >   1. Will you consider supporting one channel to support
> > > >      multi-stream compression? Of course, this may lead to a
> > > >      reduction in compression ratio, but it will allow the hardware
> > > >      to process each stream concurrently. We can have each stream
> > > >      process multiple pages, reducing the loss of compression ratio.
> > > >      For example, 128 pages are divided into 16 streams for
> > > >      independent compression. I will provide early performance
> > > >      data in the next version (v4).
> > >
> > > I think Juan used to ask similar question: how much this can help if
> > > multifd can already achieve some form of concurrency over the pages?
> >
> >
> > > Couldn't the user specify more multifd channels if they want to
> > > grant more cpu resource for comp/decomp purpose?
> > >
> > > IOW, how many concurrent channels QPL can provide?  What is the
> > > suggested concurrency channels there?
> >
> > From the QPL software, there is no limit on the number of concurrent
> > compression and decompression tasks.
> > From the IAA hardware, one IAA physical device can process two
> > compressions concurrently or eight decompression tasks concurrently.
> > There are up to 8 IAA devices on an Intel SPR Server and it will vary
> > according to the customer’s product selection and deployment.
> >
> > Regarding the requirement for the number of concurrent channels, I think
> > this may not be a bottleneck problem.
> > Please allow me to introduce a little more here
> >
> > 1. If the compression design is based on Zlib/Deflate/Gzip streaming
> > mode, then we indeed need more channels to maintain concurrent
> > processing. Because each time a multifd packet is compressed (including
> > 128 independent pages), it needs to be compressed page by page. These
> > 128 pages are not concurrent. The concurrency is reflected in the logic
> > of multiple channels for the multifd packet.
> 
> Right.  However since you said there're only a max of 8 IAA devices, would
> it also mean n_multifd_threads=8 can be a good enough scenario to achieve
> proper concurrency, no matter the size of data chunk for one compression
> request?
> 
> Maybe you meant each device can still process concurrent compression
> requests, so the real capability of concurrency can be much larger than 8?

Yes, the number of concurrent requests can be greater than 8: one device can
handle 2 compression requests or 8 decompression requests concurrently.
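
As a rough worked example of the numbers above (the constants are just the
figures quoted in this thread, and iaa_max_concurrent() is a hypothetical
helper, not a QPL call):

```c
/* Figures quoted in this thread; not taken from hardware documentation. */
enum {
    IAA_COMP_PER_DEV   = 2, /* concurrent compressions per IAA device */
    IAA_DECOMP_PER_DEV = 8, /* concurrent decompressions per IAA device */
    IAA_MAX_DEVS       = 8  /* upper bound on IAA devices per SPR server */
};

/* Upper bound on requests the hardware can process at the same time. */
static unsigned iaa_max_concurrent(unsigned devices, unsigned per_device)
{
    return devices * per_device;
}
```

So a fully populated server could process at most 8 * 2 = 16 compression
requests or 8 * 8 = 64 decompression requests at once; actual numbers
depend on how many devices are deployed.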

> >
> > 2. Through testing, we prefer concurrent processing on 4K pages, not
> > multifd packet, which means that 128 pages belonging to a packet can be
> > compressed/decompressed concurrently. Even one channel can also utilize
> > all the resources of IAA. But this is not compatible with existing zlib.
> > The code is similar to the following
> >   for (int i = 0; i < num_pages; i++) {
> >     job[i]->input_data = pages[i];
> >     /* Non-blocking submit for compression/decompression tasks */
> >     submit_job(job[i]);
> >   }
> >   for (int i = 0; i < num_pages; i++) {
> >     wait_job(job[i]);
> >   }

Re: Call for GSoC/Outreachy internship project ideas

2024-01-30 Thread Palmer Dabbelt


On Tue, 30 Jan 2024 17:26:11 PST (-0800), alistai...@gmail.com wrote:

On Wed, Jan 31, 2024 at 10:30 AM Palmer Dabbelt  wrote:


On Tue, 30 Jan 2024 12:28:27 PST (-0800), stefa...@gmail.com wrote:
> On Tue, 30 Jan 2024 at 14:40, Palmer Dabbelt  wrote:
>>
>> On Mon, 15 Jan 2024 08:32:59 PST (-0800), stefa...@gmail.com wrote:
>> > Dear QEMU and KVM communities,
>> > QEMU will apply for the Google Summer of Code and Outreachy internship
>> > programs again this year. Regular contributors can submit project
>> > ideas that they'd like to mentor by replying to this email before
>> > January 30th.
>>
>> It's the 30th, sorry if this is late but I just saw it today.  +Alistair
>> and Daniel, as I didn't sync up with anyone about this so not sure if
>> someone else is looking already (we're not internally).
>>
>> > Internship programs
>> > ---
>> > GSoC (https://summerofcode.withgoogle.com/) and Outreachy
>> > (https://www.outreachy.org/) offer paid open source remote work
>> > internships to eligible people wishing to participate in open source
>> > development. QEMU has been part of these internship programs for many
>> > years. Our mentors have enjoyed helping talented interns make their
>> > first open source contributions and some former interns continue to
>> > participate today.
>> >
>> > Who can mentor
>> > --
>> > Regular contributors to QEMU and KVM can participate as mentors.
>> > Mentorship involves about 5 hours of time commitment per week to
>> > communicate with the intern, review their patches, etc. Time is also
>> > required during the intern selection phase to communicate with
>> > applicants. Being a mentor is an opportunity to help someone get
>> > started in open source development, will give you experience with
>> > managing a project in a low-stakes environment, and a chance to
>> > explore interesting technical ideas that you may not have time to
>> > develop yourself.
>> >
>> > How to propose your idea
>> > --
>> > Reply to this email with the following project idea template filled in:
>> >
>> > === TITLE ===
>> >
>> > '''Summary:''' Short description of the project
>> >
>> > Detailed description of the project that explains the general idea,
>> > including a list of high-level tasks that will be completed by the
>> > project, and provides enough background for someone unfamiliar with
>> > the codebase to do research. Typically 2 or 3 paragraphs.
>> >
>> > '''Links:'''
>> > * Wiki links to relevant material
>> > * External links to mailing lists or web sites
>> >
>> > '''Details:'''
>> > * Skill level: beginner or intermediate or advanced
>> > * Language: C/Python/Rust/etc
>>
>> I'm not 100% sure this is a sane GSoC idea, as it's a bit open ended and
>> might have some tricky parts.  That said it's tripping some people up
>> and as far as I know nobody's started looking at it, so I figured I'd
>> write something up.
>>
>> I can try and dig up some more links if folks think it's interesting,
>> IIRC there's been a handful of bug reports related to very small loops
>> that run ~10x slower when vectorized.  Large benchmarks like SPEC have
>> also shown slowdowns.
>
> Hi Palmer,
> Performance optimization can be challenging for newcomers. I wouldn't
> recommend it for a GSoC project unless you have time to seed the
> project idea with specific optimizations to implement based on your
> experience and profiling. That way the intern has a solid starting
> point where they can have a few successes before venturing out to do
> their own performance analysis.

Ya, I agree.  That's part of the reason why I wasn't sure if it's a
good idea.  At least for this one I think there should be some easy to
understand performance issue, as the loops that go very slowly consist
of a small number of instructions and go a lot slower.

I'm actually more worried about this running into a rabbit hole of
adding new TCG operations or even just having no well defined mappings
between RVV and AVX, those might make the project really hard.

> Do you have the time to profile and add specifics to the project idea
> by Feb 21st? If that sounds good to you, I'll add it to the project
> ideas list and you can add more detailed tasks in the coming weeks.

I can at least dig up some of the examples I ran into, there's been a
handful filtering in over the last year or so.

This one

still has a much more than 10x slowdown (73ms -> 13s) with
vectorization, for example.


It's probably worth creating a Gitlab issue for this and adding all of
the examples there. That way we have a single place to store them all


Makes sense.  I think I'd been telling people to make bug reports for 
them, so there might be some in there already -- I just dug this one out 
of some history.


Here's a start: https://gitlab.com/qemu-project/qemu/-/issues/2137



Alistair



> Thanks,
> Stefan

Re: [PATCH v1 1/1] migration: prevent migration when VM has poisoned memory

On Tue, Jan 30, 2024 at 07:06:40PM +, William Roche wrote:
> From: William Roche 
> 
> A memory page poisoned from the hypervisor level is no longer readable.
> The migration of a VM will crash Qemu when it tries to read the
> memory address space and stumbles on the poisoned page with a similar
> stack trace:
> 
> Program terminated with signal SIGBUS, Bus error.
> #0  _mm256_loadu_si256
> #1  buffer_zero_avx2
> #2  select_accel_fn
> #3  buffer_is_zero
> #4  save_zero_page
> #5  ram_save_target_page_legacy
> #6  ram_save_host_page
> #7  ram_find_and_save_block
> #8  ram_save_iterate
> #9  qemu_savevm_state_iterate
> #10 migration_iteration_run
> #11 migration_thread
> #12 qemu_thread_start
> 
> To avoid this VM crash during the migration, prevent the migration
> when a known hardware poison exists on the VM.
> 
> Signed-off-by: William Roche 

I queued it for now, though it's always good to get feedback from either
Paolo or anyone else, as the pull won't happen within the next week.  If
there are no objections, it'll be included in the next migration pull.

Thanks,

-- 
Peter Xu

Re: Call for GSoC/Outreachy internship project ideas

2024-01-30 Thread Alistair Francis

On Wed, Jan 31, 2024 at 10:30 AM Palmer Dabbelt  wrote:
>
> On Tue, 30 Jan 2024 12:28:27 PST (-0800), stefa...@gmail.com wrote:
> > On Tue, 30 Jan 2024 at 14:40, Palmer Dabbelt  wrote:
> >>
> >> On Mon, 15 Jan 2024 08:32:59 PST (-0800), stefa...@gmail.com wrote:
> >> > Dear QEMU and KVM communities,
> >> > QEMU will apply for the Google Summer of Code and Outreachy internship
> >> > programs again this year. Regular contributors can submit project
> >> > ideas that they'd like to mentor by replying to this email before
> >> > January 30th.
> >>
> >> It's the 30th, sorry if this is late but I just saw it today.  +Alistair
> >> and Daniel, as I didn't sync up with anyone about this so not sure if
> >> someone else is looking already (we're not internally).
> >>
> >> > Internship programs
> >> > ---
> >> > GSoC (https://summerofcode.withgoogle.com/) and Outreachy
> >> > (https://www.outreachy.org/) offer paid open source remote work
> >> > internships to eligible people wishing to participate in open source
> >> > development. QEMU has been part of these internship programs for many
> >> > years. Our mentors have enjoyed helping talented interns make their
> >> > first open source contributions and some former interns continue to
> >> > participate today.
> >> >
> >> > Who can mentor
> >> > --
> >> > Regular contributors to QEMU and KVM can participate as mentors.
> >> > Mentorship involves about 5 hours of time commitment per week to
> >> > communicate with the intern, review their patches, etc. Time is also
> >> > required during the intern selection phase to communicate with
> >> > applicants. Being a mentor is an opportunity to help someone get
> >> > started in open source development, will give you experience with
> >> > managing a project in a low-stakes environment, and a chance to
> >> > explore interesting technical ideas that you may not have time to
> >> > develop yourself.
> >> >
> >> > How to propose your idea
> >> > --
> >> > Reply to this email with the following project idea template filled in:
> >> >
> >> > === TITLE ===
> >> >
> >> > '''Summary:''' Short description of the project
> >> >
> >> > Detailed description of the project that explains the general idea,
> >> > including a list of high-level tasks that will be completed by the
> >> > project, and provides enough background for someone unfamiliar with
> >> > the codebase to do research. Typically 2 or 3 paragraphs.
> >> >
> >> > '''Links:'''
> >> > * Wiki links to relevant material
> >> > * External links to mailing lists or web sites
> >> >
> >> > '''Details:'''
> >> > * Skill level: beginner or intermediate or advanced
> >> > * Language: C/Python/Rust/etc
> >>
> >> I'm not 100% sure this is a sane GSoC idea, as it's a bit open ended and
> >> might have some tricky parts.  That said it's tripping some people up
> >> and as far as I know nobody's started looking at it, so I figured I'd
> >> write something up.
> >>
> >> I can try and dig up some more links if folks think it's interesting,
> >> IIRC there's been a handful of bug reports related to very small loops
> >> that run ~10x slower when vectorized.  Large benchmarks like SPEC have
> >> also shown slowdowns.
> >
> > Hi Palmer,
> > Performance optimization can be challenging for newcomers. I wouldn't
> > recommend it for a GSoC project unless you have time to seed the
> > project idea with specific optimizations to implement based on your
> > experience and profiling. That way the intern has a solid starting
> > point where they can have a few successes before venturing out to do
> > their own performance analysis.
>
> Ya, I agree.  That's part of the reason why I wasn't sure if it's a
> good idea.  At least for this one I think there should be some easy to
> understand performance issue, as the loops that go very slowly consist
> of a small number of instructions and go a lot slower.
>
> I'm actually more worried about this running into a rabbit hole of
> adding new TCG operations or even just having no well defined mappings
> between RVV and AVX, those might make the project really hard.
>
> > Do you have the time to profile and add specifics to the project idea
> > by Feb 21st? If that sounds good to you, I'll add it to the project
> > ideas list and you can add more detailed tasks in the coming weeks.
>
> I can at least dig up some of the examples I ran into, there's been a
> handful filtering in over the last year or so.
>
> This one
> 
> still has a much more than 10x slowdown (73ms -> 13s) with
> vectorization, for example.

It's probably worth creating a Gitlab issue for this and adding all of
the examples there. That way we have a single place to store them all

Alistair

>
> > Thanks,
> > Stefan
>

[PATCH] linux-user/aarch64: Extend PR_SET_TAGGED_ADDR_CTRL for FEAT_MTE3

2024-01-30 Thread Richard Henderson

When MTE3 is supported, the kernel maps
  PR_MTE_TCF_ASYNC | PR_MTE_TCF_SYNC
to
  MTE_CTRL_TCF_ASYMM
and from there to
  SCTLR_EL1.TCF0 = 3

There is no error reported for setting ASYNC | SYNC
when MTE3 is not supported; the kernel simply selects
the ASYNC behavior of TCF0=2.

Signed-off-by: Richard Henderson 
---
 linux-user/aarch64/target_prctl.h | 25 +
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/linux-user/aarch64/target_prctl.h 
b/linux-user/aarch64/target_prctl.h
index 5067e7d731..49bd16aa95 100644
--- a/linux-user/aarch64/target_prctl.h
+++ b/linux-user/aarch64/target_prctl.h
@@ -173,21 +173,22 @@ static abi_long 
do_prctl_set_tagged_addr_ctrl(CPUArchState *env, abi_long arg2)
 env->tagged_addr_enable = arg2 & PR_TAGGED_ADDR_ENABLE;
 
 if (cpu_isar_feature(aa64_mte, cpu)) {
-switch (arg2 & PR_MTE_TCF_MASK) {
-case PR_MTE_TCF_NONE:
-case PR_MTE_TCF_SYNC:
-case PR_MTE_TCF_ASYNC:
-break;
-default:
-return -EINVAL;
-}
-
 /*
  * Write PR_MTE_TCF to SCTLR_EL1[TCF0].
- * Note that the syscall values are consistent with hw.
+ * Note that SYNC | ASYNC -> ASYMM with FEAT_MTE3,
+ * otherwise mte_update_sctlr_user chooses ASYNC.
  */
-env->cp15.sctlr_el[1] =
-deposit64(env->cp15.sctlr_el[1], 38, 2, arg2 >> PR_MTE_TCF_SHIFT);
+unsigned tcf = 0;
+if (arg2 & PR_MTE_TCF_ASYNC) {
+if ((arg2 & PR_MTE_TCF_SYNC) && cpu_isar_feature(aa64_mte3, cpu)) {
+tcf = 3;
+} else {
+tcf = 2;
+}
+} else if (arg2 & PR_MTE_TCF_SYNC) {
+tcf = 1;
+}
+env->cp15.sctlr_el[1] = deposit64(env->cp15.sctlr_el[1], 38, 2, tcf);
 
 /*
  * Write PR_MTE_TAG to GCR_EL1[Exclude].
-- 
2.34.1
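
The selection logic in the hunk above can be exercised on its own.  The
sketch below is a standalone restatement, not part of the patch:
mte_tcf0_from_prctl() and its have_mte3 parameter are hypothetical names
standing in for the inline code and cpu_isar_feature(aa64_mte3, cpu).  The
two PR_MTE_TCF_* values match the Linux uapi definitions in
<linux/prctl.h>.

```c
#include <stdbool.h>

/* Same values as the Linux uapi <linux/prctl.h>. */
#ifndef PR_MTE_TCF_SYNC
#define PR_MTE_TCF_SYNC  (1UL << 1)
#endif
#ifndef PR_MTE_TCF_ASYNC
#define PR_MTE_TCF_ASYNC (1UL << 2)
#endif

/*
 * Map the prctl() TCF flags to the 2-bit SCTLR_EL1.TCF0 value:
 * 0 = none, 1 = sync, 2 = async, 3 = asymmetric (FEAT_MTE3 only).
 */
static unsigned mte_tcf0_from_prctl(unsigned long arg2, bool have_mte3)
{
    if (arg2 & PR_MTE_TCF_ASYNC) {
        if ((arg2 & PR_MTE_TCF_SYNC) && have_mte3) {
            return 3; /* SYNC | ASYNC with MTE3 selects asymmetric mode */
        }
        return 2;     /* plain ASYNC, or SYNC | ASYNC without MTE3 */
    }
    if (arg2 & PR_MTE_TCF_SYNC) {
        return 1;
    }
    return 0;
}
```

Note that SYNC | ASYNC never produces an error here: without MTE3 it
quietly degrades to ASYNC, which is the behavior the commit message
describes.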

Re: Call for GSoC/Outreachy internship project ideas

2024-01-30 Thread Palmer Dabbelt

On Tue, 30 Jan 2024 12:28:27 PST (-0800), stefa...@gmail.com wrote:

On Tue, 30 Jan 2024 at 14:40, Palmer Dabbelt  wrote:

On Mon, 15 Jan 2024 08:32:59 PST (-0800), stefa...@gmail.com wrote:
> Dear QEMU and KVM communities,
> QEMU will apply for the Google Summer of Code and Outreachy internship
> programs again this year. Regular contributors can submit project
> ideas that they'd like to mentor by replying to this email before
> January 30th.

It's the 30th, sorry if this is late but I just saw it today.  +Alistair
and Daniel, as I didn't sync up with anyone about this so not sure if
someone else is looking already (we're not internally).

> Internship programs
> ---
> GSoC (https://summerofcode.withgoogle.com/) and Outreachy
> (https://www.outreachy.org/) offer paid open source remote work
> internships to eligible people wishing to participate in open source
> development. QEMU has been part of these internship programs for many
> years. Our mentors have enjoyed helping talented interns make their
> first open source contributions and some former interns continue to
> participate today.
>
> Who can mentor
> --
> Regular contributors to QEMU and KVM can participate as mentors.
> Mentorship involves about 5 hours of time commitment per week to
> communicate with the intern, review their patches, etc. Time is also
> required during the intern selection phase to communicate with
> applicants. Being a mentor is an opportunity to help someone get
> started in open source development, will give you experience with
> managing a project in a low-stakes environment, and a chance to
> explore interesting technical ideas that you may not have time to
> develop yourself.
>
> How to propose your idea
> --
> Reply to this email with the following project idea template filled in:
>
> === TITLE ===
>
> '''Summary:''' Short description of the project
>
> Detailed description of the project that explains the general idea,
> including a list of high-level tasks that will be completed by the
> project, and provides enough background for someone unfamiliar with
> the codebase to do research. Typically 2 or 3 paragraphs.
>
> '''Links:'''
> * Wiki links to relevant material
> * External links to mailing lists or web sites
>
> '''Details:'''
> * Skill level: beginner or intermediate or advanced
> * Language: C/Python/Rust/etc

I'm not 100% sure this is a sane GSoC idea, as it's a bit open ended and
might have some tricky parts.  That said it's tripping some people up
and as far as I know nobody's started looking at it, so I figured I'd
write something up.

I can try and dig up some more links if folks think it's interesting,
IIRC there's been a handful of bug reports related to very small loops
that run ~10x slower when vectorized.  Large benchmarks like SPEC have
also shown slowdowns.

Hi Palmer,
Performance optimization can be challenging for newcomers. I wouldn't
recommend it for a GSoC project unless you have time to seed the
project idea with specific optimizations to implement based on your
experience and profiling. That way the intern has a solid starting
point where they can have a few successes before venturing out to do
their own performance analysis.

Ya, I agree.  That's part of the reason why I wasn't sure if it's a 
good idea.  At least for this one I think there should be some easy to 
understand performance issue, as the loops that go very slowly consist 
of a small number of instructions and go a lot slower.

I'm actually more worried about this running into a rabbit hole of 
adding new TCG operations or even just having no well defined mappings 
between RVV and AVX, those might make the project really hard.

Do you have the time to profile and add specifics to the project idea
by Feb 21st? If that sounds good to you, I'll add it to the project
ideas list and you can add more detailed tasks in the coming weeks.

I can at least dig up some of the examples I ran into, there's been a 
handful filtering in over the last year or so.

This one 

still has a much more than 10x slowdown (73ms -> 13s) with 
vectorization, for example.

Thanks,
Stefan

[PATCH v16 1/6] hw/net: Add NPCMXXX GMAC device

From: Hao Wu 

This patch implements the basic registers of GMAC device and sets
registers for networking functionalities.
Squashed IRQ Implementation patch into this one for compilation.
Tested:
The following message shows up with the change:
Broadcom BCM54612E stmmac-0:00: attached PHY driver [Broadcom BCM54612E] 
(mii_bus:phy_addr=stmmac-0:00, irq=POLL)
stmmaceth f0802000.eth eth0: Link is Up - 1Gbps/Full - flow control rx/tx

Change-Id: If71c6d486b95edcccba109ba454870714d7e0940
Signed-off-by: Hao Wu 
Signed-off-by: Nabih Estefan Diaz 
Reviewed-by: Tyrone Ting 
---
 hw/net/meson.build |   2 +-
 hw/net/npcm_gmac.c | 467 +
 hw/net/trace-events|  12 +
 include/hw/net/npcm_gmac.h | 343 +++
 4 files changed, 823 insertions(+), 1 deletion(-)
 create mode 100644 hw/net/npcm_gmac.c
 create mode 100644 include/hw/net/npcm_gmac.h

diff --git a/hw/net/meson.build b/hw/net/meson.build
index 9afceb0619..d4e1dc9838 100644
--- a/hw/net/meson.build
+++ b/hw/net/meson.build
@@ -38,7 +38,7 @@ system_ss.add(when: 'CONFIG_I82596_COMMON', if_true: 
files('i82596.c'))
 system_ss.add(when: 'CONFIG_SUNHME', if_true: files('sunhme.c'))
 system_ss.add(when: 'CONFIG_FTGMAC100', if_true: files('ftgmac100.c'))
 system_ss.add(when: 'CONFIG_SUNGEM', if_true: files('sungem.c'))
-system_ss.add(when: 'CONFIG_NPCM7XX', if_true: files('npcm7xx_emc.c'))
+system_ss.add(when: 'CONFIG_NPCM7XX', if_true: files('npcm7xx_emc.c', 
'npcm_gmac.c'))
 
 system_ss.add(when: 'CONFIG_ETRAXFS', if_true: files('etraxfs_eth.c'))
 system_ss.add(when: 'CONFIG_COLDFIRE', if_true: files('mcf_fec.c'))
diff --git a/hw/net/npcm_gmac.c b/hw/net/npcm_gmac.c
new file mode 100644
index 00..7118b4c7c7
--- /dev/null
+++ b/hw/net/npcm_gmac.c
@@ -0,0 +1,467 @@
+/*
+ * Nuvoton NPCM7xx/8xx GMAC Module
+ *
+ * Copyright 2024 Google LLC
+ * Authors:
+ * Hao Wu 
+ * Nabih Estefan 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * Unsupported/unimplemented features:
+ * - MII is not implemented, MII_ADDR.BUSY and MII_DATA always return zero
+ * - Precision timestamp (PTP) is not implemented.
+ */
+
+#include "qemu/osdep.h"
+
+#include "hw/registerfields.h"
+#include "hw/net/mii.h"
+#include "hw/net/npcm_gmac.h"
+#include "migration/vmstate.h"
+#include "qemu/log.h"
+#include "qemu/units.h"
+#include "sysemu/dma.h"
+#include "trace.h"
+
+REG32(NPCM_DMA_BUS_MODE, 0x1000)
+REG32(NPCM_DMA_XMT_POLL_DEMAND, 0x1004)
+REG32(NPCM_DMA_RCV_POLL_DEMAND, 0x1008)
+REG32(NPCM_DMA_RX_BASE_ADDR, 0x100c)
+REG32(NPCM_DMA_TX_BASE_ADDR, 0x1010)
+REG32(NPCM_DMA_STATUS, 0x1014)
+REG32(NPCM_DMA_CONTROL, 0x1018)
+REG32(NPCM_DMA_INTR_ENA, 0x101c)
+REG32(NPCM_DMA_MISSED_FRAME_CTR, 0x1020)
+REG32(NPCM_DMA_HOST_TX_DESC, 0x1048)
+REG32(NPCM_DMA_HOST_RX_DESC, 0x104c)
+REG32(NPCM_DMA_CUR_TX_BUF_ADDR, 0x1050)
+REG32(NPCM_DMA_CUR_RX_BUF_ADDR, 0x1054)
+REG32(NPCM_DMA_HW_FEATURE, 0x1058)
+
+REG32(NPCM_GMAC_MAC_CONFIG, 0x0)
+REG32(NPCM_GMAC_FRAME_FILTER, 0x4)
+REG32(NPCM_GMAC_HASH_HIGH, 0x8)
+REG32(NPCM_GMAC_HASH_LOW, 0xc)
+REG32(NPCM_GMAC_MII_ADDR, 0x10)
+REG32(NPCM_GMAC_MII_DATA, 0x14)
+REG32(NPCM_GMAC_FLOW_CTRL, 0x18)
+REG32(NPCM_GMAC_VLAN_FLAG, 0x1c)
+REG32(NPCM_GMAC_VERSION, 0x20)
+REG32(NPCM_GMAC_WAKEUP_FILTER, 0x28)
+REG32(NPCM_GMAC_PMT, 0x2c)
+REG32(NPCM_GMAC_LPI_CTRL, 0x30)
+REG32(NPCM_GMAC_TIMER_CTRL, 0x34)
+REG32(NPCM_GMAC_INT_STATUS, 0x38)
+REG32(NPCM_GMAC_INT_MASK, 0x3c)
+REG32(NPCM_GMAC_MAC0_ADDR_HI, 0x40)
+REG32(NPCM_GMAC_MAC0_ADDR_LO, 0x44)
+REG32(NPCM_GMAC_MAC1_ADDR_HI, 0x48)
+REG32(NPCM_GMAC_MAC1_ADDR_LO, 0x4c)
+REG32(NPCM_GMAC_MAC2_ADDR_HI, 0x50)
+REG32(NPCM_GMAC_MAC2_ADDR_LO, 0x54)
+REG32(NPCM_GMAC_MAC3_ADDR_HI, 0x58)
+REG32(NPCM_GMAC_MAC3_ADDR_LO, 0x5c)
+REG32(NPCM_GMAC_RGMII_STATUS, 0xd8)
+REG32(NPCM_GMAC_WATCHDOG, 0xdc)
+REG32(NPCM_GMAC_PTP_TCR, 0x700)
+REG32(NPCM_GMAC_PTP_SSIR, 0x704)
+REG32(NPCM_GMAC_PTP_STSR, 0x708)
+REG32(NPCM_GMAC_PTP_STNSR, 0x70c)
+REG32(NPCM_GMAC_PTP_STSUR, 0x710)
+REG32(NPCM_GMAC_PTP_STNSUR, 0x714)
+REG32(NPCM_GMAC_PTP_TAR, 0x718)
+REG32(NPCM_GMAC_PTP_TTSR, 0x71c)
+
+/* Register Fields */
+#define NPCM_GMAC_MII_ADDR_BUSY BIT(0)
+#define NPCM_GMAC_MII_ADDR_WRITE BIT(1)
+#define NPCM_GMAC_MII_ADDR_GR(rv)   extract16((rv), 6, 5)
+#define NPCM_GMAC_MII_ADDR_PA(rv)   extract16((rv), 11, 5)
+
+#define NPCM_GMAC_INT_MASK_LPIIM BIT(10)
+#define NPCM_GMAC_INT_MASK_PMTM BIT(3)
+#define NPCM_GMAC_INT_MASK_RGIM BIT(0)
+
+#define NPCM_DMA_BUS_MODE_SWR

[PATCH v16 0/6] Implementation of GMAC Networking Module

From: Nabih Estefan Diaz 

[Changes since v15]

Dropped PCI MBox patches. They were presenting a lot of problems with 
endianness and are not directly related to the GMAC. Breaking them apart to 
debug separately and let the GMAC itself be upstreamed faster.

[Changes since v14]
Expanded comment on chardev device and fixed comment formatting

[Changes since v13]
Added a couple clarifying comments and documentation about chardev
device expected protocol for ease of review.

[Changes since v12]
Fix errors found when testing in big-endian host.

[Changes since v11]
Branch couldn't be merged with master because of issues in patchset 6.
Fixed.

[Changes since v10]
Fixed macOS build issue. Changed imports to not be linux-specific.

[Changes since v9]
More cleanup and fixes based on suggestions from Peter Maydell
(peter.mayd...@linaro.org).

[Changes since v8]
Suggestions and Fixes from Peter Maydell (peter.mayd...@linaro.org),
also cleaned up changes so nothing is deleted in a later patch that was
added in an earlier patch. Patch count decreased by 1 because this cleanup
led to one of the patches being irrelevant.

[Changes since v7]
Fixed patch 4 declaration of new NIC based on comments by Peter Maydell
(peter.mayd...@linaro.org)

[Changes since v6]
Remove the Change-Ids from the commit messages.

[Changes since v5]
Restored some qtests whose removal seems to have been caused by a merge
conflict.

[Changes since v4]
Added Signed-off-by tag and fixed patch 4 commit message as suggested by
Peter Maydell (peter.mayd...@linaro.org)

[Changes since v3]
Fixed comments from Hao Wu (wuhao...@google.com)

[Changes since v2]
Fixed bugs related to the RC functionality of the GMAC. Added and
squashed patches related to that.

[Changes since v1]
Fixed some errors in formatting.
Fixed a merge error that I didn't see in v1.
Removed Nuvoton 8xx references since that is a separate patch set.

[Original Cover]
Creates NPI Mailbox Module with data verification for read and write (internal
and external), wiring to the Nuvoton SoC, and QTests.

Also creates the GMAC Networking Module. Implements read and write
functionalities with corresponding descriptors and registers. Also includes
QTests for the different functionalities.

Hao Wu (2):
  hw/net: Add NPCMXXX GMAC device
  hw/arm: Add GMAC devices to NPCM7XX SoC

Nabih Estefan Diaz (4):
  tests/qtest: Creating qtest for GMAC Module
  hw/net: GMAC Rx Implementation
  hw/net: GMAC Tx Implementation
  tests/qtest: Adding PCS Module test to GMAC Qtest

 hw/arm/npcm7xx.c |  37 +-
 hw/net/meson.build   |   2 +-
 hw/net/npcm_gmac.c   | 942 +++
 hw/net/trace-events  |  19 +
 include/hw/arm/npcm7xx.h |   2 +
 include/hw/net/npcm_gmac.h   | 343 +
 tests/qtest/meson.build  |   1 +
 tests/qtest/npcm_gmac-test.c | 344 +
 8 files changed, 1687 insertions(+), 3 deletions(-)
 create mode 100644 hw/net/npcm_gmac.c
 create mode 100644 include/hw/net/npcm_gmac.h
 create mode 100644 tests/qtest/npcm_gmac-test.c

-- 
2.43.0.429.g432eaa2c6b-goog

[PATCH v16 4/6] hw/net: GMAC Rx Implementation

From: Nabih Estefan Diaz 

- Implementation of Receive function for packets
- Implementation of reading and writing Rx descriptors in guest memory
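
The descriptor helpers in this patch convert each 32-bit descriptor word between the guest's little-endian layout and host byte order (le32_to_cpu() on read, cpu_to_le32() on write). The following is a minimal, QEMU-independent sketch of that idea, not the code from the patch. The struct and every sketch_* name are hypothetical stand-ins, and a plain byte buffer takes the place of guest memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-word descriptor; a stand-in, not the real NPCMGMACRxDesc. */
struct sketch_desc {
    uint32_t des0;
    uint32_t des1;
    uint32_t des2;
    uint32_t des3;
};

#define SKETCH_DESC_BYTES 16

/* Store a 32-bit value as little-endian bytes, whatever the host order is. */
static void sketch_put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Load a 32-bit value from little-endian bytes into host order. */
static uint32_t sketch_get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Host-order descriptor -> little-endian image (the "write descriptor" side). */
static void sketch_desc_to_guest(const struct sketch_desc *d, uint8_t *raw)
{
    assert(d != NULL && raw != NULL);
    sketch_put_le32(raw + 0, d->des0);
    sketch_put_le32(raw + 4, d->des1);
    sketch_put_le32(raw + 8, d->des2);
    sketch_put_le32(raw + 12, d->des3);
}

/* Little-endian image -> host-order descriptor (the "read descriptor" side). */
static void sketch_desc_from_guest(const uint8_t *raw, struct sketch_desc *d)
{
    assert(d != NULL && raw != NULL);
    d->des0 = sketch_get_le32(raw + 0);
    d->des1 = sketch_get_le32(raw + 4);
    d->des2 = sketch_get_le32(raw + 8);
    d->des3 = sketch_get_le32(raw + 12);
}
```

Converting into a temporary copy before writing, as the patch does with le_desc, keeps the caller's host-order descriptor usable after the write.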

When RX starts, we need to flush the queued packets so that they
can be received by the GMAC device. Without this, receive does not
work with a TAP NIC device.

When the RX descriptor list is full, the device reports this through
DMA_STATUS for software to handle. However, there is no way for software
to indicate that it has handled all RX descriptors, so the whole pipeline
stalls.

We do something similar to NPCM7XX EMC to handle this case.

1. Return packet size when RX descriptor is full, effectively dropping
these packets in such a case.
2. When software clears RX descriptor full bit, continue receiving
further packets by flushing QEMU packet queue.
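
The two steps above can be sketched with a small stand-alone model. This is only an illustration under stated assumptions, not the code in this patch: struct sketch_nic, its fields, and both functions are hypothetical, and the flush is modelled as a counter. In a real device model the flush would presumably be a call such as QEMU's qemu_flush_queued_packets(); where exactly this patch calls it is not visible in this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the RX side; not NPCMGMACState. */
struct sketch_nic {
    bool rx_desc_full;  /* models the "RX descriptor full" status bit */
    size_t free_descs;  /* descriptors currently owned by the device */
    size_t delivered;   /* packets copied into guest buffers */
    size_t dropped;     /* packets dropped because the list was full */
    size_t flushes;     /* how often the backend queue was flushed */
};

/*
 * Step 1: when no descriptor is available, still return the packet size.
 * Reporting the packet as consumed drops it instead of leaving it queued
 * in the backend forever.
 */
static size_t sketch_receive(struct sketch_nic *nic, size_t len)
{
    assert(nic != NULL);
    if (nic->free_descs == 0) {
        nic->rx_desc_full = true;
        nic->dropped++;
        return len;
    }
    nic->free_descs--;
    nic->delivered++;
    return len;
}

/*
 * Step 2: when software clears the "full" bit (after refilling
 * descriptors), flush the backend queue so reception resumes.
 */
static void sketch_clear_rx_full(struct sketch_nic *nic, size_t refilled)
{
    assert(nic != NULL);
    nic->free_descs += refilled;
    if (nic->rx_desc_full) {
        nic->rx_desc_full = false;
        nic->flushes++; /* stand-in for flushing QEMU's packet queue */
    }
}
```

Returning the full length in the drop case is the design choice that avoids the stall: the backend never sees a "not consumed" result that it would have to retry later.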

Added relevant trace-events

Change-Id: I132aa254a94cda1a586aba2ea33bbfc74ecdb831
Signed-off-by: Hao Wu 
Signed-off-by: Nabih Estefan 
Reviewed-by: Tyrone Ting 
---
 hw/net/npcm_gmac.c  | 276 +++-
 hw/net/trace-events |   5 +
 2 files changed, 279 insertions(+), 2 deletions(-)

diff --git a/hw/net/npcm_gmac.c b/hw/net/npcm_gmac.c
index 7118b4c7c7..a3c626e1b8 100644
--- a/hw/net/npcm_gmac.c
+++ b/hw/net/npcm_gmac.c
@@ -27,6 +27,10 @@
 #include "hw/net/mii.h"
 #include "hw/net/npcm_gmac.h"
 #include "migration/vmstate.h"
+#include "net/checksum.h"
+#include "net/eth.h"
+#include "net/net.h"
+#include "qemu/cutils.h"
 #include "qemu/log.h"
 #include "qemu/units.h"
 #include "sysemu/dma.h"
@@ -149,6 +153,17 @@ static void gmac_phy_set_link(NPCMGMACState *gmac, bool active)
 
 static bool gmac_can_receive(NetClientState *nc)
 {
+NPCMGMACState *gmac = NPCM_GMAC(qemu_get_nic_opaque(nc));
+
+/* If GMAC receive is disabled. */
+if (!(gmac->regs[R_NPCM_GMAC_MAC_CONFIG] & NPCM_GMAC_MAC_CONFIG_RX_EN)) {
+return false;
+}
+
+/* If GMAC DMA RX is stopped. */
+if (!(gmac->regs[R_NPCM_DMA_CONTROL] & NPCM_DMA_CONTROL_START_STOP_RX)) {
+return false;
+}
 return true;
 }
 
@@ -192,12 +207,258 @@ static void gmac_update_irq(NPCMGMACState *gmac)
 qemu_set_irq(gmac->irq, level);
 }
 
-static ssize_t gmac_receive(NetClientState *nc, const uint8_t *buf, size_t len)
+static int gmac_read_rx_desc(dma_addr_t addr, struct NPCMGMACRxDesc *desc)
+{
+if (dma_memory_read(&address_space_memory, addr, desc,
+sizeof(*desc), MEMTXATTRS_UNSPECIFIED)) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to read descriptor @ 0x%"
+  HWADDR_PRIx "\n", __func__, addr);
+return -1;
+}
+desc->rdes0 = le32_to_cpu(desc->rdes0);
+desc->rdes1 = le32_to_cpu(desc->rdes1);
+desc->rdes2 = le32_to_cpu(desc->rdes2);
+desc->rdes3 = le32_to_cpu(desc->rdes3);
+return 0;
+}
+
+static int gmac_write_rx_desc(dma_addr_t addr, struct NPCMGMACRxDesc *desc)
 {
-/* Placeholder. Function will be filled in following patches */
+struct NPCMGMACRxDesc le_desc;
+le_desc.rdes0 = cpu_to_le32(desc->rdes0);
+le_desc.rdes1 = cpu_to_le32(desc->rdes1);
+le_desc.rdes2 = cpu_to_le32(desc->rdes2);
+le_desc.rdes3 = cpu_to_le32(desc->rdes3);
+if (dma_memory_write(&address_space_memory, addr, &le_desc,
+sizeof(le_desc), MEMTXATTRS_UNSPECIFIED)) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to write descriptor @ 0x%"
+  HWADDR_PRIx "\n", __func__, addr);
+return -1;
+}
 return 0;
 }
 
+static int gmac_rx_transfer_frame_to_buffer(uint32_t rx_buf_len,
+uint32_t *left_frame,
+uint32_t rx_buf_addr,
+bool *eof_transferred,
+const uint8_t **frame_ptr,
+uint16_t *transferred)
+{
+uint32_t to_transfer;
+/*
+ * Check whether the buffer can hold what is left of the frame.
+ * If so, transfer only the rest of the frame and mark end of frame.
+ * Otherwise, fill the buffer with as much of the frame as fits.
+ */
+if (rx_buf_len >= *left_frame) {
+to_transfer = *left_frame;
+*eof_transferred = true;
+} else {
+to_transfer = rx_buf_len;
+}
+
+/* write frame part to memory */
+if (dma_memory_write(&address_space_memory, (uint64_t) rx_buf_addr,
+ *frame_ptr, to_transfer, MEMTXATTRS_UNSPECIFIED)) {
+return -1;
+}
+
+/* update frame pointer and size of what's left of frame */
+*frame_ptr += to_transfer;
+*left_frame -= to_transfer;
+*transferred += to_transfer;
+
+return 0;
+}
+
+static void gmac_dma_set_state(NPCMGMACState *gmac, int shift, uint32_t state)
+{
+gmac->regs[R_NPCM_DMA_STATUS] = deposit32(gmac->regs[R_NPCM_DMA_STATUS],
+shift, 3, state);
+}
+
+static ssize_t gmac_receive(NetClientState *nc, const uint8_t *buf, size_t len)
+{
+/*
+ * Commen

[PATCH v16 5/6] hw/net: GMAC Tx Implementation

From: Nabih Estefan Diaz 

- Implementation of Transmit function for packets
- Implementation of reading and writing Tx descriptors in guest memory

Added relevant trace-events

NOTE: This function implements the steps detailed in the datasheet for
transmitting messages from the GMAC.
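
As a rough illustration of the ownership handshake that the transmit loop in this patch follows (1 = DMA owned, 0 = software owned), here is a stand-alone sketch. It is not the function from the patch. The OWN bit position (bit 31), the simple wrap-around ring, and all sketch_* names are assumptions made for the example; the real descriptor layout comes from the datasheet and npcm_gmac.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed position of the ownership bit; check the datasheet for the real one. */
#define SKETCH_TX_OWN (1u << 31)

/* Hypothetical descriptor with only the status word modelled. */
struct sketch_tx_desc {
    uint32_t tdes0;
};

/*
 * Walk the ring from *idx. Every DMA-owned descriptor is "sent" and handed
 * back to software by clearing OWN. The walk stops at the first descriptor
 * that software still owns, which corresponds to the suspended state in
 * the patch. Returns the number of descriptors handed back.
 */
static size_t sketch_tx_run(struct sketch_tx_desc *ring, size_t n,
                            size_t *idx, bool *suspended)
{
    size_t done = 0;

    assert(ring != NULL && idx != NULL && suspended != NULL);
    assert(n > 0 && *idx < n);
    *suspended = false;

    while (true) {
        struct sketch_tx_desc *d = &ring[*idx];

        if (!(d->tdes0 & SKETCH_TX_OWN)) {
            /* Owned by software: nothing to send, suspend. */
            *suspended = true;
            break;
        }
        /* Give the descriptor back regardless of what happens. */
        d->tdes0 &= ~SKETCH_TX_OWN;
        *idx = (*idx + 1) % n; /* assumed plain ring wrap */
        done++;
    }
    return done;
}
```

The loop always terminates because each iteration clears one OWN bit, so at most n descriptors are processed before a software-owned one is found.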

Change-Id: Icf14f9fcc6cc7808a41acd872bca67c9832087e6
Signed-off-by: Nabih Estefan 
Reviewed-by: Tyrone Ting 
---
 hw/net/npcm_gmac.c  | 203 
 hw/net/trace-events |   2 +
 2 files changed, 205 insertions(+)

diff --git a/hw/net/npcm_gmac.c b/hw/net/npcm_gmac.c
index a3c626e1b8..1b71e2526e 100644
--- a/hw/net/npcm_gmac.c
+++ b/hw/net/npcm_gmac.c
@@ -238,6 +238,37 @@ static int gmac_write_rx_desc(dma_addr_t addr, struct NPCMGMACRxDesc *desc)
 return 0;
 }
 
+static int gmac_read_tx_desc(dma_addr_t addr, struct NPCMGMACTxDesc *desc)
+{
+if (dma_memory_read(&address_space_memory, addr, desc,
+sizeof(*desc), MEMTXATTRS_UNSPECIFIED)) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to read descriptor @ 0x%"
+  HWADDR_PRIx "\n", __func__, addr);
+return -1;
+}
+desc->tdes0 = le32_to_cpu(desc->tdes0);
+desc->tdes1 = le32_to_cpu(desc->tdes1);
+desc->tdes2 = le32_to_cpu(desc->tdes2);
+desc->tdes3 = le32_to_cpu(desc->tdes3);
+return 0;
+}
+
+static int gmac_write_tx_desc(dma_addr_t addr, struct NPCMGMACTxDesc *desc)
+{
+struct NPCMGMACTxDesc le_desc;
+le_desc.tdes0 = cpu_to_le32(desc->tdes0);
+le_desc.tdes1 = cpu_to_le32(desc->tdes1);
+le_desc.tdes2 = cpu_to_le32(desc->tdes2);
+le_desc.tdes3 = cpu_to_le32(desc->tdes3);
+if (dma_memory_write(&address_space_memory, addr, &le_desc,
+sizeof(le_desc), MEMTXATTRS_UNSPECIFIED)) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to write descriptor @ 0x%"
+  HWADDR_PRIx "\n", __func__, addr);
+return -1;
+}
+return 0;
+}
+
 static int gmac_rx_transfer_frame_to_buffer(uint32_t rx_buf_len,
 uint32_t *left_frame,
 uint32_t rx_buf_addr,
@@ -459,6 +490,155 @@ static ssize_t gmac_receive(NetClientState *nc, const uint8_t *buf, size_t len)
 return len;
 }
 
+static int gmac_tx_get_csum(uint32_t tdes1)
+{
+uint32_t mask = TX_DESC_TDES1_CHKSM_INS_CTRL_MASK(tdes1);
+int csum = 0;
+
+if (likely(mask > 0)) {
+csum |= CSUM_IP;
+}
+if (likely(mask > 1)) {
+csum |= CSUM_TCP | CSUM_UDP;
+}
+
+return csum;
+}
+
+static void gmac_try_send_next_packet(NPCMGMACState *gmac)
+{
+/*
+ * Comments about steps refer to steps for
+ * transmitting in page 384 of datasheet
+ */
+uint16_t tx_buffer_size = 2048;
+g_autofree uint8_t *tx_send_buffer = g_malloc(tx_buffer_size);
+uint32_t desc_addr;
+struct NPCMGMACTxDesc tx_desc;
+uint32_t tx_buf_addr, tx_buf_len;
+uint16_t length = 0;
+uint8_t *buf = tx_send_buffer;
+uint32_t prev_buf_size = 0;
+int csum = 0;
+
+/* steps 1&2 */
+if (!gmac->regs[R_NPCM_DMA_HOST_TX_DESC]) {
+gmac->regs[R_NPCM_DMA_HOST_TX_DESC] =
+NPCM_DMA_HOST_TX_DESC_MASK(gmac->regs[R_NPCM_DMA_TX_BASE_ADDR]);
+}
+desc_addr = gmac->regs[R_NPCM_DMA_HOST_TX_DESC];
+
+while (true) {
+gmac_dma_set_state(gmac, NPCM_DMA_STATUS_TX_PROCESS_STATE_SHIFT,
+NPCM_DMA_STATUS_TX_RUNNING_FETCHING_STATE);
+if (gmac_read_tx_desc(desc_addr, &tx_desc)) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "TX Descriptor @ 0x%x can't be read\n",
+  desc_addr);
+return;
+}
+/* step 3 */
+
+trace_npcm_gmac_packet_desc_read(DEVICE(gmac)->canonical_path,
+desc_addr);
+trace_npcm_gmac_debug_desc_data(DEVICE(gmac)->canonical_path, &tx_desc,
+tx_desc.tdes0, tx_desc.tdes1, tx_desc.tdes2, tx_desc.tdes3);
+
+/* 1 = DMA Owned, 0 = Software Owned */
+if (!(tx_desc.tdes0 & TX_DESC_TDES0_OWN)) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "TX Descriptor @ 0x%x is owned by software\n",
+  desc_addr);
+gmac->regs[R_NPCM_DMA_STATUS] |= NPCM_DMA_STATUS_TU;
+gmac_dma_set_state(gmac, NPCM_DMA_STATUS_TX_PROCESS_STATE_SHIFT,
+NPCM_DMA_STATUS_TX_SUSPENDED_STATE);
+gmac_update_irq(gmac);
+return;
+}
+
+gmac_dma_set_state(gmac, NPCM_DMA_STATUS_TX_PROCESS_STATE_SHIFT,
+NPCM_DMA_STATUS_TX_RUNNING_READ_STATE);
+/* Give the descriptor back regardless of what happens. */
+tx_desc.tdes0 &= ~TX_DESC_TDES0_OWN;
+
+if (tx_desc.tdes1 & TX_DESC_TDES1_FIRST_SEG_MASK) {
+csum = gmac_tx_get_csum(tx_desc.tdes1);
+}
+
+/* step 4 */
+tx_buf_addr = tx_desc.tdes2;
+

[PATCH v16 6/6] tests/qtest: Adding PCS Module test to GMAC Qtest

From: Nabih Estefan Diaz 

 - Add PCS Register check to npcm_gmac-test
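
The pcs_read() helper added by this patch accesses PCS registers indirectly: the upper bits of the register number are written to the indirect-access base register (NPCM_PCS_IND_AC_BA), and the low 9 bits are then used as the read offset. The sketch below isolates only that address split so it can be checked without QEMU. The masks and shift are taken from the diff; the struct and the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Masks and shift as used by pcs_read() in the diff. */
#define SKETCH_PCS_BANK_MASK   0x3ffe00u
#define SKETCH_PCS_BANK_SHIFT  9
#define SKETCH_PCS_OFFSET_MASK 0x1ffu

/* Hypothetical result type: value for the base register plus read offset. */
struct sketch_pcs_access {
    uint32_t bank;   /* value to write to the indirect-access base register */
    uint32_t offset; /* offset to read from, relative to the PCS base */
};

/* Split a PCS register number into the bank value and the window offset. */
static struct sketch_pcs_access sketch_pcs_split(uint32_t regno)
{
    struct sketch_pcs_access a;

    /* The register number must fit in the 22 bits covered by the two masks. */
    assert((regno & ~(SKETCH_PCS_BANK_MASK | SKETCH_PCS_OFFSET_MASK)) == 0);

    a.bank = (regno & SKETCH_PCS_BANK_MASK) >> SKETCH_PCS_BANK_SHIFT;
    a.offset = regno & SKETCH_PCS_OFFSET_MASK;
    return a;
}
```

For example, NPCM_PCS_SR_MII_STS (0x3e0002) splits into a bank value of 0x1f00 and an offset of 0x2.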

Change-Id: I34821beb5e0b1e89e2be576ab58eabe41545af12
Signed-off-by: Nabih Estefan 
Reviewed-by: Tyrone Ting 
---
 tests/qtest/npcm_gmac-test.c | 132 +++
 1 file changed, 132 insertions(+)

diff --git a/tests/qtest/npcm_gmac-test.c b/tests/qtest/npcm_gmac-test.c
index 72c68874df..9e58b15ca1 100644
--- a/tests/qtest/npcm_gmac-test.c
+++ b/tests/qtest/npcm_gmac-test.c
@@ -23,6 +23,10 @@
 /* Name of the GMAC Device */
 #define TYPE_NPCM_GMAC "npcm-gmac"
 
+/* Address of the PCS Module */
+#define PCS_BASE_ADDRESS 0xf078
+#define NPCM_PCS_IND_AC_BA 0x1fe
+
 typedef struct GMACModule {
 int irq;
 uint64_t base_addr;
@@ -114,6 +118,62 @@ typedef enum NPCMRegister {
 NPCM_GMAC_PTP_STNSUR = 0x714,
 NPCM_GMAC_PTP_TAR = 0x718,
 NPCM_GMAC_PTP_TTSR = 0x71c,
+
+/* PCS Registers */
+NPCM_PCS_SR_CTL_ID1 = 0x3c0008,
+NPCM_PCS_SR_CTL_ID2 = 0x3c000a,
+NPCM_PCS_SR_CTL_STS = 0x3c0010,
+
+NPCM_PCS_SR_MII_CTRL = 0x3e,
+NPCM_PCS_SR_MII_STS = 0x3e0002,
+NPCM_PCS_SR_MII_DEV_ID1 = 0x3e0004,
+NPCM_PCS_SR_MII_DEV_ID2 = 0x3e0006,
+NPCM_PCS_SR_MII_AN_ADV = 0x3e0008,
+NPCM_PCS_SR_MII_LP_BABL = 0x3e000a,
+NPCM_PCS_SR_MII_AN_EXPN = 0x3e000c,
+NPCM_PCS_SR_MII_EXT_STS = 0x3e001e,
+
+NPCM_PCS_SR_TIM_SYNC_ABL = 0x3e0e10,
+NPCM_PCS_SR_TIM_SYNC_TX_MAX_DLY_LWR = 0x3e0e12,
+NPCM_PCS_SR_TIM_SYNC_TX_MAX_DLY_UPR = 0x3e0e14,
+NPCM_PCS_SR_TIM_SYNC_TX_MIN_DLY_LWR = 0x3e0e16,
+NPCM_PCS_SR_TIM_SYNC_TX_MIN_DLY_UPR = 0x3e0e18,
+NPCM_PCS_SR_TIM_SYNC_RX_MAX_DLY_LWR = 0x3e0e1a,
+NPCM_PCS_SR_TIM_SYNC_RX_MAX_DLY_UPR = 0x3e0e1c,
+NPCM_PCS_SR_TIM_SYNC_RX_MIN_DLY_LWR = 0x3e0e1e,
+NPCM_PCS_SR_TIM_SYNC_RX_MIN_DLY_UPR = 0x3e0e20,
+
+NPCM_PCS_VR_MII_MMD_DIG_CTRL1 = 0x3f,
+NPCM_PCS_VR_MII_AN_CTRL = 0x3f0002,
+NPCM_PCS_VR_MII_AN_INTR_STS = 0x3f0004,
+NPCM_PCS_VR_MII_TC = 0x3f0006,
+NPCM_PCS_VR_MII_DBG_CTRL = 0x3f000a,
+NPCM_PCS_VR_MII_EEE_MCTRL0 = 0x3f000c,
+NPCM_PCS_VR_MII_EEE_TXTIMER = 0x3f0010,
+NPCM_PCS_VR_MII_EEE_RXTIMER = 0x3f0012,
+NPCM_PCS_VR_MII_LINK_TIMER_CTRL = 0x3f0014,
+NPCM_PCS_VR_MII_EEE_MCTRL1 = 0x3f0016,
+NPCM_PCS_VR_MII_DIG_STS = 0x3f0020,
+NPCM_PCS_VR_MII_ICG_ERRCNT1 = 0x3f0022,
+NPCM_PCS_VR_MII_MISC_STS = 0x3f0030,
+NPCM_PCS_VR_MII_RX_LSTS = 0x3f0040,
+NPCM_PCS_VR_MII_MP_TX_BSTCTRL0 = 0x3f0070,
+NPCM_PCS_VR_MII_MP_TX_LVLCTRL0 = 0x3f0074,
+NPCM_PCS_VR_MII_MP_TX_GENCTRL0 = 0x3f007a,
+NPCM_PCS_VR_MII_MP_TX_GENCTRL1 = 0x3f007c,
+NPCM_PCS_VR_MII_MP_TX_STS = 0x3f0090,
+NPCM_PCS_VR_MII_MP_RX_GENCTRL0 = 0x3f00b0,
+NPCM_PCS_VR_MII_MP_RX_GENCTRL1 = 0x3f00b2,
+NPCM_PCS_VR_MII_MP_RX_LOS_CTRL0 = 0x3f00ba,
+NPCM_PCS_VR_MII_MP_MPLL_CTRL0 = 0x3f00f0,
+NPCM_PCS_VR_MII_MP_MPLL_CTRL1 = 0x3f00f2,
+NPCM_PCS_VR_MII_MP_MPLL_STS = 0x3f0110,
+NPCM_PCS_VR_MII_MP_MISC_CTRL2 = 0x3f0126,
+NPCM_PCS_VR_MII_MP_LVL_CTRL = 0x3f0130,
+NPCM_PCS_VR_MII_MP_MISC_CTRL0 = 0x3f0132,
+NPCM_PCS_VR_MII_MP_MISC_CTRL1 = 0x3f0134,
+NPCM_PCS_VR_MII_DIG_CTRL2 = 0x3f01c2,
+NPCM_PCS_VR_MII_DIG_ERRCNT_SEL = 0x3f01c4,
 } NPCMRegister;
 
 static uint32_t gmac_read(QTestState *qts, const GMACModule *mod,
@@ -122,6 +182,15 @@ static uint32_t gmac_read(QTestState *qts, const GMACModule *mod,
 return qtest_readl(qts, mod->base_addr + regno);
 }
 
+static uint16_t pcs_read(QTestState *qts, const GMACModule *mod,
+  NPCMRegister regno)
+{
+uint32_t write_value = (regno & 0x3ffe00) >> 9;
+qtest_writel(qts, PCS_BASE_ADDRESS + NPCM_PCS_IND_AC_BA, write_value);
+uint32_t read_offset = regno & 0x1ff;
+return qtest_readl(qts, PCS_BASE_ADDRESS + read_offset);
+}
+
 /* Check that GMAC registers are reset to default value */
 static void test_init(gconstpointer test_data)
 {
@@ -134,6 +203,11 @@ static void test_init(gconstpointer test_data)
 g_assert_cmphex(gmac_read(qts, mod, (regno)), ==, (value)); \
 } while (0)
 
+#define CHECK_REG_PCS(regno, value) \
+do { \
+g_assert_cmphex(pcs_read(qts, mod, (regno)), ==, (value)); \
+} while (0)
+
 CHECK_REG32(NPCM_DMA_BUS_MODE, 0x00020100);
 CHECK_REG32(NPCM_DMA_XMT_POLL_DEMAND, 0);
 CHECK_REG32(NPCM_DMA_RCV_POLL_DEMAND, 0);
@@ -183,6 +257,64 @@ static void test_init(gconstpointer test_data)
 CHECK_REG32(NPCM_GMAC_PTP_TAR, 0);
 CHECK_REG32(NPCM_GMAC_PTP_TTSR, 0);
 
+/* TODO Add registers PCS */
+if (mod->base_addr == 0xf0802000) {
+CHECK_REG_PCS(NPCM_PCS_SR_CTL_ID1, 0x699e);
+CHECK_REG_PCS(NPCM_PCS_SR_CTL_ID2, 0);
+CHECK_REG_PCS(NPCM_PCS_SR_CTL_STS, 0x8000);
+
+CHECK_REG_PCS(NPCM_PCS_SR_MII_CTRL, 0x1140);
+CHECK_REG_PCS(NPCM_PCS_SR_MII_STS, 0x0109);
+CHECK_REG_PCS(NPCM_PCS_SR_MII_DEV_ID1, 0x699e);
+CHECK_REG_PCS(NPCM_PCS_SR_MII_DEV_ID2, 0x0ced0);
+CHECK_REG_PCS(NPCM_PCS_SR_MII_

[PATCH v16 2/6] hw/arm: Add GMAC devices to NPCM7XX SoC

From: Hao Wu 

Change-Id: Id8a3461fb5042adc4c3fd6f4fbd1ca0d33e22565
Signed-off-by: Hao Wu 
Signed-off-by: Nabih Estefan 
Reviewed-by: Tyrone Ting 
---
 hw/arm/npcm7xx.c | 37 +++--
 include/hw/arm/npcm7xx.h |  2 ++
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/hw/arm/npcm7xx.c b/hw/arm/npcm7xx.c
index e3243a520d..3b29206265 100644
--- a/hw/arm/npcm7xx.c
+++ b/hw/arm/npcm7xx.c
@@ -84,8 +84,10 @@ enum NPCM7xxInterrupt {
 NPCM7XX_UART1_IRQ,
 NPCM7XX_UART2_IRQ,
 NPCM7XX_UART3_IRQ,
+NPCM7XX_GMAC1_IRQ   = 14,
 NPCM7XX_EMC1RX_IRQ  = 15,
 NPCM7XX_EMC1TX_IRQ,
+NPCM7XX_GMAC2_IRQ,
 NPCM7XX_MMC_IRQ = 26,
 NPCM7XX_PSPI2_IRQ   = 28,
 NPCM7XX_PSPI1_IRQ   = 31,
@@ -229,6 +231,12 @@ static const hwaddr npcm7xx_pspi_addr[] = {
 0xf0201000,
 };
 
+/* Register base address for each GMAC Module */
+static const hwaddr npcm7xx_gmac_addr[] = {
+0xf0802000,
+0xf0804000,
+};
+
 static const struct {
 hwaddr regs_addr;
 uint32_t unconnected_pins;
@@ -456,6 +464,10 @@ static void npcm7xx_init(Object *obj)
 for (i = 0; i < ARRAY_SIZE(s->pspi); i++) {
 object_initialize_child(obj, "pspi[*]", &s->pspi[i], TYPE_NPCM_PSPI);
 }
+
+for (i = 0; i < ARRAY_SIZE(s->gmac); i++) {
+object_initialize_child(obj, "gmac[*]", &s->gmac[i], TYPE_NPCM_GMAC);
+}
 
 object_initialize_child(obj, "mmc", &s->mmc, TYPE_NPCM7XX_SDHCI);
 }
@@ -688,6 +700,29 @@ static void npcm7xx_realize(DeviceState *dev, Error **errp)
 sysbus_connect_irq(sbd, 1, npcm7xx_irq(s, rx_irq));
 }
 
+/*
+ * GMAC Modules. Cannot fail.
+ */
+QEMU_BUILD_BUG_ON(ARRAY_SIZE(npcm7xx_gmac_addr) != ARRAY_SIZE(s->gmac));
+QEMU_BUILD_BUG_ON(ARRAY_SIZE(s->gmac) != 2);
+for (i = 0; i < ARRAY_SIZE(s->gmac); i++) {
+SysBusDevice *sbd = SYS_BUS_DEVICE(&s->gmac[i]);
+
+/*
+ * The device exists regardless of whether it's connected to a QEMU
+ * netdev backend. So always instantiate it even if there is no
+ * backend.
+ */
+sysbus_realize(sbd, &error_abort);
+sysbus_mmio_map(sbd, 0, npcm7xx_gmac_addr[i]);
+int irq = i == 0 ? NPCM7XX_GMAC1_IRQ : NPCM7XX_GMAC2_IRQ;
+/*
+ * N.B. The values for the second argument sysbus_connect_irq are
+ * chosen to match the registration order in npcm7xx_emc_realize.
+ */
+sysbus_connect_irq(sbd, 0, npcm7xx_irq(s, irq));
+}
+
 /*
  * Flash Interface Unit (FIU). Can fail if incorrect number of chip selects
  * specified, but this is a programming error.
@@ -750,8 +785,6 @@ static void npcm7xx_realize(DeviceState *dev, Error **errp)
 create_unimplemented_device("npcm7xx.siox[2]",  0xf0102000,   4 * KiB);
 create_unimplemented_device("npcm7xx.ahbpci",   0xf040,   1 * MiB);
 create_unimplemented_device("npcm7xx.mcphy",0xf05f,  64 * KiB);
-create_unimplemented_device("npcm7xx.gmac1",0xf0802000,   8 * KiB);
-create_unimplemented_device("npcm7xx.gmac2",0xf0804000,   8 * KiB);
 create_unimplemented_device("npcm7xx.vcd",  0xf081,  64 * KiB);
 create_unimplemented_device("npcm7xx.ece",  0xf082,   8 * KiB);
 create_unimplemented_device("npcm7xx.vdma", 0xf0822000,   8 * KiB);
diff --git a/include/hw/arm/npcm7xx.h b/include/hw/arm/npcm7xx.h
index 72c7722096..4e0d210188 100644
--- a/include/hw/arm/npcm7xx.h
+++ b/include/hw/arm/npcm7xx.h
@@ -29,6 +29,7 @@
 #include "hw/misc/npcm7xx_pwm.h"
 #include "hw/misc/npcm7xx_rng.h"
 #include "hw/net/npcm7xx_emc.h"
+#include "hw/net/npcm_gmac.h"
 #include "hw/nvram/npcm7xx_otp.h"
 #include "hw/timer/npcm7xx_timer.h"
 #include "hw/ssi/npcm7xx_fiu.h"
@@ -104,6 +105,7 @@ struct NPCM7xxState {
 OHCISysBusState ohci;
 NPCM7xxFIUState fiu[2];
 NPCM7xxEMCState emc[2];
+NPCMGMACState   gmac[2];
 NPCM7xxSDHCIState   mmc;
 NPCMPSPIState   pspi[2];
 };
-- 
2.43.0.429.g432eaa2c6b-goog

[PATCH v16 3/6] tests/qtest: Creating qtest for GMAC Module

From: Nabih Estefan Diaz 

 - Created qtest to check initialization of registers in GMAC Module.
 - Added the test to the build file.
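
The register-initialization check in this test is a sequence of CHECK_REG32() assertions on reset values. As a rough, QEMU-independent sketch of the same idea in table-driven form (not how the test itself is written), the following compares a table of expected defaults against a register read callback. All sketch_* names and the mock device are hypothetical; the three offsets and values are the ones the test checks for the first DMA registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One expected reset value: register offset and its default. */
struct sketch_reg_default {
    uint32_t offset;
    uint32_t value;
};

/* Read callback; in a qtest this role is played by a qtest_readl() wrapper. */
typedef uint32_t (*sketch_read_fn)(void *opaque, uint32_t offset);

/* Returns the index of the first mismatching entry, or -1 if all match. */
static int sketch_check_defaults(sketch_read_fn read_reg, void *opaque,
                                 const struct sketch_reg_default *tbl,
                                 size_t n)
{
    size_t i;

    assert(read_reg != NULL && tbl != NULL);
    for (i = 0; i < n; i++) {
        if (read_reg(opaque, tbl[i].offset) != tbl[i].value) {
            return (int)i;
        }
    }
    return -1;
}

/* Mock device: only DMA_BUS_MODE (0x1000) has a non-zero reset value. */
static uint32_t sketch_mock_read(void *opaque, uint32_t offset)
{
    (void)opaque;
    return offset == 0x1000 ? 0x00020100u : 0u;
}
```

A table keeps offsets and expected values side by side, at the cost of losing the per-line failure location that the macro-based assertions give.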

Change-Id: I8b2fe152d3987a7eec4cf6a1d25ba92e75a5391d
Signed-off-by: Nabih Estefan 
Reviewed-by: Tyrone Ting 
---
 tests/qtest/meson.build  |   1 +
 tests/qtest/npcm_gmac-test.c | 212 +++
 2 files changed, 213 insertions(+)
 create mode 100644 tests/qtest/npcm_gmac-test.c

diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 84a055a7d9..016cd77d20 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -230,6 +230,7 @@ qtests_aarch64 = \
   (config_all_devices.has_key('CONFIG_RASPI') ? ['bcm2835-dma-test'] : []) +  \
   (config_all_accel.has_key('CONFIG_TCG') and                                            \
    config_all_devices.has_key('CONFIG_TPM_TIS_I2C') ? ['tpm-tis-i2c-test'] : []) + \
+  (config_all_devices.has_key('CONFIG_NPCM7XX') ? qtests_npcm7xx : []) + \
   ['arm-cpu-features',
'numa-test',
'boot-serial-test',
diff --git a/tests/qtest/npcm_gmac-test.c b/tests/qtest/npcm_gmac-test.c
new file mode 100644
index 00..72c68874df
--- /dev/null
+++ b/tests/qtest/npcm_gmac-test.c
@@ -0,0 +1,212 @@
+/*
+ * QTests for Nuvoton NPCM7xx/8xx GMAC Modules.
+ *
+ * Copyright 2024 Google LLC
+ * Authors:
+ * Hao Wu 
+ * Nabih Estefan 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include "qemu/osdep.h"
+#include "libqos/libqos.h"
+
+/* Name of the GMAC Device */
+#define TYPE_NPCM_GMAC "npcm-gmac"
+
+typedef struct GMACModule {
+int irq;
+uint64_t base_addr;
+} GMACModule;
+
+typedef struct TestData {
+const GMACModule *module;
+} TestData;
+
+/* Values extracted from hw/arm/npcm8xx.c */
+static const GMACModule gmac_module_list[] = {
+{
+.irq= 14,
+.base_addr  = 0xf0802000
+},
+{
+.irq= 15,
+.base_addr  = 0xf0804000
+},
+{
+.irq= 16,
+.base_addr  = 0xf0806000
+},
+{
+.irq= 17,
+.base_addr  = 0xf0808000
+}
+};
+
+/* Returns the index of the GMAC module. */
+static int gmac_module_index(const GMACModule *mod)
+{
+ptrdiff_t diff = mod - gmac_module_list;
+
+g_assert_true(diff >= 0 && diff < ARRAY_SIZE(gmac_module_list));
+
+return diff;
+}
+
+/* 32-bit register indices. Taken from npcm_gmac.c */
+typedef enum NPCMRegister {
+/* DMA Registers */
+NPCM_DMA_BUS_MODE = 0x1000,
+NPCM_DMA_XMT_POLL_DEMAND = 0x1004,
+NPCM_DMA_RCV_POLL_DEMAND = 0x1008,
+NPCM_DMA_RCV_BASE_ADDR = 0x100c,
+NPCM_DMA_TX_BASE_ADDR = 0x1010,
+NPCM_DMA_STATUS = 0x1014,
+NPCM_DMA_CONTROL = 0x1018,
+NPCM_DMA_INTR_ENA = 0x101c,
+NPCM_DMA_MISSED_FRAME_CTR = 0x1020,
+NPCM_DMA_HOST_TX_DESC = 0x1048,
+NPCM_DMA_HOST_RX_DESC = 0x104c,
+NPCM_DMA_CUR_TX_BUF_ADDR = 0x1050,
+NPCM_DMA_CUR_RX_BUF_ADDR = 0x1054,
+NPCM_DMA_HW_FEATURE = 0x1058,
+
+/* GMAC Registers */
+NPCM_GMAC_MAC_CONFIG = 0x0,
+NPCM_GMAC_FRAME_FILTER = 0x4,
+NPCM_GMAC_HASH_HIGH = 0x8,
+NPCM_GMAC_HASH_LOW = 0xc,
+NPCM_GMAC_MII_ADDR = 0x10,
+NPCM_GMAC_MII_DATA = 0x14,
+NPCM_GMAC_FLOW_CTRL = 0x18,
+NPCM_GMAC_VLAN_FLAG = 0x1c,
+NPCM_GMAC_VERSION = 0x20,
+NPCM_GMAC_WAKEUP_FILTER = 0x28,
+NPCM_GMAC_PMT = 0x2c,
+NPCM_GMAC_LPI_CTRL = 0x30,
+NPCM_GMAC_TIMER_CTRL = 0x34,
+NPCM_GMAC_INT_STATUS = 0x38,
+NPCM_GMAC_INT_MASK = 0x3c,
+NPCM_GMAC_MAC0_ADDR_HI = 0x40,
+NPCM_GMAC_MAC0_ADDR_LO = 0x44,
+NPCM_GMAC_MAC1_ADDR_HI = 0x48,
+NPCM_GMAC_MAC1_ADDR_LO = 0x4c,
+NPCM_GMAC_MAC2_ADDR_HI = 0x50,
+NPCM_GMAC_MAC2_ADDR_LO = 0x54,
+NPCM_GMAC_MAC3_ADDR_HI = 0x58,
+NPCM_GMAC_MAC3_ADDR_LO = 0x5c,
+NPCM_GMAC_RGMII_STATUS = 0xd8,
+NPCM_GMAC_WATCHDOG = 0xdc,
+NPCM_GMAC_PTP_TCR = 0x700,
+NPCM_GMAC_PTP_SSIR = 0x704,
+NPCM_GMAC_PTP_STSR = 0x708,
+NPCM_GMAC_PTP_STNSR = 0x70c,
+NPCM_GMAC_PTP_STSUR = 0x710,
+NPCM_GMAC_PTP_STNSUR = 0x714,
+NPCM_GMAC_PTP_TAR = 0x718,
+NPCM_GMAC_PTP_TTSR = 0x71c,
+} NPCMRegister;
+
+static uint32_t gmac_read(QTestState *qts, const GMACModule *mod,
+  NPCMRegister regno)
+{
+return qtest_readl(qts, mod->base_addr + regno);
+}
+
+/* Check that GMAC registers are reset to default value */
+static void test_init(gconstpointer test_data)
+{
+const TestData *td = test_data;
+const GMACModule *mod = td->module;
+QTestState *qts = qtest_init("-mach

[PATCH v16 0/6] Implementation of NPI Mailbox and GMAC Networking Module

From: Nabih Estefan Diaz 

[Changes since v15]

Dropped PCI MBox patches. They were presenting a lot of problems with 
endianness and are not directly related to the GMAC. Breaking them apart to 
debug separately and let the GMAC itself be upstreamed faster.

[Changes since v14]
Expanded comment on chardev device and fixed comment formatting

[Changes since v13]
Added a couple clarifying comments and documentation about chardev
device expected protocol for ease of review.

[Changes since v12]
Fix errors found when testing in big-endian host.

[Changes since v11]
Branch couldn't be merged with master because of issues in patchset 6.
Fixed.

[Changes since v10]
Fixed macOS build issue. Changed imports to not be linux-specific.

[Changes since v9]
More cleanup and fixes based on suggestions from Peter Maydell
(peter.mayd...@linaro.org).

[Changes since v8]
Suggestions and Fixes from Peter Maydell (peter.mayd...@linaro.org),
also cleaned up changes so nothing is deleted in a later patch that was
added in an earlier patch. Patch count decreased by 1 because this cleanup
led to one of the patches being irrelevant.

[Changes since v7]
Fixed patch 4 declaration of new NIC based on comments by Peter Maydell
(peter.mayd...@linaro.org)

[Changes since v6]
Remove the Change-Ids from the commit messages.

[Changes since v5]
Restored some qtests whose removal seems to have been caused by a merge
conflict.

[Changes since v4]
Added Signed-off-by tag and fixed patch 4 commit message as suggested by
Peter Maydell (peter.mayd...@linaro.org)

[Changes since v3]
Fixed comments from Hao Wu (wuhao...@google.com)

[Changes since v2]
Fixed bugs related to the RC functionality of the GMAC. Added and
squashed patches related to that.

[Changes since v1]
Fixed some errors in formatting.
Fixed a merge error that I didn't see in v1.
Removed Nuvoton 8xx references since that is a separate patch set.

[Original Cover]
Creates NPI Mailbox Module with data verification for read and write
(internal and external), wiring to the Nuvoton SoC, and QTests.

Also creates the GMAC Networking Module. Implements read and write
functionalities with corresponding descriptors and registers. Also includes
QTests for the different functionalities.

Hao Wu (2):
  hw/net: Add NPCMXXX GMAC device
  hw/arm: Add GMAC devices to NPCM7XX SoC

Nabih Estefan Diaz (4):
  tests/qtest: Creating qtest for GMAC Module
  hw/net: GMAC Rx Implementation
  hw/net: GMAC Tx Implementation
  tests/qtest: Adding PCS Module test to GMAC Qtest

 hw/arm/npcm7xx.c |  37 +-
 hw/net/meson.build   |   2 +-
 hw/net/npcm_gmac.c   | 942 +++
 hw/net/trace-events  |  19 +
 include/hw/arm/npcm7xx.h |   2 +
 include/hw/net/npcm_gmac.h   | 343 +
 tests/qtest/meson.build  |   1 +
 tests/qtest/npcm_gmac-test.c | 344 +
 8 files changed, 1687 insertions(+), 3 deletions(-)
 create mode 100644 hw/net/npcm_gmac.c
 create mode 100644 include/hw/net/npcm_gmac.h
 create mode 100644 tests/qtest/npcm_gmac-test.c

-- 
2.43.0.429.g432eaa2c6b-goog

[PATCH v16 4/6] hw/net: GMAC Rx Implementation

From: Nabih Estefan Diaz 

- Implementation of Receive function for packets
- Implementation for reading and writing from and to descriptors in
  memory for Rx

When RX starts, we need to flush the queued packets so that they
can be received by the GMAC device. Without this, it won't work
with a TAP NIC device.

When the RX descriptor list is full, the device reports a DMA_STATUS for
software to handle. But there is no way for software to indicate that it
has handled all RX descriptors, so the whole pipeline stalls.

We do something similar to NPCM7XX EMC to handle this case.

1. Return packet size when RX descriptor is full, effectively dropping
these packets in such a case.
2. When software clears RX descriptor full bit, continue receiving
further packets by flushing QEMU packet queue.
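
The two steps above can be sketched in isolation. This is only a minimal model under stated assumptions, not the code in this patch: `RxDesc`, `RxRing`, `RX_OWN`, `rx_receive` and `rx_release` are hypothetical names, and the ring is a plain array rather than guest memory reached through DMA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>          /* ssize_t */

#define RX_OWN   (1u << 31)     /* hypothetical: 1 = device owned, 0 = software owned */
#define RING_LEN 4u

typedef struct RxDesc {
    uint32_t status;            /* carries RX_OWN */
    uint32_t length;            /* bytes stored by the device */
} RxDesc;

typedef struct RxRing {
    RxDesc desc[RING_LEN];
    unsigned next;              /* next descriptor the device will try to use */
    bool unavailable;           /* models the "descriptor list full" status bit */
    unsigned dropped;           /* packets dropped while the ring was full */
} RxRing;

/*
 * Step 1: always report the whole packet as consumed. When the next
 * descriptor is still owned by software, the packet is dropped instead
 * of being left in the queue, so the sender never stalls.
 */
ssize_t rx_receive(RxRing *r, size_t len)
{
    RxDesc *d = &r->desc[r->next];

    if (!(d->status & RX_OWN)) {
        r->unavailable = true;
        r->dropped++;
        return (ssize_t)len;
    }
    d->length = (uint32_t)len;
    d->status &= ~RX_OWN;       /* hand the filled descriptor to software */
    r->next = (r->next + 1) % RING_LEN;
    return (ssize_t)len;
}

/*
 * Step 2: software returns a descriptor and clears the "full" condition.
 * The return value tells the caller that reception may resume, which is
 * the point where a device model would flush its queued packets.
 */
bool rx_release(RxRing *r, unsigned idx)
{
    bool resume = r->unavailable;

    assert(idx < RING_LEN);
    r->desc[idx].status |= RX_OWN;
    r->unavailable = false;
    return resume;
}
```

In this model a caller would fill the ring, observe one dropped packet, then call `rx_release()` and see reception continue with the returned descriptor.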

Added relevant trace-events

Change-Id: I132aa254a94cda1a586aba2ea33bbfc74ecdb831
Signed-off-by: Hao Wu 
Signed-off-by: Nabih Estefan 
Reviewed-by: Tyrone Ting 
---
 hw/net/npcm_gmac.c  | 276 +++-
 hw/net/trace-events |   5 +
 2 files changed, 279 insertions(+), 2 deletions(-)

diff --git a/hw/net/npcm_gmac.c b/hw/net/npcm_gmac.c
index 7118b4c7c7..a3c626e1b8 100644
--- a/hw/net/npcm_gmac.c
+++ b/hw/net/npcm_gmac.c
@@ -27,6 +27,10 @@
 #include "hw/net/mii.h"
 #include "hw/net/npcm_gmac.h"
 #include "migration/vmstate.h"
+#include "net/checksum.h"
+#include "net/eth.h"
+#include "net/net.h"
+#include "qemu/cutils.h"
 #include "qemu/log.h"
 #include "qemu/units.h"
 #include "sysemu/dma.h"
@@ -149,6 +153,17 @@ static void gmac_phy_set_link(NPCMGMACState *gmac, bool active)
 
 static bool gmac_can_receive(NetClientState *nc)
 {
+NPCMGMACState *gmac = NPCM_GMAC(qemu_get_nic_opaque(nc));
+
+/* If GMAC receive is disabled. */
+if (!(gmac->regs[R_NPCM_GMAC_MAC_CONFIG] & NPCM_GMAC_MAC_CONFIG_RX_EN)) {
+return false;
+}
+
+/* If GMAC DMA RX is stopped. */
+if (!(gmac->regs[R_NPCM_DMA_CONTROL] & NPCM_DMA_CONTROL_START_STOP_RX)) {
+return false;
+}
 return true;
 }
 
@@ -192,12 +207,258 @@ static void gmac_update_irq(NPCMGMACState *gmac)
 qemu_set_irq(gmac->irq, level);
 }
 
-static ssize_t gmac_receive(NetClientState *nc, const uint8_t *buf, size_t len)
+static int gmac_read_rx_desc(dma_addr_t addr, struct NPCMGMACRxDesc *desc)
+{
+if (dma_memory_read(&address_space_memory, addr, desc,
+sizeof(*desc), MEMTXATTRS_UNSPECIFIED)) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to read descriptor @ 0x%"
+  HWADDR_PRIx "\n", __func__, addr);
+return -1;
+}
+desc->rdes0 = le32_to_cpu(desc->rdes0);
+desc->rdes1 = le32_to_cpu(desc->rdes1);
+desc->rdes2 = le32_to_cpu(desc->rdes2);
+desc->rdes3 = le32_to_cpu(desc->rdes3);
+return 0;
+}
+
+static int gmac_write_rx_desc(dma_addr_t addr, struct NPCMGMACRxDesc *desc)
 {
-/* Placeholder. Function will be filled in following patches */
+struct NPCMGMACRxDesc le_desc;
+le_desc.rdes0 = cpu_to_le32(desc->rdes0);
+le_desc.rdes1 = cpu_to_le32(desc->rdes1);
+le_desc.rdes2 = cpu_to_le32(desc->rdes2);
+le_desc.rdes3 = cpu_to_le32(desc->rdes3);
+if (dma_memory_write(&address_space_memory, addr, &le_desc,
+sizeof(le_desc), MEMTXATTRS_UNSPECIFIED)) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to write descriptor @ 0x%"
+  HWADDR_PRIx "\n", __func__, addr);
+return -1;
+}
 return 0;
 }
 
+static int gmac_rx_transfer_frame_to_buffer(uint32_t rx_buf_len,
+uint32_t *left_frame,
+uint32_t rx_buf_addr,
+bool *eof_transferred,
+const uint8_t **frame_ptr,
+uint16_t *transferred)
+{
+uint32_t to_transfer;
+/*
+ * Check whether the buffer can hold what is left of the frame.
+ * If it can, transfer only what's left of the frame.
+ * Otherwise, fill the buffer with as much of the frame as fits.
+ */
+if (rx_buf_len >= *left_frame) {
+to_transfer = *left_frame;
+*eof_transferred = true;
+} else {
+to_transfer = rx_buf_len;
+}
+
+/* write frame part to memory */
+if (dma_memory_write(&address_space_memory, (uint64_t) rx_buf_addr,
+ *frame_ptr, to_transfer, MEMTXATTRS_UNSPECIFIED)) {
+return -1;
+}
+
+/* update frame pointer and size of what's left of the frame */
+*frame_ptr += to_transfer;
+*left_frame -= to_transfer;
+*transferred += to_transfer;
+
+return 0;
+}
+
+static void gmac_dma_set_state(NPCMGMACState *gmac, int shift, uint32_t state)
+{
+gmac->regs[R_NPCM_DMA_STATUS] = deposit32(gmac->regs[R_NPCM_DMA_STATUS],
+shift, 3, state);
+}
+
+static ssize_t gmac_receive(NetClientState *nc, const uint8_t *buf, size_t len)
+{
+/*
+ * Commen

[PATCH v16 5/6] hw/net: GMAC Tx Implementation

From: Nabih Estefan Diaz 

- Implementation of Transmit function for packets
- Implementation for reading and writing from and to descriptors in
  memory for Tx
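
As a rough illustration of why descriptor access needs byte-order handling: the patch converts each descriptor word with `le32_to_cpu()` and `cpu_to_le32()`. The sketch below shows the same idea with plain byte shifts so that it compiles on its own. `TxDesc`, `put_le32`, `get_le32`, `tx_desc_to_guest` and `tx_desc_from_guest` are hypothetical names used only for this example.

```c
#include <stdint.h>

/* Hypothetical mirror of a four-word TX descriptor, held in host byte order. */
typedef struct TxDesc {
    uint32_t tdes0, tdes1, tdes2, tdes3;
} TxDesc;

/* Store v at p as four little-endian bytes, whatever the host byte order. */
void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Load four little-endian bytes from p into a host-order value. */
uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Serialize a descriptor into the 16-byte layout the guest sees in memory. */
void tx_desc_to_guest(uint8_t out[16], const TxDesc *d)
{
    put_le32(out + 0, d->tdes0);
    put_le32(out + 4, d->tdes1);
    put_le32(out + 8, d->tdes2);
    put_le32(out + 12, d->tdes3);
}

/* Parse the guest's 16-byte layout back into host-order words. */
void tx_desc_from_guest(TxDesc *d, const uint8_t in[16])
{
    d->tdes0 = get_le32(in + 0);
    d->tdes1 = get_le32(in + 4);
    d->tdes2 = get_le32(in + 8);
    d->tdes3 = get_le32(in + 12);
}
```

Because the conversion is expressed in terms of individual bytes, the round trip gives the same result on little-endian and big-endian hosts.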

Added relevant trace-events

NOTE: This function implements the steps detailed in the datasheet for
transmitting messages from the GMAC.

Change-Id: Icf14f9fcc6cc7808a41acd872bca67c9832087e6
Signed-off-by: Nabih Estefan 
Reviewed-by: Tyrone Ting 
---
 hw/net/npcm_gmac.c  | 203 
 hw/net/trace-events |   2 +
 2 files changed, 205 insertions(+)

diff --git a/hw/net/npcm_gmac.c b/hw/net/npcm_gmac.c
index a3c626e1b8..1b71e2526e 100644
--- a/hw/net/npcm_gmac.c
+++ b/hw/net/npcm_gmac.c
@@ -238,6 +238,37 @@ static int gmac_write_rx_desc(dma_addr_t addr, struct NPCMGMACRxDesc *desc)
 return 0;
 }
 
+static int gmac_read_tx_desc(dma_addr_t addr, struct NPCMGMACTxDesc *desc)
+{
+if (dma_memory_read(&address_space_memory, addr, desc,
+sizeof(*desc), MEMTXATTRS_UNSPECIFIED)) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to read descriptor @ 0x%"
+  HWADDR_PRIx "\n", __func__, addr);
+return -1;
+}
+desc->tdes0 = le32_to_cpu(desc->tdes0);
+desc->tdes1 = le32_to_cpu(desc->tdes1);
+desc->tdes2 = le32_to_cpu(desc->tdes2);
+desc->tdes3 = le32_to_cpu(desc->tdes3);
+return 0;
+}
+
+static int gmac_write_tx_desc(dma_addr_t addr, struct NPCMGMACTxDesc *desc)
+{
+struct NPCMGMACTxDesc le_desc;
+le_desc.tdes0 = cpu_to_le32(desc->tdes0);
+le_desc.tdes1 = cpu_to_le32(desc->tdes1);
+le_desc.tdes2 = cpu_to_le32(desc->tdes2);
+le_desc.tdes3 = cpu_to_le32(desc->tdes3);
+if (dma_memory_write(&address_space_memory, addr, &le_desc,
+sizeof(le_desc), MEMTXATTRS_UNSPECIFIED)) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to write descriptor @ 0x%"
+  HWADDR_PRIx "\n", __func__, addr);
+return -1;
+}
+return 0;
+}
+
 static int gmac_rx_transfer_frame_to_buffer(uint32_t rx_buf_len,
 uint32_t *left_frame,
 uint32_t rx_buf_addr,
@@ -459,6 +490,155 @@ static ssize_t gmac_receive(NetClientState *nc, const uint8_t *buf, size_t len)
 return len;
 }
 
+static int gmac_tx_get_csum(uint32_t tdes1)
+{
+uint32_t mask = TX_DESC_TDES1_CHKSM_INS_CTRL_MASK(tdes1);
+int csum = 0;
+
+if (likely(mask > 0)) {
+csum |= CSUM_IP;
+}
+if (likely(mask > 1)) {
+csum |= CSUM_TCP | CSUM_UDP;
+}
+
+return csum;
+}
+
+static void gmac_try_send_next_packet(NPCMGMACState *gmac)
+{
+/*
+ * Comments about steps refer to steps for
+ * transmitting in page 384 of datasheet
+ */
+uint16_t tx_buffer_size = 2048;
+g_autofree uint8_t *tx_send_buffer = g_malloc(tx_buffer_size);
+uint32_t desc_addr;
+struct NPCMGMACTxDesc tx_desc;
+uint32_t tx_buf_addr, tx_buf_len;
+uint16_t length = 0;
+uint8_t *buf = tx_send_buffer;
+uint32_t prev_buf_size = 0;
+int csum = 0;
+
+/* steps 1&2 */
+if (!gmac->regs[R_NPCM_DMA_HOST_TX_DESC]) {
+gmac->regs[R_NPCM_DMA_HOST_TX_DESC] =
+NPCM_DMA_HOST_TX_DESC_MASK(gmac->regs[R_NPCM_DMA_TX_BASE_ADDR]);
+}
+desc_addr = gmac->regs[R_NPCM_DMA_HOST_TX_DESC];
+
+while (true) {
+gmac_dma_set_state(gmac, NPCM_DMA_STATUS_TX_PROCESS_STATE_SHIFT,
+NPCM_DMA_STATUS_TX_RUNNING_FETCHING_STATE);
+if (gmac_read_tx_desc(desc_addr, &tx_desc)) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "TX Descriptor @ 0x%x can't be read\n",
+  desc_addr);
+return;
+}
+/* step 3 */
+
+trace_npcm_gmac_packet_desc_read(DEVICE(gmac)->canonical_path,
+desc_addr);
+trace_npcm_gmac_debug_desc_data(DEVICE(gmac)->canonical_path, &tx_desc,
+tx_desc.tdes0, tx_desc.tdes1, tx_desc.tdes2, tx_desc.tdes3);
+
+/* 1 = DMA Owned, 0 = Software Owned */
+if (!(tx_desc.tdes0 & TX_DESC_TDES0_OWN)) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "TX Descriptor @ 0x%x is owned by software\n",
+  desc_addr);
+gmac->regs[R_NPCM_DMA_STATUS] |= NPCM_DMA_STATUS_TU;
+gmac_dma_set_state(gmac, NPCM_DMA_STATUS_TX_PROCESS_STATE_SHIFT,
+NPCM_DMA_STATUS_TX_SUSPENDED_STATE);
+gmac_update_irq(gmac);
+return;
+}
+
+gmac_dma_set_state(gmac, NPCM_DMA_STATUS_TX_PROCESS_STATE_SHIFT,
+NPCM_DMA_STATUS_TX_RUNNING_READ_STATE);
+/* Give the descriptor back regardless of what happens. */
+tx_desc.tdes0 &= ~TX_DESC_TDES0_OWN;
+
+if (tx_desc.tdes1 & TX_DESC_TDES1_FIRST_SEG_MASK) {
+csum = gmac_tx_get_csum(tx_desc.tdes1);
+}
+
+/* step 4 */
+tx_buf_addr = tx_desc.tdes2;
+

[PATCH v16 2/6] hw/arm: Add GMAC devices to NPCM7XX SoC

From: Hao Wu 

Change-Id: Id8a3461fb5042adc4c3fd6f4fbd1ca0d33e22565
Signed-off-by: Hao Wu 
Signed-off-by: Nabih Estefan 
Reviewed-by: Tyrone Ting 
---
 hw/arm/npcm7xx.c | 37 +++--
 include/hw/arm/npcm7xx.h |  2 ++
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/hw/arm/npcm7xx.c b/hw/arm/npcm7xx.c
index e3243a520d..3b29206265 100644
--- a/hw/arm/npcm7xx.c
+++ b/hw/arm/npcm7xx.c
@@ -84,8 +84,10 @@ enum NPCM7xxInterrupt {
 NPCM7XX_UART1_IRQ,
 NPCM7XX_UART2_IRQ,
 NPCM7XX_UART3_IRQ,
+NPCM7XX_GMAC1_IRQ   = 14,
 NPCM7XX_EMC1RX_IRQ  = 15,
 NPCM7XX_EMC1TX_IRQ,
+NPCM7XX_GMAC2_IRQ,
 NPCM7XX_MMC_IRQ = 26,
 NPCM7XX_PSPI2_IRQ   = 28,
 NPCM7XX_PSPI1_IRQ   = 31,
@@ -229,6 +231,12 @@ static const hwaddr npcm7xx_pspi_addr[] = {
 0xf0201000,
 };
 
+/* Register base address for each GMAC Module */
+static const hwaddr npcm7xx_gmac_addr[] = {
+0xf0802000,
+0xf0804000,
+};
+
 static const struct {
 hwaddr regs_addr;
 uint32_t unconnected_pins;
@@ -456,6 +464,10 @@ static void npcm7xx_init(Object *obj)
 for (i = 0; i < ARRAY_SIZE(s->pspi); i++) {
 object_initialize_child(obj, "pspi[*]", &s->pspi[i], TYPE_NPCM_PSPI);
 }
+
+for (i = 0; i < ARRAY_SIZE(s->gmac); i++) {
+object_initialize_child(obj, "gmac[*]", &s->gmac[i], TYPE_NPCM_GMAC);
+}
 
 object_initialize_child(obj, "mmc", &s->mmc, TYPE_NPCM7XX_SDHCI);
 }
@@ -688,6 +700,29 @@ static void npcm7xx_realize(DeviceState *dev, Error **errp)
 sysbus_connect_irq(sbd, 1, npcm7xx_irq(s, rx_irq));
 }
 
+/*
+ * GMAC Modules. Cannot fail.
+ */
+QEMU_BUILD_BUG_ON(ARRAY_SIZE(npcm7xx_gmac_addr) != ARRAY_SIZE(s->gmac));
+QEMU_BUILD_BUG_ON(ARRAY_SIZE(s->gmac) != 2);
+for (i = 0; i < ARRAY_SIZE(s->gmac); i++) {
+SysBusDevice *sbd = SYS_BUS_DEVICE(&s->gmac[i]);
+
+/*
+ * The device exists regardless of whether it's connected to a QEMU
+ * netdev backend. So always instantiate it even if there is no
+ * backend.
+ */
+sysbus_realize(sbd, &error_abort);
+sysbus_mmio_map(sbd, 0, npcm7xx_gmac_addr[i]);
+int irq = i == 0 ? NPCM7XX_GMAC1_IRQ : NPCM7XX_GMAC2_IRQ;
+/*
+ * N.B. The values for the second argument sysbus_connect_irq are
+ * chosen to match the registration order in npcm7xx_emc_realize.
+ */
+sysbus_connect_irq(sbd, 0, npcm7xx_irq(s, irq));
+}
+
 /*
  * Flash Interface Unit (FIU). Can fail if incorrect number of chip selects
  * specified, but this is a programming error.
@@ -750,8 +785,6 @@ static void npcm7xx_realize(DeviceState *dev, Error **errp)
 create_unimplemented_device("npcm7xx.siox[2]",  0xf0102000,   4 * KiB);
 create_unimplemented_device("npcm7xx.ahbpci",   0xf040,   1 * MiB);
 create_unimplemented_device("npcm7xx.mcphy",0xf05f,  64 * KiB);
-create_unimplemented_device("npcm7xx.gmac1",0xf0802000,   8 * KiB);
-create_unimplemented_device("npcm7xx.gmac2",0xf0804000,   8 * KiB);
 create_unimplemented_device("npcm7xx.vcd",  0xf081,  64 * KiB);
 create_unimplemented_device("npcm7xx.ece",  0xf082,   8 * KiB);
 create_unimplemented_device("npcm7xx.vdma", 0xf0822000,   8 * KiB);
diff --git a/include/hw/arm/npcm7xx.h b/include/hw/arm/npcm7xx.h
index 72c7722096..4e0d210188 100644
--- a/include/hw/arm/npcm7xx.h
+++ b/include/hw/arm/npcm7xx.h
@@ -29,6 +29,7 @@
 #include "hw/misc/npcm7xx_pwm.h"
 #include "hw/misc/npcm7xx_rng.h"
 #include "hw/net/npcm7xx_emc.h"
+#include "hw/net/npcm_gmac.h"
 #include "hw/nvram/npcm7xx_otp.h"
 #include "hw/timer/npcm7xx_timer.h"
 #include "hw/ssi/npcm7xx_fiu.h"
@@ -104,6 +105,7 @@ struct NPCM7xxState {
 OHCISysBusState ohci;
 NPCM7xxFIUState fiu[2];
 NPCM7xxEMCState emc[2];
+NPCMGMACState   gmac[2];
 NPCM7xxSDHCIState   mmc;
 NPCMPSPIState   pspi[2];
 };
-- 
2.43.0.429.g432eaa2c6b-goog

[PATCH v16 3/6] tests/qtest: Creating qtest for GMAC Module

From: Nabih Estefan Diaz 

 - Created qtest to check initialization of registers in GMAC Module.
 - Added the test to the build file.

Change-Id: I8b2fe152d3987a7eec4cf6a1d25ba92e75a5391d
Signed-off-by: Nabih Estefan 
Reviewed-by: Tyrone Ting 
---
 tests/qtest/meson.build  |   1 +
 tests/qtest/npcm_gmac-test.c | 212 +++
 2 files changed, 213 insertions(+)
 create mode 100644 tests/qtest/npcm_gmac-test.c

diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 84a055a7d9..016cd77d20 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -230,6 +230,7 @@ qtests_aarch64 = \
   (config_all_devices.has_key('CONFIG_RASPI') ? ['bcm2835-dma-test'] : []) +  \
  (config_all_accel.has_key('CONFIG_TCG') and \
   config_all_devices.has_key('CONFIG_TPM_TIS_I2C') ? ['tpm-tis-i2c-test'] : []) + \
+  (config_all_devices.has_key('CONFIG_NPCM7XX') ? qtests_npcm7xx : []) + \
   ['arm-cpu-features',
'numa-test',
'boot-serial-test',
diff --git a/tests/qtest/npcm_gmac-test.c b/tests/qtest/npcm_gmac-test.c
new file mode 100644
index 00..72c68874df
--- /dev/null
+++ b/tests/qtest/npcm_gmac-test.c
@@ -0,0 +1,212 @@
+/*
+ * QTests for Nuvoton NPCM7xx/8xx GMAC Modules.
+ *
+ * Copyright 2024 Google LLC
+ * Authors:
+ * Hao Wu 
+ * Nabih Estefan 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include "qemu/osdep.h"
+#include "libqos/libqos.h"
+
+/* Name of the GMAC Device */
+#define TYPE_NPCM_GMAC "npcm-gmac"
+
+typedef struct GMACModule {
+int irq;
+uint64_t base_addr;
+} GMACModule;
+
+typedef struct TestData {
+const GMACModule *module;
+} TestData;
+
+/* Values extracted from hw/arm/npcm8xx.c */
+static const GMACModule gmac_module_list[] = {
+{
+.irq= 14,
+.base_addr  = 0xf0802000
+},
+{
+.irq= 15,
+.base_addr  = 0xf0804000
+},
+{
+.irq= 16,
+.base_addr  = 0xf0806000
+},
+{
+.irq= 17,
+.base_addr  = 0xf0808000
+}
+};
+
+/* Returns the index of the GMAC module. */
+static int gmac_module_index(const GMACModule *mod)
+{
+ptrdiff_t diff = mod - gmac_module_list;
+
+g_assert_true(diff >= 0 && diff < ARRAY_SIZE(gmac_module_list));
+
+return diff;
+}
+
+/* 32-bit register indices. Taken from npcm_gmac.c */
+typedef enum NPCMRegister {
+/* DMA Registers */
+NPCM_DMA_BUS_MODE = 0x1000,
+NPCM_DMA_XMT_POLL_DEMAND = 0x1004,
+NPCM_DMA_RCV_POLL_DEMAND = 0x1008,
+NPCM_DMA_RCV_BASE_ADDR = 0x100c,
+NPCM_DMA_TX_BASE_ADDR = 0x1010,
+NPCM_DMA_STATUS = 0x1014,
+NPCM_DMA_CONTROL = 0x1018,
+NPCM_DMA_INTR_ENA = 0x101c,
+NPCM_DMA_MISSED_FRAME_CTR = 0x1020,
+NPCM_DMA_HOST_TX_DESC = 0x1048,
+NPCM_DMA_HOST_RX_DESC = 0x104c,
+NPCM_DMA_CUR_TX_BUF_ADDR = 0x1050,
+NPCM_DMA_CUR_RX_BUF_ADDR = 0x1054,
+NPCM_DMA_HW_FEATURE = 0x1058,
+
+/* GMAC Registers */
+NPCM_GMAC_MAC_CONFIG = 0x0,
+NPCM_GMAC_FRAME_FILTER = 0x4,
+NPCM_GMAC_HASH_HIGH = 0x8,
+NPCM_GMAC_HASH_LOW = 0xc,
+NPCM_GMAC_MII_ADDR = 0x10,
+NPCM_GMAC_MII_DATA = 0x14,
+NPCM_GMAC_FLOW_CTRL = 0x18,
+NPCM_GMAC_VLAN_FLAG = 0x1c,
+NPCM_GMAC_VERSION = 0x20,
+NPCM_GMAC_WAKEUP_FILTER = 0x28,
+NPCM_GMAC_PMT = 0x2c,
+NPCM_GMAC_LPI_CTRL = 0x30,
+NPCM_GMAC_TIMER_CTRL = 0x34,
+NPCM_GMAC_INT_STATUS = 0x38,
+NPCM_GMAC_INT_MASK = 0x3c,
+NPCM_GMAC_MAC0_ADDR_HI = 0x40,
+NPCM_GMAC_MAC0_ADDR_LO = 0x44,
+NPCM_GMAC_MAC1_ADDR_HI = 0x48,
+NPCM_GMAC_MAC1_ADDR_LO = 0x4c,
+NPCM_GMAC_MAC2_ADDR_HI = 0x50,
+NPCM_GMAC_MAC2_ADDR_LO = 0x54,
+NPCM_GMAC_MAC3_ADDR_HI = 0x58,
+NPCM_GMAC_MAC3_ADDR_LO = 0x5c,
+NPCM_GMAC_RGMII_STATUS = 0xd8,
+NPCM_GMAC_WATCHDOG = 0xdc,
+NPCM_GMAC_PTP_TCR = 0x700,
+NPCM_GMAC_PTP_SSIR = 0x704,
+NPCM_GMAC_PTP_STSR = 0x708,
+NPCM_GMAC_PTP_STNSR = 0x70c,
+NPCM_GMAC_PTP_STSUR = 0x710,
+NPCM_GMAC_PTP_STNSUR = 0x714,
+NPCM_GMAC_PTP_TAR = 0x718,
+NPCM_GMAC_PTP_TTSR = 0x71c,
+} NPCMRegister;
+
+static uint32_t gmac_read(QTestState *qts, const GMACModule *mod,
+  NPCMRegister regno)
+{
+return qtest_readl(qts, mod->base_addr + regno);
+}
+
+/* Check that GMAC registers are reset to default value */
+static void test_init(gconstpointer test_data)
+{
+const TestData *td = test_data;
+const GMACModule *mod = td->module;
+QTestState *qts = qtest_init("-mach

[PATCH v16 1/6] hw/net: Add NPCMXXX GMAC device

From: Hao Wu 

This patch implements the basic registers of GMAC device and sets
registers for networking functionalities.
Squashed IRQ Implementation patch into this one for compilation.
Tested:
The following message shows up with the change:
Broadcom BCM54612E stmmac-0:00: attached PHY driver [Broadcom BCM54612E] (mii_bus:phy_addr=stmmac-0:00, irq=POLL)
stmmaceth f0802000.eth eth0: Link is Up - 1Gbps/Full - flow control rx/tx

Change-Id: If71c6d486b95edcccba109ba454870714d7e0940
Signed-off-by: Hao Wu 
Signed-off-by: Nabih Estefan Diaz 
Reviewed-by: Tyrone Ting 
---
 hw/net/meson.build |   2 +-
 hw/net/npcm_gmac.c | 467 +
 hw/net/trace-events|  12 +
 include/hw/net/npcm_gmac.h | 343 +++
 4 files changed, 823 insertions(+), 1 deletion(-)
 create mode 100644 hw/net/npcm_gmac.c
 create mode 100644 include/hw/net/npcm_gmac.h

diff --git a/hw/net/meson.build b/hw/net/meson.build
index 9afceb0619..d4e1dc9838 100644
--- a/hw/net/meson.build
+++ b/hw/net/meson.build
@@ -38,7 +38,7 @@ system_ss.add(when: 'CONFIG_I82596_COMMON', if_true: files('i82596.c'))
 system_ss.add(when: 'CONFIG_SUNHME', if_true: files('sunhme.c'))
 system_ss.add(when: 'CONFIG_FTGMAC100', if_true: files('ftgmac100.c'))
 system_ss.add(when: 'CONFIG_SUNGEM', if_true: files('sungem.c'))
-system_ss.add(when: 'CONFIG_NPCM7XX', if_true: files('npcm7xx_emc.c'))
+system_ss.add(when: 'CONFIG_NPCM7XX', if_true: files('npcm7xx_emc.c', 'npcm_gmac.c'))
 
 system_ss.add(when: 'CONFIG_ETRAXFS', if_true: files('etraxfs_eth.c'))
 system_ss.add(when: 'CONFIG_COLDFIRE', if_true: files('mcf_fec.c'))
diff --git a/hw/net/npcm_gmac.c b/hw/net/npcm_gmac.c
new file mode 100644
index 00..7118b4c7c7
--- /dev/null
+++ b/hw/net/npcm_gmac.c
@@ -0,0 +1,467 @@
+/*
+ * Nuvoton NPCM7xx/8xx GMAC Module
+ *
+ * Copyright 2024 Google LLC
+ * Authors:
+ * Hao Wu 
+ * Nabih Estefan 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * Unsupported/unimplemented features:
+ * - MII is not implemented, MII_ADDR.BUSY and MII_DATA always return zero
+ * - Precision timestamp (PTP) is not implemented.
+ */
+
+#include "qemu/osdep.h"
+
+#include "hw/registerfields.h"
+#include "hw/net/mii.h"
+#include "hw/net/npcm_gmac.h"
+#include "migration/vmstate.h"
+#include "qemu/log.h"
+#include "qemu/units.h"
+#include "sysemu/dma.h"
+#include "trace.h"
+
+REG32(NPCM_DMA_BUS_MODE, 0x1000)
+REG32(NPCM_DMA_XMT_POLL_DEMAND, 0x1004)
+REG32(NPCM_DMA_RCV_POLL_DEMAND, 0x1008)
+REG32(NPCM_DMA_RX_BASE_ADDR, 0x100c)
+REG32(NPCM_DMA_TX_BASE_ADDR, 0x1010)
+REG32(NPCM_DMA_STATUS, 0x1014)
+REG32(NPCM_DMA_CONTROL, 0x1018)
+REG32(NPCM_DMA_INTR_ENA, 0x101c)
+REG32(NPCM_DMA_MISSED_FRAME_CTR, 0x1020)
+REG32(NPCM_DMA_HOST_TX_DESC, 0x1048)
+REG32(NPCM_DMA_HOST_RX_DESC, 0x104c)
+REG32(NPCM_DMA_CUR_TX_BUF_ADDR, 0x1050)
+REG32(NPCM_DMA_CUR_RX_BUF_ADDR, 0x1054)
+REG32(NPCM_DMA_HW_FEATURE, 0x1058)
+
+REG32(NPCM_GMAC_MAC_CONFIG, 0x0)
+REG32(NPCM_GMAC_FRAME_FILTER, 0x4)
+REG32(NPCM_GMAC_HASH_HIGH, 0x8)
+REG32(NPCM_GMAC_HASH_LOW, 0xc)
+REG32(NPCM_GMAC_MII_ADDR, 0x10)
+REG32(NPCM_GMAC_MII_DATA, 0x14)
+REG32(NPCM_GMAC_FLOW_CTRL, 0x18)
+REG32(NPCM_GMAC_VLAN_FLAG, 0x1c)
+REG32(NPCM_GMAC_VERSION, 0x20)
+REG32(NPCM_GMAC_WAKEUP_FILTER, 0x28)
+REG32(NPCM_GMAC_PMT, 0x2c)
+REG32(NPCM_GMAC_LPI_CTRL, 0x30)
+REG32(NPCM_GMAC_TIMER_CTRL, 0x34)
+REG32(NPCM_GMAC_INT_STATUS, 0x38)
+REG32(NPCM_GMAC_INT_MASK, 0x3c)
+REG32(NPCM_GMAC_MAC0_ADDR_HI, 0x40)
+REG32(NPCM_GMAC_MAC0_ADDR_LO, 0x44)
+REG32(NPCM_GMAC_MAC1_ADDR_HI, 0x48)
+REG32(NPCM_GMAC_MAC1_ADDR_LO, 0x4c)
+REG32(NPCM_GMAC_MAC2_ADDR_HI, 0x50)
+REG32(NPCM_GMAC_MAC2_ADDR_LO, 0x54)
+REG32(NPCM_GMAC_MAC3_ADDR_HI, 0x58)
+REG32(NPCM_GMAC_MAC3_ADDR_LO, 0x5c)
+REG32(NPCM_GMAC_RGMII_STATUS, 0xd8)
+REG32(NPCM_GMAC_WATCHDOG, 0xdc)
+REG32(NPCM_GMAC_PTP_TCR, 0x700)
+REG32(NPCM_GMAC_PTP_SSIR, 0x704)
+REG32(NPCM_GMAC_PTP_STSR, 0x708)
+REG32(NPCM_GMAC_PTP_STNSR, 0x70c)
+REG32(NPCM_GMAC_PTP_STSUR, 0x710)
+REG32(NPCM_GMAC_PTP_STNSUR, 0x714)
+REG32(NPCM_GMAC_PTP_TAR, 0x718)
+REG32(NPCM_GMAC_PTP_TTSR, 0x71c)
+
+/* Register Fields */
+#define NPCM_GMAC_MII_ADDR_BUSY BIT(0)
+#define NPCM_GMAC_MII_ADDR_WRITEBIT(1)
+#define NPCM_GMAC_MII_ADDR_GR(rv)   extract16((rv), 6, 5)
+#define NPCM_GMAC_MII_ADDR_PA(rv)   extract16((rv), 11, 5)
+
+#define NPCM_GMAC_INT_MASK_LPIIMBIT(10)
+#define NPCM_GMAC_INT_MASK_PMTM BIT(3)
+#define NPCM_GMAC_INT_MASK_RGIM BIT(0)
+
+#define NPCM_DMA_BUS_MODE_SWR

[PATCH] linux-user: Make TARGET_NR_setgroups affect only the current thread

2024-01-30 Thread Ilya Leoshkevich

Like TARGET_NR_setuid, TARGET_NR_setgroups should affect only the
calling thread, and not the entire process. Therefore, implement it
using a syscall, and not a libc call.
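
To illustrate the difference: glibc's setgroups() wrapper applies the change to every thread in the process, whereas the kernel system call changes only the calling thread. The following is a minimal, Linux-only sketch of calling the system call directly. `raw_setgroups` is a hypothetical name, and the `SYS_setgroups32` check mirrors the `__NR_setgroups32` fallback in the patch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for the syscall() declaration */
#endif
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Set the supplementary group list of the calling thread only.
 * Returns 0 on success, or -1 with errno set (for example EPERM when
 * the caller lacks the privilege to change its groups).
 */
long raw_setgroups(size_t size, const gid_t *list)
{
#ifdef SYS_setgroups32
    /* 32-bit ABIs where the legacy call takes 16-bit group IDs. */
    return syscall(SYS_setgroups32, size, list);
#else
    return syscall(SYS_setgroups, size, list);
#endif
}
```

An unprivileged caller would normally see -1 with errno set to EPERM, so the return value should always be checked.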

Cc: qemu-sta...@nongnu.org
Fixes: 19b84f3c35d7 ("added setgroups and getgroups syscalls")
Signed-off-by: Ilya Leoshkevich 
---
 linux-user/syscall.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index ff245dade51..da15d727e16 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7203,11 +7203,17 @@ static inline int tswapid(int id)
 #else
 #define __NR_sys_setresgid __NR_setresgid
 #endif
+#ifdef __NR_setgroups32
+#define __NR_sys_setgroups __NR_setgroups32
+#else
+#define __NR_sys_setgroups __NR_setgroups
+#endif
 
 _syscall1(int, sys_setuid, uid_t, uid)
 _syscall1(int, sys_setgid, gid_t, gid)
 _syscall3(int, sys_setresuid, uid_t, ruid, uid_t, euid, uid_t, suid)
 _syscall3(int, sys_setresgid, gid_t, rgid, gid_t, egid, gid_t, sgid)
+_syscall2(int, sys_setgroups, int, size, gid_t *, grouplist)
 
 void syscall_init(void)
 {
@@ -11772,7 +11778,7 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
 unlock_user(target_grouplist, arg2,
 gidsetsize * sizeof(target_id));
 }
-return get_errno(setgroups(gidsetsize, grouplist));
+return get_errno(sys_setgroups(gidsetsize, grouplist));
 }
 case TARGET_NR_fchown:
 return get_errno(fchown(arg1, low2highuid(arg2), low2highgid(arg3)));
@@ -12108,7 +12114,7 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
 }
 unlock_user(target_grouplist, arg2, 0);
 }
-return get_errno(setgroups(gidsetsize, grouplist));
+return get_errno(sys_setgroups(gidsetsize, grouplist));
 }
 #endif
 #ifdef TARGET_NR_fchown32
-- 
2.43.0

[PATCH 0/3] ui/gtk: introducing vc->visible

From: Dongwon Kim 

Drawing guest display frames can't be completed while the VC is not
visible, which could result in timeouts in both the host and the guest,
especially when using blob scanout. Therefore we need to track the
visibility status of the VC and unblock the pipeline when the VC becomes
invisible (e.g. window minimization, switching among tabs) while a guest
frame is being processed.

First patch (0001-ui-gtk-skip...) introduces a flag 'visible' in the
VirtualConsole struct and sets it only if the VC and its window are
visible.
 
Second patch (0002-ui-gtk-set-...) sets the ui size to 0 when the VC
becomes invisible because its tab is closed or deactivated. This notifies
the guest that the associated guest display is not active anymore.

Third patch (0003-ui-gtk-reset-visible...) adds a callback for the GTK
window-state-event. The 'visible' flag is updated based on the
minimization status of the window.

Dongwon Kim (3):
  ui/gtk: skip drawing guest scanout when associated VC is invisible
  ui/gtk: set the ui size to 0 when invisible
  ui/gtk: reset visible flag when window is minimized

 include/ui/gtk.h |  1 +
 ui/gtk-egl.c |  8 +++
 ui/gtk-gl-area.c |  8 +++
 ui/gtk.c | 62 ++--
 4 files changed, 77 insertions(+), 2 deletions(-)

-- 
2.34.1

[PATCH 1/3] ui/gtk: skip drawing guest scanout when associated VC is invisible

From: Dongwon Kim 

A new flag "visible" is added to show visibility status of the gfx console.
The flag is set to 'true' when the VC is visible but set to 'false' when
it is hidden or closed. When the VC is invisible, drawing guest frames
should be skipped as it will never be completed and it would potentially
lock up the guest display especially when blob scanout is used.
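
The guard described above can be modelled without GTK. This is a minimal sketch under stated assumptions: `Console` and `console_flush` are hypothetical stand-ins for the VC state and for the flush and scanout entry points touched by this patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the graphics state of one virtual console. */
typedef struct Console {
    bool visible;               /* true while the console can be drawn */
    bool draw_submitted;        /* a frame has been queued for drawing */
    unsigned frames_drawn;
    unsigned frames_skipped;
} Console;

/*
 * Submit a guest frame for drawing. An invisible console returns early,
 * so no draw is queued that could never complete.
 * Returns true when the frame was submitted.
 */
bool console_flush(Console *c)
{
    if (!c->visible) {
        c->frames_skipped++;
        return false;
    }
    c->draw_submitted = true;
    c->frames_drawn++;
    return true;
}
```

The early return is the whole technique: nothing is queued while the flag is false, so nothing is left waiting on a draw callback that will not fire.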

Cc: Marc-André Lureau 
Cc: Gerd Hoffmann 
Cc: Vivek Kasireddy 

Signed-off-by: Dongwon Kim 
---
 include/ui/gtk.h |  1 +
 ui/gtk-egl.c |  8 
 ui/gtk-gl-area.c |  8 
 ui/gtk.c | 10 +-
 4 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/include/ui/gtk.h b/include/ui/gtk.h
index aa3d637029..2de38e5724 100644
--- a/include/ui/gtk.h
+++ b/include/ui/gtk.h
@@ -57,6 +57,7 @@ typedef struct VirtualGfxConsole {
 bool y0_top;
 bool scanout_mode;
 bool has_dmabuf;
+bool visible;
 #endif
 } VirtualGfxConsole;
 
diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c
index 3af5ac5bcf..993c283191 100644
--- a/ui/gtk-egl.c
+++ b/ui/gtk-egl.c
@@ -265,6 +265,10 @@ void gd_egl_scanout_dmabuf(DisplayChangeListener *dcl,
 #ifdef CONFIG_GBM
 VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
 
+if (!vc->gfx.visible) {
+return;
+}
+
 eglMakeCurrent(qemu_egl_display, vc->gfx.esurface,
vc->gfx.esurface, vc->gfx.ectx);
 
@@ -363,6 +367,10 @@ void gd_egl_flush(DisplayChangeListener *dcl,
 VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
 GtkWidget *area = vc->gfx.drawing_area;
 
+if (!vc->gfx.visible) {
+return;
+}
+
 if (vc->gfx.guest_fb.dmabuf && !vc->gfx.guest_fb.dmabuf->draw_submitted) {
 graphic_hw_gl_block(vc->gfx.dcl.con, true);
 vc->gfx.guest_fb.dmabuf->draw_submitted = true;
diff --git a/ui/gtk-gl-area.c b/ui/gtk-gl-area.c
index 52dcac161e..04e07bd7ee 100644
--- a/ui/gtk-gl-area.c
+++ b/ui/gtk-gl-area.c
@@ -285,6 +285,10 @@ void gd_gl_area_scanout_flush(DisplayChangeListener *dcl,
 {
 VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
 
+if (!vc->gfx.visible) {
+return;
+}
+
 if (vc->gfx.guest_fb.dmabuf && !vc->gfx.guest_fb.dmabuf->draw_submitted) {
 graphic_hw_gl_block(vc->gfx.dcl.con, true);
 vc->gfx.guest_fb.dmabuf->draw_submitted = true;
@@ -299,6 +303,10 @@ void gd_gl_area_scanout_dmabuf(DisplayChangeListener *dcl,
 #ifdef CONFIG_GBM
 VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
 
+if (!vc->gfx.visible) {
+return;
+}
+
 gtk_gl_area_make_current(GTK_GL_AREA(vc->gfx.drawing_area));
 egl_dmabuf_import_texture(dmabuf);
 if (!dmabuf->texture) {
diff --git a/ui/gtk.c b/ui/gtk.c
index 810d7fc796..02eb667d8a 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -1312,15 +1312,20 @@ static void gd_menu_quit(GtkMenuItem *item, void *opaque)
 static void gd_menu_switch_vc(GtkMenuItem *item, void *opaque)
 {
 GtkDisplayState *s = opaque;
-VirtualConsole *vc = gd_vc_find_by_menu(s);
+VirtualConsole *vc;
 GtkNotebook *nb = GTK_NOTEBOOK(s->notebook);
 gint page;
 
+vc = gd_vc_find_current(s);
+vc->gfx.visible = false;
+
+vc = gd_vc_find_by_menu(s);
 gtk_release_modifiers(s);
 if (vc) {
 page = gtk_notebook_page_num(nb, vc->tab_item);
 gtk_notebook_set_current_page(nb, page);
 gtk_widget_grab_focus(vc->focus);
+vc->gfx.visible = true;
 }
 }
 
@@ -1350,6 +1355,7 @@ static gboolean gd_tab_window_close(GtkWidget *widget, GdkEvent *event,
 VirtualConsole *vc = opaque;
 GtkDisplayState *s = vc->s;
 
+vc->gfx.visible = false;
 gtk_widget_set_sensitive(vc->menu_item, true);
 gd_widget_reparent(vc->window, s->notebook, vc->tab_item);
 gtk_notebook_set_tab_label_text(GTK_NOTEBOOK(s->notebook),
@@ -1423,6 +1429,7 @@ static void gd_menu_untabify(GtkMenuItem *item, void *opaque)
 gd_update_geometry_hints(vc);
 gd_update_caption(s);
 }
+vc->gfx.visible = true;
 }
 
 static void gd_menu_show_menubar(GtkMenuItem *item, void *opaque)
@@ -2471,6 +2478,7 @@ static void gtk_display_init(DisplayState *ds, DisplayOptions *opts)
 #ifdef CONFIG_GTK_CLIPBOARD
 gd_clipboard_init(s);
 #endif /* CONFIG_GTK_CLIPBOARD */
+vc->gfx.visible = true;
 }
 
 static void early_gtk_display_init(DisplayOptions *opts)
-- 
2.34.1

[PATCH 3/3] ui/gtk: reset visible flag when window is minimized

From: Dongwon Kim 

Add a callback for window-state-event that updates the 'visible' flag
when the associated window is minimized or restored. When minimizing, it
cancels any queued draw events associated with the VC.
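
The minimize and restore handling can be sketched as a small state machine. This is an illustration only: `WIN_ICONIFIED` is a hypothetical stand-in for GDK_WINDOW_STATE_ICONIFIED, and `block_count` is an assumed counter that models blocking and unblocking the guest rendering pipeline.

```c
#include <stdbool.h>
#include <stdint.h>

#define WIN_ICONIFIED (1u << 1) /* hypothetical stand-in for the GDK state bit */

typedef struct Console {
    bool visible;
    bool draw_submitted;        /* a frame is queued and the guest is waiting */
    int block_count;            /* > 0 while the guest pipeline is blocked */
} Console;

/* Queue a frame for drawing; the guest stays blocked until it is drawn. */
void console_submit(Console *c)
{
    if (!c->visible || c->draw_submitted) {
        return;
    }
    c->draw_submitted = true;
    c->block_count++;
}

/* React to a window state change reported by the toolkit. */
void console_window_state(Console *c, uint32_t new_state)
{
    if (new_state & WIN_ICONIFIED) {
        c->visible = false;
        /* Cancel the queued draw so the guest is not left waiting. */
        if (c->draw_submitted) {
            c->draw_submitted = false;
            c->block_count--;
        }
    } else {
        c->visible = true;
    }
}
```

Releasing the block at minimize time matters because a minimized window receives no draw callback that would otherwise release it.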

Cc: Marc-André Lureau 
Cc: Gerd Hoffmann 
Cc: Vivek Kasireddy 
Signed-off-by: Dongwon Kim 
---
 ui/gtk.c | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/ui/gtk.c b/ui/gtk.c
index 651ed3492f..5bbcb7de62 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -1381,6 +1381,37 @@ static gboolean gd_tab_window_close(GtkWidget *widget, GdkEvent *event,
 return TRUE;
 }
 
+static gboolean gd_window_state_event(GtkWidget *widget, GdkEvent *event,
+  void *opaque)
+{
+VirtualConsole *vc = opaque;
+
+if (!vc) {
+return TRUE;
+}
+
+if (event->window_state.new_window_state & GDK_WINDOW_STATE_ICONIFIED) {
+vc->gfx.visible = false;
+gd_set_ui_size(vc, 0, 0);
+if (vc->gfx.guest_fb.dmabuf &&
+vc->gfx.guest_fb.dmabuf->draw_submitted) {
+vc->gfx.guest_fb.dmabuf->draw_submitted = false;
+graphic_hw_gl_block(vc->gfx.dcl.con, false);
+}
+/* Showing the ui only if window exists except for the current vc as GTK
+ * window for 's' is being used to display the GUI */
+} else if (vc->window || vc == gd_vc_find_current(vc->s)) {
+GdkWindow *window;
+window = gtk_widget_get_window(vc->gfx.drawing_area);
+gd_set_ui_size(vc, gdk_window_get_width(window),
+   gdk_window_get_height(window));
+
+vc->gfx.visible = true;
+}
+
+return TRUE;
+}
+
 static gboolean gd_win_grab(void *opaque)
 {
 VirtualConsole *vc = opaque;
@@ -1422,6 +1453,9 @@ static void gd_menu_untabify(GtkMenuItem *item, void 
*opaque)
 
 g_signal_connect(vc->window, "delete-event",
  G_CALLBACK(gd_tab_window_close), vc);
+g_signal_connect(vc->window, "window-state-event",
+ G_CALLBACK(gd_window_state_event), vc);
+
 gtk_widget_show_all(vc->window);
 
 if (qemu_console_is_graphic(vc->gfx.dcl.con)) {
@@ -2470,6 +2504,11 @@ static void gtk_display_init(DisplayState *ds, 
DisplayOptions *opts)
 }
 
 vc = gd_vc_find_current(s);
+
+g_signal_connect(s->window, "window-state-event",
+ G_CALLBACK(gd_window_state_event),
+ vc);
+
 gtk_widget_set_sensitive(s->view_menu, vc != NULL);
 #ifdef CONFIG_VTE
 gtk_widget_set_sensitive(s->copy_item,
-- 
2.34.1
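
For readers who want to follow the minimize/restore logic without GTK, here
is a minimal C sketch of the control flow. It is only a model under stated
assumptions: SketchConsole, SKETCH_STATE_ICONIFIED and
sketch_window_state_event() are hypothetical names, the NULL check and the
"window exists or is the current vc" condition from the patch are left out,
and nothing here calls GTK or QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for GDK_WINDOW_STATE_ICONIFIED. */
#define SKETCH_STATE_ICONIFIED (1u << 1)

/* Hypothetical, trimmed-down model of a virtual console. */
typedef struct {
    bool visible;         /* models vc->gfx.visible */
    bool draw_submitted;  /* models dmabuf->draw_submitted */
    int ui_width;         /* size last reported to the guest */
    int ui_height;
} SketchConsole;

/* Models the handler: hide on minimize, show again on restore. */
static void sketch_window_state_event(SketchConsole *vc, unsigned new_state,
                                      int window_width, int window_height)
{
    assert(vc != NULL);

    if (new_state & SKETCH_STATE_ICONIFIED) {
        vc->visible = false;
        vc->ui_width = 0;
        vc->ui_height = 0;
        /* Drop a pending draw so nothing waits on a hidden window. */
        vc->draw_submitted = false;
    } else {
        vc->ui_width = window_width;
        vc->ui_height = window_height;
        vc->visible = true;
    }
}
```

In this model a minimize followed by a restore ends with the console visible
again and the UI size taken from the restored window.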

[PATCH 2/3] ui/gtk: set the ui size to 0 when invisible

From: Dongwon Kim 

Set the UI size to 0 when the VC is invisible. This notifies the guest
that the display is not active, which prevents further scanout updates.
The original size is restored whenever the VC becomes visible again.

Cc: Marc-André Lureau 
Cc: Gerd Hoffmann 
Cc: Vivek Kasireddy 
Signed-off-by: Dongwon Kim 
---
 ui/gtk.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/ui/gtk.c b/ui/gtk.c
index 02eb667d8a..651ed3492f 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -1314,10 +1314,12 @@ static void gd_menu_switch_vc(GtkMenuItem *item, void 
*opaque)
 GtkDisplayState *s = opaque;
 VirtualConsole *vc;
 GtkNotebook *nb = GTK_NOTEBOOK(s->notebook);
+GdkWindow *window;
 gint page;
 
 vc = gd_vc_find_current(s);
 vc->gfx.visible = false;
+gd_set_ui_size(vc, 0, 0);
 
 vc = gd_vc_find_by_menu(s);
 gtk_release_modifiers(s);
@@ -1325,6 +1327,9 @@ static void gd_menu_switch_vc(GtkMenuItem *item, void 
*opaque)
 page = gtk_notebook_page_num(nb, vc->tab_item);
 gtk_notebook_set_current_page(nb, page);
 gtk_widget_grab_focus(vc->focus);
+window = gtk_widget_get_window(vc->gfx.drawing_area);
+gd_set_ui_size(vc, gdk_window_get_width(window),
+   gdk_window_get_height(window));
 vc->gfx.visible = true;
 }
 }
@@ -1356,6 +1361,7 @@ static gboolean gd_tab_window_close(GtkWidget *widget, 
GdkEvent *event,
 GtkDisplayState *s = vc->s;
 
 vc->gfx.visible = false;
+gd_set_ui_size(vc, 0, 0);
 gtk_widget_set_sensitive(vc->menu_item, true);
 gd_widget_reparent(vc->window, s->notebook, vc->tab_item);
 gtk_notebook_set_tab_label_text(GTK_NOTEBOOK(s->notebook),
@@ -1391,6 +1397,7 @@ static gboolean gd_win_grab(void *opaque)
 static void gd_menu_untabify(GtkMenuItem *item, void *opaque)
 {
 GtkDisplayState *s = opaque;
+GdkWindow *window;
 VirtualConsole *vc = gd_vc_find_current(s);
 
 if (vc->type == GD_VC_GFX &&
@@ -1429,6 +1436,10 @@ static void gd_menu_untabify(GtkMenuItem *item, void 
*opaque)
 gd_update_geometry_hints(vc);
 gd_update_caption(s);
 }
+
+window = gtk_widget_get_window(vc->gfx.drawing_area);
+gd_set_ui_size(vc, gdk_window_get_width(window),
+   gdk_window_get_height(window));
 vc->gfx.visible = true;
 }
 
@@ -1753,7 +1764,9 @@ static gboolean gd_configure(GtkWidget *widget,
 {
 VirtualConsole *vc = opaque;
 
-gd_set_ui_size(vc, cfg->width, cfg->height);
+if (vc->gfx.visible) {
+gd_set_ui_size(vc, cfg->width, cfg->height);
+}
 return FALSE;
 }
 
-- 
2.34.1
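
The gd_configure() change is the part that keeps the size at 0 while the VC
is hidden: without the guard, a later configure event would overwrite it.
A small C sketch of that guard follows. ConfigSketchConsole and
sketch_configure() are hypothetical names used only for illustration; the
real code calls gd_set_ui_size().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the fields the guard depends on. */
typedef struct {
    bool visible;   /* models vc->gfx.visible */
    int ui_width;   /* size last reported to the guest */
    int ui_height;
} ConfigSketchConsole;

/* Models gd_configure() after the patch: ignore resizes while hidden. */
static void sketch_configure(ConfigSketchConsole *vc, int width, int height)
{
    assert(vc != NULL);

    if (vc->visible) {
        vc->ui_width = width;
        vc->ui_height = height;
    }
}
```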

Re: [PATCH] pc: q35: Bump max_cpus to 1728 vcpus

2024-01-30 Thread Michael S. Tsirkin

On Tue, Jan 30, 2024 at 10:39:51PM +0530, Ani Sinha wrote:
> 
> 
> > On 30-Jan-2024, at 22:17, Daniel P. Berrangé  wrote:
> > 
> > On Tue, Jan 30, 2024 at 10:14:28PM +0530, Ani Sinha wrote:
> >> Since commit f10a570b093e6 ("KVM: x86: Add CONFIG_KVM_MAX_NR_VCPUS to 
> >> allow up to 4096 vCPUs")
> >> Linux kernel can support up to a maximum of 4096 vCPUs when MAXSMP is
> >> enabled in the kernel. QEMU has been tested to correctly boot a Linux guest
> >> with 1728 vCPUs with both edk2 and SeaBIOS firmware. So bump up the 
> >> max_cpus
> >> value for q35 machine versions 9 and newer to 1728. Q35 machine versions
> >> 8.2 and older continue to support a maximum of 1024 vCPUs as before, for
> >> compatibility.
> > 
> > Where does the 1728 number come from ?
> > 
> > Did something break at 1729, or did the test machine simply not
> > have sufficient resources to do practical larger tests ?
> 
> Actual limit currently is 1856 for EDK2. The HPE folks tested QEMU with edk2 
> and QEMU fails to boot beyond that limit.
> There are RH internal bugs tracking this and Gerd is working on it from RH 
> side [1].
> 
> We would ultimately like to go to 8192 vcpus for SAP HANA but 1728 vcpus is 
> our immediate target for now. If you want, I can resend the patch with 1856 
> since that is currently the tested limit.
> 
> 1. https://issues.redhat.com/browse/RHEL-22202

What is being requested here is that you document the source of the number,
whatever it is, in a code comment and in the commit log.
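
As a rough illustration of what such a comment could look like, here is a
self-contained C sketch. SketchMachineClass and the SKETCH_* names are
hypothetical stand-ins, not QEMU's types, and the wording of the comment is
only one possible phrasing of the facts mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for QEMU's MachineClass. */
typedef struct {
    int max_cpus;
} SketchMachineClass;

/*
 * 1728 is the immediate target discussed in this thread.  Per the thread,
 * guests using edk2 currently fail to boot beyond 1856 vCPUs (tracked in
 * RHEL-22202), so the limit stays below that.
 */
#define SKETCH_Q35_MAX_CPUS 1728

/* Older machine versions keep the previous limit for compatibility. */
#define SKETCH_Q35_MAX_CPUS_COMPAT 1024

static void sketch_q35_machine_options(SketchMachineClass *m)
{
    assert(m != NULL);
    m->max_cpus = SKETCH_Q35_MAX_CPUS;
}

static void sketch_q35_8_2_machine_options(SketchMachineClass *m)
{
    /* Start from the newest defaults, then restore the old limit. */
    sketch_q35_machine_options(m);
    m->max_cpus = SKETCH_Q35_MAX_CPUS_COMPAT;
}
```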


> 
> > 
> >> 
> >> If KVM is not able to support the specified number of vcpus, QEMU would
> >> return the following error messages:
> >> 
> >> $ ./qemu-system-x86_64 -cpu host -accel kvm -machine q35 -smp 1728
> >> qemu-system-x86_64: -accel kvm: warning: Number of SMP cpus requested 
> >> (1728) exceeds the recommended cpus supported by KVM (12)
> >> qemu-system-x86_64: -accel kvm: warning: Number of hotpluggable cpus 
> >> requested (1728) exceeds the recommended cpus supported by KVM (12)
> >> Number of SMP cpus requested (1728) exceeds the maximum cpus supported by 
> >> KVM (1024)
> >> 
> >> Cc: Daniel P. Berrangé 
> >> Cc: Igor Mammedov 
> >> Cc: Michael S. Tsirkin 
> >> Cc: Julia Suvorova 
> >> Signed-off-by: Ani Sinha 
> >> ---
> >> hw/i386/pc_q35.c | 3 ++-
> >> 1 file changed, 2 insertions(+), 1 deletion(-)
> >> 
> >> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> >> index f43d5142b8..bfa627a70b 100644
> >> --- a/hw/i386/pc_q35.c
> >> +++ b/hw/i386/pc_q35.c
> >> @@ -375,7 +375,7 @@ static void pc_q35_machine_options(MachineClass *m)
> >> m->default_nic = "e1000e";
> >> m->default_kernel_irqchip_split = false;
> >> m->no_floppy = 1;
> >> -m->max_cpus = 1024;
> >> +m->max_cpus = 1728;
> >> m->no_parallel = !module_object_class_by_name(TYPE_ISA_PARALLEL);
> >> machine_class_allow_dynamic_sysbus_dev(m, TYPE_AMD_IOMMU_DEVICE);
> >> machine_class_allow_dynamic_sysbus_dev(m, TYPE_INTEL_IOMMU_DEVICE);
> >> @@ -396,6 +396,7 @@ static void pc_q35_8_2_machine_options(MachineClass *m)
> >> {
> >> pc_q35_9_0_machine_options(m);
> >> m->alias = NULL;
> >> +m->max_cpus = 1024;
> >> compat_props_add(m->compat_props, hw_compat_8_2, hw_compat_8_2_len);
> >> compat_props_add(m->compat_props, pc_compat_8_2, pc_compat_8_2_len);
> >> }
> >> -- 
> >> 2.42.0
> >> 
> > 
> > With regards,
> > Daniel
> > -- 
> > |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange 
> > :|
> > |: https://libvirt.org -o-https://fstop138.berrange.com 
> > :|
> > |: https://entangle-photo.org-o-https://www.instagram.com/dberrange 
> > :|
>

Re: [PATCH] virtio-blk: avoid using ioeventfd state in irqfd conditional

On Mon, 22 Jan 2024 at 12:27, Stefan Hajnoczi  wrote:
>
> Requests that complete in an IOThread use irqfd to notify the guest
> while requests that complete in the main loop thread use the traditional
> qdev irq code path. The reason for this conditional is that the irq code
> path requires the BQL:
>
>   if (s->ioeventfd_started && !s->ioeventfd_disabled) {
>   virtio_notify_irqfd(vdev, req->vq);
>   } else {
>   virtio_notify(vdev, req->vq);
>   }
>
> There is a corner case where the conditional invokes the irq code path
> instead of the irqfd code path:
>
>   static void virtio_blk_stop_ioeventfd(VirtIODevice *vdev)
>   {
>   ...
>   /*
>* Set ->ioeventfd_started to false before draining so that host 
> notifiers
>* are not detached/attached anymore.
>*/
>   s->ioeventfd_started = false;
>
>   /* Wait for virtio_blk_dma_restart_bh() and in flight I/O to complete */
>   blk_drain(s->conf.conf.blk);
>
> During blk_drain() the conditional produces the wrong result because
> ioeventfd_started is false.
>
> Use qemu_in_iothread() instead of checking the ioeventfd state.
>
> Buglink: https://issues.redhat.com/browse/RHEL-15394
> Signed-off-by: Stefan Hajnoczi 
> ---

Ping?

> Based-on: 
> https://repo.or.cz/qemu/kevin.git/shortlog/c14962c3ea6f0998d028142ed14affcb9dfccf28
>
> Stable backport notes: dataplane_started is being renamed to
> ioeventfd_started in the next block pull request. This patch can be
> safely applied to -stable although the variable name has changed and
> git-am will complain.
>
>  hw/block/virtio-blk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index 227d83569f..287c31ee3c 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -64,7 +64,7 @@ static void virtio_blk_req_complete(VirtIOBlockReq *req, 
> unsigned char status)
>  iov_discard_undo(&req->inhdr_undo);
>  iov_discard_undo(&req->outhdr_undo);
>  virtqueue_push(req->vq, &req->elem, req->in_len);
> -if (s->ioeventfd_started && !s->ioeventfd_disabled) {
> +if (qemu_in_iothread()) {
>  virtio_notify_irqfd(vdev, req->vq);
>  } else {
>  virtio_notify(vdev, req->vq);
> --
> 2.43.0
>
>
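
The corner case described in the commit message can be modelled in a few
lines of C. This is a sketch under assumptions, not the virtio-blk code:
SketchBlkState, SketchNotifyPath and the sketch_pick_path_*() helpers are
hypothetical, and the in_iothread parameter merely stands in for the result
of qemu_in_iothread().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    SKETCH_NOTIFY_IRQ,    /* models virtio_notify(): needs the BQL */
    SKETCH_NOTIFY_IRQFD   /* models virtio_notify_irqfd() */
} SketchNotifyPath;

/* Hypothetical model of the two flags the old conditional looked at. */
typedef struct {
    bool ioeventfd_started;
    bool ioeventfd_disabled;
} SketchBlkState;

/* Old conditional: decides from device state. */
static SketchNotifyPath sketch_pick_path_old(const SketchBlkState *s)
{
    assert(s != NULL);
    return (s->ioeventfd_started && !s->ioeventfd_disabled)
           ? SKETCH_NOTIFY_IRQFD : SKETCH_NOTIFY_IRQ;
}

/* New conditional: decides from the thread that completes the request. */
static SketchNotifyPath sketch_pick_path_new(bool in_iothread)
{
    return in_iothread ? SKETCH_NOTIFY_IRQFD : SKETCH_NOTIFY_IRQ;
}
```

With ioeventfd_started already cleared and a request still completing in an
IOThread (the blk_drain() window), the old helper picks the irq path while
the new one picks irqfd.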

[PULL 5/5] hw/block/block.c: improve confusing blk_check_size_and_read_all() error

From: Manos Pitsidianakis 

In cases where a device tries to read more bytes than the block device
contains, the error is vague: "device requires X bytes, block backend
provides Y bytes".

This patch changes the errors of this function to include the block
backend name, the device id and device type name where appropriate.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Manos Pitsidianakis 
Message-id: 
7260eadff22c08457740117c1bb7bd2b4353acb9.1706598705.git.manos.pitsidiana...@linaro.org
Signed-off-by: Stefan Hajnoczi 
---
 include/hw/block/block.h |  4 ++--
 hw/block/block.c | 25 +++--
 hw/block/m25p80.c|  3 ++-
 hw/block/pflash_cfi01.c  |  4 ++--
 hw/block/pflash_cfi02.c  |  2 +-
 5 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/include/hw/block/block.h b/include/hw/block/block.h
index 15fff66435..de3946a5f1 100644
--- a/include/hw/block/block.h
+++ b/include/hw/block/block.h
@@ -88,8 +88,8 @@ static inline unsigned int get_physical_block_exp(BlockConf 
*conf)
 
 /* Backend access helpers */
 
-bool blk_check_size_and_read_all(BlockBackend *blk, void *buf, hwaddr size,
- Error **errp);
+bool blk_check_size_and_read_all(BlockBackend *blk, DeviceState *dev,
+ void *buf, hwaddr size, Error **errp);
 
 /* Configuration helpers */
 
diff --git a/hw/block/block.c b/hw/block/block.c
index ff503002aa..3ceca7dce6 100644
--- a/hw/block/block.c
+++ b/hw/block/block.c
@@ -54,29 +54,30 @@ static int blk_pread_nonzeroes(BlockBackend *blk, hwaddr 
size, void *buf)
  * BDRV_REQUEST_MAX_BYTES.
  * On success, return true.
  * On failure, store an error through @errp and return false.
- * Note that the error messages do not identify the block backend.
- * TODO Since callers don't either, this can result in confusing
- * errors.
+ *
  * This function not intended for actual block devices, which read on
  * demand.  It's for things like memory devices that (ab)use a block
  * backend to provide persistence.
  */
-bool blk_check_size_and_read_all(BlockBackend *blk, void *buf, hwaddr size,
- Error **errp)
+bool blk_check_size_and_read_all(BlockBackend *blk, DeviceState *dev,
+ void *buf, hwaddr size, Error **errp)
 {
 int64_t blk_len;
 int ret;
+g_autofree char *dev_id = NULL;
 
 blk_len = blk_getlength(blk);
 if (blk_len < 0) {
 error_setg_errno(errp, -blk_len,
- "can't get size of block backend");
+ "can't get size of %s block backend", blk_name(blk));
 return false;
 }
 if (blk_len != size) {
-error_setg(errp, "device requires %" HWADDR_PRIu " bytes, "
-   "block backend provides %" PRIu64 " bytes",
-   size, blk_len);
+dev_id = qdev_get_human_name(dev);
+error_setg(errp, "%s device '%s' requires %" HWADDR_PRIu
+   " bytes, %s block backend provides %" PRIu64 " bytes",
+   object_get_typename(OBJECT(dev)), dev_id, size,
+   blk_name(blk), blk_len);
 return false;
 }
 
@@ -89,7 +90,11 @@ bool blk_check_size_and_read_all(BlockBackend *blk, void 
*buf, hwaddr size,
 assert(size <= BDRV_REQUEST_MAX_BYTES);
 ret = blk_pread_nonzeroes(blk, size, buf);
 if (ret < 0) {
-error_setg_errno(errp, -ret, "can't read block backend");
+dev_id = qdev_get_human_name(dev);
+error_setg_errno(errp, -ret, "can't read %s block backend"
+ " for %s device '%s'",
+ blk_name(blk), object_get_typename(OBJECT(dev)),
+ dev_id);
 return false;
 }
 return true;
diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
index 26ce895628..0a12030a3a 100644
--- a/hw/block/m25p80.c
+++ b/hw/block/m25p80.c
@@ -1617,7 +1617,8 @@ static void m25p80_realize(SSIPeripheral *ss, Error 
**errp)
 trace_m25p80_binding(s);
 s->storage = blk_blockalign(s->blk, s->size);
 
-if (!blk_check_size_and_read_all(s->blk, s->storage, s->size, errp)) {
+if (!blk_check_size_and_read_all(s->blk, DEVICE(s),
+ s->storage, s->size, errp)) {
 return;
 }
 } else {
diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index f956f8bcf7..1bda8424b9 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -848,8 +848,8 @@ static void pflash_cfi01_realize(DeviceState *dev, Error 
**errp)
 }
 
 if (pfl->blk) {
-if (!blk_check_size_and_read_all(pfl->blk, pfl->storage, total_len,
- errp)) {
+if (!blk_check_size_and_read_all(pfl->blk, dev, pfl->storage,
+ total_len, errp)) {
 vmstate_unregister_ram(&pfl->mem, DEVICE(pfl));
 return;
 }
diff --git

[PULL 2/5] block/blkio: Make s->mem_region_alignment be 64 bits

From: "Richard W.M. Jones" 

With GCC 14 the code failed to compile on i686 (and was wrong for any
version of GCC):

../block/blkio.c: In function ‘blkio_file_open’:
../block/blkio.c:857:28: error: passing argument 3 of ‘blkio_get_uint64’ from 
incompatible pointer type [-Wincompatible-pointer-types]
  857 |&s->mem_region_alignment);
  |^~~~
  ||
  |size_t * {aka unsigned int *}
In file included from ../block/blkio.c:12:
/usr/include/blkio.h:49:67: note: expected ‘uint64_t *’ {aka ‘long long 
unsigned int *’} but argument is of type ‘size_t *’ {aka ‘unsigned int *’}
   49 | int blkio_get_uint64(struct blkio *b, const char *name, uint64_t 
*value);
  | ~~^

Signed-off-by: Richard W.M. Jones 
Message-id: 20240130122006.2977938-1-rjo...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 block/blkio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blkio.c b/block/blkio.c
index 0a0a6c0f5f..bc2f21784c 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -68,7 +68,7 @@ typedef struct {
 CoQueue bounce_available;
 
 /* The value of the "mem-region-alignment" property */
-size_t mem_region_alignment;
+uint64_t mem_region_alignment;
 
 /* Can we skip adding/deleting blkio_mem_regions? */
 bool needs_mem_regions;
-- 
2.43.0
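
To show why the field type matters, here is a small C sketch. On a 32-bit
host size_t is 32 bits wide, so handing its address to a function that
stores a uint64_t would write past the object. sketch_get_uint64() is a
hypothetical getter that only mirrors the out-parameter type of
blkio_get_uint64(); SketchState and the value 4096 are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical getter with a uint64_t out-parameter. */
static int sketch_get_uint64(const char *name, uint64_t *value)
{
    assert(name != NULL && value != NULL);
    *value = UINT64_C(4096);   /* made-up property value */
    return 0;
}

typedef struct {
    /* uint64_t matches the getter on both 32-bit and 64-bit hosts. */
    uint64_t mem_region_alignment;
} SketchState;

static int sketch_read_alignment(SketchState *s)
{
    assert(s != NULL);
    return sketch_get_uint64("mem-region-alignment",
                             &s->mem_region_alignment);
}
```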

Re: [PATCH v3 0/2] hw/block/block.c: improve confusing error

On Tue, Jan 30, 2024 at 09:30:30AM +0200, Manos Pitsidianakis wrote:
> In cases where a device tries to read more bytes than the block device
> contains with the blk_check_size_and_read_all() function, the error is
> vague: "device requires X bytes, block backend provides Y bytes".
> 
> This patch changes the errors of this function to include the block
> backend name, the device id and device type name where appropriate.
> 
> Version 3:
> - Changed phrasing "%s device with id='%s'" to "%s device '%s'" since
>   second parameter might be either device id or device path.
> (thanks Stefan Hajnoczi )
> 
> Version 2:
> - Assert dev is not NULL on qdev_get_human_name
> (thanks Phil Mathieu-Daudé )
> 
> Manos Pitsidianakis (2):
>   hw/core/qdev.c: add qdev_get_human_name()
>   hw/block/block.c: improve confusing blk_check_size_and_read_all()
> error
> 
>  include/hw/block/block.h |  4 ++--
>  include/hw/qdev-core.h   | 14 ++
>  hw/block/block.c | 25 +++--
>  hw/block/m25p80.c|  3 ++-
>  hw/block/pflash_cfi01.c  |  4 ++--
>  hw/block/pflash_cfi02.c  |  2 +-
>  hw/core/qdev.c   |  8 
>  7 files changed, 44 insertions(+), 16 deletions(-)
> 
> Range-diff against v2:
> 1:  5fb5879708 ! 1:  8b566bfced hw/core/qdev.c: add qdev_get_human_name()
> @@ Commit message
>  Add a simple method to return some kind of human readable identifier 
> for
>  use in error messages.
>  
> +Reviewed-by: Stefan Hajnoczi 
>  Signed-off-by: Manos Pitsidianakis 
>  
>   ## include/hw/qdev-core.h ##
> 2:  8e7eb17fbd ! 2:  7260eadff2 hw/block/block.c: improve confusing 
> blk_check_size_and_read_all() error
> @@ hw/block/block.c: static int blk_pread_nonzeroes(BlockBackend *blk, 
> hwaddr size,
>  -   "block backend provides %" PRIu64 " bytes",
>  -   size, blk_len);
>  +dev_id = qdev_get_human_name(dev);
> -+error_setg(errp, "%s device with id='%s' requires %" HWADDR_PRIu
> ++error_setg(errp, "%s device '%s' requires %" HWADDR_PRIu
>  +   " bytes, %s block backend provides %" PRIu64 " 
> bytes",
>  +   object_get_typename(OBJECT(dev)), dev_id, size,
>  +   blk_name(blk), blk_len);
> @@ hw/block/block.c: bool blk_check_size_and_read_all(BlockBackend *blk, 
> void *buf,
>  -error_setg_errno(errp, -ret, "can't read block backend");
>  +dev_id = qdev_get_human_name(dev);
>  +error_setg_errno(errp, -ret, "can't read %s block backend"
> -+ "for %s device with id='%s'",
> ++ " for %s device '%s'",
>  + blk_name(blk), 
> object_get_typename(OBJECT(dev)),
>  + dev_id);
>   return false;
> 
> base-commit: 11be70677c70fdccd452a3233653949b79e97908
> -- 
> γαῖα πυρί μιχθήτω
> 

Thanks, applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block

Stefan
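
For reference, the message format agreed on in v3 can be reproduced with
plain snprintf(). This is only a sketch: sketch_format_size_error() is a
hypothetical helper, and the device type, device name and backend name used
in the usage note are invented values, not output from QEMU.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Builds a size-mismatch message that names the device and the backend. */
static void sketch_format_size_error(char *buf, size_t len,
                                     const char *dev_type,
                                     const char *dev_name,
                                     const char *backend,
                                     uint64_t required, uint64_t provided)
{
    assert(buf != NULL && len > 0);
    assert(dev_type != NULL && dev_name != NULL && backend != NULL);

    snprintf(buf, len,
             "%s device '%s' requires %" PRIu64
             " bytes, %s block backend provides %" PRIu64 " bytes",
             dev_type, dev_name, required, backend, provided);
}
```

Called with ("cfi.pflash01", "flash0", "pflash0", 67108864, 1024), the
helper names both sides of the mismatch instead of printing two bare
numbers.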



[PULL 1/5] block/io_uring: improve error message when init fails

From: Fiona Ebner 

The man page for io_uring_queue_init states:

> io_uring_queue_init(3) returns 0 on success and -errno on failure.

and the man page for io_uring_setup (one of the functions whose return
value io_uring_queue_init() can pass through) states:

> On error, a negative error code is returned. The caller should not
> rely on errno variable.

Tested using 'sysctl kernel.io_uring_disabled=2'. Output before this
change:

> failed to init linux io_uring ring

Output after this change:

> failed to init linux io_uring ring: Operation not permitted

Signed-off-by: Fiona Ebner 
Signed-off-by: Stefan Hajnoczi 
Message-ID: <20240123135044.204985-1-f.eb...@proxmox.com>
---
 block/io_uring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/io_uring.c b/block/io_uring.c
index d77ae55745..d11b2051ab 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -432,7 +432,7 @@ LuringState *luring_init(Error **errp)
 
 rc = io_uring_queue_init(MAX_ENTRIES, ring, 0);
 if (rc < 0) {
-error_setg_errno(errp, errno, "failed to init linux io_uring ring");
+error_setg_errno(errp, -rc, "failed to init linux io_uring ring");
 g_free(s);
 return NULL;
 }
-- 
2.43.0
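
The "-errno on failure" convention can be illustrated with a short C
sketch. sketch_queue_init() is a hypothetical function that fails the way
the man pages describe, returning a negative error code without touching
errno; sketch_format_init_error() is likewise made up and simply formats
the message from the return value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical init function following the "-errno on failure"
 * convention.  It deliberately leaves the global errno untouched.
 */
static int sketch_queue_init(void)
{
    return -EPERM;
}

/* Formats the error from the return code rather than from errno. */
static void sketch_format_init_error(int rc, char *buf, size_t len)
{
    assert(rc < 0 && buf != NULL && len > 0);
    snprintf(buf, len, "failed to init ring: %s", strerror(-rc));
}
```

Because errno is never set in this model, a caller that formatted the
message from errno would print an unrelated or empty reason, which matches
the "before" output quoted in the commit message.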

[PULL 4/5] hw/core/qdev.c: add qdev_get_human_name()

From: Manos Pitsidianakis 

Add a simple method to return some kind of human readable identifier for
use in error messages.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Manos Pitsidianakis 
Message-id: 
8b566bfced98ae44be1fcc1f8e7215f0c3393aa1.1706598705.git.manos.pitsidiana...@linaro.org
Signed-off-by: Stefan Hajnoczi 
---
 include/hw/qdev-core.h | 14 ++
 hw/core/qdev.c |  8 
 2 files changed, 22 insertions(+)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 151d968238..66338f479f 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -993,6 +993,20 @@ const char *qdev_fw_name(DeviceState *dev);
 void qdev_assert_realized_properly(void);
 Object *qdev_get_machine(void);
 
+/**
+ * qdev_get_human_name() - Return a human-readable name for a device
+ * @dev: The device. Must be a valid and non-NULL pointer.
+ *
+ * .. note::
+ *This function is intended for user friendly error messages.
+ *
+ * Returns: A newly allocated string containing the device id if not null,
+ * else the object canonical path.
+ *
+ * Use g_free() to free it.
+ */
+char *qdev_get_human_name(DeviceState *dev);
+
 /* FIXME: make this a link<> */
 bool qdev_set_parent_bus(DeviceState *dev, BusState *bus, Error **errp);
 
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 43d863b0c5..c68d0f7c51 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -879,6 +879,14 @@ Object *qdev_get_machine(void)
 return dev;
 }
 
+char *qdev_get_human_name(DeviceState *dev)
+{
+g_assert(dev != NULL);
+
+return dev->id ?
+   g_strdup(dev->id) : object_get_canonical_path(OBJECT(dev));
+}
+
 static MachineInitPhase machine_phase;
 
 bool phase_check(MachineInitPhase phase)
-- 
2.43.0
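
The id-or-path fallback is easy to try outside of QEMU. The following C
sketch uses malloc()/free() instead of GLib, and SketchDevice,
sketch_dup() and sketch_get_human_name() are hypothetical names; the real
function relies on g_strdup() and object_get_canonical_path().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down device: an optional id plus a path. */
typedef struct {
    const char *id;     /* may be NULL */
    const char *path;   /* always set in this model */
} SketchDevice;

/* Small strdup() replacement so the sketch stays within standard C. */
static char *sketch_dup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);

    if (copy != NULL) {
        memcpy(copy, s, n);
    }
    return copy;
}

/* Returns a newly allocated name; the caller releases it with free(). */
static char *sketch_get_human_name(const SketchDevice *dev)
{
    assert(dev != NULL && dev->path != NULL);
    return sketch_dup(dev->id != NULL ? dev->id : dev->path);
}
```

As in the patch, the result is always a fresh allocation, so callers can
free it the same way regardless of which branch produced it.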

[PULL 3/5] pflash: fix sectors vs bytes confusion in blk_pread_nonzeroes()

The following expression is incorrect because blk_pread_nonzeroes()
deals in units of bytes, not sectors:

  bytes = MIN(size - offset, BDRV_REQUEST_MAX_SECTORS)
                                              ^^^^^^^

BDRV_REQUEST_MAX_BYTES is the appropriate constant.

Fixes: a4b15a8b9ef2 ("pflash: Only read non-zero parts of backend image")
Cc: Xiang Zheng 
Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: 20240130002712.257815-1-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 hw/block/block.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/block.c b/hw/block/block.c
index 9f52ee6e72..ff503002aa 100644
--- a/hw/block/block.c
+++ b/hw/block/block.c
@@ -30,7 +30,7 @@ static int blk_pread_nonzeroes(BlockBackend *blk, hwaddr 
size, void *buf)
 BlockDriverState *bs = blk_bs(blk);
 
 for (;;) {
-bytes = MIN(size - offset, BDRV_REQUEST_MAX_SECTORS);
+bytes = MIN(size - offset, BDRV_REQUEST_MAX_BYTES);
 if (bytes <= 0) {
 return 0;
 }
-- 
2.43.0
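
A toy model shows what mixing up the two units does to a chunked read loop.
The limits below are deliberately tiny, made-up values (QEMU's real
constants are far larger), and sketch_count_requests() is a hypothetical
helper that only counts iterations of a loop shaped like the one in
blk_pread_nonzeroes().

```c
#include <assert.h>
#include <stdint.h>

/* Made-up small limits so the numbers stay readable. */
#define SKETCH_SECTOR_BITS 9
#define SKETCH_MAX_SECTORS 8
#define SKETCH_MAX_BYTES ((int64_t)SKETCH_MAX_SECTORS << SKETCH_SECTOR_BITS)

#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Counts how many requests the chunking loop would issue. */
static int64_t sketch_count_requests(int64_t size, int64_t limit)
{
    int64_t offset = 0;
    int64_t requests = 0;

    assert(size >= 0 && limit > 0);
    for (;;) {
        int64_t bytes = SKETCH_MIN(size - offset, limit);

        if (bytes <= 0) {
            return requests;
        }
        offset += bytes;
        requests++;
    }
}
```

In this model, reading 10000 bytes takes 3 requests with the byte limit
(4096) but 1250 requests when the sector count (8) is mistaken for a byte
count: the loop still terminates, it just works in chunks that are 512
times smaller than intended.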

[PULL 0/5] Block patches

The following changes since commit 11be70677c70fdccd452a3233653949b79e97908:

  Merge tag 'pull-vfio-20240129' of https://github.com/legoater/qemu into 
staging (2024-01-29 10:53:56 +)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 954b33daee83fe79293fd81c2f7371db48e7d6bd:

  hw/block/block.c: improve confusing blk_check_size_and_read_all() error 
(2024-01-30 16:19:00 -0500)


Pull request



Fiona Ebner (1):
  block/io_uring: improve error message when init fails

Manos Pitsidianakis (2):
  hw/core/qdev.c: add qdev_get_human_name()
  hw/block/block.c: improve confusing blk_check_size_and_read_all()
error

Richard W.M. Jones (1):
  block/blkio: Make s->mem_region_alignment be 64 bits

Stefan Hajnoczi (1):
  pflash: fix sectors vs bytes confusion in blk_pread_nonzeroes()

 include/hw/block/block.h |  4 ++--
 include/hw/qdev-core.h   | 14 ++
 block/blkio.c|  2 +-
 block/io_uring.c |  2 +-
 hw/block/block.c | 27 ---
 hw/block/m25p80.c|  3 ++-
 hw/block/pflash_cfi01.c  |  4 ++--
 hw/block/pflash_cfi02.c  |  2 +-
 hw/core/qdev.c   |  8 
 9 files changed, 47 insertions(+), 19 deletions(-)

-- 
2.43.0

Re: [PATCH v2 2/4] scripts/replay-dump.py: Update to current rr record format

2024-01-30 Thread John Snow

On Thu, Jan 25, 2024 at 11:09 AM Nicholas Piggin  wrote:
>
> The v12 format support for replay-dump has a few issues still. This
> fixes async decoding; adds event, shutdown, and end decoding; fixes
> audio in / out events, fixes checkpoint checking of following async
> events.
>
> Signed-off-by: Nicholas Piggin 
> ---
>  scripts/replay-dump.py | 132 ++---
>  1 file changed, 98 insertions(+), 34 deletions(-)
>
> diff --git a/scripts/replay-dump.py b/scripts/replay-dump.py
> index d668193e79..35732da08f 100755
> --- a/scripts/replay-dump.py
> +++ b/scripts/replay-dump.py
> @@ -20,6 +20,7 @@
>
>  import argparse
>  import struct
> +import os
>  from collections import namedtuple
>  from os import path
>
> @@ -63,6 +64,10 @@ def read_byte(fin):
>  "Read a single byte"
>  return struct.unpack('>B', fin.read(1))[0]
>
> +def read_bytes(fin, nr):
> +"Read a nr bytes"

Existing problem in this file, but please use """triple quotes""" for
docstrings.

> +return fin.read(nr)
> +

Does it really save a lot of typing to alias fin.read(1) to
read_bytes(fin, 1) ...?

>  def read_event(fin):
>  "Read a single byte event, but save some state"
>  if replay_state.already_read:
> @@ -134,6 +139,18 @@ def swallow_async_qword(eid, name, dumpfile):
>  print("  %s(%d) @ %d" % (name, eid, step_id))
>  return True
>
> +def swallow_bytes(eid, name, dumpfile, nr):
> +"Swallow nr bytes of data without looking at it"
> +dumpfile.seek(nr, os.SEEK_CUR)
> +return True
> +

Why bother returning a bool if it's not based on any condition? Add an
error check or just drop the return value.

> +def decode_exception(eid, name, dumpfile):
> +print_event(eid, name)
> +return True
> +

I suppose in this case, the return is to fit a common signature.

> +# v12 does away with the additional event byte and encodes it in the main 
> type
> +# Between v8 and v9, REPLAY_ASYNC_BH_ONESHOT was added, but we don't decode
> +# those versions so leave it out.
>  async_decode_table = [ Decoder(0, "REPLAY_ASYNC_EVENT_BH", 
> swallow_async_qword),
> Decoder(1, "REPLAY_ASYNC_INPUT", decode_unimp),
> Decoder(2, "REPLAY_ASYNC_INPUT_SYNC", decode_unimp),
> @@ -142,8 +159,8 @@ def swallow_async_qword(eid, name, dumpfile):
> Decoder(5, "REPLAY_ASYNC_EVENT_NET", decode_unimp),
>  ]
>  # See replay_read_events/replay_read_event
> -def decode_async(eid, name, dumpfile):
> -"""Decode an ASYNC event"""
> +def decode_async_old(eid, name, dumpfile):
> +"""Decode an ASYNC event (pre-v8)"""
>
>  print_event(eid, name)
>
> @@ -157,6 +174,35 @@ def decode_async(eid, name, dumpfile):
>
>  return call_decode(async_decode_table, async_event_kind, dumpfile)
>
> +def decode_async_bh(eid, name, dumpfile):
> +op_id = read_qword(dumpfile)
> +print_event(eid, name)
> +return True
> +
> +def decode_async_bh_oneshot(eid, name, dumpfile):
> +op_id = read_qword(dumpfile)
> +print_event(eid, name)
> +return True
> +
> +def decode_async_char_read(eid, name, dumpfile):
> +char_id = read_byte(dumpfile)
> +size = read_dword(dumpfile)
> +print_event(eid, name, "device:%x chars:%s" % (char_id, 
> read_bytes(dumpfile, size)))
> +return True
> +
> +def decode_async_block(eid, name, dumpfile):
> +op_id = read_qword(dumpfile)
> +print_event(eid, name)
> +return True
> +
> +def decode_async_net(eid, name, dumpfile):
> +net_id = read_byte(dumpfile)
> +flags = read_dword(dumpfile)
> +size = read_dword(dumpfile)
> +swallow_bytes(eid, name, dumpfile, size)
> +print_event(eid, name, "net:%x flags:%x bytes:%d" % (net_id, flags, 
> size))
> +return True
> +
>  total_insns = 0
>
>  def decode_instruction(eid, name, dumpfile):
> @@ -166,6 +212,10 @@ def decode_instruction(eid, name, dumpfile):
>  print_event(eid, name, "+ %d -> %d" % (ins_diff, total_insns))
>  return True
>
> +def decode_shutdown(eid, name, dumpfile):
> +print_event(eid, name)
> +return True
> +
>  def decode_char_write(eid, name, dumpfile):
>  res = read_dword(dumpfile)
>  offset = read_dword(dumpfile)
> @@ -177,7 +227,7 @@ def decode_audio_out(eid, name, dumpfile):
>  print_event(eid, name, "%d" % (audio_data))
>  return True
>
> -def decode_checkpoint(eid, name, dumpfile):
> +def __decode_checkpoint(eid, name, dumpfile, old):
>  """Decode a checkpoint.
>
>  Checkpoints contain a series of async events with their own specific 
> data.
> @@ -189,14 +239,20 @@ def decode_checkpoint(eid, name, dumpfile):
>
>  # if the next event is EVENT_ASYNC there are a bunch of
>  # async events to read, otherwise we are done
> -if next_event != 3:
> -print_event(eid, name, "no additional data", event_number)
> -else:
> +if (old and next_event == 3) or (not old and next_event >= 3 and 
> next_event <= 9):
>  print_event(eid, name, "more

Re: [PATCH 05/17] migration/multifd: Wait for multifd channels creation before proceeding

2024-01-30 Thread Fabiano Rosas

Avihai Horon  writes:

> On 29/01/2024 16:34, Fabiano Rosas wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> Avihai Horon  writes:
>>
>>> Currently, multifd channels are created asynchronously without waiting
>>> for their creation -- migration simply proceeds and may wait in
>>> multifd_send_sync_main(), which is called by ram_save_setup(). This
>>> hides some race conditions that can cause unexpected behavior if the
>>> creation of some channels fails.
>>>
>>> For example, the following scenario of multifd migration with two
>>> channels, where the first channel creation fails, will end in a
>>> segmentation fault (time advances from top to bottom):
>> Is this reproducible? Or just observable at least.
>
> Yes, though I had to engineer it a bit:
> 1. Run migration with two multifd channels and fail creation of the two 
> channels (e.g., by changing the address they are connecting to).
> 2. Add sleep(3) in multifd_send_sync_main() before we loop through the 
> channels and check p->quit.
> 3. Add sleep(5) only for the second multifd channel connect thread so 
> its connection is delayed and runs last.

Ok, well, that's something at least. I'll try to reproduce it so we can
keep track of it.

>> I acknowledge the situation you describe, but with multifd there's
>> usually an issue in cleanup paths. Let's make sure we flushed those out
>> before adding this new semaphore.
>
> Indeed, I was not keen on adding yet another semaphore either.
> I think there are multiple bugs here, some of them overlap and some don't.
> There is also your and Peter's previous work that I was not aware of to 
> fix those and to clean up the code.
>
> Maybe we can take it one step at a time, pushing your series first, 
> cleaning the code and fixing some bugs.
> Then we can see what bugs are left (if any) and fix them. It might even 
> be easier to fix after the cleanups.
>
>> This is similar to an issue Peter was addressing where we missed calling
>> multifd_send_terminate_threads() in the multifd_channel_connect() path:
>>
>> patch 4 in this
>> https://lore.kernel.org/r/20231022201211.452861-1-pet...@redhat.com
>
> What issue are you referring to here? Can you elaborate?

Oh, I just realised that series doesn't address any particular bug. But
my point is that including a call to multifd_send_terminate_threads() at
new_send_channel_cleanup might be all that's needed because that has
code to cause the channels and the migration thread to end.

> The main issue I am trying to fix in my patch is that we don't wait for 
> all multifd channels to be created/error out before tearing down
> multifd resources in multifd_save_cleanup().

Ok, let me take a step back and ask why this is not solved by
multifd_save_cleanup() -> qemu_thread_join()? I see you moved
p->running=true to *after* the thread creation in patch 4. That will
always leave a gap where p->running == false but the thread is already
running.

>
>>> Thread   | Code execution
>>> 
>>> Multifd 1|
>>>   | multifd_new_send_channel_async (errors and quits)
>>>   |   multifd_new_send_channel_cleanup
>>>   |
>>> Migration thread |
>>>   | qemu_savevm_state_setup
>>>   |   ram_save_setup
>>>   | multifd_send_sync_main
>>>   | (detects Multifd 1 error and quits)
>>>   | [...]
>>>   | migration_iteration_finish
>>>   |   migrate_fd_cleanup_schedule
>>>   |
>>> Main thread  |
>>>   | migrate_fd_cleanup
>>>   |   multifd_save_cleanup (destroys Multifd 2 resources)
>>>   |
>>> Multifd 2|
>>>   | multifd_new_send_channel_async
>>>   | (accesses destroyed resources, segfault)
>>>
>>> In another scenario, migration can hang indefinitely:
>>> 1. Main migration thread reaches multifd_send_sync_main() and waits on
>>> the semaphores.
>>> 2. Then, all multifd channels creation fails, so they post the
>>> semaphores and quit.
>>> 3. Main migration channel will not identify the error, proceed to send
>>> pages and will hang.
>>>
>>> Fix it by waiting for all multifd channels to be created before
>>> proceeding with migration.
>>>
>>> Signed-off-by: Avihai Horon 
>>> ---
>>>   migration/multifd.h   |  3 +++
>>>   migration/migration.c |  1 +
>>>   migration/multifd.c   | 34 +++---
>>>   migration/ram.c   |  7 +++
>>>   4 files changed, 42 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/migration/multifd.h b/migration/multifd.h
>>> index 35d11f103c..87a64e0a87 100644
>>> --- a/migration/multifd.h
>>> +++ b/migration/multifd.h
>>> @@ -23,6 +23,7 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error 
>>> **errp);
>>>   void multifd_recv_sync_main(void);
>>>   int mul

Re: [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64

2024-01-30 Thread Fabiano Rosas

Peter Xu  writes:

> On Tue, Jan 30, 2024 at 10:18:07AM +, Peter Maydell wrote:
>> On Mon, 29 Jan 2024 at 23:31, Fabiano Rosas  wrote:
>> >
>> > Fabiano Rosas  writes:
>> >
>> > > Peter Xu  writes:
>> > >
>> > >> On Fri, Jan 26, 2024 at 11:54:32AM -0300, Fabiano Rosas wrote:
>> > > The issue that occurs to me now is that 'cpu host' will not work with
>> > > TCG. We might actually need to go poking /dev/kvm for this to work.
>> >
>> > Nevermind this last part. There's not going to be a scenario where we
>> > build with CONFIG_KVM, but run in an environment that does not support
>> > KVM.
>> 
>> Yes, there is. We'll build with CONFIG_KVM on any aarch64 host,
>> but that doesn't imply that the user running the build and
>> test has permissions for /dev/kvm.
>
> I'm actually pretty confused on why this would be a problem even for
> neoverse-n1: can we just try to use KVM, if it fails then use TCG?
> Something like:
>
>   (construct qemu cmdline)
>   ..
> #ifdef CONFIG_KVM
>   "-accel kvm "
> #endif
>   "-accel tcg "
>   ..
>
> ?
> IIUC if we specify two "-accel", we'll try the first, then if failed then
> the 2nd?

Aside from '-cpu max', there's no -accel and -cpu combination that works
on all of:

x86_64 host - TCG-only
aarch64 host - KVM & TCG
aarch64 host with --disable-tcg - KVM-only
aarch64 host without access to /dev/kvm - TCG-only

And the cpus are:
host - KVM-only
neoverse-n1 - TCG-only

We'll need something like:

/* covers aarch64 host with --disable-tcg */
if (qtest_has_accel("kvm") && !qtest_has_accel("tcg")) {
   if (open("/dev/kvm", O_RDONLY) < 0) {
   g_test_skip()
   } else {
   "-accel kvm -cpu host"
   }
}

/* covers x86_64 host */
if (!qtest_has_accel("kvm") && qtest_has_accel("tcg")) {
   "-accel tcg -cpu neoverse-n1"
}

/* covers aarch64 host */
if (qtest_has_accel("kvm") && qtest_has_accel("tcg")) {
   if (open("/dev/kvm", O_RDONLY) < 0) {
  "-accel tcg -cpu neoverse-n1"
   } else {
  "-accel kvm -cpu host"
   }
}
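
For illustration, the three cases above collapse into a single decision
function. This is only a sketch, not the test-suite code:
pick_accel_args() and its three flags are hypothetical stand-ins for
qtest_has_accel("kvm"), qtest_has_accel("tcg") and the result of probing
/dev/kvm, and a NULL return marks the case where the pseudocode calls
g_test_skip().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: the flags stand in for qtest_has_accel("kvm"),
 * qtest_has_accel("tcg") and "opening /dev/kvm succeeded".
 * Returns NULL when the test would have to be skipped.
 */
static const char *pick_accel_args(bool has_kvm, bool has_tcg,
                                   bool kvm_usable)
{
    /* A usable /dev/kvm is only meaningful if KVM was built in. */
    assert(has_kvm || !kvm_usable);

    if (has_kvm && kvm_usable) {
        return "-accel kvm -cpu host";
    }
    if (has_tcg) {
        return "-accel tcg -cpu neoverse-n1";
    }
    return NULL; /* KVM-only build without /dev/kvm access: skip */
}
```

Preferring KVM whenever it is usable and falling back to TCG otherwise
gives the same result as the three separate branches for every host in
the list above.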

Re: [PATCH] pflash: fix sectors vs bytes confusion in blk_pread_nonzeroes()

On Mon, Jan 29, 2024 at 07:27:12PM -0500, Stefan Hajnoczi wrote:
> The following expression is incorrect because blk_pread_nonzeroes()
> deals in units of bytes, not sectors:
> 
>   bytes = MIN(size - offset, BDRV_REQUEST_MAX_SECTORS)
>                              ^^^^^^^^^^^^^^^^^^^^^^^^
> 
> BDRV_REQUEST_MAX_BYTES is the appropriate constant.
> 
> Fixes: a4b15a8b9ef2 ("pflash: Only read non-zero parts of backend image")
> Cc: Xiang Zheng 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  hw/block/block.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks, applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block

Stefan
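
To make the unit mix-up in the quoted commit message concrete, here is a
minimal sketch of clamping a read length in bytes. The macros are
illustrative stand-ins with assumed values, not QEMU's definitions, and
next_chunk_bytes() is a hypothetical helper.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Illustrative stand-ins with assumed values, not QEMU's definitions. */
#define SECTOR_BITS          9
#define REQUEST_MAX_SECTORS  ((uint64_t)(INT_MAX >> SECTOR_BITS))
#define REQUEST_MAX_BYTES    (REQUEST_MAX_SECTORS << SECTOR_BITS)

/* Length in bytes of the next chunk to read, starting at offset. */
static uint64_t next_chunk_bytes(uint64_t size, uint64_t offset)
{
    uint64_t remaining;

    assert(offset <= size);
    remaining = size - offset;
    /* Clamp with the byte limit; the sector limit is 512 times smaller. */
    return remaining < REQUEST_MAX_BYTES ? remaining : REQUEST_MAX_BYTES;
}
```

With these stand-in values, clamping a byte count with the sector limit
would still produce valid reads, only in chunks 512 times smaller than
intended.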



Re: [PATCH v2] block/blkio: Make s->mem_region_alignment be 64 bits

On Tue, Jan 30, 2024 at 12:20:01PM +, Richard W.M. Jones wrote:
> With GCC 14 the code failed to compile on i686 (and was wrong for any
> version of GCC):
> 
> ../block/blkio.c: In function ‘blkio_file_open’:
> ../block/blkio.c:857:28: error: passing argument 3 of ‘blkio_get_uint64’ from incompatible pointer type [-Wincompatible-pointer-types]
>   857 |                            &s->mem_region_alignment);
>       |                            ^~~~~~~~~~~~~~~~~~~~~~~~
>       |                            |
>       |                            size_t * {aka unsigned int *}
> In file included from ../block/blkio.c:12:
> /usr/include/blkio.h:49:67: note: expected ‘uint64_t *’ {aka ‘long long unsigned int *’} but argument is of type ‘size_t *’ {aka ‘unsigned int *’}
>    49 | int blkio_get_uint64(struct blkio *b, const char *name, uint64_t *value);
>       |                                                         ~~~~~~~~~~^~~~~
> 
> Signed-off-by: Richard W.M. Jones 
> ---
>  block/blkio.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks, applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block

Stefan



Re: [PATCH [repost]] block/blkio: Don't assume size_t is 64 bit

On Tue, Jan 30, 2024 at 12:19:37PM +, Richard W.M. Jones wrote:
> On Tue, Jan 30, 2024 at 01:04:46PM +0100, Kevin Wolf wrote:
> > Am 30.01.2024 um 11:30 hat Richard W.M. Jones geschrieben:
> > > On Tue, Jan 30, 2024 at 09:51:59AM +0100, Kevin Wolf wrote:
> > > > Am 29.01.2024 um 19:53 hat Richard W.M. Jones geschrieben:
> > > > > With GCC 14 the code failed to compile on i686 (and was wrong for any
> > > > > version of GCC):
> > > > > 
> > > > > ../block/blkio.c: In function ‘blkio_file_open’:
> > > > > ../block/blkio.c:857:28: error: passing argument 3 of ‘blkio_get_uint64’ from incompatible pointer type [-Wincompatible-pointer-types]
> > > > >   857 |                            &s->mem_region_alignment);
> > > > >       |                            ^~~~~~~~~~~~~~~~~~~~~~~~
> > > > >       |                            |
> > > > >       |                            size_t * {aka unsigned int *}
> > > > > In file included from ../block/blkio.c:12:
> > > > > /usr/include/blkio.h:49:67: note: expected ‘uint64_t *’ {aka ‘long long unsigned int *’} but argument is of type ‘size_t *’ {aka ‘unsigned int *’}
> > > > >    49 | int blkio_get_uint64(struct blkio *b, const char *name, uint64_t *value);
> > > > >       |                                                         ~~~~~~~~~~^~~~~
> > > > > 
> > > > > Signed-off-by: Richard W.M. Jones 
> > > > 
> > > > Why not simply make BDRVBlkioState.mem_region_alignment a uint64_t
> > > > instead of keeping it size_t and doing an additional conversion with
> > > > a check that requires an #if (probably to avoid a warning on 64 bit
> > > > hosts because the condition is never true)?
> > > 
> > > The smaller change (attached) does work on i686, but this worries me a
> > > little (although it doesn't give any error or warning):
> > > 
> > > if (((uintptr_t)host | size) % s->mem_region_alignment) {
> > >     error_setg(errp, "unaligned buf %p with size %zu", host, size);
> > >     return BMRR_FAIL;
> > > }
> > 
> > I don't see the problem? The calculation will now be done in 64 bits
> > even on a 32 bit host, but that seems fine to me. Is there a trap I'm
> > missing?
> 
> I guess not.  Stefan, any comments on whether we need to worry about
> huge mem-region-alignment?  I'll post the updated patch as a new
> message in a second.

An alignment of 32 or more bits is not required in any scenario that I'm
aware of.

Stefan
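
As a small illustration of the point Kevin makes above, the alignment
check can be written with a uint64_t alignment and behaves the same on
32-bit and 64-bit hosts. buf_is_aligned() is a hypothetical helper for
this note, not code from block/blkio.c, and it assumes a power-of-two
alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the check quoted above. Because
 * alignment is uint64_t, the left-hand operand is converted to 64 bits
 * before the modulo, on 32-bit and 64-bit hosts alike.
 * Assumes alignment is a non-zero power of two, so that OR-ing the
 * address and the size tests both at once.
 */
static bool buf_is_aligned(const void *host, size_t size, uint64_t alignment)
{
    assert(alignment != 0);
    return (((uintptr_t)host | size) % alignment) == 0;
}
```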



Re: [PATCH v5 0/9] Misc clean ups to target/ppc exception handling

2024-01-30 Thread BALATON Zoltan


On Thu, 18 Jan 2024, BALATON Zoltan wrote:

These are some small clean ups for target/ppc/excp_helper.c trying to
make this code a bit simpler. No functional change is intended. This
series was submitted before but only partially merged due to freeze
and a conflicting series, so this was postponed then to avoid conflicts.


Ping?

Regards,
BALATON Zoltan


v5:
- rebase on master
- keep logging nip pointing to the sc instruction
- add another patch

v4: Rebased on master dropping what was merged

BALATON Zoltan (9):
 target/ppc: Use env_cpu for cpu_abort in excp_helper
 target/ppc: Readability improvements in exception handlers
 target/ppc: Fix gen_sc to use correct nip
 target/ppc: Move patching nip from exception handler to helper_scv
 target/ppc: Simplify syscall exception handlers
 target/ppc: Clean up ifdefs in excp_helper.c, part 1
 target/ppc: Clean up ifdefs in excp_helper.c, part 2
 target/ppc: Clean up ifdefs in excp_helper.c, part 3
 target/ppc: Remove interrupt handler wrapper functions

target/ppc/cpu.h |   1 +
target/ppc/excp_helper.c | 490 +--
target/ppc/translate.c   |  16 +-
3 files changed, 170 insertions(+), 337 deletions(-)

Re: Call for GSoC/Outreachy internship project ideas

On Tue, 30 Jan 2024 at 14:40, Palmer Dabbelt  wrote:
>
> On Mon, 15 Jan 2024 08:32:59 PST (-0800), stefa...@gmail.com wrote:
> > Dear QEMU and KVM communities,
> > QEMU will apply for the Google Summer of Code and Outreachy internship
> > programs again this year. Regular contributors can submit project
> > ideas that they'd like to mentor by replying to this email before
> > January 30th.
>
> It's the 30th, sorry if this is late but I just saw it today.  +Alistair
> and Daniel, as I didn't sync up with anyone about this so not sure if
> someone else is looking already (we're not internally).
>
> > Internship programs
> > -------------------
> > GSoC (https://summerofcode.withgoogle.com/) and Outreachy
> > (https://www.outreachy.org/) offer paid open source remote work
> > internships to eligible people wishing to participate in open source
> > development. QEMU has been part of these internship programs for many
> > years. Our mentors have enjoyed helping talented interns make their
> > first open source contributions and some former interns continue to
> > participate today.
> >
> > Who can mentor
> > --------------
> > Regular contributors to QEMU and KVM can participate as mentors.
> > Mentorship involves about 5 hours of time commitment per week to
> > communicate with the intern, review their patches, etc. Time is also
> > required during the intern selection phase to communicate with
> > applicants. Being a mentor is an opportunity to help someone get
> > started in open source development, will give you experience with
> > managing a project in a low-stakes environment, and a chance to
> > explore interesting technical ideas that you may not have time to
> > develop yourself.
> >
> > How to propose your idea
> > ------------------------
> > Reply to this email with the following project idea template filled in:
> >
> > === TITLE ===
> >
> > '''Summary:''' Short description of the project
> >
> > Detailed description of the project that explains the general idea,
> > including a list of high-level tasks that will be completed by the
> > project, and provides enough background for someone unfamiliar with
> > the codebase to do research. Typically 2 or 3 paragraphs.
> >
> > '''Links:'''
> > * Wiki links to relevant material
> > * External links to mailing lists or web sites
> >
> > '''Details:'''
> > * Skill level: beginner or intermediate or advanced
> > * Language: C/Python/Rust/etc
>
> I'm not 100% sure this is a sane GSoC idea, as it's a bit open ended and
> might have some tricky parts.  That said it's tripping some people up
> and as far as I know nobody's started looking at it, so I figured I'd
> write something up.
>
> I can try and dig up some more links if folks think it's interesting,
> IIRC there's been a handful of bug reports related to very small loops
> that run ~10x slower when vectorized.  Large benchmarks like SPEC have
> also shown slowdowns.

Hi Palmer,
Performance optimization can be challenging for newcomers. I wouldn't
recommend it for a GSoC project unless you have time to seed the
project idea with specific optimizations to implement based on your
experience and profiling. That way the intern has a solid starting
point where they can have a few successes before venturing out to do
their own performance analysis.

Do you have the time to profile and add specifics to the project idea
by Feb 21st? If that sounds good to you, I'll add it to the project
ideas list and you can add more detailed tasks in the coming weeks.

Thanks,
Stefan

Re: Call for GSoC/Outreachy internship project ideas

On Tue, 30 Jan 2024 at 14:16, Alexander Graf  wrote:
> === Implement -M nitro-enclave in QEMU  ===
>
> '''Summary:''' AWS EC2 provides the ability to create an isolated
> sibling VM context from within a VM. This project implements the machine
> model and input data format parsing needed to run these sibling VMs
> stand alone in QEMU.

Thanks, Alex. I have added this project to the wiki and added a few
links (e.g. EIF file format). Feel free to edit:

https://wiki.qemu.org/Google_Summer_of_Code_2024#Implement_-M_nitro-enclave_in_QEMU

Stefan

Re: Call for GSoC/Outreachy internship project ideas

2024-01-30 Thread Palmer Dabbelt


On Mon, 15 Jan 2024 08:32:59 PST (-0800), stefa...@gmail.com wrote:

Dear QEMU and KVM communities,
QEMU will apply for the Google Summer of Code and Outreachy internship
programs again this year. Regular contributors can submit project
ideas that they'd like to mentor by replying to this email before
January 30th.


It's the 30th, sorry if this is late but I just saw it today.  +Alistair 
and Daniel, as I didn't sync up with anyone about this so not sure if 
someone else is looking already (we're not internally).



Internship programs
-------------------
GSoC (https://summerofcode.withgoogle.com/) and Outreachy
(https://www.outreachy.org/) offer paid open source remote work
internships to eligible people wishing to participate in open source
development. QEMU has been part of these internship programs for many
years. Our mentors have enjoyed helping talented interns make their
first open source contributions and some former interns continue to
participate today.

Who can mentor
--------------
Regular contributors to QEMU and KVM can participate as mentors.
Mentorship involves about 5 hours of time commitment per week to
communicate with the intern, review their patches, etc. Time is also
required during the intern selection phase to communicate with
applicants. Being a mentor is an opportunity to help someone get
started in open source development, will give you experience with
managing a project in a low-stakes environment, and a chance to
explore interesting technical ideas that you may not have time to
develop yourself.

How to propose your idea
------------------------
Reply to this email with the following project idea template filled in:

=== TITLE ===

'''Summary:''' Short description of the project

Detailed description of the project that explains the general idea,
including a list of high-level tasks that will be completed by the
project, and provides enough background for someone unfamiliar with
the codebase to do research. Typically 2 or 3 paragraphs.

'''Links:'''
* Wiki links to relevant material
* External links to mailing lists or web sites

'''Details:'''
* Skill level: beginner or intermediate or advanced
* Language: C/Python/Rust/etc


I'm not 100% sure this is a sane GSoC idea, as it's a bit open ended and 
might have some tricky parts.  That said it's tripping some people up 
and as far as I know nobody's started looking at it, so I figured I'd 
write something up.


I can try and dig up some more links if folks think it's interesting, 
IIRC there's been a handful of bug reports related to very small loops 
that run ~10x slower when vectorized.  Large benchmarks like SPEC have 
also shown slowdowns.


---

=== RISC-V Vector TCG Frontend Optimization ===

'''Summary:''' The RISC-V vector extension has been implemented in QEMU, 
but we have some performance pathologies mapping it to existing TCG 
backends.  This project would aim to improve the performance of the 
RISC-V vector ISA's mappings to TCG.


The RISC-V TCG frontend (i.e., decoding RISC-V instructions 
and emitting TCG calls to emulate them) has some inefficient mappings to 
TCG, which results in binaries that have vector instructions frequently 
performing worse than those without, sometimes even up to 10x slower.  
This causes various headaches for users, including running toolchain 
regressions and doing distro work.  This project's aim would be to bring 
the performance of vectorized RISC-V code to a similar level as the 
corresponding scalar code.


This will definitely require changing the RISC-V TCG frontend.  It's 
likely there is some remaining optimization work that can be done 
without adding TCG primitives, but it may be necessary to do some core 
TCG work in order to improve performance sufficiently.


'''Links:'''
* https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg04495.html

'''Details:'''
* Skill level: intermediate
* Language: C, RISC-V assembly



More information
----------------
You can find out about the process we follow here:
Video: https://www.youtube.com/watch?v=xNVCX7YMUL8
Slides (PDF): https://vmsplice.net/~stefan/stefanha-kvm-forum-2016.pdf

The QEMU wiki page for GSoC 2024 is now available:
https://wiki.qemu.org/Google_Summer_of_Code_2024

Thanks,
Stefan

Re: Call for GSoC/Outreachy internship project ideas

Hi Eugenio,
Stefano Garzarella and I had a SVQ-related project idea that I have added:
https://wiki.qemu.org/Google_Summer_of_Code_2024#vhost-user_memory_isolation

We want to support vhost-user devices without exposing guest RAM. This
is attractive for security reasons in vhost-user-vsock where a process
that connects multiple guests should not give access to other guests'
RAM in the case of a security bug. It is also useful on host platforms
where guest RAM cannot be shared (we think this is the case on macOS
Hypervisor.framework).

Please let us know if you have any thoughts about sharing/refactoring
the SVQ code.

Stefan

Re: Call for GSoC/Outreachy internship project ideas

2024-01-30 Thread Alexander Graf

Hey Stefan,

Thanks a lot for setting up GSoC this year again!

On 15.01.24 17:32, Stefan Hajnoczi wrote:

Dear QEMU and KVM communities,
QEMU will apply for the Google Summer of Code and Outreachy internship
programs again this year. Regular contributors can submit project
ideas that they'd like to mentor by replying to this email before
January 30th.

Internship programs
-------------------
GSoC (https://summerofcode.withgoogle.com/) and Outreachy
(https://www.outreachy.org/) offer paid open source remote work
internships to eligible people wishing to participate in open source
development. QEMU has been part of these internship programs for many
years. Our mentors have enjoyed helping talented interns make their
first open source contributions and some former interns continue to
participate today.

Who can mentor
--------------
Regular contributors to QEMU and KVM can participate as mentors.
Mentorship involves about 5 hours of time commitment per week to
communicate with the intern, review their patches, etc. Time is also
required during the intern selection phase to communicate with
applicants. Being a mentor is an opportunity to help someone get
started in open source development, will give you experience with
managing a project in a low-stakes environment, and a chance to
explore interesting technical ideas that you may not have time to
develop yourself.

How to propose your idea
------------------------
Reply to this email with the following project idea template filled in:

=== TITLE ===

'''Summary:''' Short description of the project

Detailed description of the project that explains the general idea,
including a list of high-level tasks that will be completed by the
project, and provides enough background for someone unfamiliar with
the codebase to do research. Typically 2 or 3 paragraphs.

'''Links:'''
* Wiki links to relevant material
* External links to mailing lists or web sites

'''Details:'''
* Skill level: beginner or intermediate or advanced
* Language: C/Python/Rust/etc

=== Implement -M nitro-enclave in QEMU ===

'''Summary:''' AWS EC2 provides the ability to create an isolated
sibling VM context from within a VM. This project implements the machine
model and input data format parsing needed to run these sibling VMs
stand alone in QEMU.

Nitro Enclaves are the first widely adopted implementation of hypervisor
assisted compute isolation. Similar to technologies like SGX, it allows
spawning a separate context that is inaccessible to the parent Operating
System. This is implemented by "giving up" resources of the parent VM
(CPU cores, memory) to the hypervisor which then spawns a second VMM to
execute a completely separate virtual machine. That new VM only has a
vsock communication channel to the parent and has a built-in lightweight
TPM called NSM.

One big challenge with Nitro Enclaves is that due to its roots in
security, there are very few debugging / introspection capabilities.
That makes OS bringup, debugging and bootstrapping very difficult.
Having a local dev&test environment that looks like an Enclave, but is
100% controlled by the developer and introspectable would make life a
lot easier for everyone working on them. It also may pave the way to see
Nitro Enclaves adopted in VM environments outside of EC2.

This project will consist of adding a new machine model to QEMU that
mimics a Nitro Enclave environment, including NSM, the vsock
communication channel and building firmware which loads the special
"EIF" file format which contains kernel, initramfs and metadata from a
-kernel image.

If the student finishes early, we can then proceed to implement the
Nitro Enclaves parent driver in QEMU as well to create a full QEMU-only
Nitro Enclaves environment.

'''Tasks:'''
* Implement a device model for the NSM device (link to spec and driver
  code below)
* Implement a new machine model
* Implement firmware for the new machine model that implements EIF parsing
* Add tests for the NSM device
* Add integration test for the machine model executing an actual EIF payload

'''Links:'''
* https://aws.amazon.com/ec2/nitro/nitro-enclaves/
* https://lore.kernel.org/lkml/20200921121732.44291-10-andra...@amazon.com/T/
* https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/misc/nsm.c

'''Details:'''
* Skill level: intermediate - advanced (some understanding of QEMU
  machine modeling would be good)
* Language: C
* Mentor: agraf
* Suggested by: Alexander Graf (OFTC: agraf, Email: g...@amazon.com)

Alex

Re: [PATCH v2] scripts/checkpatch.pl: check for placeholders in cover letter patches


On 30/1/24 16:11, Alex Bennée wrote:

Manos Pitsidianakis  writes:


On Tue, 30 Jan 2024 12:15, "Daniel P. Berrangé"  wrote:

On Tue, Jan 30, 2024 at 12:11:07PM +0200, Manos Pitsidianakis wrote:

Check if a file argument is a cover letter patch produced by
git-format-patch --cover-letter; It is initialized with subject suffix "
*** SUBJECT HERE ***" and body prefix " *** BLURB HERE ***". If they
exist, warn the user.
Signed-off-by: Manos Pitsidianakis 
---
Range-diff against v1:
1:  64b7ec2287 ! 1:  9bf816eb4c scripts/checkpatch.pl: check for placeholders in cover letter patches
 @@ scripts/checkpatch.pl: sub process {
  +# --cover-letter; It is initialized with subject suffix
  +# " *** SUBJECT HERE ***" and body prefix " *** BLURB HERE ***"
  + if ($in_header_lines &&
 -+ $rawline =~ /^Subject:.+[*]{3} SUBJECT HERE [*]{3}\s*$/) {
 -+WARN("Patch appears to be a cover letter with uninitialized subject" .
 -+ " '*** SUBJECT HERE ***'\n$hereline\n");
 ++ $rawline =~ /^Subject:.+[*]{3} SUBJECT HERE [*]{3}\s*$/) {
 ++ WARN("Patch appears to be a cover letter with " .
 ++ "uninitialized subject '*** SUBJECT HERE ***'\n$hereline\n");
  + }
  +
  + if ($rawline =~ /^[*]{3} BLURB HERE [*]{3}\s*$/) {
 -+WARN("Patch appears to be a cover letter with leftover placeholder " .
 -+ "text '*** BLURB HERE ***'\n$hereline\n");
 ++ WARN("Patch appears to be a cover letter with " .
 ++ "leftover placeholder text '*** BLURB HERE ***'\n$hereline\n");
  + }
  +
if ($in_commit_log && $non_utf8_charset && $realfile =~ /^$/ &&
  scripts/checkpatch.pl | 14 ++
  1 file changed, 14 insertions(+)
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 7026895074..9a8d49f1d8 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -1650,6 +1650,20 @@ sub process {
$non_utf8_charset = 1;
}
+# Check if this is a cover letter patch produced by git-format-patch
+# --cover-letter; It is initialized with subject suffix
+# " *** SUBJECT HERE ***" and body prefix " *** BLURB HERE ***"
+   if ($in_header_lines &&
+   $rawline =~ /^Subject:.+[*]{3} SUBJECT HERE [*]{3}\s*$/) {


This continuation line is now hugely over-indented - it should
be aligned just after the '('


It is not, it just uses tabs. Like line 2693 in current master:

https://gitlab.com/qemu-project/qemu/-/blob/11be70677c70fdccd452a3233653949b79e97908/scripts/checkpatch.pl#L2693

I will quote the **QEMU Coding Style** again on whitespace:


Whitespace
Of course, the most important aspect in any coding style is
whitespace. Crusty old coders who have trouble spotting the glasses
on their noses can tell the difference between a tab and eight
spaces from a distance of approximately fifteen parsecs. Many a
flamewar has been fought and lost on this issue.



QEMU indents are four spaces. Tabs are never used, except in
Makefiles where they have been irreversibly coded into the syntax.
Spaces of course are superior to tabs because:
 You have just one way to specify whitespace, not two. Ambiguity
breeds mistakes.
 The confusion surrounding ‘use tabs to indent, spaces to
justify’ is gone.
 Tab indents push your code to the right, making your screen
seriously unbalanced.
 Tabs will be rendered incorrectly on editors who are
misconfigured not to use tab stops of eight positions.
 Tabs are rendered badly in patches, causing off-by-one errors in
almost every line.
It is the QEMU coding style.


I think it's better if we leave this discussion here, and accept v1
which is consistent with the coding style, or this one which is
consistent with the inconsistency of the tabs and spaces mix of the
checkpatch.pl source code as a compromise, if it is deemed important.


I suspect the problem is that checkpatch.pl is an import from the Linux
source tree which has since had syncs with its upstream as well as a
slew of QEMU specific patches. If we don't care about tracking upstream
anymore we could bite the bullet and fix indentation going forward.


We diverged quite some time ago and don't track it anymore AFAICT.
Regardless, git tools are clever enough to deal with space changes
and a tab/space commit can be added to .git-blame-ignore-revs.


Of course arguably we should replace it with a python script and reduce
our dependence on perl. I'm sure someone had a go at that once but it
might have only been a partial undertaking.
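
For readers who want to try the placeholder check from the patch at the
top of this thread outside of checkpatch.pl, here is a rough sketch of
the same two patterns using POSIX regcomp()/regexec(). It is not the
proposed Perl code: line_matches() and has_cover_letter_placeholder()
are hypothetical names, and [[:space:]] replaces Perl's \s because POSIX
extended regular expressions do not define \s.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* POSIX ERE has no \s, so [[:space:]] is used instead. */
#define SUBJECT_RE "^Subject:.+[*]{3} SUBJECT HERE [*]{3}[[:space:]]*$"
#define BLURB_RE   "^[*]{3} BLURB HERE [*]{3}[[:space:]]*$"

/* Return true if line matches the extended regular expression pattern. */
static bool line_matches(const char *pattern, const char *line)
{
    regex_t re;
    bool matched;
    int rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);

    assert(rc == 0); /* the patterns above are fixed and expected to compile */
    matched = regexec(&re, line, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}

/*
 * Hypothetical check: does this line still hold a git-format-patch
 * placeholder? The subject pattern only applies to header lines.
 */
static bool has_cover_letter_placeholder(const char *line, bool in_header)
{
    if (in_header && line_matches(SUBJECT_RE, line)) {
        return true;
    }
    return line_matches(BLURB_RE, line);
}
```

As in the patch, the subject pattern is only consulted while header
lines are being read, whereas the blurb pattern is tried on every line.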

Re: [PATCH] pc: q35: Bump max_cpus to 1728 vcpus


On 30/1/24 18:09, Ani Sinha wrote:




On 30-Jan-2024, at 22:17, Daniel P. Berrangé  wrote:

On Tue, Jan 30, 2024 at 10:14:28PM +0530, Ani Sinha wrote:

Since commit f10a570b093e6 ("KVM: x86: Add CONFIG_KVM_MAX_NR_VCPUS to allow up to 4096 vCPUs")
the Linux kernel can support up to 4096 vCPUs when MAXSMP is
enabled in the kernel. QEMU has been tested to correctly boot a Linux guest
with 1728 vCPUs with both edk2 and SeaBIOS firmware. So bump up the max_cpus
value for q35 machine versions 9 and newer to 1728. Q35 machine versions
8.2 and older continue to support a maximum of 1024 vCPUs as before for
compatibility.


Where does the 1728 number come from ?

Did something break at 1729, or did the test machine simply not
have sufficient resources to do practical larger tests ?


Actual limit currently is 1856 for EDK2. The HPE folks tested QEMU with edk2 
and QEMU fails to boot beyond that limit.
There are RH internal bugs tracking this and Gerd is working on it from RH side 
[1].

We would ultimately like to go to 8192 vcpus for SAP HANA but 1728 vcpus is our 
immediate target for now. If you want, I can resend the patch with 1856 since 
that is currently the tested limit.

1. https://issues.redhat.com/browse/RHEL-22202

Out of curiosity, does the limit have to be a multiple of 64?

[PATCH v1 0/1] Qemu crashes on VM migration after an handled memory error

2024-01-30 Thread William Roche

From: William Roche 

Problem:

A Qemu VM can survive a memory error, as qemu can relay the error to the
VM kernel which could also deal with it -- poisoning/off-lining the impacted
page. This situation creates a hole in the VM memory address space (an
unreadable page or set of pages).

A migration request of this VM (live migration through the network or
pseudo-migration with the creation of a state file) will crash Qemu when
it sequentially reads the memory address space and stumbles on the
existing hole.

New fix proposal:
-----------------
Let's prevent the migration when we know that there is a poison page in
the VM address space.


History:

My first fix proposal for this crash condition (latest version:
https://lore.kernel.org/all/20231106220319.456765-1-william.ro...@oracle.com/ )
relied on a well-behaved kernel to guarantee that a known poison page is
not accessed. It introduced an ARM platform specificity.
I haven't received any feedback about the ARM specificity to avoid
a possible memory corruption after a migration transforming a poisoned
page into an all zero page.

I also accept that when a memory error leads to memory poisoning, this
platform functionality has to be honored as long as a physical platform
would provide it.

Peter asked for a complete correction of this problem (transferring
the memory holes information with the migration and recreating these
holes on the destination platform).

In the meantime, this is a very small fix to avoid the current crash
situation reading the poisoned memory pages.  I'm simply preventing
the migration when we know that it would crash, when there is a
poisoned page in the VM address space.

This is a generic protection code, avoiding a crash condition and
reporting the following error message:
"Error: Can't migrate this vm with hardware poisoned memory, please reboot the 
vm and try again"
instead of crashing the VM.

This fix is scripts/checkpatch.pl clean.
Unit tested on ARM and x86.


William Roche (1):
  migration: prevent migration when VM has poisoned memory

 accel/kvm/kvm-all.c    | 10 ++++++++++
 accel/stubs/kvm-stub.c |  5 +++++
 include/sysemu/kvm.h   |  6 ++++++
 migration/migration.c  |  7 +++++++
 4 files changed, 28 insertions(+)

-- 
2.39.3

[PATCH v1 1/1] migration: prevent migration when VM has poisoned memory

2024-01-30 Thread William Roche

From: William Roche 

A memory page poisoned from the hypervisor level is no longer readable.
The migration of a VM will crash Qemu when it tries to read the
memory address space and stumbles on the poisoned page with a similar
stack trace:

Program terminated with signal SIGBUS, Bus error.
#0  _mm256_loadu_si256
#1  buffer_zero_avx2
#2  select_accel_fn
#3  buffer_is_zero
#4  save_zero_page
#5  ram_save_target_page_legacy
#6  ram_save_host_page
#7  ram_find_and_save_block
#8  ram_save_iterate
#9  qemu_savevm_state_iterate
#10 migration_iteration_run
#11 migration_thread
#12 qemu_thread_start

To avoid this VM crash during the migration, prevent the migration
when a known hardware poison exists on the VM.

Signed-off-by: William Roche 
---
 accel/kvm/kvm-all.c    | 10 ++++++++++
 accel/stubs/kvm-stub.c |  5 +++++
 include/sysemu/kvm.h   |  6 ++++++
 migration/migration.c  |  7 +++++++
 4 files changed, 28 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 49e755ec4a..a8cecd040e 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1119,6 +1119,11 @@ int kvm_vm_check_extension(KVMState *s, unsigned int 
extension)
 return ret;
 }
 
+/*
+ * We track the poisoned pages to be able to:
+ * - replace them on VM reset
+ * - block a migration for a VM with a poisoned page
+ */
 typedef struct HWPoisonPage {
 ram_addr_t ram_addr;
 QLIST_ENTRY(HWPoisonPage) list;
@@ -1152,6 +1157,11 @@ void kvm_hwpoison_page_add(ram_addr_t ram_addr)
 QLIST_INSERT_HEAD(&hwpoison_page_list, page, list);
 }
 
+bool kvm_hwpoisoned_mem(void)
+{
+return !QLIST_EMPTY(&hwpoison_page_list);
+}
+
 static uint32_t adjust_ioeventfd_endianness(uint32_t val, uint32_t size)
 {
 #if HOST_BIG_ENDIAN != TARGET_BIG_ENDIAN
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 1b37d9a302..ca38172884 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -124,3 +124,8 @@ uint32_t kvm_dirty_ring_size(void)
 {
 return 0;
 }
+
+bool kvm_hwpoisoned_mem(void)
+{
+return false;
+}
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index d614878164..fad9a7e8ff 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -538,4 +538,10 @@ bool kvm_arch_cpu_check_are_resettable(void);
 bool kvm_dirty_ring_enabled(void);
 
 uint32_t kvm_dirty_ring_size(void);
+
+/**
+ * kvm_hwpoisoned_mem - indicate if there is any hwpoisoned page
+ * reported for the VM.
+ */
+bool kvm_hwpoisoned_mem(void);
 #endif
diff --git a/migration/migration.c b/migration/migration.c
index d5f705ceef..b574e66f7b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -67,6 +67,7 @@
 #include "options.h"
 #include "sysemu/dirtylimit.h"
 #include "qemu/sockets.h"
+#include "sysemu/kvm.h"
 
 static NotifierList migration_state_notifiers =
 NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
@@ -1906,6 +1907,12 @@ static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
 return false;
 }
 
+if (kvm_hwpoisoned_mem()) {
+error_setg(errp, "Can't migrate this vm with hardware poisoned memory, "
+   "please reboot the vm and try again");
+return false;
+}
+
 if (migration_is_blocked(errp)) {
 return false;
 }
-- 
2.39.3
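
The check added to migrate_prepare() in the patch above can be modelled outside QEMU. What follows is only a minimal Python sketch, assuming a plain list stands in for hwpoison_page_list; the PoisonTracker class and its method names are hypothetical, and treating a reset as emptying the list is a simplification of "replace them on VM reset".

```python
class PoisonTracker:
    """Toy stand-in for QEMU's hwpoison_page_list (not the real API)."""

    def __init__(self):
        self._pages = []  # RAM addresses reported as hardware-poisoned

    def add(self, ram_addr):
        # Loosely mirrors kvm_hwpoison_page_add(): insert at the head.
        self._pages.insert(0, ram_addr)

    def has_poison(self):
        # Mirrors kvm_hwpoisoned_mem(): true when the list is non-empty.
        return bool(self._pages)

    def reset(self):
        # Assumption: a VM reset replaces the poisoned pages and empties the list.
        self._pages.clear()


def migrate_prepare(tracker):
    """Return (ok, error) the way the patched migrate_prepare() would."""
    if tracker.has_poison():
        return False, ("Can't migrate this vm with hardware poisoned memory, "
                       "please reboot the vm and try again")
    return True, None
```

With this model, a single reported page is enough to refuse migration until the guest has been reset.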

[PATCH 1/4] hw/arm/stellaris: Convert ADC controller to Resettable interface

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/arm/stellaris.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/arm/stellaris.c b/hw/arm/stellaris.c
index d18b1144af..afbc83f1e6 100644
--- a/hw/arm/stellaris.c
+++ b/hw/arm/stellaris.c
@@ -773,8 +773,9 @@ static void stellaris_adc_trigger(void *opaque, int irq, int level)
 }
 }
 
-static void stellaris_adc_reset(StellarisADCState *s)
+static void stellaris_adc_reset_hold(Object *obj)
 {
+StellarisADCState *s = STELLARIS_ADC(obj);
 int n;
 
 for (n = 0; n < 4; n++) {
@@ -946,7 +947,6 @@ static void stellaris_adc_init(Object *obj)
 memory_region_init_io(&s->iomem, obj, &stellaris_adc_ops, s,
   "adc", 0x1000);
 sysbus_init_mmio(sbd, &s->iomem);
-stellaris_adc_reset(s);
 qdev_init_gpio_in(dev, stellaris_adc_trigger, 1);
 }
 
@@ -1397,7 +1397,9 @@ static const TypeInfo stellaris_i2c_info = {
 static void stellaris_adc_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
+ResettableClass *rc = RESETTABLE_CLASS(klass);
 
+rc->phases.hold = stellaris_adc_reset_hold;
 dc->vmsd = &vmstate_stellaris_adc;
 }
 
-- 
2.41.0

[PATCH 4/4] hw/arm/stellaris: Add missing QOM 'SoC' parent

QDev objects created with qdev_new() need to manually add
their parent relationship with object_property_add_child().

Since we don't model the SoC, just use a QOM container.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/arm/stellaris.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/hw/arm/stellaris.c b/hw/arm/stellaris.c
index bb88b3ebde..e349981308 100644
--- a/hw/arm/stellaris.c
+++ b/hw/arm/stellaris.c
@@ -1018,6 +1018,7 @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
  * 400fe000 system control
  */
 
+Object *soc_container;
 DeviceState *gpio_dev[7], *nvic;
 qemu_irq gpio_in[7][8];
 qemu_irq gpio_out[7][8];
@@ -1038,6 +1039,9 @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
 flash_size = (((board->dc0 & 0x) + 1) << 1) * 1024;
 sram_size = ((board->dc0 >> 18) + 1) * 1024;
 
+soc_container = object_new("container");
+object_property_add_child(OBJECT(ms), "soc", soc_container);
+
 /* Flash programming is done via the SCU, so pretend it is ROM.  */
 memory_region_init_rom(flash, NULL, "stellaris.flash", flash_size,
&error_fatal);
@@ -1052,6 +1056,7 @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
  * need its sysclk output.
  */
 ssys_dev = qdev_new(TYPE_STELLARIS_SYS);
+object_property_add_child(soc_container, "sys", OBJECT(ssys_dev));
 /* Most devices come preprogrammed with a MAC address in the user data. */
 macaddr = nd_table[0].macaddr.a;
 qdev_prop_set_uint32(ssys_dev, "user0",
@@ -1068,6 +1073,7 @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
 sysbus_realize_and_unref(SYS_BUS_DEVICE(ssys_dev), &error_fatal);
 
 nvic = qdev_new(TYPE_ARMV7M);
+object_property_add_child(soc_container, "v7m", OBJECT(nvic));
 qdev_prop_set_uint32(nvic, "num-irq", NUM_IRQ_LINES);
 qdev_prop_set_uint8(nvic, "num-prio-bits", NUM_PRIO_BITS);
 qdev_prop_set_string(nvic, "cpu-type", ms->cpu_type);
@@ -1101,6 +1107,7 @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
 
 dev = qdev_new(TYPE_STELLARIS_GPTM);
 sbd = SYS_BUS_DEVICE(dev);
+object_property_add_child(soc_container, "gptm[*]", OBJECT(dev));
 qdev_connect_clock_in(dev, "clk",
   qdev_get_clock_out(ssys_dev, "SYSCLK"));
 sysbus_realize_and_unref(sbd, &error_fatal);
@@ -1114,7 +1121,7 @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
 
 if (board->dc1 & (1 << 3)) { /* watchdog present */
 dev = qdev_new(TYPE_LUMINARY_WATCHDOG);
-
+object_property_add_child(soc_container, "wdg", OBJECT(dev));
 qdev_connect_clock_in(dev, "WDOGCLK",
   qdev_get_clock_out(ssys_dev, "SYSCLK"));
 
@@ -1154,6 +1161,7 @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
 SysBusDevice *sbd;
 
 dev = qdev_new("pl011_luminary");
+object_property_add_child(soc_container, "uart[*]", OBJECT(dev));
 sbd = SYS_BUS_DEVICE(dev);
 qdev_prop_set_chr(dev, "chardev", serial_hd(i));
 sysbus_realize_and_unref(sbd, &error_fatal);
@@ -1276,6 +1284,7 @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
 qemu_check_nic_model(&nd_table[0], "stellaris");
 
 enet = qdev_new("stellaris_enet");
+object_property_add_child(soc_container, "enet", OBJECT(enet));
 qdev_set_nic_properties(enet, &nd_table[0]);
 sysbus_realize_and_unref(SYS_BUS_DEVICE(enet), &error_fatal);
 sysbus_mmio_map(SYS_BUS_DEVICE(enet), 0, 0x40048000);
-- 
2.41.0

[PATCH v4] doc/sphinx/hxtool.py: add optional label argument to SRST directive

2024-01-30 Thread David Woodhouse

From: David Woodhouse 

We can't just embed labels directly into files like qemu-options.hx which
are included from multiple top-level rST files, because Sphinx sees the
labels as duplicate: https://github.com/sphinx-doc/sphinx/issues/9707

So add an optional argument to the SRST directive which causes a label
of the form '.. _DOCNAME-HXFILE-LABEL:' to be emitted, where 'DOCNAME'
is the name of the top level rST file, 'HXFILE' is the filename of the
.hx file, and 'LABEL' is the text provided within the 'SRST()' directive.
Using the DOCNAME of the top-level rST document means that it is unique
even when the .hx file is included from two different documents, as is
the case for qemu-options.hx

Now where the Xen PV documentation refers to the documentation for the
-initrd command line option, it can emit a link directly to it as
''.

Signed-off-by: David Woodhouse 
Reviewed-by: Paul Durrant 
Reviewed-by: Peter Maydell 
---
v4:
 • Wrap long lines to shut checkpatch up

v3:
 • Include DOCNAME in label
 • Drop emitrefs option which is no longer needed

v2:
 • Invoke parse_srst() unconditionally
 • Change emitted label to include basename of .hx file
 • Describe it in docs/devel/docs.rst


 docs/devel/docs.rst  | 12 ++--
 docs/sphinx/hxtool.py| 16 
 docs/system/i386/xen.rst |  3 ++-
 qemu-options.hx  |  2 +-
 4 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/docs/devel/docs.rst b/docs/devel/docs.rst
index 7da067905b..50ff0d67f8 100644
--- a/docs/devel/docs.rst
+++ b/docs/devel/docs.rst
@@ -30,6 +30,13 @@ nor the documentation output.
 
 ``SRST`` starts a reStructuredText section. Following lines
 are put into the documentation verbatim, and discarded from the C output.
+The alternative form ``SRST()`` is used to define a label which can be
+referenced from elsewhere in the rST documentation. The label will take
+the form ``<DOCNAME-HXFILE-LABEL>``, where ``DOCNAME`` is the name of the
+top level rST file, ``HXFILE`` is the filename of the .hx file without
+the ``.hx`` extension, and ``LABEL`` is the text provided within the
+``SRST()`` directive. For example,
+.
 
 ``ERST`` ends the documentation section started with ``SRST``,
 and switches back to a C code section.
@@ -53,8 +60,9 @@ text, but in ``hmp-commands.hx`` the C code sections are elements
 of an array of structs of type ``HMPCommand`` which define the
 name, behaviour and help text for each monitor command.
 
-In the file ``qemu-options.hx``, do not try to define a
+In the file ``qemu-options.hx``, do not try to explicitly define a
 reStructuredText label within a documentation section. This file
 is included into two separate Sphinx documents, and some
 versions of Sphinx will complain about the duplicate label
-that results.
+that results. Use the ``SRST()`` directive documented above, to
+emit an unambiguous label.
diff --git a/docs/sphinx/hxtool.py b/docs/sphinx/hxtool.py
index 9f6b9d87dc..3729084a36 100644
--- a/docs/sphinx/hxtool.py
+++ b/docs/sphinx/hxtool.py
@@ -78,6 +78,14 @@ def parse_archheading(file, lnum, line):
 serror(file, lnum, "Invalid ARCHHEADING line")
 return match.group(1)
 
+def parse_srst(file, lnum, line):
+"""Handle an SRST directive"""
+# The input should be either "SRST", or "SRST(label)".
+match = re.match(r'SRST(\((.*?)\))?', line)
+if match is None:
+serror(file, lnum, "Invalid SRST line")
+return match.group(2)
+
 class HxtoolDocDirective(Directive):
 """Extract rST fragments from the specified .hx file"""
 required_argument = 1
@@ -113,6 +121,14 @@ def run(self):
 serror(hxfile, lnum, 'expected ERST, found SRST')
 else:
 state = HxState.RST
+label = parse_srst(hxfile, lnum, line)
+if label:
+rstlist.append("", hxfile, lnum - 1)
+# Build label as _DOCNAME-HXNAME-LABEL
+hx = os.path.splitext(os.path.basename(hxfile))[0]
+refline = ".. _" + env.docname + "-" + hx + \
+"-" + label + ":"
+rstlist.append(refline, hxfile, lnum - 1)
 elif directive == 'ERST':
 if state == HxState.CTEXT:
 serror(hxfile, lnum, 'expected SRST, found ERST')
diff --git a/docs/system/i386/xen.rst b/docs/system/i386/xen.rst
index 81898768ba..46db5f34c1 100644
--- a/docs/system/i386/xen.rst
+++ b/docs/system/i386/xen.rst
@@ -132,7 +132,8 @@ The example above provides the guest kernel command line after a separator
 (" ``--`` ") on the Xen command line, and does not provide the guest kernel
 with an actual initramfs, which would need to be listed as a second multiboot
 module. For more complicated alternatives, see the command line
-documentation for the ``-initrd`` option.
+:ref:`documentation ` for the
+``-initrd`` option.
 
 Hos
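
The label construction in the hxtool.py hunk above can be exercised in isolation. The sketch below lifts the regular expression and the string concatenation from the patch into two standalone functions; the function signatures are simplified (no file/line arguments, a plain exception instead of serror()), and the docname "system/invocation" used in the usage note is an assumed value, not one stated in the patch text.

```python
import os
import re


def parse_srst(line):
    """Return the label from 'SRST(label)', or None for a bare 'SRST'."""
    match = re.match(r'SRST(\((.*?)\))?', line)
    if match is None:
        raise ValueError("Invalid SRST line")
    return match.group(2)


def build_label(docname, hxfile, label):
    """Build the rST target line '.. _DOCNAME-HXFILE-LABEL:'."""
    # Strip the directory and the .hx extension from the file name.
    hx = os.path.splitext(os.path.basename(hxfile))[0]
    return ".. _" + docname + "-" + hx + "-" + label + ":"
```

For instance, build_label("system/invocation", "../qemu-options.hx", parse_srst("SRST(initrd)")) yields a target that differs for each top-level document that includes the same .hx file, which is what avoids the duplicate-label complaint.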

[PATCH 3/4] hw/arm/stellaris: Add missing QOM 'machine' parent

QDev objects created with qdev_new() need to manually add
their parent relationship with object_property_add_child().

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/arm/stellaris.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/arm/stellaris.c b/hw/arm/stellaris.c
index 284b95005f..bb88b3ebde 100644
--- a/hw/arm/stellaris.c
+++ b/hw/arm/stellaris.c
@@ -1247,10 +1247,13 @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
&error_fatal);
 
 ssddev = qdev_new("ssd0323");
+object_property_add_child(OBJECT(ms), "oled", OBJECT(ssddev));
 qdev_prop_set_uint8(ssddev, "cs", 1);
 qdev_realize_and_unref(ssddev, bus, &error_fatal);
 
 gpio_d_splitter = qdev_new(TYPE_SPLIT_IRQ);
+object_property_add_child(OBJECT(ms), "splitter",
+  OBJECT(gpio_d_splitter));
 qdev_prop_set_uint32(gpio_d_splitter, "num-lines", 2);
 qdev_realize_and_unref(gpio_d_splitter, NULL, &error_fatal);
 qdev_connect_gpio_out(
@@ -1287,6 +1290,7 @@ static void stellaris_init(MachineState *ms, stellaris_board_info *board)
 DeviceState *gpad;
 
 gpad = qdev_new(TYPE_STELLARIS_GAMEPAD);
+object_property_add_child(OBJECT(ms), "gamepad", OBJECT(gpad));
 for (i = 0; i < ARRAY_SIZE(gpad_keycode); i++) {
 qlist_append_int(gpad_keycode_list, gpad_keycode[i]);
 }
-- 
2.41.0

[PATCH 2/4] hw/arm/stellaris: Convert I2C controller to Resettable interface

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/arm/stellaris.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/arm/stellaris.c b/hw/arm/stellaris.c
index afbc83f1e6..284b95005f 100644
--- a/hw/arm/stellaris.c
+++ b/hw/arm/stellaris.c
@@ -607,8 +607,11 @@ static void stellaris_i2c_write(void *opaque, hwaddr offset,
 stellaris_i2c_update(s);
 }
 
-static void stellaris_i2c_reset(stellaris_i2c_state *s)
+static void stellaris_i2c_reset_exit(Object *obj)
 {
+stellaris_i2c_state *s = STELLARIS_I2C(obj);
+
+/* ??? For now we only implement the master interface.  */
 if (s->mcs & STELLARIS_I2C_MCS_BUSBSY)
 i2c_end_transfer(s->bus);
 
@@ -658,8 +661,6 @@ static void stellaris_i2c_init(Object *obj)
 memory_region_init_io(&s->iomem, obj, &stellaris_i2c_ops, s,
   "i2c", 0x1000);
 sysbus_init_mmio(sbd, &s->iomem);
-/* ??? For now we only implement the master interface.  */
-stellaris_i2c_reset(s);
 }
 
 /* Analogue to Digital Converter.  This is only partially implemented,
@@ -1382,7 +1383,9 @@ type_init(stellaris_machine_init)
 static void stellaris_i2c_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
+ResettableClass *rc = RESETTABLE_CLASS(klass);
 
+rc->phases.exit = stellaris_i2c_reset_exit;
 dc->vmsd = &vmstate_stellaris_i2c;
 }
 
-- 
2.41.0

[PATCH 0/4] hw/arm/stellaris: QOM/QDev cleanups

Gustavo wants to access the QOM path of an input IRQ line
from the NVIC, but since the device is orphan he ends up
with this nasty path [*]:

  -device ivshmem-flat,chardev=ivshmem_flat,x-irq-qompath='/machine/unattached/device[1]/nvic/unnamed-gpio-in[0]',x-bus-qompath='/sysbus'

Add the missing parent so the tree is now:

(qemu) info qom-tree
/machine (lm3s6965evb-machine)
  /gamepad (stellaris-gamepad)
  /oled (ssd0323)
  /peripheral (container)
  /peripheral-anon (container)
  /soc (container)
/v7m (armv7m)
  /cpu (cortex-m3-arm-cpu)
/unnamed-gpio-in[0] (irq)
/unnamed-gpio-in[1] (irq)
/unnamed-gpio-in[2] (irq)
/unnamed-gpio-in[3] (irq)
  /cpuclk (clock)
  /nvic (armv7m_nvic)
/NMI[0] (irq)
/nvic_sysregs[0] (memory-region)
/systick-trigger[0] (irq)
/systick-trigger[1] (irq)
/unnamed-gpio-in[0] (irq)
...

[*] https://lore.kernel.org/qemu-devel/20231127052024.435743-1-gustavo.rom...@linaro.org/

Philippe Mathieu-Daudé (4):
  hw/arm/stellaris: Convert ADC controller to Resettable interface
  hw/arm/stellaris: Convert I2C controller to Resettable interface
  hw/arm/stellaris: Add missing QOM 'machine' parent
  hw/arm/stellaris: Add missing QOM 'SoC' parent

 hw/arm/stellaris.c | 30 --
 1 file changed, 24 insertions(+), 6 deletions(-)

-- 
2.41.0
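
How a parent link turns into a stable QOM path, and how a "name[*]" child gets the first free index, can be shown with a toy tree. This is a rough Python model only: the Node class is hypothetical and captures just the naming behaviour the series relies on, not QEMU's object model.

```python
class Node:
    """Toy object with a name, a parent and named children."""

    def __init__(self):
        self.name = ""
        self.parent = None
        self.children = {}

    def add_child(self, name, child):
        # A trailing "[*]" is replaced by the first unused index.
        if name.endswith("[*]"):
            stem = name[:-3]
            index = 0
            while f"{stem}[{index}]" in self.children:
                index += 1
            name = f"{stem}[{index}]"
        if name in self.children:
            raise KeyError(f"duplicate child {name}")
        child.name = name
        child.parent = self
        self.children[name] = child
        return child

    def path(self):
        # Walk up to the root, collecting names along the way.
        parts = []
        node = self
        while node.parent is not None:
            parts.append(node.name)
            node = node.parent
        return "/" + "/".join(reversed(parts))
```

Adding "soc" under the machine and "v7m" under "soc" gives the path /machine/soc/v7m, in contrast to the /machine/unattached/device[1] path quoted in the cover letter.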

Re: [PATCH] hw/pci: migration: Skip config space check for vendor specific capability during restore/load

2024-01-30 Thread Alex Williamson

On Tue, 30 Jan 2024 23:32:26 +0530
Vinayak Kale  wrote:

> Missed adding Michael, Marcel, Alex and Avihai earlier, apologies.
> 
> Regards,
> Vinayak
> 
> On 30/01/24 3:26 pm, Vinayak Kale wrote:
> > During migration, the restore operation compares the config space of
> > the PCI device against the config space captured in the migration
> > stream during the save operation. If the config space data does not
> > match, the restore operation fails.
> > 
> > The config space check is done in get_pci_config_device(). By default,
> > the VSC (vendor-specific capability) in config space is checked.
> > 
> > Ideally QEMU should not check the VSC during restore/load. This patch
> > skips the check by not setting pdev->cmask[] for VSC offsets in
> > pci_add_capability(). If cmask[] is not set for an offset, QEMU skips
> > the config space check for that offset.
> > 
> > Signed-off-by: Vinayak Kale 
> > ---
> >   hw/pci/pci.c | 7 +--
> >   1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > index 76080af580..32429109df 100644
> > --- a/hw/pci/pci.c
> > +++ b/hw/pci/pci.c
> > @@ -2485,8 +2485,11 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
> >   memset(pdev->used + offset, 0xFF, QEMU_ALIGN_UP(size, 4));
> >   /* Make capability read-only by default */
> >   memset(pdev->wmask + offset, 0, size);
> > -/* Check capability by default */
> > -memset(pdev->cmask + offset, 0xFF, size);
> > +
> > +if (cap_id != PCI_CAP_ID_VNDR) {
> > +/* Check non-vendor specific capability by default */
> > +memset(pdev->cmask + offset, 0xFF, size);
> > +}
> >   return offset;
> >   }
> > 
> 

If there is a possibility that the data within the vendor specific cap
can be consumed by the driver or diagnostic tools, then it's part of
the device ABI and should be consistent across migration.  A mismatch
can certainly cause a migration failure, but why shouldn't it?

This might be arguably ok (with more details) for a specific device,
but I don't think it can be the default given the arbitrary data
vendors can expose here.  Also, if this one, why not also the vendor
specific extended capability?  Thanks,

Alex

Re: [PATCH v3] doc/sphinx/hxtool.py: add optional label argument to SRST directive

2024-01-30 Thread David Woodhouse

On Tue, 2024-01-30 at 17:55 +, Peter Maydell wrote:
> 
> This looks good so
> Reviewed-by: Peter Maydell 

Thanks.

> but something has got mangled somewhere: patchew can't apply it:
> https://patchew.org/QEMU/4114f7204e892316d66be8f810eb5b8de4c0f75f.ca...@infradead.org/
> and patches doesn't like it either. In both cases git am barfs with
> 
> error: corrupt patch at line 23
> 
> I'm guessing it doesn't like the quoted-printable encoding.

Nah, QP really ought to be fine. The problem is that for some reason
Evolution has decided to replace space characters with non-breaking-
space characters. That is a new and strange pathology... in a version
of Evolution that I haven't updated for over a year.

I'll send a v4 with git-send-email, as checkpatch was whinging about
line lengths anyway.


Re: [PATCH 04/17] migration/multifd: Set p->running = true in the right place

2024-01-30 Thread Avihai Horon




On 30/01/2024 7:57, Peter Xu wrote:

External email: Use caution opening links or attachments


On Mon, Jan 29, 2024 at 02:20:35PM +0200, Avihai Horon wrote:

On 29/01/2024 6:17, Peter Xu wrote:


On Sun, Jan 28, 2024 at 05:43:52PM +0200, Avihai Horon wrote:

On 25/01/2024 22:57, Fabiano Rosas wrote:


Avihai Horon  writes:


The commit in the fixes line moved multifd thread creation to a
different location, but forgot to move the p->running = true assignment
as well. Thus, p->running is set to true before multifd thread is
actually created.

p->running is used in multifd_save_cleanup() to decide whether to join
the multifd thread or not.

With TLS, an error in multifd_tls_channel_connect() can lead to a
segmentation fault because p->running is true but p->thread is never
initialized, so multifd_save_cleanup() tries to join an uninitialized
thread.

Fix it by moving p->running = true assignment right after multifd thread
creation. Also move qio_channel_set_delay() to there, as this is where
it used to be originally.

Fixes: 29647140157a ("migration/tls: add support for multifd tls-handshake")
Signed-off-by: Avihai Horon 

Just for context, I haven't looked at this patch yet, but we were
planning to remove p->running altogether:

https://lore.kernel.org/r/20231110200241.20679-1-faro...@suse.de

Thanks for putting me in the picture.
I see that there has been a discussion about the multifd creation/teardown
flow.
In light of this discussion, I can already see a few problems in my series
that I didn't notice before (such as the TLS handshake thread leak).
The thread you mentioned here and some of my patches point out some problems
in multifd creation/teardown. I guess we can discuss it and see what's the
best way to solve them.

Regarding this patch, your solution indeed solves the bug that this patch
addresses, so maybe this could be dropped (or only noted in your patch).

Maybe I should also put you (and Peter) in context for this whole series --
I am writing it as preparation for adding a separate migration channel for
VFIO device migration, so VFIO devices could be migrated in parallel.
So this series tries to lay down some foundations to facilitate it.

Avihai, is the throughput the only reason that VFIO would like to have a
separate channel?

Actually, the main reason is to be able to send and load multiple VFIO
devices data in parallel.
For example, today if we have three VFIO devices, they are migrated
sequentially one after another.
This particularly hurts during the complete pre-copy phase (downtime), as
loading the VFIO data in destination involves FW interaction and resource
allocation, which takes time and simply blocks the other devices from
sending and loading their data.
Providing a separate channel and thread for each VFIO device solves this
problem and ideally reduces the VFIO contribution to downtime from sum{VFIO
device #1, ..., VFIO device #N} to max{VFIO device #1, ..., VFIO device #N}.
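
As a rough illustration of the sum-versus-max argument above, the following Python sketch compares the two cases. The per-device load times are made-up numbers, not measurements, and the function names are hypothetical.

```python
def sequential_downtime(load_times_ms):
    """Devices are sent and loaded one after another."""
    return sum(load_times_ms)


def parallel_downtime(load_times_ms):
    """Each device has its own channel and thread; the slowest one dominates."""
    return max(load_times_ms, default=0)


# Hypothetical load times for three VFIO devices, in milliseconds.
example_ms = [120, 80, 200]
```

For these example values the sequential case costs 400 ms while the ideal parallel case costs 200 ms; real gains would depend on how much of the work can actually overlap.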

I see.


I'm wondering if we can also use multifd threads to send vfio data at some
point.  Now multifd indeed is closely bound to ram pages but maybe it'll
change in the near future to take any load?

Multifd is for solving the throughput issue already. If vfio has the same
goal, IMHO it'll be good to keep them using the same thread model, instead
of managing different threads in different places.  With that, any user
setting (for example, multifd-n-threads) will naturally apply to all
components, rather than relying on yet-another vfio-migration-threads-num
parameter.

Frankly, I didn't really pay much attention to the throughput factor, and my
plan is to introduce only a single thread per device.
VFIO devices may have many GBs of data to migrate (e.g., vGPUs) and even
mlx5 VFs can have a few GBs of data.
So what you are saying here is interesting, although I didn't test such
scenario to see the actual benefit.

I am trying to think if/how this could work and I have a few concerns:
1. RAM is made of fixed-positioned pages that can be randomly read/written,
so sending these pages over multiple channels and loading them in the
destination can work pretty naturally without much overhead.
VFIO device data, on the other hand, is just an opaque stream of bytes
from QEMU's point of view. This means that if we break this data into
"packets" and send them over multiple channels, we must preserve the order
in which this data was originally read from the device and write the data
in the same order to the destination device.
I am wondering if the overhead of maintaining such order may hurt
performance, making it not worthwhile.
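
One conceivable way to keep the order, shown purely as an assumption and not as something proposed in this thread, is to tag each packet with a sequence number on the source and hold back out-of-order packets on the destination. A minimal Python sketch with a hypothetical ReorderBuffer class:

```python
import heapq


class ReorderBuffer:
    """Deliver packets in sequence order regardless of arrival order."""

    def __init__(self):
        self._next = 0       # sequence number expected next
        self._pending = []   # min-heap of (seq, payload) held back

    def push(self, seq, payload):
        """Accept one packet; return the payloads that are now deliverable."""
        heapq.heappush(self._pending, (seq, payload))
        ready = []
        while self._pending and self._pending[0][0] == self._next:
            ready.append(heapq.heappop(self._pending)[1])
            self._next += 1
        return ready
```

The buffering is exactly the overhead the concern above is about: a packet that arrives early on a fast channel cannot be written to the device until every earlier packet has arrived.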

Indeed, it seems to me VFIO migration is based on a streaming model where
there's no easy way to index a chunk of data.


Yes, you can see it here as well: 
https://elixir.bootlin.com/linux/v6.8-rc2/source/include/uapi/linux/vfio.h#L1039




Is there any background

Re: [PATCH] hw/scsi/lsi53c895a: add missing decrement of reentrancy counter

2024-01-30 Thread Helge Deller


On 1/28/24 21:22, Sven Schnelle wrote:

When the maximum count of SCRIPTS instructions is reached, the code
stops execution and returns, but fails to decrement the reentrancy
counter. This effectively renders the SCSI controller unusable
because on next entry the reentrancy counter is still above the limit.

This bug was seen on HP-UX 10.20 which seems to trigger SCRIPTS
loops.

Fixes: b987718bbb ("hw/scsi/lsi53c895a: Fix reentrancy issues in the LSI controller (CVE-2023-0330)")
Signed-off-by: Sven Schnelle 


Tested-by: Helge Deller 

Thanks!
Helge


---
  hw/scsi/lsi53c895a.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/hw/scsi/lsi53c895a.c b/hw/scsi/lsi53c895a.c
index 34e3b89287..d607a5f9fb 100644
--- a/hw/scsi/lsi53c895a.c
+++ b/hw/scsi/lsi53c895a.c
@@ -1159,6 +1159,7 @@ again:
  lsi_script_scsi_interrupt(s, LSI_SIST0_UDC, 0);
  lsi_disconnect(s);
  trace_lsi_execute_script_stop();
+reentrancy_level--;
  return;
  }
  insn = read_dword(s, s->dsp);
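
The effect of the missing decrement can be reproduced with a toy counter. The Python sketch below is not the lsi53c895a code: the ScriptEngine class, the depth limit of 8 and the budget of 100 instructions are assumptions chosen for illustration, and in this model several leaked increments are needed before the guard trips.

```python
class ScriptEngine:
    """Toy model of a re-entrancy guard with an early-return path."""

    MAX_DEPTH = 8     # assumed re-entrancy limit
    MAX_INSNS = 100   # assumed per-call instruction budget

    def __init__(self, fixed):
        self.fixed = fixed  # True models the patched behaviour
        self.level = 0      # plays the role of reentrancy_level

    def execute(self, n_insns):
        self.level += 1
        if n_insns > self.MAX_INSNS or self.level > self.MAX_DEPTH:
            # Early return: without the fix the increment is never undone.
            if self.fixed:
                self.level -= 1
            return "stopped"
        self.level -= 1
        return "ok"
```

In the unfixed model every looping script leaks one level, so a later, perfectly ordinary script is refused; with the decrement in place the counter returns to zero after each call.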

Re: [PATCH 05/17] migration/multifd: Wait for multifd channels creation before proceeding

2024-01-30 Thread Avihai Horon




On 29/01/2024 16:34, Fabiano Rosas wrote:


Avihai Horon  writes:


Currently, multifd channels are created asynchronously without waiting
for their creation -- migration simply proceeds and may wait in
multifd_send_sync_main(), which is called by ram_save_setup(). This
hides some race conditions that can cause unexpected behavior if the
creation of some channels fails.

For example, the following scenario of multifd migration with two
channels, where the first channel creation fails, will end in a
segmentation fault (time advances from top to bottom):

Is this reproducible? Or just observable at least.


Yes, though I had to engineer it a bit:
1. Run migration with two multifd channels and fail creation of the two 
channels (e.g., by changing the address they are connecting to).
2. Add sleep(3) in multifd_send_sync_main() before we loop through the 
channels and check p->quit.
3. Add sleep(5) only for the second multifd channel connect thread so 
its connection is delayed and runs last.



I acknowledge the situation you describe, but with multifd there's
usually an issue in cleanup paths. Let's make sure we flushed those out
before adding this new semaphore.


Indeed, I was not keen on adding yet another semaphore either.
I think there are multiple bugs here, some of them overlap and some don't.
There is also your and Peter's previous work that I was not aware of to 
fix those and to clean up the code.


Maybe we can take it one step at a time, pushing your series first, 
cleaning the code and fixing some bugs.
Then we can see what bugs are left (if any) and fix them. It might even 
be easier to fix after the cleanups.



This is similar to an issue Peter was addressing where we missed calling
multifd_send_terminate_threads() in the multifd_channel_connect() path:

patch 4 in this
https://lore.kernel.org/r/20231022201211.452861-1-pet...@redhat.com


What issue are you referring to here? Can you elaborate?

The main issue I am trying to fix in my patch is that we don't wait for
all multifd channels to be created/error out before tearing down
multifd resources in multifd_save_cleanup().


Thread   | Code execution

Multifd 1|
  | multifd_new_send_channel_async (errors and quits)
  |   multifd_new_send_channel_cleanup
  |
Migration thread |
  | qemu_savevm_state_setup
  |   ram_save_setup
  | multifd_send_sync_main
  | (detects Multifd 1 error and quits)
  | [...]
  | migration_iteration_finish
  |   migrate_fd_cleanup_schedule
  |
Main thread  |
  | migrate_fd_cleanup
  |   multifd_save_cleanup (destroys Multifd 2 resources)
  |
Multifd 2|
  | multifd_new_send_channel_async
  | (accesses destroyed resources, segfault)

In another scenario, migration can hang indefinitely:
1. Main migration thread reaches multifd_send_sync_main() and waits on
the semaphores.
2. Then, creation of all multifd channels fails, so they post the
semaphores and quit.
3. The main migration thread will not identify the error, will proceed to
send pages, and will hang.

Fix it by waiting for all multifd channels to be created before
proceeding with migration.
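
The idea of the fix can be sketched with ordinary threads and semaphores. This is a simplified Python model, assuming each channel posts a creation semaphore exactly once whether it succeeds or fails; the Channel class and the channels_created() helper are hypothetical names loosely modelled on create_sem and multifd_send_channels_created() from the patch.

```python
import threading


class Channel:
    """Toy multifd channel whose creation runs asynchronously."""

    def __init__(self, fail):
        self.fail = fail
        self.error = None
        self.create_sem = threading.Semaphore(0)

    def connect(self):
        # Always post create_sem, on success and on failure alike.
        try:
            if self.fail:
                raise ConnectionError("channel creation failed")
        except ConnectionError as exc:
            self.error = exc
        finally:
            self.create_sem.release()


def channels_created(channels):
    """Block until every channel finished creation; report overall success."""
    for ch in channels:
        ch.create_sem.acquire()
    return all(ch.error is None for ch in channels)
```

Because the caller returns only after every channel has posted, cleanup cannot start while a late channel is still in its creation path, which is the segfault scenario in the table above.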

Signed-off-by: Avihai Horon 
---
  migration/multifd.h   |  3 +++
  migration/migration.c |  1 +
  migration/multifd.c   | 34 +++---
  migration/ram.c   |  7 +++
  4 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 35d11f103c..87a64e0a87 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -23,6 +23,7 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
  void multifd_recv_sync_main(void);
  int multifd_send_sync_main(void);
  int multifd_queue_page(RAMBlock *block, ram_addr_t offset);
+int multifd_send_channels_created(void);

  /* Multifd Compression flags */
  #define MULTIFD_FLAG_SYNC (1 << 0)
@@ -86,6 +87,8 @@ typedef struct {
  /* multifd flags for sending ram */
  int write_flags;

+/* Syncs channel creation and migration thread */
+QemuSemaphore create_sem;
  /* sem where to wait for more work */
  QemuSemaphore sem;
  /* syncs main thread and channels */
diff --git a/migration/migration.c b/migration/migration.c
index 9c769a1ecd..d81d96eaa5 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3621,6 +3621,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
  error_report_err(local_err);
  migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
MIGRATION_STATUS_FAILED);
+multifd_send_channels_created();
  migrate_fd_cleanup(s

Re: [RFC 0/7] VIRTIO-IOMMU/VFIO: Fix host iommu geometry handling for hotplugged devices

2024-01-30 Thread Jean-Philippe Brucker

On Mon, Jan 29, 2024 at 05:38:55PM +0100, Eric Auger wrote:
> > There may be a separate argument for clearing bypass. With a coldplugged
> > VFIO device the flow is:
> >
> > 1. Map the whole guest address space in VFIO to implement boot-bypass.
> >This allocates all guest pages, which takes a while and is wasteful.
> >I've actually crashed a host that way, when spawning a guest with too
> >much RAM.
> interesting
> > 2. Start the VM
> > 3. When the virtio-iommu driver attaches a (non-identity) domain to the
> >assigned endpoint, then unmap the whole address space in VFIO, and most
> >pages are given back to the host.
> >
> > We can't disable boot-bypass because the BIOS needs it. But instead the
> > flow could be:
> >
> > 1. Start the VM, with only the virtual endpoints. Nothing to pin.
> > 2. The virtio-iommu driver disables bypass during boot
> We needed this boot-bypass mode for booting with virtio-blk-scsi
> protected with virtio-iommu for instance.
> That was needed because we don't have any virtio-iommu driver in edk2 as
> opposed to intel iommu driver, right?

Yes. What I had in mind is the x86 SeaBIOS which doesn't have any IOMMU
driver and accesses the default SATA device:

 $ qemu-system-x86_64 -M q35 -device virtio-iommu,boot-bypass=off
 qemu: virtio_iommu_translate sid=250 is not known!!
 qemu: no buffer available in event queue to report event
 qemu: AHCI: Failed to start FIS receive engine: bad FIS receive buffer address

But it's the same problem with edk2. Also a guest OS without a
virtio-iommu driver needs boot-bypass. Once firmware boot is complete, the
OS with a virtio-iommu driver normally can turn bypass off in the config
space, it's not useful anymore. If it needs to put some endpoints in
bypass, then it can attach them to a bypass domain.

> > 3. Hotplug the VFIO device. With bypass disabled there is no need to pin
> >the whole guest address space, unless the guest explicitly asks for an
> >identity domain.
> >
> > However, I don't know if this is a realistic scenario that will actually
> > be used.
> >
> > By the way, do you have an easy way to reproduce the issue described here?
> > I've had to enable iommu.forcedac=1 on the command-line, otherwise Linux
> > just allocates 32-bit IOVAs.
> I don't have a simple generic reproducer. It happens when assigning this
> device:
> Ethernet Controller E810-C for QSFP (Ethernet Network Adapter E810-C-Q2)
> 
> I have not encountered that issue with another device yet.
> I see on guest side in dmesg:
> [    6.849292] ice :00:05.0: Using 64-bit DMA addresses
> 
> That's emitted in dma-iommu.c iommu_dma_alloc_iova().
> Looks like the guest first tries to allocate an iova in the 32-bit AS
> and if this fails use the whole dma_limit.
> Seems the 32b IOVA alloc failed here ;-)

Interesting, are you running some demanding workload and a lot of CPUs?
That's a lot of IOVAs used up, I'm curious about what kind of DMA pattern
does that.

Thanks,
Jean

Re: [PATCH] hw/pci: migration: Skip config space check for vendor specific capability during restore/load

2024-01-30 Thread Vinayak Kale


Missed adding Michael, Marcel, Alex and Avihai earlier, apologies.

Regards,
Vinayak

On 30/01/24 3:26 pm, Vinayak Kale wrote:

During migration, the restore operation compares the config space of
the PCI device against the config space captured in the migration
stream during the save operation. If the config space data does not
match, the restore operation fails.

The config space check is done in get_pci_config_device(). By default,
the VSC (vendor-specific capability) in config space is checked.

Ideally QEMU should not check the VSC during restore/load. This patch
skips the check by not setting pdev->cmask[] for VSC offsets in
pci_add_capability(). If cmask[] is not set for an offset, QEMU skips
the config space check for that offset.

Signed-off-by: Vinayak Kale 
---
  hw/pci/pci.c | 7 +++++--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 76080af580..32429109df 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2485,8 +2485,11 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
     memset(pdev->used + offset, 0xFF, QEMU_ALIGN_UP(size, 4));
     /* Make capability read-only by default */
     memset(pdev->wmask + offset, 0, size);
-    /* Check capability by default */
-    memset(pdev->cmask + offset, 0xFF, size);
+
+    if (cap_id != PCI_CAP_ID_VNDR) {
+        /* Check non-vendor specific capability by default */
+        memset(pdev->cmask + offset, 0xFF, size);
+    }
     return offset;
 }
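
To illustrate why leaving cmask[] clear skips the check, here is a simplified, self-contained model of the mechanism. It is a sketch only: the toy_* names and the struct are hypothetical, and the comparison is reduced to cmask and wmask, so it should not be read as the actual body of get_pci_config_device().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_CFG_SIZE    256
#define TOY_CAP_ID_VNDR 0x09 /* vendor-specific capability ID */

/* Hypothetical stand-in for the relevant PCIDevice fields. */
struct toy_pci_dev {
    uint8_t config[TOY_CFG_SIZE];
    uint8_t cmask[TOY_CFG_SIZE]; /* bits compared on load */
    uint8_t wmask[TOY_CFG_SIZE]; /* guest-writable bits, never compared */
};

/* Register a capability; vendor-specific ones get no cmask bits. */
static void toy_add_capability(struct toy_pci_dev *d, uint8_t cap_id,
                               size_t offset, size_t size)
{
    assert(size > 0 && offset < TOY_CFG_SIZE && size <= TOY_CFG_SIZE - offset);

    d->config[offset] = cap_id;
    memset(d->wmask + offset, 0, size);
    if (cap_id != TOY_CAP_ID_VNDR) {
        memset(d->cmask + offset, 0xFF, size);
    }
}

/* Return 0 if the incoming config space is acceptable, -1 on mismatch. */
static int toy_check_config(const struct toy_pci_dev *d,
                            const uint8_t *incoming)
{
    for (size_t i = 0; i < TOY_CFG_SIZE; i++) {
        if ((incoming[i] ^ d->config[i]) & d->cmask[i] & ~d->wmask[i]) {
            return -1;
        }
    }
    return 0;
}
```

With this model, a differing byte inside a vendor-specific capability is ignored on load, while the same difference inside any other capability still fails the restore.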

Re: [PATCH v3] doc/sphinx/hxtool.py: add optional label argument to SRST directive

2024-01-30 Thread Peter Maydell

On Sat, 27 Jan 2024 at 23:18, David Woodhouse  wrote:
>
> From: David Woodhouse 
>
> We can't just embed labels directly into files like qemu-options.hx which
> are included from multiple top-level rST files, because Sphinx sees the
> labels as duplicate: https://github.com/sphinx-doc/sphinx/issues/9707
>
> So add an optional argument to the SRST directive which causes a label
> of the form '.. _DOCNAME-HXFILE-LABEL:' to be emitted, where 'DOCNAME'
> is the name of the top level rST file, 'HXFILE' is the filename of the
> .hx file, and 'LABEL' is the text provided within the 'SRST()' directive.
> Using the DOCNAME of the top-level rST document means that it is unique
> even when the .hx file is included from two different documents, as is
> the case for qemu-options.hx
>
> Now where the Xen PV documentation refers to the documentation for the
> -initrd command line option, it can emit a link directly to it as
> ''.
>
> Signed-off-by: David Woodhouse 
> Reviewed-by: Paul Durrant 
> ---

This looks good so
Reviewed-by: Peter Maydell 

but something has got mangled somewhere: patchew can't apply it:
https://patchew.org/QEMU/4114f7204e892316d66be8f810eb5b8de4c0f75f.ca...@infradead.org/
and patches doesn't like it either. In both cases git am barfs with

error: corrupt patch at line 23

I'm guessing it doesn't like the quoted-printable encoding.

thanks
-- PMM

Re: [PATCH] pc: q35: Bump max_cpus to 1728 vcpus

2024-01-30 Thread Daniel P . Berrangé

On Tue, Jan 30, 2024 at 10:39:51PM +0530, Ani Sinha wrote:
> 
> 
> > On 30-Jan-2024, at 22:17, Daniel P. Berrangé  wrote:
> > 
> > On Tue, Jan 30, 2024 at 10:14:28PM +0530, Ani Sinha wrote:
> >> Since commit f10a570b093e6 ("KVM: x86: Add CONFIG_KVM_MAX_NR_VCPUS to 
> >> allow up to 4096 vCPUs")
> >> Linux kernel can support up to a maximum number of 4096 vCPUs when MAXSMP is
> >> enabled in the kernel. QEMU has been tested to correctly boot a linux guest
> >> with 1728 vcpus both with edk2 and seabios firmwares. So bump up the 
> >> max_cpus
> >> value for q35 machines versions 9 and newer to 1728. Q35 machines versions
> >> 8.2 and older continue to support 1024 maximum vcpus as before for
> >> compatibility.
> > 
> > Where does the 1728 number come from ?
> > 
> > Did something break at 1729, or did the test machine simply not
> > have sufficient resources to do practical larger tests ?
> 
> Actual limit currently is 1856 for EDK2. The HPE folks tested QEMU with edk2 
> and QEMU fails to boot beyond that limit.
> There are RH internal bugs tracking this and Gerd is working on it from RH 
> side [1].
> 
> We would ultimately like to go to 8192 vcpus for SAP HANA but 1728 vcpus is 
> our immediate target for now. If you want, I can resend the patch with 1856 
> since that is currently the tested limit.

Yes, could you resend with 1856, and include a description of
the blocking problem in the commit message for the historical
record, as this is the kind of thing that people will have
forgotten when re-visiting the patch later and wondering why
this limit was chosen.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH v15 3/9] hw/misc: Add qtest for NPCM7xx PCI Mailbox

2024-01-30 Thread Peter Maydell

On Thu, 25 Jan 2024 at 19:42, Nabih Estefan  wrote:
>
> From: Hao Wu 
>
> This patch adds a qtest for NPCM7XX PCI Mailbox module.
> It sends read and write requests to the module, and verifies that
> the module contains the correct data after the requests.
>
> Change-Id: I2e1dbaecf8be9ec7eab55cb54f7fdeb0715b8275
> Signed-off-by: Hao Wu 
> Signed-off-by: Nabih Estefan 
> Reviewed-by: Tyrone Ting 

> +/*
> + * Create a local TCP socket with any port, then save off the port we got.
> + */
> +static in_port_t open_socket(void)

This should be "int" -- you've lost a change that I noted in
my review on v12 and made in the patchset I sent to the list
in the pullreq. (in_port_t doesn't exist on Windows.)

thanks
-- PMM

Re: [PATCH v15 1/9] hw/misc: Add Nuvoton's PCI Mailbox Module

2024-01-30 Thread Peter Maydell

On Thu, 25 Jan 2024 at 19:42, Nabih Estefan  wrote:
>
> From: Hao Wu 
>
> The PCI Mailbox Module is a high-bandwidth communication module
> between a Nuvoton BMC and CPU. It features 16KB of RAM that is
> accessible by both the BMC and the core CPU, and supports interrupts
> for both sides.
>
> This patch implements the BMC side of the PCI mailbox module.
> Communication with the core CPU is emulated via a chardev and
> will be in a follow-up patch.
>
> This patch also adds documentation on the PCIe Protocol used
> by the chardev device.
>
> Change-Id: Iaca22f81c4526927d437aa367079ed038faf43f2
> Signed-off-by: Hao Wu 
> Signed-off-by: Nabih Estefan 
> Reviewed-by: Tyrone Ting 
> ---
>  docs/specs/pci_mbox_chardev.rst| 159 ++

Sphinx insists that every .rst file under docs
is specifically listed in a table of contents somewhere.
In this case that means you should list the new file
in docs/specs/index.rst.

If you configure with '--enable-docs' that should ensure
that you have all the prerequisites to build the documentation
and you ought then to get an error message during build
about the missing toc entry problem, so you can be sure you've
fixed it correctly.

You'll also find that there are other markup syntax errors you
need to fix (eg on my system sphinx complains
"pci_mbox_chardev.rst:29:Unexpected indentation.").

You should also look at the generated HTML to make sure
it renders as you expect it to. I suspect that your ascii-art
diagrams of the protocol packets are not going to render
correctly as they stand.

>  hw/misc/meson.build|   1 +
>  hw/misc/npcm7xx_pci_mbox.c | 335 +
>  hw/misc/trace-events   |   5 +
>  include/hw/misc/npcm7xx_pci_mbox.h |  81 +++
>  5 files changed, 581 insertions(+)
>  create mode 100644 docs/specs/pci_mbox_chardev.rst
>  create mode 100644 hw/misc/npcm7xx_pci_mbox.c
>  create mode 100644 include/hw/misc/npcm7xx_pci_mbox.h
>
> diff --git a/docs/specs/pci_mbox_chardev.rst b/docs/specs/pci_mbox_chardev.rst
> new file mode 100644
> index 00..2a26e6bb8f
> --- /dev/null
> +++ b/docs/specs/pci_mbox_chardev.rst
> @@ -0,0 +1,159 @@
> +Remote PCIe Protocol
> +
> +
> +Design
> +--
> +The communication or this device is done via a chardev. It is bidirectional:

"of" ?

> +QEMU can send requests to devices and the device can send MSI/DMA requests
> +to QEMU. All registers are encoded in Little Endian.

Lower case for "little endian".

> +
> +To distinguish between the two types of messages, any message with an error
> +code described below is a response, otherwise it is a request. The remote
> +PCIe device is responsible for guaranteeing the messages sent out are
> +integrated.

What does it mean to "integrate" a message ?

> +
> +The highest bit for the first byte reflects whether a message is a request
> +or response - 0 for request and 1 for response.
> +
> +For responses, the rest of the bits reflect the error code.
> +For requests, the rest of the bit is the command code specified below.
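
A minimal sketch of the first-byte encoding described in the quoted text, assuming the top bit is bit 7 and the remaining seven bits carry the code. The mbox_* names are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MBOX_RESPONSE_BIT 0x80u /* highest bit: 0 = request, 1 = response */
#define MBOX_CODE_MASK    0x7Fu /* command code or error code */

struct mbox_header {
    bool is_response;
    uint8_t code; /* error code for responses, command code for requests */
};

/* Split the first byte of a message into its two fields. */
static struct mbox_header mbox_decode_header(uint8_t first_byte)
{
    struct mbox_header h;

    h.is_response = (first_byte & MBOX_RESPONSE_BIT) != 0;
    h.code = first_byte & MBOX_CODE_MASK;
    return h;
}

/* Build the first byte of a message from its two fields. */
static uint8_t mbox_encode_header(bool is_response, uint8_t code)
{
    assert(code <= MBOX_CODE_MASK);
    return (uint8_t)((is_response ? MBOX_RESPONSE_BIT : 0u) | code);
}
```

Under this reading, 0x01 is a request with command code 1 and 0x80 is a response with error code 0, which matches the success response shown in the tables further down.
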
> +
> +
> +Initialization
> +--
> +During initialization of the remote PCIe device in QEMU, it needs to specify
> +a few configuration parameters. The PCIe connector is responsible for
> +getting these configuration parameters and passing them in as QDev
> +properties
> +The fields include:
> + 1. PCI endpoint device identifiers 
> (google3/platforms/asic_sw/proto/device_identifiers.proto).

A google internal filename isn't much use to the rest of us :-)

> + a. Vendor ID
> + b. Device ID
> + c. Subsystem Vendor ID
> + d. Subsystem Device ID
> + e. Class Code
> + f. Subclass
> + g. Programming Interface
> + h. Revision ID
> + 2. Number of BARs and the size of each BAR
> + 3. Whether DMA is supported.
> + 4. Number of MSI vectors supported (must be power of 2, up to 32)
> +
> +Request and Response Breakdowns
> +--
> +PCI Endpoint R/W Request
> +
> +QEMU can send this request to endpoint.
> +ReadData
> +Request:
> ++------+------+--------+-----------+-----------+
> +| Byte | 0    | 0x1    | 0x2 ~ 0x9 | 0xa       |
> +| Data | 0x01 | bar_no | offset    | read_size |
> ++------+------+--------+-----------+-----------+
> +(read_size in number of bytes, must be between 1 and 8)
> +Response:
> +Success:
> ++------+------+-------------------+
> +| Byte | 0    | 0x1 ~ read_size+1 |
> +| Data | 0x80 | data              |
> ++------+------+-------------------+
> +Failure:
> ++------+-------------------+
> +| Byte | 0                 |
> +| Data | 0x80 | error_code |
> ++------+-------------------+

This doesn't seem to match what your test code does:
it sends an OP_READ 0x1 then a 4-byte offset then
a 1 byte size, with no "bar_no" field. Also there seems
to be some confusion of direction here -- the test
case is sending this command to QEMU, not receiving it
from QEMU.
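
For comparison with the test code, this is roughly what an encoder for the ReadData request would look like if it followed the table in the documentation, including the bar_no byte and the 8-byte offset. It is a sketch of the documented layout only: the function name and return convention are hypothetical, and it is not what the qtest currently sends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MBOX_OP_READ      0x01 /* command code from the documented table */
#define MBOX_READ_REQ_LEN 11   /* bytes 0x0 through 0xa */

/*
 * Encode a ReadData request as documented:
 *   byte 0       command code 0x01
 *   byte 1       bar_no
 *   bytes 2..9   offset, little endian
 *   byte 0xa     read_size (1..8)
 * Returns the number of bytes written, or 0 on invalid arguments.
 */
static size_t mbox_encode_read(uint8_t *buf, size_t buf_len, uint8_t bar_no,
                               uint64_t offset, uint8_t read_size)
{
    if (buf == NULL || buf_len < MBOX_READ_REQ_LEN ||
        read_size < 1 || read_size > 8) {
        return 0;
    }

    buf[0] = MBOX_OP_READ;
    buf[1] = bar_no;
    for (size_t i = 0; i < 8; i++) {
        buf[2 + i] = (uint8_t)(offset >> (8 * i));
    }
    buf[10] = read_size;
    return MBOX_READ_REQ_LEN;
}
```

Writing the layout out this way makes the mismatch concrete: the documented request is 11 bytes long, whereas an opcode followed by a 4-byte offset and a 1-byte size is only 6.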

You also don't seem to define what va

Re: [PATCH 1/3] hw/i386: Add `\n` to hint message

2024-01-30 Thread Greg Kurz

On Tue, 30 Jan 2024 21:43:27 +0530
Ani Sinha  wrote:

> 
> 
> > On 30-Jan-2024, at 21:26, Greg Kurz  wrote:
> > 
> > error_fprintf() doesn't add newlines.
> 
> ^
> 
> Should be error_printf(). Ditto for other patches.
> 

Thanks. Posted a v2.

> > 
> > Signed-off-by: Greg Kurz 
> > ---
> > hw/i386/acpi-build.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > index edc979379c03..e990b0ae927f 100644
> > --- a/hw/i386/acpi-build.c
> > +++ b/hw/i386/acpi-build.c
> > @@ -2697,7 +2697,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState 
> > *machine)
> > " migration may not work",
> > tables_blob->len, legacy_table_size);
> > error_printf("Try removing CPUs, NUMA nodes, memory slots"
> > - " or PCI bridges.");
> > + " or PCI bridges.\n");
> > }
> > g_array_set_size(tables_blob, legacy_table_size);
> > } else {
> > @@ -2709,7 +2709,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState 
> > *machine)
> > " migration may not work",
> > tables_blob->len, ACPI_BUILD_TABLE_SIZE / 2);
> > error_printf("Try removing CPUs, NUMA nodes, memory slots"
> > - " or PCI bridges.");
> > + " or PCI bridges.\n");
> > }
> > acpi_align_size(tables_blob, ACPI_BUILD_TABLE_SIZE);
> > }
> > -- 
> > 2.43.0
> > 
> 



-- 
Greg

Re: [PATCH v2 3/3] hw/arm: Add `\n` to hint message




> On 30-Jan-2024, at 22:07, Greg Kurz  wrote:
> 
> error_printf() doesn't add newlines.
> 
> Signed-off-by: Greg Kurz 

Reviewed-by: Ani Sinha 

> ---
> hw/arm/virt-acpi-build.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 17aeec7a6f56..48febde1ccd1 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -1008,7 +1008,7 @@ void virt_acpi_build(VirtMachineState *vms, 
> AcpiBuildTables *tables)
> " migration may not work",
> tables_blob->len, ACPI_BUILD_TABLE_SIZE / 2);
> error_printf("Try removing CPUs, NUMA nodes, memory slots"
> - " or PCI bridges.");
> + " or PCI bridges.\n");
> }
> acpi_align_size(tables_blob, ACPI_BUILD_TABLE_SIZE);
> 
> -- 
> 2.43.0
>

Re: [PATCH] pc: q35: Bump max_cpus to 1728 vcpus




> On 30-Jan-2024, at 22:17, Daniel P. Berrangé  wrote:
> 
> On Tue, Jan 30, 2024 at 10:14:28PM +0530, Ani Sinha wrote:
>> Since commit f10a570b093e6 ("KVM: x86: Add CONFIG_KVM_MAX_NR_VCPUS to allow 
>> up to 4096 vCPUs")
>> Linux kernel can support up to a maximum number of 4096 vCPUs when MAXSMP is
>> enabled in the kernel. QEMU has been tested to correctly boot a linux guest
>> with 1728 vcpus both with edk2 and seabios firmwares. So bump up the max_cpus
>> value for q35 machines versions 9 and newer to 1728. Q35 machines versions
>> 8.2 and older continue to support 1024 maximum vcpus as before for
>> compatibility.
> 
> Where does the 1728 number come from ?
> 
> Did something break at 1729, or did the test machine simply not
> have sufficient resources to do practical larger tests ?

Actual limit currently is 1856 for EDK2. The HPE folks tested QEMU with edk2 
and QEMU fails to boot beyond that limit.
There are RH internal bugs tracking this and Gerd is working on it from RH side 
[1].

We would ultimately like to go to 8192 vcpus for SAP HANA but 1728 vcpus is our 
immediate target for now. If you want, I can resend the patch with 1856 since 
that is currently the tested limit.

1. https://issues.redhat.com/browse/RHEL-22202


> 
>> 
>> If KVM is not able to support the specified number of vcpus, QEMU would
>> return the following error messages:
>> 
>> $ ./qemu-system-x86_64 -cpu host -accel kvm -machine q35 -smp 1728
>> qemu-system-x86_64: -accel kvm: warning: Number of SMP cpus requested (1728) 
>> exceeds the recommended cpus supported by KVM (12)
>> qemu-system-x86_64: -accel kvm: warning: Number of hotpluggable cpus 
>> requested (1728) exceeds the recommended cpus supported by KVM (12)
>> Number of SMP cpus requested (1728) exceeds the maximum cpus supported by 
>> KVM (1024)
>> 
>> Cc: Daniel P. Berrangé 
>> Cc: Igor Mammedov 
>> Cc: Michael S. Tsirkin 
>> Cc: Julia Suvorova 
>> Signed-off-by: Ani Sinha 
>> ---
>> hw/i386/pc_q35.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
>> index f43d5142b8..bfa627a70b 100644
>> --- a/hw/i386/pc_q35.c
>> +++ b/hw/i386/pc_q35.c
>> @@ -375,7 +375,7 @@ static void pc_q35_machine_options(MachineClass *m)
>> m->default_nic = "e1000e";
>> m->default_kernel_irqchip_split = false;
>> m->no_floppy = 1;
>> -m->max_cpus = 1024;
>> +m->max_cpus = 1728;
>> m->no_parallel = !module_object_class_by_name(TYPE_ISA_PARALLEL);
>> machine_class_allow_dynamic_sysbus_dev(m, TYPE_AMD_IOMMU_DEVICE);
>> machine_class_allow_dynamic_sysbus_dev(m, TYPE_INTEL_IOMMU_DEVICE);
>> @@ -396,6 +396,7 @@ static void pc_q35_8_2_machine_options(MachineClass *m)
>> {
>> pc_q35_9_0_machine_options(m);
>> m->alias = NULL;
>> +m->max_cpus = 1024;
>> compat_props_add(m->compat_props, hw_compat_8_2, hw_compat_8_2_len);
>> compat_props_add(m->compat_props, pc_compat_8_2, pc_compat_8_2_len);
>> }
>> -- 
>> 2.42.0
>> 
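
The compatibility pattern in the diff above (the newest machine type sets the new limit, and the 8.2 options function calls the newer one and then restores the old value) can be sketched in isolation. The toy_* names and the struct are hypothetical stand-ins for MachineClass and the pc_q35_*_machine_options() functions, and 1728 is simply the value from this version of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the MachineClass fields touched by the patch. */
struct toy_machine_class {
    int max_cpus;
    const char *alias;
};

/* Base options shared by all versions: carries the newest default. */
static void toy_q35_machine_options(struct toy_machine_class *m)
{
    m->max_cpus = 1728;
}

/* Newest versioned machine type: inherits the base and owns the alias. */
static void toy_q35_9_0_machine_options(struct toy_machine_class *m)
{
    toy_q35_machine_options(m);
    m->alias = "q35";
}

/* Older machine type: start from the newer one, then undo what changed. */
static void toy_q35_8_2_machine_options(struct toy_machine_class *m)
{
    toy_q35_9_0_machine_options(m);
    m->alias = NULL;
    m->max_cpus = 1024;
}
```

Because each older version chains to the next newer one, any machine type at or below 8.2 keeps the 1024 limit without further changes.
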
> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH v2 1/3] hw/i386: Add `\n` to hint message