Re: [PATCH 3/4] Establishing connection between any non-default source and destination pair

2022-07-19 Thread Daniel P . Berrangé
Re-adding the mailing list; please don't drop the list in
replies to discussions.

On Wed, Jul 20, 2022 at 02:08:23AM +0530, Het Gala wrote:
> 
> On 13/07/22 3:10 pm, Het Gala wrote:
> > 
> > On 16/06/22 11:09 pm, Daniel P. Berrangé wrote:
> > > On Thu, Jun 09, 2022 at 07:33:04AM +, Het Gala wrote:
> > > > i) Binding of the socket to the source IP address and port on the
> > > >     non-default interface has been implemented for the multi-FD
> > > >     connection, which was not necessary earlier because the binding
> > > >     was on the default interface itself.
> > > > 
> > > > ii) Created an end-to-end connection between all multi-FD source and
> > > >     destination pairs.
> > > > 
> > > > Suggested-by: Manish Mishra 
> > > > Signed-off-by: Het Gala 
> > > > ---
> > > >   chardev/char-socket.c   |  4 +-
> > > >   include/io/channel-socket.h | 26 ++-
> > > >   include/qemu/sockets.h  |  6 ++-
> > > >   io/channel-socket.c | 50 ++--
> > > >   migration/socket.c  | 15 +++---
> > > >   nbd/client-connection.c |  2 +-
> > > >   qemu-nbd.c  |  4 +-
> > > >   scsi/pr-manager-helper.c    |  1 +
> > > >   tests/unit/test-char.c  |  8 ++--
> > > >   tests/unit/test-io-channel-socket.c |  4 +-
> > > >   tests/unit/test-util-sockets.c  | 16 +++
> > > >   ui/input-barrier.c  |  2 +-
> > > >   ui/vnc.c    |  3 +-
> > > >   util/qemu-sockets.c | 71 -
> > > >   14 files changed, 135 insertions(+), 77 deletions(-)
> > > > 
> > > > diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> > > > index dc4e218eeb..f3725238c5 100644
> > > > --- a/chardev/char-socket.c
> > > > +++ b/chardev/char-socket.c
> > > > @@ -932,7 +932,7 @@ static int tcp_chr_connect_client_sync(Chardev *chr, Error **errp)
> > > >   QIOChannelSocket *sioc = qio_channel_socket_new();
> > > >   tcp_chr_change_state(s, TCP_CHARDEV_STATE_CONNECTING);
> > > >   tcp_chr_set_client_ioc_name(chr, sioc);
> > > > -    if (qio_channel_socket_connect_sync(sioc, s->addr, errp) < 0) {
> > > > +    if (qio_channel_socket_connect_sync(sioc, s->addr, NULL, errp) < 0) {
> > > >   tcp_chr_change_state(s, TCP_CHARDEV_STATE_DISCONNECTED);
> > > >   object_unref(OBJECT(sioc));
> > > >   return -1;
> > > > @@ -1120,7 +1120,7 @@ static void tcp_chr_connect_client_task(QIOTask *task,
> > > >   SocketAddress *addr = opaque;
> > > >   Error *err = NULL;
> > > >   -    qio_channel_socket_connect_sync(ioc, addr, &err);
> > > > +    qio_channel_socket_connect_sync(ioc, addr, NULL, &err);
> > > >     qio_task_set_error(task, err);
> > > >   }
> > > > diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
> > > > index 513c428fe4..59d5b1b349 100644
> > > > --- a/include/io/channel-socket.h
> > > > +++ b/include/io/channel-socket.h
> > > > @@ -83,41 +83,45 @@ qio_channel_socket_new_fd(int fd,
> > > >   /**
> > > >    * qio_channel_socket_connect_sync:
> > > >    * @ioc: the socket channel object
> > > > - * @addr: the address to connect to
> > > > + * @dst_addr: the destination address to connect to
> > > > + * @src_addr: the source address to be connected
> > > >    * @errp: pointer to a NULL-initialized error object
> > > >    *
> > > > - * Attempt to connect to the address @addr. This method
> > > > - * will run in the foreground so the caller will not regain
> > > > - * execution control until the connection is established or
> > > > + * Attempt to connect to the address @dst_addr with @src_addr.
> > > > + * This method will run in the foreground so the caller will not
> > > > + * regain execution control until the connection is established or
> > > >    * an error occurs.
> > > >    */
> > > >   int qio_channel_socket_connect_sync(QIOChannelSocket *ioc,
> > > > -    SocketAddress *addr,
> > > > +    SocketAddress *dst_addr,
> > > > +    SocketAddress *src_addr,
> > > >   Error **errp);
> > > >     /**
> > > >    * qio_channel_socket_connect_async:
> > > >    * @ioc: the socket channel object
> > > > - * @addr: the address to connect to
> > > > + * @dst_addr: the destination address to connect to
> > > >    * @callback: the function to invoke on completion
> > > >    * @opaque: user data to pass to @callback
> > > >    * @destroy: the function to free @opaque
> > > >    * @context: the context to run the async task. If %NULL, the default
> > > >    *   context will be used.
> > > > + * @src_addr: the source address to be connected
> > > >    *
> > > > - * Attempt to connect to the address @addr. This method
> > > > - * will run in the background so the caller will regain
> > > > + * Attempt to connect
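[The archived message is truncated here. The core of the change above is binding the client socket to a chosen source address before connecting, so traffic leaves via a non-default interface. A minimal standalone sketch of that pattern with plain POSIX sockets — illustrative only, not the QEMU implementation — looks like:]

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Connect to dst_ip:dst_port, first binding the socket to src_ip
 * (port 0 = any ephemeral port). This mirrors the idea of the patch:
 * a bind() before connect() selects the source address instead of
 * letting the kernel pick the default one.
 * Returns the connected fd, or -1 on error.
 */
static int connect_with_source(const char *dst_ip, int dst_port,
                               const char *src_ip)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }

    struct sockaddr_in src = { .sin_family = AF_INET, .sin_port = 0 };
    inet_pton(AF_INET, src_ip, &src.sin_addr);
    if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in dst = { .sin_family = AF_INET,
                               .sin_port = htons(dst_port) };
    inet_pton(AF_INET, dst_ip, &dst.sin_addr);
    if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

In the patch the analogous work happens inside qio_channel_socket_connect_sync() when a non-NULL src_addr is passed.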

[PATCH 1/1] docs: pcie: describe PCIe option ROMs

2022-07-19 Thread Heinrich Schuchardt
Provide a description of the options that control the emulation of option
ROMs for PCIe devices.

Signed-off-by: Heinrich Schuchardt 
---
 docs/pcie.txt | 25 +
 1 file changed, 25 insertions(+)

diff --git a/docs/pcie.txt b/docs/pcie.txt
index 89e3502075..a22c1f69f7 100644
--- a/docs/pcie.txt
+++ b/docs/pcie.txt
@@ -292,6 +292,31 @@ PCI-PCI Bridge slots can be used for legacy PCI host devices.
 If you can see the "Express Endpoint" capability in the
 output, then the device is indeed PCI Express.
 
+8. Option ROM
+=============
+PCIe devices may provide an option ROM. The following properties control the
+emulation of the option ROM:
+
+``rombar`` (default: ``1``)
+  Specifies that an option ROM is available. If set to ``0``, no option ROM
+  is present.
+
+``romsize`` (default: ``-1``)
+  Specifies the size of the option ROM in bytes. The value must be either
+  ``-1`` or a power of two. ``-1`` signifies unlimited size.
+
+``romfile``
+  Defines the name of the file to be loaded as option ROM.
+  Some devices like virtio-net-pci define a default file name.
+  The file size must neither exceed 2 GiB nor ``romsize``.
+
+Some QEMU PCIe devices like virtio-net-pci use an option ROM by default. In the
+following example the option ROM of a virtio-net-pci device is disabled. This
+is useful for architectures where QEMU does not supply an option ROM file.
+
+.. code-block:: console
+
+-device virtio-net-pci,netdev=eth1,mq=on,rombar=0
 
 7. Virtio devices
 =================
-- 
2.36.1
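[The ``romsize`` constraint documented above — the value must be ``-1`` or a power of two — can be expressed with the standard single-bit-set test. A small illustrative sketch, independent of the actual QEMU validation code:]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * A romsize value is acceptable when it is -1 (unlimited) or a power
 * of two. n > 0 && (n & (n - 1)) == 0 is the usual power-of-two test:
 * a power of two has exactly one bit set, so n & (n - 1) clears it.
 */
static bool romsize_is_valid(int64_t romsize)
{
    if (romsize == -1) {
        return true;
    }
    return romsize > 0 && (romsize & (romsize - 1)) == 0;
}
```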




Re: [PATCH v2] docs: Add caveats for Windows as the build platform

2022-07-19 Thread Paolo Bonzini
Queued, thanks.

Paolo





Re: [PULL 00/24] Net Patches

2022-07-19 Thread Jason Wang
On Wed, Jul 20, 2022 at 2:03 PM Eugenio Perez Martin
 wrote:
>
> On Wed, Jul 20, 2022 at 5:40 AM Jason Wang  wrote:
> >
> > On Wed, Jul 20, 2022 at 12:40 AM Peter Maydell  
> > wrote:
> > >
> > > On Tue, 19 Jul 2022 at 14:17, Jason Wang  wrote:
> > > >
> > > > The following changes since commit 
> > > > f9d9fff72eed03acde97ea2d66104748dc474b2e:
> > > >
> > > >   Merge tag 'qemu-sparc-20220718' of https://github.com/mcayland/qemu 
> > > > into staging (2022-07-19 09:57:13 +0100)
> > > >
> > > > are available in the git repository at:
> > > >
> > > >   https://github.com/jasowang/qemu.git tags/net-pull-request
> > > >
> > > > for you to fetch changes up to f8a9fd7b7ab6601b76e253bbcbfe952f8c1887ec:
> > > >
> > > >   net/colo.c: fix segmentation fault when packet is not parsed 
> > > > correctly (2022-07-19 21:05:20 +0800)
> > > >
> > > > 
> > > >
> > > > 
> > >
> > > Fails to build, many platforms:
> > >
> > > eg
> > > https://gitlab.com/qemu-project/qemu/-/jobs/2742242194
> > >
> > > libcommon.fa.p/net_vhost-vdpa.c.o: In function `vhost_vdpa_cvq_unmap_buf':
> > > /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:234: undefined
> > > reference to `vhost_iova_tree_find_iova'
> > > /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:242: undefined
> > > reference to `vhost_vdpa_dma_unmap'
> > > /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:247: undefined
> > > reference to `vhost_iova_tree_remove'
> > > libcommon.fa.p/net_vhost-vdpa.c.o: In function `vhost_vdpa_cleanup':
> > > /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:163: undefined
> > > reference to `vhost_iova_tree_delete'
> > > libcommon.fa.p/net_vhost-vdpa.c.o: In function `vhost_vdpa_cvq_map_buf':
> > > /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:285: undefined
> > > reference to `vhost_iova_tree_map_alloc'
> > > /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:291: undefined
> > > reference to `vhost_vdpa_dma_map'
> > > /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:300: undefined
> > > reference to `vhost_iova_tree_remove'
> > > libcommon.fa.p/net_vhost-vdpa.c.o: In function
> > > `vhost_vdpa_net_handle_ctrl_avail':
> > > /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:445: undefined
> > > reference to `vhost_svq_push_elem'
> > > /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:408: undefined
> > > reference to `vhost_svq_add'
> > > /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:422: undefined
> > > reference to `vhost_svq_poll'
> > > /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:434: undefined
> > > reference to `virtio_net_handle_ctrl_iov'
> > > libcommon.fa.p/net_vhost-vdpa.c.o: In function `net_init_vhost_vdpa':
> > > /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:611: undefined
> > > reference to `vhost_iova_tree_new'
> > > libcommon.fa.p/net_vhost-vdpa.c.o: In function
> > > `glib_autoptr_cleanup_VhostIOVATree':
> > > /builds/qemu-project/qemu/hw/virtio/vhost-iova-tree.h:20: undefined
> > > reference to `vhost_iova_tree_delete'
> > > collect2: error: ld returned 1 exit status
> > > [2436/4108] Compiling C object
> > > libqemu-s390x-softmmu.fa.p/meson-generated_.._qapi_qapi-introspect.c.o
> > >
> > >
> > >
> > > Presumably the conditions in the various meson.build files are
> > > out of sync about when to build the net/vhost-vdpa.c code vs
> > > the code that's implementing the functions it's trying to call.
> > >
> > > Specifically, the functions being called will only be present
> > > if the target architecture has CONFIG_VIRTIO, which isn't
> > > guaranteed, but we try to link the vhost-vdpa code in anyway.
> >
> > Right, this is probably because vhost-vdpa started to use virtio logic (cvq).
> >
> > Eugenio, please fix this and I will send a new version of the pull request.
> >
>
> Is the right solution to build vhost-vdpa.c only if CONFIG_VIRTIO_NET
> is defined?

If you mean net/vhost-vdpa.c, then I think so, since we're using cvq logic
from virtio-net.c.

>
> It would make it the same as vhost_net_user in net/meson.build:
> if have_vhost_net_user
>   softmmu_ss.add(when: 'CONFIG_VIRTIO_NET', if_true:
> files('vhost-user.c'), if_false: files('vhost-user-stub.c'))
>   softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost-user-stub.c'))
> endif
>
> vs
>
> if have_vhost_net_vdpa
>   softmmu_ss.add(files('vhost-vdpa.c'))
> endif
>
> Or would that be considered a regression?

Probably not, since the compilation is not broken.

> The other solution would
> be to add vhost-shadow-virtqueue-stub.c and make these functions
> return -ENOTSUP and similar.

Either should be fine, just choose the one that is easier.

Thanks

>
> Thanks!
>




Re: [PULL 00/24] Net Patches

2022-07-19 Thread Eugenio Perez Martin
On Wed, Jul 20, 2022 at 5:40 AM Jason Wang  wrote:
>
> On Wed, Jul 20, 2022 at 12:40 AM Peter Maydell  
> wrote:
> >
> > On Tue, 19 Jul 2022 at 14:17, Jason Wang  wrote:
> > >
> > > The following changes since commit 
> > > f9d9fff72eed03acde97ea2d66104748dc474b2e:
> > >
> > >   Merge tag 'qemu-sparc-20220718' of https://github.com/mcayland/qemu 
> > > into staging (2022-07-19 09:57:13 +0100)
> > >
> > > are available in the git repository at:
> > >
> > >   https://github.com/jasowang/qemu.git tags/net-pull-request
> > >
> > > for you to fetch changes up to f8a9fd7b7ab6601b76e253bbcbfe952f8c1887ec:
> > >
> > >   net/colo.c: fix segmentation fault when packet is not parsed correctly 
> > > (2022-07-19 21:05:20 +0800)
> > >
> > > 
> > >
> > > 
> >
> > Fails to build, many platforms:
> >
> > eg
> > https://gitlab.com/qemu-project/qemu/-/jobs/2742242194
> >
> > libcommon.fa.p/net_vhost-vdpa.c.o: In function `vhost_vdpa_cvq_unmap_buf':
> > /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:234: undefined
> > reference to `vhost_iova_tree_find_iova'
> > /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:242: undefined
> > reference to `vhost_vdpa_dma_unmap'
> > /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:247: undefined
> > reference to `vhost_iova_tree_remove'
> > libcommon.fa.p/net_vhost-vdpa.c.o: In function `vhost_vdpa_cleanup':
> > /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:163: undefined
> > reference to `vhost_iova_tree_delete'
> > libcommon.fa.p/net_vhost-vdpa.c.o: In function `vhost_vdpa_cvq_map_buf':
> > /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:285: undefined
> > reference to `vhost_iova_tree_map_alloc'
> > /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:291: undefined
> > reference to `vhost_vdpa_dma_map'
> > /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:300: undefined
> > reference to `vhost_iova_tree_remove'
> > libcommon.fa.p/net_vhost-vdpa.c.o: In function
> > `vhost_vdpa_net_handle_ctrl_avail':
> > /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:445: undefined
> > reference to `vhost_svq_push_elem'
> > /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:408: undefined
> > reference to `vhost_svq_add'
> > /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:422: undefined
> > reference to `vhost_svq_poll'
> > /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:434: undefined
> > reference to `virtio_net_handle_ctrl_iov'
> > libcommon.fa.p/net_vhost-vdpa.c.o: In function `net_init_vhost_vdpa':
> > /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:611: undefined
> > reference to `vhost_iova_tree_new'
> > libcommon.fa.p/net_vhost-vdpa.c.o: In function
> > `glib_autoptr_cleanup_VhostIOVATree':
> > /builds/qemu-project/qemu/hw/virtio/vhost-iova-tree.h:20: undefined
> > reference to `vhost_iova_tree_delete'
> > collect2: error: ld returned 1 exit status
> > [2436/4108] Compiling C object
> > libqemu-s390x-softmmu.fa.p/meson-generated_.._qapi_qapi-introspect.c.o
> >
> >
> >
> > Presumably the conditions in the various meson.build files are
> > out of sync about when to build the net/vhost-vdpa.c code vs
> > the code that's implementing the functions it's trying to call.
> >
> > Specifically, the functions being called will only be present
> > if the target architecture has CONFIG_VIRTIO, which isn't
> > guaranteed, but we try to link the vhost-vdpa code in anyway.
>
> Right, this is probably because vhost-vdpa started to use virtio logic (cvq).
>
> Eugenio, please fix this and I will send a new version of the pull request.
>

Is the right solution to build vhost-vdpa.c only if CONFIG_VIRTIO_NET
is defined?

It would make it the same as vhost_net_user in net/meson.build:
if have_vhost_net_user
  softmmu_ss.add(when: 'CONFIG_VIRTIO_NET', if_true:
files('vhost-user.c'), if_false: files('vhost-user-stub.c'))
  softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost-user-stub.c'))
endif

vs

if have_vhost_net_vdpa
  softmmu_ss.add(files('vhost-vdpa.c'))
endif

Or would that be considered a regression? The other solution would
be to add vhost-shadow-virtqueue-stub.c and make these functions
return -ENOTSUP and similar.

Thanks!
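[The stub approach suggested above would replace the shadow-virtqueue entry points with functions that simply report lack of support, so non-CONFIG_VIRTIO builds link cleanly. A hypothetical sketch — the real stub file, types, and signatures may differ:]

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-ins for shadow-virtqueue entry points when the
 * virtio code is not built in: every operation fails with -ENOTSUP so
 * callers can degrade gracefully instead of hitting link errors.
 */
struct vhost_svq;   /* opaque; the real type would live in a header */

static int vhost_svq_add_stub(struct vhost_svq *svq)
{
    (void)svq;
    return -ENOTSUP;
}

static int vhost_svq_poll_stub(struct vhost_svq *svq)
{
    (void)svq;
    return -ENOTSUP;
}
```

This mirrors how vhost-user-stub.c is wired up in net/meson.build in the quoted snippet.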




Re: [PULL v2 01/31] hw/ssi: Add Ibex SPI device model

2022-07-19 Thread Alistair Francis
On Fri, May 13, 2022 at 2:37 AM Peter Maydell  wrote:
>
> On Fri, 22 Apr 2022 at 01:40, Alistair Francis
>  wrote:
> >
> > From: Wilfred Mallawa 
> >
> > Adds the SPI_HOST device model for ibex. The device specification is as per
> > [1]. The model has been tested on opentitan with spi_host unit tests
> > written for TockOS.
> >
> > [1] https://docs.opentitan.org/hw/ip/spi_host/doc/
>
>
> Hi; Coverity points out a bug in this code (CID 1488107):
>
> > +REG32(STATUS, 0x14)
> > +FIELD(STATUS, TXQD, 0, 8)
> > +FIELD(STATUS, RXQD, 18, 8)
>
> RXQD isn't at the bottom of this register, so the R_STATUS_RXQD_MASK
> define is a shifted-up mask...
>
> > +static void ibex_spi_host_transfer(IbexSPIHostState *s)
> > +{
> > +uint32_t rx, tx;
> > +/* Get num of one byte transfers */
> > +uint8_t segment_len = ((s->regs[IBEX_SPI_HOST_COMMAND] & R_COMMAND_LEN_MASK)
> >   >> R_COMMAND_LEN_SHIFT);
> > +while (segment_len > 0) {
> > +if (fifo8_is_empty(&s->tx_fifo)) {
> > +/* Assert Stall */
> > +s->regs[IBEX_SPI_HOST_STATUS] |= R_STATUS_TXSTALL_MASK;
> > +break;
> > +} else if (fifo8_is_full(&s->rx_fifo)) {
> > +/* Assert Stall */
> > +s->regs[IBEX_SPI_HOST_STATUS] |= R_STATUS_RXSTALL_MASK;
> > +break;
> > +} else {
> > +tx = fifo8_pop(&s->tx_fifo);
> > +}
> > +
> > +rx = ssi_transfer(s->ssi, tx);
> > +
> > +trace_ibex_spi_host_transfer(tx, rx);
> > +
> > +if (!fifo8_is_full(&s->rx_fifo)) {
> > +fifo8_push(&s->rx_fifo, rx);
> > +} else {
> > +/* Assert RXFULL */
> > +s->regs[IBEX_SPI_HOST_STATUS] |= R_STATUS_RXFULL_MASK;
> > +}
> > +--segment_len;
> > +}
> > +
> > +/* Assert Ready */
> > +s->regs[IBEX_SPI_HOST_STATUS] |= R_STATUS_READY_MASK;
> > +/* Set RXQD */
> > +s->regs[IBEX_SPI_HOST_STATUS] &= ~R_STATUS_RXQD_MASK;
> > +s->regs[IBEX_SPI_HOST_STATUS] |= (R_STATUS_RXQD_MASK
> > +& div4_round_up(segment_len));
>
> ...but here we don't shift div4_round_up(segment_len) up to the
> right place before ORing it with the MASK, so Coverity points
> out that the result is always zero.
>
> > +/* Set TXQD */
> > +s->regs[IBEX_SPI_HOST_STATUS] &= ~R_STATUS_TXQD_MASK;
> > +s->regs[IBEX_SPI_HOST_STATUS] |= (fifo8_num_used(&s->tx_fifo) / 4)
> > +& R_STATUS_TXQD_MASK;
>
> This has the same issue, but avoids it by luck because TXQD
> does start at bit 0.
>
> Since we're using the FIELD() macros, it would be clearer to
> write all this to use FIELD_DP32() rather than manual
> bit operations to clear the bits and then OR in the new ones.
> (True here and also in what looks like several other places
> through out the file, for deposit and extract operations.)

Thanks Peter,

Wilfred is looking into it and should be sending patches soon.

Alistair

>
> thanks
> -- PMM
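[The Coverity finding above — ORing an unshifted value with a shifted-up mask — can be reproduced in isolation. A standalone sketch of the buggy pattern and the deposit-style fix, with constants modeled on the register layout quoted above rather than the actual QEMU macros:]

```c
#include <assert.h>
#include <stdint.h>

/* STATUS register layout from the patch: TXQD at bits 0..7, RXQD at 18..25. */
#define R_STATUS_RXQD_SHIFT 18
#define R_STATUS_RXQD_MASK  (0xffu << R_STATUS_RXQD_SHIFT)

/* Buggy update: the value is never shifted, so ANDing it with the
 * shifted-up mask yields 0 for any rxqd < 2^18 — RXQD stays cleared. */
static uint32_t set_rxqd_buggy(uint32_t status, uint32_t rxqd)
{
    status &= ~R_STATUS_RXQD_MASK;
    status |= R_STATUS_RXQD_MASK & rxqd;
    return status;
}

/* Deposit-style fix, in the spirit of FIELD_DP32(): shift, then mask. */
static uint32_t set_rxqd_fixed(uint32_t status, uint32_t rxqd)
{
    status &= ~R_STATUS_RXQD_MASK;
    status |= (rxqd << R_STATUS_RXQD_SHIFT) & R_STATUS_RXQD_MASK;
    return status;
}
```

The TXQD update in the patch only escapes this bug because its field happens to start at bit 0.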



Re: [PATCH] docs: List kvm as a supported accelerator on RISC-V

2022-07-19 Thread Alistair Francis
On Tue, Jul 19, 2022 at 11:37 PM Bin Meng  wrote:
>
> Since commit fbf43c7dbf18 ("target/riscv: enable riscv kvm accel"),
> KVM accelerator is supported on RISC-V. Let's document it.
>
> Signed-off-by: Bin Meng 

Thanks!

Applied to riscv-to-apply.next

Alistair

> ---
>
>  docs/about/build-platforms.rst | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/docs/about/build-platforms.rst b/docs/about/build-platforms.rst
> index ebde20f981..118a4c64dc 100644
> --- a/docs/about/build-platforms.rst
> +++ b/docs/about/build-platforms.rst
> @@ -46,7 +46,7 @@ Those hosts are officially supported, with various accelerators:
> * - PPC
>   - kvm, tcg
> * - RISC-V
> - - tcg
> + - kvm, tcg
> * - s390x
>   - kvm, tcg
> * - SPARC
> --
> 2.25.1
>
>



Re: [PATCH] target/riscv: Support SW update of PTE A/D bits and Ssptwad extension

2022-07-19 Thread Alistair Francis
On Wed, Jul 20, 2022 at 1:52 PM Anup Patel  wrote:
>
> On Wed, Jul 20, 2022 at 5:02 AM Jim Shu  wrote:
> >
> > Hi Anup,
> >
> > Do you think it is OK if we only use ssptwad as a CPU option name
> > but not a RISC-V extension? just like MMU and PMP options of RISC-V.
> > (And we could change it to RISC-V extension in the future
> > if Ssptwad becomes the formal RISC-V extension)
> >
> > Both HW/SW update schemes are already defined in RISC-V priv spec,
> > so I think it's reasonable to implement them in QEMU. The only issue here is
> > to choose a proper CPU option name to turn on/off HW update of A/D bits.
>
> I am not saying that we should avoid implementing it in QEMU.
>
> My suggestion is to differentiate HW optionalities from ISA extensions
> in QEMU. For example, instead of referring to this as Ssptwad, we should
> call it "hw_ptwad" or "opt_ptwad" and don't use the "ext_" prefix.
>
> @Alistair Francis might have better suggestions on this ?

I'm just trying to wrap my head around this.

So the priv spec has this line:

"Two schemes to manage the A and D bits are permitted"

The first scheme just raises a page-fault exception, the second scheme
updates the A/D bits as you would expect.

The profiles spec then states that you must do the second scheme, as
that is what all software expects.

This patch is adding optional support for the first scheme.

Why do we want to support that? We can do either and we are
implementing the much more usual scheme. I don't see a reason to
bother implementing the other one. Is anyone ever going to use it?

Alistair
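[The two schemes discussed above can be modeled compactly. An illustrative-only sketch — not the QEMU page-table walker — of how the SW scheme turns a needed A/D update into a page fault while the HW scheme performs the update:]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_A (1u << 6)   /* accessed */
#define PTE_D (1u << 7)   /* dirty */

/*
 * Model of the two A/D management schemes from the priv spec.
 * Returns true if the access may proceed, false if a page fault must
 * be raised. With hw_update enabled, the needed bits are set as a
 * side effect (a real implementation would do this atomically).
 */
static bool access_pte(uint32_t *pte, bool is_store, bool hw_update)
{
    uint32_t need = PTE_A | (is_store ? PTE_D : 0);
    if ((*pte & need) == need) {
        return true;        /* nothing to update */
    }
    if (!hw_update) {
        return false;       /* scheme 1: SW handles it via page fault */
    }
    *pte |= need;           /* scheme 2: HW sets A/D itself */
    return true;
}
```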



Re: [PATCH] target/riscv: Support SW update of PTE A/D bits and Ssptwad extension

2022-07-19 Thread Anup Patel
On Wed, Jul 20, 2022 at 5:02 AM Jim Shu  wrote:
>
> Hi Anup,
>
> Do you think it is OK if we only use ssptwad as a CPU option name
> but not a RISC-V extension? just like MMU and PMP options of RISC-V.
> (And we could change it to RISC-V extension in the future
> if Ssptwad becomes the formal RISC-V extension)
>
> Both HW/SW update schemes are already defined in RISC-V priv spec,
> so I think it's reasonable to implement them in QEMU. The only issue here is
> to choose a proper CPU option name to turn on/off HW update of A/D bits.

I am not saying that we should avoid implementing it in QEMU.

My suggestion is to differentiate HW optionalities from ISA extensions
in QEMU. For example, instead of referring to this as Ssptwad, we should
call it "hw_ptwad" or "opt_ptwad" and don't use the "ext_" prefix.

@Alistair Francis might have better suggestions on this ?

Regards,
Anup

>
> Regards,
> Jim Shu
>
> On Mon, Jul 18, 2022 at 12:02 PM Anup Patel  wrote:
>>
>> +Atish
>>
>> On Mon, Jul 18, 2022 at 9:23 AM Jim Shu  wrote:
>> >
>> > RISC-V priv spec v1.12 permits 2 PTE-update schemes of A/D-bit
>> > (Access/Dirty bit): HW update or SW update. RISC-V profile defines the
>> > extension name 'Ssptwad' for HW update to PTE A/D bits.
>> > https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc
>>
>> The Ssptwad (even though used by profiles) is not a well defined RISC-V
>> ISA extension. Rather, Ssptwad is just a name used in profiles to represent
>> an optional feature.
>>
>> In fact, since it is not a well-defined ISA extension, QEMU cannot include
>> Ssptwad in the ISA string as well.
>>
>> I think for such optionalities which are not well-defined ISA extensions,
>> QEMU should treat it differently.
>>
>> Regards,
>> Anup
>>
>> >
>> > Current QEMU RISC-V implements HW update scheme, so this commit
>> > introduces SW update scheme to QEMU and uses the 'Ssptwad' extension
>> > as the CPU option to select 2 PTE-update schemes. QEMU RISC-V CPU still
>> > uses HW update scheme (ext_ssptwad=true) by default to keep the backward
>> > compatibility.
>> >
>> > SW update scheme implementation is based on priv spec v1.12:
>> > "When a virtual page is accessed and the A bit is clear, or is written
>> > and the D bit is clear, a page-fault exception (corresponding to the
>> > original access type) is raised."
>> >
>> > Signed-off-by: Jim Shu 
>> > Reviewed-by: Frank Chang 
>> > ---
>> >  target/riscv/cpu.c| 2 ++
>> >  target/riscv/cpu.h| 1 +
>> >  target/riscv/cpu_helper.c | 9 +
>> >  3 files changed, 12 insertions(+)
>> >
>> > diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
>> > index 1bb3973806..1d38c1c1d2 100644
>> > --- a/target/riscv/cpu.c
>> > +++ b/target/riscv/cpu.c
>> > @@ -857,6 +857,7 @@ static void riscv_cpu_init(Object *obj)
>> >
>> >  cpu->cfg.ext_ifencei = true;
>> >  cpu->cfg.ext_icsr = true;
>> > +cpu->cfg.ext_ssptwad = true;
>> >  cpu->cfg.mmu = true;
>> >  cpu->cfg.pmp = true;
>> >
>> > @@ -900,6 +901,7 @@ static Property riscv_cpu_extensions[] = {
>> >  DEFINE_PROP_BOOL("svinval", RISCVCPU, cfg.ext_svinval, false),
>> >  DEFINE_PROP_BOOL("svnapot", RISCVCPU, cfg.ext_svnapot, false),
>> >  DEFINE_PROP_BOOL("svpbmt", RISCVCPU, cfg.ext_svpbmt, false),
>> > +DEFINE_PROP_BOOL("ssptwad", RISCVCPU, cfg.ext_ssptwad, true),
>> >
>> >  DEFINE_PROP_BOOL("zba", RISCVCPU, cfg.ext_zba, true),
>> >  DEFINE_PROP_BOOL("zbb", RISCVCPU, cfg.ext_zbb, true),
>> > diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
>> > index 5c7acc055a..2eee59af98 100644
>> > --- a/target/riscv/cpu.h
>> > +++ b/target/riscv/cpu.h
>> > @@ -433,6 +433,7 @@ struct RISCVCPUConfig {
>> >  bool ext_zve32f;
>> >  bool ext_zve64f;
>> >  bool ext_zmmul;
>> > +bool ext_ssptwad;
>> >  bool rvv_ta_all_1s;
>> >
>> >  uint32_t mvendorid;
>> > diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
>> > index 59b3680b1b..a8607c0d7b 100644
>> > --- a/target/riscv/cpu_helper.c
>> > +++ b/target/riscv/cpu_helper.c
>> > @@ -981,6 +981,15 @@ restart:
>> >
>> >  /* Page table updates need to be atomic with MTTCG enabled */
>> >  if (updated_pte != pte) {
>> > +if (!cpu->cfg.ext_ssptwad) {
>> > +/*
>> > + * If A/D bits are managed by SW, HW just raises the
>> > + * page fault exception corresponding to the original
>> > + * access type when A/D bits need to be updated.
>> > + */
>> > +return TRANSLATE_FAIL;
>> > +}
>> > +
>> >  /*
>> >   * - if accessed or dirty bits need updating, and the PTE is
>> >   *   in RAM, then we do so atomically with a compare and swap.
>> > --
>> > 2.17.1
>> >
>> >



Re: [PULL 00/24] Net Patches

2022-07-19 Thread Jason Wang
On Wed, Jul 20, 2022 at 12:40 AM Peter Maydell  wrote:
>
> On Tue, 19 Jul 2022 at 14:17, Jason Wang  wrote:
> >
> > The following changes since commit f9d9fff72eed03acde97ea2d66104748dc474b2e:
> >
> >   Merge tag 'qemu-sparc-20220718' of https://github.com/mcayland/qemu into 
> > staging (2022-07-19 09:57:13 +0100)
> >
> > are available in the git repository at:
> >
> >   https://github.com/jasowang/qemu.git tags/net-pull-request
> >
> > for you to fetch changes up to f8a9fd7b7ab6601b76e253bbcbfe952f8c1887ec:
> >
> >   net/colo.c: fix segmentation fault when packet is not parsed correctly 
> > (2022-07-19 21:05:20 +0800)
> >
> > 
> >
> > 
>
> Fails to build, many platforms:
>
> eg
> https://gitlab.com/qemu-project/qemu/-/jobs/2742242194
>
> libcommon.fa.p/net_vhost-vdpa.c.o: In function `vhost_vdpa_cvq_unmap_buf':
> /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:234: undefined
> reference to `vhost_iova_tree_find_iova'
> /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:242: undefined
> reference to `vhost_vdpa_dma_unmap'
> /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:247: undefined
> reference to `vhost_iova_tree_remove'
> libcommon.fa.p/net_vhost-vdpa.c.o: In function `vhost_vdpa_cleanup':
> /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:163: undefined
> reference to `vhost_iova_tree_delete'
> libcommon.fa.p/net_vhost-vdpa.c.o: In function `vhost_vdpa_cvq_map_buf':
> /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:285: undefined
> reference to `vhost_iova_tree_map_alloc'
> /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:291: undefined
> reference to `vhost_vdpa_dma_map'
> /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:300: undefined
> reference to `vhost_iova_tree_remove'
> libcommon.fa.p/net_vhost-vdpa.c.o: In function
> `vhost_vdpa_net_handle_ctrl_avail':
> /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:445: undefined
> reference to `vhost_svq_push_elem'
> /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:408: undefined
> reference to `vhost_svq_add'
> /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:422: undefined
> reference to `vhost_svq_poll'
> /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:434: undefined
> reference to `virtio_net_handle_ctrl_iov'
> libcommon.fa.p/net_vhost-vdpa.c.o: In function `net_init_vhost_vdpa':
> /builds/qemu-project/qemu/build/../net/vhost-vdpa.c:611: undefined
> reference to `vhost_iova_tree_new'
> libcommon.fa.p/net_vhost-vdpa.c.o: In function
> `glib_autoptr_cleanup_VhostIOVATree':
> /builds/qemu-project/qemu/hw/virtio/vhost-iova-tree.h:20: undefined
> reference to `vhost_iova_tree_delete'
> collect2: error: ld returned 1 exit status
> [2436/4108] Compiling C object
> libqemu-s390x-softmmu.fa.p/meson-generated_.._qapi_qapi-introspect.c.o
>
>
>
> Presumably the conditions in the various meson.build files are
> out of sync about when to build the net/vhost-vdpa.c code vs
> the code that's implementing the functions it's trying to call.
>
> Specifically, the functions being called will only be present
> if the target architecture has CONFIG_VIRTIO, which isn't
> guaranteed, but we try to link the vhost-vdpa code in anyway.

Right, this is probably because vhost-vdpa started to use virtio logic (cvq).

Eugenio, please fix this and I will send a new version of the pull request.

Thanks

>
> thanks
> -- PMM
>




Re: [PATCH] hw/nvme: add trace events for ioeventfd

2022-07-19 Thread Jinhao Fan
at 10:41 PM, Jinhao Fan  wrote:

> at 1:34 PM, Klaus Jensen  wrote:
> 
>> From: Klaus Jensen 
>> 
>> While testing Jinhaos ioeventfd patch I found it useful with a couple of
>> additional trace events since we no longer see the mmio events.
>> 
>> Signed-off-by: Klaus Jensen 
>> ---
>> hw/nvme/ctrl.c   | 8 
>> hw/nvme/trace-events | 4 
>> 2 files changed, 12 insertions(+)
>> 
>> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
>> index 533ad14e7a61..09725ec49c5d 100644
>> --- a/hw/nvme/ctrl.c
>> +++ b/hw/nvme/ctrl.c
>> @@ -1346,6 +1346,8 @@ static void nvme_post_cqes(void *opaque)
>>bool pending = cq->head != cq->tail;
>>int ret;
>> 
>> +trace_pci_nvme_post_cqes(cq->cqid);
>> +
>>QTAILQ_FOREACH_SAFE(req, &cq->req_list, entry, next) {
>>NvmeSQueue *sq;
>>hwaddr addr;
>> @@ -4238,6 +4240,8 @@ static void nvme_cq_notifier(EventNotifier *e)
>>NvmeCQueue *cq = container_of(e, NvmeCQueue, notifier);
>>NvmeCtrl *n = cq->ctrl;
>> 
>> +trace_pci_nvme_cq_notify(cq->cqid);
>> +
>>event_notifier_test_and_clear(&cq->notifier);
>> 
>>nvme_update_cq_head(cq);
>> @@ -4275,6 +4279,8 @@ static void nvme_sq_notifier(EventNotifier *e)
>> {
>>NvmeSQueue *sq = container_of(e, NvmeSQueue, notifier);
>> 
>> +trace_pci_nvme_sq_notify(sq->sqid);
>> +
>>event_notifier_test_and_clear(&sq->notifier);
>> 
>>nvme_process_sq(sq);
>> @@ -6240,6 +6246,8 @@ static void nvme_process_sq(void *opaque)
>>NvmeCtrl *n = sq->ctrl;
>>NvmeCQueue *cq = n->cq[sq->cqid];
>> 
>> +trace_pci_nvme_process_sq(sq->sqid);
>> +
>>uint16_t status;
>>hwaddr addr;
>>NvmeCmd cmd;
>> diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events
>> index fccb79f48973..45dd708bd2fa 100644
>> --- a/hw/nvme/trace-events
>> +++ b/hw/nvme/trace-events
>> @@ -104,6 +104,10 @@ pci_nvme_mmio_shutdown_set(void) "shutdown bit set"
>> pci_nvme_mmio_shutdown_cleared(void) "shutdown bit cleared"
>> pci_nvme_shadow_doorbell_cq(uint16_t cqid, uint16_t new_shadow_doorbell) "cqid %"PRIu16" new_shadow_doorbell %"PRIu16""
>> pci_nvme_shadow_doorbell_sq(uint16_t sqid, uint16_t new_shadow_doorbell) "sqid %"PRIu16" new_shadow_doorbell %"PRIu16""
>> +pci_nvme_sq_notify(uint16_t sqid) "sqid %"PRIu16""
>> +pci_nvme_cq_notify(uint16_t cqid) "cqid %"PRIu16""
>> +pci_nvme_process_sq(uint16_t sqid) "sqid %"PRIu16""
>> +pci_nvme_post_cqes(uint16_t cqid) "cqid %"PRIu16""
>> pci_nvme_open_zone(uint64_t slba, uint32_t zone_idx, int all) "open zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32""
>> pci_nvme_close_zone(uint64_t slba, uint32_t zone_idx, int all) "close zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32""
>> pci_nvme_finish_zone(uint64_t slba, uint32_t zone_idx, int all) "finish zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32""
>> -- 
>> 2.36.1
> 
> I agree on the addition of SQ and CQ notify trace events. But what is the
> purpose of adding tracepoints for nvme_process_sq and nvme_post_cqes?

I realized these two events are useful when debugging iothread support. We
process SQEs and CQEs in batches in nvme_process_sq and nvme_post_cqes, so
it is important to mark the beginning of each batch.



RE: [PATCH] i386: Disable BTS and PEBS

2022-07-19 Thread Duan, Zhenzhong



>-Original Message-
>From: Sean Christopherson 
>Sent: Wednesday, July 20, 2022 2:53 AM
>To: Paolo Bonzini 
>Cc: Duan, Zhenzhong ; qemu-
>de...@nongnu.org; mtosa...@redhat.com; lik...@tencent.com; Ma,
>XiangfeiX 
>Subject: Re: [PATCH] i386: Disable BTS and PEBS
>
>On Tue, Jul 19, 2022, Paolo Bonzini wrote:
>> On 7/18/22 22:12, Sean Christopherson wrote:
>> > On Mon, Jul 18, 2022, Paolo Bonzini wrote:
>> > > This needs to be fixed in the kernel because old QEMU/new KVM is
>supported.
>> >
>> > I can't object to adding a quirk for this since KVM is breaking
>> > userspace, but on the KVM side we really need to stop "sanitizing"
>> > userspace inputs unless it puts the host at risk, because inevitably it
>leads to needing a quirk.
>>
>> The problem is not the sanitizing, it's that userspace literally
>> cannot know that this needs to be done because the feature bits are
>> "backwards" (1 = unavailable).
>
>Yes, the bits being inverted contributed to KVM not providing a way for
>userspace to enumerate PEBS and BTS support, but lack of enumeration is a
>separate issue.
>
>If KVM had simply ignored invalid guest state from the get-go, then
>userspace would never have gained a dependency on KVM sanitizing guest
>state.  The fact that KVM didn't enumerate support in any way is an
>orthogonal problem.  To play nice with older userspace, KVM will need to
>add a quirk to restore the sanitizing code, but that doesn't solve the
>enumeration issue.  And vice versa, solving the enumeration problem
>doesn't magically fix old userspace.
Hi,

I don't clearly understand the boundary between when to use a quirk and
when to fix an issue directly; I'd appreciate your guidance.
My previous understanding was that quirks are about backward compatibility:
old behavior vs. new behavior.
But this issue looks more like a regression or bug, and the sanitizing code
is only in the kvm/next branch, not yet in the upstream kernel, so why
bother with a quirk?

Thanks
Zhenzhong



Re: [PULL 07/16] configure, meson: move ARCH to meson.build

2022-07-19 Thread Richard Henderson

On 7/19/22 23:40, Paolo Bonzini wrote:

On 7/19/22 15:00, Peter Maydell wrote:

shellcheck points out that this (old) commit removed the code
setting ARCH from configure, but left behind a use of it:

case "$ARCH" in
alpha)
   # Ensure there's only a single GP
   QEMU_CFLAGS="-msmall-data $QEMU_CFLAGS"
;;
esac

Presumably meson.build needs to do some equivalent of this ?


Yeah, I'll send a patch before 7.1 gets out (Richard, as the resident Alpha guy,
do you know why it is needed?).


It was to allow simplifying assumptions in the jit for tcg/alpha, the patches for which 
were on list but never committed.


This can be dropped.


r~



Re: [PATCH] target/riscv: Support SW update of PTE A/D bits and Ssptwad extension

2022-07-19 Thread Jim Shu
Hi Anup,

Do you think it is OK if we only use ssptwad as a CPU option name
but not as a RISC-V extension, just like the MMU and PMP options of RISC-V?
(And we could change it to a RISC-V extension in the future
if Ssptwad becomes a formal RISC-V extension.)

Both HW/SW update schemes are already defined in RISC-V priv spec,
so I think it's reasonable to implement them in QEMU. The only issue here is
to choose a proper CPU option name to turn on/off HW update of A/D bits.

Regards,
Jim Shu

On Mon, Jul 18, 2022 at 12:02 PM Anup Patel  wrote:

> +Atish
>
> On Mon, Jul 18, 2022 at 9:23 AM Jim Shu  wrote:
> >
> > RISC-V priv spec v1.12 permits 2 PTE-update schemes of A/D-bit
> > (Access/Dirty bit): HW update or SW update. RISC-V profile defines the
> > extension name 'Ssptwad' for HW update to PTE A/D bits.
> > https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc
>
> Ssptwad (even though used by profiles) is not a well-defined RISC-V
> ISA extension. Rather, Ssptwad is just a name used in profiles to represent
> an optional feature.
>
> In fact, since it is not a well-defined ISA extension, QEMU cannot include
> Ssptwad in the ISA string as well.
>
> I think for such optional features, which are not well-defined ISA
> extensions, QEMU should treat them differently.
>
> Regards,
> Anup
>
> >
> > Current QEMU RISC-V implements HW update scheme, so this commit
> > introduces SW update scheme to QEMU and uses the 'Ssptwad' extension
> > as the CPU option to select 2 PTE-update schemes. QEMU RISC-V CPU still
> > uses HW update scheme (ext_ssptwad=true) by default to keep the backward
> > compatibility.
> >
> > The SW update scheme implementation is based on priv spec v1.12:
> > "When a virtual page is accessed and the A bit is clear, or is written
> > and the D bit is clear, a page-fault exception (corresponding to the
> > original access type) is raised."
> >
> > Signed-off-by: Jim Shu 
> > Reviewed-by: Frank Chang 
> > ---
> >  target/riscv/cpu.c| 2 ++
> >  target/riscv/cpu.h| 1 +
> >  target/riscv/cpu_helper.c | 9 +
> >  3 files changed, 12 insertions(+)
> >
> > diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> > index 1bb3973806..1d38c1c1d2 100644
> > --- a/target/riscv/cpu.c
> > +++ b/target/riscv/cpu.c
> > @@ -857,6 +857,7 @@ static void riscv_cpu_init(Object *obj)
> >
> >  cpu->cfg.ext_ifencei = true;
> >  cpu->cfg.ext_icsr = true;
> > +cpu->cfg.ext_ssptwad = true;
> >  cpu->cfg.mmu = true;
> >  cpu->cfg.pmp = true;
> >
> > @@ -900,6 +901,7 @@ static Property riscv_cpu_extensions[] = {
> >  DEFINE_PROP_BOOL("svinval", RISCVCPU, cfg.ext_svinval, false),
> >  DEFINE_PROP_BOOL("svnapot", RISCVCPU, cfg.ext_svnapot, false),
> >  DEFINE_PROP_BOOL("svpbmt", RISCVCPU, cfg.ext_svpbmt, false),
> > +DEFINE_PROP_BOOL("ssptwad", RISCVCPU, cfg.ext_ssptwad, true),
> >
> >  DEFINE_PROP_BOOL("zba", RISCVCPU, cfg.ext_zba, true),
> >  DEFINE_PROP_BOOL("zbb", RISCVCPU, cfg.ext_zbb, true),
> > diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> > index 5c7acc055a..2eee59af98 100644
> > --- a/target/riscv/cpu.h
> > +++ b/target/riscv/cpu.h
> > @@ -433,6 +433,7 @@ struct RISCVCPUConfig {
> >  bool ext_zve32f;
> >  bool ext_zve64f;
> >  bool ext_zmmul;
> > +bool ext_ssptwad;
> >  bool rvv_ta_all_1s;
> >
> >  uint32_t mvendorid;
> > diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> > index 59b3680b1b..a8607c0d7b 100644
> > --- a/target/riscv/cpu_helper.c
> > +++ b/target/riscv/cpu_helper.c
> > @@ -981,6 +981,15 @@ restart:
> >
> >  /* Page table updates need to be atomic with MTTCG enabled
> */
> >  if (updated_pte != pte) {
> > +if (!cpu->cfg.ext_ssptwad) {
> > +/*
> > + * If A/D bits are managed by SW, HW just raises the
> > + * page fault exception corresponding to the
> original
> > + * access type when A/D bits need to be updated.
> > + */
> > +return TRANSLATE_FAIL;
> > +}
> > +
> >  /*
> >   * - if accessed or dirty bits need updating, and the
> PTE is
> >   *   in RAM, then we do so atomically with a compare
> and swap.
> > --
> > 2.17.1
> >
> >
>


[PULL 1/2] Hexagon (target/hexagon) fix store w/mem_noshuf & predicated load

2022-07-19 Thread Taylor Simpson
Call the CHECK_NOSHUF macro multiple times: once in the
fGEN_TCG_PRED_LOAD() and again in fLOAD().

Before this commit, a packet with a store and a predicated
load with mem_noshuf that gets encoded like this:

{ P0 = cmp.eq(R17,#0x0)
  memw(R18+#0x0) = R2
  if (!P0.new) R3 = memw(R17+#0x4) }

... would end up generating a branch over both the load
and the store like so:

...
brcond_i32 loc17,$0x0,eq,$L1
mov_i32 loc18,store_addr_1
qemu_st_i32 store_val32_1,store_addr_1,leul,0
qemu_ld_i32 loc16,loc7,leul,0
set_label $L1
...

Test cases added to tests/tcg/hexagon/mem_noshuf.c

Co-authored-by: Taylor Simpson 
Signed-off-by: Brian Cain 
Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
Message-Id: <20220707210546.15985-2-tsimp...@quicinc.com>
---
 target/hexagon/gen_tcg.h   |   2 +
 tests/tcg/hexagon/mem_noshuf.c | 122 +++--
 2 files changed, 119 insertions(+), 5 deletions(-)

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index c6f0879b6e..b0b6b3644e 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -343,6 +343,7 @@
 PRED;  \
 PRED_LOAD_CANCEL(LSB, EA); \
 tcg_gen_movi_tl(RdV, 0); \
+CHECK_NOSHUF; \
 tcg_gen_brcondi_tl(TCG_COND_EQ, LSB, 0, label); \
 fLOAD(1, SIZE, SIGN, EA, RdV); \
 gen_set_label(label); \
@@ -402,6 +403,7 @@
 PRED;  \
 PRED_LOAD_CANCEL(LSB, EA); \
 tcg_gen_movi_i64(RddV, 0); \
+CHECK_NOSHUF; \
 tcg_gen_brcondi_tl(TCG_COND_EQ, LSB, 0, label); \
 fLOAD(1, 8, u, EA, RddV); \
 gen_set_label(label); \
diff --git a/tests/tcg/hexagon/mem_noshuf.c b/tests/tcg/hexagon/mem_noshuf.c
index dd714d5e98..0f4064e700 100644
--- a/tests/tcg/hexagon/mem_noshuf.c
+++ b/tests/tcg/hexagon/mem_noshuf.c
@@ -1,5 +1,5 @@
 /*
- *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights 
Reserved.
+ *  Copyright(c) 2019-2022 Qualcomm Innovation Center, Inc. All Rights 
Reserved.
  *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
@@ -84,6 +84,70 @@ MEM_NOSHUF32(mem_noshuf_sd_luh, long long,unsigned 
short,   memd, memuh)
 MEM_NOSHUF32(mem_noshuf_sd_lw,  long long,signed int,   memd, memw)
 MEM_NOSHUF64(mem_noshuf_sd_ld,  long long,signed long long, memd, memd)
 
+static inline int pred_lw_sw(int pred, int *p, int *q, int x, int y)
+{
+int ret;
+asm volatile("p0 = cmp.eq(%5, #0)\n\t"
+ "%0 = %3\n\t"
+ "{\n\t"
+ "memw(%1) = %4\n\t"
+ "if (!p0) %0 = memw(%2)\n\t"
+ "}:mem_noshuf\n"
+ : "=&r"(ret)
+ : "r"(p), "r"(q), "r"(x), "r"(y), "r"(pred)
+ : "p0", "memory");
+return ret;
+}
+
+static inline int pred_lw_sw_pi(int pred, int *p, int *q, int x, int y)
+{
+int ret;
+asm volatile("p0 = cmp.eq(%5, #0)\n\t"
+ "%0 = %3\n\t"
+ "r7 = %2\n\t"
+ "{\n\t"
+ "memw(%1) = %4\n\t"
+ "if (!p0) %0 = memw(r7++#4)\n\t"
+ "}:mem_noshuf\n"
+ : "=&r"(ret)
+ : "r"(p), "r"(q), "r"(x), "r"(y), "r"(pred)
+ : "r7", "p0", "memory");
+return ret;
+}
+
+static inline long long pred_ld_sd(int pred, long long *p, long long *q,
+   long long x, long long y)
+{
+unsigned long long ret;
+asm volatile("p0 = cmp.eq(%5, #0)\n\t"
+ "%0 = %3\n\t"
+ "{\n\t"
+ "memd(%1) = %4\n\t"
+ "if (!p0) %0 = memd(%2)\n\t"
+ "}:mem_noshuf\n"
+ : "=&r"(ret)
+ : "r"(p), "r"(q), "r"(x), "r"(y), "r"(pred)
+ : "p0", "memory");
+return ret;
+}
+
+static inline long long pred_ld_sd_pi(int pred, long long *p, long long *q,
+  long long x, long long y)
+{
+long long ret;
+asm volatile("p0 = cmp.eq(%5, #0)\n\t"
+ "%0 = %3\n\t"
+ "r7 = %2\n\t"
+ "{\n\t"
+ "memd(%1) = %4\n\t"
+ "if (!p0) %0 = memd(r7++#8)\n\t"
+ "}:mem_noshuf\n"
+ : "=&r"(ret)
+ : "r"(p), "r"(q), "r"(x), "r"(y), "r"(pred)
+ : "p0", "memory");
+return ret;
+}
+
 static inline unsigned int cancel_sw_lb(int pred, int *p, signed char *q, int 
x)
 {
 unsigned int ret;
@@ -126,18 +190,22 @@ typedef union {
 
 int err;
 
-static void check32(int n, int expect)
+#define check32(n, expect) check32_(n, expect, __LINE__)
+
+static void check32_(int n, int expect, int line)
 {
 if (n != expect) {
-printf("ERROR: 0x%08x != 0x%08x\n", n, expect);
+printf("ERROR: 0x%08

[PULL 0/2] Hexagon (target/hexagon) bug fixes for mem_noshuf

2022-07-19 Thread Taylor Simpson
The following changes since commit d48125de38f48a61d6423ef6a01156d6dff9ee2c:

  Merge tag 'kraxel-20220719-pull-request' of https://gitlab.com/kraxel/qemu 
into staging (2022-07-19 17:40:36 +0100)

are available in the Git repository at:

  https://github.com/quic/qemu tags/pull-hex-20220719-1

for you to fetch changes up to 15fc6badbd28a126346f84c1acae48e273b66b67:

  Hexagon (target/hexagon) fix bug in mem_noshuf load exception (2022-07-19 
14:20:08 -0700)


Recall that the semantics of a Hexagon mem_noshuf packet are that the
store effectively happens before the load.  There are two bug fixes
in this series.


Taylor Simpson (2):
  Hexagon (target/hexagon) fix store w/mem_noshuf & predicated load
  Hexagon (target/hexagon) fix bug in mem_noshuf load exception

 target/hexagon/gen_tcg.h |  10 ++-
 target/hexagon/helper.h  |   1 +
 target/hexagon/macros.h  |  37 +---
 target/hexagon/genptr.c  |   7 ++
 target/hexagon/op_helper.c   |  23 +++--
 tests/tcg/hexagon/mem_noshuf.c   | 122 --
 tests/tcg/hexagon/mem_noshuf_exception.c | 146 +++
 tests/tcg/hexagon/Makefile.target|   1 +
 8 files changed, 323 insertions(+), 24 deletions(-)
 create mode 100644 tests/tcg/hexagon/mem_noshuf_exception.c


[PULL 2/2] Hexagon (target/hexagon) fix bug in mem_noshuf load exception

2022-07-19 Thread Taylor Simpson
The semantics of a mem_noshuf packet are that the store effectively
happens before the load.  However, in cases where the load raises an
exception, we cannot simply execute the store first.

This change adds a probe to check that the load will not raise an
exception before executing the store.

If the load is predicated, this requires special handling.  We check
the condition before performing the probe.  Since we need the EA to
perform the check, we move the GET_EA portion inside CHECK_NOSHUF_PRED.

Test case added in tests/tcg/hexagon/mem_noshuf_exception.c

Suggested-by: Alessandro Di Federico 
Suggested-by: Anton Johansson 
Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
Message-Id: <20220707210546.15985-3-tsimp...@quicinc.com>
---
 target/hexagon/gen_tcg.h |  12 +-
 target/hexagon/helper.h  |   1 +
 target/hexagon/macros.h  |  37 --
 target/hexagon/genptr.c  |   7 ++
 target/hexagon/op_helper.c   |  23 +++-
 tests/tcg/hexagon/mem_noshuf_exception.c | 146 +++
 tests/tcg/hexagon/Makefile.target|   1 +
 7 files changed, 206 insertions(+), 21 deletions(-)
 create mode 100644 tests/tcg/hexagon/mem_noshuf_exception.c

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index b0b6b3644e..50634ac459 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -339,13 +339,13 @@
 do { \
 TCGv LSB = tcg_temp_local_new(); \
 TCGLabel *label = gen_new_label(); \
-GET_EA; \
+tcg_gen_movi_tl(EA, 0); \
 PRED;  \
+CHECK_NOSHUF_PRED(GET_EA, SIZE, LSB); \
 PRED_LOAD_CANCEL(LSB, EA); \
 tcg_gen_movi_tl(RdV, 0); \
-CHECK_NOSHUF; \
 tcg_gen_brcondi_tl(TCG_COND_EQ, LSB, 0, label); \
-fLOAD(1, SIZE, SIGN, EA, RdV); \
+fLOAD(1, SIZE, SIGN, EA, RdV); \
 gen_set_label(label); \
 tcg_temp_free(LSB); \
 } while (0)
@@ -399,13 +399,13 @@
 do { \
 TCGv LSB = tcg_temp_local_new(); \
 TCGLabel *label = gen_new_label(); \
-GET_EA; \
+tcg_gen_movi_tl(EA, 0); \
 PRED;  \
+CHECK_NOSHUF_PRED(GET_EA, 8, LSB); \
 PRED_LOAD_CANCEL(LSB, EA); \
 tcg_gen_movi_i64(RddV, 0); \
-CHECK_NOSHUF; \
 tcg_gen_brcondi_tl(TCG_COND_EQ, LSB, 0, label); \
-fLOAD(1, 8, u, EA, RddV); \
+fLOAD(1, 8, u, EA, RddV); \
 gen_set_label(label); \
 tcg_temp_free(LSB); \
 } while (0)
diff --git a/target/hexagon/helper.h b/target/hexagon/helper.h
index c89aa4ed4d..368f0b5708 100644
--- a/target/hexagon/helper.h
+++ b/target/hexagon/helper.h
@@ -104,6 +104,7 @@ DEF_HELPER_1(vwhist128q, void, env)
 DEF_HELPER_2(vwhist128m, void, env, s32)
 DEF_HELPER_2(vwhist128qm, void, env, s32)
 
+DEF_HELPER_4(probe_noshuf_load, void, env, i32, int, int)
 DEF_HELPER_2(probe_pkt_scalar_store_s0, void, env, int)
 DEF_HELPER_2(probe_hvx_stores, void, env, int)
 DEF_HELPER_3(probe_pkt_scalar_hvx_stores, void, env, int, int)
diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index a78e84faa4..92eb8bbf05 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -87,11 +87,28 @@
  *
  *
  * For qemu, we look for a load in slot 0 when there is  a store in slot 1
- * in the same packet.  When we see this, we call a helper that merges the
- * bytes from the store buffer with the value loaded from memory.
+ * in the same packet.  When we see this, we call a helper that probes the
+ * load to make sure it doesn't fault.  Then, we process the store ahead of
+ * the actual load.
+
  */
-#define CHECK_NOSHUF \
+#define CHECK_NOSHUF(VA, SIZE) \
 do { \
+if (insn->slot == 0 && pkt->pkt_has_store_s1) { \
+probe_noshuf_load(VA, SIZE, ctx->mem_idx); \
+process_store(ctx, pkt, 1); \
+} \
+} while (0)
+
+#define CHECK_NOSHUF_PRED(GET_EA, SIZE, PRED) \
+do { \
+TCGLabel *label = gen_new_label(); \
+tcg_gen_brcondi_tl(TCG_COND_EQ, PRED, 0, label); \
+GET_EA; \
+if (insn->slot == 0 && pkt->pkt_has_store_s1) { \
+probe_noshuf_load(EA, SIZE, ctx->mem_idx); \
+} \
+gen_set_label(label); \
 if (insn->slot == 0 && pkt->pkt_has_store_s1) { \
 process_store(ctx, pkt, 1); \
 } \
@@ -99,37 +116,37 @@
 
 #define MEM_LOAD1s(DST, VA) \
 do { \
-CHECK_NOSHUF; \
+CHECK_NOSHUF(VA, 1); \
 tcg_gen_qemu_ld8s(DST, VA, ctx->mem_idx); \
 } while (0)
 #define MEM_LOAD1u(DST, VA) \
 do { \
-CHECK_NOSHUF; \
+CHECK_NOSHUF(VA, 1); \
 tcg_gen_qemu_ld8u(DST, VA, ctx->mem_idx); \
 } while (0)
 #define MEM_LOAD2s(DST, VA) \
 do { \
-CHECK_NOSHUF; \
+CHECK_NOSHUF(VA, 2); \
 tcg_gen_qemu_ld16s(DST, VA, ctx->mem_idx); \
 } while (0)
 #define MEM_LOAD2u(DST, VA) \
 do { \
-CHECK_NOSHUF; \

Re: [PULL 00/29] migration queue

2022-07-19 Thread Peter Xu
On Tue, Jul 19, 2022 at 10:53:33PM +0100, Peter Maydell wrote:
> On Tue, 19 Jul 2022 at 18:16, Dr. David Alan Gilbert (git)
>  wrote:
> >
> > From: "Dr. David Alan Gilbert" 
> >
> > The following changes since commit da7da9d5e608200ecc0749ff37be246e9cd3314f:
> >
> >   Merge tag 'pull-request-2022-07-19' of https://gitlab.com/thuth/qemu into 
> > staging (2022-07-19 13:05:06 +0100)
> >
> > are available in the Git repository at:
> >
> >   https://gitlab.com/dagrh/qemu.git tags/pull-migration-20220719c
> >
> > for you to fetch changes up to ec0345c1000b3a57b557da4c2e3f2114dd23903a:
> >
> >   migration: Avoid false-positive on non-supported scenarios for 
> > zero-copy-send (2022-07-19 17:33:22 +0100)
> >
> > 
> > Migration pull 2022-07-19
> >
> >   Hyman's dirty page rate limit set
> >   Ilya's fix for zlib vs migration
> >   Peter's postcopy-preempt
> >   Cleanup from Dan
> >   zero-copy tidy ups from Leo
> >   multifd doc fix from Juan
> >
> > Signed-off-by: Dr. David Alan Gilbert 
> >
> > 
> 
> Fails to build on some configs, eg:
> https://gitlab.com/qemu-project/qemu/-/jobs/2743059797
> https://gitlab.com/qemu-project/qemu/-/jobs/2743059743
> 
> ../tests/qtest/migration-test.c: In function 'test_postcopy_preempt_tls_psk':
> ../tests/qtest/migration-test.c:1168:23: error:
> 'test_migrate_tls_psk_start_match' undeclared (first use in this
> function)
> 1168 | .start_hook = test_migrate_tls_psk_start_match,
> | ^~~~
> ../tests/qtest/migration-test.c:1168:23: note: each undeclared
> identifier is reported only once for each function it appears in
> ../tests/qtest/migration-test.c:1169:24: error:
> 'test_migrate_tls_psk_finish' undeclared (first use in this function)
> 1169 | .finish_hook = test_migrate_tls_psk_finish,
> | ^~~
> ../tests/qtest/migration-test.c: In function 'test_postcopy_recovery_tls_psk':
> ../tests/qtest/migration-test.c:1247:23: error:
> 'test_migrate_tls_psk_start_match' undeclared (first use in this
> function)
> 1247 | .start_hook = test_migrate_tls_psk_start_match,
> | ^~~~
> ../tests/qtest/migration-test.c:1248:24: error:
> 'test_migrate_tls_psk_finish' undeclared (first use in this function)
> 1248 | .finish_hook = test_migrate_tls_psk_finish,
> | ^~~
> ../tests/qtest/migration-test.c: In function 'test_postcopy_preempt_all':
> ../tests/qtest/migration-test.c:1268:23: error:
> 'test_migrate_tls_psk_start_match' undeclared (first use in this
> function)
> 1268 | .start_hook = test_migrate_tls_psk_start_match,
> | ^~~~
> ../tests/qtest/migration-test.c:1269:24: error:
> 'test_migrate_tls_psk_finish' undeclared (first use in this function)
> 1269 | .finish_hook = test_migrate_tls_psk_finish,
> | ^~~
> At top level:
> ../tests/qtest/migration-test.c:1264:13: error:
> 'test_postcopy_preempt_all' defined but not used
> [-Werror=unused-function]
> 1264 | static void test_postcopy_preempt_all(void)
> | ^
> ../tests/qtest/migration-test.c:1244:13: error:
> 'test_postcopy_recovery_tls_psk' defined but not used
> [-Werror=unused-function]
> 1244 | static void test_postcopy_recovery_tls_psk(void)
> | ^~
> ../tests/qtest/migration-test.c:1164:13: error:
> 'test_postcopy_preempt_tls_psk' defined but not used
> [-Werror=unused-function]
> 1164 | static void test_postcopy_preempt_tls_psk(void)
> | ^

Sorry my fault.  We'll need to fix the 3 test patches one by one to use "#ifdef
CONFIG_GNUTLS" properly for those functions..

I've attached the three small fixups, Peter/Dave, let me know what's the
best way to redo this.

Thanks,

-- 
Peter Xu
>From 7d361c8d61a51ed0df9e1606c3a6f8c306028181 Mon Sep 17 00:00:00 2001
From: Peter Xu 
Date: Tue, 19 Jul 2022 18:16:40 -0400
Subject: [PATCH 1/3] fixup! tests: Add postcopy tls migration test
Content-type: text/plain

Signed-off-by: Peter Xu 
---
 tests/qtest/migration-test.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 81780189a8..87dc87ba8b 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1133,6 +1133,7 @@ static void test_postcopy(void)
 test_postcopy_common(&args);
 }
 
+#ifdef CONFIG_GNUTLS
 static void test_postcopy_tls_psk(void)
 {
 MigrateCommon args = {
@@ -1142,6 +1143,7 @@ static void test_postcopy_tls_psk(void)
 
 test_postcopy_common(&args);
 }
+#endif
 
 static void test_postcopy_preempt(void)
 {
-- 
2.32.0

>From c76945ab7b9a38456f077267ccb51133ef087e35 Mon Sep 17 00:00:00 2001
From: Peter Xu 
Date: Tue, 19 Jul 2022 18:16:57 -0400
Subject: [PATCH 2/3] fixup! tests: Add postcopy preempt tests
Content-type: text/plain

Signed-off-by: Peter Xu 
---
 tests/qtest/migration-test

Re: [PULL 00/29] migration queue

2022-07-19 Thread Peter Maydell
On Tue, 19 Jul 2022 at 18:16, Dr. David Alan Gilbert (git)
 wrote:
>
> From: "Dr. David Alan Gilbert" 
>
> The following changes since commit da7da9d5e608200ecc0749ff37be246e9cd3314f:
>
>   Merge tag 'pull-request-2022-07-19' of https://gitlab.com/thuth/qemu into 
> staging (2022-07-19 13:05:06 +0100)
>
> are available in the Git repository at:
>
>   https://gitlab.com/dagrh/qemu.git tags/pull-migration-20220719c
>
> for you to fetch changes up to ec0345c1000b3a57b557da4c2e3f2114dd23903a:
>
>   migration: Avoid false-positive on non-supported scenarios for 
> zero-copy-send (2022-07-19 17:33:22 +0100)
>
> 
> Migration pull 2022-07-19
>
>   Hyman's dirty page rate limit set
>   Ilya's fix for zlib vs migration
>   Peter's postcopy-preempt
>   Cleanup from Dan
>   zero-copy tidy ups from Leo
>   multifd doc fix from Juan
>
> Signed-off-by: Dr. David Alan Gilbert 
>
> 

Fails to build on some configs, eg:
https://gitlab.com/qemu-project/qemu/-/jobs/2743059797
https://gitlab.com/qemu-project/qemu/-/jobs/2743059743

../tests/qtest/migration-test.c: In function 'test_postcopy_preempt_tls_psk':
../tests/qtest/migration-test.c:1168:23: error:
'test_migrate_tls_psk_start_match' undeclared (first use in this
function)
1168 | .start_hook = test_migrate_tls_psk_start_match,
| ^~~~
../tests/qtest/migration-test.c:1168:23: note: each undeclared
identifier is reported only once for each function it appears in
../tests/qtest/migration-test.c:1169:24: error:
'test_migrate_tls_psk_finish' undeclared (first use in this function)
1169 | .finish_hook = test_migrate_tls_psk_finish,
| ^~~
../tests/qtest/migration-test.c: In function 'test_postcopy_recovery_tls_psk':
../tests/qtest/migration-test.c:1247:23: error:
'test_migrate_tls_psk_start_match' undeclared (first use in this
function)
1247 | .start_hook = test_migrate_tls_psk_start_match,
| ^~~~
../tests/qtest/migration-test.c:1248:24: error:
'test_migrate_tls_psk_finish' undeclared (first use in this function)
1248 | .finish_hook = test_migrate_tls_psk_finish,
| ^~~
../tests/qtest/migration-test.c: In function 'test_postcopy_preempt_all':
../tests/qtest/migration-test.c:1268:23: error:
'test_migrate_tls_psk_start_match' undeclared (first use in this
function)
1268 | .start_hook = test_migrate_tls_psk_start_match,
| ^~~~
../tests/qtest/migration-test.c:1269:24: error:
'test_migrate_tls_psk_finish' undeclared (first use in this function)
1269 | .finish_hook = test_migrate_tls_psk_finish,
| ^~~
At top level:
../tests/qtest/migration-test.c:1264:13: error:
'test_postcopy_preempt_all' defined but not used
[-Werror=unused-function]
1264 | static void test_postcopy_preempt_all(void)
| ^
../tests/qtest/migration-test.c:1244:13: error:
'test_postcopy_recovery_tls_psk' defined but not used
[-Werror=unused-function]
1244 | static void test_postcopy_recovery_tls_psk(void)
| ^~
../tests/qtest/migration-test.c:1164:13: error:
'test_postcopy_preempt_tls_psk' defined but not used
[-Werror=unused-function]
1164 | static void test_postcopy_preempt_tls_psk(void)
| ^


-- PMM



Re: [PATCH] docs: List kvm as a supported accelerator on RISC-V

2022-07-19 Thread Alistair Francis
On Tue, Jul 19, 2022 at 11:37 PM Bin Meng  wrote:
>
> Since commit fbf43c7dbf18 ("target/riscv: enable riscv kvm accel"),
> KVM accelerator is supported on RISC-V. Let's document it.
>
> Signed-off-by: Bin Meng 

Reviewed-by: Alistair Francis 

Alistair

> ---
>
>  docs/about/build-platforms.rst | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/docs/about/build-platforms.rst b/docs/about/build-platforms.rst
> index ebde20f981..118a4c64dc 100644
> --- a/docs/about/build-platforms.rst
> +++ b/docs/about/build-platforms.rst
> @@ -46,7 +46,7 @@ Those hosts are officially supported, with various 
> accelerators:
> * - PPC
>   - kvm, tcg
> * - RISC-V
> - - tcg
> + - kvm, tcg
> * - s390x
>   - kvm, tcg
> * - SPARC
> --
> 2.25.1
>
>



Re: [PATCH v1 1/1] migration: Avoid false-positive on non-supported scenarios for zero-copy-send

2022-07-19 Thread Peter Xu
On Tue, Jul 19, 2022 at 09:23:45AM -0300, Leonardo Bras wrote:
> Migration with zero-copy-send currently has its limitations, as it can't
> be used with TLS or any kind of compression. In such scenarios, it should
> output errors during parameter / capability setting.
> 
> But currently there are some ways of setting these not-supported scenarios
> without printing the error message:
> 
> 1) For the 'compression' capability, it works by enabling it together with
> zero-copy-send. This happens because the validity test for zero-copy uses
> the helper function migrate_use_compression(), which checks for compression
> presence in s->enabled_capabilities[MIGRATION_CAPABILITY_COMPRESS].
> 
> The point here is: the validity test happens before the capability gets
> enabled. If all of them get enabled together, this test will not return
> error.
> 
> In order to fix that, replace migrate_use_compression() by directly testing
> the cap_list parameter in migrate_caps_check().
> 
> 2) For features enabled by parameters such as TLS & 'multifd_compression',
> there was also a possibility of setting non-supported scenarios: setting
> zero-copy-send first, then setting the unsupported parameter.
> 
> In order to fix that, also add a check for parameters conflicting with
> zero-copy-send on migrate_params_check().
> 
> 3) XBZRLE is also a compression capability, so it makes sense to also add
> it to the list of capabilities which are not supported with zero-copy-send.
> 
> Fixes: 1abaec9a1b2c ("migration: Change zero_copy_send from migration 
> parameter to migration capability")
> Signed-off-by: Leonardo Bras 
> ---
>  migration/migration.c | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 78f5057373..c6260e54bf 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1274,7 +1274,9 @@ static bool migrate_caps_check(bool *cap_list,
>  #ifdef CONFIG_LINUX
>  if (cap_list[MIGRATION_CAPABILITY_ZERO_COPY_SEND] &&
>  (!cap_list[MIGRATION_CAPABILITY_MULTIFD] ||
> - migrate_use_compression() ||
> + cap_list[MIGRATION_CAPABILITY_COMPRESS] ||
> + cap_list[MIGRATION_CAPABILITY_XBZRLE] ||
> + migrate_multifd_compression() ||
>   migrate_use_tls())) {
>  error_setg(errp,
> "Zero copy only available for non-compressed non-TLS 
> multifd migration");
> @@ -1511,6 +1513,17 @@ static bool migrate_params_check(MigrationParameters 
> *params, Error **errp)
>  error_prepend(errp, "Invalid mapping given for block-bitmap-mapping: 
> ");
>  return false;
>  }
> +
> +#ifdef CONFIG_LINUX

A trivial nit: we don't need this, since migrate_use_zero_copy_send() is
defined unconditionally and returns false with !CONFIG_LINUX.
So feel free to drop this if there's a new version.

> +if (migrate_use_zero_copy_send() &&
> +((params->has_multifd_compression && params->multifd_compression) ||
> + (params->has_tls_creds && params->tls_creds && 
> *params->tls_creds))) {
> +error_setg(errp,
> +   "Zero copy only available for non-compressed non-TLS 
> multifd migration");
> +return false;
> +}
> +#endif

Reviewed-by: Peter Xu 

Thanks,

-- 
Peter Xu




Re: [PATCH] tests: migration-test: Allow test to run without uffd

2022-07-19 Thread Peter Xu
On Tue, Jul 19, 2022 at 11:37:55AM +0100, Daniel P. Berrangé wrote:
> On Tue, Jul 19, 2022 at 12:28:24PM +0200, Thomas Huth wrote:
> > On 18/07/2022 21.14, Peter Xu wrote:
> > > Hi, Thomas,
> > > 
> > > On Mon, Jul 18, 2022 at 08:23:26PM +0200, Thomas Huth wrote:
> > > > On 07/07/2022 20.46, Peter Xu wrote:
> > > > > We used to stop running all tests if uffd is not detected.  However
> > > > > logically that's only needed for postcopy not the rest of tests.
> > > > > 
> > > > > Keep running the rest when still possible.
> > > > > 
> > > > > Signed-off-by: Peter Xu 
> > > > > ---
> > > > >tests/qtest/migration-test.c | 11 +--
> > > > >1 file changed, 5 insertions(+), 6 deletions(-)
> > > > 
> > > > Did you test your patch in the gitlab-CI? I just added it to my 
> > > > testing-next
> > > > branch and the the test is failing reproducibly on macOS here:
> > > > 
> > > >   https://gitlab.com/thuth/qemu/-/jobs/2736260861#L6275
> > > >   https://gitlab.com/thuth/qemu/-/jobs/2736623914#L6275
> > > > 
> > > > (without your patch the whole test is skipped instead)
> > > 
> > > Thanks for reporting this.
> > > 
> > > Is it easy to figure out which test was failing on your side?  I cannot
> > > easily reproduce this here on a MacOS with M1.
> > 
> > I've modified the yml file to only run the migration test in verbose mode
> > and got this:
> > 
> > ...
> > ok 5 /x86_64/migration/validate_uuid_src_not_set
> > # starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-58011.sock
> > -qtest-log /dev/null -chardev socket,path=/tmp/qtest-58011.qmp,id=char0 -mon
> > chardev=char0,mode=control -display none -accel kvm -accel tcg -name
> > source,debug-threads=on -m 150M -serial
> > file:/tmp/migration-test-ef2fMr/src_serial -drive
> > file=/tmp/migration-test-ef2fMr/bootsect,format=raw  -uuid
> > ---- 2>/dev/null -accel qtest
> > # starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-58011.sock
> > -qtest-log /dev/null -chardev socket,path=/tmp/qtest-58011.qmp,id=char0 -mon
> > chardev=char0,mode=control -display none -accel kvm -accel tcg -name
> > target,debug-threads=on -m 150M -serial
> > file:/tmp/migration-test-ef2fMr/dest_serial -incoming
> > unix:/tmp/migration-test-ef2fMr/migsocket -drive
> > file=/tmp/migration-test-ef2fMr/bootsect,format=raw   2>/dev/null -accel
> > qtest
> > ok 6 /x86_64/migration/validate_uuid_dst_not_set
> > # starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-58011.sock
> > -qtest-log /dev/null -chardev socket,path=/tmp/qtest-58011.qmp,id=char0 -mon
> > chardev=char0,mode=control -display none -accel kvm -accel tcg -name
> > source,debug-threads=on -m 150M -serial
> > file:/tmp/migration-test-ef2fMr/src_serial -drive
> > file=/tmp/migration-test-ef2fMr/bootsect,format=raw-accel qtest
> > # starting QEMU: exec ./qemu-system-x86_64 -qtest unix:/tmp/qtest-58011.sock
> > -qtest-log /dev/null -chardev socket,path=/tmp/qtest-58011.qmp,id=char0 -mon
> > chardev=char0,mode=control -display none -accel kvm -accel tcg -name
> > target,debug-threads=on -m 150M -serial
> > file:/tmp/migration-test-ef2fMr/dest_serial -incoming
> > unix:/tmp/migration-test-ef2fMr/migsocket -drive
> > file=/tmp/migration-test-ef2fMr/bootsect,format=raw-accel qtest
> > **
> > ERROR:../tests/qtest/migration-helpers.c:181:wait_for_migration_status:
> > assertion failed: (g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT)
> > Bail out!
> > ERROR:../tests/qtest/migration-helpers.c:181:wait_for_migration_status:
> > assertion failed: (g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT)
> 
> This is the safety net we put in to catch cases where the test has
> got stuck. It is set at 2 minutes.
> 
> There's a chance that is too short, so one first step might be to
> increase it to 10 minutes and see if the tests pass. If it still fails,
> then it's likely a genuine bug.

Agreed, it's worth another try.

Thanks both for your answers on CI.  I wanted to go through the setup of
Cirrus CI and kick it myself, but I got stuck at the step on generating the
API token for Cirrus.

It seems the button to generate an API token just didn't respond for me
until I refreshed the page (then I could see a token generated), however I
still haven't figured out a way to see the initial 6 letters since they're
always masked out..  Changing browsers didn't work for me either. :(

-- 
Peter Xu




RE: [PULL 0/2] Hexagon (target/hexagon) bug fixes for mem_noshuf

2022-07-19 Thread Taylor Simpson


> -Original Message-
> From: Peter Maydell 
> Sent: Tuesday, July 19, 2022 6:05 AM
> To: Taylor Simpson 
> Cc: qemu-devel@nongnu.org; richard.hender...@linaro.org;
> f4...@amsat.org
> Subject: Re: [PULL 0/2] Hexagon (target/hexagon) bug fixes for mem_noshuf
> 
> On Mon, 18 Jul 2022 at 23:49, Taylor Simpson 
> wrote:
> >
> > The following changes since commit
> 24f01d220f56eab3268538ef10655b4fb2453fdf:
> >
> >   Merge https://github.com/qemu/qemu into tip (2022-07-18 11:16:39
> > -0700)
> >
> > are available in the Git repository at:
> >
> >   https://github.com/quic/qemu tags/pull-hex-20220718
> >
> > for you to fetch changes up to
> eb9072602617cb49c489aaf058f72695c2eaedc2:
> 
> This tag is badly broken as a pull request, because it includes this commit:
> 
> commit 24f01d220f56eab3268538ef10655b4fb2453fdf
> Merge: eadad54bf10 78237897312
> Author: Taylor Simpson 
> Date:   Mon Jul 18 11:16:39 2022 -0700
> 
> Merge https://github.com/qemu/qemu into tip
> 
> 
> Never merge upstream qemu back into a branch you're using as a pull
> request, please. Just rebase the patches on latest master.
> 
> Luckily I noticed in this case because it introduces a whole load of garbage
> changes and doesn't build.
> 
> thanks
> -- PMM

My apologies.  I'll fix this and resubmit.

Taylor



Re: [PATCH] virtio-gpu: update done only on the scanout associated with rect

2022-07-19 Thread Dongwon Kim
On Tue, Jul 19, 2022 at 01:15:26PM +0200, Gerd Hoffmann wrote:
> On Fri, May 06, 2022 at 10:09:30AM -0700, Dongwon Kim wrote:
> > On Fri, May 06, 2022 at 11:53:22AM +0400, Marc-André Lureau wrote:
> > > Hi
> > > 
> > > On Fri, May 6, 2022 at 1:46 AM Dongwon Kim  wrote:
> > > 
> > > > It only needs to update the scanouts containing the rect area
> > > > coming with the resource-flush request from the guest.
> > > >
> > > >
> > > Cc: Gerd Hoffmann 
> > > > Cc: Vivek Kasireddy 
> > > > Signed-off-by: Dongwon Kim 
> > > > ---
> > > >  hw/display/virtio-gpu.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
> > > > index 529b5246b2..165ecafd7a 100644
> > > > --- a/hw/display/virtio-gpu.c
> > > > +++ b/hw/display/virtio-gpu.c
> > > > @@ -514,6 +514,9 @@ static void virtio_gpu_resource_flush(VirtIOGPU *g,
> > > >  for (i = 0; i < g->parent_obj.conf.max_outputs; i++) {
> > > >  scanout = &g->parent_obj.scanout[i];
> > > >  if (scanout->resource_id == res->resource_id &&
> > > > +rf.r.x >= scanout->x && rf.r.y >= scanout->y &&
> > > > +rf.r.x + rf.r.width <= scanout->x + scanout->width &&
> > > > +rf.r.y + rf.r.height <= scanout->y + scanout->height &&
> > > >
> > > 
> > > 
> > > That doesn't seem to handle intersections/overlapping, I think it should.
> > 
> > So set-scanouts and resource flushes are issued per scanout (CRTC/plane
> > from the guest's point of view). In case of intersections/overlapping,
> > there will be two resource flushes (in case there are two scanouts) and
> > each resource flush will take care of updating the scanout that covers
> > the partially damaged area.
> 
> Even though the linux kernel driver sends two flushes, one for each
> scanout, it is perfectly valid send a single flush for the complete
> resource.
> 
> So checking whether the rectangle is completely within the scanout is
> not correct.  When the scanout is only partly covered you must update too.
> Only when the rectangle is completely outside the scanout is it valid to
> skip it.

Gerd,

I got your point. I will take a look into it.

> 
> take care,
>   Gerd
> 



Access target TranslatorOps

2022-07-19 Thread Kenneth Adam Miller
Hello,

I would like to be able to access the target's registered TranslatorOps
instance from linux-user/main.c. How would I do that 1) when the TCG is
correctly initialized and ready to run, and 2) before QEMU starts to run, or
when it is safely paused?


Re: [PULL 00/29] migration queue

2022-07-19 Thread Peter Maydell
On Tue, 19 Jul 2022 at 18:16, Dr. David Alan Gilbert (git)
 wrote:
>
> From: "Dr. David Alan Gilbert" 
>
> The following changes since commit da7da9d5e608200ecc0749ff37be246e9cd3314f:
>
>   Merge tag 'pull-request-2022-07-19' of https://gitlab.com/thuth/qemu into 
> staging (2022-07-19 13:05:06 +0100)
>
> are available in the Git repository at:
>
>   https://gitlab.com/dagrh/qemu.git tags/pull-migration-20220719c
>
> for you to fetch changes up to ec0345c1000b3a57b557da4c2e3f2114dd23903a:
>
>   migration: Avoid false-positive on non-supported scenarios for 
> zero-copy-send (2022-07-19 17:33:22 +0100)
>
> 
> Migration pull 2022-07-19
>
>   Hyman's dirty page rate limit set
>   Ilya's fix for zlib vs migration

I'm processing this pullreq, but while I think about it: once
we've got this fix in can we revert the workarounds we put in
our CI configs to set DFLTCC? (ie commit 309df6acb29346f)

thanks
-- PMM



Re: [PULL 0/6] Kraxel 20220719 patches

2022-07-19 Thread Peter Maydell
On Tue, 19 Jul 2022 at 16:28, Gerd Hoffmann  wrote:
>
> The following changes since commit 782378973121addeb11b13fd12a6ac2e69faa33f:
>
>   Merge tag 'pull-target-arm-20220718' of 
> https://git.linaro.org/people/pmaydell/qemu-arm into staging (2022-07-18 
> 16:29:32 +0100)
>
> are available in the Git repository at:
>
>   https://gitlab.com/kraxel/qemu.git tags/kraxel-20220719-pull-request
>
> for you to fetch changes up to c34a933802071aae5288e0aa3792756312e3da34:
>
>   gtk: Add show_tabs=on|off command line option. (2022-07-19 14:36:42 +0200)
>
> 
> ui: dbus-display fix, new gtk config options.
> usb: xhci fix, doc updates.
> microvm: no pcie io reservations.
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.1
for any user-visible changes.

-- PMM



[PATCH] target/mips: Handle lock_user() failure in UHI_plog semihosting call

2022-07-19 Thread Peter Maydell
Coverity notes that we forgot to check the error return from
lock_user() in one place in the handling of the UHI_plog semihosting
call.  Add the missing error handling.

report_fault() is rather brutal in that it will call abort(), but
this is the same error-handling used in the rest of this file.

Resolves: Coverity CID 1490684
Fixes: ea4210600db3c5 ("target/mips: Avoid qemu_semihosting_log_out for 
UHI_plog")
Signed-off-by: Peter Maydell 
---
NB: only tested with 'make check' and 'make check-tcg', which
almost certainly don't actually exercise this codepath.
---
 target/mips/tcg/sysemu/mips-semi.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/mips/tcg/sysemu/mips-semi.c 
b/target/mips/tcg/sysemu/mips-semi.c
index 5fb1ad90920..85f0567a7fa 100644
--- a/target/mips/tcg/sysemu/mips-semi.c
+++ b/target/mips/tcg/sysemu/mips-semi.c
@@ -321,6 +321,9 @@ void mips_semihosting(CPUMIPSState *env)
 if (use_gdb_syscalls()) {
 addr = gpr[29] - str->len;
 p = lock_user(VERIFY_WRITE, addr, str->len, 0);
+if (!p) {
+report_fault(env);
+}
 memcpy(p, str->str, str->len);
 unlock_user(p, addr, str->len);
 semihost_sys_write(cs, uhi_cb, 2, addr, str->len);
-- 
2.25.1




Re: [PATCH] i386: Disable BTS and PEBS

2022-07-19 Thread Sean Christopherson
On Tue, Jul 19, 2022, Paolo Bonzini wrote:
> On 7/18/22 22:12, Sean Christopherson wrote:
> > On Mon, Jul 18, 2022, Paolo Bonzini wrote:
> > > This needs to be fixed in the kernel because old QEMU/new KVM is 
> > > supported.
> > 
> > I can't object to adding a quirk for this since KVM is breaking userspace, 
> > but on
> > the KVM side we really need to stop "sanitizing" userspace inputs unless it 
> > puts
> > the host at risk, because inevitably it leads to needing a quirk.
> 
> The problem is not the sanitizing, it's that userspace literally cannot know
> that this needs to be done because the feature bits are "backwards" (1 =
> unavailable).

Yes, the bits being inverted contributed to KVM not providing a way for
userspace to enumerate PEBS and BTS support, but lack of enumeration is a
separate issue.

If KVM had simply ignored invalid guest state from the get go, then userspace
would never have gained a dependency on KVM sanitizing guest state.  The fact
that KVM didn't enumerate support in any way is an orthogonal problem.  To play
nice with older userspace, KVM will need to add a quirk to restore the
sanitizing code, but that doesn't solve the enumeration issue.  And vice versa,
solving the enumeration problem doesn't magically fix old userspace.

> The right way to fix it is probably to use feature MSRs and, by default,
> leave the features marked as unavailable.  I'll think it through and post a
> patch tomorrow for both KVM and QEMU (to enable PEBS).

Yeah, lack of CPUID bits is annoying.



Re: [PATCH] i386: Disable BTS and PEBS

2022-07-19 Thread Paolo Bonzini

On 7/18/22 22:12, Sean Christopherson wrote:

On Mon, Jul 18, 2022, Paolo Bonzini wrote:

This needs to be fixed in the kernel because old QEMU/new KVM is supported.


I can't object to adding a quirk for this since KVM is breaking userspace, but 
on
the KVM side we really need to stop "sanitizing" userspace inputs unless it puts
the host at risk, because inevitably it leads to needing a quirk.


The problem is not the sanitizing, it's that userspace literally cannot 
know that this needs to be done because the feature bits are "backwards" 
(1 = unavailable).


The right way to fix it is probably to use feature MSRs and, by default, 
leave the features marked as unavailable.  I'll think it through and 
post a patch tomorrow for both KVM and QEMU (to enable PEBS).



But apart from that, where does Linux check MSR_IA32_MISC_ENABLE_BTS_UNAVAIL
and MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL?


The kernel uses synthetic feature flags that are set by:

   static void init_intel(struct cpuinfo_x86 *c)

if (boot_cpu_has(X86_FEATURE_DS)) {
unsigned int l1, l2;

rdmsr(MSR_IA32_MISC_ENABLE, l1, l2);
if (!(l1 & (1<<11)))
set_cpu_cap(c, X86_FEATURE_BTS);
if (!(l1 & (1<<12)))
set_cpu_cap(c, X86_FEATURE_PEBS);
}


Gah, shift constants are evil.   I sent 
https://lore.kernel.org/all/20220719174714.2410374-1-pbonz...@redhat.com/ to 
clean this up.


Paolo


and consumed by:

   void __init intel_ds_init(void)

/*
 * No support for 32bit formats
 */
if (!boot_cpu_has(X86_FEATURE_DTES64))
return;

x86_pmu.bts  = boot_cpu_has(X86_FEATURE_BTS);
x86_pmu.pebs = boot_cpu_has(X86_FEATURE_PEBS);
x86_pmu.pebs_buffer_size = PEBS_BUFFER_SIZE;






Re: [PULL 0/3] Misc patches for QEMU 7.1 freeze

2022-07-19 Thread Paolo Bonzini

On 7/19/22 14:35, Jason A. Donenfeld wrote:

  6 files changed, 19 insertions(+), 6 deletions(-)

Considering the subject line, I'm quite distressed that the i386
setup_data rng seed patch did not make it in. I just resent it to the
mailing list [1] in case you missed it before. Do you think you could
queue this up ASAP?


Sure, no problem.  Unfortunately I was on vacation around the time you 
sent it first.


Paolo



[PULL 21/21] hw/loongarch: Add fdt support

2022-07-19 Thread Richard Henderson
From: Xiaojuan Yang 

Add a LoongArch flattened device tree, adding the cpu device node, firmware
cfg node and pcie node into it, and create the fdt rom memory region. For now
the fdt info is not complete, since only the UEFI BIOS uses the fdt; the Linux
kernel does not. The LoongArch Linux kernel uses the ACPI tables, which are
complete in the QEMU virt machine.

Reviewed-by: Richard Henderson 
Signed-off-by: Xiaojuan Yang 
Message-Id: <20220712083206.4187715-7-yangxiaoj...@loongson.cn>
[rth: Set TARGET_NEED_FDT, add fdt to meson.build]
Signed-off-by: Richard Henderson 
---
 configs/targets/loongarch64-softmmu.mak |   1 +
 include/hw/loongarch/virt.h |   4 +
 target/loongarch/cpu.h  |   3 +
 hw/loongarch/loongson3.c| 136 +++-
 target/loongarch/cpu.c  |   1 +
 hw/loongarch/meson.build|   2 +-
 6 files changed, 143 insertions(+), 4 deletions(-)

diff --git a/configs/targets/loongarch64-softmmu.mak 
b/configs/targets/loongarch64-softmmu.mak
index 7bc06c850c..483474ba93 100644
--- a/configs/targets/loongarch64-softmmu.mak
+++ b/configs/targets/loongarch64-softmmu.mak
@@ -2,3 +2,4 @@ TARGET_ARCH=loongarch64
 TARGET_BASE_ARCH=loongarch
 TARGET_SUPPORTS_MTTCG=y
 TARGET_XML_FILES= gdb-xml/loongarch-base64.xml gdb-xml/loongarch-fpu64.xml
+TARGET_NEED_FDT=y
diff --git a/include/hw/loongarch/virt.h b/include/hw/loongarch/virt.h
index fb4a4f4e7b..f4f24df428 100644
--- a/include/hw/loongarch/virt.h
+++ b/include/hw/loongarch/virt.h
@@ -28,6 +28,9 @@
 #define VIRT_GED_MEM_ADDR   (VIRT_GED_EVT_ADDR + ACPI_GED_EVT_SEL_LEN)
 #define VIRT_GED_REG_ADDR   (VIRT_GED_MEM_ADDR + MEMORY_HOTPLUG_IO_LEN)
 
+#define LA_FDT_BASE 0x1c40
+#define LA_FDT_SIZE 0x10
+
 struct LoongArchMachineState {
 /*< private >*/
 MachineState parent_obj;
@@ -45,6 +48,7 @@ struct LoongArchMachineState {
 char *oem_id;
 char *oem_table_id;
 DeviceState  *acpi_ged;
+int  fdt_size;
 };
 
 #define TYPE_LOONGARCH_MACHINE  MACHINE_TYPE_NAME("virt")
diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index d141ec9b5d..a36349df83 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -326,6 +326,9 @@ struct ArchCPU {
 CPUNegativeOffsetState neg;
 CPULoongArchState env;
 QEMUTimer timer;
+
+/* 'compatible' string for this CPU for Linux device trees */
+const char *dtb_compatible;
 };
 
 #define TYPE_LOONGARCH_CPU "loongarch-cpu"
diff --git a/hw/loongarch/loongson3.c b/hw/loongarch/loongson3.c
index 3ec8cda8a1..a08dc9d299 100644
--- a/hw/loongarch/loongson3.c
+++ b/hw/loongarch/loongson3.c
@@ -35,6 +35,129 @@
 #include "qapi/qapi-visit-common.h"
 #include "hw/acpi/generic_event_device.h"
 #include "hw/mem/nvdimm.h"
+#include "sysemu/device_tree.h"
+#include 
+
+static void create_fdt(LoongArchMachineState *lams)
+{
+MachineState *ms = MACHINE(lams);
+
+ms->fdt = create_device_tree(&lams->fdt_size);
+if (!ms->fdt) {
+error_report("create_device_tree() failed");
+exit(1);
+}
+
+/* Header */
+qemu_fdt_setprop_string(ms->fdt, "/", "compatible",
+"linux,dummy-loongson3");
+qemu_fdt_setprop_cell(ms->fdt, "/", "#address-cells", 0x2);
+qemu_fdt_setprop_cell(ms->fdt, "/", "#size-cells", 0x2);
+}
+
+static void fdt_add_cpu_nodes(const LoongArchMachineState *lams)
+{
+int num;
+const MachineState *ms = MACHINE(lams);
+int smp_cpus = ms->smp.cpus;
+
+qemu_fdt_add_subnode(ms->fdt, "/cpus");
+qemu_fdt_setprop_cell(ms->fdt, "/cpus", "#address-cells", 0x1);
+qemu_fdt_setprop_cell(ms->fdt, "/cpus", "#size-cells", 0x0);
+
+/* cpu nodes */
+for (num = smp_cpus - 1; num >= 0; num--) {
+char *nodename = g_strdup_printf("/cpus/cpu@%d", num);
+LoongArchCPU *cpu = LOONGARCH_CPU(qemu_get_cpu(num));
+
+qemu_fdt_add_subnode(ms->fdt, nodename);
+qemu_fdt_setprop_string(ms->fdt, nodename, "device_type", "cpu");
+qemu_fdt_setprop_string(ms->fdt, nodename, "compatible",
+cpu->dtb_compatible);
+qemu_fdt_setprop_cell(ms->fdt, nodename, "reg", num);
+qemu_fdt_setprop_cell(ms->fdt, nodename, "phandle",
+  qemu_fdt_alloc_phandle(ms->fdt));
+g_free(nodename);
+}
+
+/*cpu map */
+qemu_fdt_add_subnode(ms->fdt, "/cpus/cpu-map");
+
+for (num = smp_cpus - 1; num >= 0; num--) {
+char *cpu_path = g_strdup_printf("/cpus/cpu@%d", num);
+char *map_path;
+
+if (ms->smp.threads > 1) {
+map_path = g_strdup_printf(
+"/cpus/cpu-map/socket%d/core%d/thread%d",
+num / (ms->smp.cores * ms->smp.threads),
+(num / ms->smp.threads) % ms->smp.cores,
+num % ms->smp.threads);
+} else {
+map_path = g_strdup_printf(
+"/cpus/cpu-map/socket%d/core%d",
+num / ms->

[PULL 18/21] hw/loongarch: Add linux kernel booting support

2022-07-19 Thread Richard Henderson
From: Xiaojuan Yang 

There are two ways to boot the system from a kernel file. If a bios option
is given, the system will boot from the loaded bios file; otherwise it will
boot from the hardcoded auxcode and jump to the kernel ELF entry.

Reviewed-by: Richard Henderson 
Signed-off-by: Xiaojuan Yang 
Message-Id: <20220712083206.4187715-4-yangxiaoj...@loongson.cn>
Signed-off-by: Richard Henderson 
---
 hw/loongarch/loongson3.c | 114 +--
 1 file changed, 99 insertions(+), 15 deletions(-)

diff --git a/hw/loongarch/loongson3.c b/hw/loongarch/loongson3.c
index 3f1849b8b0..88e38ce17e 100644
--- a/hw/loongarch/loongson3.c
+++ b/hw/loongarch/loongson3.c
@@ -103,6 +103,8 @@ static const MemoryRegionOps loongarch_virt_pm_ops = {
 static struct _loaderparams {
 uint64_t ram_size;
 const char *kernel_filename;
+const char *kernel_cmdline;
+const char *initrd_filename;
 } loaderparams;
 
 static uint64_t cpu_loongarch_virt_to_phys(void *opaque, uint64_t addr)
@@ -352,18 +354,97 @@ static void reset_load_elf(void *opaque)
 }
 }
 
+/* Load an image file into an fw_cfg entry identified by key. */
+static void load_image_to_fw_cfg(FWCfgState *fw_cfg, uint16_t size_key,
+ uint16_t data_key, const char *image_name,
+ bool try_decompress)
+{
+size_t size = -1;
+uint8_t *data;
+
+if (image_name == NULL) {
+return;
+}
+
+if (try_decompress) {
+size = load_image_gzipped_buffer(image_name,
+ LOAD_IMAGE_MAX_GUNZIP_BYTES, &data);
+}
+
+if (size == (size_t)-1) {
+gchar *contents;
+gsize length;
+
+if (!g_file_get_contents(image_name, &contents, &length, NULL)) {
+error_report("failed to load \"%s\"", image_name);
+exit(1);
+}
+size = length;
+data = (uint8_t *)contents;
+}
+
+fw_cfg_add_i32(fw_cfg, size_key, size);
+fw_cfg_add_bytes(fw_cfg, data_key, data, size);
+}
+
+static void fw_cfg_add_kernel_info(FWCfgState *fw_cfg)
+{
+/*
+ * Expose the kernel, the command line, and the initrd in fw_cfg.
+ * We don't process them here at all, it's all left to the
+ * firmware.
+ */
+load_image_to_fw_cfg(fw_cfg,
+ FW_CFG_KERNEL_SIZE, FW_CFG_KERNEL_DATA,
+ loaderparams.kernel_filename,
+ false);
+
+if (loaderparams.initrd_filename) {
+load_image_to_fw_cfg(fw_cfg,
+ FW_CFG_INITRD_SIZE, FW_CFG_INITRD_DATA,
+ loaderparams.initrd_filename, false);
+}
+
+if (loaderparams.kernel_cmdline) {
+fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
+   strlen(loaderparams.kernel_cmdline) + 1);
+fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA,
+  loaderparams.kernel_cmdline);
+}
+}
+
+static void loongarch_firmware_boot(LoongArchMachineState *lams)
+{
+fw_cfg_add_kernel_info(lams->fw_cfg);
+}
+
+static void loongarch_direct_kernel_boot(LoongArchMachineState *lams)
+{
+MachineState *machine = MACHINE(lams);
+int64_t kernel_addr = 0;
+LoongArchCPU *lacpu;
+int i;
+
+kernel_addr = load_kernel_info();
+if (!machine->firmware) {
+for (i = 0; i < machine->smp.cpus; i++) {
+lacpu = LOONGARCH_CPU(qemu_get_cpu(i));
+lacpu->env.load_elf = true;
+lacpu->env.elf_address = kernel_addr;
+}
+}
+}
+
 static void loongarch_init(MachineState *machine)
 {
+LoongArchCPU *lacpu;
 const char *cpu_model = machine->cpu_type;
-const char *kernel_filename = machine->kernel_filename;
 ram_addr_t offset = 0;
 ram_addr_t ram_size = machine->ram_size;
 uint64_t highram_size = 0;
 MemoryRegion *address_space_mem = get_system_memory();
 LoongArchMachineState *lams = LOONGARCH_MACHINE(machine);
-LoongArchCPU *lacpu;
 int i;
-int64_t kernel_addr = 0;
 
 if (!cpu_model) {
 cpu_model = LOONGARCH_CPU_TYPE_NAME("la464");
@@ -412,20 +493,23 @@ static void loongarch_init(MachineState *machine)
 memmap_table,
 sizeof(struct memmap_entry) * (memmap_entries));
 }
-
-if (kernel_filename) {
-loaderparams.ram_size = ram_size;
-loaderparams.kernel_filename = kernel_filename;
-kernel_addr = load_kernel_info();
-if (!machine->firmware) {
-for (i = 0; i < machine->smp.cpus; i++) {
-lacpu = LOONGARCH_CPU(qemu_get_cpu(i));
-lacpu->env.load_elf = true;
-lacpu->env.elf_address = kernel_addr;
-qemu_register_reset(reset_load_elf, lacpu);
-}
+loaderparams.ram_size = ram_size;
+loaderparams.kernel_filename = machine->kernel_filename;
+loaderparams.kernel_cmdline = machine->kernel_cmdline;
+loaderparams.i

[PULL 16/21] hw/loongarch: Add fw_cfg table support

2022-07-19 Thread Richard Henderson
From: Xiaojuan Yang 

Add fw_cfg table for loongarch virt machine, including memmap table.

Reviewed-by: Richard Henderson 
Signed-off-by: Xiaojuan Yang 
Message-Id: <20220712083206.4187715-2-yangxiaoj...@loongson.cn>
[rth: Replace fprintf with assert; drop unused return value;
  initialize reserved slot to zero.]
Signed-off-by: Richard Henderson 
---
 hw/loongarch/fw_cfg.h   | 15 ++
 include/hw/loongarch/virt.h |  3 +++
 hw/loongarch/fw_cfg.c   | 33 +
 hw/loongarch/loongson3.c| 41 -
 hw/loongarch/meson.build|  3 +++
 5 files changed, 94 insertions(+), 1 deletion(-)
 create mode 100644 hw/loongarch/fw_cfg.h
 create mode 100644 hw/loongarch/fw_cfg.c

diff --git a/hw/loongarch/fw_cfg.h b/hw/loongarch/fw_cfg.h
new file mode 100644
index 00..7c0de4db4a
--- /dev/null
+++ b/hw/loongarch/fw_cfg.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * QEMU fw_cfg helpers (LoongArch specific)
+ *
+ * Copyright (C) 2021 Loongson Technology Corporation Limited
+ */
+
+#ifndef HW_LOONGARCH_FW_CFG_H
+#define HW_LOONGARCH_FW_CFG_H
+
+#include "hw/boards.h"
+#include "hw/nvram/fw_cfg.h"
+
+FWCfgState *loongarch_fw_cfg_init(ram_addr_t ram_size, MachineState *ms);
+#endif
diff --git a/include/hw/loongarch/virt.h b/include/hw/loongarch/virt.h
index 09a816191c..9fec1f8a5c 100644
--- a/include/hw/loongarch/virt.h
+++ b/include/hw/loongarch/virt.h
@@ -17,6 +17,7 @@
 
 #define LOONGARCH_ISA_IO_BASE   0x1800UL
 #define LOONGARCH_ISA_IO_SIZE   0x0004000
+#define VIRT_FWCFG_BASE 0x1e02UL
 
 struct LoongArchMachineState {
 /*< private >*/
@@ -26,6 +27,8 @@ struct LoongArchMachineState {
 MemoryRegion lowmem;
 MemoryRegion highmem;
 MemoryRegion isa_io;
+/* State for other subsystems/APIs: */
+FWCfgState  *fw_cfg;
 };
 
 #define TYPE_LOONGARCH_MACHINE  MACHINE_TYPE_NAME("virt")
diff --git a/hw/loongarch/fw_cfg.c b/hw/loongarch/fw_cfg.c
new file mode 100644
index 00..f6503d5607
--- /dev/null
+++ b/hw/loongarch/fw_cfg.c
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * QEMU fw_cfg helpers (LoongArch specific)
+ *
+ * Copyright (C) 2021 Loongson Technology Corporation Limited
+ */
+
+#include "qemu/osdep.h"
+#include "hw/loongarch/fw_cfg.h"
+#include "hw/loongarch/virt.h"
+#include "hw/nvram/fw_cfg.h"
+#include "sysemu/sysemu.h"
+
+static void fw_cfg_boot_set(void *opaque, const char *boot_device,
+Error **errp)
+{
+fw_cfg_modify_i16(opaque, FW_CFG_BOOT_DEVICE, boot_device[0]);
+}
+
+FWCfgState *loongarch_fw_cfg_init(ram_addr_t ram_size, MachineState *ms)
+{
+FWCfgState *fw_cfg;
+int max_cpus = ms->smp.max_cpus;
+int smp_cpus = ms->smp.cpus;
+
+fw_cfg = fw_cfg_init_mem_wide(VIRT_FWCFG_BASE + 8, VIRT_FWCFG_BASE, 8, 0, 
NULL);
+fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus);
+fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
+fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
+
+qemu_register_boot_set(fw_cfg_boot_set, fw_cfg);
+return fw_cfg;
+}
diff --git a/hw/loongarch/loongson3.c b/hw/loongarch/loongson3.c
index 15fddfc4f5..9ee7450252 100644
--- a/hw/loongarch/loongson3.c
+++ b/hw/loongarch/loongson3.c
@@ -28,13 +28,40 @@
 #include "hw/pci-host/ls7a.h"
 #include "hw/pci-host/gpex.h"
 #include "hw/misc/unimp.h"
-
+#include "hw/loongarch/fw_cfg.h"
 #include "target/loongarch/cpu.h"
 
 #define PM_BASE 0x1008
 #define PM_SIZE 0x100
 #define PM_CTRL 0x10
 
+struct memmap_entry {
+uint64_t address;
+uint64_t length;
+uint32_t type;
+uint32_t reserved;
+};
+
+static struct memmap_entry *memmap_table;
+static unsigned memmap_entries;
+
+static void memmap_add_entry(uint64_t address, uint64_t length, uint32_t type)
+{
+/* Ensure there are no duplicate entries. */
+for (unsigned i = 0; i < memmap_entries; i++) {
+assert(memmap_table[i].address != address);
+}
+
+memmap_table = g_renew(struct memmap_entry, memmap_table,
+   memmap_entries + 1);
+memmap_table[memmap_entries].address = cpu_to_le64(address);
+memmap_table[memmap_entries].length = cpu_to_le64(length);
+memmap_table[memmap_entries].type = cpu_to_le32(type);
+memmap_table[memmap_entries].reserved = 0;
+memmap_entries++;
+}
+
+
 /*
  * This is a placeholder for missing ACPI,
  * and will eventually be replaced.
@@ -331,15 +358,27 @@ static void loongarch_init(MachineState *machine)
  machine->ram, 0, 256 * MiB);
 memory_region_add_subregion(address_space_mem, offset, &lams->lowmem);
 offset += 256 * MiB;
+memmap_add_entry(0, 256 * MiB, 1);
 highram_size = ram_size - 256 * MiB;
 memory_region_init_alias(&lams->highmem, NULL, "loongarch.highmem",
  machine->ram, offset, highram_size);
 memory_region_add_subregion(address_space_mem, 

Re: [RFC PATCH 7/8] block: use the new _change_ API instead of _can_set_ and _set_

2022-07-19 Thread Paolo Bonzini

On 7/19/22 11:57, Emanuele Giuseppe Esposito wrote:


Wrapping the new drains in aio_context_acquire/release(new_context) is
not so much helpful either, since apparently the following
blk_set_aio_context makes aio_poll() hang.
I am not sure why, any ideas?


I'll take a look, thanks.  In any case this doesn't block this series, 
it was just a suggestion and blk_do_set_aio_context can be improved on top.


Paolo



[PULL 14/21] tests/tcg/loongarch64: Add fp comparison instructions test

2022-07-19 Thread Richard Henderson
From: Song Gao 

Choose some instructions to test:
- FCMP.cond.S
- cond: ceq clt cle cne seq slt sle sne

Signed-off-by: Song Gao 
Message-Id: <20220716085426.3098060-8-gaos...@loongson.cn>
Signed-off-by: Richard Henderson 
---
 tests/tcg/loongarch64/test_fpcom.c| 37 +++
 tests/tcg/loongarch64/Makefile.target |  1 +
 2 files changed, 38 insertions(+)
 create mode 100644 tests/tcg/loongarch64/test_fpcom.c

diff --git a/tests/tcg/loongarch64/test_fpcom.c 
b/tests/tcg/loongarch64/test_fpcom.c
new file mode 100644
index 00..9e81f767f9
--- /dev/null
+++ b/tests/tcg/loongarch64/test_fpcom.c
@@ -0,0 +1,37 @@
+#include 
+
+#define TEST_COMP(N)  \
+void test_##N(float fj, float fk) \
+{ \
+int rd = 0;   \
+  \
+asm volatile("fcmp."#N".s $fcc6,%1,%2\n"  \
+ "movcf2gr %0, $fcc6\n"   \
+ : "=r"(rd)   \
+ : "f"(fj), "f"(fk)   \
+ : ); \
+assert(rd == 1);  \
+}
+
+TEST_COMP(ceq)
+TEST_COMP(clt)
+TEST_COMP(cle)
+TEST_COMP(cne)
+TEST_COMP(seq)
+TEST_COMP(slt)
+TEST_COMP(sle)
+TEST_COMP(sne)
+
+int main()
+{
+test_ceq(0xff700102, 0xff700102);
+test_clt(0x00730007, 0xff730007);
+test_cle(0xff70130a, 0xff70130b);
+test_cne(0x1238acde, 0xff7f);
+test_seq(0xff766618, 0xff766619);
+test_slt(0xff78881c, 0xff78901d);
+test_sle(0xff780b22, 0xff790b22);
+test_sne(0xff7bcd25, 0xff7a26cf);
+
+return 0;
+}
diff --git a/tests/tcg/loongarch64/Makefile.target 
b/tests/tcg/loongarch64/Makefile.target
index 59d564725a..b320d9fd9c 100644
--- a/tests/tcg/loongarch64/Makefile.target
+++ b/tests/tcg/loongarch64/Makefile.target
@@ -13,5 +13,6 @@ LDFLAGS+=-lm
 LOONGARCH64_TESTS  = test_bit
 LOONGARCH64_TESTS  += test_div
 LOONGARCH64_TESTS  += test_fclass
+LOONGARCH64_TESTS  += test_fpcom
 
 TESTS += $(LOONGARCH64_TESTS)
-- 
2.34.1




[PULL 13/21] tests/tcg/loongarch64: Add fclass test

2022-07-19 Thread Richard Henderson
From: Song Gao 

This includes:
- FCLASS.{S/D}

Signed-off-by: Song Gao 
Message-Id: <20220716085426.3098060-7-gaos...@loongson.cn>
Signed-off-by: Richard Henderson 
---
 tests/tcg/loongarch64/test_fclass.c   | 130 ++
 tests/tcg/loongarch64/Makefile.target |   1 +
 2 files changed, 131 insertions(+)
 create mode 100644 tests/tcg/loongarch64/test_fclass.c

diff --git a/tests/tcg/loongarch64/test_fclass.c 
b/tests/tcg/loongarch64/test_fclass.c
new file mode 100644
index 00..7ba1d2c151
--- /dev/null
+++ b/tests/tcg/loongarch64/test_fclass.c
@@ -0,0 +1,130 @@
+#include 
+
+/* float class */
+#define FLOAT_CLASS_SIGNALING_NAN  0x001
+#define FLOAT_CLASS_QUIET_NAN  0x002
+#define FLOAT_CLASS_NEGATIVE_INFINITY  0x004
+#define FLOAT_CLASS_NEGATIVE_NORMAL0x008
+#define FLOAT_CLASS_NEGATIVE_SUBNORMAL 0x010
+#define FLOAT_CLASS_NEGATIVE_ZERO  0x020
+#define FLOAT_CLASS_POSITIVE_INFINITY  0x040
+#define FLOAT_CLASS_POSITIVE_NORMAL0x080
+#define FLOAT_CLASS_POSITIVE_SUBNORMAL 0x100
+#define FLOAT_CLASS_POSITIVE_ZERO  0x200
+
+#define TEST_FCLASS(N)\
+void test_fclass_##N(long s)  \
+{ \
+double fd;\
+long rd;  \
+  \
+asm volatile("fclass."#N" %0, %2\n\t" \
+ "movfr2gr."#N" %1, %2\n\t"   \
+: "=f"(fd), "=r"(rd)  \
+: "f"(s)  \
+: );  \
+switch (rd) { \
+case FLOAT_CLASS_SIGNALING_NAN:   \
+case FLOAT_CLASS_QUIET_NAN:   \
+case FLOAT_CLASS_NEGATIVE_INFINITY:   \
+case FLOAT_CLASS_NEGATIVE_NORMAL: \
+case FLOAT_CLASS_NEGATIVE_SUBNORMAL:  \
+case FLOAT_CLASS_NEGATIVE_ZERO:   \
+case FLOAT_CLASS_POSITIVE_INFINITY:   \
+case FLOAT_CLASS_POSITIVE_NORMAL: \
+case FLOAT_CLASS_POSITIVE_SUBNORMAL:  \
+case FLOAT_CLASS_POSITIVE_ZERO:   \
+break;\
+default:  \
+printf("fclass."#N" test failed.\n"); \
+break;\
+} \
+}
+
+/*
+ *  float format
+ *  type |S  | Exponent  |  Fraction|  example value
+ *31 | 30 --23   | 22  | 21 --0 |
+ *   | bit |
+ *  SNAN 0/1 |   0xFF| 0   |  !=0   |  0x7FBF
+ *  QNAN 0/1 |   0xFF| 1   ||  0x7FCF
+ *  -infinity 1  |   0xFF| 0|  0xFF80
+ *  -normal   1  | [1, 0xFE] | [0, 0x7F]|  0xFF7F
+ *  -subnormal1  |0  |!=0   |  0x807F
+ *  -01  |0  | 0|  0x8000
+ *  +infinity 0  |   0xFF| 0|  0x7F80
+ *  +normal   0  | [1, 0xFE] | [0, 0x7F]|  0x7F7F
+ *  +subnormal0  |0  |!=0   |  0x007F
+ *  +00  |0  | 0|  0x
+ */
+
+long float_snan = 0x7FBF;
+long float_qnan = 0x7FCF;
+long float_neg_infinity = 0xFF80;
+long float_neg_normal = 0xFF7F;
+long float_neg_subnormal = 0x807F;
+long float_neg_zero = 0x8000;
+long float_post_infinity = 0x7F80;
+long float_post_normal = 0x7F7F;
+long float_post_subnormal = 0x007F;
+long float_post_zero = 0x;
+
+/*
+ * double format
+ *  type |S  | Exponent  |  Fraction |  example value
+ *63 | 62  -- 52 | 51  | 50 -- 0 |
+ *   | bit |
+ *  SNAN 0/1 |  0x7FF| 0   |  !=0| 0x7FF7
+ *  QNAN 0/1 |  0x7FF| 1   | | 0x7FFF
+ * -infinity  1  |  0x7FF|0  | 0xFFF0
+ * -normal1  |[1, 0x7FE] |   | 0xFFEF
+ * -subnormal 1  |   0   |   !=0 | 0x8007
+ * -0 1  |   0   |0  | 0x8000
+ * +infinity  0  |  0x7FF|0  | 0x7FF0
+ * +normal0  |[1, 0x7FE] |   | 0x7FEF
+ * +subnormal 0  |  0|   !=0 | 0x000F
+ * +0 0  |  0|   0   | 0x
+ */
+
+long double_snan = 0x7FF7;
+long double_qnan = 0x7FFF;
+long double_neg_infinity = 0xFFF0;
+long double_neg_normal = 0xFFEF;
+long double_neg_subnormal = 0x8007;
+long double_neg_zero = 0x8000;
+long double_post_infinity = 0x7FF0;
+long double_post_normal = 0x7FEF;
+long double_post_subnor

Re: [PULL 0/3] Misc patches for QEMU 7.1 freeze

2022-07-19 Thread Jason A. Donenfeld
Hi Paolo,

On Tue, Jul 19, 2022 at 8:15 PM Paolo Bonzini  wrote:
>
> On 7/19/22 14:35, Jason A. Donenfeld wrote:
> >>   6 files changed, 19 insertions(+), 6 deletions(-)
> > Considering the subject line, I'm quite distressed that the i386
> > setup_data rng seed patch did not make it in. I just resent it to the
> > mailing list [1] in case you missed it before. Do you think you could
> > queue this up ASAP?
>
> Sure, no problem.  Unfortunately I was on vacation around the time you
> sent it first.

Excellent, thanks so much!

Jason



[PULL 10/21] tests/tcg/loongarch64: Add float reference files

2022-07-19 Thread Richard Henderson
From: Philippe Mathieu-Daudé 

Generated on Loongson-3A5000 (CPU revision 0x0014c011).

Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20220104132022.2146857-1-f4...@amsat.org>
Signed-off-by: Song Gao 
Message-Id: <20220716085426.3098060-2-gaos...@loongson.cn>
Signed-off-by: Richard Henderson 
---
 tests/tcg/loongarch64/float_convd.ref | 988 ++
 tests/tcg/loongarch64/float_convs.ref | 748 +++
 tests/tcg/loongarch64/float_madds.ref | 768 
 3 files changed, 2504 insertions(+)
 create mode 100644 tests/tcg/loongarch64/float_convd.ref
 create mode 100644 tests/tcg/loongarch64/float_convs.ref
 create mode 100644 tests/tcg/loongarch64/float_madds.ref

diff --git a/tests/tcg/loongarch64/float_convd.ref 
b/tests/tcg/loongarch64/float_convd.ref
new file mode 100644
index 00..08d3dfa2fe
--- /dev/null
+++ b/tests/tcg/loongarch64/float_convd.ref
@@ -0,0 +1,988 @@
+### Rounding to nearest
+from double: f64(nan:0x007ff4)
+  to single: f32(nan:0x7fe0) (INVALID)
+   to int32: 0 (INVALID)
+   to int64: 0 (INVALID)
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from double: f64(-nan:0x00fff8)
+  to single: f32(-nan:0xffc0) (OK)
+   to int32: 0 (INVALID)
+   to int64: 0 (INVALID)
+  to uint32: 0 (INVALID)
+  to uint64: 0 (INVALID)
+from double: f64(-inf:0x00fff0)
+  to single: f32(-inf:0xff80) (OK)
+   to int32: -2147483648 (INVALID)
+   to int64: -9223372036854775808 (INVALID)
+  to uint32: -2147483648 (INVALID)
+  to uint64: -9223372036854775808 (INVALID)
+from double: f64(-0x1.f000p+1023:0x00ffef)
+  to single: f32(-inf:0xff80) (OVERFLOW INEXACT )
+   to int32: -2147483648 (INVALID)
+   to int64: -9223372036854775808 (INVALID)
+  to uint32: -2147483648 (INVALID)
+  to uint64: -9223372036854775808 (INVALID)
+from double: f64(-0x1.fe00p+127:0x00c7efe000)
+  to single: f32(-0x1.fe00p+127:0xff7f) (OK)
+   to int32: -2147483648 (INVALID)
+   to int64: -9223372036854775808 (INVALID)
+  to uint32: -2147483648 (INVALID)
+  to uint64: -9223372036854775808 (INVALID)
+from double: f64(-0x1.fe00p+127:0x00c7efe000)
+  to single: f32(-0x1.fe00p+127:0xff7f) (OK)
+   to int32: -2147483648 (INVALID)
+   to int64: -9223372036854775808 (INVALID)
+  to uint32: -2147483648 (INVALID)
+  to uint64: -9223372036854775808 (INVALID)
+from double: f64(-0x1.1874b135ff654000p+103:0x00c661874b135ff654)
+  to single: f32(-0x1.1874b200p+103:0xf30c3a59) (INEXACT )
+   to int32: -2147483648 (INVALID)
+   to int64: -9223372036854775808 (INVALID)
+  to uint32: -2147483648 (INVALID)
+  to uint64: -9223372036854775808 (INVALID)
+from double: f64(-0x1.c0bab523323b9000p+99:0x00c62c0bab523323b9)
+  to single: f32(-0x1.c0bab600p+99:0xf1605d5b) (INEXACT )
+   to int32: -2147483648 (INVALID)
+   to int64: -9223372036854775808 (INVALID)
+  to uint32: -2147483648 (INVALID)
+  to uint64: -9223372036854775808 (INVALID)
+from double: f64(-0x1.p+1:0x00c000)
+  to single: f32(-0x1.p+1:0xc000) (OK)
+   to int32: -2 (OK)
+   to int64: -2 (OK)
+  to uint32: -2 (OK)
+  to uint64: -2 (OK)
+from double: f64(-0x1.p+0:0x00bff0)
+  to single: f32(-0x1.p+0:0xbf80) (OK)
+   to int32: -1 (OK)
+   to int64: -1 (OK)
+  to uint32: -1 (OK)
+  to uint64: -1 (OK)
+from double: f64(-0x1.p-1022:0x008010)
+  to single: f32(-0x0.p+0:0x8000) (UNDERFLOW INEXACT )
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INEXACT )
+  to uint64: 0 (INEXACT )
+from double: f64(-0x1.p-126:0x00b810)
+  to single: f32(-0x1.p-126:0x8080) (OK)
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INEXACT )
+  to uint64: 0 (INEXACT )
+from double: f64(0x0.p+0:)
+  to single: f32(0x0.p+0:00) (OK)
+   to int32: 0 (OK)
+   to int64: 0 (OK)
+  to uint32: 0 (OK)
+  to uint64: 0 (OK)
+from double: f64(0x1.p-126:0x003810)
+  to single: f32(0x1.p-126:0x0080) (OK)
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INEXACT )
+  to uint64: 0 (INEXACT )
+from double: f64(0x1.0001c5f68000p-25:0x003e60001c5f68)
+  to single: f32(0x1.p-25:0x3300) (INEXACT )
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INEXACT )
+  to uint64: 0 (INEXACT )
+from double: f64(0x1.e6cb2fa82000p-25:0x003e6e6cb2fa82)
+  to single: f32(0x1.e600p-25:0x3373) (INEXACT )
+   to int32: 0 (INEXACT )
+   to int64: 0 (INEXACT )
+  to uint32: 0 (INEXACT )
+  to uint64: 0 (INEXACT )
+from double: f

[PULL 20/21] hw/loongarch: Add acpi ged support

2022-07-19 Thread Richard Henderson
From: Xiaojuan Yang 

The LoongArch virt machine uses the hardware-reduced ACPI method rather
than the LS7A ACPI device. For now only the power management function of
the ACPI GED device is used; memory hotplug will be added later. ACPI
tables such as RSDP/RSDT/FADT etc. are also added.

The ACPI tables have been submitted to the ACPI spec and will be released soon.

Acked-by: Richard Henderson 
Signed-off-by: Xiaojuan Yang 
Message-Id: <20220712083206.4187715-6-yangxiaoj...@loongson.cn>
Signed-off-by: Richard Henderson 
---
 include/hw/loongarch/virt.h |  13 +
 include/hw/pci-host/ls7a.h  |   4 +
 hw/loongarch/acpi-build.c   | 609 
 hw/loongarch/loongson3.c|  78 -
 hw/loongarch/Kconfig|   2 +
 hw/loongarch/meson.build|   1 +
 6 files changed, 704 insertions(+), 3 deletions(-)
 create mode 100644 hw/loongarch/acpi-build.c

diff --git a/include/hw/loongarch/virt.h b/include/hw/loongarch/virt.h
index 9b7cdfae78..fb4a4f4e7b 100644
--- a/include/hw/loongarch/virt.h
+++ b/include/hw/loongarch/virt.h
@@ -21,6 +21,13 @@
 #define VIRT_BIOS_BASE  0x1c00UL
 #define VIRT_BIOS_SIZE  (4 * MiB)
 
+#define VIRT_LOWMEM_BASE0
+#define VIRT_LOWMEM_SIZE0x1000
+#define VIRT_HIGHMEM_BASE   0x9000
+#define VIRT_GED_EVT_ADDR   0x100e
+#define VIRT_GED_MEM_ADDR   (VIRT_GED_EVT_ADDR + ACPI_GED_EVT_SEL_LEN)
+#define VIRT_GED_REG_ADDR   (VIRT_GED_MEM_ADDR + MEMORY_HOTPLUG_IO_LEN)
+
 struct LoongArchMachineState {
 /*< private >*/
 MachineState parent_obj;
@@ -34,8 +41,14 @@ struct LoongArchMachineState {
 /* State for other subsystems/APIs: */
 FWCfgState  *fw_cfg;
 Notifier machine_done;
+OnOffAutoacpi;
+char *oem_id;
+char *oem_table_id;
+DeviceState  *acpi_ged;
 };
 
 #define TYPE_LOONGARCH_MACHINE  MACHINE_TYPE_NAME("virt")
 OBJECT_DECLARE_SIMPLE_TYPE(LoongArchMachineState, LOONGARCH_MACHINE)
+bool loongarch_is_acpi_enabled(LoongArchMachineState *lams);
+void loongarch_acpi_setup(LoongArchMachineState *lams);
 #endif
diff --git a/include/hw/pci-host/ls7a.h b/include/hw/pci-host/ls7a.h
index 08c5f78be2..0fdc86b973 100644
--- a/include/hw/pci-host/ls7a.h
+++ b/include/hw/pci-host/ls7a.h
@@ -23,6 +23,9 @@
 #define LS7A_PCI_IO_BASE0x18004000UL
 #define LS7A_PCI_IO_SIZE0xC000
 
+#define LS7A_PCI_MEM_BASE   0x4000UL
+#define LS7A_PCI_MEM_SIZE   0x4000UL
+
 #define LS7A_PCH_REG_BASE   0x1000UL
 #define LS7A_IOAPIC_REG_BASE(LS7A_PCH_REG_BASE)
 #define LS7A_PCH_MSI_ADDR_LOW   0x2FF0UL
@@ -41,4 +44,5 @@
 #define LS7A_MISC_REG_BASE  (LS7A_PCH_REG_BASE + 0x0008)
 #define LS7A_RTC_REG_BASE   (LS7A_MISC_REG_BASE + 0x00050100)
 #define LS7A_RTC_LEN0x100
+#define LS7A_SCI_IRQ(PCH_PIC_IRQ_OFFSET + 4)
 #endif
diff --git a/hw/loongarch/acpi-build.c b/hw/loongarch/acpi-build.c
new file mode 100644
index 00..b95b83b079
--- /dev/null
+++ b/hw/loongarch/acpi-build.c
@@ -0,0 +1,609 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Support for generating ACPI tables and passing them to Guests
+ *
+ * Copyright (C) 2021 Loongson Technology Corporation Limited
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/bitmap.h"
+#include "hw/pci/pci.h"
+#include "hw/core/cpu.h"
+#include "target/loongarch/cpu.h"
+#include "hw/acpi/acpi-defs.h"
+#include "hw/acpi/acpi.h"
+#include "hw/nvram/fw_cfg.h"
+#include "hw/acpi/bios-linker-loader.h"
+#include "migration/vmstate.h"
+#include "hw/mem/memory-device.h"
+#include "sysemu/reset.h"
+
+/* Supported chipsets: */
+#include "hw/pci-host/ls7a.h"
+#include "hw/loongarch/virt.h"
+#include "hw/acpi/aml-build.h"
+
+#include "hw/acpi/utils.h"
+#include "hw/acpi/pci.h"
+
+#include "qom/qom-qobject.h"
+
+#include "hw/acpi/generic_event_device.h"
+
+#define ACPI_BUILD_ALIGN_SIZE 0x1000
+#define ACPI_BUILD_TABLE_SIZE 0x2
+
+#ifdef DEBUG_ACPI_BUILD
+#define ACPI_BUILD_DPRINTF(fmt, ...)\
+do {printf("ACPI_BUILD: " fmt, ## __VA_ARGS__); } while (0)
+#else
+#define ACPI_BUILD_DPRINTF(fmt, ...)
+#endif
+
+/* build FADT */
+static void init_common_fadt_data(AcpiFadtData *data)
+{
+AcpiFadtData fadt = {
+/* ACPI 5.0: 4.1 Hardware-Reduced ACPI */
+.rev = 5,
+.flags = ((1 << ACPI_FADT_F_HW_REDUCED_ACPI) |
+  (1 << ACPI_FADT_F_RESET_REG_SUP)),
+
+/* ACPI 5.0: 4.8.3.7 Sleep Control and Status Registers */
+.sleep_ctl = {
+.space_id = AML_AS_SYSTEM_MEMORY,
+.bit_width = 8,
+.address = VIRT_GED_REG_ADDR + ACPI_GED_REG_SLEEP_CTL,
+},
+.sleep_sts = {
+.space_id = AML_AS_SYSTEM_MEMORY,
+.bit_width = 8,
+.address = VIRT_GED_REG_ADDR + ACPI_GED_REG_SLEEP_STS,
+},
+
+/* ACPI 5.0: 4.8.3.6 Reset Register */
+.reset_reg = {
+.space_id = AML_AS_SYSTEM_MEMORY,
+   

[PULL 09/21] target/loongarch: Fix float_convd/float_convs test failing

2022-07-19 Thread Richard Henderson
From: Song Gao 

We should return zero when the invalid exception is raised and the operand is a NaN.

Signed-off-by: Song Gao 
Message-Id: <20220716085426.3098060-4-gaos...@loongson.cn>
Signed-off-by: Richard Henderson 
---
 target/loongarch/fpu_helper.c | 143 +++---
 1 file changed, 80 insertions(+), 63 deletions(-)

diff --git a/target/loongarch/fpu_helper.c b/target/loongarch/fpu_helper.c
index 3d0cb8dd0d..bd76529219 100644
--- a/target/loongarch/fpu_helper.c
+++ b/target/loongarch/fpu_helper.c
@@ -13,9 +13,6 @@
 #include "fpu/softfloat.h"
 #include "internals.h"
 
-#define FLOAT_TO_INT32_OVERFLOW 0x7fff
-#define FLOAT_TO_INT64_OVERFLOW 0x7fffULL
-
 static inline uint64_t nanbox_s(float32 fp)
 {
 return fp | MAKE_64BIT_MASK(32, 32);
@@ -544,9 +541,10 @@ uint64_t helper_ftintrm_l_d(CPULoongArchState *env, 
uint64_t fj)
 fd = float64_to_int64(fj, &env->fp_status);
 set_float_rounding_mode(old_mode, &env->fp_status);
 
-if (get_float_exception_flags(&env->fp_status) &
-(float_flag_invalid | float_flag_overflow)) {
-fd = FLOAT_TO_INT64_OVERFLOW;
+if (get_float_exception_flags(&env->fp_status) & (float_flag_invalid)) {
+if (float64_is_any_nan(fj)) {
+fd = 0;
+}
 }
 update_fcsr0(env, GETPC());
 return fd;
@@ -561,9 +559,10 @@ uint64_t helper_ftintrm_l_s(CPULoongArchState *env, 
uint64_t fj)
 fd = float32_to_int64((uint32_t)fj, &env->fp_status);
 set_float_rounding_mode(old_mode, &env->fp_status);
 
-if (get_float_exception_flags(&env->fp_status) &
-(float_flag_invalid | float_flag_overflow)) {
-fd = FLOAT_TO_INT64_OVERFLOW;
+if (get_float_exception_flags(&env->fp_status) & (float_flag_invalid)) {
+if (float32_is_any_nan((uint32_t)fj)) {
+fd = 0;
+}
 }
 update_fcsr0(env, GETPC());
 return fd;
@@ -578,9 +577,10 @@ uint64_t helper_ftintrm_w_d(CPULoongArchState *env, 
uint64_t fj)
 fd = (uint64_t)float64_to_int32(fj, &env->fp_status);
 set_float_rounding_mode(old_mode, &env->fp_status);
 
-if (get_float_exception_flags(&env->fp_status) &
-(float_flag_invalid | float_flag_overflow)) {
-fd = FLOAT_TO_INT32_OVERFLOW;
+if (get_float_exception_flags(&env->fp_status) & (float_flag_invalid)) {
+if (float64_is_any_nan(fj)) {
+fd = 0;
+}
 }
 update_fcsr0(env, GETPC());
 return fd;
@@ -595,9 +595,10 @@ uint64_t helper_ftintrm_w_s(CPULoongArchState *env, 
uint64_t fj)
 fd = (uint64_t)float32_to_int32((uint32_t)fj, &env->fp_status);
 set_float_rounding_mode(old_mode, &env->fp_status);
 
-if (get_float_exception_flags(&env->fp_status) &
-(float_flag_invalid | float_flag_overflow)) {
-fd = FLOAT_TO_INT32_OVERFLOW;
+if (get_float_exception_flags(&env->fp_status) & (float_flag_invalid)) {
+if (float32_is_any_nan((uint32_t)fj)) {
+fd = 0;
+}
 }
 update_fcsr0(env, GETPC());
 return fd;
@@ -612,9 +613,10 @@ uint64_t helper_ftintrp_l_d(CPULoongArchState *env, 
uint64_t fj)
 fd = float64_to_int64(fj, &env->fp_status);
 set_float_rounding_mode(old_mode, &env->fp_status);
 
-if (get_float_exception_flags(&env->fp_status) &
-(float_flag_invalid | float_flag_overflow)) {
-fd = FLOAT_TO_INT64_OVERFLOW;
+if (get_float_exception_flags(&env->fp_status) & (float_flag_invalid)) {
+if (float64_is_any_nan(fj)) {
+fd = 0;
+}
 }
 update_fcsr0(env, GETPC());
 return fd;
@@ -629,9 +631,10 @@ uint64_t helper_ftintrp_l_s(CPULoongArchState *env, 
uint64_t fj)
 fd = float32_to_int64((uint32_t)fj, &env->fp_status);
 set_float_rounding_mode(old_mode, &env->fp_status);
 
-if (get_float_exception_flags(&env->fp_status) &
-(float_flag_invalid | float_flag_overflow)) {
-fd = FLOAT_TO_INT64_OVERFLOW;
+if (get_float_exception_flags(&env->fp_status) & (float_flag_invalid)) {
+if (float32_is_any_nan((uint32_t)fj)) {
+fd = 0;
+}
 }
 update_fcsr0(env, GETPC());
 return fd;
@@ -646,9 +649,10 @@ uint64_t helper_ftintrp_w_d(CPULoongArchState *env, 
uint64_t fj)
 fd = (uint64_t)float64_to_int32(fj, &env->fp_status);
 set_float_rounding_mode(old_mode, &env->fp_status);
 
-if (get_float_exception_flags(&env->fp_status) &
-(float_flag_invalid | float_flag_overflow)) {
-fd = FLOAT_TO_INT32_OVERFLOW;
+if (get_float_exception_flags(&env->fp_status) & (float_flag_invalid)) {
+if (float64_is_any_nan(fj)) {
+fd = 0;
+}
 }
 update_fcsr0(env, GETPC());
 return fd;
@@ -663,9 +667,10 @@ uint64_t helper_ftintrp_w_s(CPULoongArchState *env, 
uint64_t fj)
 fd = (uint64_t)float32_to_int32((uint32_t)fj, &env->fp_status);
 set_float_rounding_mode(old_mode, &env->fp_status);
 
-if (get_float_exception_flags(&env->fp_status) &
-(float_flag_inva

Re: [PULL 07/16] configure, meson: move ARCH to meson.build

2022-07-19 Thread Paolo Bonzini

On 7/19/22 15:00, Peter Maydell wrote:

shellcheck points out that this (old) commit removed the code
setting ARCH from configure, but left behind a use of it:

case "$ARCH" in
alpha)
   # Ensure there's only a single GP
   QEMU_CFLAGS="-msmall-data $QEMU_CFLAGS"
;;
esac

Presumably meson.build needs to do some equivalent of this ?


Yeah, I'll send a patch before 7.1 gets out (Richard, as the resident 
Alpha guy, do you know why it is needed?).


Paolo



[PULL 12/21] tests/tcg/loongarch64: Add div and mod related instructions test

2022-07-19 Thread Richard Henderson
From: Song Gao 

This includes:
- DIV.{W[U]/D[U]}
- MOD.{W[U]/D[U]}

Signed-off-by: Song Gao 
Message-Id: <20220716085426.3098060-6-gaos...@loongson.cn>
Signed-off-by: Richard Henderson 
---
 tests/tcg/loongarch64/test_div.c  | 54 +++
 tests/tcg/loongarch64/Makefile.target |  1 +
 2 files changed, 55 insertions(+)
 create mode 100644 tests/tcg/loongarch64/test_div.c

diff --git a/tests/tcg/loongarch64/test_div.c b/tests/tcg/loongarch64/test_div.c
new file mode 100644
index 00..6c31fe97ae
--- /dev/null
+++ b/tests/tcg/loongarch64/test_div.c
@@ -0,0 +1,54 @@
+#include 
+#include 
+#include 
+
+#define TEST_DIV(N, M)   \
+static void test_div_ ##N(uint ## M ## _t rj,\
+  uint ## M ## _t rk,\
+  uint64_t rm)   \
+{\
+uint64_t rd = 0; \
+ \
+asm volatile("div."#N" %0,%1,%2\n\t" \
+ : "=r"(rd)  \
+ : "r"(rj), "r"(rk)  \
+ : );\
+assert(rd == rm);\
+}
+
+#define TEST_MOD(N, M)   \
+static void test_mod_ ##N(uint ## M ## _t rj,\
+  uint ## M ## _t rk,\
+  uint64_t rm)   \
+{\
+uint64_t rd = 0; \
+ \
+asm volatile("mod."#N" %0,%1,%2\n\t" \
+ : "=r"(rd)  \
+ : "r"(rj), "r"(rk)  \
+ : );\
+assert(rd == rm);\
+}
+
+TEST_DIV(w, 32)
+TEST_DIV(wu, 32)
+TEST_DIV(d, 64)
+TEST_DIV(du, 64)
+TEST_MOD(w, 32)
+TEST_MOD(wu, 32)
+TEST_MOD(d, 64)
+TEST_MOD(du, 64)
+
+int main(void)
+{
+test_div_w(0xffaced97, 0xc36abcde, 0x0);
+test_div_wu(0xffaced97, 0xc36abcde, 0x1);
+test_div_d(0xffaced973582005f, 0xef56832a358b, 0xffa8);
+test_div_du(0xffaced973582005f, 0xef56832a358b, 0x11179);
+test_mod_w(0x7cf18c32, 0xa04da650, 0x1d3f3282);
+test_mod_wu(0x7cf18c32, 0xc04da650, 0x7cf18c32);
+test_mod_d(0x7cf18c32, 0xa04da650, 0x1d3f3282);
+test_mod_du(0x7cf18c32, 0xc04da650, 0x7cf18c32);
+
+return 0;
+}
diff --git a/tests/tcg/loongarch64/Makefile.target 
b/tests/tcg/loongarch64/Makefile.target
index c0bd8b9b86..24d6bb11e9 100644
--- a/tests/tcg/loongarch64/Makefile.target
+++ b/tests/tcg/loongarch64/Makefile.target
@@ -11,5 +11,6 @@ VPATH += $(LOONGARCH64_SRC)
 LDFLAGS+=-lm
 
 LOONGARCH64_TESTS  = test_bit
+LOONGARCH64_TESTS  += test_div
 
 TESTS += $(LOONGARCH64_TESTS)
-- 
2.34.1




[PULL 19/21] hw/loongarch: Add smbios support

2022-07-19 Thread Richard Henderson
From: Xiaojuan Yang 

Add SMBIOS support for the loongarch virt machine, and put the tables
into fw_cfg so that the BIOS can parse them quickly. SMBIOS spec
(version 3.6.0): https://www.dmtf.org/dsp/DSP0134

Acked-by: Richard Henderson 
Signed-off-by: Xiaojuan Yang 
Message-Id: <20220712083206.4187715-5-yangxiaoj...@loongson.cn>
Signed-off-by: Richard Henderson 
---
 include/hw/loongarch/virt.h |  1 +
 hw/loongarch/loongson3.c| 36 
 hw/loongarch/Kconfig|  1 +
 3 files changed, 38 insertions(+)

diff --git a/include/hw/loongarch/virt.h b/include/hw/loongarch/virt.h
index ec37d86e44..9b7cdfae78 100644
--- a/include/hw/loongarch/virt.h
+++ b/include/hw/loongarch/virt.h
@@ -33,6 +33,7 @@ struct LoongArchMachineState {
 bool bios_loaded;
 /* State for other subsystems/APIs: */
 FWCfgState  *fw_cfg;
+Notifier machine_done;
 };
 
 #define TYPE_LOONGARCH_MACHINE  MACHINE_TYPE_NAME("virt")
diff --git a/hw/loongarch/loongson3.c b/hw/loongarch/loongson3.c
index 88e38ce17e..205894d343 100644
--- a/hw/loongarch/loongson3.c
+++ b/hw/loongarch/loongson3.c
@@ -30,11 +30,45 @@
 #include "hw/misc/unimp.h"
 #include "hw/loongarch/fw_cfg.h"
 #include "target/loongarch/cpu.h"
+#include "hw/firmware/smbios.h"
 
 #define PM_BASE 0x1008
 #define PM_SIZE 0x100
 #define PM_CTRL 0x10
 
+static void virt_build_smbios(LoongArchMachineState *lams)
+{
+MachineState *ms = MACHINE(lams);
+MachineClass *mc = MACHINE_GET_CLASS(lams);
+uint8_t *smbios_tables, *smbios_anchor;
+size_t smbios_tables_len, smbios_anchor_len;
+const char *product = "QEMU Virtual Machine";
+
+if (!lams->fw_cfg) {
+return;
+}
+
+smbios_set_defaults("QEMU", product, mc->name, false,
+true, SMBIOS_ENTRY_POINT_TYPE_64);
+
+smbios_get_tables(ms, NULL, 0, &smbios_tables, &smbios_tables_len,
+  &smbios_anchor, &smbios_anchor_len, &error_fatal);
+
+if (smbios_anchor) {
+fw_cfg_add_file(lams->fw_cfg, "etc/smbios/smbios-tables",
+smbios_tables, smbios_tables_len);
+fw_cfg_add_file(lams->fw_cfg, "etc/smbios/smbios-anchor",
+smbios_anchor, smbios_anchor_len);
+}
+}
+
+static void virt_machine_done(Notifier *notifier, void *data)
+{
+LoongArchMachineState *lams = container_of(notifier,
+LoongArchMachineState, machine_done);
+virt_build_smbios(lams);
+}
+
 struct memmap_entry {
 uint64_t address;
 uint64_t length;
@@ -512,6 +546,8 @@ static void loongarch_init(MachineState *machine)
 }
 /* Initialize the IO interrupt subsystem */
 loongarch_irq_init(lams);
+lams->machine_done.notify = virt_machine_done;
+qemu_add_machine_init_done_notifier(&lams->machine_done);
 }
 
 static void loongarch_class_init(ObjectClass *oc, void *data)
diff --git a/hw/loongarch/Kconfig b/hw/loongarch/Kconfig
index 35b6680772..610552e522 100644
--- a/hw/loongarch/Kconfig
+++ b/hw/loongarch/Kconfig
@@ -14,3 +14,4 @@ config LOONGARCH_VIRT
 select LOONGARCH_PCH_MSI
 select LOONGARCH_EXTIOI
 select LS7A_RTC
+select SMBIOS
-- 
2.34.1




[PULL 07/21] target/loongarch/cpu: Fix cpucfg default value

2022-07-19 Thread Richard Henderson
From: Xiaojuan Yang 

We should configure cpucfg[20] to set the scache's ways, sets,
and size arguments when the loongarch cpu is initialized. However, the
old code wrote the 'sets argument' twice, so we change one of them to the
'size argument'.

Signed-off-by: Xiaojuan Yang 
Reviewed-by: Richard Henderson 
Message-Id: <20220715064829.1521482-1-yangxiaoj...@loongson.cn>
Signed-off-by: Richard Henderson 
---
 target/loongarch/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 0d49ce68e4..1415793d6f 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -406,7 +406,7 @@ static void loongarch_la464_initfn(Object *obj)
 data = 0;
 data = FIELD_DP32(data, CPUCFG20, L3IU_WAYS, 15);
 data = FIELD_DP32(data, CPUCFG20, L3IU_SETS, 14);
-data = FIELD_DP32(data, CPUCFG20, L3IU_SETS, 6);
+data = FIELD_DP32(data, CPUCFG20, L3IU_SIZE, 6);
 env->cpucfg[20] = data;
 
 env->CSR_ASID = FIELD_DP64(0, CSR_ASID, ASIDBITS, 0xa);
-- 
2.34.1




[PULL 11/21] tests/tcg/loongarch64: Add clo related instructions test

2022-07-19 Thread Richard Henderson
From: Song Gao 

This includes:
- CL{O/Z}.{W/D}
- CT{O/Z}.{W/D}

Signed-off-by: Song Gao 
Message-Id: <20220716085426.3098060-5-gaos...@loongson.cn>
Signed-off-by: Richard Henderson 
---
 tests/tcg/loongarch64/test_bit.c  | 88 +++
 tests/tcg/loongarch64/Makefile.target | 15 +
 2 files changed, 103 insertions(+)
 create mode 100644 tests/tcg/loongarch64/test_bit.c
 create mode 100644 tests/tcg/loongarch64/Makefile.target

diff --git a/tests/tcg/loongarch64/test_bit.c b/tests/tcg/loongarch64/test_bit.c
new file mode 100644
index 00..a6d9904909
--- /dev/null
+++ b/tests/tcg/loongarch64/test_bit.c
@@ -0,0 +1,88 @@
+#include 
+#include 
+
+#define ARRAY_SIZE(X) (sizeof(X) / sizeof(*(X)))
+#define TEST_CLO(N) \
+static uint64_t test_clo_##N(uint64_t rj)   \
+{   \
+uint64_t rd = 0;\
+\
+asm volatile("clo."#N" %0, %1\n\t"  \
+ : "=r"(rd) \
+ : "r"(rj)  \
+ : );   \
+return rd;  \
+}
+
+#define TEST_CLZ(N) \
+static uint64_t test_clz_##N(uint64_t rj)   \
+{   \
+uint64_t rd = 0;\
+\
+asm volatile("clz."#N" %0, %1\n\t"  \
+ : "=r"(rd) \
+ : "r"(rj)  \
+ : );   \
+return rd;  \
+}
+
+#define TEST_CTO(N) \
+static uint64_t test_cto_##N(uint64_t rj)   \
+{   \
+uint64_t rd = 0;\
+\
+asm volatile("cto."#N" %0, %1\n\t"  \
+ : "=r"(rd) \
+ : "r"(rj)  \
+ : );   \
+return rd;  \
+}
+
+#define TEST_CTZ(N) \
+static uint64_t test_ctz_##N(uint64_t rj)   \
+{   \
+uint64_t rd = 0;\
+\
+asm volatile("ctz."#N" %0, %1\n\t"  \
+ : "=r"(rd) \
+ : "r"(rj)  \
+ : );   \
+return rd;  \
+}
+
+TEST_CLO(w)
+TEST_CLO(d)
+TEST_CLZ(w)
+TEST_CLZ(d)
+TEST_CTO(w)
+TEST_CTO(d)
+TEST_CTZ(w)
+TEST_CTZ(d)
+
+struct vector {
+uint64_t (*func)(uint64_t);
+uint64_t u;
+uint64_t r;
+};
+
+static struct vector vectors[] = {
+{test_clo_w, 0xfff11fff392476ab, 0},
+{test_clo_d, 0x00abd28a64000000, 0},
+{test_clz_w, 0x0000fa42392476ab, 2},
+{test_clz_d, 0x00abd28a64000000, 8},
+{test_cto_w, 0xfff11fff392476ab, 2},
+{test_cto_d, 0x00abd28a64000000, 0},
+{test_ctz_w, 0x0000fa42392476ab, 0},
+{test_ctz_d, 0x00abd28a64000000, 26},
+};
+
+int main()
+{
+int i;
+
+for (i = 0; i < ARRAY_SIZE(vectors); i++) {
+assert((*vectors[i].func)(vectors[i].u) == vectors[i].r);
+}
+
+return 0;
+}
diff --git a/tests/tcg/loongarch64/Makefile.target 
b/tests/tcg/loongarch64/Makefile.target
new file mode 100644
index 00..c0bd8b9b86
--- /dev/null
+++ b/tests/tcg/loongarch64/Makefile.target
@@ -0,0 +1,15 @@
+# -*- Mode: makefile -*-
+#
+# LoongArch64 specific tweaks
+
+# Loongarch64 doesn't support gdb, so skip the EXTRA_RUNS
+EXTRA_RUNS =
+
+LOONGARCH64_SRC=$(SRC_PATH)/tests/tcg/loongarch64
+VPATH += $(LOONGARCH64_SRC)
+
+LDFLAGS+=-lm
+
+LOONGARCH64_TESTS  = test_bit
+
+TESTS += $(LOONGARCH64_TESTS)
-- 
2.34.1




[PULL 15/21] tests/tcg/loongarch64: Add pcadd related instructions test

2022-07-19 Thread Richard Henderson
From: Song Gao 

This includes:
- PCADDI
- PCADDU12I
- PCADDU18I
- PCALAU12I

Signed-off-by: Song Gao 
Message-Id: <20220716085426.3098060-9-gaos...@loongson.cn>
Signed-off-by: Richard Henderson 
---
 tests/tcg/loongarch64/test_pcadd.c| 38 +++
 tests/tcg/loongarch64/Makefile.target |  1 +
 2 files changed, 39 insertions(+)
 create mode 100644 tests/tcg/loongarch64/test_pcadd.c

diff --git a/tests/tcg/loongarch64/test_pcadd.c 
b/tests/tcg/loongarch64/test_pcadd.c
new file mode 100644
index 00..da2a64db82
--- /dev/null
+++ b/tests/tcg/loongarch64/test_pcadd.c
@@ -0,0 +1,38 @@
+#include 
+#include 
+#include 
+
+#define TEST_PCADDU(N)  \
+void test_##N(int a)\
+{   \
+uint64_t rd1 = 0;   \
+uint64_t rd2 = 0;   \
+uint64_t rm, rn;\
+\
+asm volatile(""#N" %0, 0x104\n\t"   \
+ ""#N" %1, 0x12345\n\t" \
+ : "=r"(rd1), "=r"(rd2) \
+ : );   \
+rm = rd2 - rd1; \
+if (!strcmp(#N, "pcalau12i")) { \
+rn = ((0x12345UL - 0x104) << a) & ~0xfff;   \
+} else {\
+rn = ((0x12345UL - 0x104) << a) + 4;\
+}   \
+assert(rm == rn);   \
+}
+
+TEST_PCADDU(pcaddi)
+TEST_PCADDU(pcaddu12i)
+TEST_PCADDU(pcaddu18i)
+TEST_PCADDU(pcalau12i)
+
+int main()
+{
+test_pcaddi(2);
+test_pcaddu12i(12);
+test_pcaddu18i(18);
+test_pcalau12i(12);
+
+return 0;
+}
diff --git a/tests/tcg/loongarch64/Makefile.target 
b/tests/tcg/loongarch64/Makefile.target
index b320d9fd9c..0115de78ef 100644
--- a/tests/tcg/loongarch64/Makefile.target
+++ b/tests/tcg/loongarch64/Makefile.target
@@ -14,5 +14,6 @@ LOONGARCH64_TESTS  = test_bit
 LOONGARCH64_TESTS  += test_div
 LOONGARCH64_TESTS  += test_fclass
 LOONGARCH64_TESTS  += test_fpcom
+LOONGARCH64_TESTS  += test_pcadd
 
 TESTS += $(LOONGARCH64_TESTS)
-- 
2.34.1




[PULL 05/21] target/loongarch/tlb_helper: Fix coverity integer overflow error

2022-07-19 Thread Richard Henderson
From: Xiaojuan Yang 

Replace '1 << shift' with 'MAKE_64BIT_MASK(shift, 1)' to fix
unintentional integer overflow errors in tlb_helper file.

Fix coverity CID: 1489759 1489762

Signed-off-by: Xiaojuan Yang 
Reviewed-by: Richard Henderson 
Message-Id: <20220715060740.1500628-5-yangxiaoj...@loongson.cn>
Signed-off-by: Richard Henderson 
---
 target/loongarch/tlb_helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/loongarch/tlb_helper.c b/target/loongarch/tlb_helper.c
index bab19c7e05..610b6d123c 100644
--- a/target/loongarch/tlb_helper.c
+++ b/target/loongarch/tlb_helper.c
@@ -298,7 +298,7 @@ static void invalidate_tlb_entry(CPULoongArchState *env, 
int index)
 } else {
 tlb_ps = FIELD_EX64(env->CSR_STLBPS, CSR_STLBPS, PS);
 }
-pagesize = 1 << tlb_ps;
+pagesize = MAKE_64BIT_MASK(tlb_ps, 1);
 mask = MAKE_64BIT_MASK(0, tlb_ps + 1);
 
 if (tlb_v0) {
@@ -736,7 +736,7 @@ void helper_ldpte(CPULoongArchState *env, target_ulong 
base, target_ulong odd,
 (tmp0 & (~(1 << R_TLBENTRY_G_SHIFT)));
 ps = ptbase + ptwidth - 1;
 if (odd) {
-tmp0 += (1 << ps);
+tmp0 += MAKE_64BIT_MASK(ps, 1);
 }
 } else {
 /* 0:64bit, 1:128bit, 2:192bit, 3:256bit */
-- 
2.34.1




[PULL 03/21] hw/intc/loongarch_pch_pic: Fix bugs for update_irq function

2022-07-19 Thread Richard Henderson
From: Xiaojuan Yang 

Fix the following errors:
1. A 'uint64_t' value must not be passed to find_first_bit(), which
expects an 'unsigned long' array; use ctz64() instead of
find_first_bit() to fix this bug.
2. It is not standard to use '1ULL << irq' to generate an IRQ mask,
so replace it with 'MAKE_64BIT_MASK(irq, 1)'.

Fix coverity CID: 1489761 1489764 1489765

Signed-off-by: Xiaojuan Yang 
Message-Id: <20220715060740.1500628-3-yangxiaoj...@loongson.cn>
Signed-off-by: Richard Henderson 
---
 hw/intc/loongarch_pch_pic.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/intc/loongarch_pch_pic.c b/hw/intc/loongarch_pch_pic.c
index 3c9814a3b4..3380b09807 100644
--- a/hw/intc/loongarch_pch_pic.c
+++ b/hw/intc/loongarch_pch_pic.c
@@ -15,21 +15,21 @@
 
 static void pch_pic_update_irq(LoongArchPCHPIC *s, uint64_t mask, int level)
 {
-unsigned long val;
+uint64_t val;
 int irq;
 
 if (level) {
 val = mask & s->intirr & ~s->int_mask;
 if (val) {
-irq = find_first_bit(&val, 64);
-s->intisr |= 0x1ULL << irq;
+irq = ctz64(val);
+s->intisr |= MAKE_64BIT_MASK(irq, 1);
 qemu_set_irq(s->parent_irq[s->htmsi_vector[irq]], 1);
 }
 } else {
 val = mask & s->intisr;
 if (val) {
-irq = find_first_bit(&val, 64);
-s->intisr &= ~(0x1ULL << irq);
+irq = ctz64(val);
+s->intisr &= ~MAKE_64BIT_MASK(irq, 1);
 qemu_set_irq(s->parent_irq[s->htmsi_vector[irq]], 0);
 }
 }
-- 
2.34.1




[PULL 17/21] hw/loongarch: Add uefi bios loading support

2022-07-19 Thread Richard Henderson
From: Xiaojuan Yang 

Add UEFI BIOS loading support; currently only a UEFI BIOS has been
ported to the loongarch virt machine.

Reviewed-by: Richard Henderson 
Signed-off-by: Xiaojuan Yang 
Message-Id: <20220712083206.4187715-3-yangxiaoj...@loongson.cn>
Signed-off-by: Richard Henderson 
---
 include/hw/loongarch/virt.h |  4 
 hw/loongarch/loongson3.c| 34 ++
 2 files changed, 38 insertions(+)

diff --git a/include/hw/loongarch/virt.h b/include/hw/loongarch/virt.h
index 9fec1f8a5c..ec37d86e44 100644
--- a/include/hw/loongarch/virt.h
+++ b/include/hw/loongarch/virt.h
@@ -18,6 +18,8 @@
 #define LOONGARCH_ISA_IO_BASE   0x1800UL
 #define LOONGARCH_ISA_IO_SIZE   0x0004000
 #define VIRT_FWCFG_BASE 0x1e02UL
+#define VIRT_BIOS_BASE  0x1c00UL
+#define VIRT_BIOS_SIZE  (4 * MiB)
 
 struct LoongArchMachineState {
 /*< private >*/
@@ -27,6 +29,8 @@ struct LoongArchMachineState {
 MemoryRegion lowmem;
 MemoryRegion highmem;
 MemoryRegion isa_io;
+MemoryRegion bios;
+bool bios_loaded;
 /* State for other subsystems/APIs: */
 FWCfgState  *fw_cfg;
 };
diff --git a/hw/loongarch/loongson3.c b/hw/loongarch/loongson3.c
index 9ee7450252..3f1849b8b0 100644
--- a/hw/loongarch/loongson3.c
+++ b/hw/loongarch/loongson3.c
@@ -310,6 +310,37 @@ static void loongarch_irq_init(LoongArchMachineState *lams)
 loongarch_devices_init(pch_pic);
 }
 
+static void loongarch_firmware_init(LoongArchMachineState *lams)
+{
+char *filename = MACHINE(lams)->firmware;
+char *bios_name = NULL;
+int bios_size;
+
+lams->bios_loaded = false;
+if (filename) {
+bios_name = qemu_find_file(QEMU_FILE_TYPE_BIOS, filename);
+if (!bios_name) {
+error_report("Could not find ROM image '%s'", filename);
+exit(1);
+}
+
+bios_size = load_image_targphys(bios_name, VIRT_BIOS_BASE, 
VIRT_BIOS_SIZE);
+if (bios_size < 0) {
+error_report("Could not load ROM image '%s'", bios_name);
+exit(1);
+}
+
+g_free(bios_name);
+
+memory_region_init_ram(&lams->bios, NULL, "loongarch.bios",
+   VIRT_BIOS_SIZE, &error_fatal);
+memory_region_set_readonly(&lams->bios, true);
+memory_region_add_subregion(get_system_memory(), VIRT_BIOS_BASE, 
&lams->bios);
+lams->bios_loaded = true;
+}
+
+}
+
 static void reset_load_elf(void *opaque)
 {
 LoongArchCPU *cpu = opaque;
@@ -369,6 +400,9 @@ static void loongarch_init(MachineState *machine)
  get_system_io(), 0, LOONGARCH_ISA_IO_SIZE);
 memory_region_add_subregion(address_space_mem, LOONGARCH_ISA_IO_BASE,
 &lams->isa_io);
+/* load the BIOS image. */
+loongarch_firmware_init(lams);
+
 /* fw_cfg init */
 lams->fw_cfg = loongarch_fw_cfg_init(ram_size, machine);
 rom_set_fw(lams->fw_cfg);
-- 
2.34.1




[PULL 04/21] target/loongarch/cpu: Fix coverity errors about excp_names

2022-07-19 Thread Richard Henderson
From: Xiaojuan Yang 

Fix out-of-bounds errors when accessing the excp_names[] array. The
valid index range of excp_names is 0 to ARRAY_SIZE(excp_names)-1;
however, the original code did not check the upper boundary.

Fix coverity CID: 1489758

Signed-off-by: Xiaojuan Yang 
Reviewed-by: Richard Henderson 
Message-Id: <20220715060740.1500628-4-yangxiaoj...@loongson.cn>
Signed-off-by: Richard Henderson 
---
 target/loongarch/cpu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 5573468a7d..0d49ce68e4 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -140,7 +140,7 @@ static void loongarch_cpu_do_interrupt(CPUState *cs)
 
 if (cs->exception_index != EXCCODE_INT) {
 if (cs->exception_index < 0 ||
-cs->exception_index > ARRAY_SIZE(excp_names)) {
+cs->exception_index >= ARRAY_SIZE(excp_names)) {
 name = "unknown";
 } else {
 name = excp_names[cs->exception_index];
@@ -190,8 +190,8 @@ static void loongarch_cpu_do_interrupt(CPUState *cs)
 cause = cs->exception_index;
 break;
 default:
-qemu_log("Error: exception(%d) '%s' has not been supported\n",
- cs->exception_index, excp_names[cs->exception_index]);
+qemu_log("Error: exception(%d) has not been supported\n",
+ cs->exception_index);
 abort();
 }
 
-- 
2.34.1




[PULL 06/21] target/loongarch/op_helper: Fix coverity cond_at_most error

2022-07-19 Thread Richard Henderson
From: Xiaojuan Yang 

The valid index range of the cpucfg array is 0 to ARRAY_SIZE(cpucfg)-1.
So accessing cpucfg[] with an index beyond that boundary must be
forbidden.

Fix coverity CID: 1489760

Signed-off-by: Xiaojuan Yang 
Reviewed-by: Richard Henderson 
Message-Id: <20220715060740.1500628-6-yangxiaoj...@loongson.cn>
Signed-off-by: Richard Henderson 
---
 target/loongarch/op_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/loongarch/op_helper.c b/target/loongarch/op_helper.c
index 4b429b6699..568c071601 100644
--- a/target/loongarch/op_helper.c
+++ b/target/loongarch/op_helper.c
@@ -81,7 +81,7 @@ target_ulong helper_crc32c(target_ulong val, target_ulong m, uint64_t sz)
 
 target_ulong helper_cpucfg(CPULoongArchState *env, target_ulong rj)
 {
-return rj > 21 ? 0 : env->cpucfg[rj];
+return rj >= ARRAY_SIZE(env->cpucfg) ? 0 : env->cpucfg[rj];
 }
 
 uint64_t helper_rdtime_d(CPULoongArchState *env)
-- 
2.34.1




[PULL 08/21] fpu/softfloat: Add LoongArch specializations for pickNaN*

2022-07-19 Thread Richard Henderson
From: Song Gao 

The muladd (inf,zero,nan) case sets InvalidOp and returns the input
value 'c', preferring sNaN over qNaN in c, a, b order.
Binary operations prefer sNaN over qNaN in a, b order.

Signed-off-by: Song Gao 
Message-Id: <20220716085426.3098060-3-gaos...@loongson.cn>
[rth: Add specialization for pickNaN]
Signed-off-by: Richard Henderson 
---
 fpu/softfloat-specialize.c.inc | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index 943e3301d2..9096fb302b 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -390,7 +390,8 @@ bool float32_is_signaling_nan(float32 a_, float_status *status)
 static int pickNaN(FloatClass a_cls, FloatClass b_cls,
bool aIsLargerSignificand, float_status *status)
 {
-#if defined(TARGET_ARM) || defined(TARGET_MIPS) || defined(TARGET_HPPA)
+#if defined(TARGET_ARM) || defined(TARGET_MIPS) || defined(TARGET_HPPA) \
+|| defined(TARGET_LOONGARCH64)
 /* ARM mandated NaN propagation rules (see FPProcessNaNs()), take
  * the first of:
  *  1. A if it is signaling
@@ -574,6 +575,29 @@ static int pickNaNMulAdd(FloatClass a_cls, FloatClass b_cls, FloatClass c_cls,
 return 1;
 }
 }
+#elif defined(TARGET_LOONGARCH64)
+/*
+ * For LoongArch systems that conform to IEEE754-2008, the (inf,zero,nan)
+ * case sets InvalidOp and returns the input value 'c'
+ */
+if (infzero) {
+float_raise(float_flag_invalid | float_flag_invalid_imz, status);
+return 2;
+}
+/* Prefer sNaN over qNaN, in the c, a, b order. */
+if (is_snan(c_cls)) {
+return 2;
+} else if (is_snan(a_cls)) {
+return 0;
+} else if (is_snan(b_cls)) {
+return 1;
+} else if (is_qnan(c_cls)) {
+return 2;
+} else if (is_qnan(a_cls)) {
+return 0;
+} else {
+return 1;
+}
 #elif defined(TARGET_PPC)
 /* For PPC, the (inf,zero,qnan) case sets InvalidOp, but we prefer
  * to return an input NaN if we have one (ie c) rather than generating
-- 
2.34.1




[PULL 02/21] target/loongarch: Fix loongarch_cpu_class_by_name

2022-07-19 Thread Richard Henderson
From: Xiaojuan Yang 

The cpu_model argument may already carry the '-loongarch-cpu' suffix,
e.g. when using the default for the LS7A1000 machine, so look it up as
given first; if that fails, try again with the suffix appended.  Also
validate that the class found derives from the proper base class and
is not abstract.

Signed-off-by: Xiaojuan Yang 
Reviewed-by: Richard Henderson 
Message-Id: <20220715060740.1500628-2-yangxiaoj...@loongson.cn>
[rth: Try without and then with the suffix, to avoid testsuite breakage.]
Signed-off-by: Richard Henderson 
---
 target/loongarch/cpu.c | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index e21715592a..5573468a7d 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -571,12 +571,22 @@ static void loongarch_cpu_init(Object *obj)
 static ObjectClass *loongarch_cpu_class_by_name(const char *cpu_model)
 {
 ObjectClass *oc;
-char *typename;
 
-typename = g_strdup_printf(LOONGARCH_CPU_TYPE_NAME("%s"), cpu_model);
-oc = object_class_by_name(typename);
-g_free(typename);
-return oc;
+oc = object_class_by_name(cpu_model);
+if (!oc) {
+g_autofree char *typename 
+= g_strdup_printf(LOONGARCH_CPU_TYPE_NAME("%s"), cpu_model);
+oc = object_class_by_name(typename);
+if (!oc) {
+return NULL;
+}
+}
+
+if (object_class_dynamic_cast(oc, TYPE_LOONGARCH_CPU)
+&& !object_class_is_abstract(oc)) {
+return oc;
+}
+return NULL;
 }
 
 void loongarch_cpu_dump_state(CPUState *cs, FILE *f, int flags)
-- 
2.34.1




[PULL 01/21] tests/docker/dockerfiles: Add debian-loongarch-cross.docker

2022-07-19 Thread Richard Henderson
Use the pre-packaged toolchain provided by Loongson via GitHub.

Tested-by: Song Gao 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
Message-Id: <20220704070824.965429-1-richard.hender...@linaro.org>
---
 configure |  5 
 tests/docker/Makefile.include |  2 ++
 .../dockerfiles/debian-loongarch-cross.docker | 25 +++
 3 files changed, 32 insertions(+)
 create mode 100644 tests/docker/dockerfiles/debian-loongarch-cross.docker

diff --git a/configure b/configure
index 4f12481765..35e0b28198 100755
--- a/configure
+++ b/configure
@@ -1933,6 +1933,7 @@ probe_target_compiler() {
 hexagon) container_hosts=x86_64 ;;
 hppa) container_hosts=x86_64 ;;
 i386) container_hosts=x86_64 ;;
+loongarch64) container_hosts=x86_64 ;;
 m68k) container_hosts=x86_64 ;;
 microblaze) container_hosts=x86_64 ;;
 mips64el) container_hosts=x86_64 ;;
@@ -1987,6 +1988,10 @@ probe_target_compiler() {
 container_image=fedora-i386-cross
 container_cross_prefix=
 ;;
+  loongarch64)
+container_image=debian-loongarch-cross
+container_cross_prefix=loongarch64-unknown-linux-gnu-
+;;
   m68k)
 container_image=debian-m68k-cross
 container_cross_prefix=m68k-linux-gnu-
diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
index ef4518d9eb..9a45e8890b 100644
--- a/tests/docker/Makefile.include
+++ b/tests/docker/Makefile.include
@@ -140,6 +140,7 @@ docker-image-debian-nios2-cross: $(DOCKER_FILES_DIR)/debian-toolchain.docker \
 # Specialist build images, sometimes very limited tools
 docker-image-debian-tricore-cross: docker-image-debian10
 docker-image-debian-all-test-cross: docker-image-debian10
+docker-image-debian-loongarch-cross: docker-image-debian11
 docker-image-debian-microblaze-cross: docker-image-debian10
 docker-image-debian-nios2-cross: docker-image-debian10
 docker-image-debian-powerpc-test-cross: docker-image-debian11
@@ -149,6 +150,7 @@ docker-image-debian-riscv64-test-cross: docker-image-debian11
 DOCKER_PARTIAL_IMAGES += debian-alpha-cross
 DOCKER_PARTIAL_IMAGES += debian-powerpc-test-cross
 DOCKER_PARTIAL_IMAGES += debian-hppa-cross
+DOCKER_PARTIAL_IMAGES += debian-loongarch-cross
 DOCKER_PARTIAL_IMAGES += debian-m68k-cross debian-mips64-cross
 DOCKER_PARTIAL_IMAGES += debian-microblaze-cross
 DOCKER_PARTIAL_IMAGES += debian-nios2-cross
diff --git a/tests/docker/dockerfiles/debian-loongarch-cross.docker b/tests/docker/dockerfiles/debian-loongarch-cross.docker
new file mode 100644
index 00..ca2469d2a8
--- /dev/null
+++ b/tests/docker/dockerfiles/debian-loongarch-cross.docker
@@ -0,0 +1,25 @@
+#
+# Docker cross-compiler target
+#
+# This docker target builds on the debian11 base image,
+# using a prebuilt toolchain for LoongArch64 from:
+# https://github.com/loongson/build-tools/releases
+#
+FROM qemu/debian11
+
+RUN apt-get update && \
+DEBIAN_FRONTEND=noninteractive apt install -yy eatmydata && \
+DEBIAN_FRONTEND=noninteractive eatmydata \
+apt-get install -y --no-install-recommends \
+build-essential \
+ca-certificates \
+curl \
+gettext \
+git \
+python3-minimal
+
+RUN curl -#SL https://github.com/loongson/build-tools/releases/download/2022.05.29/loongarch64-clfs-5.0-cross-tools-gcc-glibc.tar.xz \
+| tar -xJC /opt
+
+ENV PATH $PATH:/opt/cross-tools/bin
+ENV LD_LIBRARY_PATH /opt/cross-tools/lib:/opt/cross-tools/loongarch64-unknown-linux-gnu/lib:$LD_LIBRARY_PATH
-- 
2.34.1




[PULL 00/21] loongarch patch queue

2022-07-19 Thread Richard Henderson
The following changes since commit da7da9d5e608200ecc0749ff37be246e9cd3314f:

  Merge tag 'pull-request-2022-07-19' of https://gitlab.com/thuth/qemu into 
staging (2022-07-19 13:05:06 +0100)

are available in the Git repository at:

  https://gitlab.com/rth7680/qemu.git tags/pull-la-20220719

for you to fetch changes up to fda3f15b0079d4bba76791502a7e00b8b747f509:

  hw/loongarch: Add fdt support (2022-07-19 22:55:10 +0530)


LoongArch64 patch queue:

Add dockerfile for loongarch cross compile
Add reference files for float tests.
Add simple tests for div, mod, clo, fclass, fcmp, pcadd
Add bios and kernel boot support.
Add smbios, acpi, and fdt support.
Fix pch-pic update-irq.
Fix some errors identified by coverity.


Philippe Mathieu-Daudé (1):
  tests/tcg/loongarch64: Add float reference files

Richard Henderson (1):
  tests/docker/dockerfiles: Add debian-loongarch-cross.docker

Song Gao (7):
  fpu/softfloat: Add LoongArch specializations for pickNaN*
  target/loongarch: Fix float_convd/float_convs test failing
  tests/tcg/loongarch64: Add clo related instructions test
  tests/tcg/loongarch64: Add div and mod related instructions test
  tests/tcg/loongarch64: Add fclass test
  tests/tcg/loongarch64: Add fp comparison instructions test
  tests/tcg/loongarch64: Add pcadd related instructions test

Xiaojuan Yang (12):
  target/loongarch: Fix loongarch_cpu_class_by_name
  hw/intc/loongarch_pch_pic: Fix bugs for update_irq function
  target/loongarch/cpu: Fix coverity errors about excp_names
  target/loongarch/tlb_helper: Fix coverity integer overflow error
  target/loongarch/op_helper: Fix coverity cond_at_most error
  target/loongarch/cpu: Fix cpucfg default value
  hw/loongarch: Add fw_cfg table support
  hw/loongarch: Add uefi bios loading support
  hw/loongarch: Add linux kernel booting support
  hw/loongarch: Add smbios support
  hw/loongarch: Add acpi ged support
  hw/loongarch: Add fdt support

 configure  |   5 +
 configs/targets/loongarch64-softmmu.mak|   1 +
 hw/loongarch/fw_cfg.h  |  15 +
 include/hw/loongarch/virt.h|  25 +
 include/hw/pci-host/ls7a.h |   4 +
 target/loongarch/cpu.h |   3 +
 hw/intc/loongarch_pch_pic.c|  10 +-
 hw/loongarch/acpi-build.c  | 609 +
 hw/loongarch/fw_cfg.c  |  33 +
 hw/loongarch/loongson3.c   | 433 -
 target/loongarch/cpu.c |  29 +-
 target/loongarch/fpu_helper.c  | 143 +--
 target/loongarch/op_helper.c   |   2 +-
 target/loongarch/tlb_helper.c  |   4 +-
 tests/tcg/loongarch64/test_bit.c   |  88 ++
 tests/tcg/loongarch64/test_div.c   |  54 ++
 tests/tcg/loongarch64/test_fclass.c| 130 +++
 tests/tcg/loongarch64/test_fpcom.c |  37 +
 tests/tcg/loongarch64/test_pcadd.c |  38 +
 fpu/softfloat-specialize.c.inc |  26 +-
 hw/loongarch/Kconfig   |   3 +
 hw/loongarch/meson.build   |   6 +-
 tests/docker/Makefile.include  |   2 +
 .../dockerfiles/debian-loongarch-cross.docker  |  25 +
 tests/tcg/loongarch64/Makefile.target  |  19 +
 tests/tcg/loongarch64/float_convd.ref  | 988 +
 tests/tcg/loongarch64/float_convs.ref  | 748 
 tests/tcg/loongarch64/float_madds.ref  | 768 
 28 files changed, 4147 insertions(+), 101 deletions(-)
 create mode 100644 hw/loongarch/fw_cfg.h
 create mode 100644 hw/loongarch/acpi-build.c
 create mode 100644 hw/loongarch/fw_cfg.c
 create mode 100644 tests/tcg/loongarch64/test_bit.c
 create mode 100644 tests/tcg/loongarch64/test_div.c
 create mode 100644 tests/tcg/loongarch64/test_fclass.c
 create mode 100644 tests/tcg/loongarch64/test_fpcom.c
 create mode 100644 tests/tcg/loongarch64/test_pcadd.c
 create mode 100644 tests/docker/dockerfiles/debian-loongarch-cross.docker
 create mode 100644 tests/tcg/loongarch64/Makefile.target
 create mode 100644 tests/tcg/loongarch64/float_convd.ref
 create mode 100644 tests/tcg/loongarch64/float_convs.ref
 create mode 100644 tests/tcg/loongarch64/float_madds.ref



Re: [PATCH v2] linux-user: Unconditionally use pipe2() syscall

2022-07-19 Thread Laurent Vivier

On 19/07/2022 at 18:20, Helge Deller wrote:

The pipe2() syscall is available on all Linux platforms since kernel
2.6.27, so use it unconditionally to emulate pipe() and pipe2().

Signed-off-by: Helge Deller 
Reviewed-by: Peter Maydell 
---
Changes in v2:
- added Reviewed-by: from Peter
- new diff against git head
- no functional changes


diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 991b85e6b4..4f89184d05 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -1586,21 +1586,12 @@ static abi_long do_ppoll(abi_long arg1, abi_long arg2, 
abi_long arg3,
  }
  #endif

-static abi_long do_pipe2(int host_pipe[], int flags)
-{
-#ifdef CONFIG_PIPE2
-return pipe2(host_pipe, flags);
-#else
-return -ENOSYS;
-#endif
-}
-
  static abi_long do_pipe(CPUArchState *cpu_env, abi_ulong pipedes,
  int flags, int is_pipe2)
  {
  int host_pipe[2];
  abi_long ret;
-ret = flags ? do_pipe2(host_pipe, flags) : pipe(host_pipe);
+ret = pipe2(host_pipe, flags);

  if (is_error(ret))
  return get_errno(ret);
diff --git a/meson.build b/meson.build
index 8a8c415fc1..75aaca8462 100644
--- a/meson.build
+++ b/meson.build
@@ -2026,15 +2026,6 @@ config_host_data.set('CONFIG_OPEN_BY_HANDLE', 
cc.links(gnu_source_prefix + '''
#else
int main(void) { struct file_handle fh; return open_by_handle_at(0, &fh, 
0); }
#endif'''))
-config_host_data.set('CONFIG_PIPE2', cc.links(gnu_source_prefix + '''
-  #include 
-  #include 
-
-  int main(void)
-  {
-  int pipefd[2];
-  return pipe2(pipefd, O_CLOEXEC);
-  }'''))
  config_host_data.set('CONFIG_POSIX_MADVISE', cc.links(gnu_source_prefix + '''
#include 
#include 


Applied to my linux-user-for-7.1 branch.

Thanks,
Laurent




[PULL 28/29] multifd: Document the locking of MultiFD{Send/Recv}Params

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Juan Quintela 

Reorder the structures so we can tell whether the fields are:
- Read-only
- Using their own locking (i.e. sems)
- Protected by 'mutex'
- Only for the multifd channel

Signed-off-by: Juan Quintela 
Message-Id: <20220531104318.7494-2-quint...@redhat.com>
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Dr. David Alan Gilbert 
  dgilbert: Typo fixes from Chen Zhang
---
 migration/multifd.h | 66 -
 1 file changed, 41 insertions(+), 25 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 4d8d89e5e5..519f498643 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -65,7 +65,9 @@ typedef struct {
 } MultiFDPages_t;
 
 typedef struct {
-/* this fields are not changed once the thread is created */
+/* Fields are only written at creating/deletion time */
+/* No lock required for them, they are read only */
+
 /* channel number */
 uint8_t id;
 /* channel thread name */
@@ -74,39 +76,47 @@ typedef struct {
 QemuThread thread;
 /* communication channel */
 QIOChannel *c;
+/* is the yank function registered */
+bool registered_yank;
+/* packet allocated len */
+uint32_t packet_len;
+/* multifd flags for sending ram */
+int write_flags;
+
 /* sem where to wait for more work */
 QemuSemaphore sem;
+/* syncs main thread and channels */
+QemuSemaphore sem_sync;
+
 /* this mutex protects the following parameters */
 QemuMutex mutex;
 /* is this channel thread running */
 bool running;
 /* should this thread finish */
 bool quit;
-/* is the yank function registered */
-bool registered_yank;
+/* multifd flags for each packet */
+uint32_t flags;
+/* global number of generated multifd packets */
+uint64_t packet_num;
 /* thread has work to do */
 int pending_job;
-/* array of pages to sent */
+/* array of pages to sent.
+ * The owner of 'pages' depends of 'pending_job' value:
+ * pending_job == 0 -> migration_thread can use it.
+ * pending_job != 0 -> multifd_channel can use it.
+ */
 MultiFDPages_t *pages;
-/* packet allocated len */
-uint32_t packet_len;
+
+/* thread local variables. No locking required */
+
 /* pointer to the packet */
 MultiFDPacket_t *packet;
-/* multifd flags for sending ram */
-int write_flags;
-/* multifd flags for each packet */
-uint32_t flags;
 /* size of the next packet that contains pages */
 uint32_t next_packet_size;
-/* global number of generated multifd packets */
-uint64_t packet_num;
-/* thread local variables */
 /* packets sent through this channel */
 uint64_t num_packets;
 /* non zero pages sent through this channel */
 uint64_t total_normal_pages;
-/* syncs main thread and channels */
-QemuSemaphore sem_sync;
 /* buffers to send */
 struct iovec *iov;
 /* number of iovs used */
@@ -120,7 +130,9 @@ typedef struct {
 }  MultiFDSendParams;
 
 typedef struct {
-/* this fields are not changed once the thread is created */
+/* Fields are only written at creating/deletion time */
+/* No lock required for them, they are read only */
+
 /* channel number */
 uint8_t id;
 /* channel thread name */
@@ -129,31 +141,35 @@ typedef struct {
 QemuThread thread;
 /* communication channel */
 QIOChannel *c;
+/* packet allocated len */
+uint32_t packet_len;
+
+/* syncs main thread and channels */
+QemuSemaphore sem_sync;
+
 /* this mutex protects the following parameters */
 QemuMutex mutex;
 /* is this channel thread running */
 bool running;
 /* should this thread finish */
 bool quit;
-/* ramblock host address */
-uint8_t *host;
-/* packet allocated len */
-uint32_t packet_len;
-/* pointer to the packet */
-MultiFDPacket_t *packet;
 /* multifd flags for each packet */
 uint32_t flags;
 /* global number of generated multifd packets */
 uint64_t packet_num;
-/* thread local variables */
+
+/* thread local variables. No locking required */
+
+/* pointer to the packet */
+MultiFDPacket_t *packet;
 /* size of the next packet that contains pages */
 uint32_t next_packet_size;
 /* packets sent through this channel */
 uint64_t num_packets;
+/* ramblock host address */
+uint8_t *host;
 /* non zero pages recv through this channel */
 uint64_t total_normal_pages;
-/* syncs main thread and channels */
-QemuSemaphore sem_sync;
 /* buffers to recv */
 struct iovec *iov;
 /* Pages that are not zero */
-- 
2.36.1




[PULL 21/29] tests: Add postcopy tls migration test

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

We just added TLS tests for precopy but not postcopy.  Add the
corresponding test for vanilla postcopy.

Rename the vanilla postcopy test to "postcopy/plain" because all postcopy
tests will only use unix sockets as the channel.

Signed-off-by: Peter Xu 
Message-Id: <20220707185525.27692-1-pet...@redhat.com>
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Dr. David Alan Gilbert 
  dgilbert: Manual merge
---
 tests/qtest/migration-test.c | 61 ++--
 1 file changed, 51 insertions(+), 10 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index f3931e0a92..b2020ef6c5 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -573,6 +573,9 @@ typedef struct {
 
 /* Optional: set number of migration passes to wait for */
 unsigned int iterations;
+
+/* Postcopy specific fields */
+void *postcopy_data;
 } MigrateCommon;
 
 static int test_migrate_start(QTestState **from, QTestState **to,
@@ -1061,15 +1064,19 @@ test_migrate_tls_x509_finish(QTestState *from,
 
 static int migrate_postcopy_prepare(QTestState **from_ptr,
 QTestState **to_ptr,
-MigrateStart *args)
+MigrateCommon *args)
 {
 g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
 QTestState *from, *to;
 
-if (test_migrate_start(&from, &to, uri, args)) {
+if (test_migrate_start(&from, &to, uri, &args->start)) {
 return -1;
 }
 
+if (args->start_hook) {
+args->postcopy_data = args->start_hook(from, to);
+}
+
 migrate_set_capability(from, "postcopy-ram", true);
 migrate_set_capability(to, "postcopy-ram", true);
 migrate_set_capability(to, "postcopy-blocktime", true);
@@ -1089,7 +1096,8 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
 return 0;
 }
 
-static void migrate_postcopy_complete(QTestState *from, QTestState *to)
+static void migrate_postcopy_complete(QTestState *from, QTestState *to,
+  MigrateCommon *args)
 {
 wait_for_migration_complete(from);
 
@@ -1100,25 +1108,50 @@ static void migrate_postcopy_complete(QTestState *from, QTestState *to)
 read_blocktime(to);
 }
 
+if (args->finish_hook) {
+args->finish_hook(from, to, args->postcopy_data);
+args->postcopy_data = NULL;
+}
+
 test_migrate_end(from, to, true);
 }
 
-static void test_postcopy(void)
+static void test_postcopy_common(MigrateCommon *args)
 {
-MigrateStart args = {};
 QTestState *from, *to;
 
-if (migrate_postcopy_prepare(&from, &to, &args)) {
+if (migrate_postcopy_prepare(&from, &to, args)) {
 return;
 }
 migrate_postcopy_start(from, to);
-migrate_postcopy_complete(from, to);
+migrate_postcopy_complete(from, to, args);
 }
 
+static void test_postcopy(void)
+{
+MigrateCommon args = { };
+
+test_postcopy_common(&args);
+}
+
+#ifdef CONFIG_GNUTLS
+static void test_postcopy_tls_psk(void)
+{
+MigrateCommon args = {
+.start_hook = test_migrate_tls_psk_start_match,
+.finish_hook = test_migrate_tls_psk_finish,
+};
+
+test_postcopy_common(&args);
+}
+#endif
+
 static void test_postcopy_recovery(void)
 {
-MigrateStart args = {
-.hide_stderr = true,
+MigrateCommon args = {
+.start = {
+.hide_stderr = true,
+},
 };
 QTestState *from, *to;
 g_autofree char *uri = NULL;
@@ -1174,7 +1207,7 @@ static void test_postcopy_recovery(void)
 /* Restore the postcopy bandwidth to unlimited */
 migrate_set_parameter_int(from, "max-postcopy-bandwidth", 0);
 
-migrate_postcopy_complete(from, to);
+migrate_postcopy_complete(from, to, &args);
 }
 
 static void test_baddest(void)
@@ -2378,12 +2411,20 @@ int main(int argc, char **argv)
 
 qtest_add_func("/migration/postcopy/unix", test_postcopy);
 qtest_add_func("/migration/postcopy/recovery", test_postcopy_recovery);
+qtest_add_func("/migration/postcopy/plain", test_postcopy);
+
 qtest_add_func("/migration/bad_dest", test_baddest);
 qtest_add_func("/migration/precopy/unix/plain", test_precopy_unix_plain);
 qtest_add_func("/migration/precopy/unix/xbzrle", test_precopy_unix_xbzrle);
 #ifdef CONFIG_GNUTLS
 qtest_add_func("/migration/precopy/unix/tls/psk",
test_precopy_unix_tls_psk);
+/*
+ * NOTE: psk test is enough for postcopy, as other types of TLS
+ * channels are tested under precopy.  Here what we want to test is the
+ * general postcopy path that has TLS channel enabled.
+ */
+qtest_add_func("/migration/postcopy/tls/psk", test_postcopy_tls_psk);
 #ifdef CONFIG_TASN1
 qtest_add_func("/migration/precopy/unix/tls/x509/default-host",
test_precopy_unix_tls_x509_default_host);
-- 
2.36.1




[PULL 27/29] migration/multifd: Report to user when zerocopy not working

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Leonardo Bras 

Some errors, like the lack of Scatter-Gather support by the network
interface (NETIF_F_SG), may cause sendmsg(...,MSG_ZEROCOPY) to fail to use
zero-copy, causing it to fall back to the default copying mechanism.

After each full dirty-bitmap scan there should be a zero-copy flush
happening, which checks for errors in each of the previous calls to
sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, the
dirty_sync_missed_zero_copy migration stat is incremented to let the user
know about it.

Signed-off-by: Leonardo Bras 
Reviewed-by: Daniel P. Berrangé 
Acked-by: Peter Xu 
Message-Id: <2022071122.18951-4-leob...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/multifd.c | 2 ++
 migration/ram.c | 5 +
 migration/ram.h | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/migration/multifd.c b/migration/multifd.c
index 1e49594b02..586ddc9d65 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -624,6 +624,8 @@ int multifd_send_sync_main(QEMUFile *f)
 if (ret < 0) {
 error_report_err(err);
 return -1;
+} else if (ret == 1) {
+dirty_sync_missed_zero_copy();
 }
 }
 }
diff --git a/migration/ram.c b/migration/ram.c
index 4fbad74c6c..b94669ba5d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -434,6 +434,11 @@ static void ram_transferred_add(uint64_t bytes)
 ram_counters.transferred += bytes;
 }
 
+void dirty_sync_missed_zero_copy(void)
+{
+ram_counters.dirty_sync_missed_zero_copy++;
+}
+
 /* used by the search for pages to send */
 struct PageSearchStatus {
 /* Current block being searched */
diff --git a/migration/ram.h b/migration/ram.h
index 5d90945a6e..c7af65ac74 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -89,4 +89,6 @@ void ram_write_tracking_prepare(void);
 int ram_write_tracking_start(void);
 void ram_write_tracking_stop(void);
 
+void dirty_sync_missed_zero_copy(void);
+
 #endif
-- 
2.36.1




[PULL 24/29] migration: remove unreachable code after reading data

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Daniel P. Berrangé 

The code calls qio_channel_read() in a loop when it reports
QIO_CHANNEL_ERR_BLOCK. This error is returned when errno==EAGAIN.

As such, the later block of code will always hit the 'errno != EAGAIN'
condition, making the final 'else' unreachable.

Fixes: Coverity CID 1490203
Signed-off-by: Daniel P. Berrangé 
Message-Id: <20220627135318.156121-1-berra...@redhat.com>
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/qemu-file.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 2f266b25cd..4f400c2e52 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -411,10 +411,8 @@ static ssize_t qemu_fill_buffer(QEMUFile *f)
 f->total_transferred += len;
 } else if (len == 0) {
 qemu_file_set_error_obj(f, -EIO, local_error);
-} else if (len != -EAGAIN) {
-qemu_file_set_error_obj(f, len, local_error);
 } else {
-error_free(local_error);
+qemu_file_set_error_obj(f, len, local_error);
 }
 
 return len;
-- 
2.36.1




[PULL 23/29] tests: Add postcopy preempt tests

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

Four tests are added for preempt mode:

  - Postcopy plain
  - Postcopy recovery
  - Postcopy tls
  - Postcopy tls+recovery

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Message-Id: <20220707185530.27801-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
  dgilbert: Manual merge
---
 tests/qtest/migration-test.c | 57 ++--
 1 file changed, 55 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index e9350ea8c6..02f2ef9f49 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -576,6 +576,7 @@ typedef struct {
 
 /* Postcopy specific fields */
 void *postcopy_data;
+bool postcopy_preempt;
 } MigrateCommon;
 
 static int test_migrate_start(QTestState **from, QTestState **to,
@@ -1081,6 +1082,11 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
 migrate_set_capability(to, "postcopy-ram", true);
 migrate_set_capability(to, "postcopy-blocktime", true);
 
+if (args->postcopy_preempt) {
+migrate_set_capability(from, "postcopy-preempt", true);
+migrate_set_capability(to, "postcopy-preempt", true);
+}
+
 migrate_ensure_non_converge(from);
 
 /* Wait for the first serial output from the source */
@@ -1146,6 +1152,26 @@ static void test_postcopy_tls_psk(void)
 }
 #endif
 
+static void test_postcopy_preempt(void)
+{
+MigrateCommon args = {
+.postcopy_preempt = true,
+};
+
+test_postcopy_common(&args);
+}
+
+static void test_postcopy_preempt_tls_psk(void)
+{
+MigrateCommon args = {
+.postcopy_preempt = true,
+.start_hook = test_migrate_tls_psk_start_match,
+.finish_hook = test_migrate_tls_psk_finish,
+};
+
+test_postcopy_common(&args);
+}
+
 static void test_postcopy_recovery_common(MigrateCommon *args)
 {
 QTestState *from, *to;
@@ -1225,6 +1251,27 @@ static void test_postcopy_recovery_tls_psk(void)
 test_postcopy_recovery_common(&args);
 }
 
+static void test_postcopy_preempt_recovery(void)
+{
+MigrateCommon args = {
+.postcopy_preempt = true,
+};
+
+test_postcopy_recovery_common(&args);
+}
+
+/* This contains preempt+recovery+tls test altogether */
+static void test_postcopy_preempt_all(void)
+{
+MigrateCommon args = {
+.postcopy_preempt = true,
+.start_hook = test_migrate_tls_psk_start_match,
+.finish_hook = test_migrate_tls_psk_finish,
+};
+
+test_postcopy_recovery_common(&args);
+}
+
 static void test_baddest(void)
 {
 MigrateStart args = {
@@ -2425,10 +2472,12 @@ int main(int argc, char **argv)
 module_call_init(MODULE_INIT_QOM);
 
 qtest_add_func("/migration/postcopy/unix", test_postcopy);
+qtest_add_func("/migration/postcopy/plain", test_postcopy);
 qtest_add_func("/migration/postcopy/recovery/plain",
test_postcopy_recovery);
-
-qtest_add_func("/migration/postcopy/plain", test_postcopy);
+qtest_add_func("/migration/postcopy/preempt/plain", test_postcopy_preempt);
+qtest_add_func("/migration/postcopy/preempt/recovery/plain",
+test_postcopy_preempt_recovery);
 
 qtest_add_func("/migration/bad_dest", test_baddest);
 qtest_add_func("/migration/precopy/unix/plain", test_precopy_unix_plain);
@@ -2444,6 +2493,10 @@ int main(int argc, char **argv)
 qtest_add_func("/migration/postcopy/tls/psk", test_postcopy_tls_psk);
 qtest_add_func("/migration/postcopy/recovery/tls/psk",
test_postcopy_recovery_tls_psk);
+qtest_add_func("/migration/postcopy/preempt/tls/psk",
+   test_postcopy_preempt_tls_psk);
+qtest_add_func("/migration/postcopy/preempt/recovery/tls/psk",
+   test_postcopy_preempt_all);
 #ifdef CONFIG_TASN1
 qtest_add_func("/migration/precopy/unix/tls/x509/default-host",
test_precopy_unix_tls_x509_default_host);
-- 
2.36.1




[PULL 05/29] accel/kvm/kvm-all: Introduce kvm_dirty_ring_size function

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Hyman Huang(黄勇) 

Introduce the kvm_dirty_ring_size utility function to help calculate
the dirty ring full time.

Signed-off-by: Hyman Huang(黄勇) 
Acked-by: Peter Xu 
Message-Id: 

Signed-off-by: Dr. David Alan Gilbert 
---
 accel/kvm/kvm-all.c| 5 +
 accel/stubs/kvm-stub.c | 5 +
 include/sysemu/kvm.h   | 2 ++
 3 files changed, 12 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index ce989a68ff..184aecab5c 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2318,6 +2318,11 @@ static void query_stats_cb(StatsResultList **result, StatsTarget target,
strList *names, strList *targets, Error **errp);
 static void query_stats_schemas_cb(StatsSchemaList **result, Error **errp);
 
+uint32_t kvm_dirty_ring_size(void)
+{
+return kvm_state->kvm_dirty_ring_size;
+}
+
 static int kvm_init(MachineState *ms)
 {
 MachineClass *mc = MACHINE_GET_CLASS(ms);
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 3345882d85..2ac5f9c036 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -148,3 +148,8 @@ bool kvm_dirty_ring_enabled(void)
 {
 return false;
 }
+
+uint32_t kvm_dirty_ring_size(void)
+{
+return 0;
+}
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index a783c78868..efd6dee818 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -582,4 +582,6 @@ bool kvm_cpu_check_are_resettable(void);
 bool kvm_arch_cpu_check_are_resettable(void);
 
 bool kvm_dirty_ring_enabled(void);
+
+uint32_t kvm_dirty_ring_size(void);
 #endif
-- 
2.36.1




[PULL 29/29] migration: Avoid false-positive on non-supported scenarios for zero-copy-send

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Leonardo Bras 

Migration with zero-copy-send currently has its limitations: it can't
be used with TLS or any kind of compression. In such scenarios, it should
output errors during parameter / capability setting.

But currently there are some ways of setting these unsupported scenarios
without printing the error message:

1) For the 'compression' capability, it works by enabling it together with
zero-copy-send. This happens because the validity test for zero-copy uses
the helper function migrate_use_compression(), which checks for compression
presence in s->enabled_capabilities[MIGRATION_CAPABILITY_COMPRESS].

The point here is: the validity test happens before the capability gets
enabled. If all of them get enabled together, this test will not return
error.

In order to fix that, replace migrate_use_compression() by directly testing
the cap_list parameter in migrate_caps_check().

2) For features enabled by parameters such as TLS & 'multifd_compression',
there was also a possibility of setting non-supported scenarios: setting
zero-copy-send first, then setting the unsupported parameter.

In order to fix that, also add a check for parameters conflicting with
zero-copy-send on migrate_params_check().

3) XBZRLE is also a compression capability, so it makes sense to also add
it to the list of capabilities which are not supported with zero-copy-send.
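
The ordering problem in (1) can be illustrated with a minimal sketch (the
enum and function below are simplified stand-ins, not QEMU's actual
structures): validating the *prospective* capability list catches a
conflict that a check against the currently-enabled state would miss when
both capabilities are enabled in the same command.

```c
#include <stdbool.h>

enum { CAP_MULTIFD, CAP_COMPRESS, CAP_XBZRLE, CAP_ZERO_COPY_SEND, CAP_MAX };

/* Validate the prospective capability list (the cap_list parameter),
 * not the currently enabled state: when compression and zero-copy-send
 * are requested together, a check against current state misses the
 * conflict, while this one rejects it. */
bool caps_check(const bool *cap_list)
{
    if (cap_list[CAP_ZERO_COPY_SEND] &&
        (!cap_list[CAP_MULTIFD] ||
         cap_list[CAP_COMPRESS] ||
         cap_list[CAP_XBZRLE])) {
        return false;   /* QEMU would error_setg() here */
    }
    return true;
}
```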

Fixes: 1abaec9a1b2c ("migration: Change zero_copy_send from migration parameter 
to migration capability")
Signed-off-by: Leonardo Bras 
Message-Id: <20220719122345.253713-1-leob...@redhat.com>
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/migration.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 15ae48b209..e03f698a3c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1306,7 +1306,9 @@ static bool migrate_caps_check(bool *cap_list,
 #ifdef CONFIG_LINUX
 if (cap_list[MIGRATION_CAPABILITY_ZERO_COPY_SEND] &&
 (!cap_list[MIGRATION_CAPABILITY_MULTIFD] ||
- migrate_use_compression() ||
+ cap_list[MIGRATION_CAPABILITY_COMPRESS] ||
+ cap_list[MIGRATION_CAPABILITY_XBZRLE] ||
+ migrate_multifd_compression() ||
  migrate_use_tls())) {
 error_setg(errp,
"Zero copy only available for non-compressed non-TLS 
multifd migration");
@@ -1550,6 +1552,17 @@ static bool migrate_params_check(MigrationParameters 
*params, Error **errp)
 error_prepend(errp, "Invalid mapping given for block-bitmap-mapping: 
");
 return false;
 }
+
+#ifdef CONFIG_LINUX
+if (migrate_use_zero_copy_send() &&
+((params->has_multifd_compression && params->multifd_compression) ||
+ (params->has_tls_creds && params->tls_creds && *params->tls_creds))) {
+error_setg(errp,
+   "Zero copy only available for non-compressed non-TLS 
multifd migration");
+return false;
+}
+#endif
+
 return true;
 }
 
-- 
2.36.1




[PULL 07/29] softmmu/dirtylimit: Implement dirty page rate limit

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Hyman Huang(黄勇) 

Implement periodic dirty page rate calculation based on the
dirty ring, and throttle the virtual CPU until it reaches the quota
dirty page rate given by the user.

Introduce the QMP commands "set-vcpu-dirty-limit",
"cancel-vcpu-dirty-limit" and "query-vcpu-dirty-limit"
to enable, disable and query the dirty page limit for virtual CPUs.

Meanwhile, introduce corresponding hmp commands
"set_vcpu_dirty_limit", "cancel_vcpu_dirty_limit",
"info vcpu_dirty_limit" so the feature can be more usable.

"query-vcpu-dirty-limit" only succeeds when the dirty page rate
limit is enabled, so add it to the list of skipped commands to
ensure qmp-cmd-test runs successfully.

Signed-off-by: Hyman Huang(黄勇) 
Acked-by: Markus Armbruster 
Reviewed-by: Peter Xu 
Message-Id: 
<4143f26706d413dd29db0b672fe58b3d3fbe34bc.1656177590.git.huang...@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert 
---
 hmp-commands-info.hx   |  13 +++
 hmp-commands.hx|  32 ++
 include/monitor/hmp.h  |   3 +
 qapi/migration.json|  80 +++
 softmmu/dirtylimit.c   | 194 +
 tests/qtest/qmp-cmd-test.c |   2 +
 6 files changed, 324 insertions(+)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 3ffa24bd67..188d9ece3b 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -865,6 +865,19 @@ SRST
 Display the vcpu dirty rate information.
 ERST
 
+{
+.name   = "vcpu_dirty_limit",
+.args_type  = "",
+.params = "",
+.help   = "show dirty page limit information of all vCPU",
+.cmd= hmp_info_vcpu_dirty_limit,
+},
+
+SRST
+  ``info vcpu_dirty_limit``
+Display the vcpu dirty page limit information.
+ERST
+
 #if defined(TARGET_I386)
 {
 .name   = "sgx",
diff --git a/hmp-commands.hx b/hmp-commands.hx
index c9d465735a..182e639d14 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1768,3 +1768,35 @@ ERST
   "\n\t\t\t -b to specify dirty bitmap as method of 
calculation)",
 .cmd= hmp_calc_dirty_rate,
 },
+
+SRST
+``set_vcpu_dirty_limit``
+  Set dirty page rate limit on virtual CPU, the information about all the
+  virtual CPU dirty limit status can be observed with ``info vcpu_dirty_limit``
+  command.
+ERST
+
+{
+.name   = "set_vcpu_dirty_limit",
+.args_type  = "dirty_rate:l,cpu_index:l?",
+.params = "dirty_rate [cpu_index]",
+.help   = "set dirty page rate limit, use cpu_index to set limit"
+  "\n\t\t\t\t\t on a specified virtual cpu",
+.cmd= hmp_set_vcpu_dirty_limit,
+},
+
+SRST
+``cancel_vcpu_dirty_limit``
+  Cancel dirty page rate limit on virtual CPU, the information about all the
+  virtual CPU dirty limit status can be observed with ``info vcpu_dirty_limit``
+  command.
+ERST
+
+{
+.name   = "cancel_vcpu_dirty_limit",
+.args_type  = "cpu_index:l?",
+.params = "[cpu_index]",
+.help   = "cancel dirty page rate limit, use cpu_index to cancel"
+  "\n\t\t\t\t\t limit on a specified virtual cpu",
+.cmd= hmp_cancel_vcpu_dirty_limit,
+},
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 2e89a97bd6..a618eb1e4e 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -131,6 +131,9 @@ void hmp_replay_delete_break(Monitor *mon, const QDict 
*qdict);
 void hmp_replay_seek(Monitor *mon, const QDict *qdict);
 void hmp_info_dirty_rate(Monitor *mon, const QDict *qdict);
 void hmp_calc_dirty_rate(Monitor *mon, const QDict *qdict);
+void hmp_set_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
+void hmp_cancel_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
+void hmp_info_vcpu_dirty_limit(Monitor *mon, const QDict *qdict);
 void hmp_human_readable_text_helper(Monitor *mon,
 HumanReadableText *(*qmp_handler)(Error 
**));
 void hmp_info_stats(Monitor *mon, const QDict *qdict);
diff --git a/qapi/migration.json b/qapi/migration.json
index 7102e474a6..e552ee4f43 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1868,6 +1868,86 @@
 ##
 { 'command': 'query-dirty-rate', 'returns': 'DirtyRateInfo' }
 
+##
+# @DirtyLimitInfo:
+#
+# Dirty page rate limit information of a virtual CPU.
+#
+# @cpu-index: index of a virtual CPU.
+#
+# @limit-rate: upper limit of dirty page rate (MB/s) for a virtual
+#  CPU, 0 means unlimited.
+#
+# @current-rate: current dirty page rate (MB/s) for a virtual CPU.
+#
+# Since: 7.1
+#
+##
+{ 'struct': 'DirtyLimitInfo',
+  'data': { 'cpu-index': 'int',
+'limit-rate': 'uint64',
+'current-rate': 'uint64' } }
+
+##
+# @set-vcpu-dirty-limit:
+#
+# Set the upper limit of dirty page rate for virtual CPUs.
+#
+# Requires KVM with accelerator property "dirty-ring-size" set.
+# A virtual CPU's dirty page rate is a measure of its memory load.
+# To observe dirt

[PULL 26/29] Add dirty-sync-missed-zero-copy migration stat

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Leonardo Bras 

Signed-off-by: Leonardo Bras 
Acked-by: Markus Armbruster 
Acked-by: Peter Xu 
Reviewed-by: Daniel P. Berrangé 
Message-Id: <2022071122.18951-3-leob...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/migration.c | 2 ++
 monitor/hmp-cmds.c| 5 +
 qapi/migration.json   | 7 ++-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 7c7e529ca7..15ae48b209 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1057,6 +1057,8 @@ static void populate_ram_info(MigrationInfo *info, 
MigrationState *s)
 info->ram->normal_bytes = ram_counters.normal * page_size;
 info->ram->mbps = s->mbps;
 info->ram->dirty_sync_count = ram_counters.dirty_sync_count;
+info->ram->dirty_sync_missed_zero_copy =
+ram_counters.dirty_sync_missed_zero_copy;
 info->ram->postcopy_requests = ram_counters.postcopy_requests;
 info->ram->page_size = page_size;
 info->ram->multifd_bytes = ram_counters.multifd_bytes;
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index ca98df0495..a6dc79e0d5 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -307,6 +307,11 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, "postcopy ram: %" PRIu64 " kbytes\n",
info->ram->postcopy_bytes >> 10);
 }
+if (info->ram->dirty_sync_missed_zero_copy) {
+monitor_printf(mon,
+   "Zero-copy-send fallbacks happened: %" PRIu64 " 
times\n",
+   info->ram->dirty_sync_missed_zero_copy);
+}
 }
 
 if (info->has_disk) {
diff --git a/qapi/migration.json b/qapi/migration.json
index 7586df3dea..81185d4311 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -55,6 +55,10 @@
 # @postcopy-bytes: The number of bytes sent during the post-copy phase
 #  (since 7.0).
 #
+# @dirty-sync-missed-zero-copy: Number of times dirty RAM synchronization could
+#   not avoid copying dirty pages. This is between
+#   0 and @dirty-sync-count * @multifd-channels.
+#   (since 7.1)
 # Since: 0.14
 ##
 { 'struct': 'MigrationStats',
@@ -65,7 +69,8 @@
'postcopy-requests' : 'int', 'page-size' : 'int',
'multifd-bytes' : 'uint64', 'pages-per-second' : 'uint64',
'precopy-bytes' : 'uint64', 'downtime-bytes' : 'uint64',
-   'postcopy-bytes' : 'uint64' } }
+   'postcopy-bytes' : 'uint64',
+   'dirty-sync-missed-zero-copy' : 'uint64' } }
 
 ##
 # @XBZRLECacheStats:
-- 
2.36.1




[PULL 06/29] softmmu/dirtylimit: Implement virtual CPU throttle

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Hyman Huang(黄勇) 

Set up a negative feedback system for when the vCPU thread
handles a KVM_EXIT_DIRTY_RING_FULL exit, by introducing a
throttle_us_per_full field in struct CPUState. The vCPU sleeps
throttle_us_per_full microseconds to throttle itself
if dirtylimit is in service.
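
A minimal sketch of one step of that feedback loop, using the tolerance
and linear-adjustment thresholds defined later in this patch
(DIRTYLIMIT_TOLERANCE_RANGE, DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT); the fixed
step size and the exact adjustment formula here are assumptions for
illustration, not QEMU's exact code:

```c
#include <stdint.h>

#define TOLERANCE_MB_S     25   /* dead zone, mirrors DIRTYLIMIT_TOLERANCE_RANGE */
#define LINEAR_ADJUST_PCT  50   /* mirrors DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT */
#define FIXED_STEP_US    1000   /* small-error nudge; value is assumed */

/* One feedback step: lengthen the post-ring-full sleep when the vCPU
 * dirties memory faster than its quota, shorten it when slower. */
int64_t adjust_sleep(int64_t sleep_us, int64_t current_mb_s,
                     int64_t quota_mb_s)
{
    int64_t err = current_mb_s - quota_mb_s;
    int64_t aerr = err < 0 ? -err : err;

    if (aerr <= TOLERANCE_MB_S) {
        return sleep_us;                    /* close enough, hold steady */
    }
    if (aerr * 100 > quota_mb_s * LINEAR_ADJUST_PCT) {
        /* large error: scale the sleep time proportionally */
        return sleep_us + sleep_us * err / current_mb_s;
    }
    /* small error: fixed-size nudge toward the quota */
    return sleep_us + (err > 0 ? FIXED_STEP_US : -FIXED_STEP_US);
}
```

With a 100 MB/s quota, a vCPU measured at 300 MB/s gets a proportional
increase, one at 140 MB/s gets a fixed nudge, and one within 25 MB/s of
the quota is left alone.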

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
Message-Id: 
<977e808e03a1cef5151cae75984658b6821be618.1656177590.git.huang...@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert 
---
 accel/kvm/kvm-all.c |  20 ++-
 include/hw/core/cpu.h   |   6 +
 include/sysemu/dirtylimit.h |  15 ++
 softmmu/dirtylimit.c| 291 
 softmmu/trace-events|   7 +
 5 files changed, 338 insertions(+), 1 deletion(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 184aecab5c..3187656570 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -45,6 +45,7 @@
 #include "qemu/guest-random.h"
 #include "sysemu/hw_accel.h"
 #include "kvm-cpus.h"
+#include "sysemu/dirtylimit.h"
 
 #include "hw/boards.h"
 #include "monitor/stats.h"
@@ -477,6 +478,7 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
 cpu->kvm_state = s;
 cpu->vcpu_dirty = true;
 cpu->dirty_pages = 0;
+cpu->throttle_us_per_full = 0;
 
 mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
 if (mmap_size < 0) {
@@ -1470,6 +1472,11 @@ static void *kvm_dirty_ring_reaper_thread(void *data)
  */
 sleep(1);
 
+/* keep sleeping so that dirtylimit not be interfered by reaper */
+if (dirtylimit_in_service()) {
+continue;
+}
+
 trace_kvm_dirty_ring_reaper("wakeup");
 r->reaper_state = KVM_DIRTY_RING_REAPER_REAPING;
 
@@ -2975,8 +2982,19 @@ int kvm_cpu_exec(CPUState *cpu)
  */
 trace_kvm_dirty_ring_full(cpu->cpu_index);
 qemu_mutex_lock_iothread();
-kvm_dirty_ring_reap(kvm_state, NULL);
+/*
+ * We throttle vCPU by making it sleep once it exit from kernel
+ * due to dirty ring full. In the dirtylimit scenario, reaping
+ * all vCPUs after a single vCPU dirty ring get full result in
+ * the miss of sleep, so just reap the ring-fulled vCPU.
+ */
+if (dirtylimit_in_service()) {
+kvm_dirty_ring_reap(kvm_state, cpu);
+} else {
+kvm_dirty_ring_reap(kvm_state, NULL);
+}
 qemu_mutex_unlock_iothread();
+dirtylimit_vcpu_execute(cpu);
 ret = 0;
 break;
 case KVM_EXIT_SYSTEM_EVENT:
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 996f94059f..500503da13 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -418,6 +418,12 @@ struct CPUState {
  */
 bool throttle_thread_scheduled;
 
+/*
+ * Sleep throttle_us_per_full microseconds once dirty ring is full
+ * if dirty page rate limit is enabled.
+ */
+int64_t throttle_us_per_full;
+
 bool ignore_memory_transaction_failures;
 
 /* Used for user-only emulation of prctl(PR_SET_UNALIGN). */
diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h
index da459f03d6..8d2c1f3a6b 100644
--- a/include/sysemu/dirtylimit.h
+++ b/include/sysemu/dirtylimit.h
@@ -19,4 +19,19 @@ void vcpu_dirty_rate_stat_start(void);
 void vcpu_dirty_rate_stat_stop(void);
 void vcpu_dirty_rate_stat_initialize(void);
 void vcpu_dirty_rate_stat_finalize(void);
+
+void dirtylimit_state_lock(void);
+void dirtylimit_state_unlock(void);
+void dirtylimit_state_initialize(void);
+void dirtylimit_state_finalize(void);
+bool dirtylimit_in_service(void);
+bool dirtylimit_vcpu_index_valid(int cpu_index);
+void dirtylimit_process(void);
+void dirtylimit_change(bool start);
+void dirtylimit_set_vcpu(int cpu_index,
+ uint64_t quota,
+ bool enable);
+void dirtylimit_set_all(uint64_t quota,
+bool enable);
+void dirtylimit_vcpu_execute(CPUState *cpu);
 #endif
diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
index ebdc064c9d..e5a4f970bd 100644
--- a/softmmu/dirtylimit.c
+++ b/softmmu/dirtylimit.c
@@ -18,6 +18,26 @@
 #include "sysemu/dirtylimit.h"
 #include "exec/memory.h"
 #include "hw/boards.h"
+#include "sysemu/kvm.h"
+#include "trace.h"
+
+/*
+ * Dirtylimit stop working if dirty page rate error
+ * value less than DIRTYLIMIT_TOLERANCE_RANGE
+ */
+#define DIRTYLIMIT_TOLERANCE_RANGE  25  /* MB/s */
+/*
+ * Plus or minus vcpu sleep time linearly if dirty
+ * page rate error value percentage over
+ * DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT.
+ * Otherwise, plus or minus a fixed vcpu sleep time.
+ */
+#define DIRTYLIMIT_LINEAR_ADJUSTMENT_PCT 50
+/*
+ * Max vcpu sleep time percentage during a cycle
+ * composed of dirty ring full and sleep time.
+ */
+#define DIRTYLIMIT_THROTTLE_PCT_MAX 99
 
 struct {
 VcpuStat stat;
@@ -25,6 +45,30 @@ struct {
 QemuThread thread;
 } *vcpu_

[PULL 22/29] tests: Add postcopy tls recovery migration test

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

It's easy to build this upon the postcopy tls test.  Rename the old
postcopy recovery test to postcopy/recovery/plain.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Message-Id: <20220707185527.27747-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
  dgilbert: Manual merge
---
 tests/qtest/migration-test.c | 37 +++-
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index b2020ef6c5..e9350ea8c6 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1146,17 +1146,15 @@ static void test_postcopy_tls_psk(void)
 }
 #endif
 
-static void test_postcopy_recovery(void)
+static void test_postcopy_recovery_common(MigrateCommon *args)
 {
-MigrateCommon args = {
-.start = {
-.hide_stderr = true,
-},
-};
 QTestState *from, *to;
 g_autofree char *uri = NULL;
 
-if (migrate_postcopy_prepare(&from, &to, &args)) {
+/* Always hide errors for postcopy recover tests since they're expected */
+args->start.hide_stderr = true;
+
+if (migrate_postcopy_prepare(&from, &to, args)) {
 return;
 }
 
@@ -1207,7 +1205,24 @@ static void test_postcopy_recovery(void)
 /* Restore the postcopy bandwidth to unlimited */
 migrate_set_parameter_int(from, "max-postcopy-bandwidth", 0);
 
-migrate_postcopy_complete(from, to, &args);
+migrate_postcopy_complete(from, to, args);
+}
+
+static void test_postcopy_recovery(void)
+{
+MigrateCommon args = { };
+
+test_postcopy_recovery_common(&args);
+}
+
+static void test_postcopy_recovery_tls_psk(void)
+{
+MigrateCommon args = {
+.start_hook = test_migrate_tls_psk_start_match,
+.finish_hook = test_migrate_tls_psk_finish,
+};
+
+test_postcopy_recovery_common(&args);
 }
 
 static void test_baddest(void)
@@ -2410,7 +2425,9 @@ int main(int argc, char **argv)
 module_call_init(MODULE_INIT_QOM);
 
 qtest_add_func("/migration/postcopy/unix", test_postcopy);
-qtest_add_func("/migration/postcopy/recovery", test_postcopy_recovery);
+qtest_add_func("/migration/postcopy/recovery/plain",
+   test_postcopy_recovery);
+
 qtest_add_func("/migration/postcopy/plain", test_postcopy);
 
 qtest_add_func("/migration/bad_dest", test_baddest);
@@ -2425,6 +2442,8 @@ int main(int argc, char **argv)
  * general postcopy path that has TLS channel enabled.
  */
 qtest_add_func("/migration/postcopy/tls/psk", test_postcopy_tls_psk);
+qtest_add_func("/migration/postcopy/recovery/tls/psk",
+   test_postcopy_recovery_tls_psk);
 #ifdef CONFIG_TASN1
 qtest_add_func("/migration/precopy/unix/tls/x509/default-host",
test_precopy_unix_tls_x509_default_host);
-- 
2.36.1




[PULL 08/29] tests: Add dirty page rate limit test

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Hyman Huang(黄勇) 

Add a dirty page rate limit test for when the kernel supports the dirty ring.

The following qmp commands are covered by this test case:
"calc-dirty-rate", "query-dirty-rate", "set-vcpu-dirty-limit",
"cancel-vcpu-dirty-limit" and "query-vcpu-dirty-limit".

Signed-off-by: Hyman Huang(黄勇) 
Acked-by: Peter Xu 
Message-Id: 

Signed-off-by: Dr. David Alan Gilbert 
---
 tests/qtest/migration-helpers.c |  22 +++
 tests/qtest/migration-helpers.h |   2 +
 tests/qtest/migration-test.c| 256 
 3 files changed, 280 insertions(+)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index e81e831c85..c6fbeb3974 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -83,6 +83,28 @@ QDict *wait_command(QTestState *who, const char *command, 
...)
 return ret;
 }
 
+/*
+ * Execute the qmp command only
+ */
+QDict *qmp_command(QTestState *who, const char *command, ...)
+{
+va_list ap;
+QDict *resp, *ret;
+
+va_start(ap, command);
+resp = qtest_vqmp(who, command, ap);
+va_end(ap);
+
+g_assert(!qdict_haskey(resp, "error"));
+g_assert(qdict_haskey(resp, "return"));
+
+ret = qdict_get_qdict(resp, "return");
+qobject_ref(ret);
+qobject_unref(resp);
+
+return ret;
+}
+
 /*
  * Send QMP command "migrate".
  * Arguments are built from @fmt... (formatted like
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index 78587c2b82..59561898d0 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -23,6 +23,8 @@ QDict *wait_command_fd(QTestState *who, int fd, const char 
*command, ...);
 G_GNUC_PRINTF(2, 3)
 QDict *wait_command(QTestState *who, const char *command, ...);
 
+QDict *qmp_command(QTestState *who, const char *command, ...);
+
 G_GNUC_PRINTF(3, 4)
 void migrate_qmp(QTestState *who, const char *uri, const char *fmt, ...);
 
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 9e64125f02..db4dcc5b31 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -24,6 +24,7 @@
 #include "qapi/qobject-input-visitor.h"
 #include "qapi/qobject-output-visitor.h"
 #include "crypto/tlscredspsk.h"
+#include "qapi/qmp/qlist.h"
 
 #include "migration-helpers.h"
 #include "tests/migration/migration-test.h"
@@ -46,6 +47,12 @@ unsigned start_address;
 unsigned end_address;
 static bool uffd_feature_thread_id;
 
+/*
+ * Dirtylimit stop working if dirty page rate error
+ * value less than DIRTYLIMIT_TOLERANCE_RANGE
+ */
+#define DIRTYLIMIT_TOLERANCE_RANGE  25  /* MB/s */
+
 #if defined(__linux__)
 #include 
 #include 
@@ -2059,6 +2066,253 @@ static void test_multifd_tcp_cancel(void)
 test_migrate_end(from, to2, true);
 }
 
+static void calc_dirty_rate(QTestState *who, uint64_t calc_time)
+{
+qobject_unref(qmp_command(who,
+  "{ 'execute': 'calc-dirty-rate',"
+  "'arguments': { "
+  "'calc-time': %ld,"
+  "'mode': 'dirty-ring' }}",
+  calc_time));
+}
+
+static QDict *query_dirty_rate(QTestState *who)
+{
+return qmp_command(who, "{ 'execute': 'query-dirty-rate' }");
+}
+
+static void dirtylimit_set_all(QTestState *who, uint64_t dirtyrate)
+{
+qobject_unref(qmp_command(who,
+  "{ 'execute': 'set-vcpu-dirty-limit',"
+  "'arguments': { "
+  "'dirty-rate': %ld } }",
+  dirtyrate));
+}
+
+static void cancel_vcpu_dirty_limit(QTestState *who)
+{
+qobject_unref(qmp_command(who,
+  "{ 'execute': 'cancel-vcpu-dirty-limit' }"));
+}
+
+static QDict *query_vcpu_dirty_limit(QTestState *who)
+{
+QDict *rsp;
+
+rsp = qtest_qmp(who, "{ 'execute': 'query-vcpu-dirty-limit' }");
+g_assert(!qdict_haskey(rsp, "error"));
+g_assert(qdict_haskey(rsp, "return"));
+
+return rsp;
+}
+
+static bool calc_dirtyrate_ready(QTestState *who)
+{
+QDict *rsp_return;
+gchar *status;
+
+rsp_return = query_dirty_rate(who);
+g_assert(rsp_return);
+
+status = g_strdup(qdict_get_str(rsp_return, "status"));
+g_assert(status);
+
+return g_strcmp0(status, "measuring");
+}
+
+static void wait_for_calc_dirtyrate_complete(QTestState *who,
+ int64_t time_s)
+{
+int max_try_count = 1;
+usleep(time_s * 100);
+
+while (!calc_dirtyrate_ready(who) && max_try_count--) {
+usleep(1000);
+}
+
+/*
+ * Set the timeout with 10 s(max_try_count * 1000us),
+ * if dirtyrate measurement not complete, fail test.
+ */
+g_assert_cmpint(max_try_count, !=, 0);
+}
+
+static int64_t get_dirty_rate(QTestState *who)
+{
+QDict *rsp_return;
+gchar *status;
+QList *rates;
+const QListEntry *entry;
+QDict *rate;
+int64_t dirtyrate;
+
+rsp_return = query_dirty_rate(who);
+g_assert(rsp_return);
+
+status = g_strdup(qdict_get_str(

[PULL 20/29] tests: Move MigrateCommon upper

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

So that it can soon be used in postcopy tests too.

Reviewed-by: Daniel P. Berrange 
Signed-off-by: Peter Xu 
Message-Id: <20220707185522.27638-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 tests/qtest/migration-test.c | 144 +--
 1 file changed, 72 insertions(+), 72 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index db4dcc5b31..f3931e0a92 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -503,6 +503,78 @@ typedef struct {
 const char *opts_target;
 } MigrateStart;
 
+/*
+ * A hook that runs after the src and dst QEMUs have been
+ * created, but before the migration is started. This can
+ * be used to set migration parameters and capabilities.
+ *
+ * Returns: NULL, or a pointer to opaque state to be
+ *  later passed to the TestMigrateFinishHook
+ */
+typedef void * (*TestMigrateStartHook)(QTestState *from,
+   QTestState *to);
+
+/*
+ * A hook that runs after the migration has finished,
+ * regardless of whether it succeeded or failed, but
+ * before QEMU has terminated (unless it self-terminated
+ * due to migration error)
+ *
+ * @opaque is a pointer to state previously returned
+ * by the TestMigrateStartHook if any, or NULL.
+ */
+typedef void (*TestMigrateFinishHook)(QTestState *from,
+  QTestState *to,
+  void *opaque);
+
+typedef struct {
+/* Optional: fine tune start parameters */
+MigrateStart start;
+
+/* Required: the URI for the dst QEMU to listen on */
+const char *listen_uri;
+
+/*
+ * Optional: the URI for the src QEMU to connect to
+ * If NULL, then it will query the dst QEMU for its actual
+ * listening address and use that as the connect address.
+ * This allows for dynamically picking a free TCP port.
+ */
+const char *connect_uri;
+
+/* Optional: callback to run at start to set migration parameters */
+TestMigrateStartHook start_hook;
+/* Optional: callback to run at finish to cleanup */
+TestMigrateFinishHook finish_hook;
+
+/*
+ * Optional: normally we expect the migration process to complete.
+ *
+ * There can be a variety of reasons and stages in which failure
+ * can happen during tests.
+ *
+ * If a failure is expected to happen at time of establishing
+ * the connection, then MIG_TEST_FAIL will indicate that the dst
+ * QEMU is expected to stay running and accept future migration
+ * connections.
+ *
+ * If a failure is expected to happen while processing the
+ * migration stream, then MIG_TEST_FAIL_DEST_QUIT_ERR will indicate
+ * that the dst QEMU is expected to quit with non-zero exit status
+ */
+enum {
+/* This test should succeed, the default */
+MIG_TEST_SUCCEED = 0,
+/* This test should fail, dest qemu should keep alive */
+MIG_TEST_FAIL,
+/* This test should fail, dest qemu should fail with abnormal status */
+MIG_TEST_FAIL_DEST_QUIT_ERR,
+} result;
+
+/* Optional: set number of migration passes to wait for */
+unsigned int iterations;
+} MigrateCommon;
+
 static int test_migrate_start(QTestState **from, QTestState **to,
   const char *uri, MigrateStart *args)
 {
@@ -1120,78 +1192,6 @@ static void test_baddest(void)
 test_migrate_end(from, to, false);
 }
 
-/*
- * A hook that runs after the src and dst QEMUs have been
- * created, but before the migration is started. This can
- * be used to set migration parameters and capabilities.
- *
- * Returns: NULL, or a pointer to opaque state to be
- *  later passed to the TestMigrateFinishHook
- */
-typedef void * (*TestMigrateStartHook)(QTestState *from,
-   QTestState *to);
-
-/*
- * A hook that runs after the migration has finished,
- * regardless of whether it succeeded or failed, but
- * before QEMU has terminated (unless it self-terminated
- * due to migration error)
- *
- * @opaque is a pointer to state previously returned
- * by the TestMigrateStartHook if any, or NULL.
- */
-typedef void (*TestMigrateFinishHook)(QTestState *from,
-  QTestState *to,
-  void *opaque);
-
-typedef struct {
-/* Optional: fine tune start parameters */
-MigrateStart start;
-
-/* Required: the URI for the dst QEMU to listen on */
-const char *listen_uri;
-
-/*
- * Optional: the URI for the src QEMU to connect to
- * If NULL, then it will query the dst QEMU for its actual
- * listening address and use that as the connect address.
- * This allows for dynamically picking a free TCP port.
- */
-const char *connect_uri;
-
-/* Optional: callback to run at start to set migration parameters */
-TestMigrateStartHook start

[PULL 03/29] migration/dirtyrate: Refactor dirty page rate calculation

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Hyman Huang(黄勇) 

abstract out dirty log change logic into function
global_dirty_log_change.

abstract out dirty page rate calculation logic via
dirty-ring into function vcpu_calculate_dirtyrate.

abstract out mathematical dirty page rate calculation
into do_calculate_dirtyrate, decouple it from DirtyStat.

rename set_sample_page_period to dirty_stat_wait, which
is well-understood and will be reused in dirtylimit.

handle cpu hotplug/unplug scenario during measurement of
dirty page rate.

export util functions outside migration.
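
The core arithmetic that do_calculate_dirtyrate() factors out can be
sketched as follows (TARGET_PAGE_SIZE is assumed to be 4 KiB here; in
QEMU it is target-dependent):

```c
#include <stdint.h>

#define TARGET_PAGE_SIZE 4096   /* assumed; target-dependent in QEMU */

/* Dirty page rate in MB/s over a measurement window of calc_time_ms:
 * pages dirtied are converted to bytes, shifted down to MB, then
 * scaled to a per-second rate. */
int64_t dirty_rate_mb_s(uint64_t start_pages, uint64_t end_pages,
                        int64_t calc_time_ms)
{
    uint64_t dirtied_mb =
        ((end_pages - start_pages) * TARGET_PAGE_SIZE) >> 20;
    return dirtied_mb * 1000 / calc_time_ms;
}
```

For example, 256 Ki pages of 4 KiB dirtied over one second is 1 GiB of
dirty memory, i.e. a rate of 1024 MB/s; doubling the window halves the rate.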

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
Message-Id: 
<7b6f6f4748d5b3d017b31a0429e630229ae97538.1656177590.git.huang...@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert 
---
 include/sysemu/dirtyrate.h |  28 +
 migration/dirtyrate.c  | 227 +++--
 migration/dirtyrate.h  |   7 +-
 3 files changed, 174 insertions(+), 88 deletions(-)
 create mode 100644 include/sysemu/dirtyrate.h

diff --git a/include/sysemu/dirtyrate.h b/include/sysemu/dirtyrate.h
new file mode 100644
index 00..4d3b9a4902
--- /dev/null
+++ b/include/sysemu/dirtyrate.h
@@ -0,0 +1,28 @@
+/*
+ * dirty page rate helper functions
+ *
+ * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ *  Hyman Huang(黄勇) 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_DIRTYRATE_H
+#define QEMU_DIRTYRATE_H
+
+typedef struct VcpuStat {
+int nvcpu; /* number of vcpu */
+DirtyRateVcpu *rates; /* array of dirty rate for each vcpu */
+} VcpuStat;
+
+int64_t vcpu_calculate_dirtyrate(int64_t calc_time_ms,
+ VcpuStat *stat,
+ unsigned int flag,
+ bool one_shot);
+
+void global_dirty_log_change(unsigned int flag,
+ bool start);
+#endif
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index aace12a787..795fab5c37 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -46,7 +46,7 @@ static struct DirtyRateStat DirtyStat;
 static DirtyRateMeasureMode dirtyrate_mode =
 DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING;
 
-static int64_t set_sample_page_period(int64_t msec, int64_t initial_time)
+static int64_t dirty_stat_wait(int64_t msec, int64_t initial_time)
 {
 int64_t current_time;
 
@@ -60,6 +60,132 @@ static int64_t set_sample_page_period(int64_t msec, int64_t 
initial_time)
 return msec;
 }
 
+static inline void record_dirtypages(DirtyPageRecord *dirty_pages,
+ CPUState *cpu, bool start)
+{
+if (start) {
+dirty_pages[cpu->cpu_index].start_pages = cpu->dirty_pages;
+} else {
+dirty_pages[cpu->cpu_index].end_pages = cpu->dirty_pages;
+}
+}
+
+static int64_t do_calculate_dirtyrate(DirtyPageRecord dirty_pages,
+  int64_t calc_time_ms)
+{
+uint64_t memory_size_MB;
+uint64_t increased_dirty_pages =
+dirty_pages.end_pages - dirty_pages.start_pages;
+
+memory_size_MB = (increased_dirty_pages * TARGET_PAGE_SIZE) >> 20;
+
+return memory_size_MB * 1000 / calc_time_ms;
+}
+
+void global_dirty_log_change(unsigned int flag, bool start)
+{
+qemu_mutex_lock_iothread();
+if (start) {
+memory_global_dirty_log_start(flag);
+} else {
+memory_global_dirty_log_stop(flag);
+}
+qemu_mutex_unlock_iothread();
+}
+
+/*
+ * global_dirty_log_sync
+ * 1. sync dirty log from kvm
+ * 2. stop dirty tracking if needed.
+ */
+static void global_dirty_log_sync(unsigned int flag, bool one_shot)
+{
+qemu_mutex_lock_iothread();
+memory_global_dirty_log_sync();
+if (one_shot) {
+memory_global_dirty_log_stop(flag);
+}
+qemu_mutex_unlock_iothread();
+}
+
+static DirtyPageRecord *vcpu_dirty_stat_alloc(VcpuStat *stat)
+{
+CPUState *cpu;
+DirtyPageRecord *records;
+int nvcpu = 0;
+
+CPU_FOREACH(cpu) {
+nvcpu++;
+}
+
+stat->nvcpu = nvcpu;
+stat->rates = g_malloc0(sizeof(DirtyRateVcpu) * nvcpu);
+
+records = g_malloc0(sizeof(DirtyPageRecord) * nvcpu);
+
+return records;
+}
+
+static void vcpu_dirty_stat_collect(VcpuStat *stat,
+DirtyPageRecord *records,
+bool start)
+{
+CPUState *cpu;
+
+CPU_FOREACH(cpu) {
+record_dirtypages(records, cpu, start);
+}
+}
+
+int64_t vcpu_calculate_dirtyrate(int64_t calc_time_ms,
+ VcpuStat *stat,
+ unsigned int flag,
+ bool one_shot)
+{
+DirtyPageRecord *records;
+int64_t init_time_ms;
+int64_t duration;
+int64_t dirtyrate;
+int i = 0;
+unsigned int gen_id;
+
+retry:
+init_time_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+cpu_list_lock();
+gen_id = cpu_

[PULL 25/29] QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Leonardo Bras 

If flush is called when no buffer was sent with MSG_ZEROCOPY, it currently
returns 1. This return code should be used only when Linux fails to use
MSG_ZEROCOPY on a lot of sendmsg().

Fix this by returning early from flush if no sendmsg(...,MSG_ZEROCOPY)
was attempted.
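
A simplified model of the return contract after this fix (the function
below is an illustration of the semantics, not QEMU's implementation):

```c
/* Model of qio_channel_socket_flush()'s result after the fix:
 * 0 when nothing was queued (the new early return) or every zero-copy
 * send went through; 1 only when the kernel actually fell back to
 * copying for some of the queued sends. */
int flush_ret(unsigned queued, unsigned sent, unsigned copy_fallbacks)
{
    if (queued == sent) {
        return 0;   /* no outstanding zero-copy sends: early return */
    }
    return copy_fallbacks ? 1 : 0;
}
```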

Fixes: 2bc58ffc2926 ("QIOChannelSocket: Implement io_writev zero copy flag & 
io_flush for CONFIG_LINUX")
Signed-off-by: Leonardo Bras 
Reviewed-by: Daniel P. Berrangé 
Acked-by: Daniel P. Berrangé 
Reviewed-by: Juan Quintela 
Reviewed-by: Peter Xu 
Message-Id: <2022071122.18951-2-leob...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 io/channel-socket.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/io/channel-socket.c b/io/channel-socket.c
index 4466bb1cd4..74a936cc1f 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -716,12 +716,18 @@ static int qio_channel_socket_flush(QIOChannel *ioc,
 struct cmsghdr *cm;
 char control[CMSG_SPACE(sizeof(*serr))];
 int received;
-int ret = 1;
+int ret;
+
+if (sioc->zero_copy_queued == sioc->zero_copy_sent) {
+return 0;
+}
 
 msg.msg_control = control;
 msg.msg_controllen = sizeof(control);
 memset(control, 0, sizeof(control));
 
+ret = 1;
+
 while (sioc->zero_copy_sent < sioc->zero_copy_queued) {
 received = recvmsg(sioc->fd, &msg, MSG_ERRQUEUE);
 if (received < 0) {
-- 
2.36.1




[PULL 14/29] migration: Create the postcopy preempt channel asynchronously

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

This patch allows the postcopy preempt channel to be created
asynchronously.  The benefit is that when the connection is slow, we won't
take the BQL (and potentially block all things like QMP) for a long time
without releasing it.

A function postcopy_preempt_wait_channel() is introduced, allowing the
migration thread to wait for the channel creation.  The channel is always
created by the main thread, which posts a semaphore to tell the migration
thread that the channel has been created.

We'll need to wait for the new channel in two places: (1) when there's a
new postcopy migration that is starting, or (2) when there's a postcopy
migration to resume.

For the start of migration, we don't need to wait for this channel until
we want to start postcopy, aka, postcopy_start().  We'll fail the
migration if we find that the channel creation failed (which should
almost never happen, because the main channel is using the same network
topology).

For a postcopy recovery, we'll need to wait in postcopy_pause().  In that
case, if the channel creation failed, we can't fail the migration or we'd
crash the VM; instead we stay in the PAUSED state, waiting for yet another
recovery.

Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Manish Mishra 
Signed-off-by: Peter Xu 
Message-Id: <20220707185509.27311-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/migration.c| 16 
 migration/migration.h|  7 +
 migration/postcopy-ram.c | 56 +++-
 migration/postcopy-ram.h |  1 +
 4 files changed, 68 insertions(+), 12 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 3119bd2e4b..427d4de185 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3053,6 +3053,12 @@ static int postcopy_start(MigrationState *ms)
 int64_t bandwidth = migrate_max_postcopy_bandwidth();
 bool restart_block = false;
 int cur_state = MIGRATION_STATUS_ACTIVE;
+
+if (postcopy_preempt_wait_channel(ms)) {
+migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
+return -1;
+}
+
 if (!migrate_pause_before_switchover()) {
 migrate_set_state(&ms->state, MIGRATION_STATUS_ACTIVE,
   MIGRATION_STATUS_POSTCOPY_ACTIVE);
@@ -3534,6 +3540,14 @@ static MigThrError postcopy_pause(MigrationState *s)
 if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
 /* Woken up by a recover procedure. Give it a shot */
 
+if (postcopy_preempt_wait_channel(s)) {
+/*
+ * Preempt enabled, and new channel create failed; loop
+ * back to wait for another recovery.
+ */
+continue;
+}
+
 /*
  * Firstly, let's wake up the return path now, with a new
  * return path channel.
@@ -4398,6 +4412,7 @@ static void migration_instance_finalize(Object *obj)
 qemu_sem_destroy(&ms->postcopy_pause_sem);
 qemu_sem_destroy(&ms->postcopy_pause_rp_sem);
 qemu_sem_destroy(&ms->rp_state.rp_sem);
+qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
 error_free(ms->error);
 }
 
@@ -,6 +4459,7 @@ static void migration_instance_init(Object *obj)
 qemu_sem_init(&ms->rp_state.rp_sem, 0);
 qemu_sem_init(&ms->rate_limit_sem, 0);
 qemu_sem_init(&ms->wait_unplug_sem, 0);
+qemu_sem_init(&ms->postcopy_qemufile_src_sem, 0);
 qemu_mutex_init(&ms->qemu_file_lock);
 }
 
diff --git a/migration/migration.h b/migration/migration.h
index 9220cec6bd..ae4ffd3454 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -219,6 +219,13 @@ struct MigrationState {
 QEMUFile *to_dst_file;
 /* Postcopy specific transfer channel */
 QEMUFile *postcopy_qemufile_src;
+/*
+ * It is posted when the preempt channel is established.  Note: this is
+ * used for both the start or recover of a postcopy migration.  We'll
+ * post to this sem every time a new preempt channel is created in the
+ * main thread, and we keep post() and wait() in pair.
+ */
+QemuSemaphore postcopy_qemufile_src_sem;
 QIOChannelBuffer *bioc;
 /*
  * Protects to_dst_file/from_dst_file pointers.  We need to make sure we
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 84f7b1526e..70b21e9d51 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1552,10 +1552,50 @@ bool postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file)
 return true;
 }
 
-int postcopy_preempt_setup(MigrationState *s, Error **errp)
+static void
+postcopy_preempt_send_channel_new(QIOTask *task, gpointer opaque)
 {
-QIOChannel *ioc;
+MigrationState *s = opaque;
+QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
+Error *local_err = NULL;
+
+if (qio_task_propagate_error(task, &local_err)) {
+/* Somet

[PULL 04/29] softmmu/dirtylimit: Implement vCPU dirtyrate calculation periodically

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Hyman Huang(黄勇) 

Introduce GLOBAL_DIRTY_LIMIT, the third method of dirty tracking, to
calculate the dirty rate periodically for dirty page rate limiting.

Add dirtylimit.c to implement the periodic dirty rate calculation,
which will be used for dirty page rate limiting.

Add dirtylimit.h to export utility functions for the dirty page rate
limit implementation.

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
Message-Id: <5d0d641bffcb9b1c4cc3e323b6dfecb36050d948.1656177590.git.huang...@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert 
---
 include/exec/memory.h   |   5 +-
 include/sysemu/dirtylimit.h |  22 +++
 softmmu/dirtylimit.c| 116 
 softmmu/meson.build |   1 +
 4 files changed, 143 insertions(+), 1 deletion(-)
 create mode 100644 include/sysemu/dirtylimit.h
 create mode 100644 softmmu/dirtylimit.c

diff --git a/include/exec/memory.h b/include/exec/memory.h
index a6a0f4d8ad..bfb1de8eea 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -69,7 +69,10 @@ static inline void fuzz_dma_read_cb(size_t addr,
 /* Dirty tracking enabled because measuring dirty rate */
 #define GLOBAL_DIRTY_DIRTY_RATE (1U << 1)
 
-#define GLOBAL_DIRTY_MASK  (0x3)
+/* Dirty tracking enabled because dirty limit */
+#define GLOBAL_DIRTY_LIMIT  (1U << 2)
+
+#define GLOBAL_DIRTY_MASK  (0x7)
 
 extern unsigned int global_dirty_tracking;
 
diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h
new file mode 100644
index 00..da459f03d6
--- /dev/null
+++ b/include/sysemu/dirtylimit.h
@@ -0,0 +1,22 @@
+/*
+ * Dirty page rate limit common functions
+ *
+ * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ *  Hyman Huang(黄勇) 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef QEMU_DIRTYRLIMIT_H
+#define QEMU_DIRTYRLIMIT_H
+
+#define DIRTYLIMIT_CALC_TIME_MS 1000/* 1000ms */
+
+int64_t vcpu_dirty_rate_get(int cpu_index);
+void vcpu_dirty_rate_stat_start(void);
+void vcpu_dirty_rate_stat_stop(void);
+void vcpu_dirty_rate_stat_initialize(void);
+void vcpu_dirty_rate_stat_finalize(void);
+#endif
diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
new file mode 100644
index 00..ebdc064c9d
--- /dev/null
+++ b/softmmu/dirtylimit.c
@@ -0,0 +1,116 @@
+/*
+ * Dirty page rate limit implementation code
+ *
+ * Copyright (c) 2022 CHINA TELECOM CO.,LTD.
+ *
+ * Authors:
+ *  Hyman Huang(黄勇) 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/main-loop.h"
+#include "qapi/qapi-commands-migration.h"
+#include "sysemu/dirtyrate.h"
+#include "sysemu/dirtylimit.h"
+#include "exec/memory.h"
+#include "hw/boards.h"
+
+struct {
+VcpuStat stat;
+bool running;
+QemuThread thread;
+} *vcpu_dirty_rate_stat;
+
+static void vcpu_dirty_rate_stat_collect(void)
+{
+VcpuStat stat;
+int i = 0;
+
+/* calculate vcpu dirtyrate */
+vcpu_calculate_dirtyrate(DIRTYLIMIT_CALC_TIME_MS,
+ &stat,
+ GLOBAL_DIRTY_LIMIT,
+ false);
+
+for (i = 0; i < stat.nvcpu; i++) {
+vcpu_dirty_rate_stat->stat.rates[i].id = i;
+vcpu_dirty_rate_stat->stat.rates[i].dirty_rate =
+stat.rates[i].dirty_rate;
+}
+
+free(stat.rates);
+}
+
+static void *vcpu_dirty_rate_stat_thread(void *opaque)
+{
+rcu_register_thread();
+
+/* start log sync */
+global_dirty_log_change(GLOBAL_DIRTY_LIMIT, true);
+
+while (qatomic_read(&vcpu_dirty_rate_stat->running)) {
+vcpu_dirty_rate_stat_collect();
+}
+
+/* stop log sync */
+global_dirty_log_change(GLOBAL_DIRTY_LIMIT, false);
+
+rcu_unregister_thread();
+return NULL;
+}
+
+int64_t vcpu_dirty_rate_get(int cpu_index)
+{
+DirtyRateVcpu *rates = vcpu_dirty_rate_stat->stat.rates;
+return qatomic_read_i64(&rates[cpu_index].dirty_rate);
+}
+
+void vcpu_dirty_rate_stat_start(void)
+{
+if (qatomic_read(&vcpu_dirty_rate_stat->running)) {
+return;
+}
+
+qatomic_set(&vcpu_dirty_rate_stat->running, 1);
+qemu_thread_create(&vcpu_dirty_rate_stat->thread,
+   "dirtyrate-stat",
+   vcpu_dirty_rate_stat_thread,
+   NULL,
+   QEMU_THREAD_JOINABLE);
+}
+
+void vcpu_dirty_rate_stat_stop(void)
+{
+qatomic_set(&vcpu_dirty_rate_stat->running, 0);
+qemu_mutex_unlock_iothread();
+qemu_thread_join(&vcpu_dirty_rate_stat->thread);
+qemu_mutex_lock_iothread();
+}
+
+void vcpu_dirty_rate_stat_initialize(void)
+{
+MachineState *ms = MACHINE(qdev_get_machine());
+int max_cpus = ms->smp.max_cpus;
+
+vcpu_dirty_rate_stat =
+g_malloc0(sizeof(*vcpu_dirty_rate_stat));
+
+vcpu_dirty_rate_stat->s

[PULL 15/29] migration: Add property x-postcopy-preempt-break-huge

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

Add a property field that can conditionally disable the "break sending huge
page" behavior in postcopy preemption.  By default it's enabled.

It should only be used for debugging purposes, and we should never remove
the "x-" prefix.

Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Manish Mishra 
Signed-off-by: Peter Xu 
Message-Id: <20220707185511.27366-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/migration.c | 2 ++
 migration/migration.h | 7 +++
 migration/ram.c   | 7 +++
 3 files changed, 16 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 427d4de185..864164ad96 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -4363,6 +4363,8 @@ static Property migration_properties[] = {
 DEFINE_PROP_SIZE("announce-step", MigrationState,
   parameters.announce_step,
   DEFAULT_MIGRATE_ANNOUNCE_STEP),
+DEFINE_PROP_BOOL("x-postcopy-preempt-break-huge", MigrationState,
+  postcopy_preempt_break_huge, true),
 
 /* Migration capabilities */
 DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
diff --git a/migration/migration.h b/migration/migration.h
index ae4ffd3454..cdad8aceaa 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -340,6 +340,13 @@ struct MigrationState {
 bool send_configuration;
 /* Whether we send section footer during migration */
 bool send_section_footer;
+/*
+ * Whether we allow break sending huge pages when postcopy preempt is
+ * enabled.  When disabled, we won't interrupt precopy within sending a
+ * host huge page, which is the old behavior of vanilla postcopy.
+ * NOTE: this parameter is ignored if postcopy preempt is not enabled.
+ */
+bool postcopy_preempt_break_huge;
 
 /* Needed by postcopy-pause state */
 QemuSemaphore postcopy_pause_sem;
diff --git a/migration/ram.c b/migration/ram.c
index 65b08c4edb..7cbe9c310d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2266,11 +2266,18 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
 
 static bool postcopy_needs_preempt(RAMState *rs, PageSearchStatus *pss)
 {
+MigrationState *ms = migrate_get_current();
+
 /* Not enabled eager preempt?  Then never do that. */
 if (!migrate_postcopy_preempt()) {
 return false;
 }
 
+/* If the user explicitly disabled breaking of huge page, skip */
+if (!ms->postcopy_preempt_break_huge) {
+return false;
+}
+
 /* If the ramblock we're sending is a small page?  Never bother. */
 if (qemu_ram_pagesize(pss->block) == TARGET_PAGE_SIZE) {
 return false;
-- 
2.36.1




[PULL 16/29] migration: Add helpers to detect TLS capability

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

Add migrate_channel_requires_tls() to detect whether the specific channel
requires TLS, leveraging the recently introduced migrate_use_tls().  No
functional change intended.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Message-Id: <20220707185513.27421-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/channel.c   | 9 ++---
 migration/migration.c | 1 +
 migration/multifd.c   | 4 +---
 migration/tls.c   | 9 +
 migration/tls.h   | 4 
 5 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/migration/channel.c b/migration/channel.c
index 90087d8986..1b0815039f 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -38,9 +38,7 @@ void migration_channel_process_incoming(QIOChannel *ioc)
 trace_migration_set_incoming_channel(
 ioc, object_get_typename(OBJECT(ioc)));
 
-if (migrate_use_tls() &&
-!object_dynamic_cast(OBJECT(ioc),
- TYPE_QIO_CHANNEL_TLS)) {
+if (migrate_channel_requires_tls_upgrade(ioc)) {
 migration_tls_channel_process_incoming(s, ioc, &local_err);
 } else {
 migration_ioc_register_yank(ioc);
@@ -70,10 +68,7 @@ void migration_channel_connect(MigrationState *s,
 ioc, object_get_typename(OBJECT(ioc)), hostname, error);
 
 if (!error) {
-if (s->parameters.tls_creds &&
-*s->parameters.tls_creds &&
-!object_dynamic_cast(OBJECT(ioc),
- TYPE_QIO_CHANNEL_TLS)) {
+if (migrate_channel_requires_tls_upgrade(ioc)) {
 migration_tls_channel_connect(s, ioc, hostname, &error);
 
 if (!error) {
diff --git a/migration/migration.c b/migration/migration.c
index 864164ad96..cc41787079 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -48,6 +48,7 @@
 #include "trace.h"
 #include "exec/target_page.h"
 #include "io/channel-buffer.h"
+#include "io/channel-tls.h"
 #include "migration/colo.h"
 #include "hw/boards.h"
 #include "hw/qdev-properties.h"
diff --git a/migration/multifd.c b/migration/multifd.c
index 684c014c86..1e49594b02 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -831,9 +831,7 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
 migrate_get_current()->hostname, error);
 
 if (!error) {
-if (migrate_use_tls() &&
-!object_dynamic_cast(OBJECT(ioc),
- TYPE_QIO_CHANNEL_TLS)) {
+if (migrate_channel_requires_tls_upgrade(ioc)) {
 multifd_tls_channel_connect(p, ioc, &error);
 if (!error) {
 /*
diff --git a/migration/tls.c b/migration/tls.c
index 32c384a8b6..73e8c9d3c2 100644
--- a/migration/tls.c
+++ b/migration/tls.c
@@ -166,3 +166,12 @@ void migration_tls_channel_connect(MigrationState *s,
   NULL,
   NULL);
 }
+
+bool migrate_channel_requires_tls_upgrade(QIOChannel *ioc)
+{
+if (!migrate_use_tls()) {
+return false;
+}
+
+return !object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_TLS);
+}
diff --git a/migration/tls.h b/migration/tls.h
index de4fe2cafd..98e23c9b0e 100644
--- a/migration/tls.h
+++ b/migration/tls.h
@@ -37,4 +37,8 @@ void migration_tls_channel_connect(MigrationState *s,
QIOChannel *ioc,
const char *hostname,
Error **errp);
+
+/* Whether the QIO channel requires further TLS handshake? */
+bool migrate_channel_requires_tls_upgrade(QIOChannel *ioc);
+
 #endif
-- 
2.36.1




[PULL 18/29] migration: Enable TLS for preempt channel

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

This patch is based on the async preempt channel creation.  It continues
by wiring up the new channel with a TLS handshake to the destination when enabled.

Note that only the src QEMU needs such operation; the dest QEMU does not
need any change for TLS support due to the fact that all channels are
established synchronously there, so all the TLS magic is already properly
handled by migration_tls_channel_process_incoming().

Reviewed-by: Daniel P. Berrange 
Signed-off-by: Peter Xu 
Message-Id: <20220707185518.27529-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/postcopy-ram.c | 57 ++--
 migration/trace-events   |  1 +
 2 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 70b21e9d51..b9a37ef255 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -36,6 +36,7 @@
 #include "socket.h"
 #include "qemu-file.h"
 #include "yank_functions.h"
+#include "tls.h"
 
 /* Arbitrary limit on size of each discard command,
  * keeps them around ~200 bytes
@@ -1552,15 +1553,15 @@ bool postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file)
 return true;
 }
 
+/*
+ * Setup the postcopy preempt channel with the IOC.  If ERROR is specified,
+ * setup the error instead.  This helper will free the ERROR if specified.
+ */
 static void
-postcopy_preempt_send_channel_new(QIOTask *task, gpointer opaque)
+postcopy_preempt_send_channel_done(MigrationState *s,
+   QIOChannel *ioc, Error *local_err)
 {
-MigrationState *s = opaque;
-QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
-Error *local_err = NULL;
-
-if (qio_task_propagate_error(task, &local_err)) {
-/* Something wrong happened.. */
+if (local_err) {
 migrate_set_error(s, local_err);
 error_free(local_err);
 } else {
@@ -1574,7 +1575,47 @@ postcopy_preempt_send_channel_new(QIOTask *task, 
gpointer opaque)
  * postcopy_qemufile_src to know whether it failed or not.
  */
 qemu_sem_post(&s->postcopy_qemufile_src_sem);
-object_unref(OBJECT(ioc));
+}
+
+static void
+postcopy_preempt_tls_handshake(QIOTask *task, gpointer opaque)
+{
+g_autoptr(QIOChannel) ioc = QIO_CHANNEL(qio_task_get_source(task));
+MigrationState *s = opaque;
+Error *local_err = NULL;
+
+qio_task_propagate_error(task, &local_err);
+postcopy_preempt_send_channel_done(s, ioc, local_err);
+}
+
+static void
+postcopy_preempt_send_channel_new(QIOTask *task, gpointer opaque)
+{
+g_autoptr(QIOChannel) ioc = QIO_CHANNEL(qio_task_get_source(task));
+MigrationState *s = opaque;
+QIOChannelTLS *tioc;
+Error *local_err = NULL;
+
+if (qio_task_propagate_error(task, &local_err)) {
+goto out;
+}
+
+if (migrate_channel_requires_tls_upgrade(ioc)) {
+tioc = migration_tls_client_create(s, ioc, s->hostname, &local_err);
+if (!tioc) {
+goto out;
+}
+trace_postcopy_preempt_tls_handshake();
+qio_channel_set_name(QIO_CHANNEL(tioc), "migration-tls-preempt");
+qio_channel_tls_handshake(tioc, postcopy_preempt_tls_handshake,
+  s, NULL, NULL);
+/* Setup the channel until TLS handshake finished */
+return;
+}
+
+out:
+/* This handles both good and error cases */
+postcopy_preempt_send_channel_done(s, ioc, local_err);
 }
 
 /* Returns 0 if channel established, -1 for error. */
diff --git a/migration/trace-events b/migration/trace-events
index 0e385c3a07..a34afe7b85 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -287,6 +287,7 @@ postcopy_request_shared_page(const char *sharer, const char *rb, uint64_t rb_off
 postcopy_request_shared_page_present(const char *sharer, const char *rb, uint64_t rb_offset) "%s already %s offset 0x%"PRIx64
 postcopy_wake_shared(uint64_t client_addr, const char *rb) "at 0x%"PRIx64" in %s"
 postcopy_page_req_del(void *addr, int count) "resolved page req %p total %d"
+postcopy_preempt_tls_handshake(void) ""
 postcopy_preempt_new_channel(void) ""
 postcopy_preempt_thread_entry(void) ""
 postcopy_preempt_thread_exit(void) ""
-- 
2.36.1




[PULL 17/29] migration: Export tls-[creds|hostname|authz] params to cmdline too

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

It's useful to be able to specify the TLS credentials all on the cmdline
(along with the -object tls-creds-*), especially for debugging purposes.

The trick here is that we must remember not to free these fields again in the
finalize() function of the migration object, otherwise it'll cause a double-free.

The thing is that when destroying an object, we first destroy the properties
bound to the object, then the object itself.  To be explicit, when destroying
the object in object_finalize() we have this sequence of operations:

object_property_del_all(obj);
object_deinit(obj, ti);

So after this change the two fields are properly released in
object_property_del_all(), before we even reach the finalize() function;
hence we must not free them again in finalize(), or it's a double-free.

This also fixes a trivial memory leak for tls-authz as we forgot to free it
before this patch.

Reviewed-by: Daniel P. Berrange 
Signed-off-by: Peter Xu 
Message-Id: <20220707185515.27475-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/migration.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index cc41787079..7c7e529ca7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -4366,6 +4366,9 @@ static Property migration_properties[] = {
   DEFAULT_MIGRATE_ANNOUNCE_STEP),
 DEFINE_PROP_BOOL("x-postcopy-preempt-break-huge", MigrationState,
   postcopy_preempt_break_huge, true),
+DEFINE_PROP_STRING("tls-creds", MigrationState, parameters.tls_creds),
+DEFINE_PROP_STRING("tls-hostname", MigrationState, parameters.tls_hostname),
+DEFINE_PROP_STRING("tls-authz", MigrationState, parameters.tls_authz),
 
 /* Migration capabilities */
 DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
@@ -4403,12 +4406,9 @@ static void migration_class_init(ObjectClass *klass, 
void *data)
 static void migration_instance_finalize(Object *obj)
 {
 MigrationState *ms = MIGRATION_OBJ(obj);
-MigrationParameters *params = &ms->parameters;
 
 qemu_mutex_destroy(&ms->error_mutex);
 qemu_mutex_destroy(&ms->qemu_file_lock);
-g_free(params->tls_hostname);
-g_free(params->tls_creds);
 qemu_sem_destroy(&ms->wait_unplug_sem);
 qemu_sem_destroy(&ms->rate_limit_sem);
 qemu_sem_destroy(&ms->pause_sem);
-- 
2.36.1




[PULL 19/29] migration: Respect postcopy request order in preemption mode

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

With preemption mode on, when we see a postcopy request for exactly the
page that we have preempted before (so we've partially sent the page
already via the PRECOPY channel and it got preempted by another postcopy
request), we currently drop the request, so that after all the other
postcopy requests are serviced we go back to the precopy stream and
handle it there.

We dropped the request because we can't send it via the postcopy channel:
the precopy channel already contains part of the data, and we can only
send a huge page via one channel as a whole.  We can't split a huge page
across two channels.

That's a very rare corner case and it works, but it changes the order in
which postcopy requests are handled, since we're postponing this (unlucky)
postcopy request until after the other queued postcopy requests.  The
problem is that when the guest is very busy, the postcopy queue can stay
non-empty, which means this dropped request may never be handled until the
end of the postcopy migration.  So there's a chance that one dest QEMU
vcpu thread waits on a page fault for an extremely long time, just because
it unluckily accessed the specific page that was preempted before.

In the worst case it can wait as long as the whole postcopy migration
takes.  That's extremely unlikely to happen, but when it happens it's not
good.

The root cause of this problem is that we treat the pss->postcopy_requested
variable as carrying two meanings bound together:

  1. Whether this page request is urgent, and,
  2. Which channel we should use for this page request.

With the old code, when we set postcopy_requested it means either both (1)
and (2) are true, or both are false.  We can never have (1) and (2) take
different values.

However, it doesn't have to be like that.  It's perfectly legal that
there's one request that has (1) very high urgency, but for which (2) we'd
like to use the precopy channel.  That's exactly the corner case we were
discussing above.

To differentiate the two meanings better, introduce a new field called
postcopy_target_channel, showing which channel we should use for this page
request, so as to cover the old meaning (2) only.  The postcopy_requested
variable then stands only for meaning (1), the urgency of this page
request.

With this change, we can easily boost the priority of a preempted precopy
page as long as we know that the page is also requested as a postcopy
page.  So with the new approach, instead of dropping that request in
get_queued_page(), we send it right away via the precopy channel, and we
get back the ordering of the page faults just as they were requested on
dest.

Reported-by: Manish Mishra 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Manish Mishra 
Signed-off-by: Peter Xu 
Message-Id: <20220707185520.27583-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/ram.c | 65 +++--
 1 file changed, 52 insertions(+), 13 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 7cbe9c310d..4fbad74c6c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -442,8 +442,28 @@ struct PageSearchStatus {
 unsigned long page;
 /* Set once we wrap around */
 bool complete_round;
-/* Whether current page is explicitly requested by postcopy */
+/*
+ * [POSTCOPY-ONLY] Whether current page is explicitly requested by
+ * postcopy.  When set, the request is "urgent" because the dest QEMU
+ * threads are waiting for us.
+ */
 bool postcopy_requested;
+/*
+ * [POSTCOPY-ONLY] The target channel to use to send current page.
+ *
+ * Note: This may _not_ match with the value in postcopy_requested
+ * above. Let's imagine the case where the postcopy request is exactly
+ * the page that we're sending in progress during precopy. In this case
+ * we'll have postcopy_requested set to true but the target channel
+ * will be the precopy channel (so that we don't split brain on that
+ * specific page since the precopy channel already contains partial of
+ * that page data).
+ *
+ * Besides that specific use case, postcopy_target_channel should
+ * always be equal to postcopy_requested, because by default we send
+ * postcopy pages via postcopy preempt channel.
+ */
+bool postcopy_target_channel;
 };
 typedef struct PageSearchStatus PageSearchStatus;
 
@@ -495,6 +515,9 @@ static QemuCond decomp_done_cond;
 static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
  ram_addr_t offset, uint8_t *source_buf);
 
+static void postcopy_preempt_restore(RAMState *rs, PageSearchStatus *pss,
+ bool postcopy_requested);
+
 static void *do_data_compress(void *opaque)
 {
 CompressParam *param = opaque;
@@

[PATCH v9 08/11] i386/pc: factor out device_memory base/size to helper

2022-07-19 Thread Joao Martins
Move obtaining hole64_start from the device_memory memory region base/size
into a helper, alongside the corresponding getters in pc_memory_init(), for
when the hotplug range is uninitialized.  While doing that, remove the
memory-region-based logic from this newly added helper.

This is the final step that allows pc_pci_hole64_start() to be callable
at the beginning of pc_memory_init(), before any memory regions are
initialized.

Cc: Jonathan Cameron 
Signed-off-by: Joao Martins 
Acked-by: Igor Mammedov 
---
 hw/i386/pc.c | 46 +++---
 1 file changed, 31 insertions(+), 15 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index c654be6cf0bd..4ebc45773c29 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -825,15 +825,36 @@ static hwaddr pc_above_4g_end(PCMachineState *pcms)
 return x86ms->above_4g_mem_start + x86ms->above_4g_mem_size;
 }
 
-static uint64_t pc_get_cxl_range_start(PCMachineState *pcms)
+static void pc_get_device_memory_range(PCMachineState *pcms,
+   hwaddr *base,
+   ram_addr_t *device_mem_size)
 {
 PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
 MachineState *machine = MACHINE(pcms);
+ram_addr_t size;
+hwaddr addr;
+
+size = machine->maxram_size - machine->ram_size;
+addr = ROUND_UP(pc_above_4g_end(pcms), 1 * GiB);
+
+if (pcmc->enforce_aligned_dimm) {
+/* size device region assuming 1G page max alignment per slot */
+size += (1 * GiB) * machine->ram_slots;
+}
+
+*base = addr;
+*device_mem_size = size;
+}
+
+static uint64_t pc_get_cxl_range_start(PCMachineState *pcms)
+{
+PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
 hwaddr cxl_base;
+ram_addr_t size;
 
-if (pcmc->has_reserved_memory && machine->device_memory->base) {
-cxl_base = machine->device_memory->base
-+ memory_region_size(&machine->device_memory->mr);
+if (pcmc->has_reserved_memory) {
+pc_get_device_memory_range(pcms, &cxl_base, &size);
+cxl_base += size;
 } else {
 cxl_base = pc_above_4g_end(pcms);
 }
@@ -920,7 +941,7 @@ void pc_memory_init(PCMachineState *pcms,
 /* initialize device memory address space */
 if (pcmc->has_reserved_memory &&
 (machine->ram_size < machine->maxram_size)) {
-ram_addr_t device_mem_size = machine->maxram_size - machine->ram_size;
+ram_addr_t device_mem_size;
 
 if (machine->ram_slots > ACPI_MAX_RAM_SLOTS) {
 error_report("unsupported amount of memory slots: %"PRIu64,
@@ -935,13 +956,7 @@ void pc_memory_init(PCMachineState *pcms,
 exit(EXIT_FAILURE);
 }
 
-machine->device_memory->base =
-ROUND_UP(pc_above_4g_end(pcms), 1 * GiB);
-
-if (pcmc->enforce_aligned_dimm) {
-/* size device region assuming 1G page max alignment per slot */
-device_mem_size += (1 * GiB) * machine->ram_slots;
-}
+pc_get_device_memory_range(pcms, &machine->device_memory->base, &device_mem_size);
 
 if ((machine->device_memory->base + device_mem_size) <
 device_mem_size) {
@@ -1046,13 +1061,14 @@ uint64_t pc_pci_hole64_start(void)
 PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
 MachineState *ms = MACHINE(pcms);
 uint64_t hole64_start = 0;
+ram_addr_t size = 0;
 
 if (pcms->cxl_devices_state.is_enabled) {
 hole64_start = pc_get_cxl_range_end(pcms);
-} else if (pcmc->has_reserved_memory && ms->device_memory->base) {
-hole64_start = ms->device_memory->base;
+} else if (pcmc->has_reserved_memory && (ms->ram_size < ms->maxram_size)) {
+pc_get_device_memory_range(pcms, &hole64_start, &size);
 if (!pcmc->broken_reserved_end) {
-hole64_start += memory_region_size(&ms->device_memory->mr);
+hole64_start += size;
 }
 } else {
 hole64_start = pc_above_4g_end(pcms);
-- 
2.17.2




[PULL 10/29] migration: Add postcopy-preempt capability

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

Firstly, postcopy already preempts precopy due to the fact that we do
unqueue_page() first before looking into the dirty bits.

However, that's not enough: e.g., when host huge pages are enabled, a
postcopy request needs to wait until the whole huge page currently being
sent finishes.  That can introduce quite some delay; the bigger the huge
page, the larger the delay.

This patch adds a new capability to allow postcopy requests to preempt an
existing precopy page while a huge page is being sent, so that postcopy
requests can be serviced even faster.

Meanwhile, to send it even faster, bypass the precopy stream by providing a
standalone postcopy socket for sending requested pages.

Since the new behavior is not compatible with the old behavior, it will not
be the default; it's enabled only when the new capability is set on both
src/dst QEMUs.

This patch only adds the capability itself; the logic will be added in
follow-up patches.

Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Juan Quintela 
Signed-off-by: Peter Xu 
Message-Id: <20220707185342.26794-2-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/migration.c | 18 ++
 migration/migration.h |  1 +
 qapi/migration.json   |  7 ++-
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 78f5057373..ce7bb68cdc 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1297,6 +1297,13 @@ static bool migrate_caps_check(bool *cap_list,
 return false;
 }
 
+if (cap_list[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT]) {
+if (!cap_list[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
+error_setg(errp, "Postcopy preempt requires postcopy-ram");
+return false;
+}
+}
+
 return true;
 }
 
@@ -2663,6 +2670,15 @@ bool migrate_background_snapshot(void)
 return s->enabled_capabilities[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT];
 }
 
+bool migrate_postcopy_preempt(void)
+{
+MigrationState *s;
+
+s = migrate_get_current();
+
+return s->enabled_capabilities[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT];
+}
+
 /* migration thread support */
 /*
  * Something bad happened to the RP stream, mark an error
@@ -4274,6 +4290,8 @@ static Property migration_properties[] = {
 DEFINE_PROP_MIG_CAP("x-compress", MIGRATION_CAPABILITY_COMPRESS),
 DEFINE_PROP_MIG_CAP("x-events", MIGRATION_CAPABILITY_EVENTS),
 DEFINE_PROP_MIG_CAP("x-postcopy-ram", MIGRATION_CAPABILITY_POSTCOPY_RAM),
+DEFINE_PROP_MIG_CAP("x-postcopy-preempt",
+MIGRATION_CAPABILITY_POSTCOPY_PREEMPT),
 DEFINE_PROP_MIG_CAP("x-colo", MIGRATION_CAPABILITY_X_COLO),
 DEFINE_PROP_MIG_CAP("x-release-ram", MIGRATION_CAPABILITY_RELEASE_RAM),
 DEFINE_PROP_MIG_CAP("x-block", MIGRATION_CAPABILITY_BLOCK),
diff --git a/migration/migration.h b/migration/migration.h
index 485d58b95f..d2269c826c 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -400,6 +400,7 @@ int migrate_decompress_threads(void);
 bool migrate_use_events(void);
 bool migrate_postcopy_blocktime(void);
 bool migrate_background_snapshot(void);
+bool migrate_postcopy_preempt(void);
 
 /* Sending on the return path - generic and then for each message type */
 void migrate_send_rp_shut(MigrationIncomingState *mis,
diff --git a/qapi/migration.json b/qapi/migration.json
index e552ee4f43..7586df3dea 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -467,6 +467,11 @@
 #  Requires that QEMU be permitted to use locked memory
 #  for guest RAM pages.
 #  (since 7.1)
+# @postcopy-preempt: If enabled, the migration process will allow postcopy
+#requests to preempt precopy stream, so postcopy requests
+#will be handled faster.  This is a performance feature and
+#should not affect the correctness of postcopy migration.
+#(since 7.1)
 #
 # Features:
 # @unstable: Members @x-colo and @x-ignore-shared are experimental.
@@ -482,7 +487,7 @@
'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
{ 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
'validate-uuid', 'background-snapshot',
-   'zero-copy-send'] }
+   'zero-copy-send', 'postcopy-preempt'] }
 
 ##
 # @MigrationCapabilityStatus:
-- 
2.36.1




[PULL 12/29] migration: Postcopy preemption enablement

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

This patch enables postcopy-preempt feature.

It contains two major changes to the migration logic:

(1) Postcopy requests are now sent via a different socket from the precopy
background migration stream, so as to be isolated from very high page
request delays.

(2) For huge page enabled hosts: when there are postcopy requests, they can now
intercept the partial sending of huge host pages on the src QEMU.

After this patch, we'll live migrate a VM with two channels for postcopy: (1)
PRECOPY channel, which is the default channel that transfers background pages;
and (2) POSTCOPY channel, which only transfers requested pages.

There's no strict rule on which channel to use; e.g., if a requested page is
already being transferred on the precopy channel, then we keep using that same
precopy channel to transfer the page even though it was explicitly requested.
In 99% of cases, though, we prioritize the channels so that requested pages are
sent via the postcopy channel whenever possible.

On the source QEMU, when we find a postcopy request, we interrupt the
PRECOPY channel's sending process and quickly switch to the POSTCOPY channel.
After we have serviced all the high-priority postcopy pages, we switch back to
the PRECOPY channel and continue sending the interrupted huge page.
There's no new thread introduced on src QEMU.

On the destination QEMU, one new thread is introduced to receive page data from
the postcopy specific socket (done in the preparation patch).

This patch has a side effect: previously, after sending postcopy pages, we
assumed the guest would access the following pages, so we kept sending from
there.  Now, instead of continuing from a postcopy-requested page, we go back
and resume sending the precopy huge page (which may have been intercepted by a
postcopy request, so it can have been sent only partially before).

Whether that's a problem is debatable, because "assuming the guest will
continue to access the next page" may not really suit when huge pages are
used, especially large ones (e.g. 1GB pages).  The locality hint is largely
meaningless in that case.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Message-Id: <20220707185504.27203-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/migration.c  |   2 +
 migration/migration.h  |   2 +-
 migration/ram.c| 251 +++--
 migration/trace-events |   7 ++
 4 files changed, 253 insertions(+), 9 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index c965cae1d4..c5f0fdf8f8 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3190,6 +3190,8 @@ static int postcopy_start(MigrationState *ms)
   MIGRATION_STATUS_FAILED);
 }
 
+trace_postcopy_preempt_enabled(migrate_postcopy_preempt());
+
 return ret;
 
 fail_closefb:
diff --git a/migration/migration.h b/migration/migration.h
index 941c61e543..ff714c235f 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -68,7 +68,7 @@ typedef struct {
 struct MigrationIncomingState {
 QEMUFile *from_src_file;
 /* Previously received RAM's RAMBlock pointer */
-RAMBlock *last_recv_block;
+RAMBlock *last_recv_block[RAM_CHANNEL_MAX];
 /* A hook to allow cleanup at the end of incoming migration */
 void *transport_data;
 void (*transport_cleanup)(void *data);
diff --git a/migration/ram.c b/migration/ram.c
index e4364c0bff..65b08c4edb 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -296,6 +296,20 @@ struct RAMSrcPageRequest {
 QSIMPLEQ_ENTRY(RAMSrcPageRequest) next_req;
 };
 
+typedef struct {
+/*
+ * Cached ramblock/offset values if preempted.  They're only meaningful if
+ * preempted==true below.
+ */
+RAMBlock *ram_block;
+unsigned long ram_page;
+/*
+ * Whether a postcopy preemption just happened.  Will be reset after
+ * precopy recovered to background migration.
+ */
+bool preempted;
+} PostcopyPreemptState;
+
 /* State of RAM for migration */
 struct RAMState {
 /* QEMUFile used for this migration */
@@ -350,6 +364,14 @@ struct RAMState {
 /* Queue of outstanding page requests from the destination */
 QemuMutex src_page_req_mutex;
 QSIMPLEQ_HEAD(, RAMSrcPageRequest) src_page_requests;
+
+/* Postcopy preemption informations */
+PostcopyPreemptState postcopy_preempt_state;
+/*
+ * Current channel we're using on src VM.  Only valid if postcopy-preempt
+ * is enabled.
+ */
+unsigned int postcopy_channel;
 };
 typedef struct RAMState RAMState;
 
@@ -357,6 +379,11 @@ static RAMState *ram_state;
 
 static NotifierWithReturnList precopy_notifier_list;
 
+static void postcopy_preempt_reset(RAMState *rs)
+{
+memset(&rs->postcopy_preempt_state, 0, sizeof(PostcopyPreemptState));
+}
+
 /* Whether postcopy has queued requests? */
 static bool postcopy_has_request(RAMState *rs)
 {
@@ -1947,

[PATCH v9 09/11] i386/pc: bounds check phys-bits against max used GPA

2022-07-19 Thread Joao Martins
Calculate the max *used* GPA against the CPU's maximum possible address
and error out if the former surpasses the latter. This ensures the
max used GPA is reachable by the configured phys-bits. Default phys-bits
on QEMU is TCG_PHYS_ADDR_BITS (40), which is enough for the CPU to
address 1Tb (0xff_ffff_ffff), or 1010G (0xfc_ffff_ffff) in AMD hosts
with IOMMU.

This is preparation for AMD guests with >1010G, where we will want to
relocate ram-above-4g to start after 1Tb instead of 4G.

Signed-off-by: Joao Martins 
Acked-by: Igor Mammedov 
---
 hw/i386/pc.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 4ebc45773c29..1e7bd549bfe9 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -879,6 +879,18 @@ static uint64_t pc_get_cxl_range_end(PCMachineState *pcms)
 return start;
 }
 
+static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t pci_hole64_size)
+{
+X86CPU *cpu = X86_CPU(first_cpu);
+
+/* 32-bit systems don't have hole64 thus return max CPU address */
+if (cpu->phys_bits <= 32) {
+return ((hwaddr)1 << cpu->phys_bits) - 1;
+}
+
+return pc_pci_hole64_start() + pci_hole64_size - 1;
+}
+
 void pc_memory_init(PCMachineState *pcms,
 MemoryRegion *system_memory,
 MemoryRegion *rom_memory,
@@ -893,13 +905,28 @@ void pc_memory_init(PCMachineState *pcms,
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
 X86MachineState *x86ms = X86_MACHINE(pcms);
+hwaddr maxphysaddr, maxusedaddr;
 hwaddr cxl_base, cxl_resv_end = 0;
+X86CPU *cpu = X86_CPU(first_cpu);
 
 assert(machine->ram_size == x86ms->below_4g_mem_size +
 x86ms->above_4g_mem_size);
 
 linux_boot = (machine->kernel_filename != NULL);
 
+/*
+ * phys-bits is required to be appropriately configured
+ * to make sure max used GPA is reachable.
+ */
+maxusedaddr = pc_max_used_gpa(pcms, pci_hole64_size);
+maxphysaddr = ((hwaddr)1 << cpu->phys_bits) - 1;
+if (maxphysaddr < maxusedaddr) {
+error_report("Address space limit 0x%"PRIx64" < 0x%"PRIx64
+ " phys-bits too low (%u)",
+ maxphysaddr, maxusedaddr, cpu->phys_bits);
+exit(EXIT_FAILURE);
+}
+
 /*
  * Split single memory region and use aliases to address portions of it,
  * done for backwards compatibility with older qemus.
-- 
2.17.2




[PULL 02/29] cpus: Introduce cpu_list_generation_id

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Hyman Huang(黄勇) 

Introduce cpu_list_generation_id to track cpu list generation so
that cpu hotplug/unplug can be detected during measurement of
dirty page rate.

cpu_list_generation_id can be used to detect changes to the cpu
list, in preparation for dirty page rate measurement.

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
Message-Id: 
<06e1f1362b2501a471dce796abb065b04f320fa5.1656177590.git.huang...@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert 
---
 cpus-common.c | 8 
 include/exec/cpu-common.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/cpus-common.c b/cpus-common.c
index db459b41ce..793364dc0e 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -73,6 +73,12 @@ static int cpu_get_free_index(void)
 }
 
 CPUTailQ cpus = QTAILQ_HEAD_INITIALIZER(cpus);
+static unsigned int cpu_list_generation_id;
+
+unsigned int cpu_list_generation_id_get(void)
+{
+return cpu_list_generation_id;
+}
 
 void cpu_list_add(CPUState *cpu)
 {
@@ -84,6 +90,7 @@ void cpu_list_add(CPUState *cpu)
 assert(!cpu_index_auto_assigned);
 }
 QTAILQ_INSERT_TAIL_RCU(&cpus, cpu, node);
+cpu_list_generation_id++;
 }
 
 void cpu_list_remove(CPUState *cpu)
@@ -96,6 +103,7 @@ void cpu_list_remove(CPUState *cpu)
 
 QTAILQ_REMOVE_RCU(&cpus, cpu, node);
 cpu->cpu_index = UNASSIGNED_CPU_INDEX;
+cpu_list_generation_id++;
 }
 
 CPUState *qemu_get_cpu(int index)
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 5968551a05..2281be4e10 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -35,6 +35,7 @@ extern intptr_t qemu_host_page_mask;
 void qemu_init_cpu_list(void);
 void cpu_list_lock(void);
 void cpu_list_unlock(void);
+unsigned int cpu_list_generation_id_get(void);
 
 void tcg_flush_softmmu_tlb(CPUState *cs);
 
-- 
2.36.1




[PULL 13/29] migration: Postcopy recover with preempt enabled

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

To allow postcopy recovery, the ram fast load (preempt-only) dest QEMU thread
needs similar fault-tolerance handling.  When ram_load_postcopy() fails,
instead of stopping, the thread halts on a semaphore, ready to be kicked
again when recovery is detected.

A mutex is introduced to make sure there's no concurrent operation on the
socket.  To keep it simple, the fast ram load thread takes the mutex for its
whole procedure and only releases it when paused.  On network failures during
postcopy, the main loading thread can then safely release the fast-path
socket with that mutex held.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
Message-Id: <20220707185506.27257-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/migration.c| 27 +++
 migration/migration.h| 19 +++
 migration/postcopy-ram.c | 25 +++--
 migration/qemu-file.c| 27 +++
 migration/qemu-file.h|  1 +
 migration/savevm.c   | 26 --
 migration/trace-events   |  2 ++
 7 files changed, 119 insertions(+), 8 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index c5f0fdf8f8..3119bd2e4b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -215,9 +215,11 @@ void migration_object_init(void)
 current_incoming->postcopy_remote_fds =
 g_array_new(FALSE, TRUE, sizeof(struct PostCopyFD));
 qemu_mutex_init(¤t_incoming->rp_mutex);
+qemu_mutex_init(¤t_incoming->postcopy_prio_thread_mutex);
 qemu_event_init(¤t_incoming->main_thread_load_event, false);
 qemu_sem_init(¤t_incoming->postcopy_pause_sem_dst, 0);
 qemu_sem_init(¤t_incoming->postcopy_pause_sem_fault, 0);
+qemu_sem_init(¤t_incoming->postcopy_pause_sem_fast_load, 0);
 qemu_mutex_init(¤t_incoming->page_request_mutex);
 current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
 
@@ -697,9 +699,9 @@ static bool postcopy_try_recover(void)
 
 /*
  * Here, we only wake up the main loading thread (while the
- * fault thread will still be waiting), so that we can receive
+ * rest threads will still be waiting), so that we can receive
  * commands from source now, and answer it if needed. The
- * fault thread will be woken up afterwards until we are sure
+ * rest threads will be woken up afterwards until we are sure
  * that source is ready to reply to page requests.
  */
 qemu_sem_post(&mis->postcopy_pause_sem_dst);
@@ -3503,6 +3505,18 @@ static MigThrError postcopy_pause(MigrationState *s)
 qemu_file_shutdown(file);
 qemu_fclose(file);
 
+/*
+ * Do the same to postcopy fast path socket too if there is.  No
+ * locking needed because no racer as long as we do this before setting
+ * status to paused.
+ */
+if (s->postcopy_qemufile_src) {
+migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
+qemu_file_shutdown(s->postcopy_qemufile_src);
+qemu_fclose(s->postcopy_qemufile_src);
+s->postcopy_qemufile_src = NULL;
+}
+
 migrate_set_state(&s->state, s->state,
   MIGRATION_STATUS_POSTCOPY_PAUSED);
 
@@ -3558,8 +3572,13 @@ static MigThrError migration_detect_error(MigrationState 
*s)
 return MIG_THR_ERR_FATAL;
 }
 
-/* Try to detect any file errors */
-ret = qemu_file_get_error_obj(s->to_dst_file, &local_error);
+/*
+ * Try to detect any file errors.  Note that postcopy_qemufile_src will
+ * be NULL when postcopy preempt is not enabled.
+ */
+ret = qemu_file_get_error_obj_any(s->to_dst_file,
+  s->postcopy_qemufile_src,
+  &local_error);
 if (!ret) {
 /* Everything is fine */
 assert(!local_error);
diff --git a/migration/migration.h b/migration/migration.h
index ff714c235f..9220cec6bd 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -118,6 +118,18 @@ struct MigrationIncomingState {
 /* Postcopy priority thread is used to receive postcopy requested pages */
 QemuThread postcopy_prio_thread;
 bool postcopy_prio_thread_created;
+/*
+ * Used to sync between the ram load main thread and the fast ram load
+ * thread.  It protects postcopy_qemufile_dst, which is the postcopy
+ * fast channel.
+ *
+ * The ram fast load thread will take it mostly for the whole lifecycle
+ * because it needs to continuously read data from the channel, and
+ * it'll only release this mutex if postcopy is interrupted, so that
+ * the ram load main thread will take this mutex over and properly
+ * release the broken channel.
+ */
+QemuMutex postcopy_prio_thread_mutex;
 /*
  * An array of t

[PULL 00/29] migration queue

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

The following changes since commit da7da9d5e608200ecc0749ff37be246e9cd3314f:

  Merge tag 'pull-request-2022-07-19' of https://gitlab.com/thuth/qemu into 
staging (2022-07-19 13:05:06 +0100)

are available in the Git repository at:

  https://gitlab.com/dagrh/qemu.git tags/pull-migration-20220719c

for you to fetch changes up to ec0345c1000b3a57b557da4c2e3f2114dd23903a:

  migration: Avoid false-positive on non-supported scenarios for zero-copy-send 
(2022-07-19 17:33:22 +0100)


Migration pull 2022-07-19

  Hyman's dirty page rate limit set
  Ilya's fix for zlib vs migration
  Peter's postcopy-preempt
  Cleanup from Dan
  zero-copy tidy ups from Leo
  multifd doc fix from Juan

Signed-off-by: Dr. David Alan Gilbert 


Daniel P. Berrangé (1):
  migration: remove unreachable code after reading data

Hyman Huang (8):
  accel/kvm/kvm-all: Refactor per-vcpu dirty ring reaping
  cpus: Introduce cpu_list_generation_id
  migration/dirtyrate: Refactor dirty page rate calculation
  softmmu/dirtylimit: Implement vCPU dirtyrate calculation periodically
  accel/kvm/kvm-all: Introduce kvm_dirty_ring_size function
  softmmu/dirtylimit: Implement virtual CPU throttle
  softmmu/dirtylimit: Implement dirty page rate limit
  tests: Add dirty page rate limit test

Ilya Leoshkevich (1):
  multifd: Copy pages before compressing them with zlib

Juan Quintela (1):
  multifd: Document the locking of MultiFD{Send/Recv}Params

Leonardo Bras (4):
  QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent
  Add dirty-sync-missed-zero-copy migration stat
  migration/multifd: Report to user when zerocopy not working
  migration: Avoid false-positive on non-supported scenarios for 
zero-copy-send

Peter Xu (14):
  migration: Add postcopy-preempt capability
  migration: Postcopy preemption preparation on channel creation
  migration: Postcopy preemption enablement
  migration: Postcopy recover with preempt enabled
  migration: Create the postcopy preempt channel asynchronously
  migration: Add property x-postcopy-preempt-break-huge
  migration: Add helpers to detect TLS capability
  migration: Export tls-[creds|hostname|authz] params to cmdline too
  migration: Enable TLS for preempt channel
  migration: Respect postcopy request order in preemption mode
  tests: Move MigrateCommon upper
  tests: Add postcopy tls migration test
  tests: Add postcopy tls recovery migration test
  tests: Add postcopy preempt tests

 accel/kvm/kvm-all.c |  46 ++-
 accel/stubs/kvm-stub.c  |   5 +
 cpus-common.c   |   8 +
 hmp-commands-info.hx|  13 +
 hmp-commands.hx |  32 +++
 include/exec/cpu-common.h   |   1 +
 include/exec/memory.h   |   5 +-
 include/hw/core/cpu.h   |   6 +
 include/monitor/hmp.h   |   3 +
 include/sysemu/dirtylimit.h |  37 +++
 include/sysemu/dirtyrate.h  |  28 ++
 include/sysemu/kvm.h|   2 +
 io/channel-socket.c |   8 +-
 migration/channel.c |   9 +-
 migration/dirtyrate.c   | 227 +--
 migration/dirtyrate.h   |   7 +-
 migration/migration.c   | 152 --
 migration/migration.h   |  44 ++-
 migration/multifd-zlib.c|  38 ++-
 migration/multifd.c |   6 +-
 migration/multifd.h |  66 +++--
 migration/postcopy-ram.c| 186 -
 migration/postcopy-ram.h|  11 +
 migration/qemu-file.c   |  31 ++-
 migration/qemu-file.h   |   1 +
 migration/ram.c | 331 --
 migration/ram.h |   6 +-
 migration/savevm.c  |  46 ++-
 migration/socket.c  |  22 +-
 migration/socket.h  |   1 +
 migration/tls.c |   9 +
 migration/tls.h |   4 +
 migration/trace-events  |  15 +-
 monitor/hmp-cmds.c  |   5 +
 qapi/migration.json |  94 ++-
 softmmu/dirtylimit.c| 601 
 softmmu/meson.build |   1 +
 softmmu/trace-events|   7 +
 tests/qtest/migration-helpers.c |  22 ++
 tests/qtest/migration-helpers.h |   2 +
 tests/qtest/migration-test.c| 539 +--
 tests/qtest/qmp-cmd-test.c  |   2 +
 42 files changed, 2394 insertions(+), 285 deletions(-)
 create mode 100644 include/sysemu/dirtylimit.h
 create mode 100644 include/sysemu/dirtyrate.h
 create mode 100644 softmmu/dirtylimit.c




[PATCH v9 11/11] i386/pc: restrict AMD only enforcing of 1Tb hole to new machine type

2022-07-19 Thread Joao Martins
The added enforcement is only relevant in the case of AMD, where the
range right before 1TB is restricted and cannot be DMA mapped
by the kernel, consequently leading to IOMMU INVALID_DEVICE_REQUEST
or possibly other kinds of IOMMU events in the AMD IOMMU.

There is, however, a case where it may make sense to disable the
IOVA relocation/validation: when migrating from a
non-amd-1tb-aware QEMU to one that supports it.

Relocating RAM regions to after the 1Tb hole has consequences for
guest ABI because we are changing the memory mapping, so make
sure that only new machine types enforce it, not older ones.

Signed-off-by: Joao Martins 
Acked-by: Dr. David Alan Gilbert 
Acked-by: Igor Mammedov 
---
 hw/i386/pc.c | 6 --
 hw/i386/pc_piix.c| 2 ++
 hw/i386/pc_q35.c | 2 ++
 include/hw/i386/pc.h | 1 +
 4 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index fc2c7655afa0..4518f3c54680 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -951,9 +951,10 @@ void pc_memory_init(PCMachineState *pcms,
 /*
  * The HyperTransport range close to the 1T boundary is unique to AMD
  * hosts with IOMMUs enabled. Restrict the ram-above-4g relocation
- * to above 1T to AMD vCPUs only.
+ * to above 1T to AMD vCPUs only. @enforce_amd_1tb_hole is only false in
+ * older machine types (<= 7.0) for compatibility purposes.
  */
-if (IS_AMD_CPU(&cpu->env)) {
+if (IS_AMD_CPU(&cpu->env) && pcmc->enforce_amd_1tb_hole) {
 /* Bail out if max possible address does not cross HT range */
 if (pc_max_used_gpa(pcms, pci_hole64_size) >= AMD_HT_START) {
 x86ms->above_4g_mem_start = AMD_ABOVE_1TB_START;
@@ -1902,6 +1903,7 @@ static void pc_machine_class_init(ObjectClass *oc, void 
*data)
 pcmc->has_reserved_memory = true;
 pcmc->kvmclock_enabled = true;
 pcmc->enforce_aligned_dimm = true;
+pcmc->enforce_amd_1tb_hole = true;
 /* BIOS ACPI tables: 128K. Other BIOS datastructures: less than 4K reported
  * to be used at the moment, 32K should be enough for a while.  */
pcmc->acpi_data_size = 0x20000 + 0x8000;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 2a483e8666b4..074571bc03a8 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -446,9 +446,11 @@ DEFINE_I440FX_MACHINE(v7_1, "pc-i440fx-7.1", NULL,
 
 static void pc_i440fx_7_0_machine_options(MachineClass *m)
 {
+PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
 pc_i440fx_7_1_machine_options(m);
 m->alias = NULL;
 m->is_default = false;
+pcmc->enforce_amd_1tb_hole = false;
 compat_props_add(m->compat_props, hw_compat_7_0, hw_compat_7_0_len);
 compat_props_add(m->compat_props, pc_compat_7_0, pc_compat_7_0_len);
 }
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 99ed75371c67..f3aa4694a299 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -383,8 +383,10 @@ DEFINE_Q35_MACHINE(v7_1, "pc-q35-7.1", NULL,
 
 static void pc_q35_7_0_machine_options(MachineClass *m)
 {
+PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
 pc_q35_7_1_machine_options(m);
 m->alias = NULL;
+pcmc->enforce_amd_1tb_hole = false;
 compat_props_add(m->compat_props, hw_compat_7_0, hw_compat_7_0_len);
 compat_props_add(m->compat_props, pc_compat_7_0, pc_compat_7_0_len);
 }
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 568c226d3034..9cc3f5d33805 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -118,6 +118,7 @@ struct PCMachineClass {
 bool has_reserved_memory;
 bool enforce_aligned_dimm;
 bool broken_reserved_end;
+bool enforce_amd_1tb_hole;
 
 /* generate legacy CPU hotplug AML */
 bool legacy_cpu_hotplug;
-- 
2.17.2




[PULL 01/29] accel/kvm/kvm-all: Refactor per-vcpu dirty ring reaping

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Hyman Huang(黄勇) 

Add an optional 'CPUState' argument to kvm_dirty_ring_reap so
that it can cover the single-vcpu dirty-ring-reaping scenario.

Signed-off-by: Hyman Huang(黄勇) 
Reviewed-by: Peter Xu 
Message-Id: 

Signed-off-by: Dr. David Alan Gilbert 
---
 accel/kvm/kvm-all.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index ed8b6b896e..ce989a68ff 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -757,17 +757,20 @@ static uint32_t kvm_dirty_ring_reap_one(KVMState *s, 
CPUState *cpu)
 }
 
 /* Must be with slots_lock held */
-static uint64_t kvm_dirty_ring_reap_locked(KVMState *s)
+static uint64_t kvm_dirty_ring_reap_locked(KVMState *s, CPUState* cpu)
 {
 int ret;
-CPUState *cpu;
 uint64_t total = 0;
 int64_t stamp;
 
 stamp = get_clock();
 
-CPU_FOREACH(cpu) {
-total += kvm_dirty_ring_reap_one(s, cpu);
+if (cpu) {
+total = kvm_dirty_ring_reap_one(s, cpu);
+} else {
+CPU_FOREACH(cpu) {
+total += kvm_dirty_ring_reap_one(s, cpu);
+}
 }
 
 if (total) {
@@ -788,7 +791,7 @@ static uint64_t kvm_dirty_ring_reap_locked(KVMState *s)
  * Currently for simplicity, we must hold BQL before calling this.  We can
  * consider to drop the BQL if we're clear with all the race conditions.
  */
-static uint64_t kvm_dirty_ring_reap(KVMState *s)
+static uint64_t kvm_dirty_ring_reap(KVMState *s, CPUState *cpu)
 {
 uint64_t total;
 
@@ -808,7 +811,7 @@ static uint64_t kvm_dirty_ring_reap(KVMState *s)
  * reset below.
  */
 kvm_slots_lock();
-total = kvm_dirty_ring_reap_locked(s);
+total = kvm_dirty_ring_reap_locked(s, cpu);
 kvm_slots_unlock();
 
 return total;
@@ -855,7 +858,7 @@ static void kvm_dirty_ring_flush(void)
  * vcpus out in a synchronous way.
  */
 kvm_cpu_synchronize_kick_all();
-kvm_dirty_ring_reap(kvm_state);
+kvm_dirty_ring_reap(kvm_state, NULL);
 trace_kvm_dirty_ring_flush(1);
 }
 
@@ -1399,7 +1402,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
  * Not easy.  Let's cross the fingers until it's fixed.
  */
 if (kvm_state->kvm_dirty_ring_size) {
-kvm_dirty_ring_reap_locked(kvm_state);
+kvm_dirty_ring_reap_locked(kvm_state, NULL);
 } else {
 kvm_slot_get_dirty_log(kvm_state, mem);
 }
@@ -1471,7 +1474,7 @@ static void *kvm_dirty_ring_reaper_thread(void *data)
 r->reaper_state = KVM_DIRTY_RING_REAPER_REAPING;
 
 qemu_mutex_lock_iothread();
-kvm_dirty_ring_reap(s);
+kvm_dirty_ring_reap(s, NULL);
 qemu_mutex_unlock_iothread();
 
 r->reaper_iteration++;
@@ -2967,7 +2970,7 @@ int kvm_cpu_exec(CPUState *cpu)
  */
 trace_kvm_dirty_ring_full(cpu->cpu_index);
 qemu_mutex_lock_iothread();
-kvm_dirty_ring_reap(kvm_state);
+kvm_dirty_ring_reap(kvm_state, NULL);
 qemu_mutex_unlock_iothread();
 ret = 0;
 break;
-- 
2.36.1




[PULL 11/29] migration: Postcopy preemption preparation on channel creation

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Peter Xu 

Create a new socket so postcopy is prepared to send requested pages via this
specific channel, without getting blocked by precopy pages.

A new thread is also created on dest qemu to receive data from this new channel
based on the ram_load_postcopy() routine.

The ram_load_postcopy(POSTCOPY) branch and the thread have not started to
function yet; that'll be done in follow-up patches.

Clean up the new sockets on both src/dst QEMUs, and meanwhile look after the
new thread too to make sure it'll be recycled properly.

Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Juan Quintela 
Signed-off-by: Peter Xu 
Message-Id: <20220707185502.27149-1-pet...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
  dgilbert: With Peter's fix to quieten compiler warning on
   start_migration
---
 migration/migration.c| 63 +++
 migration/migration.h|  8 
 migration/postcopy-ram.c | 92 ++--
 migration/postcopy-ram.h | 10 +
 migration/ram.c  | 25 ---
 migration/ram.h  |  4 +-
 migration/savevm.c   | 20 -
 migration/socket.c   | 22 +-
 migration/socket.h   |  1 +
 migration/trace-events   |  5 ++-
 10 files changed, 219 insertions(+), 31 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index ce7bb68cdc..c965cae1d4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -321,6 +321,12 @@ void migration_incoming_state_destroy(void)
 mis->page_requested = NULL;
 }
 
+if (mis->postcopy_qemufile_dst) {
+migration_ioc_unregister_yank_from_file(mis->postcopy_qemufile_dst);
+qemu_fclose(mis->postcopy_qemufile_dst);
+mis->postcopy_qemufile_dst = NULL;
+}
+
 yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }
 
@@ -714,15 +720,21 @@ void migration_fd_process_incoming(QEMUFile *f, Error 
**errp)
 migration_incoming_process();
 }
 
+static bool migration_needs_multiple_sockets(void)
+{
+return migrate_use_multifd() || migrate_postcopy_preempt();
+}
+
 void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
 {
 MigrationIncomingState *mis = migration_incoming_get_current();
 Error *local_err = NULL;
 bool start_migration;
+QEMUFile *f;
 
 if (!mis->from_src_file) {
 /* The first connection (multifd may have multiple) */
-QEMUFile *f = qemu_file_new_input(ioc);
+f = qemu_file_new_input(ioc);
 
 if (!migration_incoming_setup(f, errp)) {
 return;
@@ -730,13 +742,19 @@ void migration_ioc_process_incoming(QIOChannel *ioc, 
Error **errp)
 
 /*
  * Common migration only needs one channel, so we can start
- * right now.  Multifd needs more than one channel, we wait.
+ * right now.  Some features need more than one channel, we wait.
  */
-start_migration = !migrate_use_multifd();
+start_migration = !migration_needs_multiple_sockets();
 } else {
 /* Multiple connections */
-assert(migrate_use_multifd());
-start_migration = multifd_recv_new_channel(ioc, &local_err);
+assert(migration_needs_multiple_sockets());
+if (migrate_use_multifd()) {
+start_migration = multifd_recv_new_channel(ioc, &local_err);
+} else {
+assert(migrate_postcopy_preempt());
+f = qemu_file_new_input(ioc);
+start_migration = postcopy_preempt_new_channel(mis, f);
+}
 if (local_err) {
 error_propagate(errp, local_err);
 return;
@@ -761,11 +779,20 @@ void migration_ioc_process_incoming(QIOChannel *ioc, 
Error **errp)
 bool migration_has_all_channels(void)
 {
 MigrationIncomingState *mis = migration_incoming_get_current();
-bool all_channels;
 
-all_channels = multifd_recv_all_channels_created();
+if (!mis->from_src_file) {
+return false;
+}
+
+if (migrate_use_multifd()) {
+return multifd_recv_all_channels_created();
+}
+
+if (migrate_postcopy_preempt()) {
+return mis->postcopy_qemufile_dst != NULL;
+}
 
-return all_channels && mis->from_src_file != NULL;
+return true;
 }
 
 /*
@@ -1874,6 +1901,12 @@ static void migrate_fd_cleanup(MigrationState *s)
 qemu_fclose(tmp);
 }
 
+if (s->postcopy_qemufile_src) {
+migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
+qemu_fclose(s->postcopy_qemufile_src);
+s->postcopy_qemufile_src = NULL;
+}
+
 assert(!migration_is_active(s));
 
 if (s->state == MIGRATION_STATUS_CANCELLING) {
@@ -3269,6 +3302,11 @@ static void migration_completion(MigrationState *s)
 qemu_savevm_state_complete_postcopy(s->to_dst_file);
 qemu_mutex_unlock_iothread();
 
+/* Shutdown the postcopy fast path thread */
+if (migrate_postcopy_preempt()) {
+postcopy_preempt_shutdown_file(s);
+}
+

[PATCH v9 10/11] i386/pc: relocate 4g start to 1T where applicable

2022-07-19 Thread Joao Martins
It is assumed that the whole GPA space is DMA addressable within a
given address space limit, except for a tiny region before 4G.
Since Linux v5.4, VFIO validates whether the selected GPA is
indeed valid, i.e. not reserved by the IOMMU on behalf of some
specific devices or platform-defined restrictions, failing the
ioctl(VFIO_DMA_MAP) with -EINVAL otherwise.

AMD systems with an IOMMU are examples of such platforms and
particularly may only have these ranges as allowed:

0000000000000000 - 00000000fedfffff (0      .. 3.982G)
00000000fef00000 - 000000fcffffffff (3.983G .. 1011.9G)
0000010000000000 - ffffffffffffffff (1Tb    .. 16Pb[*])

We already account for the 4G hole, but if the guest is big
enough we will fail to allocate a guest with >1010G due to the
~12G hole at the 1Tb boundary, reserved for HyperTransport (HT).

[*] there is another reserved region unrelated to HT that exists
at the 256T boundary in Fam 17h according to Errata #1286,
documented also in "Open-Source Register Reference for AMD Family
17h Processors (PUB)"

When creating the region above 4G, take into account that on AMD
platforms the HyperTransport range is reserved and hence cannot
be used as GPAs either. In those cases, rather than establishing
the start of ram-above-4g at 4G, relocate it instead to 1Tb. See
AMD IOMMU spec, section 2.1.2 "IOMMU Logical Topology", for more
information on the underlying restriction of IOVAs.

After accounting for the 1Tb hole on AMD hosts, mtree should
look like:

0000000000000000-000000007fffffff (prio 0, i/o):
 alias ram-below-4g @pc.ram 0000000000000000-000000007fffffff
0000010000000000-000001ff7fffffff (prio 0, i/o):
 alias ram-above-4g @pc.ram 0000000080000000-000000ffffffffff

If the relocation is done or the address space covers it, we
also add the reserved HT e820 range as reserved.

Default phys-bits on Qemu is TCG_PHYS_ADDR_BITS (40), which is enough
to address 1Tb (0xffffffffff). On AMD platforms, if a
ram-above-4g relocation is attempted and the CPU wasn't configured
with big enough phys-bits, an error message will be printed
due to the maxphysaddr vs maxusedaddr check previously added.

Suggested-by: Igor Mammedov 
Signed-off-by: Joao Martins 
Acked-by: Igor Mammedov 
---
 hw/i386/pc.c | 54 
 1 file changed, 54 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 1e7bd549bfe9..fc2c7655afa0 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -891,6 +891,40 @@ static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t pci_hole64_size)
 return pc_pci_hole64_start() + pci_hole64_size - 1;
 }
 
+/*
+ * AMD systems with an IOMMU have an additional hole close to the
+ * 1Tb, which contains special GPAs that cannot be DMA mapped. Depending
+ * on kernel version, VFIO may or may not let you DMA map those ranges.
+ * Starting with Linux v5.4 we validate it, and can't create guests on AMD
+ * machines with certain memory sizes. It is also wrong to use those IOVA
+ * ranges, as doing so can lead to IOMMU INVALID_DEVICE_REQUEST or worse.
+ * The ranges reserved for Hyper-Transport are:
+ *
+ * FD_0000_0000h - FF_FFFF_FFFFh
+ *
+ * The ranges represent the following:
+ *
+ * Base Address   Top Address  Use
+ *
+ * FD_0000_0000h FD_F7FF_FFFFh Reserved interrupt address space
+ * FD_F800_0000h FD_F8FF_FFFFh Interrupt/EOI IntCtl
+ * FD_F900_0000h FD_F90F_FFFFh Legacy PIC IACK
+ * FD_F910_0000h FD_F91F_FFFFh System Management
+ * FD_F920_0000h FD_FAFF_FFFFh Reserved Page Tables
+ * FD_FB00_0000h FD_FBFF_FFFFh Address Translation
+ * FD_FC00_0000h FD_FDFF_FFFFh I/O Space
+ * FD_FE00_0000h FD_FFFF_FFFFh Configuration
+ * FE_0000_0000h FE_1FFF_FFFFh Extended Configuration/Device Messages
+ * FE_2000_0000h FF_FFFF_FFFFh Reserved
+ *
+ * See AMD IOMMU spec, section 2.1.2 "IOMMU Logical Topology",
+ * Table 3: Special Address Controls (GPA) for more information.
+ */
+#define AMD_HT_START 0xfd00000000UL
+#define AMD_HT_END   0xffffffffffUL
+#define AMD_ABOVE_1TB_START  (AMD_HT_END + 1)
+#define AMD_HT_SIZE  (AMD_ABOVE_1TB_START - AMD_HT_START)
+
 void pc_memory_init(PCMachineState *pcms,
 MemoryRegion *system_memory,
 MemoryRegion *rom_memory,
@@ -914,6 +948,26 @@ void pc_memory_init(PCMachineState *pcms,
 
 linux_boot = (machine->kernel_filename != NULL);
 
+/*
+ * The HyperTransport range close to the 1T boundary is unique to AMD
+ * hosts with IOMMUs enabled. Restrict the ram-above-4g relocation
+ * to above 1T to AMD vCPUs only.
+ */
+if (IS_AMD_CPU(&cpu->env)) {
+/* Bail out if max possible address does not cross HT range */
+if (pc_max_used_gpa(pcms, pci_hole64_size) >= AMD_HT_START) {
+x86ms->above_4g_mem_start = AMD_ABOVE_1TB_START;
+}
+
+/*
+ * Advertise the HT region if address space covers the reserved
+ * region or if we relocate

[PATCH v9 00/11] i386/pc: Fix creation of >= 1010G guests on AMD systems with IOMMU

2022-07-19 Thread Joao Martins
v8[9] -> v9:

* Move wrongly placed hunk from patch 6 into patch 4 (error only in v8,
despite the end result being the same) (Igor Mammedov)
* Remove stray new line from patch 8 (Igor Mammedov)
* Add Acked-by in patches 5, 6, 8, 9, 10 (Igor Mammedov)
  (only patch 7 is missing acks/rb)

Note: This series builds on top of Jonathan Cameron's CXL cleanups
(https://lore.kernel.org/qemu-devel/20220701132300.2264-1-jonathan.came...@huawei.com/).

---

This series lets Qemu spawn i386 guests with >= 1010G with VFIO,
particularly when running on AMD systems with an IOMMU.

Since Linux v5.4, VFIO validates whether the IOVA in the DMA_MAP ioctl is valid
and will return -EINVAL in those cases. On x86, Intel hosts aren't particularly
affected by this extra validation. But AMD systems with IOMMU have a hole in
the 1TB boundary which is *reserved* for HyperTransport I/O addresses located
here: FD_0000_0000h - FF_FFFF_FFFFh. See IOMMU manual [1], specifically
section '2.1.2 IOMMU Logical Topology', Table 3 on what those addresses mean.

VFIO DMA_MAP calls in this IOVA address range fail this check and hence
return -EINVAL, consequently failing the creation of guests bigger than 1010G.
Example of the failure:

qemu-system-x86_64: -device vfio-pci,host=0000:41:10.1,bootindex=-1: VFIO_MAP_DMA: -22
qemu-system-x86_64: -device vfio-pci,host=0000:41:10.1,bootindex=-1: vfio 0000:41:10.1:
failed to setup container for group 258: memory listener initialization failed:
Region pc.ram: vfio_dma_map(0x55ba53e7a9d0, 0x100000000, 0xff30000000,
0x7ed243e00000) = -22 (Invalid argument)

Prior to v5.4 we could map to these IOVAs, *but* that's still not the right
thing to do and could trigger certain IOMMU events (e.g. INVALID_DEVICE_REQUEST),
or spurious guest VF failures from the resultant IOMMU target abort (see Errata
1155[2]), as documented in the links below.

This small series tries to address that by dealing with this AMD-specific 1Tb
hole; rather than treating it like the 4G hole, it instead relocates RAM above
4G to be above the 1T boundary if the maximum RAM range crosses the HT reserved
range. It is organized as follows:

patch 1: Introduce a @above_4g_mem_start which defaults to 4 GiB as starting
 address of the 4G boundary

patches 2-3: Move pci-host qdev creation to be before pc_memory_init(),
 to get access to pci_hole64_size. The actual pci-host
 initialization is kept as is; only the qdev_new is moved.

patch 4: Small deduplication cleanup that was spread around pc

patches 5-8: Make pc_pci_hole64_start() be callable before pc_memory_init()
 initializes any memory regions. This way, the returned value
 is consistent and we don't need to duplicate same said
 calculations when detecting the relocation is needed.

patch 9: Errors out if the phys-bits is too low compared to the max GPA
that gets calculated. This is preparation for the next patch, although it
is made generic given its applicability to any configuration.

patch 10: Change @above_4g_mem_start to 1TiB if we are on AMD and the max
possible address crosses the HT region.

patch 11: Ensure valid IOVAs only on new machine types, but not older
ones (<= v7.0.0)

The 'consequence' of this approach is that we may need more than the default
phys-bits, e.g. a guest with >1010G will have most of its RAM after the 1TB
address, consequently needing 41 phys-bits as opposed to the default of 40
(TCG_PHYS_ADDR_BITS). Today there's already a precedent to depend on the user to
pick the right value of phys-bits (regardless of this series), so we warn in
case phys-bits aren't enough. Finally, CMOS loses its meaning for the above-4G
RAM blocks, but it was mentioned over the RFC that CMOS is only useful for very
old SeaBIOS.

Additionally, the reserved region is added to E820 if the relocation is done
or if the phys-bits can cover it.

Alternative options considered (in RFC[0]):

a) Dealing with the 1T hole like the 4G hole -- which also represents what
hardware closely does.

Thanks,
Joao

Older Changelog,

v7[8] -> v8[9]:

* Restructure the relocate patch and separate the phys-bits check into being
a predecessor patch. New patch 9 (Igor Mammedov)
* Rework comment on phys-bits check to not mention relocation since it's
now generic. (Igor Mammedov)

Note: This series builds on top of Jonathan Cameron's CXL cleanups
(https://lore.kernel.org/qemu-devel/20220701132300.2264-1-jonathan.came...@huawei.com/).

v6[7] -> v7[8]:

* Rebased to latest staging
* Build on top of apply CXL cleanups (Igor Mammedov)
* Use qdev property rather than introducing new accessors to the i440fx pci-host
(Bernhard Beschow)
* Add Igor's Rb to patch 4 (Igor Mammedov)
* Replace pci_hole64_start() related helper functions rather than coexisting 
with MR variant
code in patches 4-8. This removes unneeded code that no longer needs to be tied 
to MR (Igor Mammedov)
* Replace MR with memory region in the whole series (I

[PATCH v9 06/11] i386/pc: factor out cxl range start to helper

2022-07-19 Thread Joao Martins
Factor out the calculation of the base address of the memory region.
It will be used later for the CXL range end counterpart calculation,
as well as in pc_memory_init() CXL memory region initialization, thus
avoiding duplication.

Cc: Jonathan Cameron 
Signed-off-by: Joao Martins 
Acked-by: Igor Mammedov 
---
 hw/i386/pc.c | 24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 3fc3e985086a..3fdcab4bb4f3 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -825,6 +825,22 @@ static hwaddr pc_above_4g_end(PCMachineState *pcms)
 return x86ms->above_4g_mem_start + x86ms->above_4g_mem_size;
 }
 
+static uint64_t pc_get_cxl_range_start(PCMachineState *pcms)
+{
+PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
+MachineState *machine = MACHINE(pcms);
+hwaddr cxl_base;
+
+if (pcmc->has_reserved_memory && machine->device_memory->base) {
+cxl_base = machine->device_memory->base
++ memory_region_size(&machine->device_memory->mr);
+} else {
+cxl_base = pc_above_4g_end(pcms);
+}
+
+return cxl_base;
+}
+
 static uint64_t pc_get_cxl_range_end(PCMachineState *pcms)
 {
 uint64_t start = 0;
@@ -946,13 +962,7 @@ void pc_memory_init(PCMachineState *pcms,
 MemoryRegion *mr = &pcms->cxl_devices_state.host_mr;
 hwaddr cxl_size = MiB;
 
-if (pcmc->has_reserved_memory && machine->device_memory->base) {
-cxl_base = machine->device_memory->base
-+ memory_region_size(&machine->device_memory->mr);
-} else {
-cxl_base = pc_above_4g_end(pcms);
-}
-
+cxl_base = pc_get_cxl_range_start(pcms);
 e820_add_entry(cxl_base, cxl_size, E820_RESERVED);
 memory_region_init(mr, OBJECT(machine), "cxl_host_reg", cxl_size);
 memory_region_add_subregion(system_memory, cxl_base, mr);
-- 
2.17.2




[PULL 09/29] multifd: Copy pages before compressing them with zlib

2022-07-19 Thread Dr. David Alan Gilbert (git)
From: Ilya Leoshkevich 

zlib_send_prepare() compresses pages of a running VM. zlib does not
make any thread-safety guarantees with respect to changing deflate()
input concurrently with deflate() [1].

One can observe problems due to this with the IBM zEnterprise Data
Compression accelerator capable zlib [2]. When the hardware
acceleration is enabled, migration/multifd/tcp/plain/zlib test fails
intermittently [3] due to sliding window corruption. The accelerator's
architecture explicitly discourages concurrent accesses [4]:

Page 26-57, "Other Conditions":

As observed by this CPU, other CPUs, and channel
programs, references to the parameter block, first,
second, and third operands may be multiple-access
references, accesses to these storage locations are
not necessarily block-concurrent, and the sequence
of these accesses or references is undefined.

Mark Adler pointed out that vanilla zlib performs double fetches under
certain circumstances as well [5], therefore we need to copy data
before passing it to deflate().

[1] https://zlib.net/manual.html
[2] https://github.com/madler/zlib/pull/410
[3] https://lists.nongnu.org/archive/html/qemu-devel/2022-03/msg03988.html
[4] http://publibfp.dhe.ibm.com/epubs/pdf/a227832c.pdf
[5] https://lists.gnu.org/archive/html/qemu-devel/2022-07/msg00889.html

Signed-off-by: Ilya Leoshkevich 
Message-Id: <20220705203559.2960949-1-...@linux.ibm.com>
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Dr. David Alan Gilbert 
---
 migration/multifd-zlib.c | 38 ++
 1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 3a7ae44485..18213a9513 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -27,6 +27,8 @@ struct zlib_data {
 uint8_t *zbuff;
 /* size of compressed buffer */
 uint32_t zbuff_len;
+/* uncompressed buffer of size qemu_target_page_size() */
+uint8_t *buf;
 };
 
 /* Multifd zlib compression */
@@ -45,26 +47,38 @@ static int zlib_send_setup(MultiFDSendParams *p, Error **errp)
 {
 struct zlib_data *z = g_new0(struct zlib_data, 1);
 z_stream *zs = &z->zs;
+const char *err_msg;
 
 zs->zalloc = Z_NULL;
 zs->zfree = Z_NULL;
 zs->opaque = Z_NULL;
 if (deflateInit(zs, migrate_multifd_zlib_level()) != Z_OK) {
-g_free(z);
-error_setg(errp, "multifd %u: deflate init failed", p->id);
-return -1;
+err_msg = "deflate init failed";
+goto err_free_z;
 }
 /* This is the maxium size of the compressed buffer */
 z->zbuff_len = compressBound(MULTIFD_PACKET_SIZE);
 z->zbuff = g_try_malloc(z->zbuff_len);
 if (!z->zbuff) {
-deflateEnd(&z->zs);
-g_free(z);
-error_setg(errp, "multifd %u: out of memory for zbuff", p->id);
-return -1;
+err_msg = "out of memory for zbuff";
+goto err_deflate_end;
+}
+z->buf = g_try_malloc(qemu_target_page_size());
+if (!z->buf) {
+err_msg = "out of memory for buf";
+goto err_free_zbuff;
 }
 p->data = z;
 return 0;
+
+err_free_zbuff:
+g_free(z->zbuff);
+err_deflate_end:
+deflateEnd(&z->zs);
+err_free_z:
+g_free(z);
+error_setg(errp, "multifd %u: %s", p->id, err_msg);
+return -1;
 }
 
 /**
@@ -82,6 +96,8 @@ static void zlib_send_cleanup(MultiFDSendParams *p, Error **errp)
 deflateEnd(&z->zs);
 g_free(z->zbuff);
 z->zbuff = NULL;
+g_free(z->buf);
+z->buf = NULL;
 g_free(p->data);
 p->data = NULL;
 }
@@ -114,8 +130,14 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
 flush = Z_SYNC_FLUSH;
 }
 
+/*
+ * Since the VM might be running, the page may be changing concurrently
+ * with compression. zlib does not guarantee that this is safe,
+ * therefore copy the page before calling deflate().
+ */
+memcpy(z->buf, p->pages->block->host + p->normal[i], page_size);
 zs->avail_in = page_size;
-zs->next_in = p->pages->block->host + p->normal[i];
+zs->next_in = z->buf;
 
 zs->avail_out = available;
 zs->next_out = z->zbuff + out_size;
-- 
2.36.1




[PATCH v9 03/11] i386/pc: pass pci_hole64_size to pc_memory_init()

2022-07-19 Thread Joao Martins
Use the pre-initialized pci-host qdev and fetch the
pci-hole64-size into pc_memory_init()'s newly added argument,
using the PCI_HOST_PROP_PCI_HOLE64_SIZE pci-host property.

This is in preparation to determine that host-phys-bits are
enough and for pci-hole64-size to be considered when relocating
ram-above-4g to 1T (on AMD platforms).

Signed-off-by: Joao Martins 
Reviewed-by: Igor Mammedov 
---
 hw/i386/pc.c |  3 ++-
 hw/i386/pc_piix.c|  7 ++-
 hw/i386/pc_q35.c | 10 +-
 include/hw/i386/pc.h |  3 ++-
 4 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 1660684d12fd..e952dc62a12e 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -817,7 +817,8 @@ void xen_load_linux(PCMachineState *pcms)
 void pc_memory_init(PCMachineState *pcms,
 MemoryRegion *system_memory,
 MemoryRegion *rom_memory,
-MemoryRegion **ram_memory)
+MemoryRegion **ram_memory,
+uint64_t pci_hole64_size)
 {
 int linux_boot, i;
 MemoryRegion *option_rom_mr;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 6186a1473755..2a483e8666b4 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -91,6 +91,7 @@ static void pc_init1(MachineState *machine,
 MemoryRegion *pci_memory;
 MemoryRegion *rom_memory;
 ram_addr_t lowmem;
+uint64_t hole64_size;
 DeviceState *i440fx_host;
 
 /*
@@ -166,10 +167,14 @@ static void pc_init1(MachineState *machine,
 memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
 rom_memory = pci_memory;
 i440fx_host = qdev_new(host_type);
+hole64_size = object_property_get_uint(OBJECT(i440fx_host),
+   PCI_HOST_PROP_PCI_HOLE64_SIZE,
+   &error_abort);
 } else {
 pci_memory = NULL;
 rom_memory = system_memory;
 i440fx_host = NULL;
+hole64_size = 0;
 }
 
 pc_guest_info_init(pcms);
@@ -186,7 +191,7 @@ static void pc_init1(MachineState *machine,
 /* allocate ram and load rom/bios */
 if (!xen_enabled()) {
 pc_memory_init(pcms, system_memory,
-   rom_memory, &ram_memory);
+   rom_memory, &ram_memory, hole64_size);
 } else {
 pc_system_flash_cleanup_unused(pcms);
 if (machine->kernel_filename != NULL) {
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 46ea89e564de..99ed75371c67 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -138,6 +138,7 @@ static void pc_q35_init(MachineState *machine)
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 bool acpi_pcihp;
 bool keep_pci_slot_hpc;
+uint64_t pci_hole64_size = 0;
 
 /* Check whether RAM fits below 4G (leaving 1/2 GByte for IO memory
  * and 256 Mbytes for PCI Express Enhanced Configuration Access Mapping
@@ -206,8 +207,15 @@ static void pc_q35_init(MachineState *machine)
 /* create pci host bus */
 q35_host = Q35_HOST_DEVICE(qdev_new(TYPE_Q35_HOST_DEVICE));
 
+if (pcmc->pci_enabled) {
+pci_hole64_size = object_property_get_uint(OBJECT(q35_host),
+                                                   PCI_HOST_PROP_PCI_HOLE64_SIZE,
+   &error_abort);
+}
+
 /* allocate ram and load rom/bios */
-pc_memory_init(pcms, get_system_memory(), rom_memory, &ram_memory);
+pc_memory_init(pcms, get_system_memory(), rom_memory, &ram_memory,
+   pci_hole64_size);
 
 object_property_add_child(qdev_get_machine(), "q35", OBJECT(q35_host));
 object_property_set_link(OBJECT(q35_host), MCH_HOST_PROP_RAM_MEM,
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index b7735dccfc81..568c226d3034 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -159,7 +159,8 @@ void xen_load_linux(PCMachineState *pcms);
 void pc_memory_init(PCMachineState *pcms,
 MemoryRegion *system_memory,
 MemoryRegion *rom_memory,
-MemoryRegion **ram_memory);
+MemoryRegion **ram_memory,
+uint64_t pci_hole64_size);
 uint64_t pc_pci_hole64_start(void);
 DeviceState *pc_vga_init(ISABus *isa_bus, PCIBus *pci_bus);
 void pc_basic_device_init(struct PCMachineState *pcms,
-- 
2.17.2




[PATCH v9 05/11] i386/pc: factor out cxl range end to helper

2022-07-19 Thread Joao Martins
Move the calculation of the CXL memory region end to a separate helper.

This is in preparation for a future change that removes the CXL range's
dependency on the CXL memory region, with the goal of allowing
pc_pci_hole64_start() to be called before any memory regions are
initialized.

Cc: Jonathan Cameron 
Signed-off-by: Joao Martins 
Acked-by: Igor Mammedov 
---
 hw/i386/pc.c | 31 +--
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 6c898a86cb89..3fc3e985086a 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -825,6 +825,25 @@ static hwaddr pc_above_4g_end(PCMachineState *pcms)
 return x86ms->above_4g_mem_start + x86ms->above_4g_mem_size;
 }
 
+static uint64_t pc_get_cxl_range_end(PCMachineState *pcms)
+{
+uint64_t start = 0;
+
+if (pcms->cxl_devices_state.host_mr.addr) {
+start = pcms->cxl_devices_state.host_mr.addr +
+memory_region_size(&pcms->cxl_devices_state.host_mr);
+if (pcms->cxl_devices_state.fixed_windows) {
+GList *it;
+for (it = pcms->cxl_devices_state.fixed_windows; it; it = it->next) {
+CXLFixedWindow *fw = it->data;
+start = fw->mr.addr + memory_region_size(&fw->mr);
+}
+}
+}
+
+return start;
+}
+
 void pc_memory_init(PCMachineState *pcms,
 MemoryRegion *system_memory,
 MemoryRegion *rom_memory,
@@ -1020,16 +1039,8 @@ uint64_t pc_pci_hole64_start(void)
 MachineState *ms = MACHINE(pcms);
 uint64_t hole64_start = 0;
 
-if (pcms->cxl_devices_state.host_mr.addr) {
-hole64_start = pcms->cxl_devices_state.host_mr.addr +
-memory_region_size(&pcms->cxl_devices_state.host_mr);
-if (pcms->cxl_devices_state.fixed_windows) {
-GList *it;
-for (it = pcms->cxl_devices_state.fixed_windows; it; it = it->next) {
-CXLFixedWindow *fw = it->data;
-hole64_start = fw->mr.addr + memory_region_size(&fw->mr);
-}
-}
+if (pcms->cxl_devices_state.is_enabled) {
+hole64_start = pc_get_cxl_range_end(pcms);
 } else if (pcmc->has_reserved_memory && ms->device_memory->base) {
 hole64_start = ms->device_memory->base;
 if (!pcmc->broken_reserved_end) {
-- 
2.17.2




[PATCH v9 01/11] hw/i386: add 4g boundary start to X86MachineState

2022-07-19 Thread Joao Martins
Rather than hardcoding the 4G boundary everywhere, introduce a
X86MachineState field @above_4g_mem_start and use it
accordingly.

This is in preparation for relocating ram-above-4g to
dynamically start at 1T on AMD platforms.

Signed-off-by: Joao Martins 
Reviewed-by: Igor Mammedov 
---
 hw/i386/acpi-build.c  |  2 +-
 hw/i386/pc.c  | 11 ++-
 hw/i386/sgx.c |  2 +-
 hw/i386/x86.c |  1 +
 include/hw/i386/x86.h |  3 +++
 5 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index cad6f5ac41e9..0355bd3ddaad 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2024,7 +2024,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
 build_srat_memory(table_data, mem_base, mem_len, i - 1,
   MEM_AFFINITY_ENABLED);
 }
-mem_base = 1ULL << 32;
+mem_base = x86ms->above_4g_mem_start;
 mem_len = next_base - x86ms->below_4g_mem_size;
 next_base = mem_base + mem_len;
 }
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 8d68295fdaff..1660684d12fd 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -850,9 +850,10 @@ void pc_memory_init(PCMachineState *pcms,
  machine->ram,
  x86ms->below_4g_mem_size,
  x86ms->above_4g_mem_size);
-memory_region_add_subregion(system_memory, 0x1ULL,
+memory_region_add_subregion(system_memory, x86ms->above_4g_mem_start,
 ram_above_4g);
-e820_add_entry(0x1ULL, x86ms->above_4g_mem_size, E820_RAM);
+e820_add_entry(x86ms->above_4g_mem_start, x86ms->above_4g_mem_size,
+   E820_RAM);
 }
 
 if (pcms->sgx_epc.size != 0) {
@@ -893,7 +894,7 @@ void pc_memory_init(PCMachineState *pcms,
machine->device_memory->base = sgx_epc_above_4g_end(&pcms->sgx_epc);
 } else {
 machine->device_memory->base =
-0x1ULL + x86ms->above_4g_mem_size;
+x86ms->above_4g_mem_start + x86ms->above_4g_mem_size;
 }
 
 machine->device_memory->base =
@@ -927,7 +928,7 @@ void pc_memory_init(PCMachineState *pcms,
 } else if (pcms->sgx_epc.size != 0) {
 cxl_base = sgx_epc_above_4g_end(&pcms->sgx_epc);
 } else {
-cxl_base = 0x1ULL + x86ms->above_4g_mem_size;
+cxl_base = x86ms->above_4g_mem_start + x86ms->above_4g_mem_size;
 }
 
 e820_add_entry(cxl_base, cxl_size, E820_RESERVED);
@@ -1035,7 +1036,7 @@ uint64_t pc_pci_hole64_start(void)
 } else if (pcms->sgx_epc.size != 0) {
 hole64_start = sgx_epc_above_4g_end(&pcms->sgx_epc);
 } else {
-hole64_start = 0x1ULL + x86ms->above_4g_mem_size;
+hole64_start = x86ms->above_4g_mem_start + x86ms->above_4g_mem_size;
 }
 
 return ROUND_UP(hole64_start, 1 * GiB);
diff --git a/hw/i386/sgx.c b/hw/i386/sgx.c
index a44d66ba2afc..09d9c7c73d9f 100644
--- a/hw/i386/sgx.c
+++ b/hw/i386/sgx.c
@@ -295,7 +295,7 @@ void pc_machine_init_sgx_epc(PCMachineState *pcms)
 return;
 }
 
-sgx_epc->base = 0x1ULL + x86ms->above_4g_mem_size;
+sgx_epc->base = x86ms->above_4g_mem_start + x86ms->above_4g_mem_size;
 
 memory_region_init(&sgx_epc->mr, OBJECT(pcms), "sgx-epc", UINT64_MAX);
 memory_region_add_subregion(get_system_memory(), sgx_epc->base,
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 6003b4b2dfea..029264c54fe2 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1373,6 +1373,7 @@ static void x86_machine_initfn(Object *obj)
 x86ms->oem_id = g_strndup(ACPI_BUILD_APPNAME6, 6);
 x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
 x86ms->bus_lock_ratelimit = 0;
+x86ms->above_4g_mem_start = 4 * GiB;
 }
 
 static void x86_machine_class_init(ObjectClass *oc, void *data)
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index 9089bdd99c3a..df82c5fd4252 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -56,6 +56,9 @@ struct X86MachineState {
 /* RAM information (sizes, addresses, configuration): */
 ram_addr_t below_4g_mem_size, above_4g_mem_size;
 
+/* Start address of the initial RAM above 4G */
+uint64_t above_4g_mem_start;
+
 /* CPU and apic information: */
 bool apic_xrupt_override;
 unsigned pci_irq_mask;
-- 
2.17.2




[PATCH v9 02/11] i386/pc: create pci-host qdev prior to pc_memory_init()

2022-07-19 Thread Joao Martins
At the start of pc_memory_init() we usually pass a range of
0..UINT64_MAX as pci_memory, when really it's 2G (i440fx) or
32G (q35). To get the real user value, we need the pci-host
property for the default pci_hole64_size. Thus, create the qdev
prior to memory init to better make estimations on the max
used/phys addr.

This is in preparation to determine that host-phys-bits are
enough and also for pci-hole64-size to be considered when
relocating ram-above-4g to 1T (on AMD platforms).

Signed-off-by: Joao Martins 
Reviewed-by: Igor Mammedov 
---
 hw/i386/pc_piix.c| 7 +--
 hw/i386/pc_q35.c | 6 +++---
 hw/pci-host/i440fx.c | 5 ++---
 include/hw/pci-host/i440fx.h | 3 ++-
 4 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index a234989ac363..6186a1473755 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -91,6 +91,7 @@ static void pc_init1(MachineState *machine,
 MemoryRegion *pci_memory;
 MemoryRegion *rom_memory;
 ram_addr_t lowmem;
+DeviceState *i440fx_host;
 
 /*
  * Calculate ram split, for memory below and above 4G.  It's a bit
@@ -164,9 +165,11 @@ static void pc_init1(MachineState *machine,
 pci_memory = g_new(MemoryRegion, 1);
 memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
 rom_memory = pci_memory;
+i440fx_host = qdev_new(host_type);
 } else {
 pci_memory = NULL;
 rom_memory = system_memory;
+i440fx_host = NULL;
 }
 
 pc_guest_info_init(pcms);
@@ -200,8 +203,8 @@ static void pc_init1(MachineState *machine,
 const char *type = xen_enabled() ? TYPE_PIIX3_XEN_DEVICE
  : TYPE_PIIX3_DEVICE;
 
-pci_bus = i440fx_init(host_type,
-  pci_type,
+pci_bus = i440fx_init(pci_type,
+  i440fx_host,
   system_memory, system_io, machine->ram_size,
   x86ms->below_4g_mem_size,
   x86ms->above_4g_mem_size,
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index f96cbd04e284..46ea89e564de 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -203,12 +203,12 @@ static void pc_q35_init(MachineState *machine)
 pcms->smbios_entry_point_type);
 }
 
-/* allocate ram and load rom/bios */
-pc_memory_init(pcms, get_system_memory(), rom_memory, &ram_memory);
-
 /* create pci host bus */
 q35_host = Q35_HOST_DEVICE(qdev_new(TYPE_Q35_HOST_DEVICE));
 
+/* allocate ram and load rom/bios */
+pc_memory_init(pcms, get_system_memory(), rom_memory, &ram_memory);
+
 object_property_add_child(qdev_get_machine(), "q35", OBJECT(q35_host));
 object_property_set_link(OBJECT(q35_host), MCH_HOST_PROP_RAM_MEM,
  OBJECT(ram_memory), NULL);
diff --git a/hw/pci-host/i440fx.c b/hw/pci-host/i440fx.c
index 1c5ad5f918a2..d5426ef4a53c 100644
--- a/hw/pci-host/i440fx.c
+++ b/hw/pci-host/i440fx.c
@@ -237,7 +237,8 @@ static void i440fx_realize(PCIDevice *dev, Error **errp)
 }
 }
 
-PCIBus *i440fx_init(const char *host_type, const char *pci_type,
+PCIBus *i440fx_init(const char *pci_type,
+DeviceState *dev,
 MemoryRegion *address_space_mem,
 MemoryRegion *address_space_io,
 ram_addr_t ram_size,
@@ -246,7 +247,6 @@ PCIBus *i440fx_init(const char *host_type, const char *pci_type,
 MemoryRegion *pci_address_space,
 MemoryRegion *ram_memory)
 {
-DeviceState *dev;
 PCIBus *b;
 PCIDevice *d;
 PCIHostState *s;
@@ -254,7 +254,6 @@ PCIBus *i440fx_init(const char *host_type, const char *pci_type,
 unsigned i;
 I440FXState *i440fx;
 
-dev = qdev_new(host_type);
 s = PCI_HOST_BRIDGE(dev);
 b = pci_root_bus_new(dev, NULL, pci_address_space,
  address_space_io, 0, TYPE_PCI_BUS);
diff --git a/include/hw/pci-host/i440fx.h b/include/hw/pci-host/i440fx.h
index 52518dbf08e6..d02bf1ed6b93 100644
--- a/include/hw/pci-host/i440fx.h
+++ b/include/hw/pci-host/i440fx.h
@@ -35,7 +35,8 @@ struct PCII440FXState {
 
 #define TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE "igd-passthrough-i440FX"
 
-PCIBus *i440fx_init(const char *host_type, const char *pci_type,
+PCIBus *i440fx_init(const char *pci_type,
+DeviceState *dev,
 MemoryRegion *address_space_mem,
 MemoryRegion *address_space_io,
 ram_addr_t ram_size,
-- 
2.17.2




[PATCH v9 07/11] i386/pc: handle uninitialized mr in pc_get_cxl_range_end()

2022-07-19 Thread Joao Martins
Remove pc_get_cxl_range_end()'s dependency on the CXL memory region,
and replace it with one that does not require the CXL host_mr to
determine the end of the CXL range.

This is in preparation to allow pc_pci_hole64_start() to be called
early in pc_memory_init(), handling the CXL memory region end even
when its underlying memory region isn't yet initialized.

Cc: Jonathan Cameron 
Signed-off-by: Joao Martins 
---
 hw/i386/pc.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 3fdcab4bb4f3..c654be6cf0bd 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -843,17 +843,15 @@ static uint64_t pc_get_cxl_range_start(PCMachineState *pcms)
 
 static uint64_t pc_get_cxl_range_end(PCMachineState *pcms)
 {
-uint64_t start = 0;
+uint64_t start = pc_get_cxl_range_start(pcms) + MiB;
 
-if (pcms->cxl_devices_state.host_mr.addr) {
-start = pcms->cxl_devices_state.host_mr.addr +
-memory_region_size(&pcms->cxl_devices_state.host_mr);
-if (pcms->cxl_devices_state.fixed_windows) {
-GList *it;
-for (it = pcms->cxl_devices_state.fixed_windows; it; it = it->next) {
-CXLFixedWindow *fw = it->data;
-start = fw->mr.addr + memory_region_size(&fw->mr);
-}
+if (pcms->cxl_devices_state.fixed_windows) {
+GList *it;
+
+start = ROUND_UP(start, 256 * MiB);
+for (it = pcms->cxl_devices_state.fixed_windows; it; it = it->next) {
+CXLFixedWindow *fw = it->data;
+start += fw->size;
 }
 }
 
-- 
2.17.2




[PATCH v9 04/11] i386/pc: factor out above-4g end to a helper

2022-07-19 Thread Joao Martins
There's a couple of places that seem to duplicate this calculation
of RAM size above the 4G boundary. Move all those to a helper function.

Signed-off-by: Joao Martins 
Reviewed-by: Igor Mammedov 
---
 hw/i386/pc.c | 29 ++---
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index e952dc62a12e..6c898a86cb89 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -814,6 +814,17 @@ void xen_load_linux(PCMachineState *pcms)
 #define PC_ROM_ALIGN   0x800
 #define PC_ROM_SIZE(PC_ROM_MAX - PC_ROM_MIN_VGA)
 
+static hwaddr pc_above_4g_end(PCMachineState *pcms)
+{
+X86MachineState *x86ms = X86_MACHINE(pcms);
+
+if (pcms->sgx_epc.size != 0) {
+return sgx_epc_above_4g_end(&pcms->sgx_epc);
+}
+
+return x86ms->above_4g_mem_start + x86ms->above_4g_mem_size;
+}
+
 void pc_memory_init(PCMachineState *pcms,
 MemoryRegion *system_memory,
 MemoryRegion *rom_memory,
@@ -891,15 +902,8 @@ void pc_memory_init(PCMachineState *pcms,
 exit(EXIT_FAILURE);
 }
 
-if (pcms->sgx_epc.size != 0) {
-machine->device_memory->base = sgx_epc_above_4g_end(&pcms->sgx_epc);
-} else {
-machine->device_memory->base =
-x86ms->above_4g_mem_start + x86ms->above_4g_mem_size;
-}
-
 machine->device_memory->base =
-ROUND_UP(machine->device_memory->base, 1 * GiB);
+ROUND_UP(pc_above_4g_end(pcms), 1 * GiB);
 
 if (pcmc->enforce_aligned_dimm) {
 /* size device region assuming 1G page max alignment per slot */
@@ -926,10 +930,8 @@ void pc_memory_init(PCMachineState *pcms,
 if (pcmc->has_reserved_memory && machine->device_memory->base) {
 cxl_base = machine->device_memory->base
 + memory_region_size(&machine->device_memory->mr);
-} else if (pcms->sgx_epc.size != 0) {
-cxl_base = sgx_epc_above_4g_end(&pcms->sgx_epc);
 } else {
-cxl_base = x86ms->above_4g_mem_start + x86ms->above_4g_mem_size;
+cxl_base = pc_above_4g_end(pcms);
 }
 
 e820_add_entry(cxl_base, cxl_size, E820_RESERVED);
@@ -1016,7 +1018,6 @@ uint64_t pc_pci_hole64_start(void)
 PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
 PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
 MachineState *ms = MACHINE(pcms);
-X86MachineState *x86ms = X86_MACHINE(pcms);
 uint64_t hole64_start = 0;
 
 if (pcms->cxl_devices_state.host_mr.addr) {
@@ -1034,10 +1035,8 @@ uint64_t pc_pci_hole64_start(void)
 if (!pcmc->broken_reserved_end) {
 hole64_start += memory_region_size(&ms->device_memory->mr);
 }
-} else if (pcms->sgx_epc.size != 0) {
-hole64_start = sgx_epc_above_4g_end(&pcms->sgx_epc);
 } else {
-hole64_start = x86ms->above_4g_mem_start + x86ms->above_4g_mem_size;
+hole64_start = pc_above_4g_end(pcms);
 }
 
 return ROUND_UP(hole64_start, 1 * GiB);
-- 
2.17.2




Re: [PULL 00/24] Net Patches

2022-07-19 Thread Peter Maydell
On Tue, 19 Jul 2022 at 14:17, Jason Wang  wrote:
>
> The following changes since commit f9d9fff72eed03acde97ea2d66104748dc474b2e:
>
>   Merge tag 'qemu-sparc-20220718' of https://github.com/mcayland/qemu into staging (2022-07-19 09:57:13 +0100)
>
> are available in the git repository at:
>
>   https://github.com/jasowang/qemu.git tags/net-pull-request
>
> for you to fetch changes up to f8a9fd7b7ab6601b76e253bbcbfe952f8c1887ec:
>
>   net/colo.c: fix segmentation fault when packet is not parsed correctly (2022-07-19 21:05:20 +0800)
>
> 
>
> 

Fails to build, many platforms:

eg
https://gitlab.com/qemu-project/qemu/-/jobs/2742242194

libcommon.fa.p/net_vhost-vdpa.c.o: In function `vhost_vdpa_cvq_unmap_buf':
/builds/qemu-project/qemu/build/../net/vhost-vdpa.c:234: undefined
reference to `vhost_iova_tree_find_iova'
/builds/qemu-project/qemu/build/../net/vhost-vdpa.c:242: undefined
reference to `vhost_vdpa_dma_unmap'
/builds/qemu-project/qemu/build/../net/vhost-vdpa.c:247: undefined
reference to `vhost_iova_tree_remove'
libcommon.fa.p/net_vhost-vdpa.c.o: In function `vhost_vdpa_cleanup':
/builds/qemu-project/qemu/build/../net/vhost-vdpa.c:163: undefined
reference to `vhost_iova_tree_delete'
libcommon.fa.p/net_vhost-vdpa.c.o: In function `vhost_vdpa_cvq_map_buf':
/builds/qemu-project/qemu/build/../net/vhost-vdpa.c:285: undefined
reference to `vhost_iova_tree_map_alloc'
/builds/qemu-project/qemu/build/../net/vhost-vdpa.c:291: undefined
reference to `vhost_vdpa_dma_map'
/builds/qemu-project/qemu/build/../net/vhost-vdpa.c:300: undefined
reference to `vhost_iova_tree_remove'
libcommon.fa.p/net_vhost-vdpa.c.o: In function
`vhost_vdpa_net_handle_ctrl_avail':
/builds/qemu-project/qemu/build/../net/vhost-vdpa.c:445: undefined
reference to `vhost_svq_push_elem'
/builds/qemu-project/qemu/build/../net/vhost-vdpa.c:408: undefined
reference to `vhost_svq_add'
/builds/qemu-project/qemu/build/../net/vhost-vdpa.c:422: undefined
reference to `vhost_svq_poll'
/builds/qemu-project/qemu/build/../net/vhost-vdpa.c:434: undefined
reference to `virtio_net_handle_ctrl_iov'
libcommon.fa.p/net_vhost-vdpa.c.o: In function `net_init_vhost_vdpa':
/builds/qemu-project/qemu/build/../net/vhost-vdpa.c:611: undefined
reference to `vhost_iova_tree_new'
libcommon.fa.p/net_vhost-vdpa.c.o: In function
`glib_autoptr_cleanup_VhostIOVATree':
/builds/qemu-project/qemu/hw/virtio/vhost-iova-tree.h:20: undefined
reference to `vhost_iova_tree_delete'
collect2: error: ld returned 1 exit status
[2436/4108] Compiling C object
libqemu-s390x-softmmu.fa.p/meson-generated_.._qapi_qapi-introspect.c.o



Presumably the conditions in the various meson.build files are
out of sync about when to build the net/vhost-vdpa.c code vs
the code that implements the functions it is trying to call.

Specifically, the functions being called will only be present
if the target architecture has CONFIG_VIRTIO, which isn't
guaranteed, but we try to link the vhost-vdpa code in anyway.
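A hedged sketch of the kind of gating that would keep the two sides in sync (the option and file names here are assumptions for illustration, not the actual QEMU build rules): the net-side file should only be compiled when the virtio/shadow-virtqueue code it links against is also built.

```meson
# Hypothetical meson.build fragment: build the vhost-vdpa net backend only
# when both the vhost-vdpa backend and the virtio code providing the
# vhost_iova_tree_* / vhost_svq_* symbols are enabled for this target.
softmmu_ss.add(when: ['CONFIG_VHOST_VDPA', 'CONFIG_VIRTIO'],
               if_true: files('vhost-vdpa.c'))
```

The alternative fix, of course, is to make the virtio side's condition match whatever condition currently pulls in net/vhost-vdpa.c.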

thanks
-- PMM
