date:20240503

Re: [PULL 00/14] Accel / SH4 / UI patches for 2024-05-03

2024-05-03 Thread Richard Henderson


On 5/3/24 08:35, Philippe Mathieu-Daudé wrote:

The following changes since commit fd87be1dada5672f877e03c2ca8504458292c479:

   Merge tag 'accel-20240426' of https://github.com/philmd/qemu into staging (2024-04-26 15:28:13 -0700)

are available in the Git repository at:

   https://github.com/philmd/qemu.git  tags/accel-sh4-ui-20240503

for you to fetch changes up to 2d27c91e2b72ac7a65504ac207c89262d92464eb:

   ui/cocoa.m: Drop old macOS-10.12-and-earlier compat ifdefs (2024-05-03 17:33:26 +0200)


- Fix NULL dereference in NVMM & WHPX init_vcpu()
- Move user emulation headers "exec/user" to "user"
- Fix SH-4 ADDV / SUBV opcodes
- Drop Cocoa compatibility on macOS <= 10.12
- Update Anthony PERARD email


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/9.1 as 
appropriate.


r~

Re: [PATCH V8 6/8] physmem: Add helper function to destroy CPU AddressSpace

2024-05-03 Thread Salil Mehta

Hi Zhukeqian,

On Fri, Mar 15, 2024 at 1:17 AM zhukeqian  wrote:

> Hi Salil,
>
> [...]
>
> +void cpu_address_space_destroy(CPUState *cpu, int asidx) {
> +    CPUAddressSpace *cpuas;
> +
> +    assert(cpu->cpu_ases);
> +    assert(asidx >= 0 && asidx < cpu->num_ases);
> +    /* KVM cannot currently support multiple address spaces. */
> +    assert(asidx == 0 || !kvm_enabled());
> +
> +    cpuas = &cpu->cpu_ases[asidx];
> +    if (tcg_enabled()) {
> +        memory_listener_unregister(&cpuas->tcg_as_listener);
> +    }
> +
> +    address_space_destroy(cpuas->as);
> +    g_free_rcu(cpuas->as, rcu);
>
> In address_space_destroy(), it calls call_rcu1() on cpuas->as which will
> set do_address_space_destroy() as the rcu func.
> And g_free_rcu() also calls call_rcu1() on cpuas->as which will overwrite
> the rcu func as g_free().
>
> Then I think the g_free() may be called twice in rcu thread, please verify
> that.
>
> The source code of call_rcu1:
>
> void call_rcu1(struct rcu_head *node, void (*func)(struct rcu_head *node))
> {
>     node->func = func;
>     enqueue(node);
>     qatomic_inc(&rcu_call_count);
>     qemu_event_set(&rcu_call_ready_event);
> }
>


Thanks for testing and identifying this. Let me have a look and will get
back to you.
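
For anyone following along, the concern above can be modelled with a toy
queue. This is only a sketch under loud assumptions: the toy_* names are
invented here, the queue is a plain array, and it is not QEMU's RCU
implementation (which links nodes through the rcu_head itself, so the real
failure mode may differ). It only shows that storing a second callback in
the same node overwrites the first, so the second callback runs once per
enqueue.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an RCU callback node; NOT QEMU's struct rcu_head. */
struct toy_rcu_head {
    void (*func)(struct toy_rcu_head *node);
};

#define TOY_QUEUE_MAX 8
static struct toy_rcu_head *toy_queue[TOY_QUEUE_MAX];
static size_t toy_queue_len;
static int destroy_calls;
static int free_calls;

static void toy_do_destroy(struct toy_rcu_head *node)
{
    (void)node;
    destroy_calls++;
}

static void toy_do_free(struct toy_rcu_head *node)
{
    (void)node;
    free_calls++;
}

/* Same shape as the quoted call_rcu1(): store func, then enqueue the node. */
static void toy_call_rcu1(struct toy_rcu_head *node,
                          void (*func)(struct toy_rcu_head *node))
{
    assert(toy_queue_len < TOY_QUEUE_MAX);
    node->func = func;
    toy_queue[toy_queue_len++] = node;
}

/* Stand-in for the RCU thread draining the queue after a grace period. */
static void toy_run_callbacks(void)
{
    for (size_t i = 0; i < toy_queue_len; i++) {
        toy_queue[i]->func(toy_queue[i]);
    }
    toy_queue_len = 0;
}

/* Queue the same node twice with different callbacks, then drain. */
static void toy_double_enqueue_demo(void)
{
    static struct toy_rcu_head head;

    destroy_calls = 0;
    free_calls = 0;
    toy_call_rcu1(&head, toy_do_destroy);
    toy_call_rcu1(&head, toy_do_free);   /* overwrites head.func */
    toy_run_callbacks();
}
```

After toy_double_enqueue_demo(), the destroy callback has never run and the
free callback has run twice, which is the double-free shape described in
the quoted review.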

Thanks
Salil



>
> Thanks,
> Keqian
>
> +
> +    if (asidx == 0) {
> +        /* reset the convenience alias for address space 0 */
> +        cpu->as = NULL;
> +    }
> +
> +    if (--cpu->cpu_ases_count == 0) {
> +        g_free(cpu->cpu_ases);
> +        cpu->cpu_ases = NULL;
> +    }
> +}
> +
>  AddressSpace *cpu_get_address_space(CPUState *cpu, int asidx) {
>      /* Return the AddressSpace corresponding to the specified index */
> --
> 2.34.1
>
>

Re: [PATCH 0/3] testing/next: s390x gitlab updates

2024-05-03 Thread Richard Henderson


On 4/26/24 08:39, Alex Bennée wrote:

I was asked to update the custom gitlab runner from the aging 20.04 to
22.04 which has been done. However I needed to update the provisioning
scripts and clean-up some of the cruft. Sadly this doesn't seem to be
passing cleanly as we have:

   - qtest-s390x/migration-test ERROR   98.94s   killed by signal 6 SIGABRT
   - failing TCG tests (on s390x HW)
     - float_convd fails against a generated reference
     - clc returns 1


I've had a look at the clc failure.

It fails because of bad address space layout, where the NULL page isn't unmapped, so the 
expected SIGSEGV does *not* happen.


This is unfortunate and we could do better.

However, with the upgrade, --static --enable-pie no longer works.
From config.log,

cc -m64 -Werror -fPIE -DPIE -o config-temp/qemu-conf.exe config-temp/qemu-conf.c -static-pie
/usr/bin/ld: cannot find rcrt1.o: No such file or directory

which suggests a missing package.

Alternately, we could drop --static, as it's not really relevant to this testing.  With 
that, we get PIE dynamically linked executables, which do not trigger the bad layout.


But at some point we should make sure that the NULL page(s) are reserved PROT_NONE for the 
guest, which ensures this stays fixed.
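
As a rough illustration of the PROT_NONE idea only: the sketch below
reserves one inaccessible page with mmap(). The function name is invented
for this example, and the page is placed at a kernel-chosen address rather
than at guest address 0, because mapping host page 0 is normally refused
by vm.mmap_min_addr. How linux-user would actually reserve the guest NULL
page is not shown here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Reserve one page that faults on any read, write or execute access.
 * Returns the mapping (and its length via len_out), or NULL on failure.
 * Hypothetical helper for illustration; not a QEMU function.
 */
static void *demo_reserve_inaccessible_page(size_t *len_out)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p;

    if (page <= 0) {
        return NULL;
    }

    p = mmap(NULL, (size_t)page, PROT_NONE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return NULL;
    }

    *len_out = (size_t)page;
    return p;
}
```

Any access inside the returned range raises SIGSEGV until the range is
unmapped or its protection is changed, which is the behaviour the clc test
expects for the NULL page.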



r~

Re: [PATCH 7/9] monitor: fdset: Match against O_DIRECT

2024-05-03 Thread Peter Xu

On Fri, May 03, 2024 at 06:19:30PM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > On Fri, Apr 26, 2024 at 11:20:40AM -0300, Fabiano Rosas wrote:
> >> We're about to enable the use of O_DIRECT in the migration code and
> >> due to the alignment restrictions imposed by filesystems we need to
> >> make sure the flag is only used when doing aligned IO.
> >> 
> >> The migration will do parallel IO to different regions of a file, so
> >> we need to use more than one file descriptor. Those cannot be obtained
> >> by duplicating (dup()) since duplicated file descriptors share the
> >> file status flags, including O_DIRECT. If one migration channel does
> >> unaligned IO while another sets O_DIRECT to do aligned IO, the
> >> filesystem would fail the unaligned operation.
> >> 
> >> The add-fd QMP command along with the fdset code are specifically
> >> designed to allow the user to pass a set of file descriptors with
> >> different access flags into QEMU to be later fetched by code that
> >> needs to alternate between those flags when doing IO.
> >> 
> >> Extend the fdset matching to behave the same with the O_DIRECT flag.
> >> 
> >> Signed-off-by: Fabiano Rosas 
> >> ---
> >>  monitor/fds.c | 7 ++-
> >>  1 file changed, 6 insertions(+), 1 deletion(-)
> >> 
> >> diff --git a/monitor/fds.c b/monitor/fds.c
> >> index 4ec3b7eea9..62e324fcec 100644
> >> --- a/monitor/fds.c
> >> +++ b/monitor/fds.c
> >> @@ -420,6 +420,11 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int 
> >> flags)
> >>  int fd = -1;
> >>  int dup_fd;
> >>  int mon_fd_flags;
> >> +int mask = O_ACCMODE;
> >> +
> >> +#ifdef O_DIRECT
> >> +mask |= O_DIRECT;
> >> +#endif
> >>  
> >>  if (mon_fdset->id != fdset_id) {
> >>  continue;
> >> @@ -431,7 +436,7 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int 
> >> flags)
> >>  return -1;
> >>  }
> >>  
> >> -if ((flags & O_ACCMODE) == (mon_fd_flags & O_ACCMODE)) {
> >> +if ((flags & mask) == (mon_fd_flags & mask)) {
> >>  fd = mon_fdset_fd->fd;
> >>  break;
> >>  }
> >
> > I think I see what you wanted to do, picking out the right fd out of two
> > when qemu_open_old(), which makes sense.
> >
> > However what happens if the mgmt app only passes in 1 fd to the fdset?  The
> > issue is we have a "fallback dup()" plan right after this chunk of code:
> >
> 
> I'm validating the fdset at file_parse_fdset() beforehand. If there's
> anything else than 2 fds then we'll error out:
> 
> if (nfds != 2) {
>     error_setg(errp, "Outgoing migration needs two fds in the fdset, "
>                "got %d", nfds);
>     qmp_remove_fd(*id, false, -1, NULL);
>     *id = -1;
>     return false;
> }
> 
> > dup_fd = qemu_dup_flags(fd, flags);
> > if (dup_fd == -1) {
> > return -1;
> > }
> >
> > mon_fdset_fd_dup = g_malloc0(sizeof(*mon_fdset_fd_dup));
> > mon_fdset_fd_dup->fd = dup_fd;
> > QLIST_INSERT_HEAD(&mon_fdset->dup_fds, mon_fdset_fd_dup, next);
> >
> > I think it means even if the mgmt app only passes in 1 fd (rather than 2,
> > one with O_DIRECT, one without), QEMU can always successfully call
> > qemu_open_old() twice for each case, even though silently the two FDs will
> > actually impact on each other.  This doesn't look ideal if it's true.
> >
> > But I also must confess I don't really understand this code at all: we
> > dup(), then we try F_SETFL on all the possible flags got passed in.
> > However AFAICT due to the fact that dup()ed FDs will share "struct file" it
> > means mostly all flags will be shared, except close-on-exec.  I don't ever
> > see anything protecting that F_SETFL to only touch close-on-exec, I think
> > it means it'll silently change file status flags for the other fd which we
> > dup()ed from.  Does it mean that we have issue already with such dup() usage?
> 
> I think you're right, but I also think there's a requirement even from
> this code that the fds in the fdset cannot be dup()ed. I don't see it
> enforced anywhere, but maybe that's a consequence of the larger use-case
> for which this feature was introduced.

I think that's the thing we need to figure out for add-fd usages.  The bad
thing is there are too many qemu_open_internal() users, so we can't easily
tell what we're looking for.  It may take some time reading the code or the
history, which is pretty sad.  I hope someone can chime in.

> 
> For our scenario, the open() man page says one can use kcmp() to compare
> the fds and determine if they are a result of dup(). Maybe we should do
> that extra check? We're defining a pretty rigid interface between QEMU
> and the management layer, so not likely to break once it's written. I'm
> also not sure how bad would it be to call syscall() directly from QEMU
> (kcmp has no libc wrapper).

That should be all fine, see:

$ git grep " syscall(" |

Re: [PATCH 2/9] migration: Fix file migration with fdset

2024-05-03 Thread Peter Xu

On Fri, May 03, 2024 at 06:31:06PM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > On Fri, May 03, 2024 at 04:56:08PM -0300, Fabiano Rosas wrote:
> >> Peter Xu  writes:
> >> 
> >> > On Fri, Apr 26, 2024 at 11:20:35AM -0300, Fabiano Rosas wrote:
> >> >> When the migration using the "file:" URI was implemented, I don't
> >> >> think any of us noticed that if you pass in a file name with the
> >> >> format "/dev/fdset/N", this allows a file descriptor to be passed in
> >> >> to QEMU and that behaves just like the "fd:" URI. So the "file:"
> >> >> support has been added without regard for the fdset part and we got
> >> >> some things wrong.
> >> >> 
> >> >> The first issue is that we should not truncate the migration file if
> >> >> we're allowing an fd + offset. We need to leave the file contents
> >> >> untouched.
> >> >
> >> > I'm wondering whether we can use fallocate() instead on the ranges so that
> >> > we always don't open() with O_TRUNC.  Before that..  could you remind me
> >> > why do we need to truncate in the first place?  I definitely missed
> >> > something else here too.
> >> 
> >> AFAIK, just to avoid any issues if the file is pre-existing. I don't see
> >> the difference between O_TRUNC and fallocate in this case.
> >
> > Then, shall we avoid truncations at all, leaving all the feasibility to
> > user (also errors prone to make)?
> >
> 
> Is this a big deal? I'd rather close that possible gap and avoid the bug
> reports.

No such report is possible if the user goes through Libvirt or a higher-level
virt stack, am I right?  This only affects whoever uses QEMU directly, and
only if they forgot to remove a leftover image file?

I'd not worry about those people who use QEMU directly - they aren't the
people we need to care too much about, imho (and I'm definitely one of
them..).  The problem is that I feel it's overkill to introduce a migration
global var just for this purpose.

No strong opinions; if you feel strongly about it I'm ok with it.  But if
one day we want to remove FileOutgoingArgs I'll also leave that to you
as a trade-off. :-)

> 
> >> 
> >> >
> >> >> 
> >> >> The second issue is that there's an expectation that QEMU removes the
> >> >> fd after the migration has finished. That's what the "fd:" code
> >> >> does. Otherwise a second migration on the same VM could attempt to
> >> >> provide an fdset with the same name and QEMU would reject it.
> >> >
> >> > Let me check what we do when with "fd:" and when migration completes or
> >> > cancels.
> >> >
> >> > IIUC it's qio_channel_file_close() that does the final cleanup work on
> >> > e.g. to_dst_file, right?  Then there's qemu_close(), and it has:
> >> >
> >> > /* Close fd that was dup'd from an fdset */
> >> > fdset_id = monitor_fdset_dup_fd_find(fd);
> >> > if (fdset_id != -1) {
> >> > int ret;
> >> >
> >> > ret = close(fd);
> >> > if (ret == 0) {
> >> > monitor_fdset_dup_fd_remove(fd);
> >> > }
> >> >
> >> > return ret;
> >> > }
> >> >
> >> > Shouldn't this have done the work already?
> >> 
> >> That removes the mon_fdset_fd_dup->fd, we want to remove the
> >> mon_fdset_fd->fd.
> >
> > What I read so far is when we are removing the dup-fds, we'll do one more
> > thing:
> >
> > monitor_fdset_dup_fd_find_remove():
> > if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
> > monitor_fdset_cleanup(mon_fdset);
> > }
> >
> > It means if we removed all the dup-fds correctly, we should also remove the
> > whole fdset, which includes the ->fds, IIUC.
> >
> 
> Since mon_fdset_fd->removed == false, we hit the runstate_is_running()
> problem. I'm not sure, but probably mon_refcount > 0 as well. So the fd
> would not be removed.
> 
> But I'll retest this on Monday just to be sure, it's been a while since I
> wrote some parts of this.

Thanks.  I hope we can also get some more clues when you dig further into
the whole add-fd API; I hope we don't pile up more complicated logic on top
of a mystery.  I feel like this is the time to figure things out.

-- 
Peter Xu

Re: [PULL 00/10] bufferiszero improvements

2024-05-03 Thread Richard Henderson


On 5/3/24 08:13, Richard Henderson wrote:

The following changes since commit 4977ce198d2390bff8c71ad5cb1a5f6aa24b56fb:

   Merge tag 'pull-tcg-20240501' of https://gitlab.com/rth7680/qemu into staging (2024-05-01 15:15:33 -0700)

are available in the Git repository at:

   https://gitlab.com/rth7680/qemu.git  tags/pull-misc-20240503

for you to fetch changes up to a06d9eddb015a9f5895161b0a3958a2e4be21579:

   tests/bench: Add bufferiszero-bench (2024-05-03 08:03:35 -0700)


util/bufferiszero:
   - Remove sse4.1 and avx512 variants
   - Reorganize for early test for acceleration
   - Remove useless prefetches
   - Optimize sse2, avx2 and integer variants
   - Add simd acceleration for aarch64
   - Add bufferiszero-bench


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/9.1 as 
appropriate.


r~

Re: [PATCH v3 07/11] hw/sh4/r2d: Realize IDE controller before accessing it

2024-05-03 Thread Guenter Roeck

Hi,

On Thu, Feb 08, 2024 at 07:12:40PM +0100, Philippe Mathieu-Daudé wrote:
> We should not wire IRQs on unrealized device.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> Reviewed-by: Peter Maydell 
> Reviewed-by: Yoshinori Sato 

qemu 9.0 fails to boot Linux from ide/ata drives with the sh4
and sh4eb emulations. Error log is as follows.

ata1.00: ATA-7: QEMU HARDDISK, 2.5+, max UDMA/100
ata1.00: 16384 sectors, multi 16: LBA48
ata1.00: configured for PIO
scsi 0:0:0:0: Direct-Access ATA  QEMU HARDDISK2.5+ PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 16384 512-byte logical blocks: (8.39 MB/8.00 MiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata1: lost interrupt (Status 0x58)

[ and more similar errors ]

qemu command line:

qemu-system-sh4eb -M r2d -kernel arch/sh/boot/zImage \
-snapshot -drive file=rootfs.ext2,format=raw,if=ide \
-append "root=/dev/sda console=ttySC1,115200 noiotrap" \
-serial null -serial stdio -monitor null -nographic -no-reboot

Bisect points to this patch (see below). Reverting it fixes the problem.

Guenter

---
bisect log:

# bad: [c25df57ae8f9fe1c72eee2dab37d76d904ac382e] Update version for 9.0.0 release
# good: [1600b9f46b1bd08b00fe86c46ef6dbb48cbe10d6] Update version for v8.2.0 release
git bisect start 'v9.0.0' 'v8.2.0'
# good: [62357c047a5abc6ede992159ed7c0aaaeb50617a] Merge tag 'qemu-sparc-20240213' of https://github.com/mcayland/qemu into staging
git bisect good 62357c047a5abc6ede992159ed7c0aaaeb50617a
# bad: [d65f1ed7de1559534d0a1fabca5bdd81c594c7ca] docs/acpi/bits: add some clarity and details while also improving formating
git bisect bad d65f1ed7de1559534d0a1fabca5bdd81c594c7ca
# bad: [99e1c1137b6f339be1e4b76e243ad7b7c3d3cb8c] hw/i386/pc: Populate RTC attribute directly
git bisect bad 99e1c1137b6f339be1e4b76e243ad7b7c3d3cb8c
# bad: [760b4dcdddba4a40b9fa0eb78fdfc7eda7cb83d0] Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging
git bisect bad 760b4dcdddba4a40b9fa0eb78fdfc7eda7cb83d0
# good: [f2b4a98930c122648e9dc494e49cea5dffbcc2be] target/arm: Allow access to SPSR_hyp from hyp mode
git bisect good f2b4a98930c122648e9dc494e49cea5dffbcc2be
# bad: [1a8e2f58c5dd721086284f827326b370d19ad9eb] hw/i386/q35: Use DEVICE() cast macro with PCIDevice object
git bisect bad 1a8e2f58c5dd721086284f827326b370d19ad9eb
# good: [59ae6bcddc3651b55b96c2bf05a6cd4312e46d10] hw/ppc/prep: Realize ISA bridge before accessing it
git bisect good 59ae6bcddc3651b55b96c2bf05a6cd4312e46d10
# bad: [7ed9a5f626a6c932a8c869a91e6a8b3e2029f5ef] hw/intc/grlib_irqmp: implements the multiprocessor status register
git bisect bad 7ed9a5f626a6c932a8c869a91e6a8b3e2029f5ef
# bad: [d08b7af3f7f27f6f3da8446756bf0b9352026b1d] target/sparc: Provide hint about CPUSPARCState::irq_manager member
git bisect bad d08b7af3f7f27f6f3da8446756bf0b9352026b1d
# bad: [5e37bc4997c32a1c9a6621a060462c84df9f1b8f] hw/dma: Pass parent object to i8257_dma_init()
git bisect bad 5e37bc4997c32a1c9a6621a060462c84df9f1b8f
# bad: [3c5f86a22686ef475a8259c0d8ee714f61c770c9] hw/sh4/r2d: Realize IDE controller before accessing it
git bisect bad 3c5f86a22686ef475a8259c0d8ee714f61c770c9
# good: [fc432ba0f58343c8912b80e9056315bb9bd8df92] hw/misc/macio: Realize IDE controller before accessing it
git bisect good fc432ba0f58343c8912b80e9056315bb9bd8df92
# first bad commit: [3c5f86a22686ef475a8259c0d8ee714f61c770c9] hw/sh4/r2d: Realize IDE controller before accessing it

Re: [PATCH 2/9] migration: Fix file migration with fdset

2024-05-03 Thread Fabiano Rosas

Peter Xu  writes:

> On Fri, May 03, 2024 at 04:56:08PM -0300, Fabiano Rosas wrote:
>> Peter Xu  writes:
>> 
>> > On Fri, Apr 26, 2024 at 11:20:35AM -0300, Fabiano Rosas wrote:
>> >> When the migration using the "file:" URI was implemented, I don't
>> >> think any of us noticed that if you pass in a file name with the
>> >> format "/dev/fdset/N", this allows a file descriptor to be passed in
>> >> to QEMU and that behaves just like the "fd:" URI. So the "file:"
>> >> support has been added without regard for the fdset part and we got
>> >> some things wrong.
>> >> 
>> >> The first issue is that we should not truncate the migration file if
>> >> we're allowing an fd + offset. We need to leave the file contents
>> >> untouched.
>> >
>> > I'm wondering whether we can use fallocate() instead on the ranges so that
>> > we always don't open() with O_TRUNC.  Before that..  could you remind me
>> > why do we need to truncate in the first place?  I definitely missed
>> > something else here too.
>> 
>> AFAIK, just to avoid any issues if the file is pre-existing. I don't see
>> the difference between O_TRUNC and fallocate in this case.
>
> Then, shall we avoid truncations at all, leaving all the feasibility to
> user (also errors prone to make)?
>

Is this a big deal? I'd rather close that possible gap and avoid the bug
reports.

>> 
>> >
>> >> 
>> >> The second issue is that there's an expectation that QEMU removes the
>> >> fd after the migration has finished. That's what the "fd:" code
>> >> does. Otherwise a second migration on the same VM could attempt to
>> >> provide an fdset with the same name and QEMU would reject it.
>> >
>> > Let me check what we do when with "fd:" and when migration completes or
>> > cancels.
>> >
>> > IIUC it's qio_channel_file_close() that does the final cleanup work on
>> > e.g. to_dst_file, right?  Then there's qemu_close(), and it has:
>> >
>> > /* Close fd that was dup'd from an fdset */
>> > fdset_id = monitor_fdset_dup_fd_find(fd);
>> > if (fdset_id != -1) {
>> > int ret;
>> >
>> > ret = close(fd);
>> > if (ret == 0) {
>> > monitor_fdset_dup_fd_remove(fd);
>> > }
>> >
>> > return ret;
>> > }
>> >
>> > Shouldn't this have done the work already?
>> 
>> That removes the mon_fdset_fd_dup->fd, we want to remove the
>> mon_fdset_fd->fd.
>
> What I read so far is when we are removing the dup-fds, we'll do one more
> thing:
>
> monitor_fdset_dup_fd_find_remove():
> if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
> monitor_fdset_cleanup(mon_fdset);
> }
>
> It means if we removed all the dup-fds correctly, we should also remove the
> whole fdset, which includes the ->fds, IIUC.
>

Since mon_fdset_fd->removed == false, we hit the runstate_is_running()
problem. I'm not sure, but probably mon_refcount > 0 as well. So the fd
would not be removed.

But I'll retest this on Monday just to be sure, it's been a while since I
wrote some parts of this.

>> 
>> >
>> > Off topic: I think this code is over complicated too, maybe I missed
>> > something, but afaict we don't need monitor_fdset_dup_fd_find at all.. we
>> > simply walk the list and remove stuff..  I attach a patch at the end that I
>> > tried to clean that up, just in case there's early comments.  But we can
>> > ignore that so we don't get side-tracked, and focus on the direct-io
>> > issues.
>> 
>> Well, I'm not confident touching this code. This is more than a decade
>> old, I have no idea what the original motivations were. The possible
>> interactions with the user via command-line (-add-fd), QMP (add-fd) and
>> the monitor lifetime make me confused. Not to mention the fdset part
>> being plumbed into the guts of a widely used qemu_open_internal() that
>> very misleadingly presents itself as just a wrapper for open().
>
> If to make QEMU long live, we'll probably need to touch it at some
> point.. or at least discuss about it and figure things out. We pay tech
> debts like this when there's no good comment / docs to refer in this case,
> then the earlier, perhaps also the better.. to try taking the stab, imho.
>
> Definitely not a request to clean everything up. :) Let's see whether
> others can chime in with better knowledge of the history.
>
>> 
>> >
>> > Thanks,
>> >
>> > ===
>> >
>> > From 2f6b6d1224486d8ee830a7afe34738a07003b863 Mon Sep 17 00:00:00 2001
>> > From: Peter Xu 
>> > Date: Fri, 3 May 2024 11:27:20 -0400
>> > Subject: [PATCH] monitor: Drop monitor_fdset_dup_fd_add()
>> > MIME-Version: 1.0
>> > Content-Type: text/plain; charset=UTF-8
>> > Content-Transfer-Encoding: 8bit
>> >
>> > This function is not needed, one remove function should already work.
>> > Clean it up.
>> >
>> > Here the code doesn't really care about whether we need to keep that dupfd
>> > around if close() failed: when that happens something got very wrong,
>> > keeping the dup_fd around the fdsets may not help that situation so far.
>> >

Re: [PATCH 6/9] tests/qtest/migration: Add tests for file migration with direct-io

2024-05-03 Thread Peter Xu

On Fri, May 03, 2024 at 06:05:19PM -0300, Fabiano Rosas wrote:
> >> +#ifdef O_DIRECT
> >> +static void *migrate_mapped_ram_dio_start(QTestState *from,
> >> + QTestState *to)
> >> +{
> >> +migrate_mapped_ram_start(from, to);
> >
> > This line seems redundant, migrate_multifd_mapped_ram_start() should cover that.
> >
> 
> This is an artifact of another patch that adds direct-io + mapped-ram
> without multifd. I'm bringing that back on v2. We were having a
> discussion[1] about it in the libvirt mailing list. Having direct-io
> even without multifd might still be useful for libvirt.
> 
> 1- 
> https://lists.libvirt.org/archives/list/de...@lists.libvirt.org/thread/K4BDDJDMJ22XMJEFAUE323H5S5E47VQX/

Ah that's fine then.  Maybe add a comment somewhere for future readers?  Or
a sentence in the commit log would work too.

-- 
Peter Xu

Re: [PATCH 7/9] monitor: fdset: Match against O_DIRECT

2024-05-03 Thread Fabiano Rosas

Peter Xu  writes:

> On Fri, Apr 26, 2024 at 11:20:40AM -0300, Fabiano Rosas wrote:
>> We're about to enable the use of O_DIRECT in the migration code and
>> due to the alignment restrictions imposed by filesystems we need to
>> make sure the flag is only used when doing aligned IO.
>> 
>> The migration will do parallel IO to different regions of a file, so
>> we need to use more than one file descriptor. Those cannot be obtained
>> by duplicating (dup()) since duplicated file descriptors share the
>> file status flags, including O_DIRECT. If one migration channel does
>> unaligned IO while another sets O_DIRECT to do aligned IO, the
>> filesystem would fail the unaligned operation.
>> 
>> The add-fd QMP command along with the fdset code are specifically
>> designed to allow the user to pass a set of file descriptors with
>> different access flags into QEMU to be later fetched by code that
>> needs to alternate between those flags when doing IO.
>> 
>> Extend the fdset matching to behave the same with the O_DIRECT flag.
>> 
>> Signed-off-by: Fabiano Rosas 
>> ---
>>  monitor/fds.c | 7 ++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>> 
>> diff --git a/monitor/fds.c b/monitor/fds.c
>> index 4ec3b7eea9..62e324fcec 100644
>> --- a/monitor/fds.c
>> +++ b/monitor/fds.c
>> @@ -420,6 +420,11 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
>>  int fd = -1;
>>  int dup_fd;
>>  int mon_fd_flags;
>> +int mask = O_ACCMODE;
>> +
>> +#ifdef O_DIRECT
>> +mask |= O_DIRECT;
>> +#endif
>>  
>>  if (mon_fdset->id != fdset_id) {
>>  continue;
>> @@ -431,7 +436,7 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
>>  return -1;
>>  }
>>  
>> -if ((flags & O_ACCMODE) == (mon_fd_flags & O_ACCMODE)) {
>> +if ((flags & mask) == (mon_fd_flags & mask)) {
>>  fd = mon_fdset_fd->fd;
>>  break;
>>  }
>
> I think I see what you wanted to do, picking out the right fd out of two
> when qemu_open_old(), which makes sense.
>
> However what happens if the mgmt app only passes in 1 fd to the fdset?  The
> issue is we have a "fallback dup()" plan right after this chunk of code:
>

I'm validating the fdset at file_parse_fdset() beforehand. If there's
anything else than 2 fds then we'll error out:

if (nfds != 2) {
    error_setg(errp, "Outgoing migration needs two fds in the fdset, "
               "got %d", nfds);
    qmp_remove_fd(*id, false, -1, NULL);
    *id = -1;
    return false;
}

> dup_fd = qemu_dup_flags(fd, flags);
> if (dup_fd == -1) {
> return -1;
> }
>
> mon_fdset_fd_dup = g_malloc0(sizeof(*mon_fdset_fd_dup));
> mon_fdset_fd_dup->fd = dup_fd;
> QLIST_INSERT_HEAD(&mon_fdset->dup_fds, mon_fdset_fd_dup, next);
>
> I think it means even if the mgmt app only passes in 1 fd (rather than 2,
> one with O_DIRECT, one without), QEMU can always successfully call
> qemu_open_old() twice for each case, even though silently the two FDs will
> actually impact on each other.  This doesn't look ideal if it's true.
>
> But I also must confess I don't really understand this code at all: we
> dup(), then we try F_SETFL on all the possible flags got passed in.
> However AFAICT due to the fact that dup()ed FDs will share "struct file" it
> means mostly all flags will be shared, except close-on-exec.  I don't ever
> see anything protecting that F_SETFL to only touch close-on-exec, I think
> it means it'll silently change file status flags for the other fd which we
> dup()ed from.  Does it mean that we have issue already with such dup() usage?

I think you're right, but I also think there's a requirement even from
this code that the fds in the fdset cannot be dup()ed. I don't see it
enforced anywhere, but maybe that's a consequence of the larger use-case
for which this feature was introduced.
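
To make the shared-flags point concrete, here is a small standalone sketch
(not QEMU code; the function name is made up for this example). It uses
O_NONBLOCK on a pipe instead of O_DIRECT on a file so it runs on any
filesystem, but both are file status flags stored in the open file
description, so the sharing behaviour being discussed is the same.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Returns 1 if a status flag set through a dup()ed fd becomes visible
 * through the original fd, 0 if it does not, -1 on any syscall error.
 */
static int demo_dup_shares_status_flags(void)
{
    int pipefd[2];
    int dup_fd, before, after;
    int ret = -1;

    if (pipe(pipefd) != 0) {
        return -1;
    }

    dup_fd = dup(pipefd[0]);
    if (dup_fd < 0) {
        goto out;
    }

    /* The original must start without O_NONBLOCK for the check to mean anything. */
    before = fcntl(pipefd[0], F_GETFL);
    if (before < 0 || (before & O_NONBLOCK)) {
        goto out_dup;
    }

    /* Change the flag through the duplicate only. */
    if (fcntl(dup_fd, F_SETFL, before | O_NONBLOCK) != 0) {
        goto out_dup;
    }

    /* Read the flags back through the original descriptor. */
    after = fcntl(pipefd[0], F_GETFL);
    if (after < 0) {
        goto out_dup;
    }
    ret = (after & O_NONBLOCK) ? 1 : 0;

out_dup:
    close(dup_fd);
out:
    close(pipefd[0]);
    close(pipefd[1]);
    return ret;
}
```

The function returns 1 on POSIX systems: F_SETFL on the duplicate changes
what the original descriptor sees, which is why a dup()ed pair cannot hold
one O_DIRECT and one non-O_DIRECT view of the same file.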

For our scenario, the open() man page says one can use kcmp() to compare
the fds and determine if they are a result of dup(). Maybe we should do
that extra check? We're defining a pretty rigid interface between QEMU
and the management layer, so not likely to break once it's written. I'm
also not sure how bad would it be to call syscall() directly from QEMU
(kcmp has no libc wrapper).
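
A minimal sketch of that kcmp() check, assuming a Linux host with kernel
headers that provide SYS_kcmp and KCMP_FILE. The helper name is invented
for illustration, and the call can legitimately fail (for example when the
kernel lacks kcmp support or a seccomp policy blocks it), so the sketch
reports that case separately instead of guessing.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/kcmp.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Returns 1 if both fds refer to the same open file description (as
 * after dup()), 0 if they do not, -1 if kcmp() failed or is unavailable.
 * Hypothetical helper; not part of QEMU.
 */
static int demo_fds_share_description(int fd_a, int fd_b)
{
    pid_t pid = getpid();
    long r = syscall(SYS_kcmp, pid, pid, KCMP_FILE,
                     (unsigned long)fd_a, (unsigned long)fd_b);

    if (r < 0) {
        return -1;
    }
    /* kcmp() returns 0 when the two objects are the same. */
    return r == 0 ? 1 : 0;
}
```

With this, validation of a two-fd fdset could reject the pair when the
helper returns 1, and decide separately what to do on -1.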

>
> Thanks,

Re: [PATCH 5/9] migration/multifd: Add direct-io support

2024-05-03 Thread Peter Xu

On Fri, May 03, 2024 at 05:54:28PM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > On Fri, Apr 26, 2024 at 11:20:38AM -0300, Fabiano Rosas wrote:
> >> When multifd is used along with mapped-ram, we can take benefit of a
> >> filesystem that supports the O_DIRECT flag and perform direct I/O in
> >> the multifd threads. This brings a significant performance improvement
> >> because direct-io writes bypass the page cache which would otherwise
> >> be thrashed by the multifd data which is unlikely to be needed again
> >> in a short period of time.
> >> 
> >> To be able to use a multifd channel opened with O_DIRECT, we must
> > >> ensure that a certain alignment is used. Filesystems usually require a
> >> block-size alignment for direct I/O. The way to achieve this is by
> >> enabling the mapped-ram feature, which already aligns its I/O properly
> >> (see MAPPED_RAM_FILE_OFFSET_ALIGNMENT at ram.c).
> >> 
> >> By setting O_DIRECT on the multifd channels, all writes to the same
> >> file descriptor need to be aligned as well, even the ones that come
> >> from outside multifd, such as the QEMUFile I/O from the main migration
> >> code. This makes it impossible to use the same file descriptor for the
> >> QEMUFile and for the multifd channels. The various flags and metadata
> >> written by the main migration code will always be unaligned by virtue
> > >> of their small size. To work around this issue, we'll require a second
> >> file descriptor to be used exclusively for direct I/O.
> >> 
> >> The second file descriptor can be obtained by QEMU by re-opening the
> >> migration file (already possible), or by being provided by the user or
> >> management application (support to be added in future patches).
> >> 
> >> Signed-off-by: Fabiano Rosas 
> >> ---
> >>  migration/file.c  | 22 +++---
> >>  migration/migration.c | 23 +++
> >>  2 files changed, 42 insertions(+), 3 deletions(-)
> >> 
> >> diff --git a/migration/file.c b/migration/file.c
> >> index 8f30999400..b9265b14dd 100644
> >> --- a/migration/file.c
> >> +++ b/migration/file.c
> >> @@ -83,17 +83,33 @@ void file_cleanup_outgoing_migration(void)
> >>  
> >>  bool file_send_channel_create(gpointer opaque, Error **errp)
> >>  {
> >> -QIOChannelFile *ioc;
> >> +QIOChannelFile *ioc = NULL;
> >>  int flags = O_WRONLY;
> >> -bool ret = true;
> >> +bool ret = false;
> >> +
> >> +if (migrate_direct_io()) {
> >> +#ifdef O_DIRECT
> >> +/*
> >> + * Enable O_DIRECT for the secondary channels. These are used
> >> + * for sending ram pages and writes should be guaranteed to be
> >> + * aligned to at least page size.
> >> + */
> >> +flags |= O_DIRECT;
> >> +#else
> >> +error_setg(errp, "System does not support O_DIRECT");
> >> +error_append_hint(errp,
> >> +  "Try disabling direct-io migration 
> >> capability\n");
> >> +goto out;
> >> +#endif
> >
> > Hopefully if we can fail migrate-set-parameters correctly always, we will
> > never trigger this error.
> >
> > I know Linux used some trick like this to even avoid such ifdefs:
> >
> >   if (qemu_has_direct_io() && migrate_direct_io()) {
> >   // reference O_DIRECT
> >   }
> >
> > So as long as qemu_has_direct_io() can return a constant "false" when
> > O_DIRECT not defined, the compiler is smart enough to ignore the O_DIRECT
> > inside the block.
> >
> > Even if it won't work, we can still avoid that error (and rely on the
> > set-parameter failure):
> >
> > #ifdef O_DIRECT
> >if (migrate_direct_io()) {
> >// reference O_DIRECT
> >}
> > #endif
> >
> > Then it should run the same, just to try making ifdefs as light as
> > possible..
> 
> Ok.
> 
> Just FYI, in v2 I'm adding direct-io to migration incoming side as well,
> so I put this logic into a helper:
> 
> static bool file_enable_direct_io(int *flags, Error **errp)
> {
> if (migrate_direct_io()) {
> #ifdef O_DIRECT
> *flags |= O_DIRECT;
> #else
> error_setg(errp, "System does not support O_DIRECT");
> error_append_hint(errp,
>   "Try disabling direct-io migration capability\n");
> return false;
> #endif
> }
> 
> return true;
> }
> 
> But I'll apply your suggestions nonetheless.

Thanks, please give it a shot; I hope it will work either way.

One thing to mention: if you want to play with the qemu_has_direct_io()
approach with no "#ifdefs", you can't keep qemu_has_direct_io() in osdep.c;
you must define it in osdep.h as a static inline function.  Otherwise I
think osdep.o is forced to include it as a real function, so callers can't
see a constant and that trick won't work.

Just try compiling without O_DIRECT and you should see.
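
A minimal sketch of that trick, not taken from the series: has_direct_io()
and direct_io_flag() are hypothetical stand-ins (the patch only declares
qemu_has_direct_io()). Note that a bare O_DIRECT reference inside the "if"
block would still need the macro to exist at compile time, so this sketch
hides it behind a second inline helper; that part is an assumption about how
one might make it build on hosts without O_DIRECT.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#endif
#include <fcntl.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for qemu_has_direct_io(). Being static inline in a
 * header lets every caller see the constant return value.
 */
static inline bool has_direct_io(void)
{
#ifdef O_DIRECT
    return true;
#else
    return false;
#endif
}

/* Hypothetical helper: the O_DIRECT bit where available, 0 elsewhere. */
static inline int direct_io_flag(void)
{
#ifdef O_DIRECT
    return O_DIRECT;
#else
    return 0;
#endif
}

/* Caller side: no #ifdef needed; the branch folds away if unsupported. */
static int channel_open_flags(bool want_direct_io)
{
    int flags = O_WRONLY;

    if (has_direct_io() && want_direct_io) {
        flags |= direct_io_flag();
    }
    return flags;
}
```

With this shape the only ifdefs live in the two small helpers, and the
callers stay free of them.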

-- 
Peter Xu

Re: [PATCH 4/9] migration: Add direct-io parameter

2024-05-03 Thread Peter Xu

On Fri, May 03, 2024 at 05:49:32PM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > On Fri, Apr 26, 2024 at 11:20:37AM -0300, Fabiano Rosas wrote:
> >> Add the direct-io migration parameter that tells the migration code to
> >> use O_DIRECT when opening the migration stream file whenever possible.
> >> 
> >> This is currently only used with the mapped-ram migration that has a
> >> clear window guaranteed to perform aligned writes.
> >> 
> >> Acked-by: Markus Armbruster 
> >> Signed-off-by: Fabiano Rosas 
> >> ---
> >>  include/qemu/osdep.h   |  2 ++
> >>  migration/migration-hmp-cmds.c | 11 +++
> >>  migration/options.c| 30 ++
> >>  migration/options.h|  1 +
> >>  qapi/migration.json| 18 +++---
> >>  util/osdep.c   |  9 +
> >>  6 files changed, 68 insertions(+), 3 deletions(-)
> >> 
> >> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> >> index c7053cdc2b..645c14a65d 100644
> >> --- a/include/qemu/osdep.h
> >> +++ b/include/qemu/osdep.h
> >> @@ -612,6 +612,8 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t len, bool exclusive);
> >>  bool qemu_has_ofd_lock(void);
> >>  #endif
> >>  
> >> +bool qemu_has_direct_io(void);
> >> +
> >>  #if defined(__HAIKU__) && defined(__i386__)
> >>  #define FMT_pid "%ld"
> >>  #elif defined(WIN64)
> >> diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
> >> index 7e96ae6ffd..8496a2b34e 100644
> >> --- a/migration/migration-hmp-cmds.c
> >> +++ b/migration/migration-hmp-cmds.c
> >> @@ -397,6 +397,13 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
> >>  monitor_printf(mon, "%s: %s\n",
> >>  MigrationParameter_str(MIGRATION_PARAMETER_MODE),
> >>  qapi_enum_lookup(&MigMode_lookup, params->mode));
> >> +
> >> +if (params->has_direct_io) {
> >> +monitor_printf(mon, "%s: %s\n",
> >> +   MigrationParameter_str(
> >> +   MIGRATION_PARAMETER_DIRECT_IO),
> >> +   params->direct_io ? "on" : "off");
> >> +}
> >
> > This will be the first parameter to optionally display here.  I think it's
> > a sign of misuse of has_direct_io field..
> >
> > IMHO has_direct_io should be best to be kept as "whether direct_io field is
> > valid" and that's all of it.  It hopefully shouldn't contain more
> > information than that, or otherwise it'll be another small challenge we
> > need to overcome when we can remove all these has_* fields, and can also be
> > easily overlooked.
> 
> I don't think I understand why we have those has_* fields. I thought my
> usage of 'params->has_direct_io = qemu_has_direct_io()' was the correct
> one, i.e. checking whether QEMU has any support for that parameter. Can
> you help me out here?

Here params is the pointer to "struct MigrationParameters", which is
defined in qapi/migration.json.  We have the "has_*" fields only because we
allow optional fields, marked with asterisks:

  { 'struct': 'MigrationParameters',
'data': { '*announce-initial': 'size',
  ...
  } }

So that's why it had better only mean "whether this field exists", because
that's how it is defined.

IIRC we (or rather, Markus) made some attempts to deduplicate those
*MigrationParameter* things, and if that succeeds we have a chance to drop
the has_* fields (in which case we simply always have them; "has_" only
really makes sense in a QMP session, to allow the user to specify just one
or a few parameters rather than all of them).
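
To illustrate the "has_*" semantics described above, here is a minimal
sketch, not QEMU code: the struct and function names are made up, and only
the has_direct_io/direct_io pair mirrors the generated QAPI fields. The flag
says only that the value was supplied; it carries no information about host
support.

```c
#include <stdbool.h>

/* Made-up miniature of a QAPI struct with one optional member. */
typedef struct DemoParams {
    bool has_direct_io; /* true only if the caller supplied direct_io */
    bool direct_io;     /* the value itself; meaningful when has_ is set */
} DemoParams;

/*
 * Apply a partial update: members the caller did not supply are left
 * untouched in dest.
 */
static void demo_params_apply(DemoParams *dest, const DemoParams *update)
{
    if (update->has_direct_io) {
        dest->direct_io = update->direct_io;
    }
}
```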

> 
> >
> > IMHO what we should do is assert has_direct_io==true here too, meanwhile...
> >
> >>  }
> >>  
> >>  qapi_free_MigrationParameters(params);
> >> @@ -690,6 +697,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
> >>  p->has_mode = true;
> >>  visit_type_MigMode(v, param, &p->mode, &err);
> >>  break;
> >> +case MIGRATION_PARAMETER_DIRECT_IO:
> >> +p->has_direct_io = true;
> >> +visit_type_bool(v, param, &p->direct_io, &err);
> >> +break;
> >>  default:
> >>  assert(0);
> >>  }
> >> diff --git a/migration/options.c b/migration/options.c
> >> index 239f5ecfb4..ae464aa4f2 100644
> >> --- a/migration/options.c
> >> +++ b/migration/options.c
> >> @@ -826,6 +826,22 @@ int migrate_decompress_threads(void)
> >>  return s->parameters.decompress_threads;
> >>  }
> >>  
> >> +bool migrate_direct_io(void)
> >> +{
> >> +MigrationState *s = migrate_get_current();
> >> +
> >> +/* For now O_DIRECT is only supported with mapped-ram */
> >> +if (!s->capabilities[MIGRATION_CAPABILITY_MAPPED_RAM]) {
> >> +return false;
> >> +}
> >> +
> >> +if (s->parameters.has_direct_io) {
> >> +return s->parameters.direct_io;
> >> +}
> >> +
> >> +return false;
> >> +}
> >> +
> >>  uint64_t migrate_downtime_limit(void)
> >>  {
> >>  Mig

[PATCH] hw/nvme: fix number of PIDs for FDP RUH update

2024-05-03 Thread Vincent Fu

The number of PIDs is in the upper 16 bits of cdw10. So we need to
right-shift by 16 bits instead of only a single bit.

Signed-off-by: Vincent Fu 
---
 hw/nvme/ctrl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 127c3d2383..e89f9f7808 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4352,7 +4352,7 @@ static uint16_t nvme_io_mgmt_send_ruh_update(NvmeCtrl *n, NvmeRequest *req)
 NvmeNamespace *ns = req->ns;
 uint32_t cdw10 = le32_to_cpu(cmd->cdw10);
 uint16_t ret = NVME_SUCCESS;
-uint32_t npid = (cdw10 >> 1) + 1;
+uint32_t npid = (cdw10 >> 16) + 1;
 unsigned int i = 0;
 g_autofree uint16_t *pids = NULL;
 uint32_t maxnpid;
-- 
2.43.0
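
For illustration only (this helper is not part of the patch, and its name is
made up), the field extraction the fix performs can be sketched as follows.
It assumes, as the patched line does, that the upper 16 bits of cdw10 hold a
zero-based count.

```c
#include <stdint.h>

/* Hypothetical helper: number of PIDs encoded in the upper 16 bits of cdw10. */
static uint32_t ruh_update_npid(uint32_t cdw10)
{
    return (cdw10 >> 16) + 1; /* the field is zero-based, hence the + 1 */
}
```

For example, cdw10 = 0x00030000 gives 4 with the corrected shift, whereas the
old ">> 1" would have produced 0x18001.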

Re: [PATCH 3/9] tests/qtest/migration: Fix file migration offset check

2024-05-03 Thread Peter Xu

On Fri, May 03, 2024 at 05:36:59PM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > On Fri, Apr 26, 2024 at 11:20:36AM -0300, Fabiano Rosas wrote:
> >> When doing file migration, QEMU accepts an offset that should be
> >> skipped when writing the migration stream to the file. The purpose of
> >> the offset is to allow the management layer to put its own metadata at
> >> the start of the file.
> >> 
> >> We have tests for this in migration-test, but only testing that the
> >> migration stream starts at the correct offset and not that it actually
> >> leaves the data intact. Unsurprisingly, there's been a bug in that
> >> area that the tests didn't catch.
> >> 
> >> Fix the tests to write some data to the offset region and check that
> >> it's actually there after the migration.
> >> 
> >> Fixes: 3dc35470c8 ("tests/qtest: migration-test: Add tests for file-based migration")
> >> Signed-off-by: Fabiano Rosas 
> >> ---
> >>  tests/qtest/migration-test.c | 70 +---
> >>  1 file changed, 65 insertions(+), 5 deletions(-)
> >> 
> >> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> >> index 5d6d8cd634..7b177686b4 100644
> >> --- a/tests/qtest/migration-test.c
> >> +++ b/tests/qtest/migration-test.c
> >> @@ -2081,6 +2081,63 @@ static void test_precopy_file(void)
> >>  test_file_common(&args, true);
> >>  }
> >>  
> >> +#ifndef _WIN32
> >> +static void file_dirty_offset_region(void)
> >> +{
> >> +#if defined(__linux__)
> >
> > Hmm, what's the case to cover when !_WIN32 && __linux__?  Can we remove one
> > layer of ifdef?
> >
> > I'm also wondering why it can't work on win32?  I thought win32 has all
> > these stuff we used here, but I may miss something.
> >
> 
> __linux__ is because of mmap, !_WIN32 is because of the passing of
> fds. We might be able to keep !_WIN32 only, I'll check.

Thanks, or simply use __linux__; we don't lose that much if we test less on
very special hosts.  It just feels a bit over-engineered to use two ifdefs
for one such test.

> 
> >> +g_autofree char *path = g_strdup_printf("%s/%s", tmpfs, FILE_TEST_FILENAME);
> >> +size_t size = FILE_TEST_OFFSET;
> >> +uintptr_t *addr, *p;
> >> +int fd;
> >> +
> >> +fd = open(path, O_CREAT | O_RDWR, 0660);
> >> +g_assert(fd != -1);
> >> +
> >> +g_assert(!ftruncate(fd, size));
> >> +
> >> +addr = mmap(NULL, size, PROT_WRITE, MAP_SHARED, fd, 0);
> >> +g_assert(addr != MAP_FAILED);
> >> +
> >> +/* ensure the skipped offset contains some data */
> >> +p = addr;
> >> +while (p < addr + FILE_TEST_OFFSET / sizeof(uintptr_t)) {
> >> +*p = (unsigned long) FILE_TEST_FILENAME;
> >
> > This is fine, but not as clear what is assigned..  I think here we assigned
> > is the pointer pointing to the binary's RO section (rather than the chars).
> 
> Haha you're right, I was assigning the FILE_TEST_OFFSET previously and
> just switched to the FILENAME without thinking. I'll fix it up.

:)
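
A small sketch of the alternative, under the assumption that a fixed pattern
word is acceptable (the constant and helper names are hypothetical, not from
the test): store a known value in each word of the region and compare
against the same value afterwards, so no pointer ever ends up in the file.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pattern; any non-zero constant would do. */
#define DEMO_PATTERN ((uintptr_t)0xa5a5a5a5u)

/* Fill the first `size` bytes (rounded down to whole words) with the pattern. */
static void demo_fill_region(uintptr_t *addr, size_t size)
{
    for (size_t i = 0; i < size / sizeof(uintptr_t); i++) {
        addr[i] = DEMO_PATTERN;
    }
}

/* Return true if every word in the region still holds the pattern. */
static bool demo_region_intact(const uintptr_t *addr, size_t size)
{
    for (size_t i = 0; i < size / sizeof(uintptr_t); i++) {
        if (addr[i] != DEMO_PATTERN) {
            return false;
        }
    }
    return true;
}
```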

> 
> > Maybe using some random numbers would be more straightforward, but no
> > strong opinions.
> >
> >> +p++;
> >> +}
> >> +
> >> +munmap(addr, size);
> >> +fsync(fd);
> >> +close(fd);
> >> +#endif
> >> +}
> >> +
> >> +static void *file_offset_start_hook(QTestState *from, QTestState *to)
> >> +{
> >> +g_autofree char *file = g_strdup_printf("%s/%s", tmpfs, FILE_TEST_FILENAME);
> >> +int src_flags = O_WRONLY;
> >> +int dst_flags = O_RDONLY;
> >> +int fds[2];
> >> +
> >> +file_dirty_offset_region();
> >> +
> >> +fds[0] = open(file, src_flags, 0660);
> >> +assert(fds[0] != -1);
> >> +
> >> +fds[1] = open(file, dst_flags, 0660);
> >> +assert(fds[1] != -1);
> >> +
> >> +qtest_qmp_fds_assert_success(from, &fds[0], 1, "{'execute': 'add-fd', "
> >> + "'arguments': {'fdset-id': 1}}");
> >> +
> >> +qtest_qmp_fds_assert_success(to, &fds[1], 1, "{'execute': 'add-fd', "
> >> + "'arguments': {'fdset-id': 1}}");
> >> +
> >> +close(fds[0]);
> >> +close(fds[1]);
> >> +
> >> +return NULL;
> >> +}
> >> +
> >>  static void file_offset_finish_hook(QTestState *from, QTestState *to,
> >>  void *opaque)
> >>  {
> >> @@ -2096,12 +2153,12 @@ static void file_offset_finish_hook(QTestState *from, QTestState *to,
> >>  g_assert(addr != MAP_FAILED);
> >>  
> >>  /*
> >> - * Ensure the skipped offset contains zeros and the migration
> >> - * stream starts at the right place.
> >> + * Ensure the skipped offset region's data has not been touched
> >> + * and the migration stream starts at the right place.
> >>   */
> >>  p = addr;
> >>  while (p < addr + FILE_TEST_OFFSET / sizeof(uintptr_t)) {
> >> -g_assert(*p == 0);
> >> +g_assert_cmpstr((char *) *p, ==, FILE_TEST_FILENAME);
> >>  p++;
> >>  }
> >>  g_assert_cmpint(cpu_to_be64(*p)

Re: [PATCH 6/9] tests/qtest/migration: Add tests for file migration with direct-io

2024-05-03 Thread Fabiano Rosas

Peter Xu  writes:

> On Fri, Apr 26, 2024 at 11:20:39AM -0300, Fabiano Rosas wrote:
>> The tests are only allowed to run in systems that know about the
>> O_DIRECT flag and in filesystems which support it.
>> 
>> Signed-off-by: Fabiano Rosas 
>
> Mostly:
>
> Reviewed-by: Peter Xu 
>
> Two trivial comments below.
>
>> ---
>>  tests/qtest/migration-helpers.c | 42 +
>>  tests/qtest/migration-helpers.h |  1 +
>>  tests/qtest/migration-test.c| 42 +
>>  3 files changed, 85 insertions(+)
>> 
>> diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
>> index ce6d6615b5..356cd4fa8c 100644
>> --- a/tests/qtest/migration-helpers.c
>> +++ b/tests/qtest/migration-helpers.c
>> @@ -473,3 +473,45 @@ void migration_test_add(const char *path, void (*fn)(void))
>>  qtest_add_data_func_full(path, test, migration_test_wrapper,
>>   migration_test_destroy);
>>  }
>> +
>> +#ifdef O_DIRECT
>> +/*
>> + * Probe for O_DIRECT support on the filesystem. Since this is used
>> + * for tests, be conservative, if anything fails, assume it's
>> + * unsupported.
>> + */
>> +bool probe_o_direct_support(const char *tmpfs)
>> +{
>> +g_autofree char *filename = g_strdup_printf("%s/probe-o-direct", tmpfs);
>> +int fd, flags = O_CREAT | O_RDWR | O_TRUNC | O_DIRECT;
>> +void *buf;
>> +ssize_t ret, len;
>> +uint64_t offset;
>> +
>> +fd = open(filename, flags, 0660);
>> +if (fd < 0) {
>> +unlink(filename);
>> +return false;
>> +}
>> +
>> +/*
>> + * Assuming 4k should be enough to satisfy O_DIRECT alignment
>> + * requirements. The migration code uses 1M to be conservative.
>> + */
>> +len = 0x10;
>> +offset = 0x10;
>> +
>> +buf = aligned_alloc(len, len);
>
> This is the first usage of aligned_alloc() in qemu.  IIUC it's just a newer
> posix_memalign(), which QEMU has one use of, and it's protected with:
>
> #if defined(CONFIG_POSIX_MEMALIGN)
> int ret;
> ret = posix_memalign(&ptr, alignment, size);
> ...
> #endif
>
> Didn't check deeper.  Just keep this in mind if you see any compilation
> issues in future CIs, or simply switch to similar pattern.
>
>> +g_assert(buf);
>> +
>> +ret = pwrite(fd, buf, len, offset);
>> +unlink(filename);
>> +g_free(buf);
>> +
>> +if (ret < 0) {
>> +return false;
>> +}
>> +
>> +return true;
>> +}
>> +#endif
>> diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
>> index 1339835698..d827e16145 100644
>> --- a/tests/qtest/migration-helpers.h
>> +++ b/tests/qtest/migration-helpers.h
>> @@ -54,5 +54,6 @@ char *find_common_machine_version(const char *mtype, const char *var1,
>>const char *var2);
>>  char *resolve_machine_version(const char *alias, const char *var1,
>>const char *var2);
>> +bool probe_o_direct_support(const char *tmpfs);
>>  void migration_test_add(const char *path, void (*fn)(void));
>>  #endif /* MIGRATION_HELPERS_H */
>> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
>> index 7b177686b4..512b7ede8b 100644
>> --- a/tests/qtest/migration-test.c
>> +++ b/tests/qtest/migration-test.c
>> @@ -2295,6 +2295,43 @@ static void test_multifd_file_mapped_ram(void)
>>  test_file_common(&args, true);
>>  }
>>  
>> +#ifdef O_DIRECT
>> +static void *migrate_mapped_ram_dio_start(QTestState *from,
>> + QTestState *to)
>> +{
>> +migrate_mapped_ram_start(from, to);
>
> This line seems redundant, migrate_multifd_mapped_ram_start() should cover that.
>

This is an artifact of another patch that adds direct-io + mapped-ram
without multifd. I'm bringing that back on v2. We were having a
discussion[1] about it in the libvirt mailing list. Having direct-io
even without multifd might still be useful for libvirt.

1- https://lists.libvirt.org/archives/list/de...@lists.libvirt.org/thread/K4BDDJDMJ22XMJEFAUE323H5S5E47VQX/

>> +migrate_set_parameter_bool(from, "direct-io", true);
>> +migrate_set_parameter_bool(to, "direct-io", true);
>> +
>> +return NULL;
>> +}
>> +
>> +static void *migrate_multifd_mapped_ram_dio_start(QTestState *from,
>> + QTestState *to)
>> +{
>> +migrate_multifd_mapped_ram_start(from, to);
>> +return migrate_mapped_ram_dio_start(from, to);
>> +}
>> +
>> +static void test_multifd_file_mapped_ram_dio(void)
>> +{
>> +g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs,
>> +   FILE_TEST_FILENAME);
>> +MigrateCommon args = {
>> +.connect_uri = uri,
>> +.listen_uri = "defer",
>> +.start_hook = migrate_multifd_mapped_ram_dio_start,
>> +};
>> +
>> +if (!probe_o_direct_support(tmpfs)) {
>> +g_test_skip("Filesystem does not

Re: [PATCH 2/9] migration: Fix file migration with fdset

2024-05-03 Thread Peter Xu

On Fri, May 03, 2024 at 04:56:08PM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > On Fri, Apr 26, 2024 at 11:20:35AM -0300, Fabiano Rosas wrote:
> >> When the migration using the "file:" URI was implemented, I don't
> >> think any of us noticed that if you pass in a file name with the
> >> format "/dev/fdset/N", this allows a file descriptor to be passed in
> >> to QEMU and that behaves just like the "fd:" URI. So the "file:"
> >> support has been added without regard for the fdset part and we got
> >> some things wrong.
> >> 
> >> The first issue is that we should not truncate the migration file if
> >> we're allowing an fd + offset. We need to leave the file contents
> >> untouched.
> >
> > I'm wondering whether we can use fallocate() instead on the ranges so that
> > we always don't open() with O_TRUNC.  Before that..  could you remind me
> > why do we need to truncate in the first place?  I definitely missed
> > something else here too.
> 
> AFAIK, just to avoid any issues if the file is pre-existing. I don't see
> the difference between O_TRUNC and fallocate in this case.

Then shall we avoid truncation altogether, leaving it all up to the user
(which is also error-prone)?

> 
> >
> >> 
> >> The second issue is that there's an expectation that QEMU removes the
> >> fd after the migration has finished. That's what the "fd:" code
> >> does. Otherwise a second migration on the same VM could attempt to
> >> provide an fdset with the same name and QEMU would reject it.
> >
> > Let me check what we do when with "fd:" and when migration completes or
> > cancels.
> >
> > IIUC it's qio_channel_file_close() that does the final cleanup work on
> > e.g. to_dst_file, right?  Then there's qemu_close(), and it has:
> >
> > /* Close fd that was dup'd from an fdset */
> > fdset_id = monitor_fdset_dup_fd_find(fd);
> > if (fdset_id != -1) {
> > int ret;
> >
> > ret = close(fd);
> > if (ret == 0) {
> > monitor_fdset_dup_fd_remove(fd);
> > }
> >
> > return ret;
> > }
> >
> > Shouldn't this done the work already?
> 
> That removes the mon_fdset_fd_dup->fd, we want to remove the
> mon_fdset_fd->fd.

From what I've read so far, when we remove the dup-fds we do one more
thing:

monitor_fdset_dup_fd_find_remove():
if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
monitor_fdset_cleanup(mon_fdset);
}

It means if we removed all the dup-fds correctly, we should also remove the
whole fdset, which includes the ->fds, IIUC.

> 
> >
> > Off topic: I think this code is over complicated too, maybe I missed
> > something, but afaict we don't need monitor_fdset_dup_fd_find at all.. we
> > simply walk the list and remove stuff..  I attach a patch at the end that I
> > tried to clean that up, just in case there's early comments.  But we can
> > ignore that so we don't get side-tracked, and focus on the direct-io
> > issues.
> 
> Well, I'm not confident touching this code. This is more than a decade
> old, I have no idea what the original motivations were. The possible
> interactions with the user via command-line (-add-fd), QMP (add-fd) and
> the monitor lifetime make me confused. Not to mention the fdset part
> being plumbed into the guts of a widely used qemu_open_internal() that
> very misleadingly presents itself as just a wrapper for open().

If we want QEMU to live long, we'll probably need to touch it at some
point.. or at least discuss it and figure things out. We pay tech debt
like this when there are no good comments / docs to refer to, as in this
case, so the earlier we take a stab at it the better, imho.

Definitely not a request to clean everything up. :) Let's see whether
others can chime in with better knowledge of the history.

> 
> >
> > Thanks,
> >
> > ===
> >
> > From 2f6b6d1224486d8ee830a7afe34738a07003b863 Mon Sep 17 00:00:00 2001
> > From: Peter Xu 
> > Date: Fri, 3 May 2024 11:27:20 -0400
> > Subject: [PATCH] monitor: Drop monitor_fdset_dup_fd_add()
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=UTF-8
> > Content-Transfer-Encoding: 8bit
> >
> > This function is not needed, one remove function should already work.
> > Clean it up.
> >
> > Here the code doesn't really care about whether we need to keep that dupfd
> > around if close() failed: when that happens something got very wrong,
> > keeping the dup_fd around the fdsets may not help that situation so far.
> >
> > Cc: Dr. David Alan Gilbert 
> > Cc: Markus Armbruster 
> > Cc: Philippe Mathieu-Daudé 
> > Cc: Paolo Bonzini 
> > Cc: Daniel P. Berrangé 
> > Signed-off-by: Peter Xu 
> > ---
> >  include/monitor/monitor.h |  1 -
> >  monitor/fds.c | 27 +--
> >  stubs/fdset.c |  5 -
> >  util/osdep.c  | 15 +--
> >  4 files changed, 6 insertions(+), 42 deletions(-)
> >
> > diff --git a/include/monitor/monitor.h b/include/monitor/mon

Re: [PATCH 5/9] migration/multifd: Add direct-io support

2024-05-03 Thread Fabiano Rosas

Peter Xu  writes:

> On Fri, Apr 26, 2024 at 11:20:38AM -0300, Fabiano Rosas wrote:
>> When multifd is used along with mapped-ram, we can take benefit of a
>> filesystem that supports the O_DIRECT flag and perform direct I/O in
>> the multifd threads. This brings a significant performance improvement
>> because direct-io writes bypass the page cache which would otherwise
>> be thrashed by the multifd data which is unlikely to be needed again
>> in a short period of time.
>> 
>> To be able to use a multifd channel opened with O_DIRECT, we must
>> ensure that a certain alignment is used. Filesystems usually require a
>> block-size alignment for direct I/O. The way to achieve this is by
>> enabling the mapped-ram feature, which already aligns its I/O properly
>> (see MAPPED_RAM_FILE_OFFSET_ALIGNMENT at ram.c).
>> 
>> By setting O_DIRECT on the multifd channels, all writes to the same
>> file descriptor need to be aligned as well, even the ones that come
>> from outside multifd, such as the QEMUFile I/O from the main migration
>> code. This makes it impossible to use the same file descriptor for the
>> QEMUFile and for the multifd channels. The various flags and metadata
>> written by the main migration code will always be unaligned by virtue
>> of their small size. To workaround this issue, we'll require a second
>> file descriptor to be used exclusively for direct I/O.
>> 
>> The second file descriptor can be obtained by QEMU by re-opening the
>> migration file (already possible), or by being provided by the user or
>> management application (support to be added in future patches).
>> 
>> Signed-off-by: Fabiano Rosas 
>> ---
>>  migration/file.c  | 22 +++---
>>  migration/migration.c | 23 +++
>>  2 files changed, 42 insertions(+), 3 deletions(-)
>> 
>> diff --git a/migration/file.c b/migration/file.c
>> index 8f30999400..b9265b14dd 100644
>> --- a/migration/file.c
>> +++ b/migration/file.c
>> @@ -83,17 +83,33 @@ void file_cleanup_outgoing_migration(void)
>>  
>>  bool file_send_channel_create(gpointer opaque, Error **errp)
>>  {
>> -QIOChannelFile *ioc;
>> +QIOChannelFile *ioc = NULL;
>>  int flags = O_WRONLY;
>> -bool ret = true;
>> +bool ret = false;
>> +
>> +if (migrate_direct_io()) {
>> +#ifdef O_DIRECT
>> +/*
>> + * Enable O_DIRECT for the secondary channels. These are used
>> + * for sending ram pages and writes should be guaranteed to be
>> + * aligned to at least page size.
>> + */
>> +flags |= O_DIRECT;
>> +#else
>> +error_setg(errp, "System does not support O_DIRECT");
>> +error_append_hint(errp,
>> +  "Try disabling direct-io migration capability\n");
>> +goto out;
>> +#endif
>
> Hopefully if we can fail migrate-set-parameters correctly always, we will
> never trigger this error.
>
> I know Linux used some trick like this to even avoid such ifdefs:
>
>   if (qemu_has_direct_io() && migrate_direct_io()) {
>   // reference O_DIRECT
>   }
>
> So as long as qemu_has_direct_io() can return a constant "false" when
> O_DIRECT not defined, the compiler is smart enough to ignore the O_DIRECT
> inside the block.
>
> Even if it won't work, we can still avoid that error (and rely on the
> set-parameter failure):
>
> #ifdef O_DIRECT
>if (migrate_direct_io()) {
>// reference O_DIRECT
>}
> #endif
>
> Then it should run the same, just to try making ifdefs as light as
> possible..

Ok.

Just FYI, in v2 I'm adding direct-io to the migration incoming side as well,
so I put this logic into a helper:

static bool file_enable_direct_io(int *flags, Error **errp)
{
if (migrate_direct_io()) {
#ifdef O_DIRECT
*flags |= O_DIRECT;
#else
error_setg(errp, "System does not support O_DIRECT");
error_append_hint(errp,
  "Try disabling direct-io migration capability\n");
return false;
#endif
}

return true;
}

But I'll apply your suggestions nonetheless.

>
>> +}
>>  
>>  ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, errp);
>>  if (!ioc) {
>> -ret = false;
>>  goto out;
>>  }
>>  
>>  multifd_channel_connect(opaque, QIO_CHANNEL(ioc));
>> +ret = true;
>>  
>>  out:
>>  /*
>> diff --git a/migration/migration.c b/migration/migration.c
>> index b5af6b5105..cb923a3f62 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -155,6 +155,16 @@ static bool migration_needs_seekable_channel(void)
>>  return migrate_mapped_ram();
>>  }
>>  
>> +static bool migration_needs_multiple_fds(void)
>
> If I suggest to rename this, would you agree? :)
>

Sure, although this is a more accurate usage than "multifd" hehe.

> I'd try with "migrate_needs_extra_fd()" or "migrate_needs_two_fds()",
> or... just to avoid "multi" + "fd" used altogether, perhaps.
>
> Other than that looks all good.
>
> Thanks,
>
>> +{
>> +/

Re: [PATCH 4/9] migration: Add direct-io parameter

2024-05-03 Thread Fabiano Rosas

Peter Xu  writes:

> On Fri, Apr 26, 2024 at 11:20:37AM -0300, Fabiano Rosas wrote:
>> Add the direct-io migration parameter that tells the migration code to
>> use O_DIRECT when opening the migration stream file whenever possible.
>> 
>> This is currently only used with the mapped-ram migration that has a
>> clear window guaranteed to perform aligned writes.
>> 
>> Acked-by: Markus Armbruster 
>> Signed-off-by: Fabiano Rosas 
>> ---
>>  include/qemu/osdep.h   |  2 ++
>>  migration/migration-hmp-cmds.c | 11 +++
>>  migration/options.c| 30 ++
>>  migration/options.h|  1 +
>>  qapi/migration.json| 18 +++---
>>  util/osdep.c   |  9 +
>>  6 files changed, 68 insertions(+), 3 deletions(-)
>> 
>> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
>> index c7053cdc2b..645c14a65d 100644
>> --- a/include/qemu/osdep.h
>> +++ b/include/qemu/osdep.h
>> @@ -612,6 +612,8 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t len, bool exclusive);
>>  bool qemu_has_ofd_lock(void);
>>  #endif
>>  
>> +bool qemu_has_direct_io(void);
>> +
>>  #if defined(__HAIKU__) && defined(__i386__)
>>  #define FMT_pid "%ld"
>>  #elif defined(WIN64)
>> diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
>> index 7e96ae6ffd..8496a2b34e 100644
>> --- a/migration/migration-hmp-cmds.c
>> +++ b/migration/migration-hmp-cmds.c
>> @@ -397,6 +397,13 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
>>  monitor_printf(mon, "%s: %s\n",
>>  MigrationParameter_str(MIGRATION_PARAMETER_MODE),
>>  qapi_enum_lookup(&MigMode_lookup, params->mode));
>> +
>> +if (params->has_direct_io) {
>> +monitor_printf(mon, "%s: %s\n",
>> +   MigrationParameter_str(
>> +   MIGRATION_PARAMETER_DIRECT_IO),
>> +   params->direct_io ? "on" : "off");
>> +}
>
> This will be the first parameter to optionally display here.  I think it's
> a sign of misuse of has_direct_io field..
>
> IMHO has_direct_io should be best to be kept as "whether direct_io field is
> valid" and that's all of it.  It hopefully shouldn't contain more
> information than that, or otherwise it'll be another small challenge we
> need to overcome when we can remove all these has_* fields, and can also be
> easily overlooked.

I don't think I understand why we have those has_* fields. I thought my
usage of 'params->has_direct_io = qemu_has_direct_io()' was the correct
one, i.e. checking whether QEMU has any support for that parameter. Can
you help me out here?

>
> IMHO what we should do is assert has_direct_io==true here too, meanwhile...
>
>>  }
>>  
>>  qapi_free_MigrationParameters(params);
>> @@ -690,6 +697,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
>>  p->has_mode = true;
>>  visit_type_MigMode(v, param, &p->mode, &err);
>>  break;
>> +case MIGRATION_PARAMETER_DIRECT_IO:
>> +p->has_direct_io = true;
>> +visit_type_bool(v, param, &p->direct_io, &err);
>> +break;
>>  default:
>>  assert(0);
>>  }
>> diff --git a/migration/options.c b/migration/options.c
>> index 239f5ecfb4..ae464aa4f2 100644
>> --- a/migration/options.c
>> +++ b/migration/options.c
>> @@ -826,6 +826,22 @@ int migrate_decompress_threads(void)
>>  return s->parameters.decompress_threads;
>>  }
>>  
>> +bool migrate_direct_io(void)
>> +{
>> +MigrationState *s = migrate_get_current();
>> +
>> +/* For now O_DIRECT is only supported with mapped-ram */
>> +if (!s->capabilities[MIGRATION_CAPABILITY_MAPPED_RAM]) {
>> +return false;
>> +}
>> +
>> +if (s->parameters.has_direct_io) {
>> +return s->parameters.direct_io;
>> +}
>> +
>> +return false;
>> +}
>> +
>>  uint64_t migrate_downtime_limit(void)
>>  {
>>  MigrationState *s = migrate_get_current();
>> @@ -1061,6 +1077,11 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
>>  params->has_zero_page_detection = true;
>>  params->zero_page_detection = s->parameters.zero_page_detection;
>>  
>> +if (s->parameters.has_direct_io) {
>> +params->has_direct_io = true;
>> +params->direct_io = s->parameters.direct_io;
>> +}
>> +
>>  return params;
>>  }
>>  
>> @@ -1097,6 +1118,7 @@ void migrate_params_init(MigrationParameters *params)
>>  params->has_vcpu_dirty_limit = true;
>>  params->has_mode = true;
>>  params->has_zero_page_detection = true;
>> +params->has_direct_io = qemu_has_direct_io();
>>  }
>>  
>>  /*
>> @@ -1416,6 +1438,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
>>  if (params->has_zero_page_detection) {
>>  dest->zero_page_detection = params->zero_page_detection;
>>  }
>> +
>> +if (params->has_direct_io) {

Re: [PATCH 3/9] tests/qtest/migration: Fix file migration offset check

2024-05-03 Thread Fabiano Rosas

Peter Xu  writes:

> On Fri, Apr 26, 2024 at 11:20:36AM -0300, Fabiano Rosas wrote:
>> When doing file migration, QEMU accepts an offset that should be
>> skipped when writing the migration stream to the file. The purpose of
>> the offset is to allow the management layer to put its own metadata at
>> the start of the file.
>> 
>> We have tests for this in migration-test, but only testing that the
>> migration stream starts at the correct offset and not that it actually
>> leaves the data intact. Unsurprisingly, there's been a bug in that
>> area that the tests didn't catch.
>> 
>> Fix the tests to write some data to the offset region and check that
>> it's actually there after the migration.
>> 
>> Fixes: 3dc35470c8 ("tests/qtest: migration-test: Add tests for file-based migration")
>> Signed-off-by: Fabiano Rosas 
>> ---
>>  tests/qtest/migration-test.c | 70 +---
>>  1 file changed, 65 insertions(+), 5 deletions(-)
>> 
>> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
>> index 5d6d8cd634..7b177686b4 100644
>> --- a/tests/qtest/migration-test.c
>> +++ b/tests/qtest/migration-test.c
>> @@ -2081,6 +2081,63 @@ static void test_precopy_file(void)
>>  test_file_common(&args, true);
>>  }
>>  
>> +#ifndef _WIN32
>> +static void file_dirty_offset_region(void)
>> +{
>> +#if defined(__linux__)
>
> Hmm, what's the case to cover when !_WIN32 && __linux__?  Can we remove one
> layer of ifdef?
>
> I'm also wondering why it can't work on win32?  I thought win32 has all
> these stuff we used here, but I may miss something.
>

__linux__ is because of mmap, !_WIN32 is because of the passing of
fds. We might be able to keep !_WIN32 only, I'll check.

>> +g_autofree char *path = g_strdup_printf("%s/%s", tmpfs, FILE_TEST_FILENAME);
>> +size_t size = FILE_TEST_OFFSET;
>> +uintptr_t *addr, *p;
>> +int fd;
>> +
>> +fd = open(path, O_CREAT | O_RDWR, 0660);
>> +g_assert(fd != -1);
>> +
>> +g_assert(!ftruncate(fd, size));
>> +
>> +addr = mmap(NULL, size, PROT_WRITE, MAP_SHARED, fd, 0);
>> +g_assert(addr != MAP_FAILED);
>> +
>> +/* ensure the skipped offset contains some data */
>> +p = addr;
>> +while (p < addr + FILE_TEST_OFFSET / sizeof(uintptr_t)) {
>> +*p = (unsigned long) FILE_TEST_FILENAME;
>
> This is fine, but not as clear what is assigned..  I think here we assigned
> is the pointer pointing to the binary's RO section (rather than the chars).

Haha you're right, I was assigning the FILE_TEST_OFFSET previously and
just switched to the FILENAME without thinking. I'll fix it up.

> Maybe using some random numbers would be more straightforward, but no
> strong opinions.
>
>> +p++;
>> +}
>> +
>> +munmap(addr, size);
>> +fsync(fd);
>> +close(fd);
>> +#endif
>> +}
>> +
>> +static void *file_offset_start_hook(QTestState *from, QTestState *to)
>> +{
>> +g_autofree char *file = g_strdup_printf("%s/%s", tmpfs, FILE_TEST_FILENAME);
>> +int src_flags = O_WRONLY;
>> +int dst_flags = O_RDONLY;
>> +int fds[2];
>> +
>> +file_dirty_offset_region();
>> +
>> +fds[0] = open(file, src_flags, 0660);
>> +assert(fds[0] != -1);
>> +
>> +fds[1] = open(file, dst_flags, 0660);
>> +assert(fds[1] != -1);
>> +
>> +qtest_qmp_fds_assert_success(from, &fds[0], 1, "{'execute': 'add-fd', "
>> + "'arguments': {'fdset-id': 1}}");
>> +
>> +qtest_qmp_fds_assert_success(to, &fds[1], 1, "{'execute': 'add-fd', "
>> + "'arguments': {'fdset-id': 1}}");
>> +
>> +close(fds[0]);
>> +close(fds[1]);
>> +
>> +return NULL;
>> +}
>> +
>>  static void file_offset_finish_hook(QTestState *from, QTestState *to,
>>  void *opaque)
>>  {
>> @@ -2096,12 +2153,12 @@ static void file_offset_finish_hook(QTestState 
>> *from, QTestState *to,
>>  g_assert(addr != MAP_FAILED);
>>  
>>  /*
>> - * Ensure the skipped offset contains zeros and the migration
>> - * stream starts at the right place.
>> + * Ensure the skipped offset region's data has not been touched
>> + * and the migration stream starts at the right place.
>>   */
>>  p = addr;
>>  while (p < addr + FILE_TEST_OFFSET / sizeof(uintptr_t)) {
>> -g_assert(*p == 0);
>> +g_assert_cmpstr((char *) *p, ==, FILE_TEST_FILENAME);
>>  p++;
>>  }
>>  g_assert_cmpint(cpu_to_be64(*p) >> 32, ==, QEMU_VM_FILE_MAGIC);
>> @@ -2113,17 +2170,18 @@ static void file_offset_finish_hook(QTestState 
>> *from, QTestState *to,
>>  
>>  static void test_precopy_file_offset(void)
>>  {
>> -g_autofree char *uri = g_strdup_printf("file:%s/%s,offset=%d", tmpfs,
>> -   FILE_TEST_FILENAME,
>> +g_autofree char *uri = g_strdup_printf("file:/dev/fdset/1,offset=%d",
>> FILE_TEST_OFFSET);
>
> Do we want to

RE: [PATCH V8 3/8] hw/acpi: Update ACPI GED framework to support vCPU Hotplug

2024-05-03 Thread Salil Mehta via

Hi Vishnu,

>  From: Vishnu Pajjuri  
>  Sent: Thursday, April 4, 2024 3:01 PM
>  To: Salil Mehta ; qemu-devel@nongnu.org; 
> qemu-...@nongnu.org
>  
>  Hi Salil,
>>  On 12-03-2024 07:29, Salil Mehta wrote:
>>  ACPI GED (as described in the ACPI 6.4 spec) uses an interrupt listed in the
>>  _CRS object of GED to notify OSPM about an event. OSPM then demultiplexes the
>>  notified event by evaluating the ACPI _EVT method to determine the type of
>>  event. Use ACPI GED to also notify the guest kernel about any CPU hot(un)plug
>>  events.
>>  
>>  ACPI CPU hotplug related initialization should only happen if ACPI_CPU_HOTPLUG
>>  support has been enabled for the particular architecture. Add a
>>  cpu_hotplug_hw_init() stub to avoid a compilation break.
>>  
>>  Co-developed-by: Keqian Zhu mailto:zhukeqi...@huawei.com
>>  Signed-off-by: Keqian Zhu mailto:zhukeqi...@huawei.com
>>  Signed-off-by: Salil Mehta mailto:salil.me...@huawei.com
>>  Reviewed-by: Jonathan Cameron mailto:jonathan.came...@huawei.com
>>  Reviewed-by: Gavin Shan mailto:gs...@redhat.com
>>  Reviewed-by: David Hildenbrand mailto:da...@redhat.com
>>  Reviewed-by: Shaoqin Huang mailto:shahu...@redhat.com
>>  Tested-by: Vishnu Pajjuri mailto:vis...@os.amperecomputing.com
>>  Tested-by: Xianglai Li mailto:lixiang...@loongson.cn
>>  Tested-by: Miguel Luis mailto:miguel.l...@oracle.com
>>  ---
>>   hw/acpi/acpi-cpu-hotplug-stub.c        |  6 ++
>>   hw/acpi/generic_event_device.c         | 17 +
>>   include/hw/acpi/generic_event_device.h |  4 
>>   3 files changed, 27 insertions(+)
>>  
>>  diff --git a/hw/acpi/acpi-cpu-hotplug-stub.c 
>> b/hw/acpi/acpi-cpu-hotplug-stub.c
>>  index 3fc4b14c26..c6c61bb9cd 100644
>>  --- a/hw/acpi/acpi-cpu-hotplug-stub.c
>>  +++ b/hw/acpi/acpi-cpu-hotplug-stub.c
>>  @@ -19,6 +19,12 @@ void legacy_acpi_cpu_hotplug_init(MemoryRegion *parent, 
>> Object *owner,
>>   return;
>>   }
>>   
>>  +void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
>>  + CPUHotplugState *state, hwaddr base_addr)
>>  +{
>>  +return;
>>  +}
>>  +
>>   void acpi_cpu_ospm_status(CPUHotplugState *cpu_st, ACPIOSTInfoList ***list)
>>   {
>>   return;
>>  diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
>>  index 2d6e91b124..35a71505a5 100644
>>  --- a/hw/acpi/generic_event_device.c
>>  +++ b/hw/acpi/generic_event_device.c
>>  @@ -12,6 +12,7 @@
>>   #include "qemu/osdep.h"
>>   #include "qapi/error.h"
>>   #include "hw/acpi/acpi.h"
>>  +#include "hw/acpi/cpu.h"
>>   #include "hw/acpi/generic_event_device.h"
>>   #include "hw/irq.h"
>>   #include "hw/mem/pc-dimm.h"
>>  @@ -25,6 +26,7 @@ static const uint32_t ged_supported_events[] = {
>>   ACPI_GED_MEM_HOTPLUG_EVT,
>>   ACPI_GED_PWR_DOWN_EVT,
>>   ACPI_GED_NVDIMM_HOTPLUG_EVT,
>>  +ACPI_GED_CPU_HOTPLUG_EVT,
>>   };
>>   
>>   /*
>>  @@ -234,6 +236,8 @@ static void acpi_ged_device_plug_cb(HotplugHandler 
>> *hotplug_dev,
>>   } else {
>>   acpi_memory_plug_cb(hotplug_dev, &s->memhp_state, dev, errp);
>>   }
>>  +} else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
>>  +acpi_cpu_plug_cb(hotplug_dev, &s->cpuhp_state, dev, errp);
>>   } else {
>>   error_setg(errp, "virt: device plug request for unsupported device"
>>  " type: %s", object_get_typename(OBJECT(dev)));
>>  @@ -248,6 +252,8 @@ static void acpi_ged_unplug_request_cb(HotplugHandler 
>> *hotplug_dev,
>>   if ((object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) &&
>>  !(object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM)))) {
>>   acpi_memory_unplug_request_cb(hotplug_dev, &s->memhp_state, dev, errp);
>>  +} else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
>>  +acpi_cpu_unplug_request_cb(hotplug_dev, &s->cpuhp_state, dev, errp);
>>   } else {
>>   error_setg(errp, "acpi: device unplug request for unsupported device"
>>  " type: %s", object_get_typename(OBJECT(dev)));
>>  @@ -261,6 +267,8 @@ static void acpi_ged_unplug_cb(HotplugHandler 
>> *hotplug_dev,
>>   
>>   if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
>>   acpi_memory_unplug_cb(&s->memhp_state, dev, errp);
>>  +} else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
>>  +acpi_cpu_unplug_cb(&s->cpuhp_state, dev, errp);
>>   } else {
>>   error_setg(errp, "acpi: device unplug for unsupported device"
>>  " type: %s", object_get_typename(OBJECT(dev)));
>>  @@ -272,6 +280,7 @@ static void acpi_ged_ospm_status(AcpiDeviceIf *adev, 
>> ACPIOSTInfoList ***list)
>>   AcpiGedState *s = ACPI_GED(adev);
>>   
>>   acpi_memory_ospm_status(&s->memhp_state, list);
>>  +acpi_cpu_ospm_status(&s->cpuhp_state, list);
>>   }
>>   
>>   static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
>>  @@ -286,6 +295,8 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, 
>> AcpiEventStatusBits ev)
>>   sel = ACPI_GED_PWR_DOWN_EVT;
>>   } else if (ev & ACPI_NVDIMM_HOTPLUG_STATU

Re: More doc updates needed for new migrate argument @channels

2024-05-03 Thread Fabiano Rosas

Peter Xu  writes:

> On Fri, May 03, 2024 at 09:13:27AM -0400, Steven Sistare wrote:
>> On 5/3/2024 8:49 AM, Fabiano Rosas wrote:
>> > Markus Armbruster  writes:
>> > 
>> > > Commit 074dbce5fcce (migration: New migrate and migrate-incoming
>> > > argument 'channels') and its fixup commit 57fd4b4e1075 made command
>> > > migrate argument @uri optional and mutually exclusive with @channels.
>> > > 
>> > > But documentation still talks about "the migration URI" in several
>> > > places.
>> > > 
>> > > * MigrationCapability @mapped-ram:
>> > > 
>> > >  # @mapped-ram: Migrate using fixed offsets in the migration file for
>> > >  # each RAM page.  Requires a migration URI that supports 
>> > > seeking,
>> > >  # such as a file.  (since 9.0)
>> > > 
>> > >I guess it requires the migration destination (@uri or @channels) to
>> > >support seeking.
>> > 
>> > This is ambiguous. The migration destination is usually the VM at the
>> > end of the migration, not the medium to where the migration stream is
>> > written to.
>> > 
>> > One option is to just add the mention to channel all around: "Requires a
>> > migration URI or channel that supports seeking".
>> > 
>> > > 
>> > > * MigMode @cpr-reboot:
>> > > 
>> > >  # @cpr-reboot: The migrate command stops the VM and saves state to 
>> > > the
>> > >  # URI.  After quitting QEMU, the user resumes by running QEMU
>> > >  # -incoming.
>> > >  #
>> > >  # This mode allows the user to quit QEMU, optionally update and
>> > >  # reboot the OS, and restart QEMU.  If the user reboots, the URI
>> > >  # must persist across the reboot, such as by using a file.
>> > > 
>> > >I guess this saves to the migration destination (@uri or @channels).
>> > > 
>> > > * Migration Parameter @tls-hostname and its two buddies
>> > > 
>> > >  # @tls-hostname: migration target's hostname for validating the
>> > >  # server's x509 certificate identity.  If empty, QEMU will use 
>> > > the
>> > >  # hostname from the migration URI, if any.  A non-empty value is
>> > >  # required when using x509 based TLS credentials and the 
>> > > migration
>> > >  # URI does not include a hostname, such as fd: or exec: based
>> > >  # migration.  (Since 2.7)
>> > >  #
>> > >  # Note: empty value works only since 2.9.
>> > > 
>> > >What's the default when we're using @channels instead of @uri?
>> > 
>> > The same, both URI and channels get put in the same structure before
>> > taking the hostname.
>> > 
>> > > 
>> > > * migrate-recover
>> > > 
>> > >  ##
>> > >  # @migrate-recover:
>> > >  #
>> > >  # Provide a recovery migration stream URI.
>> > >  #
>> > >  # @uri: the URI to be used for the recovery of migration stream.
>> > > 
>> > >Should this command be extended to accept @channels?
>> > 
>> > Yes.
>> > 
>> > > 
>> > > * docs/devel/migration/CPR.rst
>> > > 
>> > >Didn't look closely.  Let's discuss the others first, then come back
>> > >to this one.
>> > > 
>> > > * HMP migrate
>> > > 
>> > >Fine, because this still only supports URI syntax.
>> > > 
>> > > * CLI option "-incoming defer"
>> > > 
>> > >  "-incoming defer\n" \
>> > >  "wait for the URI to be specified via 
>> > > migrate_incoming\n",
>> > > 
>> > >and
>> > > 
>> > >  ``-incoming defer``
>> > >  Wait for the URI to be specified via migrate\_incoming. The 
>> > > monitor
>> > >  can be used to change settings (such as migration parameters) 
>> > > prior
>> > >  to issuing the migrate\_incoming to allow the migration to 
>> > > begin.
>> > > 
>> > >I figure we should call it "the migration source" instead of "the URI"
>> > >here.
>> > 
>> > I think it's worse. We need a proper way to refer exclusively to "the
>> > thing that the user passes as an argument to the migrate command".
>> 
>> Agreed.  My thoughts:
>> 
>> "migration URI" -> "migration URI/channel"
>> or
>> "migration URI" -> "migration stream"
>
> "stream" might imply more on the protocol itself to me, e.g. how the
> migration headers are defined, rather than the entity / fabric we use to
> send the data stream?
>
> Maybe we can simply do s/URI/channel/? As "channel" can also imply the URI
> in this case as yet another (old) format to specify the migration channels.
> It's like we always use QIOChannels underneath whatever way we specify the
> channels (either URI or the new "channels" API).

I'm fine with just "channel". "URI/channel" is ok as well.

Do we intend to deprecate the URI usage via QMP? Or are we going to support
both ways indefinitely? Keep URI for HMP only?

If the MigrationAddress API ends up being the only one, then maybe it
makes sense to stop using URI all over.

>
> I also copied qemu-devel starting from now.
>
> Thanks,

RE: [PATCH V8 3/8] hw/acpi: Update ACPI GED framework to support vCPU Hotplug

2024-05-03 Thread Salil Mehta via

Hello,

Sorry, I missed this earlier.

>  From: Zhao Liu 
>  Sent: Wednesday, March 13, 2024 6:14 AM
>  To: Salil Mehta 
>  
>  Hi Salil,
>  
>  It seems my comment [1] in v7 was missed, but I still hit the same issue. Pls
>  let me paste the previous comment here again.
>  
>  [1]: https://lore.kernel.org/qemu-devel/zxcqp32ggifvu...@intel.com/

Yes, I have this in mind.

>  
>  [snip]
>  
>  > @@ -400,6 +411,12 @@ static void acpi_ged_initfn(Object *obj)
>  >  memory_region_init_io(&ged_st->regs, obj, &ged_regs_ops, ged_st,
>  >TYPE_ACPI_GED "-regs", ACPI_GED_REG_COUNT);
>  >  sysbus_init_mmio(sbd, &ged_st->regs);
>  > +
>  > +memory_region_init(&s->container_cpuhp, OBJECT(dev), "cpuhp container",
>  > +   ACPI_CPU_HOTPLUG_REG_LEN);
>  > +sysbus_init_mmio(SYS_BUS_DEVICE(dev), &s->container_cpuhp);
>  > +cpu_hotplug_hw_init(&s->container_cpuhp, OBJECT(dev),
>  > +&s->cpuhp_state, 0);
>  >  }
>  >
>  
>  I find this cpu_hotplug_hw_init() can still cause qtest errors (for v8) on x86
>  platforms as you mentioned in v6:
>  https://lore.kernel.org/qemu-devel/15e70616-6abb-63a4-17d0-820f4a254...@opnsrc.net/T/#m108f102b2fe92b7dd7218f2f942f7b233a9d6af3
>  
>  IIUC, microvm machine has its own 'possible_cpus_arch_ids' and that is
>  inherited from its parent x86 machine.
>  
>  The above error is because device-introspect-test sets the none-machine:
>  
>  # starting QEMU: exec ./qemu-system-i386 -qtest unix:/tmp/qtest-3094820.sock
>  -qtest-log /dev/null -chardev socket,path=/tmp/qtest-3094820.qmp,id=char0
>  -mon chardev=char0,mode=control -display none -audio none -nodefaults
>  -machine none -accel qtest
>  
>  So what about just checking mc->possible_cpu_arch_ids instead of an
>  assert in cpu_hotplug_hw_init()?
>  
>  diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
>  index 4b24a2500361..303f1f1f57bc 100644
>  --- a/hw/acpi/cpu.c
>  +++ b/hw/acpi/cpu.c
>  @@ -221,7 +221,10 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
>   const CPUArchIdList *id_list;
>   int i;
>  
>  -assert(mc->possible_cpu_arch_ids);
>  +if (!mc->possible_cpu_arch_ids) {
>  +return;
>  +}
>  +


Yes, we can do this with some debug print or trace maybe.


>   id_list = mc->possible_cpu_arch_ids(machine);
>   state->dev_count = id_list->len;
>   state->devs = g_new0(typeof(*state->devs), state->dev_count);
>  
>  This check seems to be acceptable in the general code path? Not all
>  machines have possible_cpu_arch_ids, after all.

True. BTW, have you tested this with Qtest?

Thanks
Salil.

Re: [PATCH] tests/avocado: update sunxi kernel from armbian to 6.6.16

2024-05-03 Thread Niek Linnenbank

On Tue, Apr 30, 2024 at 4:12 PM Peter Maydell wrote:

> On Mon, 29 Apr 2024 at 21:40, Niek Linnenbank wrote:
> >
> > Hi Peter, Strahinja,
> >
> > I can confirm that the orangepi-pc and cubieboard based tests are
> > working OK using the newer kernel 6.6.16:
> >
> >   $ ARMBIAN_ARTIFACTS_CACHED=yes AVOCADO_ALLOW_LARGE_STORAGE=yes
> >     ./build/pyvenv/bin/avocado --show=app,console run -t machine:orangepi-pc
> >     -t machine:cubieboard tests/avocado/boot_linux_console.py
> >   ...
> >   RESULTS: PASS 7 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 1
> >   JOB TIME   : 177.65 s
> >
> > So for this patch:
> > Reviewed-by: Niek Linnenbank
> > Tested-by: Niek Linnenbank
>
> Great, thanks. (I'll put this patch into an upcoming arm pullreq.)
>
> > About the BootLinuxConsole.test_arm_orangepi_bionic_20_08 test, I'd be
> > happy to provide a patch to revive that test.
> > Since that test is no longer working without having the image available,
> > this could also be a good moment to reconsider if armbian is really the
> > best input for testing the orangepi-pc board. The image is relatively
> > large and slow compared to other images, like the two openwrt based tests
> > for cubieboard and bpim2u.
> >
> > After some searching I've found that Openwrt also has orangepi-pc
> > support:
> >   https://openwrt.org/toh/xunlong/orange_pi_pc
> >
> > That image works fine with our emulated orangepi-pc board:
> >
> > $ qemu-system-arm -M orangepi-pc -sd
> >   openwrt-23.05.0-sunxi-cortexa7-xunlong_orangepi-pc-ext4-sdcard.img
> >   -nographic
>
> > Using openwrt for the orangepi-pc test instead of armbian also gives
> > some consistency between the various tests, to some degree. What are
> > your opinions on this?
>
> Yeah, seems reasonable. My main thing to think about would be
> that to understand what extra coverage this gives us that we
> don't already have (there's no point running a ton of tests
> which all amount to "boot a Linux kernel to a shell prompt").
> It looks like what we get from this one is that we are testing
> the "boot off an SD card image via u-boot" flow -- is that right?
>

Yes, correct.
Okay, I'll try to find some time in the coming days to prepare a patch that
replaces test_arm_orangepi_bionic_20_08 with a test that uses openwrt
instead.

Niek



>
> thanks
> -- PMM
>


-- 
Niek Linnenbank

Re: [PATCH 2/9] migration: Fix file migration with fdset

2024-05-03 Thread Fabiano Rosas

Peter Xu  writes:

> On Fri, Apr 26, 2024 at 11:20:35AM -0300, Fabiano Rosas wrote:
>> When the migration using the "file:" URI was implemented, I don't
>> think any of us noticed that if you pass in a file name with the
>> format "/dev/fdset/N", this allows a file descriptor to be passed in
>> to QEMU and that behaves just like the "fd:" URI. So the "file:"
>> support has been added without regard for the fdset part and we got
>> some things wrong.
>> 
>> The first issue is that we should not truncate the migration file if
>> we're allowing an fd + offset. We need to leave the file contents
>> untouched.
>
> I'm wondering whether we can use fallocate() instead on the ranges so that
> we always don't open() with O_TRUNC.  Before that..  could you remind me
> why do we need to truncate in the first place?  I definitely missed
> something else here too.

AFAIK, just to avoid any issues if the file is pre-existing. I don't see
the difference between O_TRUNC and fallocate in this case.
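
As a small illustration of the first issue (an fd plus offset must leave earlier file contents alone), here is a hedged sketch using plain POSIX calls. write_at_offset() and file_starts_with() are hypothetical helpers for this example only and say nothing about how QEMU's file channel is implemented.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write 'len' bytes at 'offset' without truncating the file.
 * Returns 0 on success, -1 on failure.
 */
static int write_at_offset(const char *path, off_t offset,
                           const void *data, size_t len)
{
    int fd, rc;

    assert(path != NULL && data != NULL);

    fd = open(path, O_WRONLY | O_CREAT, 0660); /* deliberately no O_TRUNC */
    if (fd < 0) {
        return -1;
    }
    rc = pwrite(fd, data, len, offset) == (ssize_t)len ? 0 : -1;
    if (close(fd) != 0) {
        rc = -1;
    }
    return rc;
}

/* Return 1 if the file begins with the given bytes, 0 otherwise. */
static int file_starts_with(const char *path, const void *expect, size_t len)
{
    char buf[64];
    int fd, ok;

    if (len > sizeof(buf)) {
        return 0;
    }
    fd = open(path, O_RDONLY);
    if (fd < 0) {
        return 0;
    }
    ok = read(fd, buf, len) == (ssize_t)len && memcmp(buf, expect, len) == 0;
    close(fd);
    return ok;
}
```

If the file already holds eight bytes and four more are written at offset 8, the first eight bytes are still there afterwards. Opening with O_TRUNC instead would have discarded them before the write.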

>
>> 
>> The second issue is that there's an expectation that QEMU removes the
>> fd after the migration has finished. That's what the "fd:" code
>> does. Otherwise a second migration on the same VM could attempt to
>> provide an fdset with the same name and QEMU would reject it.
>
> Let me check what we do when with "fd:" and when migration completes or
> cancels.
>
> IIUC it's qio_channel_file_close() that does the final cleanup work on
> e.g. to_dst_file, right?  Then there's qemu_close(), and it has:
>
> /* Close fd that was dup'd from an fdset */
> fdset_id = monitor_fdset_dup_fd_find(fd);
> if (fdset_id != -1) {
> int ret;
>
> ret = close(fd);
> if (ret == 0) {
> monitor_fdset_dup_fd_remove(fd);
> }
>
> return ret;
> }
>
> Shouldn't this done the work already?

That removes the mon_fdset_fd_dup->fd, we want to remove the
mon_fdset_fd->fd.

>
> Off topic: I think this code is over complicated too, maybe I missed
> something, but afaict we don't need monitor_fdset_dup_fd_find at all.. we
> simply walk the list and remove stuff..  I attach a patch at the end that I
> tried to clean that up, just in case there's early comments.  But we can
> ignore that so we don't get side-tracked, and focus on the direct-io
> issues.

Well, I'm not confident touching this code. This is more than a decade
old, I have no idea what the original motivations were. The possible
interactions with the user via command-line (-add-fd), QMP (add-fd) and
the monitor lifetime make me confused. Not to mention the fdset part
being plumbed into the guts of a widely used qemu_open_internal() that
very misleadingly presents itself as just a wrapper for open().

>
> Thanks,
>
> ===
>
> From 2f6b6d1224486d8ee830a7afe34738a07003b863 Mon Sep 17 00:00:00 2001
> From: Peter Xu 
> Date: Fri, 3 May 2024 11:27:20 -0400
> Subject: [PATCH] monitor: Drop monitor_fdset_dup_fd_add()
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> This function is not needed, one remove function should already work.
> Clean it up.
>
> Here the code doesn't really care about whether we need to keep that dupfd
> around if close() failed: when that happens something got very wrong,
> keeping the dup_fd around the fdsets may not help that situation so far.
>
> Cc: Dr. David Alan Gilbert 
> Cc: Markus Armbruster 
> Cc: Philippe Mathieu-Daudé 
> Cc: Paolo Bonzini 
> Cc: Daniel P. Berrangé 
> Signed-off-by: Peter Xu 
> ---
>  include/monitor/monitor.h |  1 -
>  monitor/fds.c | 27 +--
>  stubs/fdset.c |  5 -
>  util/osdep.c  | 15 +--
>  4 files changed, 6 insertions(+), 42 deletions(-)
>
> diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
> index 965f5d5450..fd9b3f538c 100644
> --- a/include/monitor/monitor.h
> +++ b/include/monitor/monitor.h
> @@ -53,7 +53,6 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool has_fdset_id, 
> int64_t fdset_id,
>  const char *opaque, Error **errp);
>  int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags);
>  void monitor_fdset_dup_fd_remove(int dup_fd);
> -int64_t monitor_fdset_dup_fd_find(int dup_fd);
>  
>  void monitor_register_hmp(const char *name, bool info,
>void (*cmd)(Monitor *mon, const QDict *qdict));
> diff --git a/monitor/fds.c b/monitor/fds.c
> index d86c2c674c..d5aecfb70e 100644
> --- a/monitor/fds.c
> +++ b/monitor/fds.c
> @@ -458,7 +458,7 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
>  #endif
>  }
>  
> -static int64_t monitor_fdset_dup_fd_find_remove(int dup_fd, bool remove)
> +void monitor_fdset_dup_fd_remove(int dup_fd)
>  {
>  MonFdset *mon_fdset;
>  MonFdsetFd *mon_fdset_fd_dup;
> @@ -467,31 +467,14 @@ static int64_t monitor_fdset_dup_fd_find_remove(int 
> dup_fd, bool remove)
>  QLIST_FOREACH(mon_fdset, &mon_fdsets, next) {
>  QLIST_FO

RE: [PATCH V8 7/8] gdbstub: Add helper function to unregister GDB register space

2024-05-03 Thread Salil Mehta via

Hi Vishnu,
Sorry for the delay in reply. Still catching up.

>  From: Vishnu Pajjuri  
>  Sent: Thursday, April 4, 2024 3:02 PM
>  To: Salil Mehta ; qemu-devel@nongnu.org; 
> qemu-...@nongnu.org
>  
>  Hi Salil,
>  On 12-03-2024 07:29, Salil Mehta wrote:
>>  Add a common function to help unregister the GDB register space. This is
>>  done in the context of CPU unrealization.
>>  
>>  Signed-off-by: Salil Mehta mailto:salil.me...@huawei.com
>>  Tested-by: Vishnu Pajjuri mailto:vis...@os.amperecomputing.com
>>  Reviewed-by: Gavin Shan mailto:gs...@redhat.com
>>  Tested-by: Xianglai Li mailto:lixiang...@loongson.cn
>>  Tested-by: Miguel Luis mailto:miguel.l...@oracle.com
>>  Reviewed-by: Shaoqin Huang mailto:shahu...@redhat.com
>>  ---
>>   gdbstub/gdbstub.c      | 12 
>>   include/exec/gdbstub.h |  6 ++
>>   2 files changed, 18 insertions(+)
>>  
>>  diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
>>  index 17efcae0d0..a8449dc309 100644
>>  --- a/gdbstub/gdbstub.c
>>  +++ b/gdbstub/gdbstub.c
>>  @@ -615,6 +615,18 @@ void gdb_register_coprocessor(CPUState *cpu,
>>   }
>>   }
>>   
>>  +void gdb_unregister_coprocessor_all(CPUState *cpu)
>>  +{
>>  +/*
>>  + * Safe to nuke everything. GDBRegisterState::xml is static const char so
>>  + * it won't be freed
>>  + */
>>  +g_array_free(cpu->gdb_regs, true);
>>  +
>>  +cpu->gdb_regs = NULL;
>>  +cpu->gdb_num_g_regs = 0;
>  Likewise, you may need to set gdb_num_regs to zero as well.


Sure, thanks.


>>  +}
>>  +
>>   static void gdb_process_breakpoint_remove_all(GDBProcess *p)
>>   {
>>   CPUState *cpu = gdb_get_first_cpu_in_process(p);
>>  diff --git a/include/exec/gdbstub.h b/include/exec/gdbstub.h
>>  index eb14b91139..249d4d4bc8 100644
>>  --- a/include/exec/gdbstub.h
>>  +++ b/include/exec/gdbstub.h
>>  @@ -49,6 +49,12 @@ void gdb_register_coprocessor(CPUState *cpu,
>>                                 gdb_get_reg_cb get_reg, gdb_set_reg_cb set_reg,
>>                                 const GDBFeature *feature, int g_pos);
>>   
>>  +/**
>>  + * gdb_unregister_coprocessor_all() - unregisters supplemental set of registers
>>  + * @cpu - the CPU associated with registers
>>  + */
>>  +void gdb_unregister_coprocessor_all(CPUState *cpu);
>>  +
>>   /**
>>* gdbserver_start: start the gdb server
>>* @port_or_device: connection spec for gdb
>>  Otherwise, Looks good to me.  Feel free to add
>>  Reviewed-by: "Vishnu Pajjuri" mailto:vis...@os.amperecomputing.com

Thanks
Salil.


>>  Regards,
>>  -Vishnu

Re: [PATCH v2 7/7] target/sparc: Split out do_ms16b

2024-05-03 Thread Philippe Mathieu-Daudé


On 3/5/24 21:11, Philippe Mathieu-Daudé wrote:

On 2/5/24 18:55, Richard Henderson wrote:

The unit operation for fmul8x16 and friends is described in the
manual as "MS16b".  Split that out for clarity.  Improve rounding
with an unconditional addition of 0.5 as a fixed-point integer.

Signed-off-by: Richard Henderson 
---
  target/sparc/vis_helper.c | 78 ---
  1 file changed, 24 insertions(+), 54 deletions(-)



@@ -150,23 +138,14 @@ uint64_t helper_fmul8x16a(uint32_t src1, int32_t src2)

  uint64_t helper_fmul8sux16(uint64_t src1, uint64_t src2)
  {
  VIS64 s, d;
-    uint32_t tmp;
  s.ll = src1;
  d.ll = src2;
-#define PMUL(r)                                                         \
-    tmp = (int32_t)d.VIS_SW64(r) * ((int32_t)s.VIS_SW64(r) >> 8);       \
-    if ((tmp & 0xff) > 0x7f) {                                          \
-        tmp += 0x100;                                                   \
-    }                                                                   \
-    d.VIS_W64(r) = tmp >> 8;
-
-    PMUL(0);
-    PMUL(1);
-    PMUL(2);
-    PMUL(3);
-#undef PMUL
+    d.VIS_W64(0) = do_ms16b(s.VIS_SB64(1), d.VIS_SW64(0));


s.VIS_SB64(1) = upper bit, OK.


I meant "bits" (plural)!




+    d.VIS_W64(1) = do_ms16b(s.VIS_SB64(3), d.VIS_SW64(1));
+    d.VIS_W64(2) = do_ms16b(s.VIS_SB64(5), d.VIS_SW64(2));
+    d.VIS_W64(3) = do_ms16b(s.VIS_SB64(7), d.VIS_SW64(3));
  return d.ll;
  }
@@ -174,23 +153,14 @@ uint64_t helper_fmul8sux16(uint64_t src1, uint64_t src2)

  uint64_t helper_fmul8ulx16(uint64_t src1, uint64_t src2)
  {
  VIS64 s, d;
-    uint32_t tmp;
  s.ll = src1;
  d.ll = src2;
-#define PMUL(r)                                                         \
-    tmp = (int32_t)d.VIS_SW64(r) * ((uint32_t)s.VIS_B64(r * 2));        \
-    if ((tmp & 0xff) > 0x7f) {                                          \
-        tmp += 0x100;                                                   \
-    }                                                                   \
-    d.VIS_W64(r) = tmp >> 8;
-
-    PMUL(0);
-    PMUL(1);
-    PMUL(2);
-    PMUL(3);
-#undef PMUL
+    d.VIS_W64(0) = do_ms16b(s.VIS_B64(0), d.VIS_SW64(0));


s.VIS_B64(0) for lower bit, OK.


Ditto.




+    d.VIS_W64(1) = do_ms16b(s.VIS_B64(2), d.VIS_SW64(1));
+    d.VIS_W64(2) = do_ms16b(s.VIS_B64(4), d.VIS_SW64(2));
+    d.VIS_W64(3) = do_ms16b(s.VIS_B64(6), d.VIS_SW64(3));
  return d.ll;
  }


Maybe add a comment for high/low bits in fmul8sux16/fmul8ulx16,
as it was not obvious at first. Otherwise,

Reviewed-by: Philippe Mathieu-Daudé

Re: More doc updates needed for new migrate argument @channels

2024-05-03 Thread Peter Xu

On Fri, May 03, 2024 at 09:13:27AM -0400, Steven Sistare wrote:
> On 5/3/2024 8:49 AM, Fabiano Rosas wrote:
> > Markus Armbruster  writes:
> > 
> > > Commit 074dbce5fcce (migration: New migrate and migrate-incoming
> > > argument 'channels') and its fixup commit 57fd4b4e1075 made command
> > > migrate argument @uri optional and mutually exclusive with @channels.
> > > 
> > > But documentation still talks about "the migration URI" in several
> > > places.
> > > 
> > > * MigrationCapability @mapped-ram:
> > > 
> > >  # @mapped-ram: Migrate using fixed offsets in the migration file for
> > >  # each RAM page.  Requires a migration URI that supports seeking,
> > >  # such as a file.  (since 9.0)
> > > 
> > >I guess it requires the migration destination (@uri or @channels) to
> > >support seeking.
> > 
> > This is ambiguous. The migration destination is usually the VM at the
> > end of the migration, not the medium to where the migration stream is
> > written to.
> > 
> > One option is to just add the mention to channel all around: "Requires a
> > migration URI or channel that supports seeking".
> > 
> > > 
> > > * MigMode @cpr-reboot:
> > > 
> > >  # @cpr-reboot: The migrate command stops the VM and saves state to 
> > > the
> > >  # URI.  After quitting QEMU, the user resumes by running QEMU
> > >  # -incoming.
> > >  #
> > >  # This mode allows the user to quit QEMU, optionally update and
> > >  # reboot the OS, and restart QEMU.  If the user reboots, the URI
> > >  # must persist across the reboot, such as by using a file.
> > > 
> > >I guess this saves to the migration destination (@uri or @channels).
> > > 
> > > * Migration Parameter @tls-hostname and its two buddies
> > > 
> > >  # @tls-hostname: migration target's hostname for validating the
> > >  # server's x509 certificate identity.  If empty, QEMU will use 
> > > the
> > >  # hostname from the migration URI, if any.  A non-empty value is
> > >  # required when using x509 based TLS credentials and the 
> > > migration
> > >  # URI does not include a hostname, such as fd: or exec: based
> > >  # migration.  (Since 2.7)
> > >  #
> > >  # Note: empty value works only since 2.9.
> > > 
> > >What's the default when we're using @channels instead of @uri?
> > 
> > The same, both URI and channels get put in the same structure before
> > taking the hostname.
> > 
> > > 
> > > * migrate-recover
> > > 
> > >  ##
> > >  # @migrate-recover:
> > >  #
> > >  # Provide a recovery migration stream URI.
> > >  #
> > >  # @uri: the URI to be used for the recovery of migration stream.
> > > 
> > >Should this command be extended to accept @channels?
> > 
> > Yes.
> > 
> > > 
> > > * docs/devel/migration/CPR.rst
> > > 
> > >Didn't look closely.  Let's discuss the others first, then come back
> > >to this one.
> > > 
> > > * HMP migrate
> > > 
> > >Fine, because this still only supports URI syntax.
> > > 
> > > * CLI option "-incoming defer"
> > > 
> > >  "-incoming defer\n" \
> > >  "wait for the URI to be specified via 
> > > migrate_incoming\n",
> > > 
> > >and
> > > 
> > >  ``-incoming defer``
> > >  Wait for the URI to be specified via migrate\_incoming. The 
> > > monitor
> > >  can be used to change settings (such as migration parameters) 
> > > prior
> > >  to issuing the migrate\_incoming to allow the migration to begin.
> > > 
> > >I figure we should call it "the migration source" instead of "the URI"
> > >here.
> > 
> > I think it's worse. We need a proper way to refer exclusively to "the
> > thing that the user passes as an argument to the migrate command".
> 
> Agreed.  My thoughts:
> 
> "migration URI" -> "migration URI/channel"
> or
> "migration URI" -> "migration stream"

"stream" might imply more on the protocol itself to me, e.g. how the
migration headers are defined, rather than the entity / fabric we use to
send the data stream?

Maybe we can simply do s/URI/channel/? As "channel" can also imply the URI
in this case as yet another (old) format to specify the migration channels.
It's like we always use QIOChannels underneath whatever way we specify the
channels (either URI or the new "channels" API).

I also copied qemu-devel starting from now.

Thanks,

-- 
Peter Xu

Re: [PATCH v2 7/7] target/sparc: Split out do_ms16b

2024-05-03 Thread Philippe Mathieu-Daudé


On 2/5/24 18:55, Richard Henderson wrote:

The unit operation for fmul8x16 and friends is described in the
manual as "MS16b".  Split that out for clarity.  Improve rounding
with an unconditional addition of 0.5 as a fixed-point integer.

Signed-off-by: Richard Henderson 
---
  target/sparc/vis_helper.c | 78 ---
  1 file changed, 24 insertions(+), 54 deletions(-)




@@ -150,23 +138,14 @@ uint64_t helper_fmul8x16a(uint32_t src1, int32_t src2)
  uint64_t helper_fmul8sux16(uint64_t src1, uint64_t src2)
  {
  VIS64 s, d;
-uint32_t tmp;
  
  s.ll = src1;

  d.ll = src2;
  
-#define PMUL(r) \

-tmp = (int32_t)d.VIS_SW64(r) * ((int32_t)s.VIS_SW64(r) >> 8);   \
-if ((tmp & 0xff) > 0x7f) {  \
-tmp += 0x100;   \
-}   \
-d.VIS_W64(r) = tmp >> 8;
-
-PMUL(0);
-PMUL(1);
-PMUL(2);
-PMUL(3);
-#undef PMUL
+d.VIS_W64(0) = do_ms16b(s.VIS_SB64(1), d.VIS_SW64(0));


s.VIS_SB64(1) = upper bit, OK.


+d.VIS_W64(1) = do_ms16b(s.VIS_SB64(3), d.VIS_SW64(1));
+d.VIS_W64(2) = do_ms16b(s.VIS_SB64(5), d.VIS_SW64(2));
+d.VIS_W64(3) = do_ms16b(s.VIS_SB64(7), d.VIS_SW64(3));
  
  return d.ll;

  }
@@ -174,23 +153,14 @@ uint64_t helper_fmul8sux16(uint64_t src1, uint64_t src2)
  uint64_t helper_fmul8ulx16(uint64_t src1, uint64_t src2)
  {
  VIS64 s, d;
-uint32_t tmp;
  
  s.ll = src1;

  d.ll = src2;
  
-#define PMUL(r) \

-tmp = (int32_t)d.VIS_SW64(r) * ((uint32_t)s.VIS_B64(r * 2));\
-if ((tmp & 0xff) > 0x7f) {  \
-tmp += 0x100;   \
-}   \
-d.VIS_W64(r) = tmp >> 8;
-
-PMUL(0);
-PMUL(1);
-PMUL(2);
-PMUL(3);
-#undef PMUL
+d.VIS_W64(0) = do_ms16b(s.VIS_B64(0), d.VIS_SW64(0));


s.VIS_B64(0) for lower byte, OK.


+d.VIS_W64(1) = do_ms16b(s.VIS_B64(2), d.VIS_SW64(1));
+d.VIS_W64(2) = do_ms16b(s.VIS_B64(4), d.VIS_SW64(2));
+d.VIS_W64(3) = do_ms16b(s.VIS_B64(6), d.VIS_SW64(3));
  
  return d.ll;

  }


Maybe add a comment about the high/low bytes in fmul8sux16/fmul8ulx16,
as it was not obvious at first. Otherwise,

Reviewed-by: Philippe Mathieu-Daudé
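
For readers following along, here is a minimal sketch of the rounding and of
the high/low byte selection discussed above. It is not the patch's
do_ms16b(); the names ms16b_sketch(), lane_sux_sketch() and
lane_ulx_sketch() are made up for illustration, and the lane helpers only
mirror what the removed PMUL macros did (signed upper byte for fmul8sux16,
unsigned lower byte for fmul8ulx16).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: multiply an 8-bit value by a signed 16-bit
 * value, add 0.5 in 8.8 fixed point (0x80) unconditionally, and keep
 * bits [23:8] of the product.
 */
static uint16_t ms16b_sketch(int32_t byte, int16_t word)
{
    assert(byte >= -128 && byte <= 255);

    int32_t prod = byte * (int32_t)word + 0x80;

    /* Shift as unsigned so the result is well defined for negative products. */
    return (uint16_t)((uint32_t)prod >> 8);
}

/* fmul8sux16-style lane: signed upper byte of 'a' times 'b'. */
static uint16_t lane_sux_sketch(uint16_t a, int16_t b)
{
    int32_t hi = a >> 8;

    if (hi > 127) {
        hi -= 256;              /* sign-extend the upper byte */
    }
    return ms16b_sketch(hi, b);
}

/* fmul8ulx16-style lane: unsigned lower byte of 'a' times 'b'. */
static uint16_t lane_ulx_sketch(uint16_t a, int16_t b)
{
    return ms16b_sketch(a & 0xff, b);
}
```

With these definitions, a product whose low byte is 0x7f rounds down and one
whose low byte is 0x80 rounds up, which is the behaviour the old
"(tmp & 0xff) > 0x7f" test expressed with a branch.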

RE: [PATCH V8 1/8] accel/kvm: Extract common KVM vCPU {creation, parking} code

2024-05-03 Thread Salil Mehta via

Hello,

I replied to your other thread just now. Sorry for catching up on everything so late.

Thanks

>  From: Harsh Prateek Bora 
>  Sent: Tuesday, April 23, 2024 7:44 AM
>  
>  + Nick
>  
>  Hi Salil,
>  I have posted a patch [1] for ppc which based on this refactoring patch.
>  I see there were some comments from Vishnu on this patch.
>  Are we expecting any further updates on this patch before merge?


Yes, a few of them, and I'm working on it. I received most of the reviews
and SOBs last year. There are a few minor comments to be addressed before
I can float the V9 version of this patch-set.

I'm planning to push that for review in 2 weeks' time, along with RFC V3 of
the architecture specific code.


Thanks
Salil.


>  
>  Thanks
>  Harsh
>  
>  [1]
>  https://lore.kernel.org/qemu-devel/a0f9b2fc-4c8a-4c37-bc36-
>  26bbaa627...@linux.ibm.com/T/#u
>  
>  On 3/22/24 13:45, Harsh Prateek Bora wrote:
>  > + Vaibhav, Shiva
>  >
>  > Hi Salil,
>  >
>  > I came across your patch while trying to solve a related problem on
>  > spapr. One query below ..
>  >
>  > On 3/12/24 07:29, Salil Mehta via wrote:
>  >> KVM vCPU creation is done once during the vCPU realization when
>  Qemu
>  >> vCPU thread is spawned. This is common to all the architectures as of
>  >> now.
>  >>
>  >> Hot-unplug of vCPU results in destruction of the vCPU object in QOM
>  >> but the corresponding KVM vCPU object in the Host KVM is not
>  >> destroyed as KVM doesn't support vCPU removal. Therefore, its
>  >> representative KVM vCPU object/context in Qemu is parked.
>  >>
>  >> Refactor architecture common logic so that some APIs could be reused
>  >> by vCPU Hotplug code of some architectures like ARM, Loongson etc.
>  >> Update new/old APIs with trace events instead of DPRINTF. No
>  >> functional change is intended here.
>  >>
>  >> Signed-off-by: Salil Mehta 
>  >> Reviewed-by: Gavin Shan 
>  >> Tested-by: Vishnu Pajjuri 
>  >> Reviewed-by: Jonathan Cameron 
>  >> Tested-by: Xianglai Li 
>  >> Tested-by: Miguel Luis 
>  >> Reviewed-by: Shaoqin Huang 
>  >> ---
>  >>   accel/kvm/kvm-all.c    | 64
>  >> --
>  >>   accel/kvm/trace-events |  5 +++-
>  >>   include/sysemu/kvm.h   | 16 +++
>  >>   3 files changed, 69 insertions(+), 16 deletions(-)
>  >>
>  >> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index
>  >> a8cecd040e..3bc3207bda 100644
>  >> --- a/accel/kvm/kvm-all.c
>  >> +++ b/accel/kvm/kvm-all.c
>  >> @@ -126,6 +126,7 @@ static QemuMutex kml_slots_lock;
>  >>   #define kvm_slots_unlock()  qemu_mutex_unlock(&kml_slots_lock)
>  >>   static void kvm_slot_init_dirty_bitmap(KVMSlot *mem);
>  >> +static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id);
>  >>   static inline void kvm_resample_fd_remove(int gsi)
>  >>   {
>  >> @@ -314,14 +315,53 @@ err:
>  >>   return ret;
>  >>   }
>  >> +void kvm_park_vcpu(CPUState *cpu)
>  >> +{
>  >> +    struct KVMParkedVcpu *vcpu;
>  >> +
>  >> +    trace_kvm_park_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
>  >> +
>  >> +    vcpu = g_malloc0(sizeof(*vcpu));
>  >> +    vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
>  >> +    vcpu->kvm_fd = cpu->kvm_fd;
>  >> +    QLIST_INSERT_HEAD(&kvm_state->kvm_parked_vcpus, vcpu, node);
>  >> +}
>  >> +
>  >> +int kvm_create_vcpu(CPUState *cpu)
>  >> +{
>  >> +    unsigned long vcpu_id = kvm_arch_vcpu_id(cpu);
>  >> +    KVMState *s = kvm_state;
>  >> +    int kvm_fd;
>  >> +
>  >> +    trace_kvm_create_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
>  >> +
>  >> +    /* check if the KVM vCPU already exist but is parked */
>  >> +    kvm_fd = kvm_get_vcpu(s, vcpu_id);
>  >> +    if (kvm_fd < 0) {
>  >> +    /* vCPU not parked: create a new KVM vCPU */
>  >> +    kvm_fd = kvm_vm_ioctl(s, KVM_CREATE_VCPU, vcpu_id);
>  >> +    if (kvm_fd < 0) {
>  >> +    error_report("KVM_CREATE_VCPU IOCTL failed for vCPU %lu", vcpu_id);
>  >> +    return kvm_fd;
>  >> +    }
>  >> +    }
>  >> +
>  >> +    cpu->kvm_fd = kvm_fd;
>  >> +    cpu->kvm_state = s;
>  >> +    cpu->vcpu_dirty = true;
>  >> +    cpu->dirty_pages = 0;
>  >> +    cpu->throttle_us_per_full = 0;
>  >> +
>  >> +    return 0;
>  >> +}
>  >> +
>  >>   static int do_kvm_destroy_vcpu(CPUState *cpu)
>  >>   {
>  >>   KVMState *s = kvm_state;
>  >>   long mmap_size;
>  >> -    struct KVMParkedVcpu *vcpu = NULL;
>  >>   int ret = 0;
>  >> -    trace_kvm_destroy_vcpu();
>  >> +    trace_kvm_destroy_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
>  >>   ret = kvm_arch_destroy_vcpu(cpu);
>  >>   if (ret < 0) {
>  >> @@ -347,10 +387,7 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
>  >>   }
>  >>   }
>  >> -    vcpu = g_malloc0(sizeof(*vcpu));
>  >> -    vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
>  >> -    vcpu->kvm_fd = cpu->kvm_fd;
>  >> -    QLIST_INSERT_HEAD(&kvm_state->kvm_parked_vcpus, vcpu, node);
>  >> +    kvm_park_vcpu(cpu);
>  >>   err:
>  >>   return ret;
>  >>   }
>  >> @@ -37

Re: [PATCH 7/9] monitor: fdset: Match against O_DIRECT

2024-05-03 Thread Peter Xu

On Fri, Apr 26, 2024 at 11:20:40AM -0300, Fabiano Rosas wrote:
> We're about to enable the use of O_DIRECT in the migration code and
> due to the alignment restrictions imposed by filesystems we need to
> make sure the flag is only used when doing aligned IO.
> 
> The migration will do parallel IO to different regions of a file, so
> we need to use more than one file descriptor. Those cannot be obtained
> by duplicating (dup()) since duplicated file descriptors share the
> file status flags, including O_DIRECT. If one migration channel does
> unaligned IO while another sets O_DIRECT to do aligned IO, the
> filesystem would fail the unaligned operation.
> 
> The add-fd QMP command along with the fdset code are specifically
> designed to allow the user to pass a set of file descriptors with
> different access flags into QEMU to be later fetched by code that
> needs to alternate between those flags when doing IO.
> 
> Extend the fdset matching to behave the same with the O_DIRECT flag.
> 
> Signed-off-by: Fabiano Rosas 
> ---
>  monitor/fds.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/monitor/fds.c b/monitor/fds.c
> index 4ec3b7eea9..62e324fcec 100644
> --- a/monitor/fds.c
> +++ b/monitor/fds.c
> @@ -420,6 +420,11 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
>  int fd = -1;
>  int dup_fd;
>  int mon_fd_flags;
> +int mask = O_ACCMODE;
> +
> +#ifdef O_DIRECT
> +mask |= O_DIRECT;
> +#endif
>  
>  if (mon_fdset->id != fdset_id) {
>  continue;
> @@ -431,7 +436,7 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
>  return -1;
>  }
>  
> -if ((flags & O_ACCMODE) == (mon_fd_flags & O_ACCMODE)) {
> +if ((flags & mask) == (mon_fd_flags & mask)) {
>  fd = mon_fdset_fd->fd;
>  break;
>  }

I think I see what you wanted to do: pick the right fd out of the two when
qemu_open_old() is called, which makes sense.

However, what happens if the mgmt app only passes in 1 fd to the fdset?  The
issue is we have a "fallback dup()" plan right after this chunk of code:

dup_fd = qemu_dup_flags(fd, flags);
if (dup_fd == -1) {
return -1;
}

mon_fdset_fd_dup = g_malloc0(sizeof(*mon_fdset_fd_dup));
mon_fdset_fd_dup->fd = dup_fd;
QLIST_INSERT_HEAD(&mon_fdset->dup_fds, mon_fdset_fd_dup, next);

I think it means that even if the mgmt app only passes in 1 fd (rather than 2,
one with O_DIRECT, one without), QEMU can always successfully call
qemu_open_old() twice, once for each case, even though the two FDs will
silently affect each other.  This doesn't look ideal if it's true.

But I must also confess I don't really understand this code at all: we
dup(), then we try F_SETFL with all the flags that got passed in.
However, AFAICT, because dup()ed FDs share the same "struct file", almost
all flags will be shared, except close-on-exec.  I don't see anything
restricting that F_SETFL to only touch close-on-exec, so I think it will
silently change the file status flags of the other fd which we dup()ed
from.  Does that mean we already have an issue with such dup() usage?
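
The sharing described above can be checked with a small standalone program.
This is only a sketch, not QEMU code: the function name
dup_shares_status_flags() is made up, it uses O_APPEND instead of O_DIRECT
so that it works on any filesystem, and setup failures simply abort via
assert() for brevity.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Returns true if a file status flag set through a dup()ed descriptor
 * is also visible through the descriptor it was duplicated from.
 */
static bool dup_shares_status_flags(void)
{
    char path[] = "/tmp/dup-flags-XXXXXX";
    int fd = mkstemp(path);

    assert(fd >= 0);
    unlink(path);

    int dup_fd = dup(fd);
    assert(dup_fd >= 0);

    int before = fcntl(fd, F_GETFL);
    assert(before >= 0 && !(before & O_APPEND));

    /* Change a file status flag through the duplicate only. */
    int ret = fcntl(dup_fd, F_SETFL, before | O_APPEND);
    assert(ret == 0);

    /* Read the flags back through the original descriptor. */
    int after = fcntl(fd, F_GETFL);

    close(dup_fd);
    close(fd);
    return (after & O_APPEND) != 0;
}
```

On a POSIX system this is expected to return true, since both descriptors
refer to the same open file description; only FD_CLOEXEC is per-descriptor.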

Thanks,

-- 
Peter Xu

RE: [PATCH V8 1/8] accel/kvm: Extract common KVM vCPU {creation, parking} code

2024-05-03 Thread Salil Mehta via

Hi Harsh,

Sorry for the delay in my reply. I've been off the grid for some time, so I
missed this earlier mail. Please find my reply to your query below.

Thanks

>  From: Harsh Prateek Bora 
>  Sent: Friday, March 22, 2024 8:15 AM
>  
>  + Vaibhav, Shiva
>  
>  Hi Salil,
>  
>  I came across your patch while trying to solve a related problem on spapr.
>  One query below ..
>  
>  On 3/12/24 07:29, Salil Mehta via wrote:
>  > KVM vCPU creation is done once during the vCPU realization when Qemu
>  > vCPU thread is spawned. This is common to all the architectures as of now.
>  >
>  > Hot-unplug of vCPU results in destruction of the vCPU object in QOM
>  > but the corresponding KVM vCPU object in the Host KVM is not destroyed
>  > as KVM doesn't support vCPU removal. Therefore, its representative KVM
>  > vCPU object/context in Qemu is parked.
>  >
>  > Refactor architecture common logic so that some APIs could be reused
>  > by vCPU Hotplug code of some architectures like ARM, Loongson etc.
>  > Update new/old APIs with trace events instead of DPRINTF. No functional
>  > change is intended here.
>  >
>  > Signed-off-by: Salil Mehta 
>  > Reviewed-by: Gavin Shan 
>  > Tested-by: Vishnu Pajjuri 
>  > Reviewed-by: Jonathan Cameron 
>  > Tested-by: Xianglai Li 
>  > Tested-by: Miguel Luis 
>  > Reviewed-by: Shaoqin Huang 
>  > ---
>  >   accel/kvm/kvm-all.c| 64 --
>  
>  >   accel/kvm/trace-events |  5 +++-
>  >   include/sysemu/kvm.h   | 16 +++
>  >   3 files changed, 69 insertions(+), 16 deletions(-)
>  >
>  > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index
>  > a8cecd040e..3bc3207bda 100644
>  > --- a/accel/kvm/kvm-all.c
>  > +++ b/accel/kvm/kvm-all.c
>  > @@ -126,6 +126,7 @@ static QemuMutex kml_slots_lock;
>  >   #define kvm_slots_unlock()  qemu_mutex_unlock(&kml_slots_lock)
>  >
>  >   static void kvm_slot_init_dirty_bitmap(KVMSlot *mem);
>  > +static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id);
>  >
>  >   static inline void kvm_resample_fd_remove(int gsi)
>  >   {
>  > @@ -314,14 +315,53 @@ err:
>  >   return ret;
>  >   }
>  >
>  > +void kvm_park_vcpu(CPUState *cpu)
>  > +{
>  > +struct KVMParkedVcpu *vcpu;
>  > +
>  > +trace_kvm_park_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
>  > +
>  > +vcpu = g_malloc0(sizeof(*vcpu));
>  > +vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
>  > +vcpu->kvm_fd = cpu->kvm_fd;
>  > +QLIST_INSERT_HEAD(&kvm_state->kvm_parked_vcpus, vcpu, node);
>  > +}
>  > +
>  > +int kvm_create_vcpu(CPUState *cpu)
>  > +{
>  > +unsigned long vcpu_id = kvm_arch_vcpu_id(cpu);
>  > +KVMState *s = kvm_state;
>  > +int kvm_fd;
>  > +
>  > +trace_kvm_create_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
>  > +
>  > +/* check if the KVM vCPU already exist but is parked */
>  > +kvm_fd = kvm_get_vcpu(s, vcpu_id);
>  > +if (kvm_fd < 0) {
>  > +/* vCPU not parked: create a new KVM vCPU */
>  > +kvm_fd = kvm_vm_ioctl(s, KVM_CREATE_VCPU, vcpu_id);
>  > +if (kvm_fd < 0) {
>  > +error_report("KVM_CREATE_VCPU IOCTL failed for vCPU %lu", vcpu_id);
>  > +return kvm_fd;
>  > +}
>  > +}
>  > +
>  > +cpu->kvm_fd = kvm_fd;
>  > +cpu->kvm_state = s;
>  > +cpu->vcpu_dirty = true;
>  > +cpu->dirty_pages = 0;
>  > +cpu->throttle_us_per_full = 0;
>  > +
>  > +return 0;
>  > +}
>  > +
>  >   static int do_kvm_destroy_vcpu(CPUState *cpu)
>  >   {
>  >   KVMState *s = kvm_state;
>  >   long mmap_size;
>  > -struct KVMParkedVcpu *vcpu = NULL;
>  >   int ret = 0;
>  >
>  > -trace_kvm_destroy_vcpu();
>  > +trace_kvm_destroy_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
>  >
>  >   ret = kvm_arch_destroy_vcpu(cpu);
>  >   if (ret < 0) {
>  > @@ -347,10 +387,7 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
>  >   }
>  >   }
>  >
>  > -vcpu = g_malloc0(sizeof(*vcpu));
>  > -vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
>  > -vcpu->kvm_fd = cpu->kvm_fd;
>  > -QLIST_INSERT_HEAD(&kvm_state->kvm_parked_vcpus, vcpu, node);
>  > +kvm_park_vcpu(cpu);
>  >   err:
>  >   return ret;
>  >   }
>  > @@ -371,6 +408,8 @@ static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)
>  >   if (cpu->vcpu_id == vcpu_id) {
>  >   int kvm_fd;
>  >
>  > +trace_kvm_get_vcpu(vcpu_id);
>  > +
>  >   QLIST_REMOVE(cpu, node);
>  >   kvm_fd = cpu->kvm_fd;
>  >   g_free(cpu);
>  > @@ -378,7 +417,7 @@ static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)
>  >   }
>  >   }
>  >
>  > -return kvm_vm_ioctl(s, KVM_CREATE_VCPU, (void *)vcpu_id);
>  > +return -ENOENT;
>  >   }
>  >
>  >   int kvm_init_vcpu(CPUState *cpu, Error **errp)
>  > @@ -389,19 +428,14 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
>  >
>  >   trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
>  >
>  > -ret = kv

Re: [PATCH 6/9] tests/qtest/migration: Add tests for file migration with direct-io

2024-05-03 Thread Peter Xu

On Fri, Apr 26, 2024 at 11:20:39AM -0300, Fabiano Rosas wrote:
> The tests are only allowed to run in systems that know about the
> O_DIRECT flag and in filesystems which support it.
> 
> Signed-off-by: Fabiano Rosas 

Mostly:

Reviewed-by: Peter Xu 

Two trivial comments below.

> ---
>  tests/qtest/migration-helpers.c | 42 +
>  tests/qtest/migration-helpers.h |  1 +
>  tests/qtest/migration-test.c| 42 +
>  3 files changed, 85 insertions(+)
> 
> diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
> index ce6d6615b5..356cd4fa8c 100644
> --- a/tests/qtest/migration-helpers.c
> +++ b/tests/qtest/migration-helpers.c
> @@ -473,3 +473,45 @@ void migration_test_add(const char *path, void (*fn)(void))
>  qtest_add_data_func_full(path, test, migration_test_wrapper,
>   migration_test_destroy);
>  }
> +
> +#ifdef O_DIRECT
> +/*
> + * Probe for O_DIRECT support on the filesystem. Since this is used
> + * for tests, be conservative, if anything fails, assume it's
> + * unsupported.
> + */
> +bool probe_o_direct_support(const char *tmpfs)
> +{
> +g_autofree char *filename = g_strdup_printf("%s/probe-o-direct", tmpfs);
> +int fd, flags = O_CREAT | O_RDWR | O_TRUNC | O_DIRECT;
> +void *buf;
> +ssize_t ret, len;
> +uint64_t offset;
> +
> +fd = open(filename, flags, 0660);
> +if (fd < 0) {
> +unlink(filename);
> +return false;
> +}
> +
> +/*
> + * Assuming 4k should be enough to satisfy O_DIRECT alignment
> + * requirements. The migration code uses 1M to be conservative.
> + */
> +len = 0x10;
> +offset = 0x10;
> +
> +buf = aligned_alloc(len, len);

This is the first usage of aligned_alloc() in qemu.  IIUC it's just a newer
posix_memalign(), which QEMU has one use of, and it's protected with:

#if defined(CONFIG_POSIX_MEMALIGN)
int ret;
ret = posix_memalign(&ptr, alignment, size);
...
#endif

I didn't check deeper.  Just keep this in mind if you see any compilation
issues in future CI runs, or simply switch to a similar pattern.
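
Something along these lines, as a rough sketch only (alloc_aligned_sketch()
is a made-up name, and the 4k alignment in the usage note below is just the
value mentioned in the patch comment):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: allocate 'size' bytes aligned to 'alignment',
 * which must be a power of two and a multiple of sizeof(void *).
 * Returns NULL on failure; release the buffer with free().
 */
static void *alloc_aligned_sketch(size_t alignment, size_t size)
{
    void *ptr = NULL;

    assert(alignment >= sizeof(void *));
    assert((alignment & (alignment - 1)) == 0);

    /* posix_memalign() returns an error number and leaves errno alone. */
    if (posix_memalign(&ptr, alignment, size) != 0) {
        return NULL;
    }
    return ptr;
}
```

A probe would then call alloc_aligned_sketch(4096, 4096) and pass the buffer
to pwrite().  If I remember correctly QEMU also has its own
qemu_memalign()/qemu_vfree() pair, which may be the simpler option here.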

> +g_assert(buf);
> +
> +ret = pwrite(fd, buf, len, offset);
> +unlink(filename);
> +g_free(buf);
> +
> +if (ret < 0) {
> +return false;
> +}
> +
> +return true;
> +}
> +#endif
> diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
> index 1339835698..d827e16145 100644
> --- a/tests/qtest/migration-helpers.h
> +++ b/tests/qtest/migration-helpers.h
> @@ -54,5 +54,6 @@ char *find_common_machine_version(const char *mtype, const char *var1,
>const char *var2);
>  char *resolve_machine_version(const char *alias, const char *var1,
>const char *var2);
> +bool probe_o_direct_support(const char *tmpfs);
>  void migration_test_add(const char *path, void (*fn)(void));
>  #endif /* MIGRATION_HELPERS_H */
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 7b177686b4..512b7ede8b 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -2295,6 +2295,43 @@ static void test_multifd_file_mapped_ram(void)
>  test_file_common(&args, true);
>  }
>  
> +#ifdef O_DIRECT
> +static void *migrate_mapped_ram_dio_start(QTestState *from,
> + QTestState *to)
> +{
> +migrate_mapped_ram_start(from, to);

This line seems redundant, migrate_multifd_mapped_ram_start() should cover that.

> +migrate_set_parameter_bool(from, "direct-io", true);
> +migrate_set_parameter_bool(to, "direct-io", true);
> +
> +return NULL;
> +}
> +
> +static void *migrate_multifd_mapped_ram_dio_start(QTestState *from,
> + QTestState *to)
> +{
> +migrate_multifd_mapped_ram_start(from, to);
> +return migrate_mapped_ram_dio_start(from, to);
> +}
> +
> +static void test_multifd_file_mapped_ram_dio(void)
> +{
> +g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs,
> +   FILE_TEST_FILENAME);
> +MigrateCommon args = {
> +.connect_uri = uri,
> +.listen_uri = "defer",
> +.start_hook = migrate_multifd_mapped_ram_dio_start,
> +};
> +
> +if (!probe_o_direct_support(tmpfs)) {
> +g_test_skip("Filesystem does not support O_DIRECT");
> +return;
> +}
> +
> +test_file_common(&args, true);
> +}
> +
> +#endif /* O_DIRECT */
>  
>  static void test_precopy_tcp_plain(void)
>  {
> @@ -3719,6 +3756,11 @@ int main(int argc, char **argv)
>  migration_test_add("/migration/multifd/file/mapped-ram/live",
> test_multifd_file_mapped_ram_live);
>  
> +#ifdef O_DIRECT
> +migration_test_add("/migration/multifd/file/mapped-ram/dio",
> +   test_multifd_file_mapped_ram_dio);
> +#endif
> +
>  #ifdef CONFIG_G

Re: [PATCH v2 2/7] target/sparc: Fix FEXPAND

2024-05-03 Thread Philippe Mathieu-Daudé


On 2/5/24 18:55, Richard Henderson wrote:

This is a 2-operand instruction, not 3-operand.
Worse, we took the source from the wrong operand.

Signed-off-by: Richard Henderson 
---
  target/sparc/helper.h |  2 +-
  target/sparc/insns.decode |  2 +-
  target/sparc/translate.c  | 20 +++-
  target/sparc/vis_helper.c |  6 +++---
  4 files changed, 24 insertions(+), 6 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v2 4/7] target/sparc: Fix FMUL8x16A{U,L}

2024-05-03 Thread Philippe Mathieu-Daudé


On 2/5/24 18:55, Richard Henderson wrote:

These instructions have f32 inputs, which changes the decode
of the register numbers.  While we're fixing things, use a
common helper for both insns, extracting the 16-bit scalar
in tcg beforehand.

Signed-off-by: Richard Henderson 
---
  target/sparc/helper.h |  3 +--
  target/sparc/translate.c  | 38 +++
  target/sparc/vis_helper.c | 47 +++
  3 files changed, 48 insertions(+), 40 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v2 3/7] target/sparc: Fix FMUL8x16

2024-05-03 Thread Philippe Mathieu-Daudé


On 2/5/24 18:55, Richard Henderson wrote:

This instruction has f32 as source1, which alters the
decoding of the register number, which means we've been
passing the wrong data for odd register numbers.

Signed-off-by: Richard Henderson 
---
  target/sparc/helper.h |  2 +-
  target/sparc/translate.c  | 21 -
  target/sparc/vis_helper.c |  9 +
  3 files changed, 26 insertions(+), 6 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 5/9] migration/multifd: Add direct-io support

2024-05-03 Thread Peter Xu

On Fri, Apr 26, 2024 at 11:20:38AM -0300, Fabiano Rosas wrote:
> When multifd is used along with mapped-ram, we can take benefit of a
> filesystem that supports the O_DIRECT flag and perform direct I/O in
> the multifd threads. This brings a significant performance improvement
> because direct-io writes bypass the page cache which would otherwise
> be thrashed by the multifd data which is unlikely to be needed again
> in a short period of time.
> 
> To be able to use a multifd channel opened with O_DIRECT, we must
> ensure that a certain alignment is used. Filesystems usually require a
> block-size alignment for direct I/O. The way to achieve this is by
> enabling the mapped-ram feature, which already aligns its I/O properly
> (see MAPPED_RAM_FILE_OFFSET_ALIGNMENT at ram.c).
> 
> By setting O_DIRECT on the multifd channels, all writes to the same
> file descriptor need to be aligned as well, even the ones that come
> from outside multifd, such as the QEMUFile I/O from the main migration
> code. This makes it impossible to use the same file descriptor for the
> QEMUFile and for the multifd channels. The various flags and metadata
> written by the main migration code will always be unaligned by virtue
> of their small size. To work around this issue, we'll require a second
> file descriptor to be used exclusively for direct I/O.
> 
> The second file descriptor can be obtained by QEMU by re-opening the
> migration file (already possible), or by being provided by the user or
> management application (support to be added in future patches).
> 
> Signed-off-by: Fabiano Rosas 
> ---
>  migration/file.c  | 22 +++---
>  migration/migration.c | 23 +++
>  2 files changed, 42 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/file.c b/migration/file.c
> index 8f30999400..b9265b14dd 100644
> --- a/migration/file.c
> +++ b/migration/file.c
> @@ -83,17 +83,33 @@ void file_cleanup_outgoing_migration(void)
>  
>  bool file_send_channel_create(gpointer opaque, Error **errp)
>  {
> -QIOChannelFile *ioc;
> +QIOChannelFile *ioc = NULL;
>  int flags = O_WRONLY;
> -bool ret = true;
> +bool ret = false;
> +
> +if (migrate_direct_io()) {
> +#ifdef O_DIRECT
> +/*
> + * Enable O_DIRECT for the secondary channels. These are used
> + * for sending ram pages and writes should be guaranteed to be
> + * aligned to at least page size.
> + */
> +flags |= O_DIRECT;
> +#else
> +error_setg(errp, "System does not support O_DIRECT");
> +error_append_hint(errp,
> +  "Try disabling direct-io migration capability\n");
> +goto out;
> +#endif

Hopefully if we can fail migrate-set-parameters correctly always, we will
never trigger this error.

I know Linux used some trick like this to even avoid such ifdefs:

  if (qemu_has_direct_io() && migrate_direct_io()) {
  // reference O_DIRECT
  }

So as long as qemu_has_direct_io() returns a constant "false" when
O_DIRECT is not defined, the compiler is smart enough to drop the block
that references O_DIRECT.

Even if that doesn't work, we can still avoid that error (and rely on the
set-parameter failure):

#ifdef O_DIRECT
   if (migrate_direct_io()) {
   // reference O_DIRECT
   }
#endif

Then it should behave the same, while keeping the ifdefs as light as
possible.
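
To make the first variant concrete, here is a rough sketch. One caveat: the
dead branch still has to compile before the compiler can discard it, so the
flag needs a fallback definition on hosts without O_DIRECT.  The
SKETCH_O_DIRECT / SKETCH_HAS_DIRECT_IO macros and the two functions below
are made-up names for illustration, not what the series implements.

```c
#define _GNU_SOURCE             /* glibc only exposes O_DIRECT with this */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/* Fallback so that code referencing the flag compiles everywhere. */
#ifdef O_DIRECT
#define SKETCH_O_DIRECT       O_DIRECT
#define SKETCH_HAS_DIRECT_IO  true
#else
#define SKETCH_O_DIRECT       0
#define SKETCH_HAS_DIRECT_IO  false
#endif

static_assert(SKETCH_HAS_DIRECT_IO || SKETCH_O_DIRECT == 0,
              "the fallback flag must be a no-op");

/* Constant per build, so the compiler can fold branches that test it. */
static inline bool sketch_has_direct_io(void)
{
    return SKETCH_HAS_DIRECT_IO;
}

/* Compute the open() flags for a channel, with no #ifdef at the call site. */
static int sketch_open_flags(bool want_direct_io)
{
    int flags = O_WRONLY;

    if (sketch_has_direct_io() && want_direct_io) {
        flags |= SKETCH_O_DIRECT;
    }
    return flags;
}
```

The single #ifdef then lives next to the helper, and callers such as
file_send_channel_create() would only see plain C.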

> +}
>  
>  ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, errp);
>  if (!ioc) {
> -ret = false;
>  goto out;
>  }
>  
>  multifd_channel_connect(opaque, QIO_CHANNEL(ioc));
> +ret = true;
>  
>  out:
>  /*
> diff --git a/migration/migration.c b/migration/migration.c
> index b5af6b5105..cb923a3f62 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -155,6 +155,16 @@ static bool migration_needs_seekable_channel(void)
>  return migrate_mapped_ram();
>  }
>  
> +static bool migration_needs_multiple_fds(void)

If I suggest to rename this, would you agree? :)

I'd try with "migrate_needs_extra_fd()" or "migrate_needs_two_fds()",
or... just to avoid "multi" + "fd" used altogether, perhaps.

Other than that looks all good.

Thanks,

> +{
> +/*
> + * When doing direct-io, multifd requires two different,
> + * non-duplicated file descriptors so we can use one of them for
> + * unaligned IO.
> + */
> +return migrate_multifd() && migrate_direct_io();
> +}
> +
>  static bool transport_supports_seeking(MigrationAddress *addr)
>  {
>  if (addr->transport == MIGRATION_ADDRESS_TYPE_FILE) {
> @@ -164,6 +174,12 @@ static bool transport_supports_seeking(MigrationAddress *addr)
>  return false;
>  }
>  
> +static bool transport_supports_multiple_fds(MigrationAddress *addr)
> +{
> +/* file: works because QEMU can open it multiple times */
> +return addr->transport == MIGRATION_ADDRESS_TYPE_FILE;
> +}
> +
>  static bool
>  migration_channels_and_transport_compatible(Mi

Re: [PATCH v2 6/7] target/sparc: Fix FPMERGE

2024-05-03 Thread Philippe Mathieu-Daudé


On 2/5/24 18:55, Richard Henderson wrote:

This instruction has f32 inputs, which changes the decode
of the register numbers.

Signed-off-by: Richard Henderson 
---
  target/sparc/helper.h |  2 +-
  target/sparc/translate.c  |  2 +-
  target/sparc/vis_helper.c | 27 ++-
  3 files changed, 16 insertions(+), 15 deletions(-)


Looking at the manual, this calls for a gvec implementation.
But then I realized this is v2 and v1 already has it :P

Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH V8 1/8] accel/kvm: Extract common KVM vCPU {creation,parking} code

2024-05-03 Thread Philippe Mathieu-Daudé


On 3/5/24 17:57, Salil Mehta wrote:

Hi Philippe,


  From: Philippe Mathieu-Daudé 
  Sent: Friday, May 3, 2024 10:40 AM
  Subject: Re: [PATCH V8 1/8] accel/kvm: Extract common KVM vCPU
  {creation,parking} code
  
  Hi Salil,
  
  On 12/3/24 02:59, Salil Mehta wrote:

  > KVM vCPU creation is done once during the vCPU realization when Qemu
  > vCPU thread is spawned. This is common to all the architectures as of now.
  >
  > Hot-unplug of vCPU results in destruction of the vCPU object in QOM
  > but the corresponding KVM vCPU object in the Host KVM is not destroyed
  > as KVM doesn't support vCPU removal. Therefore, its representative KVM
  > vCPU object/context in Qemu is parked.
  >
  > Refactor architecture common logic so that some APIs could be reused
  > by vCPU Hotplug code of some architectures like ARM, Loongson etc.
  > Update new/old APIs with trace events instead of DPRINTF. No functional
  > change is intended here.
  >
  > Signed-off-by: Salil Mehta 
  > Reviewed-by: Gavin Shan 
  > Tested-by: Vishnu Pajjuri 
  > Reviewed-by: Jonathan Cameron 
  > Tested-by: Xianglai Li 
  > Tested-by: Miguel Luis 
  > Reviewed-by: Shaoqin Huang 
  > ---
  >   accel/kvm/kvm-all.c| 64 --
  
  >   accel/kvm/trace-events |  5 +++-
  >   include/sysemu/kvm.h   | 16 +++
  >   3 files changed, 69 insertions(+), 16 deletions(-)
  >
  > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index
  > a8cecd040e..3bc3207bda 100644
  > --- a/accel/kvm/kvm-all.c
  > +++ b/accel/kvm/kvm-all.c
  > @@ -126,6 +126,7 @@ static QemuMutex kml_slots_lock;
  >   #define kvm_slots_unlock()  qemu_mutex_unlock(&kml_slots_lock)
  >
  >   static void kvm_slot_init_dirty_bitmap(KVMSlot *mem);
  > +static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id);
  >
  >   static inline void kvm_resample_fd_remove(int gsi)
  >   {
  > @@ -314,14 +315,53 @@ err:
  >   return ret;
  >   }
  >
  > +void kvm_park_vcpu(CPUState *cpu)
  > +{
  > +struct KVMParkedVcpu *vcpu;
  > +
  > +trace_kvm_park_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
  > +
  > +vcpu = g_malloc0(sizeof(*vcpu));
  > +vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
  > +vcpu->kvm_fd = cpu->kvm_fd;
  > +QLIST_INSERT_HEAD(&kvm_state->kvm_parked_vcpus, vcpu, node);
  > +}
  > +
  > +int kvm_create_vcpu(CPUState *cpu)
  > +{
  > +unsigned long vcpu_id = kvm_arch_vcpu_id(cpu);
  > +KVMState *s = kvm_state;
  > +int kvm_fd;
  > +
  > +trace_kvm_create_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
  > +
  > +/* check if the KVM vCPU already exist but is parked */
  > +kvm_fd = kvm_get_vcpu(s, vcpu_id);
  > +if (kvm_fd < 0) {
  > +/* vCPU not parked: create a new KVM vCPU */
  > +kvm_fd = kvm_vm_ioctl(s, KVM_CREATE_VCPU, vcpu_id);
  > +if (kvm_fd < 0) {
  > +error_report("KVM_CREATE_VCPU IOCTL failed for vCPU %lu", vcpu_id);
  > +return kvm_fd;
  > +}
  > +}
  > +
  > +cpu->kvm_fd = kvm_fd;
  > +cpu->kvm_state = s;
  > +cpu->vcpu_dirty = true;
  > +cpu->dirty_pages = 0;
  > +cpu->throttle_us_per_full = 0;
  > +
  > +return 0;
  > +}
  
  This seems generic enough to be implemented for all accelerators.
  
  See AccelOpsClass in include/sysemu/accel-ops.h.
  
  That said, can be done later on top.


Let me understand correctly. Are you suggesting to implement above even for
HVF, TCG, QTEST etc?


I'm not asking you to implement this for the other, non-KVM accelerators,
but since you are introducing this, now is a good time to think about a
generic interface.

So far AccelOpsClass::[un]park_vcpu() handlers make sense to me.
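
To illustrate what I have in mind, here is a toy sketch.  Everything in it
is hypothetical: SketchAccelOps stands in for AccelOpsClass (which has no
such handlers today), and the parked list is a plain linked list rather
than a QLIST.  The -ENOENT return for "not parked" mirrors what
kvm_get_vcpu() does after this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a parked vCPU entry (cf. struct KVMParkedVcpu). */
typedef struct SketchParkedVcpu {
    unsigned long vcpu_id;
    int fd;
    struct SketchParkedVcpu *next;
} SketchParkedVcpu;

/* Hypothetical per-accelerator hooks. */
typedef struct SketchAccelOps {
    void (*park_vcpu)(unsigned long vcpu_id, int fd);
    int (*unpark_vcpu)(unsigned long vcpu_id);
} SketchAccelOps;

static SketchParkedVcpu *sketch_parked;

/* Remember the fd of a vCPU that was unplugged but cannot be destroyed. */
static void sketch_park_vcpu(unsigned long vcpu_id, int fd)
{
    SketchParkedVcpu *p = calloc(1, sizeof(*p));

    assert(p);
    p->vcpu_id = vcpu_id;
    p->fd = fd;
    p->next = sketch_parked;
    sketch_parked = p;
}

/* Return the parked fd and drop the entry, or -ENOENT if not parked. */
static int sketch_unpark_vcpu(unsigned long vcpu_id)
{
    for (SketchParkedVcpu **pp = &sketch_parked; *pp; pp = &(*pp)->next) {
        if ((*pp)->vcpu_id == vcpu_id) {
            SketchParkedVcpu *p = *pp;
            int fd = p->fd;

            *pp = p->next;
            free(p);
            return fd;
        }
    }
    return -ENOENT;
}

static const SketchAccelOps sketch_ops = {
    .park_vcpu = sketch_park_vcpu,
    .unpark_vcpu = sketch_unpark_vcpu,
};
```

Generic code would then only go through the ops table, and an accelerator
without parking support could leave the handlers unset.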


Thanks
Salil.

Re: [PATCH v2 0/7] target/sparc: vis fixes

2024-05-03 Thread Philippe Mathieu-Daudé


On 2/5/24 18:55, Richard Henderson wrote:

Split out from my vis4 patch set, with just the bug fixes.
I've fixed the issue in patch 6, as noticed by Mark, but
include the follow-up that cleans up all of the macros by
removing them.


r~


Richard Henderson (7):
   linux-user/sparc: Add more hwcap bits for sparc64
   target/sparc: Fix FEXPAND
   target/sparc: Fix FMUL8x16
   target/sparc: Fix FMUL8x16A{U,L}
   target/sparc: Fix FMULD8*X16
   target/sparc: Fix FPMERGE
   target/sparc: Split out do_ms16b


I'm wondering about the test coverage here, since several of these patches
fix bugs that date back to the introduction of VIS in commit e9ebed4d41
from 2007, so they have been broken for 17 years.

Re: [PATCH] target/arm: fix MPIDR value for ARM CPUs with SMT

2024-05-03 Thread Dorjoy Chowdhury

On Fri, May 3, 2024 at 10:28 PM Peter Maydell  wrote:
>
> On Fri, 19 Apr 2024 at 19:31, Dorjoy Chowdhury  wrote:
> >
> > Some ARM CPUs advertise themselves as SMT by having the MT[24] bit set
> > to 1 in the MPIDR register. These CPUs have the thread id in Aff0[7:0]
> > bits, CPU id in Aff1[15:8] bits and cluster id in Aff2[23:16] bits in
> > MPIDR.
> >
> > On the other hand, ARM CPUs without SMT have the MT[24] bit set to 0,
> > CPU id in Aff0[7:0] bits and cluster id in Aff1[15:8] bits in MPIDR.
> >
> > The mpidr_read_val() function always reported non-SMT i.e., MT=0 style
> > MPIDR value which means it was wrong for the following CPUs with SMT
> > supported by QEMU:
> > - cortex-a55
> > - cortex-a76
> > - cortex-a710
> > - neoverse-v1
> > - neoverse-n1
> > - neoverse-n2
>
> This has definitely turned out to be rather more complicated
> than I thought it would be when I wrote up the original issue
> in gitlab, so sorry about that.
>
> I still need to think through how we should deal with the
> interaction between what the CPU type implies about the MPIDR
> format and the topology information provided by the user.
> I probably won't get to that next week, because I'm on holiday
> for most of it, but I will see if I can at least make a start.
>
> In the meantime, there is one tiny bit of this that we can
> do now:
>
> > diff --git a/hw/arm/npcm7xx.c b/hw/arm/npcm7xx.c
> > index cc68b5d8f1..9d5dcf1a3f 100644
> > --- a/hw/arm/npcm7xx.c
> > +++ b/hw/arm/npcm7xx.c
> > @@ -487,7 +487,7 @@ static void npcm7xx_realize(DeviceState *dev, Error **errp)
> >  /* CPUs */
> >  for (i = 0; i < nc->num_cpus; i++) {
> >  object_property_set_int(OBJECT(&s->cpu[i]), "mp-affinity",
> > -arm_build_mp_affinity(i, NPCM7XX_MAX_NUM_CPUS),
> > +arm_build_mp_affinity(ARM_CPU(&s->cpu[i]), i, NPCM7XX_MAX_NUM_CPUS),
> >  &error_abort);
> >  object_property_set_int(OBJECT(&s->cpu[i]), "reset-cbar",
> >  NPCM7XX_GIC_CPU_IF_ADDR, &error_abort);
>
> In this file, the value of the mp-affinity property that the
> board is setting is always the same as the default value it
> would have anyway. So we can delete the call to
> object_property_set_int() entirely, which gives us one fewer
> place we need to deal with when we do eventually figure out
> how the MPIDR values should work.
>

Before I send the patch removing the "object_property_set_int" line
for "mp-affinity", just so that I understand: where else is mp_affinity
being set for npcm7xx? I can't follow the code easily and I don't see
where else it is set to the same value. The initialization code in QEMU
is a bit hard to follow.

Regards,
Dorjoy

Re: [PATCH 4/9] migration: Add direct-io parameter

2024-05-03 Thread Peter Xu

On Fri, Apr 26, 2024 at 11:20:37AM -0300, Fabiano Rosas wrote:
> Add the direct-io migration parameter that tells the migration code to
> use O_DIRECT when opening the migration stream file whenever possible.
> 
> This is currently only used with the mapped-ram migration that has a
> clear window guaranteed to perform aligned writes.
> 
> Acked-by: Markus Armbruster 
> Signed-off-by: Fabiano Rosas 
> ---
>  include/qemu/osdep.h   |  2 ++
>  migration/migration-hmp-cmds.c | 11 +++
>  migration/options.c| 30 ++
>  migration/options.h|  1 +
>  qapi/migration.json| 18 +++---
>  util/osdep.c   |  9 +
>  6 files changed, 68 insertions(+), 3 deletions(-)
> 
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index c7053cdc2b..645c14a65d 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -612,6 +612,8 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t len, 
> bool exclusive);
>  bool qemu_has_ofd_lock(void);
>  #endif
>  
> +bool qemu_has_direct_io(void);
> +
>  #if defined(__HAIKU__) && defined(__i386__)
>  #define FMT_pid "%ld"
>  #elif defined(WIN64)
> diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
> index 7e96ae6ffd..8496a2b34e 100644
> --- a/migration/migration-hmp-cmds.c
> +++ b/migration/migration-hmp-cmds.c
> @@ -397,6 +397,13 @@ void hmp_info_migrate_parameters(Monitor *mon, const 
> QDict *qdict)
>  monitor_printf(mon, "%s: %s\n",
>  MigrationParameter_str(MIGRATION_PARAMETER_MODE),
>  qapi_enum_lookup(&MigMode_lookup, params->mode));
> +
> +if (params->has_direct_io) {
> +monitor_printf(mon, "%s: %s\n",
> +   MigrationParameter_str(
> +   MIGRATION_PARAMETER_DIRECT_IO),
> +   params->direct_io ? "on" : "off");
> +}

This will be the first parameter that is only optionally displayed here.  I
think that's a sign that the has_direct_io field is being misused.

IMHO has_direct_io is best kept as "whether the direct_io field is valid"
and nothing more.  It shouldn't carry any other information, otherwise it
becomes one more small obstacle to overcome when we try to remove all these
has_* fields, and one that is easy to overlook.

IMHO what we should do is assert has_direct_io==true here too, meanwhile...

>  }
>  
>  qapi_free_MigrationParameters(params);
> @@ -690,6 +697,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict 
> *qdict)
>  p->has_mode = true;
>  visit_type_MigMode(v, param, &p->mode, &err);
>  break;
> +case MIGRATION_PARAMETER_DIRECT_IO:
> +p->has_direct_io = true;
> +visit_type_bool(v, param, &p->direct_io, &err);
> +break;
>  default:
>  assert(0);
>  }
> diff --git a/migration/options.c b/migration/options.c
> index 239f5ecfb4..ae464aa4f2 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -826,6 +826,22 @@ int migrate_decompress_threads(void)
>  return s->parameters.decompress_threads;
>  }
>  
> +bool migrate_direct_io(void)
> +{
> +MigrationState *s = migrate_get_current();
> +
> +/* For now O_DIRECT is only supported with mapped-ram */
> +if (!s->capabilities[MIGRATION_CAPABILITY_MAPPED_RAM]) {
> +return false;
> +}
> +
> +if (s->parameters.has_direct_io) {
> +return s->parameters.direct_io;
> +}
> +
> +return false;
> +}
> +
>  uint64_t migrate_downtime_limit(void)
>  {
>  MigrationState *s = migrate_get_current();
> @@ -1061,6 +1077,11 @@ MigrationParameters 
> *qmp_query_migrate_parameters(Error **errp)
>  params->has_zero_page_detection = true;
>  params->zero_page_detection = s->parameters.zero_page_detection;
>  
> +if (s->parameters.has_direct_io) {
> +params->has_direct_io = true;
> +params->direct_io = s->parameters.direct_io;
> +}
> +
>  return params;
>  }
>  
> @@ -1097,6 +1118,7 @@ void migrate_params_init(MigrationParameters *params)
>  params->has_vcpu_dirty_limit = true;
>  params->has_mode = true;
>  params->has_zero_page_detection = true;
> +params->has_direct_io = qemu_has_direct_io();
>  }
>  
>  /*
> @@ -1416,6 +1438,10 @@ static void 
> migrate_params_test_apply(MigrateSetParameters *params,
>  if (params->has_zero_page_detection) {
>  dest->zero_page_detection = params->zero_page_detection;
>  }
> +
> +if (params->has_direct_io) {
> +dest->direct_io = params->direct_io;

.. do a proper check here to make sure the current QEMU is built with direct
IO support, and fail QMP migrate-set-parameters when someone tries to enable
it on a QEMU that doesn't support it.

Always displaying the direct_io parameter also helps when we simply want to
check the QEMU version and whether it supports this feature in general.
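
To illustrate the split being suggested, here is a minimal standalone
sketch.  ParamsSketch and both helpers are hypothetical stand-ins, not QEMU
code; build_has_direct_io models what qemu_has_direct_io() from the patch
would return.  The has_* flag only says "the value is valid", the display
path asserts it, and the set path rejects enabling the feature on a build
without support.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for QEMU's MigrationParameters. */
typedef struct {
    bool has_direct_io;  /* only says whether direct_io holds a valid value */
    bool direct_io;
} ParamsSketch;

/*
 * Validate a requested parameter change.  build_has_direct_io stands in
 * for the result of qemu_has_direct_io().  Returns NULL on success, or an
 * error message if the request must be rejected.
 */
static const char *direct_io_check_sketch(const ParamsSketch *req,
                                          bool build_has_direct_io)
{
    if (req->has_direct_io && req->direct_io && !build_has_direct_io) {
        return "direct-io is not supported by this QEMU build";
    }
    return NULL;
}

/* Display path: the value is always shown, has_* is only asserted. */
static const char *direct_io_display_sketch(const ParamsSketch *cur)
{
    assert(cur->has_direct_io);
    return cur->direct_io ? "on" : "off";
}
```

With this shape, a QEMU without O_DIRECT support would still list the
parameter (as "off") and would fail only when someone tries to turn it on.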

> +

Re: [PATCH] qga/commands-posix: fix typo in qmp_guest_set_user_password

2024-05-03 Thread Philippe Mathieu-Daudé


On 3/5/24 19:13, Paolo Bonzini wrote:

qga/commands-posix.c does not compile on FreeBSD due to a confusion
between "chpasswdata" (wrong) and "chpasswddata" (used in the #else
branch).



Fixes: 0e5b75a390 ("qga/commands-posix: qmp_guest_set_user_password: use 
ga_run_command helper")

Reviewed-by: Philippe Mathieu-Daudé 


Signed-off-by: Paolo Bonzini 
---
  qga/commands-posix.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

[PATCH v3] target/i386: Fix CPUID encoding of Fn8000001E_ECX

2024-05-03 Thread Babu Moger

The following failure was observed while booting an SEV-SNP guest; the
guest fails to boot with the smp parameters
"-smp 192,sockets=1,dies=12,cores=8,threads=2".

qemu-system-x86_64: sev_snp_launch_update: SNP_LAUNCH_UPDATE ret=-5 fw_error=22 'Invalid parameter'
qemu-system-x86_64: SEV-SNP: CPUID validation failed for function 0x8000001e, index: 0x0.
provided: eax:0x, ebx: 0x0100, ecx: 0x0b00, edx: 0x
expected: eax:0x, ebx: 0x0100, ecx: 0x0300, edx: 0x
qemu-system-x86_64: SEV-SNP: failed update CPUID page

The failure is caused by an overflow of the bits used for "Node per
processor" in CPUID Fn8000001E_ECX. This field is 3 bits wide and can hold
a maximum value of 0x7. With dies=12, the encoded value (dies - 1 = 0xB)
overflows and spills over into the reserved bits. In the case of SEV-SNP,
this causes a CPUID enforcement failure and the guest fails to boot.
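
The overflow can be reproduced with a few lines of standalone C. This is
only a sketch of the arithmetic: legacy_ecx_sketch() mirrors the
"(dies_per_pkg - 1) << 8" expression removed by the patch, but takes the
node id directly instead of deriving it from the APIC id, and the helper
names are made up for illustration.

```c
#include <stdint.h>

#define NODES_PER_PROC_SHIFT 8
#define NODES_PER_PROC_MASK  (0x7u << NODES_PER_PROC_SHIFT)  /* bits 10:8 */
#define NODE_ID_MASK         0xFFu                           /* bits 7:0 */

/* Legacy encoding: (dies_per_pkg - 1) goes into NodesPerProcessor. */
static uint32_t legacy_ecx_sketch(uint32_t dies_per_pkg, uint32_t node_id)
{
    return ((dies_per_pkg - 1) << NODES_PER_PROC_SHIFT) |
           (node_id & NODE_ID_MASK);
}

/* Bits that landed in the reserved range 31:11; non-zero means overflow. */
static uint32_t reserved_bits_sketch(uint32_t ecx)
{
    return ecx & ~(NODES_PER_PROC_MASK | NODE_ID_MASK);
}
```

For dies=12 and node id 0 this gives 0xB00, of which bit 11 (0x800) is in
the reserved range and only 0x300 fits the 3-bit field. Those two values
match the "provided" and "expected" ecx in the log above.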

The PPR documentation for CPUID_Fn8000001E_ECX [Node Identifiers]
==================================================================
Bits    Description
31:11   Reserved.

10:8    NodesPerProcessor: Node per processor. Read-only.
        ValidValues:
        Value   Description
        0h      1 node per processor.
        7h-1h   Reserved.

7:0     NodeId: Node ID. Read-only. Reset: Fixed,XXh.
==================================================================

As in the spec, the only valid value for "node per processor" is 0; the
rest are reserved.

Looking back at the history of the decoding of CPUID_Fn8000001E_ECX, there
were cases where "node per processor" could be more than 1. That is valid
only for pre-F17h (pre-EPYC) architectures. For EPYC or later CPUs, the
Linux kernel does not use this information to build the L3 topology.

Also note that CPUID Function 0x8000001E_ECX is available only when the
TOPOEXT feature is enabled. This feature is enabled only for EPYC (F17h)
or later processors, so previous generations of processors do not
enumerate the 0x8000001E_ECX leaf.

There could be some corner cases where older guests enable the TOPOEXT
feature by running with -cpu host, in which case legacy guests might
notice the topology change. To address those cases, introduce a new CPU
property "legacy-multi-node". It will be true for older machine types to
maintain compatibility. By default it will be false, so the new decoding
will be used going forward.

The documentation is taken from Preliminary Processor Programming
Reference (PPR) for AMD Family 19h Model 11h, Revision B1 Processors 55901
Rev 0.25 - Oct 6, 2022.

Cc: qemu-sta...@nongnu.org
Fixes: 31ada106d891 ("Simplify CPUID_8000_001E for AMD")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Reviewed-by: Zhao Liu 
Signed-off-by: Babu Moger 
---
v3:
  Rebased to the latest tree.
  Updated the pc_compat_9_0 for the new flag.

v2:
   https://lore.kernel.org/kvm/20240102231738.46553-1-babu.mo...@amd.com/
   Rebased to the latest tree.
   Updated the pc_compat_8_2 for the new flag.
   Added the comment for new property legacy_multi_node.
   Added Reviwed-by from Zhao.

v1:
   https://lore.kernel.org/kvm/20231110170806.70962-1-babu.mo...@amd.com/
---
 hw/i386/pc.c  |  1 +
 target/i386/cpu.c | 18 ++
 target/i386/cpu.h |  6 ++
 3 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 08c7de416f..46235466d7 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -81,6 +81,7 @@
 GlobalProperty pc_compat_9_0[] = {
 { TYPE_X86_CPU, "guest-phys-bits", "0" },
 { "sev-guest", "legacy-vm-type", "true" },
+{ TYPE_X86_CPU, "legacy-multi-node", "on" },
 };
 const size_t pc_compat_9_0_len = G_N_ELEMENTS(pc_compat_9_0);
 
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index aa3b2d8391..ceb068027d 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -398,12 +398,9 @@ static void encode_topo_cpuid8000001e(X86CPU *cpu, X86CPUTopoInfo *topo_info,
  * 31:11 Reserved.
  * 10:8 NodesPerProcessor: Node per processor. Read-only. Reset: XXXb.
  *  ValidValues:
- *  Value Description
- *  000b  1 node per processor.
- *  001b  2 nodes per processor.
- *  010b Reserved.
- *  011b 4 nodes per processor.
- *  111b-100b Reserved.
+ *  Value   Description
+ *  0h  1 node per processor.
+ *  7h-1h   Reserved.
  *  7:0 NodeId: Node ID. Read-only. Reset: XXh.
  *
  * NOTE: Hardware reserves 3 bits for number of nodes per processor.
@@ -412,8 +409,12 @@ static void encode_topo_cpuid8000001e(X86CPU *cpu, X86CPUTopoInfo *topo_info,
  * NodeId is combination of node and socket_id which is already decoded
  * in apic_id. Just use it by shifting.
  */
-*ecx = ((topo_info->dies_per_pkg - 1) << 8) |
-   ((cpu->apic_id >> apicid_die_offset(topo_info)) & 0xFF);
+if (cpu->legacy_multi_node) {
+*ecx = ((topo_info->dies_per_pkg - 1) << 8) |
+   ((cpu->apic_id >> apicid_die_offs

Re: [PATCH v2 5/5] monitor: use aio_co_reschedule_self()

2024-05-03 Thread Kevin Wolf

Am 06.02.2024 um 20:06 hat Stefan Hajnoczi geschrieben:
> The aio_co_reschedule_self() API is designed to avoid the race
> condition between scheduling the coroutine in another AioContext and
> yielding.
> 
> The QMP dispatch code uses the open-coded version that appears
> susceptible to the race condition at first glance:
> 
>   aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
>   qemu_coroutine_yield();
> 
> The code is actually safe because the iohandler and qemu_aio_context
> AioContext run under the Big QEMU Lock. Nevertheless, set a good example
> and use aio_co_reschedule_self() so it's obvious that there is no race.
> 
> Suggested-by: Hanna Reitz 
> Reviewed-by: Manos Pitsidianakis 
> Reviewed-by: Hanna Czenczek 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  qapi/qmp-dispatch.c | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
> index 176b549473..f3488afeef 100644
> --- a/qapi/qmp-dispatch.c
> +++ b/qapi/qmp-dispatch.c
> @@ -212,8 +212,7 @@ QDict *coroutine_mixed_fn qmp_dispatch(const 
> QmpCommandList *cmds, QObject *requ
>   * executing the command handler so that it can make progress if 
> it
>   * involves an AIO_WAIT_WHILE().
>   */
> -aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
> -qemu_coroutine_yield();
> +aio_co_reschedule_self(qemu_get_aio_context());

Turns out that this one actually causes a regression. [1] This code is
run in iohandler_ctx, aio_co_reschedule_self() looks at the new context
and compares it with qemu_get_current_aio_context() - and because both
are qemu_aio_context, it decides that it has nothing to do. So the
command handler coroutine actually still runs in iohandler_ctx now,
which is not what we want.

We could just revert this patch because it was only meant as a cleanup
without a semantic difference.

Or aio_co_reschedule_self() could look at qemu_coroutine_self()->ctx
instead of using qemu_get_current_aio_context(). That would be a little
more indirect, though, and I'm not sure if co->ctx is always up to date.
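
To make the difference between the two checks concrete, here is a small
standalone model. Nothing in it is QEMU code: CtxSketch and CoSketch are
hypothetical stand-ins for AioContext and Coroutine, and
current_thread_ctx_sketch() models qemu_get_current_aio_context()
reporting qemu_aio_context on the main thread even while the coroutine
runs in iohandler_ctx, as described above. Whether co->ctx is always up to
date in real QEMU is exactly the open question; the model simply assumes
it is.

```c
#include <stdbool.h>

typedef struct { const char *name; } CtxSketch;  /* stand-in for AioContext */
typedef struct { CtxSketch *ctx; } CoSketch;     /* stand-in for Coroutine */

static CtxSketch qemu_aio_context_sketch = { "qemu_aio_context" };
static CtxSketch iohandler_ctx_sketch = { "iohandler_ctx" };

/* Models the per-thread lookup: the main thread reports qemu_aio_context. */
static CtxSketch *current_thread_ctx_sketch(void)
{
    return &qemu_aio_context_sketch;
}

/* Current behaviour: compare against the thread's context. */
static bool reschedule_by_thread_ctx(CoSketch *co, CtxSketch *new_ctx)
{
    if (new_ctx == current_thread_ctx_sketch()) {
        return false;            /* "nothing to do", coroutine stays put */
    }
    co->ctx = new_ctx;
    return true;
}

/* Alternative: compare against the context recorded in the coroutine. */
static bool reschedule_by_co_ctx(CoSketch *co, CtxSketch *new_ctx)
{
    if (new_ctx == co->ctx) {
        return false;
    }
    co->ctx = new_ctx;
    return true;
}
```

For a coroutine whose ctx is iohandler_ctx and a target of
qemu_aio_context, the first variant returns false and leaves the coroutine
where it is, while the second one moves it.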

Any opinions on what is the best way to fix this?

Kevin

[1] https://issues.redhat.com/browse/RHEL-34618

Re: [PATCH] qga/commands-posix: fix typo in qmp_guest_set_user_password

2024-05-03 Thread Thomas Huth


On 03/05/2024 19.13, Paolo Bonzini wrote:

qga/commands-posix.c does not compile on FreeBSD due to a confusion
between "chpasswdata" (wrong) and "chpasswddata" (used in the #else
branch).

Signed-off-by: Paolo Bonzini 
---
  qga/commands-posix.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 7a065c4085c..7f05996495a 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -2173,7 +2173,7 @@ void qmp_guest_set_user_password(const char *username,
  }
  
  #ifdef __FreeBSD__

-g_autofree char *chpasswdata = g_strdup(rawpasswddata);
+g_autofree char *chpasswddata = g_strdup(rawpasswddata);
  const char *crypt_flag = crypted ? "-H" : "-h";
  const char *argv[] = {"pw", "usermod", "-n", username,
crypt_flag, "0", NULL};


Fixes: 0e5b75a390 ("qga/commands-posix: qmp_guest_set_user_password: use 
ga_run_command helper")

Reviewed-by: Thomas Huth

[PATCH] qga/commands-posix: fix typo in qmp_guest_set_user_password

2024-05-03 Thread Paolo Bonzini

qga/commands-posix.c does not compile on FreeBSD due to a confusion
between "chpasswdata" (wrong) and "chpasswddata" (used in the #else
branch).

Signed-off-by: Paolo Bonzini 
---
 qga/commands-posix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 7a065c4085c..7f05996495a 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -2173,7 +2173,7 @@ void qmp_guest_set_user_password(const char *username,
 }
 
 #ifdef __FreeBSD__
-g_autofree char *chpasswdata = g_strdup(rawpasswddata);
+g_autofree char *chpasswddata = g_strdup(rawpasswddata);
 const char *crypt_flag = crypted ? "-H" : "-h";
 const char *argv[] = {"pw", "usermod", "-n", username,
   crypt_flag, "0", NULL};
-- 
2.44.0

[PATCH v6] Hexagon: add PC alignment check and exception

2024-05-03 Thread Matheus Tavares Bernardino

The Hexagon Programmer's Reference Manual says that the exception 0x1e
should be raised upon an unaligned program counter. Let's implement that
and also add some tests.
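
As a rough standalone sketch of the check itself: PCALIGN, PCALIGN_MASK
and HEX_EXCP_PC_NOT_ALIGNED are taken from the diff below, while
pc_alignment_excp_sketch() is a hypothetical helper that returns the
exception number instead of raising it.

```c
#include <stdint.h>

#define PCALIGN 4
#define PCALIGN_MASK (PCALIGN - 1)
#define HEX_EXCP_PC_NOT_ALIGNED 0x01e

/* Returns the exception to raise for this PC, or 0 if it is aligned. */
static uint32_t pc_alignment_excp_sketch(uint32_t pc)
{
    return (pc & PCALIGN_MASK) ? HEX_EXCP_PC_NOT_ALIGNED : 0;
}
```

Any PC whose low two bits are non-zero, e.g. 0x1002, yields 0x1e; 0x1000
and 0x1004 yield 0.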

Signed-off-by: Matheus Tavares Bernardino 
Reviewed-by: Richard Henderson 
Reviewed-by: Taylor Simpson 
---
Changes in v6:
- The multi COF test defines a new section for the unaligned label to
  make it more robust.
- Instead of a nop in the undesired test branch, we use a trap for
  SYS_EXIT

 target/hexagon/cpu.h  |   7 ++
 target/hexagon/cpu_bits.h |   4 ++
 target/hexagon/macros.h   |   3 -
 linux-user/hexagon/cpu_loop.c |   4 ++
 target/hexagon/op_helper.c|   9 ++-
 tests/tcg/hexagon/unaligned_pc.c  | 107 ++
 tests/tcg/hexagon/Makefile.target |   2 +
 7 files changed, 128 insertions(+), 8 deletions(-)
 create mode 100644 tests/tcg/hexagon/unaligned_pc.c

diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index 3eef58fe8f..764f3c38cc 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -134,6 +134,10 @@ struct ArchCPU {
 
 FIELD(TB_FLAGS, IS_TIGHT_LOOP, 0, 1)
 
+G_NORETURN void hexagon_raise_exception_err(CPUHexagonState *env,
+uint32_t exception,
+uintptr_t pc);
+
 static inline void cpu_get_tb_cpu_state(CPUHexagonState *env, vaddr *pc,
 uint64_t *cs_base, uint32_t *flags)
 {
@@ -144,6 +148,9 @@ static inline void cpu_get_tb_cpu_state(CPUHexagonState 
*env, vaddr *pc,
 hex_flags = FIELD_DP32(hex_flags, TB_FLAGS, IS_TIGHT_LOOP, 1);
 }
 *flags = hex_flags;
+if (*pc & PCALIGN_MASK) {
+hexagon_raise_exception_err(env, HEX_EXCP_PC_NOT_ALIGNED, 0);
+}
 }
 
 typedef HexagonCPU ArchCPU;
diff --git a/target/hexagon/cpu_bits.h b/target/hexagon/cpu_bits.h
index 96fef71729..4279281a71 100644
--- a/target/hexagon/cpu_bits.h
+++ b/target/hexagon/cpu_bits.h
@@ -20,9 +20,13 @@
 
 #include "qemu/bitops.h"
 
+#define PCALIGN 4
+#define PCALIGN_MASK (PCALIGN - 1)
+
 #define HEX_EXCP_FETCH_NO_UPAGE  0x012
 #define HEX_EXCP_INVALID_PACKET  0x015
 #define HEX_EXCP_INVALID_OPCODE  0x015
+#define HEX_EXCP_PC_NOT_ALIGNED  0x01e
 #define HEX_EXCP_PRIV_NO_UREAD   0x024
 #define HEX_EXCP_PRIV_NO_UWRITE  0x025
 
diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index 1376d6ccc1..f375471a98 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -22,9 +22,6 @@
 #include "hex_regs.h"
 #include "reg_fields.h"
 
-#define PCALIGN 4
-#define PCALIGN_MASK (PCALIGN - 1)
-
 #define GET_FIELD(FIELD, REGIN) \
 fEXTRACTU_BITS(REGIN, reg_field_info[FIELD].width, \
reg_field_info[FIELD].offset)
diff --git a/linux-user/hexagon/cpu_loop.c b/linux-user/hexagon/cpu_loop.c
index 7f1499ed28..d41159e52a 100644
--- a/linux-user/hexagon/cpu_loop.c
+++ b/linux-user/hexagon/cpu_loop.c
@@ -60,6 +60,10 @@ void cpu_loop(CPUHexagonState *env)
 env->gpr[0] = ret;
 }
 break;
+case HEX_EXCP_PC_NOT_ALIGNED:
+force_sig_fault(TARGET_SIGBUS, TARGET_BUS_ADRALN,
+env->gpr[HEX_REG_R31]);
+break;
 case EXCP_ATOMIC:
 cpu_exec_step_atomic(cs);
 break;
diff --git a/target/hexagon/op_helper.c b/target/hexagon/op_helper.c
index da10ac5847..ae5a605513 100644
--- a/target/hexagon/op_helper.c
+++ b/target/hexagon/op_helper.c
@@ -36,10 +36,9 @@
 #define SF_MANTBITS23
 
 /* Exceptions processing helpers */
-static G_NORETURN
-void do_raise_exception_err(CPUHexagonState *env,
-uint32_t exception,
-uintptr_t pc)
+G_NORETURN void hexagon_raise_exception_err(CPUHexagonState *env,
+uint32_t exception,
+uintptr_t pc)
 {
 CPUState *cs = env_cpu(env);
 qemu_log_mask(CPU_LOG_INT, "%s: %d\n", __func__, exception);
@@ -49,7 +48,7 @@ void do_raise_exception_err(CPUHexagonState *env,
 
 G_NORETURN void HELPER(raise_exception)(CPUHexagonState *env, uint32_t excp)
 {
-do_raise_exception_err(env, excp, 0);
+hexagon_raise_exception_err(env, excp, 0);
 }
 
 void log_store32(CPUHexagonState *env, target_ulong addr,
diff --git a/tests/tcg/hexagon/unaligned_pc.c b/tests/tcg/hexagon/unaligned_pc.c
new file mode 100644
index 00..e9dc7cb8b5
--- /dev/null
+++ b/tests/tcg/hexagon/unaligned_pc.c
@@ -0,0 +1,107 @@
+#include 
+#include 
+#include 
+#include 
+
+/* will be changed in signal handler */
+volatile sig_atomic_t completed_tests;
+static jmp_buf after_test;
+static int nr_tests;
+
+void __attribute__((naked)) test_return(void)
+{
+asm volatile(
+"allocframe(#0x8)\n"
+"r0 = #0x\n"
+"framekey = r0\n"
+"dealloc_return\n"
+:
+:
+: "r0", "r29", "r30", "r31", "framekey");
+}
+
+voi

Re: [PATCH] target/arm: fix MPIDR value for ARM CPUs with SMT

2024-05-03 Thread Dorjoy Chowdhury

On Fri, May 3, 2024 at 10:28 PM Peter Maydell  wrote:
>
> On Fri, 19 Apr 2024 at 19:31, Dorjoy Chowdhury  wrote:
> >
> > Some ARM CPUs advertise themselves as SMT by having the MT[24] bit set
> > to 1 in the MPIDR register. These CPUs have the thread id in Aff0[7:0]
> > bits, CPU id in Aff1[15:8] bits and cluster id in Aff2[23:16] bits in
> > MPIDR.
> >
> > On the other hand, ARM CPUs without SMT have the MT[24] bit set to 0,
> > CPU id in Aff0[7:0] bits and cluster id in Aff1[15:8] bits in MPIDR.
> >
> > The mpidr_read_val() function always reported non-SMT i.e., MT=0 style
> > MPIDR value which means it was wrong for the following CPUs with SMT
> > supported by QEMU:
> > - cortex-a55
> > - cortex-a76
> > - cortex-a710
> > - neoverse-v1
> > - neoverse-n1
> > - neoverse-n2
>
> This has definitely turned out to be rather more complicated
> than I thought it would be when I wrote up the original issue
> in gitlab, so sorry about that.
>
> I still need to think through how we should deal with the
> interaction between what the CPU type implies about the MPIDR
> format and the topology information provided by the user.
> I probably won't get to that next week, because I'm on holiday
> for most of it, but I will see if I can at least make a start.
>

No problem at all. Just let me know when you get to it. I will see if
I can fix it or ask if I need help then. Please enjoy your holidays.
Thanks!

> In the meantime, there is one tiny bit of this that we can
> do now:
>
> > diff --git a/hw/arm/npcm7xx.c b/hw/arm/npcm7xx.c
> > index cc68b5d8f1..9d5dcf1a3f 100644
> > --- a/hw/arm/npcm7xx.c
> > +++ b/hw/arm/npcm7xx.c
> > @@ -487,7 +487,7 @@ static void npcm7xx_realize(DeviceState *dev, Error 
> > **errp)
> >  /* CPUs */
> >  for (i = 0; i < nc->num_cpus; i++) {
> >  object_property_set_int(OBJECT(&s->cpu[i]), "mp-affinity",
> > -arm_build_mp_affinity(i, 
> > NPCM7XX_MAX_NUM_CPUS),
> > +arm_build_mp_affinity(ARM_CPU(&s->cpu[i]), 
> > i, NPCM7XX_MAX_NUM_CPUS),
> >  &error_abort);
> >  object_property_set_int(OBJECT(&s->cpu[i]), "reset-cbar",
> >  NPCM7XX_GIC_CPU_IF_ADDR, &error_abort);
>
> In this file, the value of the mp-affinity property that the
> board is setting is always the same as the default value it
> would have anyway. So we can delete the call to
> object_property_set_int() entirely, which gives us one fewer
> place we need to deal with when we do eventually figure out
> how the MPIDR values should work.
>
> If you like you can submit a separate patch which deletes
> this one call.
>

Makes sense. I will try and send a patch.

Regards,
Dorjoy

Re: [PATCH 3/9] tests/qtest/migration: Fix file migration offset check

2024-05-03 Thread Peter Xu

On Fri, Apr 26, 2024 at 11:20:36AM -0300, Fabiano Rosas wrote:
> When doing file migration, QEMU accepts an offset that should be
> skipped when writing the migration stream to the file. The purpose of
> the offset is to allow the management layer to put its own metadata at
> the start of the file.
> 
> We have tests for this in migration-test, but only testing that the
> migration stream starts at the correct offset and not that it actually
> leaves the data intact. Unsurprisingly, there's been a bug in that
> area that the tests didn't catch.
> 
> Fix the tests to write some data to the offset region and check that
> it's actually there after the migration.
> 
> Fixes: 3dc35470c8 ("tests/qtest: migration-test: Add tests for file-based 
> migration")
> Signed-off-by: Fabiano Rosas 
> ---
>  tests/qtest/migration-test.c | 70 +---
>  1 file changed, 65 insertions(+), 5 deletions(-)
> 
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 5d6d8cd634..7b177686b4 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -2081,6 +2081,63 @@ static void test_precopy_file(void)
>  test_file_common(&args, true);
>  }
>  
> +#ifndef _WIN32
> +static void file_dirty_offset_region(void)
> +{
> +#if defined(__linux__)

Hmm, what case does !_WIN32 && __linux__ cover?  Can we remove one layer of
ifdef?

I'm also wondering why it can't work on win32.  I thought win32 has
everything we use here, but I may be missing something.

> +g_autofree char *path = g_strdup_printf("%s/%s", tmpfs, 
> FILE_TEST_FILENAME);
> +size_t size = FILE_TEST_OFFSET;
> +uintptr_t *addr, *p;
> +int fd;
> +
> +fd = open(path, O_CREAT | O_RDWR, 0660);
> +g_assert(fd != -1);
> +
> +g_assert(!ftruncate(fd, size));
> +
> +addr = mmap(NULL, size, PROT_WRITE, MAP_SHARED, fd, 0);
> +g_assert(addr != MAP_FAILED);
> +
> +/* ensure the skipped offset contains some data */
> +p = addr;
> +while (p < addr + FILE_TEST_OFFSET / sizeof(uintptr_t)) {
> +*p = (unsigned long) FILE_TEST_FILENAME;

This is fine, but it's not very clear what is being assigned.  I think what
we assign here is the pointer into the binary's RO section (rather than the
chars).  Maybe using some random numbers would be more straightforward, but
no strong opinions.
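
One possible shape for that, sketched standalone: this uses a fixed marker
value rather than random numbers, so the finish hook can verify it without
sharing state.  OFFSET_PATTERN_SKETCH and both helpers are hypothetical
names, not part of the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker written into every word of the offset region. */
#define OFFSET_PATTERN_SKETCH ((uintptr_t)0xA5A5A5A5u)

/* Start hook side: dirty the region that migration must skip. */
static void fill_offset_region_sketch(uintptr_t *buf, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        buf[i] = OFFSET_PATTERN_SKETCH;
    }
}

/* Finish hook side: true only if every word still holds the marker. */
static bool offset_region_intact_sketch(const uintptr_t *buf, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        if (buf[i] != OFFSET_PATTERN_SKETCH) {
            return false;
        }
    }
    return true;
}
```

In the test these would run on the mmap()ed region with
nwords = FILE_TEST_OFFSET / sizeof(uintptr_t), and the comparison no
longer depends on dereferencing a pointer stored in the file.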

> +p++;
> +}
> +
> +munmap(addr, size);
> +fsync(fd);
> +close(fd);
> +#endif
> +}
> +
> +static void *file_offset_start_hook(QTestState *from, QTestState *to)
> +{
> +g_autofree char *file = g_strdup_printf("%s/%s", tmpfs, 
> FILE_TEST_FILENAME);
> +int src_flags = O_WRONLY;
> +int dst_flags = O_RDONLY;
> +int fds[2];
> +
> +file_dirty_offset_region();
> +
> +fds[0] = open(file, src_flags, 0660);
> +assert(fds[0] != -1);
> +
> +fds[1] = open(file, dst_flags, 0660);
> +assert(fds[1] != -1);
> +
> +qtest_qmp_fds_assert_success(from, &fds[0], 1, "{'execute': 'add-fd', "
> + "'arguments': {'fdset-id': 1}}");
> +
> +qtest_qmp_fds_assert_success(to, &fds[1], 1, "{'execute': 'add-fd', "
> + "'arguments': {'fdset-id': 1}}");
> +
> +close(fds[0]);
> +close(fds[1]);
> +
> +return NULL;
> +}
> +
>  static void file_offset_finish_hook(QTestState *from, QTestState *to,
>  void *opaque)
>  {
> @@ -2096,12 +2153,12 @@ static void file_offset_finish_hook(QTestState *from, 
> QTestState *to,
>  g_assert(addr != MAP_FAILED);
>  
>  /*
> - * Ensure the skipped offset contains zeros and the migration
> - * stream starts at the right place.
> + * Ensure the skipped offset region's data has not been touched
> + * and the migration stream starts at the right place.
>   */
>  p = addr;
>  while (p < addr + FILE_TEST_OFFSET / sizeof(uintptr_t)) {
> -g_assert(*p == 0);
> +g_assert_cmpstr((char *) *p, ==, FILE_TEST_FILENAME);
>  p++;
>  }
>  g_assert_cmpint(cpu_to_be64(*p) >> 32, ==, QEMU_VM_FILE_MAGIC);
> @@ -2113,17 +2170,18 @@ static void file_offset_finish_hook(QTestState *from, 
> QTestState *to,
>  
>  static void test_precopy_file_offset(void)
>  {
> -g_autofree char *uri = g_strdup_printf("file:%s/%s,offset=%d", tmpfs,
> -   FILE_TEST_FILENAME,
> +g_autofree char *uri = g_strdup_printf("file:/dev/fdset/1,offset=%d",
> FILE_TEST_OFFSET);

Do we want to keep both tests to cover both normal file and fdsets?

>  MigrateCommon args = {
>  .connect_uri = uri,
>  .listen_uri = "defer",
> +.start_hook = file_offset_start_hook,
>  .finish_hook = file_offset_finish_hook,
>  };
>  
>  test_file_common(&args, false);
>  }
> +#endif
>  
>  static void test_precopy_file_offset_bad(void)
>  {
> @@ -3636,8 +3694,10 @@ int main(int argc, ch

Re: [PATCH v5] xlnx_dpdma: fix descriptor endianness bug

2024-05-03 Thread Peter Maydell

On Thu, 2 May 2024 at 15:16, Alexandra Diupina  wrote:
>
> Add xlnx_dpdma_read_descriptor() and
> xlnx_dpdma_write_descriptor() functions.
> xlnx_dpdma_read_descriptor() combines reading a
> descriptor from desc_addr by calling dma_memory_read()
> and swapping the desc fields from guest memory order
> to host memory order. xlnx_dpdma_write_descriptor()
> performs similar actions when writing a descriptor.
>
> Found by Linux Verification Center (linuxtesting.org) with SVACE.
>
> Fixes: d3c6369a96 ("introduce xlnx-dpdma")
> Signed-off-by: Alexandra Diupina 
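
For readers who want the idea behind the two helpers in isolation, here is
a standalone sketch.  It is not the patch: DescSketch is a hypothetical
two-field descriptor, and a little-endian layout in guest memory is an
assumption that the description above does not spell out.  The point is
only that each field is converted between guest byte order and host order
on read and on write.

```c
#include <stdint.h>

/* Hypothetical two-field descriptor; the real DPDMADescriptor has more. */
typedef struct {
    uint32_t control;
    uint32_t xfer_size;
} DescSketch;

/* Byte-wise accessors work the same on little- and big-endian hosts. */
static uint32_t load_le32_sketch(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void store_le32_sketch(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Guest memory bytes -> host-order fields. */
static void read_desc_sketch(const uint8_t raw[8], DescSketch *d)
{
    d->control = load_le32_sketch(raw);
    d->xfer_size = load_le32_sketch(raw + 4);
}

/* Host-order fields -> guest memory bytes. */
static void write_desc_sketch(const DescSketch *d, uint8_t raw[8])
{
    store_le32_sketch(raw, d->control);
    store_le32_sketch(raw + 4, d->xfer_size);
}
```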

> @@ -755,8 +811,10 @@ size_t xlnx_dpdma_start_operation(XlnxDPDMAState *s, 
> uint8_t channel,
>  /* The descriptor need to be updated when it's completed. */
>  DPRINTF("update the descriptor with the done flag set.\n");
>  xlnx_dpdma_desc_set_done(&desc);
> -dma_memory_write(&address_space_memory, desc_addr, &desc,
> - sizeof(DPDMADescriptor), 
> MEMTXATTRS_UNSPECIFIED);
> +if (xlnx_dpdma_write_descriptor(desc_addr, &desc)) {
> +DPRINTF("Can't write the descriptor.\n");
> +break;
> +}

This "break" introduces a behaviour change, so if we want it
it should not be in this patch, which is supposed to just
be fixing the endianness bug. (If we want to try to do the
right thing on write errors here we need to check the device
datasheet to see what it says about the hardware behaviour in
that situation.)

I dropped the "break" line and have queued this to target-arm.next.

thanks
-- PMM

Re: [PATCH] target/arm: fix MPIDR value for ARM CPUs with SMT

2024-05-03 Thread Peter Maydell

On Fri, 19 Apr 2024 at 19:31, Dorjoy Chowdhury  wrote:
>
> Some ARM CPUs advertise themselves as SMT by having the MT[24] bit set
> to 1 in the MPIDR register. These CPUs have the thread id in Aff0[7:0]
> bits, CPU id in Aff1[15:8] bits and cluster id in Aff2[23:16] bits in
> MPIDR.
>
> On the other hand, ARM CPUs without SMT have the MT[24] bit set to 0,
> CPU id in Aff0[7:0] bits and cluster id in Aff1[15:8] bits in MPIDR.
>
> The mpidr_read_val() function always reported non-SMT i.e., MT=0 style
> MPIDR value which means it was wrong for the following CPUs with SMT
> supported by QEMU:
> - cortex-a55
> - cortex-a76
> - cortex-a710
> - neoverse-v1
> - neoverse-n1
> - neoverse-n2
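
For reference, the two layouts described in the quoted text can be written
down as a short standalone sketch.  mpidr_sketch() is a hypothetical
helper, not QEMU's mpidr_read_val(), and it covers only the MT bit and the
affinity fields named above; any other MPIDR bits are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define MPIDR_MT_SKETCH (1u << 24)

static uint32_t mpidr_sketch(bool smt, uint8_t cluster, uint8_t cpu,
                             uint8_t thread)
{
    if (smt) {
        /* MT=1: thread in Aff0[7:0], CPU in Aff1[15:8], cluster in Aff2[23:16] */
        return MPIDR_MT_SKETCH | ((uint32_t)cluster << 16) |
               ((uint32_t)cpu << 8) | thread;
    }
    /* MT=0: CPU in Aff0[7:0], cluster in Aff1[15:8]; thread is not encoded */
    return ((uint32_t)cluster << 8) | cpu;
}
```

For cluster 1, CPU 2, thread 0 this gives 0x01010200 in the SMT layout and
0x0102 in the non-SMT layout.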

This has definitely turned out to be rather more complicated
than I thought it would be when I wrote up the original issue
in gitlab, so sorry about that.

I still need to think through how we should deal with the
interaction between what the CPU type implies about the MPIDR
format and the topology information provided by the user.
I probably won't get to that next week, because I'm on holiday
for most of it, but I will see if I can at least make a start.

In the meantime, there is one tiny bit of this that we can
do now:

> diff --git a/hw/arm/npcm7xx.c b/hw/arm/npcm7xx.c
> index cc68b5d8f1..9d5dcf1a3f 100644
> --- a/hw/arm/npcm7xx.c
> +++ b/hw/arm/npcm7xx.c
> @@ -487,7 +487,7 @@ static void npcm7xx_realize(DeviceState *dev, Error 
> **errp)
>  /* CPUs */
>  for (i = 0; i < nc->num_cpus; i++) {
>  object_property_set_int(OBJECT(&s->cpu[i]), "mp-affinity",
> -arm_build_mp_affinity(i, 
> NPCM7XX_MAX_NUM_CPUS),
> +arm_build_mp_affinity(ARM_CPU(&s->cpu[i]), 
> i, NPCM7XX_MAX_NUM_CPUS),
>  &error_abort);
>  object_property_set_int(OBJECT(&s->cpu[i]), "reset-cbar",
>  NPCM7XX_GIC_CPU_IF_ADDR, &error_abort);

In this file, the value of the mp-affinity property that the
board is setting is always the same as the default value it
would have anyway. So we can delete the call to
object_property_set_int() entirely, which gives us one fewer
place we need to deal with when we do eventually figure out
how the MPIDR values should work.

If you like you can submit a separate patch which deletes
this one call.

thanks
-- PMM

RE: [PATCH V8 1/8] accel/kvm: Extract common KVM vCPU {creation,parking} code

2024-05-03 Thread Salil Mehta via

Hi Vishnu,

>  From: Vishnu Pajjuri  
>  Sent: Thursday, April 4, 2024 3:00 PM
>  Subject: Re: [PATCH V8 1/8] accel/kvm: Extract common KVM vCPU 
> {creation,parking} code
>  
>  Hi Salil,
>>  On 12-03-2024 07:29, Salil Mehta wrote:
>>  KVM vCPU creation is done once during the vCPU realization when Qemu vCPU 
>> thread
>>  is spawned. This is common to all the architectures as of now.
>>  
>>  Hot-unplug of vCPU results in destruction of the vCPU object in QOM but the
>>  corresponding KVM vCPU object in the Host KVM is not destroyed as KVM 
>> doesn't
>>  support vCPU removal. Therefore, its representative KVM vCPU object/context 
>> in
>>  Qemu is parked.
>>  
>>  Refactor architecture common logic so that some APIs could be reused by vCPU
>>  Hotplug code of some architectures like ARM, Loongson etc. Update new/old APIs
>>  with trace events instead of DPRINTF. No functional change is intended here.
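
The park/reuse flow described above can be modelled in a few lines of
standalone C.  This is only a sketch: the list, the helper names and the
fake fd counter are all hypothetical, KVM_CREATE_VCPU is replaced by a
counter, and removing the entry from the parked list on reuse is an
assumption about kvm_get_vcpu() that the quoted hunks do not show.

```c
#include <stdlib.h>

typedef struct ParkedSketch {
    unsigned long vcpu_id;
    int kvm_fd;
    struct ParkedSketch *next;
} ParkedSketch;

static ParkedSketch *parked_head_sketch;
static int next_fake_fd_sketch = 100;   /* stands in for KVM_CREATE_VCPU */

/* Hot-unplug: keep the KVM fd around instead of destroying the vCPU. */
static void park_vcpu_sketch(unsigned long vcpu_id, int kvm_fd)
{
    ParkedSketch *p = calloc(1, sizeof(*p));
    if (!p) {
        abort();
    }
    p->vcpu_id = vcpu_id;
    p->kvm_fd = kvm_fd;
    p->next = parked_head_sketch;
    parked_head_sketch = p;
}

/* Returns the parked fd for vcpu_id and unparks it, or -1 if not parked. */
static int get_parked_vcpu_sketch(unsigned long vcpu_id)
{
    for (ParkedSketch **pp = &parked_head_sketch; *pp; pp = &(*pp)->next) {
        if ((*pp)->vcpu_id == vcpu_id) {
            ParkedSketch *p = *pp;
            int fd = p->kvm_fd;
            *pp = p->next;
            free(p);
            return fd;
        }
    }
    return -1;
}

/* Hot-plug: reuse a parked vCPU if there is one, otherwise create one. */
static int create_vcpu_sketch(unsigned long vcpu_id)
{
    int fd = get_parked_vcpu_sketch(vcpu_id);
    if (fd < 0) {
        fd = next_fake_fd_sketch++;
    }
    return fd;
}
```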
>>  
>>  Signed-off-by: Salil Mehta mailto:salil.me...@huawei.com
>>  Reviewed-by: Gavin Shan mailto:gs...@redhat.com
>>  Tested-by: Vishnu Pajjuri mailto:vis...@os.amperecomputing.com
>>  Reviewed-by: Jonathan Cameron mailto:jonathan.came...@huawei.com
>>  Tested-by: Xianglai Li mailto:lixiang...@loongson.cn
>>  Tested-by: Miguel Luis mailto:miguel.l...@oracle.com
>>  Reviewed-by: Shaoqin Huang mailto:shahu...@redhat.com
>>  ---
>>   accel/kvm/kvm-all.c| 64 --
>>   accel/kvm/trace-events |  5 +++-
>>   include/sysemu/kvm.h   | 16 +++
>>   3 files changed, 69 insertions(+), 16 deletions(-)
>>  
>>  diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>>  index a8cecd040e..3bc3207bda 100644
>>  --- a/accel/kvm/kvm-all.c
>>  +++ b/accel/kvm/kvm-all.c
>>  @@ -126,6 +126,7 @@ static QemuMutex kml_slots_lock;
>>   #define kvm_slots_unlock()  qemu_mutex_unlock(&kml_slots_lock)
>>   
>>   static void kvm_slot_init_dirty_bitmap(KVMSlot *mem);
>>  +static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id);
>>   
>>   static inline void kvm_resample_fd_remove(int gsi)
>>   {
>>  @@ -314,14 +315,53 @@ err:
>>  return ret;
>>   }
>>   
>>  +void kvm_park_vcpu(CPUState *cpu)
>>  +{
>>  +struct KVMParkedVcpu *vcpu;
>>  +
>>  +trace_kvm_park_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
>  It's good if we add kvm_fd to trace.
>  It will be useful to cross verify kvm_get_vcpu()'s kvm_fd with parked vcpu.


Agreed. But this is currently called in the context of creating and
destroying a vCPU, where a trace already exists with the info you are
seeking. Adding a trace here might duplicate that info and end up
increasing the noise.

Let me know if you think otherwise or have something else to add.

Thanks

 
>>  +
>>  +vcpu = g_malloc0(sizeof(*vcpu));
>>  +vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
>>  +vcpu->kvm_fd = cpu->kvm_fd;
>>  +QLIST_INSERT_HEAD(&kvm_state->kvm_parked_vcpus, vcpu, node);
>>  +}
>>  +
>>  +int kvm_create_vcpu(CPUState *cpu)
>>  +{
>>  +unsigned long vcpu_id = kvm_arch_vcpu_id(cpu);
>>  +KVMState *s = kvm_state;
>>  +int kvm_fd;
>>  +
>>  +trace_kvm_create_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
>  vcpu_id can be used instead of kvm_arch_vcpu_id(cpu).


The KVM arch vCPU ID ensures that the ID being traced is meaningful for that
architecture. The way the CPU ID gets calculated can differ between
architectures. Hence, its value might be quite different.

  
>>  +
>>  +/* check if the KVM vCPU already exist but is parked */
>>  +kvm_fd = kvm_get_vcpu(s, vcpu_id);
>>  +if (kvm_fd < 0) {
>>  +/* vCPU not parked: create a new KVM vCPU */
>>  +kvm_fd = kvm_vm_ioctl(s, KVM_CREATE_VCPU, vcpu_id);
>>  +if (kvm_fd < 0) {
>>  +error_report("KVM_CREATE_VCPU IOCTL failed for vCPU %lu", vcpu_id);
>>  +return kvm_fd;
>>  +}
>>  +}
>>  +
>>  +cpu->kvm_fd = kvm_fd;
>>  +cpu->kvm_state = s;
>>  +cpu->vcpu_dirty = true;
>>  +cpu->dirty_pages = 0;
>>  +cpu->throttle_us_per_full = 0;
>>  +
>>  +return 0;
>>  +}
>>  +
>>   static int do_kvm_destroy_vcpu(CPUState *cpu)
>>   {
>>   KVMState *s = kvm_state;
>>   long mmap_size;
>>  -struct KVMParkedVcpu *vcpu = NULL;
>>   int ret = 0;
>>   
>>  -trace_kvm_destroy_vcpu();
>>  +trace_kvm_destroy_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
>>   
>>   ret = kvm_arch_destroy_vcpu(cpu);
>>   if (ret < 0) {
>>  @@ -347,10 +387,7 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
>>   }
>>   }
>>   
>>  -vcpu = g_malloc0(sizeof(*vcpu));
>>  -vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
>>  -vcpu->kvm_fd = cpu->kvm_fd;
>>  -QLIST_INSERT_HEAD(&kvm_state->kvm_parked_vcpus, vcpu, node);
>>  +kvm_park_vcpu(cpu);
>>   err:
>>   return ret;
>>   }
>>  @@ -371,6 +408,8 @@ static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)
>>   if (cpu->vcpu_id == vcpu_id) {
>>   int kvm_fd;
>>   
>>  +trace_kvm_get_vcpu(vcpu_id);
>  It's good if we add kvm_fd to

Re: [PATCH 2/9] migration: Fix file migration with fdset

2024-05-03 Thread Peter Xu

On Fri, Apr 26, 2024 at 11:20:35AM -0300, Fabiano Rosas wrote:
> When the migration using the "file:" URI was implemented, I don't
> think any of us noticed that if you pass in a file name with the
> format "/dev/fdset/N", this allows a file descriptor to be passed in
> to QEMU and that behaves just like the "fd:" URI. So the "file:"
> support has been added without regard for the fdset part and we got
> some things wrong.
> 
> The first issue is that we should not truncate the migration file if
> we're allowing an fd + offset. We need to leave the file contents
> untouched.

I'm wondering whether we can use fallocate() on the ranges instead, so that
we never open() with O_TRUNC.  Before that..  could you remind me why we
need to truncate in the first place?  I definitely missed something else
here too.

> 
> The second issue is that there's an expectation that QEMU removes the
> fd after the migration has finished. That's what the "fd:" code
> does. Otherwise a second migration on the same VM could attempt to
> provide an fdset with the same name and QEMU would reject it.

Let me check what we do with "fd:" when migration completes or is
cancelled.

IIUC it's qio_channel_file_close() that does the final cleanup work on
e.g. to_dst_file, right?  Then there's qemu_close(), and it has:

    /* Close fd that was dup'd from an fdset */
    fdset_id = monitor_fdset_dup_fd_find(fd);
    if (fdset_id != -1) {
        int ret;

        ret = close(fd);
        if (ret == 0) {
            monitor_fdset_dup_fd_remove(fd);
        }

        return ret;
    }

Shouldn't this have done the work already?

Off topic: I think this code is overcomplicated too, maybe I missed
something, but afaict we don't need monitor_fdset_dup_fd_find at all.. we
simply walk the list and remove stuff..  I attach a patch at the end in
which I tried to clean that up, just in case there are early comments.  But
we can ignore that so we don't get side-tracked, and focus on the direct-io
issues.

Thanks,

===

From 2f6b6d1224486d8ee830a7afe34738a07003b863 Mon Sep 17 00:00:00 2001
From: Peter Xu 
Date: Fri, 3 May 2024 11:27:20 -0400
Subject: [PATCH] monitor: Drop monitor_fdset_dup_fd_find()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This function is not needed, one remove function should already work.
Clean it up.

Here the code doesn't really care about whether we need to keep that dupfd
around if close() failed: when that happens something got very wrong,
keeping the dup_fd around the fdsets may not help that situation so far.

Cc: Dr. David Alan Gilbert 
Cc: Markus Armbruster 
Cc: Philippe Mathieu-Daudé 
Cc: Paolo Bonzini 
Cc: Daniel P. Berrangé 
Signed-off-by: Peter Xu 
---
 include/monitor/monitor.h |  1 -
 monitor/fds.c | 27 +--
 stubs/fdset.c |  5 -
 util/osdep.c  | 15 +--
 4 files changed, 6 insertions(+), 42 deletions(-)

diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index 965f5d5450..fd9b3f538c 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -53,7 +53,6 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool has_fdset_id, int64_t fdset_id,
 const char *opaque, Error **errp);
 int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags);
 void monitor_fdset_dup_fd_remove(int dup_fd);
-int64_t monitor_fdset_dup_fd_find(int dup_fd);
 
 void monitor_register_hmp(const char *name, bool info,
   void (*cmd)(Monitor *mon, const QDict *qdict));
diff --git a/monitor/fds.c b/monitor/fds.c
index d86c2c674c..d5aecfb70e 100644
--- a/monitor/fds.c
+++ b/monitor/fds.c
@@ -458,7 +458,7 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
 #endif
 }
 
-static int64_t monitor_fdset_dup_fd_find_remove(int dup_fd, bool remove)
+void monitor_fdset_dup_fd_remove(int dup_fd)
 {
 MonFdset *mon_fdset;
 MonFdsetFd *mon_fdset_fd_dup;
@@ -467,31 +467,14 @@ static int64_t monitor_fdset_dup_fd_find_remove(int dup_fd, bool remove)
 QLIST_FOREACH(mon_fdset, &mon_fdsets, next) {
 QLIST_FOREACH(mon_fdset_fd_dup, &mon_fdset->dup_fds, next) {
 if (mon_fdset_fd_dup->fd == dup_fd) {
-if (remove) {
-QLIST_REMOVE(mon_fdset_fd_dup, next);
-g_free(mon_fdset_fd_dup);
-if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
-monitor_fdset_cleanup(mon_fdset);
-}
-return -1;
-} else {
-return mon_fdset->id;
+QLIST_REMOVE(mon_fdset_fd_dup, next);
+g_free(mon_fdset_fd_dup);
+if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
+monitor_fdset_cleanup(mon_fdset);
 }
 }
 }
 }
-
-return -1;
-}
-
-int64_t monitor_fdset_dup_fd_find(int dup_fd)
-{
-return monitor_fdset_

Re: [PATCH 1/9] monitor: Honor QMP request for fd removal immediately

2024-05-03 Thread Peter Xu

On Fri, Apr 26, 2024 at 11:20:34AM -0300, Fabiano Rosas wrote:
> We're enabling using the fdset interface to pass file descriptors for
> use in the migration code. Since migrations can happen more than once
> during the VM's lifetime, we need a way to remove an fd from the fdset
> at the end of migration.
> 
> The current code only removes an fd from the fdset if the VM is
> running. This causes a QMP call to "remove-fd" to not actually remove
> the fd if the VM happens to be stopped.
> 
> While the fd would eventually be removed when monitor_fdset_cleanup()
> is called again, the user request should be honored and the fd
> actually removed. Calling remove-fd + query-fdset shows a recently
> removed fd still present.
> 
> The runstate_is_running() check was introduced by commit ebe52b592d
> ("monitor: Prevent removing fd from set during init"), which by the
> shortlog indicates that they were trying to avoid removing a
> yet-unduplicated fd too early.
> 
> I don't see why an fd explicitly removed with qmp_remove_fd() should
> be under runstate_is_running(). I'm assuming this was a mistake when
> adding the parentheses around the expression.
> 
> Move the runstate_is_running() check to apply only to the
> QLIST_EMPTY(dup_fds) side of the expression and ignore it when
> mon_fdset_fd->removed has been explicitly set.
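
To make the quoted change concrete, here is a small stand-alone sketch of the two cleanup conditions as plain boolean predicates. It only mirrors the expressions from the quoted diff; the function names `should_close_old()` and `should_close_new()` and the idea of passing the state in as arguments are assumptions made for illustration, not QEMU API.

```c
#include <stdbool.h>

/* Old condition: the runstate check gates both branches. */
static bool should_close_old(bool removed, bool dup_fds_empty,
                             int mon_refcount, bool running)
{
    return (removed || (dup_fds_empty && mon_refcount == 0)) && running;
}

/* New condition: an explicit removal is honored regardless of runstate. */
static bool should_close_new(bool removed, bool dup_fds_empty,
                             int mon_refcount, bool running)
{
    return removed || (dup_fds_empty && mon_refcount == 0 && running);
}
```

With `removed == true` and the VM stopped, the old predicate is false and the new one is true; for fds that were not explicitly removed, the two predicates agree.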

I am confused about why the fdset removal is so complicated.  I'm also
wondering here whether it's dropped because we checked against
"mon_refcount == 0", and maybe monitor_fdset_cleanup() is simply called
_before_ a monitor is created?  Why do we need such a check in the first
place?

I'm thinking of one case where the only QMP monitor got (for some reason)
disconnected, and reconnected again while the VM is running.  Won't the
current code already lead to unwanted removal of almost all fds due to
mon_refcount==0?

I am also confused about why the ->removed flag is ever needed, and why we
can't already remove the fdset's fds if found matching.

Copy Corey, Eric and Kevin.

> 
> Signed-off-by: Fabiano Rosas 
> ---
>  monitor/fds.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/monitor/fds.c b/monitor/fds.c
> index d86c2c674c..4ec3b7eea9 100644
> --- a/monitor/fds.c
> +++ b/monitor/fds.c
> @@ -173,9 +173,9 @@ static void monitor_fdset_cleanup(MonFdset *mon_fdset)
>  MonFdsetFd *mon_fdset_fd_next;
>  
>  QLIST_FOREACH_SAFE(mon_fdset_fd, &mon_fdset->fds, next, mon_fdset_fd_next) {
> -if ((mon_fdset_fd->removed ||
> -(QLIST_EMPTY(&mon_fdset->dup_fds) && mon_refcount == 0)) &&
> -runstate_is_running()) {
> +if (mon_fdset_fd->removed ||
> +(QLIST_EMPTY(&mon_fdset->dup_fds) && mon_refcount == 0 &&
> + runstate_is_running())) {
>  close(mon_fdset_fd->fd);
>  g_free(mon_fdset_fd->opaque);
>  QLIST_REMOVE(mon_fdset_fd, next);
> -- 
> 2.35.3
> 

-- 
Peter Xu

RE: [PATCH V8 1/8] accel/kvm: Extract common KVM vCPU {creation,parking} code

2024-05-03 Thread Salil Mehta via

Hi Philippe,

>  From: Philippe Mathieu-Daudé 
>  Sent: Friday, May 3, 2024 10:40 AM
>  Subject: Re: [PATCH V8 1/8] accel/kvm: Extract common KVM vCPU
>  {creation,parking} code
>  
>  Hi Salil,
>  
>  On 12/3/24 02:59, Salil Mehta wrote:
>  > KVM vCPU creation is done once during the vCPU realization when Qemu
>  > vCPU thread is spawned. This is common to all the architectures as of now.
>  >
>  > Hot-unplug of vCPU results in destruction of the vCPU object in QOM
>  > but the corresponding KVM vCPU object in the Host KVM is not destroyed
>  > as KVM doesn't support vCPU removal. Therefore, its representative KVM
>  > vCPU object/context in Qemu is parked.
>  >
>  > Refactor architecture common logic so that some APIs could be reused
>  > by vCPU Hotplug code of some architectures like ARM, Loongson etc.
>  > Update new/old APIs with trace events instead of DPRINTF. No functional
>  > change is intended here.
>  >
>  > Signed-off-by: Salil Mehta 
>  > Reviewed-by: Gavin Shan 
>  > Tested-by: Vishnu Pajjuri 
>  > Reviewed-by: Jonathan Cameron 
>  > Tested-by: Xianglai Li 
>  > Tested-by: Miguel Luis 
>  > Reviewed-by: Shaoqin Huang 
>  > ---
>  >   accel/kvm/kvm-all.c| 64 --
>  
>  >   accel/kvm/trace-events |  5 +++-
>  >   include/sysemu/kvm.h   | 16 +++
>  >   3 files changed, 69 insertions(+), 16 deletions(-)
>  >
>  > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>  > index a8cecd040e..3bc3207bda 100644
>  > --- a/accel/kvm/kvm-all.c
>  > +++ b/accel/kvm/kvm-all.c
>  > @@ -126,6 +126,7 @@ static QemuMutex kml_slots_lock;
>  >   #define kvm_slots_unlock()  qemu_mutex_unlock(&kml_slots_lock)
>  >
>  >   static void kvm_slot_init_dirty_bitmap(KVMSlot *mem);
>  > +static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id);
>  >
>  >   static inline void kvm_resample_fd_remove(int gsi)
>  >   {
>  > @@ -314,14 +315,53 @@ err:
>  >   return ret;
>  >   }
>  >
>  > +void kvm_park_vcpu(CPUState *cpu)
>  > +{
>  > +struct KVMParkedVcpu *vcpu;
>  > +
>  > +trace_kvm_park_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
>  > +
>  > +vcpu = g_malloc0(sizeof(*vcpu));
>  > +vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
>  > +vcpu->kvm_fd = cpu->kvm_fd;
>  > +QLIST_INSERT_HEAD(&kvm_state->kvm_parked_vcpus, vcpu, node);
>  > +}
>  > +
>  > +int kvm_create_vcpu(CPUState *cpu)
>  > +{
>  > +unsigned long vcpu_id = kvm_arch_vcpu_id(cpu);
>  > +KVMState *s = kvm_state;
>  > +int kvm_fd;
>  > +
>  > +trace_kvm_create_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
>  > +
>  > +/* check if the KVM vCPU already exist but is parked */
>  > +kvm_fd = kvm_get_vcpu(s, vcpu_id);
>  > +if (kvm_fd < 0) {
>  > +/* vCPU not parked: create a new KVM vCPU */
>  > +kvm_fd = kvm_vm_ioctl(s, KVM_CREATE_VCPU, vcpu_id);
>  > +if (kvm_fd < 0) {
>  > +error_report("KVM_CREATE_VCPU IOCTL failed for vCPU %lu", vcpu_id);
>  > +return kvm_fd;
>  > +}
>  > +}
>  > +
>  > +cpu->kvm_fd = kvm_fd;
>  > +cpu->kvm_state = s;
>  > +cpu->vcpu_dirty = true;
>  > +cpu->dirty_pages = 0;
>  > +cpu->throttle_us_per_full = 0;
>  > +
>  > +return 0;
>  > +}
>  
>  This seems generic enough to be implemented for all accelerators.
>  
>  See AccelOpsClass in include/sysemu/accel-ops.h.
>  
>  That said, can be done later on top.

Let me make sure I understand correctly. Are you suggesting implementing the
above for HVF, TCG, QTEST etc. as well?

Thanks
Salil.

[PULL 13/14] target/sh4: Rename TCGv variables as manual for SUBV opcode

2024-05-03 Thread Philippe Mathieu-Daudé

To easily compare with the SH4 manual, rename:

  REG(B11_8) -> Rn
  REG(B7_4) -> Rm
  t0 -> result

Mention how underflow is calculated.

Reviewed-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20240430163125.77430-5-phi...@linaro.org>
---
 target/sh4/translate.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 47c0f3404e..e599ab9d1a 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -933,16 +933,20 @@ static void _decode_opc(DisasContext * ctx)
 return;
 case 0x300b: /* subv Rm,Rn */
 {
-TCGv t0, t1, t2;
-t0 = tcg_temp_new();
-tcg_gen_sub_i32(t0, REG(B11_8), REG(B7_4));
+TCGv Rn = REG(B11_8);
+TCGv Rm = REG(B7_4);
+TCGv result, t1, t2;
+
+result = tcg_temp_new();
 t1 = tcg_temp_new();
-tcg_gen_xor_i32(t1, t0, REG(B11_8));
 t2 = tcg_temp_new();
-tcg_gen_xor_i32(t2, REG(B11_8), REG(B7_4));
+tcg_gen_sub_i32(result, Rn, Rm);
+/* T = ((Rn ^ Rm) & (Result ^ Rn)) >> 31 */
+tcg_gen_xor_i32(t1, result, Rn);
+tcg_gen_xor_i32(t2, Rn, Rm);
 tcg_gen_and_i32(t1, t1, t2);
 tcg_gen_shri_i32(cpu_sr_t, t1, 31);
-tcg_gen_mov_i32(REG(B11_8), t0);
+tcg_gen_mov_i32(Rn, result);
 }
 return;
 case 0x2008: /* tst Rm,Rn */
-- 
2.41.0

[PULL 12/14] target/sh4: Rename TCGv variables as manual for ADDV opcode

2024-05-03 Thread Philippe Mathieu-Daudé

To easily compare with the SH4 manual, rename:

  REG(B11_8) -> Rn
  REG(B7_4) -> Rm
  t0 -> result

Mention how overflow is calculated.
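
For readers comparing against the manual, here is a minimal reference model of the ADDV flag computation in plain C. It follows the TCG ops in the diff (xor, xor, andc, shift), so the operand-sign term is complemented: overflow happens only when Rn and Rm have the same sign and the result's sign differs from Rn's. The name `addv_model()` and the use of `uint32_t` for register values are assumptions for illustration only.

```c
#include <stdint.h>

/*
 * Hypothetical reference model of ADDV Rm,Rn.
 * Returns the new Rn value and stores the T flag (0 or 1) in *t.
 */
static uint32_t addv_model(uint32_t rn, uint32_t rm, unsigned *t)
{
    uint32_t result = rn + rm;      /* wraps modulo 2^32 */

    /* andc semantics: (result ^ Rn) & ~(Rm ^ Rn), then take bit 31 */
    *t = ((result ^ rn) & ~(rm ^ rn)) >> 31;
    return result;
}
```

For example, adding 1 to 0x7fffffff yields 0x80000000 with T set, while adding 1 to 0x7ffffffe yields 0x7fffffff with T clear, matching the expectations in tests/tcg/sh4/test-addv.c.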

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Reviewed-by: Yoshinori Sato 
Message-Id: <20240430163125.77430-4-phi...@linaro.org>
---
 target/sh4/translate.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 3e013b7c7c..47c0f3404e 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -705,16 +705,20 @@ static void _decode_opc(DisasContext * ctx)
 return;
 case 0x300f: /* addv Rm,Rn */
 {
-TCGv t0, t1, t2;
-t0 = tcg_temp_new();
-tcg_gen_add_i32(t0, REG(B7_4), REG(B11_8));
+TCGv Rn = REG(B11_8);
+TCGv Rm = REG(B7_4);
+TCGv result, t1, t2;
+
+result = tcg_temp_new();
 t1 = tcg_temp_new();
-tcg_gen_xor_i32(t1, t0, REG(B11_8));
 t2 = tcg_temp_new();
-tcg_gen_xor_i32(t2, REG(B7_4), REG(B11_8));
+tcg_gen_add_i32(result, Rm, Rn);
+/* T = ((Rn ^ Rm) & (Result ^ Rn)) >> 31 */
+tcg_gen_xor_i32(t1, result, Rn);
+tcg_gen_xor_i32(t2, Rm, Rn);
 tcg_gen_andc_i32(cpu_sr_t, t1, t2);
 tcg_gen_shri_i32(cpu_sr_t, cpu_sr_t, 31);
-tcg_gen_mov_i32(REG(B11_8), t0);
+tcg_gen_mov_i32(Rn, result);
 }
 return;
 case 0x2009: /* and Rm,Rn */
-- 
2.41.0

[PULL 08/14] plugins: Update stale comment

2024-05-03 Thread Philippe Mathieu-Daudé

"plugin_mask" was renamed as "event_mask" in commit c006147122
("plugins: create CPUPluginState and migrate plugin_mask").

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Message-Id: <20240427155714.53669-3-phi...@linaro.org>
---
 plugins/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/plugins/core.c b/plugins/core.c
index 11ca20e626..09c98382f5 100644
--- a/plugins/core.c
+++ b/plugins/core.c
@@ -373,7 +373,7 @@ void qemu_plugin_tb_trans_cb(CPUState *cpu, struct qemu_plugin_tb *tb)
 struct qemu_plugin_cb *cb, *next;
 enum qemu_plugin_event ev = QEMU_PLUGIN_EV_VCPU_TB_TRANS;
 
-/* no plugin_mask check here; caller should have checked */
+/* no plugin_state->event_mask check here; caller should have checked */
 
 QLIST_FOREACH_SAFE_RCU(cb, &plugin.cb_lists[ev], entry, next) {
 qemu_plugin_vcpu_tb_trans_cb_t func = cb->f.vcpu_tb_trans;
-- 
2.41.0

[PULL 07/14] plugins/api: Only include 'exec/ram_addr.h' with system emulation

2024-05-03 Thread Philippe Mathieu-Daudé

"exec/ram_addr.h" shouldn't be used with user emulation.

Signed-off-by: Philippe Mathieu-Daudé 
Acked-by: Richard Henderson 
Message-Id: <20240427155714.53669-4-phi...@linaro.org>
---
 plugins/api.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/plugins/api.c b/plugins/api.c
index 8fa5a600ac..eaee344d8e 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -42,10 +42,10 @@
 #include "tcg/tcg.h"
 #include "exec/exec-all.h"
 #include "exec/gdbstub.h"
-#include "exec/ram_addr.h"
 #include "disas/disas.h"
 #include "plugin.h"
 #ifndef CONFIG_USER_ONLY
+#include "exec/ram_addr.h"
 #include "qemu/plugin-memory.h"
 #include "hw/boards.h"
 #else
-- 
2.41.0

[PULL 05/14] user: Move 'thunk.h' from 'exec/user' to 'user'

2024-05-03 Thread Philippe Mathieu-Daudé

Keep all user emulation headers under the same user/ directory.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Message-Id: <20240428221450.26460-2-phi...@linaro.org>
---
 MAINTAINERS | 1 -
 bsd-user/qemu.h | 2 +-
 include/{exec => }/user/thunk.h | 8 ++--
 linux-user/user-internals.h | 2 +-
 linux-user/thunk.c  | 2 +-
 5 files changed, 9 insertions(+), 6 deletions(-)
 rename include/{exec => }/user/thunk.h (97%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 302b6fd00c..96411e6adf 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3692,7 +3692,6 @@ Overall usermode emulation
 M: Riku Voipio 
 S: Maintained
 F: accel/tcg/user-exec*.c
-F: include/exec/user/
 F: include/user/
 F: common-user/
 
diff --git a/bsd-user/qemu.h b/bsd-user/qemu.h
index a0c1ad7efa..a916724de9 100644
--- a/bsd-user/qemu.h
+++ b/bsd-user/qemu.h
@@ -26,7 +26,7 @@
 
 extern char **environ;
 
-#include "exec/user/thunk.h"
+#include "user/thunk.h"
 #include "target_arch.h"
 #include "syscall_defs.h"
 #include "target_syscall.h"
diff --git a/include/exec/user/thunk.h b/include/user/thunk.h
similarity index 97%
rename from include/exec/user/thunk.h
rename to include/user/thunk.h
index 9f35c888f9..2a2104b568 100644
--- a/include/exec/user/thunk.h
+++ b/include/user/thunk.h
@@ -17,8 +17,12 @@
  * License along with this library; if not, see .
  */
 
-#ifndef THUNK_H
-#define THUNK_H
+#ifndef USER_THUNK_H
+#define USER_THUNK_H
+
+#ifndef CONFIG_USER_ONLY
+#error Cannot include this header from system emulation
+#endif
 
 #include "cpu.h"
 #include "user/abitypes.h"
diff --git a/linux-user/user-internals.h b/linux-user/user-internals.h
index ce11d9e21c..5c7f173ceb 100644
--- a/linux-user/user-internals.h
+++ b/linux-user/user-internals.h
@@ -18,7 +18,7 @@
 #ifndef LINUX_USER_USER_INTERNALS_H
 #define LINUX_USER_USER_INTERNALS_H
 
-#include "exec/user/thunk.h"
+#include "user/thunk.h"
 #include "exec/exec-all.h"
 #include "exec/tb-flush.h"
 #include "qemu/log.h"
diff --git a/linux-user/thunk.c b/linux-user/thunk.c
index 071aad4b5f..3cd19e79c6 100644
--- a/linux-user/thunk.c
+++ b/linux-user/thunk.c
@@ -20,7 +20,7 @@
 #include "qemu/log.h"
 
 #include "qemu.h"
-#include "exec/user/thunk.h"
+#include "user/thunk.h"
 
 //#define DEBUG
 
-- 
2.41.0

[PULL 11/14] target/sh4: Fix SUBV opcode

2024-05-03 Thread Philippe Mathieu-Daudé

The documentation says:

  SUBV Rm, Rn    Rn - Rm -> Rn, underflow -> T

The overflow / underflow can be calculated as:

  T = ((Rn ^ Rm) & (Result ^ Rn)) >> 31

However we were using the incorrect:

  T = ((Rn ^ Rm) & (Result ^ Rm)) >> 31

Fix by using the Rn register instead of Rm.
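
The difference between the two formulas can be checked with a small stand-alone model in plain C. This is only a sketch: `subv_flag_fixed()` and `subv_flag_old()` are hypothetical helper names, and register values are modelled as `uint32_t` so that the subtraction wraps in a well-defined way.

```c
#include <stdint.h>

/* Corrected formula: T = ((Rn ^ Rm) & (Result ^ Rn)) >> 31 */
static unsigned subv_flag_fixed(uint32_t rn, uint32_t rm)
{
    uint32_t result = rn - rm;      /* wraps modulo 2^32 */

    return ((rn ^ rm) & (result ^ rn)) >> 31;
}

/* Previous formula, which compared the result against Rm instead of Rn. */
static unsigned subv_flag_old(uint32_t rn, uint32_t rm)
{
    uint32_t result = rn - rm;

    return ((rn ^ rm) & (result ^ rm)) >> 31;
}
```

For Rn = 0x80000000 (INT_MIN) and Rm = 1, the corrected formula reports T = 1 while the previous one reports T = 0, which is the kind of case the new test exercises.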

Add tests provided by Paul Cercueil.

Cc: qemu-sta...@nongnu.org
Fixes: ad8d25a11f ("target-sh4: implement addv and subv using TCG")
Reported-by: Paul Cercueil 
Suggested-by: Paul Cercueil 
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2318
Reviewed-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Yoshinori Sato 
Message-Id: <20240430163125.77430-3-phi...@linaro.org>
---
 target/sh4/translate.c|  2 +-
 tests/tcg/sh4/test-subv.c | 30 ++
 tests/tcg/sh4/Makefile.target |  3 +++
 3 files changed, 34 insertions(+), 1 deletion(-)
 create mode 100644 tests/tcg/sh4/test-subv.c

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 4a1dd0d1f4..3e013b7c7c 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -933,7 +933,7 @@ static void _decode_opc(DisasContext * ctx)
 t0 = tcg_temp_new();
 tcg_gen_sub_i32(t0, REG(B11_8), REG(B7_4));
 t1 = tcg_temp_new();
-tcg_gen_xor_i32(t1, t0, REG(B7_4));
+tcg_gen_xor_i32(t1, t0, REG(B11_8));
 t2 = tcg_temp_new();
 tcg_gen_xor_i32(t2, REG(B11_8), REG(B7_4));
 tcg_gen_and_i32(t1, t1, t2);
diff --git a/tests/tcg/sh4/test-subv.c b/tests/tcg/sh4/test-subv.c
new file mode 100644
index 00..a3c2db96e4
--- /dev/null
+++ b/tests/tcg/sh4/test-subv.c
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#include <limits.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+static void subv(const int a, const int b, const int res, const int carry)
+{
+int o = a, c;
+
+asm volatile("subv %2,%0\n"
+ "movt %1\n"
+ : "+r"(o), "=r"(c) : "r"(b) : );
+
+if (c != carry || o != res) {
+printf("SUBV %d, %d = %d/%d [T = %d/%d]\n", a, b, o, res, c, carry);
+abort();
+}
+}
+
+int main(void)
+{
+subv(INT_MIN, 1, INT_MAX, 1);
+subv(INT_MAX, -1, INT_MIN, 1);
+subv(INT_MAX, 1, INT_MAX - 1, 0);
+subv(0, 1, -1, 0);
+subv(-1, -1, 0, 0);
+
+return 0;
+}
diff --git a/tests/tcg/sh4/Makefile.target b/tests/tcg/sh4/Makefile.target
index 521b8b0a76..7852fa62d8 100644
--- a/tests/tcg/sh4/Makefile.target
+++ b/tests/tcg/sh4/Makefile.target
@@ -20,3 +20,6 @@ TESTS += test-macw
 
 test-addv: CFLAGS += -O -g
 TESTS += test-addv
+
+test-subv: CFLAGS += -O -g
+TESTS += test-subv
-- 
2.41.0

[PULL 14/14] ui/cocoa.m: Drop old macOS-10.12-and-earlier compat ifdefs

2024-05-03 Thread Philippe Mathieu-Daudé

From: Peter Maydell 

We only support the most recent two versions of macOS (currently
macOS 13 Ventura and macOS 14 Sonoma), and our ui/cocoa.m code
already assumes at least macOS 12 Monterey or better, because it uses
NSScreen safeAreaInsets, which is 12.0-or-newer.

Remove the ifdefs that were providing backwards compatibility for
building on 10.12 and earlier versions.

Signed-off-by: Peter Maydell 
Reviewed-by: Daniel P. Berrangé 
Message-ID: <20240502142904.62644-1-peter.mayd...@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé 
---
 ui/cocoa.m | 13 -
 1 file changed, 13 deletions(-)

diff --git a/ui/cocoa.m b/ui/cocoa.m
index 25e0db9dd0..981615a8b9 100644
--- a/ui/cocoa.m
+++ b/ui/cocoa.m
@@ -50,23 +50,10 @@
 #include 
 #include "hw/core/cpu.h"
 
-#ifndef MAC_OS_X_VERSION_10_13
-#define MAC_OS_X_VERSION_10_13 101300
-#endif
-
 #ifndef MAC_OS_VERSION_14_0
 #define MAC_OS_VERSION_14_0 140000
 #endif
 
-/* 10.14 deprecates NSOnState and NSOffState in favor of
- * NSControlStateValueOn/Off, which were introduced in 10.13.
- * Define for older versions
- */
-#if MAC_OS_X_VERSION_MAX_ALLOWED < MAC_OS_X_VERSION_10_13
-#define NSControlStateValueOn NSOnState
-#define NSControlStateValueOff NSOffState
-#endif
-
 //#define DEBUG
 
 #ifdef DEBUG
-- 
2.41.0

[PULL 09/14] MAINTAINERS: Update my email address

2024-05-03 Thread Philippe Mathieu-Daudé

From: Anthony PERARD 

Signed-off-by: Anthony PERARD 
Acked-by: Paul Durrant 
Acked-by: Stefano Stabellini 
Message-ID: <20240429154938.19340-1-anthony.per...@citrix.com>
Signed-off-by: Philippe Mathieu-Daudé 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 96411e6adf..2f08cc528e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -532,7 +532,7 @@ Guest CPU Cores (Xen)
 -
 X86 Xen CPUs
 M: Stefano Stabellini 
-M: Anthony Perard 
+M: Anthony PERARD 
 M: Paul Durrant 
 L: xen-de...@lists.xenproject.org
 S: Supported
-- 
2.41.0

[PULL 04/14] user: Move 'abitypes.h' from 'exec/user' to 'user'

2024-05-03 Thread Philippe Mathieu-Daudé

Keep all user emulation headers under the same user/ directory.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Message-Id: <20240503125202.35667-1-phi...@linaro.org>
---
 bsd-user/qemu.h| 2 +-
 include/exec/cpu-all.h | 2 +-
 include/exec/user/thunk.h  | 2 +-
 include/{exec => }/user/abitypes.h | 4 ++--
 include/user/syscall-trace.h   | 2 +-
 linux-user/qemu.h  | 2 +-
 6 files changed, 7 insertions(+), 7 deletions(-)
 rename include/{exec => }/user/abitypes.h (97%)

diff --git a/bsd-user/qemu.h b/bsd-user/qemu.h
index 8629f0dcde..a0c1ad7efa 100644
--- a/bsd-user/qemu.h
+++ b/bsd-user/qemu.h
@@ -22,7 +22,7 @@
 #include "exec/cpu_ldst.h"
 #include "exec/exec-all.h"
 
-#include "exec/user/abitypes.h"
+#include "user/abitypes.h"
 
 extern char **environ;
 
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index e75ec13cd0..032c6d990e 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -64,7 +64,7 @@
 /* MMU memory access macros */
 
 #if defined(CONFIG_USER_ONLY)
-#include "exec/user/abitypes.h"
+#include "user/abitypes.h"
 
 /*
  * If non-zero, the guest virtual address space is a contiguous subset
diff --git a/include/exec/user/thunk.h b/include/exec/user/thunk.h
index 2ebfecf58e..9f35c888f9 100644
--- a/include/exec/user/thunk.h
+++ b/include/exec/user/thunk.h
@@ -21,7 +21,7 @@
 #define THUNK_H
 
 #include "cpu.h"
-#include "exec/user/abitypes.h"
+#include "user/abitypes.h"
 
 /* types enums definitions */
 
diff --git a/include/exec/user/abitypes.h b/include/user/abitypes.h
similarity index 97%
rename from include/exec/user/abitypes.h
rename to include/user/abitypes.h
index 3ec1969368..5c9a955631 100644
--- a/include/exec/user/abitypes.h
+++ b/include/user/abitypes.h
@@ -1,5 +1,5 @@
-#ifndef EXEC_USER_ABITYPES_H
-#define EXEC_USER_ABITYPES_H
+#ifndef USER_ABITYPES_H
+#define USER_ABITYPES_H
 
 #ifndef CONFIG_USER_ONLY
 #error Cannot include this header from system emulation
diff --git a/include/user/syscall-trace.h b/include/user/syscall-trace.h
index b48b2b2d0a..9bd7ca19c8 100644
--- a/include/user/syscall-trace.h
+++ b/include/user/syscall-trace.h
@@ -10,7 +10,7 @@
 #ifndef SYSCALL_TRACE_H
 #define SYSCALL_TRACE_H
 
-#include "exec/user/abitypes.h"
+#include "user/abitypes.h"
 #include "gdbstub/user.h"
 #include "qemu/plugin.h"
 #include "trace/trace-root.h"
diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index 4777856b52..263f445ff1 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -4,7 +4,7 @@
 #include "cpu.h"
 #include "exec/cpu_ldst.h"
 
-#include "exec/user/abitypes.h"
+#include "user/abitypes.h"
 
 #include "syscall_defs.h"
 #include "target_syscall.h"
-- 
2.41.0

[PULL 10/14] target/sh4: Fix ADDV opcode

2024-05-03 Thread Philippe Mathieu-Daudé

The documentation says:

  ADDV Rm, Rn    Rn + Rm -> Rn, overflow -> T

But QEMU implementation was:

  ADDV Rm, Rn    Rn + Rm -> Rm, overflow -> T

Fix by filling the correct Rn register.

Add tests provided by Paul Cercueil.

Cc: qemu-sta...@nongnu.org
Fixes: ad8d25a11f ("target-sh4: implement addv and subv using TCG")
Reported-by: Paul Cercueil 
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2317
Reviewed-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Yoshinori Sato 
Message-Id: <20240430163125.77430-2-phi...@linaro.org>
---
 target/sh4/translate.c|  2 +-
 tests/tcg/sh4/test-addv.c | 27 +++
 tests/tcg/sh4/Makefile.target |  3 +++
 3 files changed, 31 insertions(+), 1 deletion(-)
 create mode 100644 tests/tcg/sh4/test-addv.c

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index ebb6c901bf..4a1dd0d1f4 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -714,7 +714,7 @@ static void _decode_opc(DisasContext * ctx)
 tcg_gen_xor_i32(t2, REG(B7_4), REG(B11_8));
 tcg_gen_andc_i32(cpu_sr_t, t1, t2);
 tcg_gen_shri_i32(cpu_sr_t, cpu_sr_t, 31);
-tcg_gen_mov_i32(REG(B7_4), t0);
+tcg_gen_mov_i32(REG(B11_8), t0);
 }
 return;
 case 0x2009: /* and Rm,Rn */
diff --git a/tests/tcg/sh4/test-addv.c b/tests/tcg/sh4/test-addv.c
new file mode 100644
index 00..ca87fe746a
--- /dev/null
+++ b/tests/tcg/sh4/test-addv.c
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#include <limits.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+static void addv(const int a, const int b, const int res, const int carry)
+{
+int o = a, c;
+
+asm volatile("addv %2,%0\n"
+ "movt %1\n"
+ : "+r"(o), "=r"(c) : "r"(b) : );
+
+if (c != carry || o != res) {
+printf("ADDV %d, %d = %d/%d [T = %d/%d]\n", a, b, o, res, c, carry);
+abort();
+}
+}
+
+int main(void)
+{
+addv(INT_MAX, 1, INT_MIN, 1);
+addv(INT_MAX - 1, 1, INT_MAX, 0);
+
+return 0;
+}
diff --git a/tests/tcg/sh4/Makefile.target b/tests/tcg/sh4/Makefile.target
index 4d09291c0c..521b8b0a76 100644
--- a/tests/tcg/sh4/Makefile.target
+++ b/tests/tcg/sh4/Makefile.target
@@ -17,3 +17,6 @@ TESTS += test-macl
 
 test-macw: CFLAGS += -O -g
 TESTS += test-macw
+
+test-addv: CFLAGS += -O -g
+TESTS += test-addv
-- 
2.41.0

[PULL 03/14] exec: Include missing license in 'exec/cpu-common.h'

2024-05-03 Thread Philippe Mathieu-Daudé

Commit 1ad2134f91 ("Hardware convenience library") extracted
"cpu-common.h" from "cpu-all.h", which uses the LGPL-2.1+ license.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Message-Id: <20240427155714.53669-5-phi...@linaro.org>
---
 include/exec/cpu-common.h | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 6d5318895a..8812ba744d 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -1,8 +1,13 @@
+/*
+ * CPU interfaces that are target independent.
+ *
+ *  Copyright (c) 2003 Fabrice Bellard
+ *
+ * SPDX-License-Identifier: LGPL-2.1+
+ */
 #ifndef CPU_COMMON_H
 #define CPU_COMMON_H
 
-/* CPU interfaces that are target independent.  */
-
 #include "exec/vaddr.h"
 #ifndef CONFIG_USER_ONLY
 #include "exec/hwaddr.h"
-- 
2.41.0

[PULL 06/14] coverity: Update user emulation regexp

2024-05-03 Thread Philippe Mathieu-Daudé

All user emulation headers are now under include/user/.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Message-Id: <20240428221450.26460-3-phi...@linaro.org>
---
 scripts/coverity-scan/COMPONENTS.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/coverity-scan/COMPONENTS.md b/scripts/coverity-scan/COMPONENTS.md
index 91be8d1c36..1537e49cd5 100644
--- a/scripts/coverity-scan/COMPONENTS.md
+++ b/scripts/coverity-scan/COMPONENTS.md
@@ -121,7 +121,7 @@ usb
   ~ (/qemu)?(/hw/usb/.*|/include/hw/usb/.*)
 
 user
-  ~ 
(/qemu)?(/linux-user/.*|/bsd-user/.*|/user-exec\.c|/thunk\.c|/include/exec/user/.*)
+  ~ 
(/qemu)?(/linux-user/.*|/bsd-user/.*|/user-exec\.c|/thunk\.c|/include/user/.*)
 
 util
   ~ (/qemu)?(/util/.*|/include/qemu/.*)
-- 
2.41.0

[PULL 02/14] accel/whpx: Fix NULL dereference in whpx_init_vcpu()

2024-05-03 Thread Philippe Mathieu-Daudé

When mechanically moving the @dirty field to AccelCPUState
in commit 9ad49538c7, we overlooked that cpu->accel is still
NULL at the point where it is dereferenced.

Fixes: 9ad49538c7 ("accel/whpx: Use accel-specific per-vcpu @dirty field")
Reported-by: Volker Rümelin 
Suggested-by: Volker Rümelin 
Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Message-Id: <20240429091918.27429-2-phi...@linaro.org>
---
 target/i386/whpx/whpx-all.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/whpx/whpx-all.c b/target/i386/whpx/whpx-all.c
index b08e644517..a6674a826d 100644
--- a/target/i386/whpx/whpx-all.c
+++ b/target/i386/whpx/whpx-all.c
@@ -2236,7 +2236,7 @@ int whpx_init_vcpu(CPUState *cpu)
     }
 
     vcpu->interruptable = true;
-    cpu->accel->dirty = true;
+    vcpu->dirty = true;
     cpu->accel = vcpu;
     max_vcpu_index = max(max_vcpu_index, cpu->cpu_index);
     qemu_add_vm_change_state_handler(whpx_cpu_update_state, env);
-- 
2.41.0
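
The ordering problem fixed above can be shown in isolation. The following is
only a minimal sketch: struct accel_state, struct cpu_state and init_vcpu()
are hypothetical stand-ins, not the QEMU types or functions. It assumes the
per-vCPU state is allocated locally and published to the CPU object last.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for AccelCPUState and CPUState. */
struct accel_state {
    bool dirty;
};

struct cpu_state {
    struct accel_state *accel;   /* NULL until init_vcpu() publishes it */
};

/* Returns 0 on success, -1 on allocation failure. */
static int init_vcpu(struct cpu_state *cpu)
{
    struct accel_state *vcpu = calloc(1, sizeof(*vcpu));

    if (vcpu == NULL) {
        return -1;
    }

    /*
     * Initialise through the local pointer: cpu->accel is still NULL
     * here, so writing cpu->accel->dirty would dereference NULL.
     */
    vcpu->dirty = true;

    /* Publish the fully initialised state last. */
    cpu->accel = vcpu;
    return 0;
}
```

Writing every field through the local pointer before the final assignment
keeps the code correct regardless of what cpu->accel held beforehand.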

[PULL 01/14] accel/nvmm: Fix NULL dereference in nvmm_init_vcpu()

2024-05-03 Thread Philippe Mathieu-Daudé

When mechanically moving the @dirty field to AccelCPUState
in commit 79f1926b2d, we overlooked that cpu->accel is still
NULL at the point where it is dereferenced.

Reported-by: Volker Rümelin 
Suggested-by: Volker Rümelin 
Fixes: 79f1926b2d ("accel/nvmm: Use accel-specific per-vcpu @dirty field")
Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Message-Id: <20240429091918.27429-3-phi...@linaro.org>
---
 target/i386/nvmm/nvmm-all.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/nvmm/nvmm-all.c b/target/i386/nvmm/nvmm-all.c
index f9cced53b3..65768aca03 100644
--- a/target/i386/nvmm/nvmm-all.c
+++ b/target/i386/nvmm/nvmm-all.c
@@ -982,7 +982,7 @@ nvmm_init_vcpu(CPUState *cpu)
         }
     }
 
-    cpu->accel->dirty = true;
+    qcpu->dirty = true;
     cpu->accel = qcpu;
 
     return 0;
-- 
2.41.0

[PULL 00/14] Accel / SH4 / UI patches for 2024-05-03

2024-05-03 Thread Philippe Mathieu-Daudé

The following changes since commit fd87be1dada5672f877e03c2ca8504458292c479:

  Merge tag 'accel-20240426' of https://github.com/philmd/qemu into staging (2024-04-26 15:28:13 -0700)

are available in the Git repository at:

  https://github.com/philmd/qemu.git tags/accel-sh4-ui-20240503

for you to fetch changes up to 2d27c91e2b72ac7a65504ac207c89262d92464eb:

  ui/cocoa.m: Drop old macOS-10.12-and-earlier compat ifdefs (2024-05-03 17:33:26 +0200)


- Fix NULL dereference in NVMM & WHPX init_vcpu()
- Move user emulation headers "exec/user" to "user"
- Fix SH-4 ADDV / SUBV opcodes
- Drop Cocoa compatibility on macOS <= 10.12
- Update Anthony PERARD email



Anthony PERARD (1):
  MAINTAINERS: Update my email address

Peter Maydell (1):
  ui/cocoa.m: Drop old macOS-10.12-and-earlier compat ifdefs

Philippe Mathieu-Daudé (12):
  accel/nvmm: Fix NULL dereference in nvmm_init_vcpu()
  accel/whpx: Fix NULL dereference in whpx_init_vcpu()
  exec: Include missing license in 'exec/cpu-common.h'
  user: Move 'abitypes.h' from 'exec/user' to 'user'
  user: Move 'thunk.h' from 'exec/user' to 'user'
  coverity: Update user emulation regexp
  plugins/api: Only include 'exec/ram_addr.h' with system emulation
  plugins: Update stale comment
  target/sh4: Fix ADDV opcode
  target/sh4: Fix SUBV opcode
  target/sh4: Rename TCGv variables as manual for ADDV opcode
  target/sh4: Rename TCGv variables as manual for SUBV opcode

 MAINTAINERS                         |  3 +--
 bsd-user/qemu.h                     |  4 ++--
 include/exec/cpu-all.h              |  2 +-
 include/exec/cpu-common.h           |  9 ++--
 include/{exec => }/user/abitypes.h  |  4 ++--
 include/user/syscall-trace.h        |  2 +-
 include/{exec => }/user/thunk.h     | 10 ++---
 linux-user/qemu.h                   |  2 +-
 linux-user/user-internals.h         |  2 +-
 linux-user/thunk.c                  |  2 +-
 plugins/api.c                       |  2 +-
 plugins/core.c                      |  2 +-
 target/i386/nvmm/nvmm-all.c         |  2 +-
 target/i386/whpx/whpx-all.c         |  2 +-
 target/sh4/translate.c              | 32 ++---
 tests/tcg/sh4/test-addv.c           | 27
 tests/tcg/sh4/test-subv.c           | 30 +++
 scripts/coverity-scan/COMPONENTS.md |  2 +-
 tests/tcg/sh4/Makefile.target       |  6 ++
 ui/cocoa.m                          | 13
 20 files changed, 112 insertions(+), 46 deletions(-)
 rename include/{exec => }/user/abitypes.h (97%)
 rename include/{exec => }/user/thunk.h (97%)
 create mode 100644 tests/tcg/sh4/test-addv.c
 create mode 100644 tests/tcg/sh4/test-subv.c

-- 
2.41.0

[PATCH] hvf: arm: Fix encodings for ID_AA64PFR1_EL1 and debug System registers

2024-05-03 Thread Zenghui Yu

We wrongly encoded ID_AA64PFR1_EL1 using {3,0,0,4,2} in hvf_sreg_match[] so
we fail to get the expected ARMCPRegInfo from cp_regs hash table with the
wrong key.

Fix it with the correct encoding {3,0,0,4,1}. With that fixed, the Linux
guest can properly detect FEAT_SSBS2 on my M1 HW.

All DBG{B,W}{V,C}R_EL1 registers are also wrongly encoded with op0 == 14.
This happens to work because HVF_SYSREG(CRn, CRm, 14, op1, op2) is equal to
HVF_SYSREG(CRn, CRm, 2, op1, op2), by definition. But we shouldn't rely on
it.

Fixes: a1477da3ddeb ("hvf: Add Apple Silicon support")
Signed-off-by: Zenghui Yu 
---
 target/arm/hvf/hvf.c | 160 +--
 1 file changed, 80 insertions(+), 80 deletions(-)

diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index 08d0757438..45e2218be5 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -396,85 +396,85 @@ struct hvf_sreg_match {
 };
 
 static struct hvf_sreg_match hvf_sreg_match[] = {
-{ HV_SYS_REG_DBGBVR0_EL1, HVF_SYSREG(0, 0, 14, 0, 4) },
-{ HV_SYS_REG_DBGBCR0_EL1, HVF_SYSREG(0, 0, 14, 0, 5) },
-{ HV_SYS_REG_DBGWVR0_EL1, HVF_SYSREG(0, 0, 14, 0, 6) },
-{ HV_SYS_REG_DBGWCR0_EL1, HVF_SYSREG(0, 0, 14, 0, 7) },
-
-{ HV_SYS_REG_DBGBVR1_EL1, HVF_SYSREG(0, 1, 14, 0, 4) },
-{ HV_SYS_REG_DBGBCR1_EL1, HVF_SYSREG(0, 1, 14, 0, 5) },
-{ HV_SYS_REG_DBGWVR1_EL1, HVF_SYSREG(0, 1, 14, 0, 6) },
-{ HV_SYS_REG_DBGWCR1_EL1, HVF_SYSREG(0, 1, 14, 0, 7) },
-
-{ HV_SYS_REG_DBGBVR2_EL1, HVF_SYSREG(0, 2, 14, 0, 4) },
-{ HV_SYS_REG_DBGBCR2_EL1, HVF_SYSREG(0, 2, 14, 0, 5) },
-{ HV_SYS_REG_DBGWVR2_EL1, HVF_SYSREG(0, 2, 14, 0, 6) },
-{ HV_SYS_REG_DBGWCR2_EL1, HVF_SYSREG(0, 2, 14, 0, 7) },
-
-{ HV_SYS_REG_DBGBVR3_EL1, HVF_SYSREG(0, 3, 14, 0, 4) },
-{ HV_SYS_REG_DBGBCR3_EL1, HVF_SYSREG(0, 3, 14, 0, 5) },
-{ HV_SYS_REG_DBGWVR3_EL1, HVF_SYSREG(0, 3, 14, 0, 6) },
-{ HV_SYS_REG_DBGWCR3_EL1, HVF_SYSREG(0, 3, 14, 0, 7) },
-
-{ HV_SYS_REG_DBGBVR4_EL1, HVF_SYSREG(0, 4, 14, 0, 4) },
-{ HV_SYS_REG_DBGBCR4_EL1, HVF_SYSREG(0, 4, 14, 0, 5) },
-{ HV_SYS_REG_DBGWVR4_EL1, HVF_SYSREG(0, 4, 14, 0, 6) },
-{ HV_SYS_REG_DBGWCR4_EL1, HVF_SYSREG(0, 4, 14, 0, 7) },
-
-{ HV_SYS_REG_DBGBVR5_EL1, HVF_SYSREG(0, 5, 14, 0, 4) },
-{ HV_SYS_REG_DBGBCR5_EL1, HVF_SYSREG(0, 5, 14, 0, 5) },
-{ HV_SYS_REG_DBGWVR5_EL1, HVF_SYSREG(0, 5, 14, 0, 6) },
-{ HV_SYS_REG_DBGWCR5_EL1, HVF_SYSREG(0, 5, 14, 0, 7) },
-
-{ HV_SYS_REG_DBGBVR6_EL1, HVF_SYSREG(0, 6, 14, 0, 4) },
-{ HV_SYS_REG_DBGBCR6_EL1, HVF_SYSREG(0, 6, 14, 0, 5) },
-{ HV_SYS_REG_DBGWVR6_EL1, HVF_SYSREG(0, 6, 14, 0, 6) },
-{ HV_SYS_REG_DBGWCR6_EL1, HVF_SYSREG(0, 6, 14, 0, 7) },
-
-{ HV_SYS_REG_DBGBVR7_EL1, HVF_SYSREG(0, 7, 14, 0, 4) },
-{ HV_SYS_REG_DBGBCR7_EL1, HVF_SYSREG(0, 7, 14, 0, 5) },
-{ HV_SYS_REG_DBGWVR7_EL1, HVF_SYSREG(0, 7, 14, 0, 6) },
-{ HV_SYS_REG_DBGWCR7_EL1, HVF_SYSREG(0, 7, 14, 0, 7) },
-
-{ HV_SYS_REG_DBGBVR8_EL1, HVF_SYSREG(0, 8, 14, 0, 4) },
-{ HV_SYS_REG_DBGBCR8_EL1, HVF_SYSREG(0, 8, 14, 0, 5) },
-{ HV_SYS_REG_DBGWVR8_EL1, HVF_SYSREG(0, 8, 14, 0, 6) },
-{ HV_SYS_REG_DBGWCR8_EL1, HVF_SYSREG(0, 8, 14, 0, 7) },
-
-{ HV_SYS_REG_DBGBVR9_EL1, HVF_SYSREG(0, 9, 14, 0, 4) },
-{ HV_SYS_REG_DBGBCR9_EL1, HVF_SYSREG(0, 9, 14, 0, 5) },
-{ HV_SYS_REG_DBGWVR9_EL1, HVF_SYSREG(0, 9, 14, 0, 6) },
-{ HV_SYS_REG_DBGWCR9_EL1, HVF_SYSREG(0, 9, 14, 0, 7) },
-
-{ HV_SYS_REG_DBGBVR10_EL1, HVF_SYSREG(0, 10, 14, 0, 4) },
-{ HV_SYS_REG_DBGBCR10_EL1, HVF_SYSREG(0, 10, 14, 0, 5) },
-{ HV_SYS_REG_DBGWVR10_EL1, HVF_SYSREG(0, 10, 14, 0, 6) },
-{ HV_SYS_REG_DBGWCR10_EL1, HVF_SYSREG(0, 10, 14, 0, 7) },
-
-{ HV_SYS_REG_DBGBVR11_EL1, HVF_SYSREG(0, 11, 14, 0, 4) },
-{ HV_SYS_REG_DBGBCR11_EL1, HVF_SYSREG(0, 11, 14, 0, 5) },
-{ HV_SYS_REG_DBGWVR11_EL1, HVF_SYSREG(0, 11, 14, 0, 6) },
-{ HV_SYS_REG_DBGWCR11_EL1, HVF_SYSREG(0, 11, 14, 0, 7) },
-
-{ HV_SYS_REG_DBGBVR12_EL1, HVF_SYSREG(0, 12, 14, 0, 4) },
-{ HV_SYS_REG_DBGBCR12_EL1, HVF_SYSREG(0, 12, 14, 0, 5) },
-{ HV_SYS_REG_DBGWVR12_EL1, HVF_SYSREG(0, 12, 14, 0, 6) },
-{ HV_SYS_REG_DBGWCR12_EL1, HVF_SYSREG(0, 12, 14, 0, 7) },
-
-{ HV_SYS_REG_DBGBVR13_EL1, HVF_SYSREG(0, 13, 14, 0, 4) },
-{ HV_SYS_REG_DBGBCR13_EL1, HVF_SYSREG(0, 13, 14, 0, 5) },
-{ HV_SYS_REG_DBGWVR13_EL1, HVF_SYSREG(0, 13, 14, 0, 6) },
-{ HV_SYS_REG_DBGWCR13_EL1, HVF_SYSREG(0, 13, 14, 0, 7) },
-
-{ HV_SYS_REG_DBGBVR14_EL1, HVF_SYSREG(0, 14, 14, 0, 4) },
-{ HV_SYS_REG_DBGBCR14_EL1, HVF_SYSREG(0, 14, 14, 0, 5) },
-{ HV_SYS_REG_DBGWVR14_EL1, HVF_SYSREG(0, 14, 14, 0, 6) },
-{ HV_SYS_REG_DBGWCR14_EL1, HVF_SYSREG(0, 14, 14, 0, 7) },
-
-{ HV_SYS_REG_DBGBVR15_EL1, HVF_SYSREG(0, 15, 14, 0, 4) },
-{ HV_SYS_REG_DBGBCR15_EL1, HVF_SYSREG(0, 15, 14, 0, 5) },
-{ HV_SYS_REG_DBGWVR15_EL1, HVF_SYSREG(0, 15, 14, 0, 6) },
-{ HV_SYS_REG_DBGWCR15_EL1, HVF_SYSREG(0, 15, 14, 0, 7) },
+{ HV_SYS_REG_DBGBVR0_EL1, HVF_SYSREG(0,
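
Why op0 == 14 and op0 == 2 can yield the same key is easier to see with the
bit layout written out. The sketch below is an illustration under stated
assumptions, not the QEMU macro: it assumes the key is built by OR-ing
unmasked fields, with op0 at bit 14, op1 at bit 11, CRn at bit 7, CRm at
bit 3, op2 at bit 0, and a coprocessor tag of 0x13 at bit 16. All SKETCH_*
names and sketch_sysreg() are hypothetical.

```c
#include <stdint.h>

/* Assumed field positions; hypothetical names, not QEMU's macros. */
#define SKETCH_CP_TAG     0x13u
#define SKETCH_CP_SHIFT   16
#define SKETCH_OP0_SHIFT  14
#define SKETCH_OP1_SHIFT  11
#define SKETCH_CRN_SHIFT  7
#define SKETCH_CRM_SHIFT  3
#define SKETCH_OP2_SHIFT  0

/* Same argument order as HVF_SYSREG(CRn, CRm, op0, op1, op2). */
static uint32_t sketch_sysreg(uint32_t crn, uint32_t crm, uint32_t op0,
                              uint32_t op1, uint32_t op2)
{
    /*
     * No field is masked.  With op0 == 14 (0b1110) the two upper bits
     * land on bits 16 and 17, which the tag 0x13 already sets, so only
     * bit 15 remains: the same result as op0 == 2 (0b0010).
     */
    return (SKETCH_CP_TAG << SKETCH_CP_SHIFT) |
           (op0 << SKETCH_OP0_SHIFT) |
           (op1 << SKETCH_OP1_SHIFT) |
           (crn << SKETCH_CRN_SHIFT) |
           (crm << SKETCH_CRM_SHIFT) |
           (op2 << SKETCH_OP2_SHIFT);
}
```

Under these assumptions the collision is an accident of overlapping bits,
which is why spelling out op0 == 2 is the safer encoding. A differing op2,
as in the ID_AA64PFR1_EL1 case, still produces a different key.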

[PULL 06/10] util/bufferiszero: Improve scalar variant

2024-05-03 Thread Richard Henderson

Split less-than and greater-than 256 cases.
Use unaligned accesses for head and tail.
Avoid using out-of-bounds pointers in loop boundary conditions.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 util/bufferiszero.c | 85 +++--
 1 file changed, 51 insertions(+), 34 deletions(-)

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index 02df82b4ff..c9a7ded016 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -28,40 +28,57 @@
 
 static bool (*buffer_is_zero_accel)(const void *, size_t);
 
-static bool buffer_is_zero_integer(const void *buf, size_t len)
+static bool buffer_is_zero_int_lt256(const void *buf, size_t len)
 {
-if (unlikely(len < 8)) {
-/* For a very small buffer, simply accumulate all the bytes.  */
-const unsigned char *p = buf;
-const unsigned char *e = buf + len;
-unsigned char t = 0;
+uint64_t t;
+const uint64_t *p, *e;
 
-do {
-t |= *p++;
-} while (p < e);
-
-return t == 0;
-} else {
-/* Otherwise, use the unaligned memory access functions to
-   handle the beginning and end of the buffer, with a couple
-   of loops handling the middle aligned section.  */
-uint64_t t = ldq_he_p(buf);
-const uint64_t *p = (uint64_t *)(((uintptr_t)buf + 8) & -8);
-const uint64_t *e = (uint64_t *)(((uintptr_t)buf + len) & -8);
-
-for (; p + 8 <= e; p += 8) {
-if (t) {
-return false;
-}
-t = p[0] | p[1] | p[2] | p[3] | p[4] | p[5] | p[6] | p[7];
-}
-while (p < e) {
-t |= *p++;
-}
-t |= ldq_he_p(buf + len - 8);
-
-return t == 0;
+/*
+ * Use unaligned memory access functions to handle
+ * the beginning and end of the buffer.
+ */
+if (unlikely(len <= 8)) {
+return (ldl_he_p(buf) | ldl_he_p(buf + len - 4)) == 0;
 }
+
+t = ldq_he_p(buf) | ldq_he_p(buf + len - 8);
+p = QEMU_ALIGN_PTR_DOWN(buf + 8, 8);
+e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 8);
+
+/* Read 0 to 31 aligned words from the middle. */
+while (p < e) {
+t |= *p++;
+}
+return t == 0;
+}
+
+static bool buffer_is_zero_int_ge256(const void *buf, size_t len)
+{
+/*
+ * Use unaligned memory access functions to handle
+ * the beginning and end of the buffer.
+ */
+uint64_t t = ldq_he_p(buf) | ldq_he_p(buf + len - 8);
+const uint64_t *p = QEMU_ALIGN_PTR_DOWN(buf + 8, 8);
+const uint64_t *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 8);
+
+/* Collect a partial block at the tail end. */
+t |= e[-7] | e[-6] | e[-5] | e[-4] | e[-3] | e[-2] | e[-1];
+
+/*
+ * Loop over 64 byte blocks.
+ * With the head and tail removed, e - p >= 30,
+ * so the loop must iterate at least 3 times.
+ */
+do {
+if (t) {
+return false;
+}
+t = p[0] | p[1] | p[2] | p[3] | p[4] | p[5] | p[6] | p[7];
+p += 8;
+} while (p < e - 7);
+
+return t == 0;
 }
 
 #if defined(CONFIG_AVX2_OPT) || defined(__SSE2__)
@@ -173,7 +190,7 @@ select_accel_cpuinfo(unsigned info)
 { CPUINFO_AVX2,buffer_zero_avx2 },
 #endif
 { CPUINFO_SSE2,buffer_zero_sse2 },
-{ CPUINFO_ALWAYS,  buffer_is_zero_integer },
+{ CPUINFO_ALWAYS,  buffer_is_zero_int_ge256 },
 };
 
 for (unsigned i = 0; i < ARRAY_SIZE(all); ++i) {
@@ -211,7 +228,7 @@ bool test_buffer_is_zero_next_accel(void)
 return false;
 }
 
-#define INIT_ACCEL buffer_is_zero_integer
+#define INIT_ACCEL buffer_is_zero_int_ge256
 #endif
 
 static bool (*buffer_is_zero_accel)(const void *, size_t) = INIT_ACCEL;
@@ -232,7 +249,7 @@ bool buffer_is_zero_ool(const void *buf, size_t len)
 if (likely(len >= 256)) {
 return buffer_is_zero_accel(buf, len);
 }
-return buffer_is_zero_integer(buf, len);
+return buffer_is_zero_int_lt256(buf, len);
 }
 
 bool buffer_is_zero_ge256(const void *buf, size_t len)
-- 
2.34.1
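
The head/tail technique used above can be sketched portably. This is a
simplified illustration and not the code from the patch: load_u64() and
is_zero_overlap() are hypothetical names, memcpy() stands in for the
ldq_he_p() helper, and the middle of the buffer is also read with unaligned
loads instead of being aligned first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unaligned 64-bit load; memcpy keeps this well-defined in ISO C. */
static uint64_t load_u64(const unsigned char *p)
{
    uint64_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}

/* True if buf[0..len) is all zero.  Requires len >= 8. */
static bool is_zero_overlap(const void *buf, size_t len)
{
    const unsigned char *b = buf;
    uint64_t t;
    size_t i;

    assert(len >= 8);

    /* First and last 8 bytes; the two loads may overlap each other. */
    t = load_u64(b) | load_u64(b + len - 8);

    /* Middle words; stop once the tail load already covers the rest. */
    for (i = 8; i + 8 < len; i += 8) {
        t |= load_u64(b + i);
    }
    return t == 0;
}
```

Because the head and tail loads are allowed to overlap, no byte-wise cleanup
loop is needed for lengths that are not a multiple of 8, and no pointer is
ever formed outside the buffer.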

[PULL 04/10] util/bufferiszero: Remove useless prefetches

2024-05-03 Thread Richard Henderson

From: Alexander Monakov 

Use of prefetching in bufferiszero.c is quite questionable:

- prefetches are issued just a few CPU cycles before the corresponding
  line would be hit by demand loads;

- they are done for simple access patterns, i.e. where hardware
  prefetchers can perform better;

- they compete for load ports in loops that should be limited by load
  port throughput rather than ALU throughput.

Signed-off-by: Alexander Monakov 
Signed-off-by: Mikhail Romanov 
Reviewed-by: Richard Henderson 
Message-Id: <20240206204809.9859-5-amona...@ispras.ru>
---
 util/bufferiszero.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index 972f394cbd..00118d649e 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -50,7 +50,6 @@ static bool buffer_is_zero_integer(const void *buf, size_t len)
 const uint64_t *e = (uint64_t *)(((uintptr_t)buf + len) & -8);
 
 for (; p + 8 <= e; p += 8) {
-__builtin_prefetch(p + 8);
 if (t) {
 return false;
 }
@@ -80,7 +79,6 @@ buffer_zero_sse2(const void *buf, size_t len)
 
 /* Loop over 16-byte aligned blocks of 64.  */
 while (likely(p <= e)) {
-__builtin_prefetch(p);
 t = _mm_cmpeq_epi8(t, zero);
 if (unlikely(_mm_movemask_epi8(t) != 0xFFFF)) {
 return false;
@@ -111,7 +109,6 @@ buffer_zero_avx2(const void *buf, size_t len)
 
 /* Loop over 32-byte aligned blocks of 128.  */
 while (p <= e) {
-__builtin_prefetch(p);
 if (unlikely(!_mm256_testz_si256(t, t))) {
 return false;
 }
-- 
2.34.1

Re: [PATCH] kvm: ppc: disable sPAPR code if CONFIG_PSERIES is disabled

2024-05-03 Thread Philippe Mathieu-Daudé


On 3/5/24 15:49, Paolo Bonzini wrote:

target/ppc/kvm.c calls out to code in hw/ppc/spapr*.c; that code is
not present and fails to link if CONFIG_PSERIES is not enabled.
Adjust kvm.c to depend on CONFIG_PSERIES instead of TARGET_PPC64,
and compile out anything that requires cap_papr, because only
the pseries machine will call kvmppc_set_papr().

Signed-off-by: Paolo Bonzini 
---
  target/ppc/kvm.c | 17 +
  1 file changed, 13 insertions(+), 4 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

[PULL 10/10] tests/bench: Add bufferiszero-bench

2024-05-03 Thread Richard Henderson

Benchmark each acceleration function vs an aligned buffer of zeros.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tests/bench/bufferiszero-bench.c | 47 
 tests/bench/meson.build  |  1 +
 2 files changed, 48 insertions(+)
 create mode 100644 tests/bench/bufferiszero-bench.c

diff --git a/tests/bench/bufferiszero-bench.c b/tests/bench/bufferiszero-bench.c
new file mode 100644
index 00..222695c1fa
--- /dev/null
+++ b/tests/bench/bufferiszero-bench.c
@@ -0,0 +1,47 @@
+/*
+ * QEMU buffer_is_zero speed benchmark
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * (at your option) any later version.  See the COPYING file in the
+ * top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "qemu/cutils.h"
+#include "qemu/units.h"
+
+static void test(const void *opaque)
+{
+    size_t max = 64 * KiB;
+    void *buf = g_malloc0(max);
+    int accel_index = 0;
+
+    do {
+        if (accel_index != 0) {
+            g_test_message("%s", "");  /* gnu_printf Werror for simple "" */
+        }
+        for (size_t len = 1 * KiB; len <= max; len *= 4) {
+            double total = 0.0;
+
+            g_test_timer_start();
+            do {
+                buffer_is_zero_ge256(buf, len);
+                total += len;
+            } while (g_test_timer_elapsed() < 0.5);
+
+            total /= MiB;
+            g_test_message("buffer_is_zero #%d: %2zuKB %8.0f MB/sec",
+                           accel_index, len / (size_t)KiB,
+                           total / g_test_timer_last());
+        }
+        accel_index++;
+    } while (test_buffer_is_zero_next_accel());
+
+    g_free(buf);
+}
+
+int main(int argc, char **argv)
+{
+    g_test_init(&argc, &argv, NULL);
+    g_test_add_data_func("/cutils/bufferiszero/speed", NULL, test);
+    return g_test_run();
+}
diff --git a/tests/bench/meson.build b/tests/bench/meson.build
index 7e76338a52..4cd7a2f6b5 100644
--- a/tests/bench/meson.build
+++ b/tests/bench/meson.build
@@ -21,6 +21,7 @@ benchs = {}
 
 if have_block
   benchs += {
+ 'bufferiszero-bench': [],
  'benchmark-crypto-hash': [crypto],
  'benchmark-crypto-hmac': [crypto],
  'benchmark-crypto-cipher': [crypto],
-- 
2.34.1

[PULL 02/10] util/bufferiszero: Remove AVX512 variant

2024-05-03 Thread Richard Henderson

From: Alexander Monakov 

Thanks to early checks in the inline buffer_is_zero wrapper, the SIMD
routines are invoked much more rarely in normal use when most buffers
are non-zero. This makes use of AVX512 unprofitable, as it incurs extra
frequency and voltage transition periods during which the CPU operates
at reduced performance, as described in
https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html

Signed-off-by: Mikhail Romanov 
Signed-off-by: Alexander Monakov 
Reviewed-by: Richard Henderson 
Message-Id: <20240206204809.9859-4-amona...@ispras.ru>
Signed-off-by: Richard Henderson 
---
 util/bufferiszero.c | 38 +++---
 1 file changed, 3 insertions(+), 35 deletions(-)

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index f5a3634f9a..641d5f9b9e 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -64,7 +64,7 @@ buffer_zero_int(const void *buf, size_t len)
 }
 }
 
-#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) || defined(__SSE2__)
+#if defined(CONFIG_AVX2_OPT) || defined(__SSE2__)
 #include <immintrin.h>
 
 /* Note that each of these vectorized functions require len >= 64.  */
@@ -128,41 +128,12 @@ buffer_zero_avx2(const void *buf, size_t len)
 }
 #endif /* CONFIG_AVX2_OPT */
 
-#ifdef CONFIG_AVX512F_OPT
-static bool __attribute__((target("avx512f")))
-buffer_zero_avx512(const void *buf, size_t len)
-{
-/* Begin with an unaligned head of 64 bytes.  */
-__m512i t = _mm512_loadu_si512(buf);
-__m512i *p = (__m512i *)(((uintptr_t)buf + 5 * 64) & -64);
-__m512i *e = (__m512i *)(((uintptr_t)buf + len) & -64);
-
-/* Loop over 64-byte aligned blocks of 256.  */
-while (p <= e) {
-__builtin_prefetch(p);
-if (unlikely(_mm512_test_epi64_mask(t, t))) {
-return false;
-}
-t = p[-4] | p[-3] | p[-2] | p[-1];
-p += 4;
-}
-
-t |= _mm512_loadu_si512(buf + len - 4 * 64);
-t |= _mm512_loadu_si512(buf + len - 3 * 64);
-t |= _mm512_loadu_si512(buf + len - 2 * 64);
-t |= _mm512_loadu_si512(buf + len - 1 * 64);
-
-return !_mm512_test_epi64_mask(t, t);
-
-}
-#endif /* CONFIG_AVX512F_OPT */
-
 /*
  * Make sure that these variables are appropriately initialized when
  * SSE2 is enabled on the compiler command-line, but the compiler is
  * too old to support CONFIG_AVX2_OPT.
  */
-#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT)
+#if defined(CONFIG_AVX2_OPT)
 # define INIT_USED 0
 # define INIT_LENGTH   0
 # define INIT_ACCELbuffer_zero_int
@@ -188,9 +159,6 @@ select_accel_cpuinfo(unsigned info)
 unsigned len;
 bool (*fn)(const void *, size_t);
 } all[] = {
-#ifdef CONFIG_AVX512F_OPT
-{ CPUINFO_AVX512F, 256, buffer_zero_avx512 },
-#endif
 #ifdef CONFIG_AVX2_OPT
 { CPUINFO_AVX2,128, buffer_zero_avx2 },
 #endif
@@ -208,7 +176,7 @@ select_accel_cpuinfo(unsigned info)
 return 0;
 }
 
-#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT)
+#if defined(CONFIG_AVX2_OPT)
 static void __attribute__((constructor)) init_accel(void)
 {
 used_accel = select_accel_cpuinfo(cpuinfo_init());
-- 
2.34.1

[PULL 07/10] util/bufferiszero: Introduce biz_accel_fn typedef

2024-05-03 Thread Richard Henderson

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 util/bufferiszero.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index c9a7ded016..f9af7841ba 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -26,7 +26,8 @@
 #include "qemu/bswap.h"
 #include "host/cpuinfo.h"
 
-static bool (*buffer_is_zero_accel)(const void *, size_t);
+typedef bool (*biz_accel_fn)(const void *, size_t);
+static biz_accel_fn buffer_is_zero_accel;
 
 static bool buffer_is_zero_int_lt256(const void *buf, size_t len)
 {
@@ -184,7 +185,7 @@ select_accel_cpuinfo(unsigned info)
 /* Array is sorted in order of algorithm preference. */
 static const struct {
 unsigned bit;
-bool (*fn)(const void *, size_t);
+biz_accel_fn fn;
 } all[] = {
 #ifdef CONFIG_AVX2_OPT
 { CPUINFO_AVX2,buffer_zero_avx2 },
@@ -231,7 +232,7 @@ bool test_buffer_is_zero_next_accel(void)
 #define INIT_ACCEL buffer_is_zero_int_ge256
 #endif
 
-static bool (*buffer_is_zero_accel)(const void *, size_t) = INIT_ACCEL;
+static biz_accel_fn buffer_is_zero_accel = INIT_ACCEL;
 
 bool buffer_is_zero_ool(const void *buf, size_t len)
 {
-- 
2.34.1

[PULL 08/10] util/bufferiszero: Simplify test_buffer_is_zero_next_accel

2024-05-03 Thread Richard Henderson

Because the three alternatives are monotonic, we don't need
to keep a couple of bitmasks, just identify the strongest
alternative at startup.

Generalize test_buffer_is_zero_next_accel and init_accel
by always defining an accel_table array.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 util/bufferiszero.c | 81 -
 1 file changed, 35 insertions(+), 46 deletions(-)

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index f9af7841ba..7218154a13 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -27,7 +27,6 @@
 #include "host/cpuinfo.h"
 
 typedef bool (*biz_accel_fn)(const void *, size_t);
-static biz_accel_fn buffer_is_zero_accel;
 
 static bool buffer_is_zero_int_lt256(const void *buf, size_t len)
 {
@@ -179,60 +178,35 @@ buffer_zero_avx2(const void *buf, size_t len)
 }
 #endif /* CONFIG_AVX2_OPT */
 
-static unsigned __attribute__((noinline))
-select_accel_cpuinfo(unsigned info)
-{
-/* Array is sorted in order of algorithm preference. */
-static const struct {
-unsigned bit;
-biz_accel_fn fn;
-} all[] = {
+static biz_accel_fn const accel_table[] = {
+buffer_is_zero_int_ge256,
+buffer_zero_sse2,
 #ifdef CONFIG_AVX2_OPT
-{ CPUINFO_AVX2,buffer_zero_avx2 },
+buffer_zero_avx2,
 #endif
-{ CPUINFO_SSE2,buffer_zero_sse2 },
-{ CPUINFO_ALWAYS,  buffer_is_zero_int_ge256 },
-};
+};
 
-for (unsigned i = 0; i < ARRAY_SIZE(all); ++i) {
-if (info & all[i].bit) {
-buffer_is_zero_accel = all[i].fn;
-return all[i].bit;
-}
+static unsigned best_accel(void)
+{
+unsigned info = cpuinfo_init();
+
+#ifdef CONFIG_AVX2_OPT
+if (info & CPUINFO_AVX2) {
+return 2;
 }
-return 0;
+#endif
+return info & CPUINFO_SSE2 ? 1 : 0;
 }
 
-static unsigned used_accel;
-
-static void __attribute__((constructor)) init_accel(void)
-{
-used_accel = select_accel_cpuinfo(cpuinfo_init());
-}
-
-#define INIT_ACCEL NULL
-
-bool test_buffer_is_zero_next_accel(void)
-{
-/*
- * Accumulate the accelerators that we've already tested, and
- * remove them from the set to test this round.  We'll get back
- * a zero from select_accel_cpuinfo when there are no more.
- */
-unsigned used = select_accel_cpuinfo(cpuinfo & ~used_accel);
-used_accel |= used;
-return used;
-}
 #else
-bool test_buffer_is_zero_next_accel(void)
-{
-return false;
-}
-
-#define INIT_ACCEL buffer_is_zero_int_ge256
+#define best_accel() 0
+static biz_accel_fn const accel_table[1] = {
+buffer_is_zero_int_ge256
+};
 #endif
 
-static biz_accel_fn buffer_is_zero_accel = INIT_ACCEL;
+static biz_accel_fn buffer_is_zero_accel;
+static unsigned accel_index;
 
 bool buffer_is_zero_ool(const void *buf, size_t len)
 {
@@ -257,3 +231,18 @@ bool buffer_is_zero_ge256(const void *buf, size_t len)
 {
 return buffer_is_zero_accel(buf, len);
 }
+
+bool test_buffer_is_zero_next_accel(void)
+{
+    if (accel_index != 0) {
+        buffer_is_zero_accel = accel_table[--accel_index];
+        return true;
+    }
+    return false;
+}
+
+static void __attribute__((constructor)) init_accel(void)
+{
+    accel_index = best_accel();
+    buffer_is_zero_accel = accel_table[accel_index];
+}
-- 
2.34.1
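
The table-driven selection described above can be sketched on its own. This
is an illustration under assumptions, not the patch's code: check_bytes(),
check_words(), select_best() and next_weaker() are hypothetical names, and
two portable routines stand in for the scalar and SIMD variants. The table
is ordered weakest first, so stepping the index down visits every weaker
alternative exactly once.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef bool (*check_fn)(const void *, size_t);

/* Weakest variant: OR all bytes together. */
static bool check_bytes(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    unsigned char t = 0;

    for (size_t i = 0; i < len; i++) {
        t |= p[i];
    }
    return t == 0;
}

/* Stronger variant: OR 8 bytes at a time, then the leftover bytes. */
static bool check_words(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint64_t t = 0;
    size_t i = 0;

    for (; i + 8 <= len; i += 8) {
        uint64_t w;

        memcpy(&w, p + i, sizeof(w));
        t |= w;
    }
    for (; i < len; i++) {
        t |= p[i];
    }
    return t == 0;
}

/* Ordered weakest to strongest, so the index doubles as a rank. */
static check_fn const table[] = { check_bytes, check_words };

static check_fn current_fn;
static unsigned current_index;

/* Startup: pick the strongest usable entry. */
static void select_best(unsigned best)
{
    current_index = best;
    current_fn = table[best];
}

/* Test hook: step down to the next weaker entry, false when exhausted. */
static bool next_weaker(void)
{
    if (current_index != 0) {
        current_fn = table[--current_index];
        return true;
    }
    return false;
}
```

A test can call select_best() once and then loop with
do { ... } while (next_weaker()); to exercise every variant, with a single
index replacing the earlier pair of bitmasks.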

[PULL 09/10] util/bufferiszero: Add simd acceleration for aarch64

2024-05-03 Thread Richard Henderson

Because non-embedded aarch64 is expected to have AdvSIMD enabled, merely
double-check with the compiler flags for __ARM_NEON and don't bother with
a runtime check.  Otherwise, model the loop after the x86 SSE2 function.

Use UMAXV for the vector reduction.  This is 3 cycles on cortex-a76 and
2 cycles on neoverse-n1.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 util/bufferiszero.c | 67 +
 1 file changed, 67 insertions(+)

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index 7218154a13..74864f7b78 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -198,6 +198,73 @@ static unsigned best_accel(void)
 return info & CPUINFO_SSE2 ? 1 : 0;
 }
 
+#elif defined(__aarch64__) && defined(__ARM_NEON)
+#include <arm_neon.h>
+
+/*
+ * Helper for preventing the compiler from reassociating
+ * chains of binary vector operations.
+ */
+#define REASSOC_BARRIER(vec0, vec1) asm("" : "+w"(vec0), "+w"(vec1))
+
+static bool buffer_is_zero_simd(const void *buf, size_t len)
+{
+uint32x4_t t0, t1, t2, t3;
+
+/* Align head/tail to 16-byte boundaries.  */
+const uint32x4_t *p = QEMU_ALIGN_PTR_DOWN(buf + 16, 16);
+const uint32x4_t *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 16);
+
+/* Unaligned loads at head/tail.  */
+t0 = vld1q_u32(buf) | vld1q_u32(buf + len - 16);
+
+/* Collect a partial block at tail end.  */
+t1 = e[-7] | e[-6];
+t2 = e[-5] | e[-4];
+t3 = e[-3] | e[-2];
+t0 |= e[-1];
+REASSOC_BARRIER(t0, t1);
+REASSOC_BARRIER(t2, t3);
+t0 |= t1;
+t2 |= t3;
+REASSOC_BARRIER(t0, t2);
+t0 |= t2;
+
+/*
+ * Loop over complete 128-byte blocks.
+ * With the head and tail removed, e - p >= 14, so the loop
+ * must iterate at least once.
+ */
+do {
+/*
+ * Reduce via UMAXV.  Whatever the actual result,
+ * it will only be zero if all input bytes are zero.
+ */
+if (unlikely(vmaxvq_u32(t0) != 0)) {
+return false;
+}
+
+t0 = p[0] | p[1];
+t1 = p[2] | p[3];
+t2 = p[4] | p[5];
+t3 = p[6] | p[7];
+REASSOC_BARRIER(t0, t1);
+REASSOC_BARRIER(t2, t3);
+t0 |= t1;
+t2 |= t3;
+REASSOC_BARRIER(t0, t2);
+t0 |= t2;
+p += 8;
+} while (p < e - 7);
+
+return vmaxvq_u32(t0) == 0;
+}
+
+#define best_accel() 1
+static biz_accel_fn const accel_table[] = {
+buffer_is_zero_int_ge256,
+buffer_is_zero_simd,
+};
 #else
 #define best_accel() 0
 static biz_accel_fn const accel_table[1] = {
-- 
2.34.1

[PULL 05/10] util/bufferiszero: Optimize SSE2 and AVX2 variants

2024-05-03 Thread Richard Henderson

From: Alexander Monakov 

Increase unroll factor in SIMD loops from 4x to 8x in order to move
their bottlenecks from ALU port contention to load issue rate (two loads
per cycle on popular x86 implementations).

Avoid using out-of-bounds pointers in loop boundary conditions.

Follow SSE2 implementation strategy in the AVX2 variant. Avoid use of
PTEST, which is not profitable there (like in the removed SSE4 variant).

Signed-off-by: Alexander Monakov 
Signed-off-by: Mikhail Romanov 
Reviewed-by: Richard Henderson 
Message-Id: <20240206204809.9859-6-amona...@ispras.ru>
---
 util/bufferiszero.c | 111 +---
 1 file changed, 73 insertions(+), 38 deletions(-)

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index 00118d649e..02df82b4ff 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -67,62 +67,97 @@ static bool buffer_is_zero_integer(const void *buf, size_t len)
 #if defined(CONFIG_AVX2_OPT) || defined(__SSE2__)
 #include <immintrin.h>
 
-/* Note that each of these vectorized functions require len >= 64.  */
+/* Helper for preventing the compiler from reassociating
+   chains of binary vector operations.  */
+#define SSE_REASSOC_BARRIER(vec0, vec1) asm("" : "+x"(vec0), "+x"(vec1))
+
+/* Note that these vectorized functions may assume len >= 256.  */
 
 static bool __attribute__((target("sse2")))
 buffer_zero_sse2(const void *buf, size_t len)
 {
-__m128i t = _mm_loadu_si128(buf);
-__m128i *p = (__m128i *)(((uintptr_t)buf + 5 * 16) & -16);
-__m128i *e = (__m128i *)(((uintptr_t)buf + len) & -16);
-__m128i zero = _mm_setzero_si128();
+/* Unaligned loads at head/tail.  */
+__m128i v = *(__m128i_u *)(buf);
+__m128i w = *(__m128i_u *)(buf + len - 16);
+/* Align head/tail to 16-byte boundaries.  */
+const __m128i *p = QEMU_ALIGN_PTR_DOWN(buf + 16, 16);
+const __m128i *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 16);
+__m128i zero = { 0 };
 
-/* Loop over 16-byte aligned blocks of 64.  */
-while (likely(p <= e)) {
-t = _mm_cmpeq_epi8(t, zero);
-if (unlikely(_mm_movemask_epi8(t) != 0xFFFF)) {
+/* Collect a partial block at tail end.  */
+v |= e[-1]; w |= e[-2];
+SSE_REASSOC_BARRIER(v, w);
+v |= e[-3]; w |= e[-4];
+SSE_REASSOC_BARRIER(v, w);
+v |= e[-5]; w |= e[-6];
+SSE_REASSOC_BARRIER(v, w);
+v |= e[-7]; v |= w;
+
+/*
+ * Loop over complete 128-byte blocks.
+ * With the head and tail removed, e - p >= 14, so the loop
+ * must iterate at least once.
+ */
+do {
+v = _mm_cmpeq_epi8(v, zero);
+if (unlikely(_mm_movemask_epi8(v) != 0xFFFF)) {
 return false;
 }
-t = p[-4] | p[-3] | p[-2] | p[-1];
-p += 4;
-}
+v = p[0]; w = p[1];
+SSE_REASSOC_BARRIER(v, w);
+v |= p[2]; w |= p[3];
+SSE_REASSOC_BARRIER(v, w);
+v |= p[4]; w |= p[5];
+SSE_REASSOC_BARRIER(v, w);
+v |= p[6]; w |= p[7];
+SSE_REASSOC_BARRIER(v, w);
+v |= w;
+p += 8;
+} while (p < e - 7);
 
-/* Finish the aligned tail.  */
-t |= e[-3];
-t |= e[-2];
-t |= e[-1];
-
-/* Finish the unaligned tail.  */
-t |= _mm_loadu_si128(buf + len - 16);
-
-return _mm_movemask_epi8(_mm_cmpeq_epi8(t, zero)) == 0xFFFF;
+return _mm_movemask_epi8(_mm_cmpeq_epi8(v, zero)) == 0xFFFF;
 }
 
 #ifdef CONFIG_AVX2_OPT
 static bool __attribute__((target("avx2")))
 buffer_zero_avx2(const void *buf, size_t len)
 {
-/* Begin with an unaligned head of 32 bytes.  */
-__m256i t = _mm256_loadu_si256(buf);
-__m256i *p = (__m256i *)(((uintptr_t)buf + 5 * 32) & -32);
-__m256i *e = (__m256i *)(((uintptr_t)buf + len) & -32);
+/* Unaligned loads at head/tail.  */
+__m256i v = *(__m256i_u *)(buf);
+__m256i w = *(__m256i_u *)(buf + len - 32);
+/* Align head/tail to 32-byte boundaries.  */
+const __m256i *p = QEMU_ALIGN_PTR_DOWN(buf + 32, 32);
+const __m256i *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 32);
+__m256i zero = { 0 };
 
-/* Loop over 32-byte aligned blocks of 128.  */
-while (p <= e) {
-if (unlikely(!_mm256_testz_si256(t, t))) {
+/* Collect a partial block at tail end.  */
+v |= e[-1]; w |= e[-2];
+SSE_REASSOC_BARRIER(v, w);
+v |= e[-3]; w |= e[-4];
+SSE_REASSOC_BARRIER(v, w);
+v |= e[-5]; w |= e[-6];
+SSE_REASSOC_BARRIER(v, w);
+v |= e[-7]; v |= w;
+
+/* Loop over complete 256-byte blocks.  */
+for (; p < e - 7; p += 8) {
+/* PTEST is not profitable here.  */
+v = _mm256_cmpeq_epi8(v, zero);
+if (unlikely(_mm256_movemask_epi8(v) != 0xFFFFFFFF)) {
 return false;
 }
-t = p[-4] | p[-3] | p[-2] | p[-1];
-p += 4;
-} ;
+v = p[0]; w = p[1];
+SSE_REASSOC_BARRIER(v, w);
+v |= p[2]; w |= p[3];
+SSE_REASSOC_BARRIER(v, w);
+v |= p[4]; w |= p[5];
+SSE_REASSOC_BARRIER(v, w);
+

[PULL 03/10] util/bufferiszero: Reorganize for early test for acceleration

2024-05-03 Thread Richard Henderson

From: Alexander Monakov 

Test for length >= 256 inline, where it is often a constant.
Before calling into the accelerated routine, sample three bytes
from the buffer, which handles most non-zero buffers.

Signed-off-by: Alexander Monakov 
Signed-off-by: Mikhail Romanov 
Message-Id: <20240206204809.9859-3-amona...@ispras.ru>
[rth: Use __builtin_constant_p; move the indirect call out of line.]
Signed-off-by: Richard Henderson 
---
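For illustration only (not part of the patch): a minimal sketch of the
"sample three bytes, then scan" idea described in the commit message. The
names sample3, slow_is_zero and is_zero_sampled are hypothetical, and the
byte loop merely stands in for whatever accelerated routine is selected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Plain byte loop standing in for the accelerated routine (hypothetical). */
static bool slow_is_zero(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i]) {
            return false;
        }
    }
    return true;
}

/*
 * Probe the first, last and middle byte.  The && chain short-circuits,
 * so a non-zero first byte means the other two are never read.
 */
static bool sample3(const unsigned char *buf, size_t len)
{
    assert(len > 0);
    return !buf[0] && !buf[len - 1] && !buf[len / 2];
}

/* Cheap probe first; full scan only when all three samples are zero. */
static bool is_zero_sampled(const void *vbuf, size_t len)
{
    const unsigned char *buf = vbuf;

    return sample3(buf, len) && slow_is_zero(buf, len);
}
```

A buffer with a non-zero first, last or middle byte is rejected without
touching any other cache line; only buffers that pass all three probes pay
for the full scan.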
 include/qemu/cutils.h | 32 -
 util/bufferiszero.c   | 84 +--
 2 files changed, 63 insertions(+), 53 deletions(-)

diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h
index 92c927a6a3..741dade7cf 100644
--- a/include/qemu/cutils.h
+++ b/include/qemu/cutils.h
@@ -187,9 +187,39 @@ char *freq_to_str(uint64_t freq_hz);
 /* used to print char* safely */
 #define STR_OR_NULL(str) ((str) ? (str) : "null")
 
-bool buffer_is_zero(const void *buf, size_t len);
+/*
+ * Check if a buffer is all zeroes.
+ */
+
+bool buffer_is_zero_ool(const void *vbuf, size_t len);
+bool buffer_is_zero_ge256(const void *vbuf, size_t len);
 bool test_buffer_is_zero_next_accel(void);
 
+static inline bool buffer_is_zero_sample3(const char *buf, size_t len)
+{
+/*
+ * For any reasonably sized buffer, these three samples come from
+ * three different cachelines.  In qemu-img usage, we find that
+ * each byte eliminates more than half of all buffer testing.
+ * It is therefore critical to performance that the byte tests
+ * short-circuit, so that we do not pull in additional cache lines.
+ * Do not "optimize" this to !(a | b | c).
+ */
+return !buf[0] && !buf[len - 1] && !buf[len / 2];
+}
+
+#ifdef __OPTIMIZE__
+static inline bool buffer_is_zero(const void *buf, size_t len)
+{
+return (__builtin_constant_p(len) && len >= 256
+? buffer_is_zero_sample3(buf, len) &&
+  buffer_is_zero_ge256(buf, len)
+: buffer_is_zero_ool(buf, len));
+}
+#else
+#define buffer_is_zero  buffer_is_zero_ool
+#endif
+
 /*
  * Implementation of ULEB128 (http://en.wikipedia.org/wiki/LEB128)
  * Input is limited to 14-bit numbers
diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index 641d5f9b9e..972f394cbd 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -26,8 +26,9 @@
 #include "qemu/bswap.h"
 #include "host/cpuinfo.h"
 
-static bool
-buffer_zero_int(const void *buf, size_t len)
+static bool (*buffer_is_zero_accel)(const void *, size_t);
+
+static bool buffer_is_zero_integer(const void *buf, size_t len)
 {
 if (unlikely(len < 8)) {
 /* For a very small buffer, simply accumulate all the bytes.  */
@@ -128,60 +129,38 @@ buffer_zero_avx2(const void *buf, size_t len)
 }
 #endif /* CONFIG_AVX2_OPT */
 
-/*
- * Make sure that these variables are appropriately initialized when
- * SSE2 is enabled on the compiler command-line, but the compiler is
- * too old to support CONFIG_AVX2_OPT.
- */
-#if defined(CONFIG_AVX2_OPT)
-# define INIT_USED 0
-# define INIT_LENGTH   0
-# define INIT_ACCELbuffer_zero_int
-#else
-# ifndef __SSE2__
-#  error "ISA selection confusion"
-# endif
-# define INIT_USED CPUINFO_SSE2
-# define INIT_LENGTH   64
-# define INIT_ACCELbuffer_zero_sse2
-#endif
-
-static unsigned used_accel = INIT_USED;
-static unsigned length_to_accel = INIT_LENGTH;
-static bool (*buffer_accel)(const void *, size_t) = INIT_ACCEL;
-
 static unsigned __attribute__((noinline))
 select_accel_cpuinfo(unsigned info)
 {
 /* Array is sorted in order of algorithm preference. */
 static const struct {
 unsigned bit;
-unsigned len;
 bool (*fn)(const void *, size_t);
 } all[] = {
 #ifdef CONFIG_AVX2_OPT
-{ CPUINFO_AVX2,128, buffer_zero_avx2 },
+{ CPUINFO_AVX2,buffer_zero_avx2 },
 #endif
-{ CPUINFO_SSE2, 64, buffer_zero_sse2 },
-{ CPUINFO_ALWAYS,0, buffer_zero_int },
+{ CPUINFO_SSE2,buffer_zero_sse2 },
+{ CPUINFO_ALWAYS,  buffer_is_zero_integer },
 };
 
 for (unsigned i = 0; i < ARRAY_SIZE(all); ++i) {
 if (info & all[i].bit) {
-length_to_accel = all[i].len;
-buffer_accel = all[i].fn;
+buffer_is_zero_accel = all[i].fn;
 return all[i].bit;
 }
 }
 return 0;
 }
 
-#if defined(CONFIG_AVX2_OPT)
+static unsigned used_accel;
+
 static void __attribute__((constructor)) init_accel(void)
 {
 used_accel = select_accel_cpuinfo(cpuinfo_init());
 }
-#endif /* CONFIG_AVX2_OPT */
+
+#define INIT_ACCEL NULL
 
 bool test_buffer_is_zero_next_accel(void)
 {
@@ -194,36 +173,37 @@ bool test_buffer_is_zero_next_accel(void)
 used_accel |= used;
 return used;
 }
-
-static bool select_accel_fn(const void *buf, size_t len)
-{
-if (likely(len >= length_to_accel)) {
-return buffer_accel(buf, len);
-}
-return buffer_zero_int(buf, len);
-}
-
 #else
-#define select_accel_fn  buffer_zero_int
 bool test_

[PULL 00/10] bufferiszero improvements

2024-05-03 Thread Richard Henderson

The following changes since commit 4977ce198d2390bff8c71ad5cb1a5f6aa24b56fb:

  Merge tag 'pull-tcg-20240501' of https://gitlab.com/rth7680/qemu into staging 
(2024-05-01 15:15:33 -0700)

are available in the Git repository at:

  https://gitlab.com/rth7680/qemu.git tags/pull-misc-20240503

for you to fetch changes up to a06d9eddb015a9f5895161b0a3958a2e4be21579:

  tests/bench: Add bufferiszero-bench (2024-05-03 08:03:35 -0700)


util/bufferiszero:
  - Remove sse4.1 and avx512 variants
  - Reorganize for early test for acceleration
  - Remove useless prefetches
  - Optimize sse2, avx2 and integer variants
  - Add simd acceleration for aarch64
  - Add bufferiszero-bench


Alexander Monakov (5):
  util/bufferiszero: Remove SSE4.1 variant
  util/bufferiszero: Remove AVX512 variant
  util/bufferiszero: Reorganize for early test for acceleration
  util/bufferiszero: Remove useless prefetches
  util/bufferiszero: Optimize SSE2 and AVX2 variants

Richard Henderson (5):
  util/bufferiszero: Improve scalar variant
  util/bufferiszero: Introduce biz_accel_fn typedef
  util/bufferiszero: Simplify test_buffer_is_zero_next_accel
  util/bufferiszero: Add simd acceleration for aarch64
  tests/bench: Add bufferiszero-bench

 include/qemu/cutils.h|  32 ++-
 tests/bench/bufferiszero-bench.c |  47 
 util/bufferiszero.c  | 465 +--
 tests/bench/meson.build  |   1 +
 4 files changed, 324 insertions(+), 221 deletions(-)
 create mode 100644 tests/bench/bufferiszero-bench.c

[PULL 01/10] util/bufferiszero: Remove SSE4.1 variant

2024-05-03 Thread Richard Henderson

From: Alexander Monakov 

The SSE4.1 variant is virtually identical to the SSE2 variant, except
for using 'PTEST+JNZ' in place of 'PCMPEQB+PMOVMSKB+CMP+JNE' for testing
if an SSE register is all zeroes. The PTEST instruction decodes to two
uops, so it can be handled only by the complex decoder, and since
CMP+JNE are macro-fused, both sequences decode to three uops. The uops
comprising the PTEST instruction dispatch to p0 and p5 on Intel CPUs, so
PCMPEQB+PMOVMSKB is comparatively more flexible from dispatch
standpoint.

Hence, the use of PTEST brings no benefit from throughput standpoint.
Its latency is not important, since it feeds only a conditional jump,
which terminates the dependency chain.

I never observed PTEST variants to be faster on real hardware.

Signed-off-by: Alexander Monakov 
Signed-off-by: Mikhail Romanov 
Reviewed-by: Richard Henderson 
Message-Id: <20240206204809.9859-2-amona...@ispras.ru>
---
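For readers unfamiliar with the instruction sequences named above, here is a
rough sketch of the PCMPEQB+PMOVMSKB style of zero test written with SSE2
intrinsics. The helper name vec16_is_zero is hypothetical, and the scalar
fallback is only there so the fragment also builds on hosts without SSE2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */

/* True if the 16 bytes at p are all zero; p need not be aligned. */
static bool vec16_is_zero(const void *p)
{
    assert(p != NULL);
    __m128i v = _mm_loadu_si128((const __m128i *)p);
    /* PCMPEQB: each byte becomes 0xFF where the input byte was zero. */
    __m128i eq = _mm_cmpeq_epi8(v, _mm_setzero_si128());
    /* PMOVMSKB + CMP: all 16 mask bits set means every byte was zero. */
    return _mm_movemask_epi8(eq) == 0xFFFF;
}
#else
/* Scalar fallback for hosts without SSE2. */
static bool vec16_is_zero(const void *p)
{
    const unsigned char *b = p;
    unsigned char acc = 0;

    assert(p != NULL);
    for (size_t i = 0; i < 16; i++) {
        acc |= b[i];
    }
    return acc == 0;
}
#endif
```

The PTEST form would replace the last two steps with a single
_mm_testz_si128(v, v), which needs SSE4.1; the commit message argues this
buys nothing in throughput.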
 util/bufferiszero.c | 29 -
 1 file changed, 29 deletions(-)

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index 3e6a5dfd63..f5a3634f9a 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -100,34 +100,6 @@ buffer_zero_sse2(const void *buf, size_t len)
 }
 
 #ifdef CONFIG_AVX2_OPT
-static bool __attribute__((target("sse4")))
-buffer_zero_sse4(const void *buf, size_t len)
-{
-__m128i t = _mm_loadu_si128(buf);
-__m128i *p = (__m128i *)(((uintptr_t)buf + 5 * 16) & -16);
-__m128i *e = (__m128i *)(((uintptr_t)buf + len) & -16);
-
-/* Loop over 16-byte aligned blocks of 64.  */
-while (likely(p <= e)) {
-__builtin_prefetch(p);
-if (unlikely(!_mm_testz_si128(t, t))) {
-return false;
-}
-t = p[-4] | p[-3] | p[-2] | p[-1];
-p += 4;
-}
-
-/* Finish the aligned tail.  */
-t |= e[-3];
-t |= e[-2];
-t |= e[-1];
-
-/* Finish the unaligned tail.  */
-t |= _mm_loadu_si128(buf + len - 16);
-
-return _mm_testz_si128(t, t);
-}
-
 static bool __attribute__((target("avx2")))
 buffer_zero_avx2(const void *buf, size_t len)
 {
@@ -221,7 +193,6 @@ select_accel_cpuinfo(unsigned info)
 #endif
 #ifdef CONFIG_AVX2_OPT
 { CPUINFO_AVX2,128, buffer_zero_avx2 },
-{ CPUINFO_SSE4, 64, buffer_zero_sse4 },
 #endif
 { CPUINFO_SSE2, 64, buffer_zero_sse2 },
 { CPUINFO_ALWAYS,0, buffer_zero_int },
-- 
2.34.1

Re: [PATCH v2 08/33] accel/tcg: Record DisasContextBase in tcg_ctx for plugins

2024-05-03 Thread Philippe Mathieu-Daudé


On 25/4/24 01:31, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  include/tcg/tcg.h  | 1 +
  accel/tcg/plugin-gen.c | 1 +
  2 files changed, 2 insertions(+)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v3 0/5] accel/tcg: Call tcg_flush_jmp_cache() again when creating user-mode cpu

2024-05-03 Thread Philippe Mathieu-Daudé


+Claudio & Peter

On 3/5/24 14:34, Philippe Mathieu-Daudé wrote:


Philippe Mathieu-Daudé (5):
   accel/tcg: Move SoftMMU specific units to softmmu_specific_ss[]
   accel/tcg: Move system emulation files under sysemu/ subdirectory
   accel/tcg: Do not define cpu_exec_reset_hold() as stub
   accel/tcg: Introduce common tcg_exec_cpu_reset_hold() method


Richard raised this question: Why AccelOpsClass is system-only?
(also related, why "sysemu/cpus.h" is).

Re: [PATCH 3/5] tcg/i386: Optimize setcond of TST{EQ,NE} with 0xffffffff

2024-05-03 Thread Philippe Mathieu-Daudé


On 24/4/24 19:09, Richard Henderson wrote:

This may be treated as a 32-bit EQ/NE comparison against 0,
which is in turn treated as a LTU/GEU comparison against 1.

Signed-off-by: Richard Henderson 
---
  tcg/i386/tcg-target.c.inc | 17 +++--
  1 file changed, 15 insertions(+), 2 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé
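
The chain of equivalences in the commit message can be checked with a small
host-side sketch. This is not the TCG backend code; tsteq, eq32_zero,
ltu32_one and check_equiv are hypothetical helper names used only to spell
out the reasoning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* TSTEQ semantics: true when (x & mask) == 0. */
static bool tsteq(uint64_t x, uint64_t mask)
{
    return (x & mask) == 0;
}

/* With mask 0xffffffff this is a 32-bit EQ comparison against 0. */
static bool eq32_zero(uint64_t x)
{
    return (uint32_t)x == 0;
}

/* Unsigned "less than 1" holds only for 0, so EQ 0 is the same as LTU 1. */
static bool ltu32_one(uint64_t x)
{
    return (uint32_t)x < 1u;
}

/* All three formulations must agree for any input. */
static void check_equiv(uint64_t x)
{
    assert(tsteq(x, 0xffffffffu) == eq32_zero(x));
    assert(eq32_zero(x) == ltu32_one(x));
}
```

TSTNE follows the same way, with NE against 0 becoming GEU against 1.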

Re: [PATCH 5/5] accel/tcg: Introduce CF_BP_PAGE

2024-05-03 Thread Philippe Mathieu-Daudé


On 24/4/24 19:09, Richard Henderson wrote:

Record the fact that we've found a breakpoint on the page
in which a TranslationBlock is running.

Signed-off-by: Richard Henderson 
---
  include/exec/translation-block.h | 1 +
  accel/tcg/cpu-exec.c | 2 +-
  2 files changed, 2 insertions(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 1/5] tcg: Add write_aofs to GVecGen3i

2024-05-03 Thread Philippe Mathieu-Daudé


On 24/4/24 19:09, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  include/tcg/tcg-op-gvec-common.h |  2 ++
  tcg/tcg-op-gvec.c| 30 ++
  2 files changed, 24 insertions(+), 8 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH] target/arm: Restrict translation disabled alignment check to VMSA

2024-05-03 Thread Richard Henderson


On 5/3/24 07:58, Philippe Mathieu-Daudé wrote:

On 24/4/24 19:09, Richard Henderson wrote:

For cpus using PMSA, when the MPU is disabled, the default memory
type is Normal, Non-cacheable.

Fixes: 59754f85ed3 ("target/arm: Do memory type alignment check when translation 
disabled")
Reported-by: Clément Chigot 
Signed-off-by: Richard Henderson 
---

Since v9 will likely be tagged tomorrow without this fixed,
Cc: qemu-sta...@nongnu.org

---
  target/arm/tcg/hflags.c | 12 ++--
  1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c
index 5da1b0fc1d..66de30b828 100644
--- a/target/arm/tcg/hflags.c
+++ b/target/arm/tcg/hflags.c
@@ -38,8 +38,16 @@ static bool aprofile_require_alignment(CPUARMState *env, int el, 
uint64_t sctlr)

  }
  /*
- * If translation is disabled, then the default memory type is
- * Device(-nGnRnE) instead of Normal, which requires that alignment
+ * With PMSA, when the MPU is disabled, all memory types in the
+ * default map is Normal.
+ */
+    if (arm_feature(env, ARM_FEATURE_PMSA)) {
+    return false;
+    }
+
+    /*
+ * With VMSA, if translation is disabled, then the default memory type
+ * is Device(-nGnRnE) instead of Normal, which requires that alignment
   * be enforced.  Since this affects all ram, it is most efficient
   * to handle this during translation.
   */


This one is in target-arm.next:
https://lore.kernel.org/qemu-devel/cafeaca98urblsaxkzlskunc2g_rzd56vequksgsttoadfke...@mail.gmail.com/


Yes, that was a stray patch that accidentally got re-posted with this series.


r~

Re: [PATCH] target/arm: Restrict translation disabled alignment check to VMSA

2024-05-03 Thread Philippe Mathieu-Daudé


On 24/4/24 19:09, Richard Henderson wrote:

For cpus using PMSA, when the MPU is disabled, the default memory
type is Normal, Non-cacheable.

Fixes: 59754f85ed3 ("target/arm: Do memory type alignment check when translation 
disabled")
Reported-by: Clément Chigot 
Signed-off-by: Richard Henderson 
---

Since v9 will likely be tagged tomorrow without this fixed,
Cc: qemu-sta...@nongnu.org

---
  target/arm/tcg/hflags.c | 12 ++--
  1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c
index 5da1b0fc1d..66de30b828 100644
--- a/target/arm/tcg/hflags.c
+++ b/target/arm/tcg/hflags.c
@@ -38,8 +38,16 @@ static bool aprofile_require_alignment(CPUARMState *env, int 
el, uint64_t sctlr)
  }
  
  /*

- * If translation is disabled, then the default memory type is
- * Device(-nGnRnE) instead of Normal, which requires that alignment
+ * With PMSA, when the MPU is disabled, all memory types in the
+ * default map is Normal.
+ */
+if (arm_feature(env, ARM_FEATURE_PMSA)) {
+return false;
+}
+
+/*
+ * With VMSA, if translation is disabled, then the default memory type
+ * is Device(-nGnRnE) instead of Normal, which requires that alignment
   * be enforced.  Since this affects all ram, it is most efficient
   * to handle this during translation.
   */


This one is in target-arm.next:
https://lore.kernel.org/qemu-devel/cafeaca98urblsaxkzlskunc2g_rzd56vequksgsttoadfke...@mail.gmail.com/

[PATCH v4] vfio/pci: migration: Skip config space check for Vendor Specific Information in VSC during restore/load

2024-05-03 Thread Vinayak Kale

During migration, on restore, qemu checks the config space of the pci device
against the config space captured in the migration stream during save. If the
config space data does not match, the restore operation fails.

The config space check is done in get_pci_config_device(). By default the VSC
(vendor-specific-capability) in config space is checked.

Because of this check, live migration is broken across NVIDIA vGPU devices
when the source and destination host drivers differ. In that situation, the
Vendor Specific Information in VSC varies on the destination to ensure that
the vGPU feature capabilities exposed to the guest driver are compatible with
the destination host.

If a vfio-pci device is migration capable and vfio-pci vendor driver is OK with
volatile Vendor Specific Info in VSC then qemu should exempt config space check
for Vendor Specific Info. It is vendor driver's responsibility to ensure that
VSC is consistent across migration. Here consistency could mean that VSC format
should be same on source and destination, however actual Vendor Specific Info
may not be byte-to-byte identical.

This patch skips the check for Vendor Specific Information in VSC for VFIO-PCI
device by clearing pdev->cmask[] offsets. Config space check is still enforced
for 3 byte VSC header. If cmask[] is not set for an offset, then qemu skips
config space check for that offset.

VSC check is skipped for machine types >= 9.1. The check would be enforced on
older machine types (<= 9.0).

Signed-off-by: Vinayak Kale 
Cc: Alex Williamson 
Cc: Michael S. Tsirkin 
Cc: Cédric Le Goater 
---
Version History
v3->v4:
- VSC check is skipped for machine types >= 9.1. The check would be enforced
  on older machine types (<= 9.0).
v2->v3:
- Config space check skipped only for Vendor Specific Info in VSC, check is
  still enforced for 3 byte VSC header.
- Updated commit description with live migration failure scenario.
v1->v2:
- Limited scope of change to vfio-pci devices instead of all pci devices.

 hw/core/machine.c |  1 +
 hw/vfio/pci.c | 26 ++
 hw/vfio/pci.h |  1 +
 3 files changed, 28 insertions(+)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 4ff60911e7..fc3eb5115f 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -35,6 +35,7 @@
 
 GlobalProperty hw_compat_9_0[] = {
 {"arm-cpu", "backcompat-cntfrq", "true" },
+{"vfio-pci", "skip-vsc-check", "false" },
 };
 const size_t hw_compat_9_0_len = G_N_ELEMENTS(hw_compat_9_0);
 
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 64780d1b79..2ece9407cc 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2134,6 +2134,28 @@ static void vfio_check_af_flr(VFIOPCIDevice *vdev, 
uint8_t pos)
 }
 }
 
+static int vfio_add_vendor_specific_cap(VFIOPCIDevice *vdev, int pos,
+uint8_t size, Error **errp)
+{
+PCIDevice *pdev = &vdev->pdev;
+
+pos = pci_add_capability(pdev, PCI_CAP_ID_VNDR, pos, size, errp);
+if (pos < 0) {
+return pos;
+}
+
+/*
+ * Exempt config space check for Vendor Specific Information during
+ * restore/load.
+ * Config space check is still enforced for 3 byte VSC header.
+ */
+if (vdev->skip_vsc_check && size > 3) {
+memset(pdev->cmask + pos + 3, 0, size - 3);
+}
+
+return pos;
+}
+
 static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t pos, Error **errp)
 {
 ERRP_GUARD();
@@ -2202,6 +2224,9 @@ static int vfio_add_std_cap(VFIOPCIDevice *vdev, uint8_t 
pos, Error **errp)
 vfio_check_af_flr(vdev, pos);
 ret = pci_add_capability(pdev, cap_id, pos, size, errp);
 break;
+case PCI_CAP_ID_VNDR:
+ret = vfio_add_vendor_specific_cap(vdev, pos, size, errp);
+break;
 default:
 ret = pci_add_capability(pdev, cap_id, pos, size, errp);
 break;
@@ -3390,6 +3415,7 @@ static Property vfio_pci_dev_properties[] = {
 DEFINE_PROP_LINK("iommufd", VFIOPCIDevice, vbasedev.iommufd,
  TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
 #endif
+DEFINE_PROP_BOOL("skip-vsc-check", VFIOPCIDevice, skip_vsc_check, true),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 6e64a2654e..92cd62d115 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -177,6 +177,7 @@ struct VFIOPCIDevice {
 OnOffAuto ramfb_migrate;
 bool defer_kvm_irq_routing;
 bool clear_parent_atomics_on_exit;
+bool skip_vsc_check;
 VFIODisplay *dpy;
 Notifier irqchip_change_notifier;
 };
-- 
2.34.1

Re: [PATCH v5] Hexagon: add PC alignment check and exception

2024-05-03 Thread Richard Henderson


On 5/3/24 06:38, Matheus Tavares Bernardino wrote:

On Thu, 2 May 2024 13:00:34 -0700 Richard Henderson 
 wrote:


On 5/2/24 12:20, Matheus Tavares Bernardino wrote:


+
+void test_multi_cof(void)
+{
+asm volatile(
+"p0 = cmp.eq(r0, r0)\n"
+"{\n"
+"if (p0) jump test_multi_cof_unaligned\n"
+"if (!p0) jump 1f\n"
+"}\n"
+"1: nop\n"


Does it work to write "jump 1f+1" or something?


Unfortunately no :( The assembler will align the address when encoding the
instruction. The only working options I can think of are using a separate
file, like before, or manually encoding the instruction with a misaligned
address and placing it with a `.word` directive... Any preferences, or other
suggestions?


Oof.  The assembler is being too helpful.  :-P

Perhaps using a different section could solve the fragility issue:

asm("
.pushsection .text.evil
.org 3
...
.popsection
");

(adjusting syntax as necessary for correctness), then it doesn't matter where in the 
output assembly the fragment lands.



r~

[PATCH v2] hmp/migration: Fix "migrate" command's documentation

2024-05-03 Thread Peter Xu

Peter missed the Sphinx HMP document for the "resume/-r" flag in commit
7a4da28b26 ("qmp: hmp: add migrate "resume" option").  Add it.

While at it, slightly clean up the surrounding lines:

  - Move "detach/-d" to a separate section rather than appending it at the
  end of the command description. Add a hint for how to query the migration
  results in detached mode.

  - Add "postcopy" keyword to "resume/-r" help messages, as it only applies
  to postcopy.

Cc: Dr. David Alan Gilbert 
Cc: Fabiano Rosas 
Fixes: 7a4da28b26 ("qmp: hmp: add migrate "resume" option")
Reported-by: Markus Armbruster 
Reviewed-by: Markus Armbruster 
Signed-off-by: Peter Xu 
---
Based-on: <20240430142737.29066-1-faro...@suse.de>
("[PATCH v3 0/6] migration removals & deprecations")
---
 hmp-commands.hx | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index ebca2cdced..06746f0afc 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -912,14 +912,20 @@ ERST
 .args_type  = "detach:-d,resume:-r,uri:s",
 .params = "[-d] [-r] uri",
 .help   = "migrate to URI (using -d to not wait for completion)"
- "\n\t\t\t -r to resume a paused migration",
+ "\n\t\t\t -r to resume a paused postcopy migration",
 .cmd= hmp_migrate,
 },
 
 
 SRST
-``migrate [-d]`` *uri*
-  Migrate to *uri* (using -d to not wait for completion).
+``migrate [-d] [-r]`` *uri*
+  Migrate the VM to *uri*.
+
+  ``-d``
+Start the migration process, but do not wait for its completion.  To
+query an ongoing migration process, use "info migrate".
+  ``-r``
+Resume a paused postcopy migration.
 ERST
 
 {
-- 
2.44.0

Re: [PATCH] hmp/migration: Fix documents for "migrate" command

2024-05-03 Thread Peter Xu

On Fri, May 03, 2024 at 04:08:45PM +0200, Markus Armbruster wrote:
> If there's still time, suggest to tweak the subject to
> 
> hmp/migration: Fix "migrate" command's documentation

Yes there is. :)

> 
> Peter Xu  writes:
> 
> > On Fri, May 03, 2024 at 08:58:09AM +0200, Markus Armbruster wrote:
> >> Peter Xu  writes:
> >> 
> >> > Peter missed the Sphinx HMP document for the "resume/-r" flag in commit
> >> > 7a4da28b26 ("qmp: hmp: add migrate "resume" option").  Add it.  Avoid
> >> > adding a Fixes to make life easier for the stable maintainer.
> >> 
> >> I'm curious: how does not adding Fixes: make life easier?
> >
> > Because if I attach Fixes then IIUC Michael will read it through and judge
> > whether it should apply to stable, where I want to skip that for him
> > because I think this doesn't apply to stable.  Reasons:
> >
> >   - This is a document update, IIUC we normally only keep the latest
> > > document up to date, not all the stable versions (especially for HMP,
> > which isn't a stable ABI)?  I assume it applies the same when a qtest
> > case got a slight fixup.
> >
> >   - This patch is even more special as it will need explicit backport due
> > to the removal of block migration, and I really don't think any of us
> > should spend time on that..
> 
> Right.  But Fixes: is also for downstreams, who may want to make their
> own decisions.
> 
> I think I'd always add Fixes:.  When I think there's a need to steer
> stable away from it, I'd say so in the commit message.  I doubt needed
> here, as the subject states it's just a doc fix for HMP.

OK, I'll attach a Fixes.

> 
> >> > When at it, slightly cleanup the lines, move "detach/-d" to a separate
> >> > section rather than appending it at the end of the command description.
> >> >
> >> > Cc: Dr. David Alan Gilbert 
> >> > Cc: Fabiano Rosas 
> >> > Cc: Markus Armbruster 
> >> > Signed-off-by: Peter Xu 
> >> > ---
> >> >
> >> > Based-on: <20240430142737.29066-1-faro...@suse.de>
> >> > ("[PATCH v3 0/6] migration removals & deprecations")
> >> > ---
> >> >  hmp-commands.hx | 9 +++--
> >> >  1 file changed, 7 insertions(+), 2 deletions(-)
> >> >
> >> > diff --git a/hmp-commands.hx b/hmp-commands.hx
> >> > index ebca2cdced..484a8a1c3a 100644
> >> > --- a/hmp-commands.hx
> >> > +++ b/hmp-commands.hx
> >> > @@ -918,8 +918,13 @@ ERST
> >>{
> >>.name   = "migrate",
> >>.args_type  = "detach:-d,blk:-b,inc:-i,resume:-r,uri:s",
> >>.params = "[-d] [-b] [-i] [-r] uri",
> >>.help   = "migrate to URI (using -d to not wait for 
> >> completion)"
> >>  "\n\t\t\t -b for migration without shared storage 
> >> with"
> >>  " full copy of disk\n\t\t\t -i for migration 
> >> without "
> >>  "shared storage with incremental copy of disk "
> >>  "(base image shared between src and destination)"
> >>  "\n\t\t\t -r to resume a paused migration",
> >>.cmd= hmp_migrate,
> >>},
> >> >  
> >> >  
> >> >  SRST
> >> > -``migrate [-d]`` *uri*
> >> > -  Migrate to *uri* (using -d to not wait for completion).
> >> > +``migrate [-d] [-r]`` *uri*
> >> > +  Migrate the current VM to *uri*.
> >> 
> >> Could there be any other VM than the current one?  Scratch "current"?
> >
> > I didn't have "current" until I generated the doc and read, then I see
> > right below "migrate_cancel" has it:
> >
> > SRST
> > ``migrate_cancel``
> >   Cancel the current VM migration.
> > ERST
> >
> > But maybe it means "current migration", not "current VM".. So yeah I can
> > drop it.
> >
> >> 
> >> > +
> >> > +  ``-d``
> >> > +Run this command asynchronously, so that the command doesn't wait 
> >> > for completion.
> >> 
> >> What is run asynchronously, and what isn't waiting?  These are two
> >> different entities, aren't they?  Calling them "this command" and "the
> >> command" is confusing :)
> >> 
> >> Perhaps
> >> 
> >>Start the migration process, but do not wait for its completion.
> >> 
> >> Maybe add a hint on how to wait or poll for completion?
> >
> > Yes this reads better; I will add the hint too.
> >
> >> 
> >> > +  ``-r``
> >> > +Resume a paused postcopy migration.
> >> 
> >> .help doesn't have "postcopy".  Should it?
> >
> > It should.
> >
> > This is the fixup I'll squash when sending v2, let me know if there's other
> > early comments, thanks.
> >
> > ===8<===
> >
> > diff --git a/hmp-commands.hx b/hmp-commands.hx
> > index 484a8a1c3a..06746f0afc 100644
> > --- a/hmp-commands.hx
> > +++ b/hmp-commands.hx
> > @@ -912,17 +912,18 @@ ERST
> >  .args_type  = "detach:-d,resume:-r,uri:s",
> >  .params = "[-d] [-r] uri",
> >  .help   = "migrate to URI (using -d to not wait for 
> > completion)"
> > - "\n\t\t\t -r to resume a paused migration",
> > + "\n\t\t\t -r to resume a

Re: [PATCH v3 07/16] aspeed/smc: fix dma moving incorrect data length issue

2024-05-03 Thread Cédric Le Goater


On 4/30/24 10:51, Jamin Lin wrote:

Hi Cedric,

On 4/19/24 15:41, Cédric Le Goater wrote:

On 4/16/24 11:18, Jamin Lin wrote:

The DMA length ranges from 1 byte to 32MB on AST2600 and AST10x0, and
from 4 bytes to 32MB on AST2500.

In other words, if "R_DMA_LEN" is 0, at least 1 byte of data should be
moved on AST2600 and AST10x0, and 4 bytes on AST2500.

To support all ASPEED SoCs, add a dma_start_length parameter to store
the start length, add a helper routine to compute the DMA
length, and update the DMA_LENGTH mask to "1FF" to fix the incorrect
DMA data length issue.


OK. There are two problems to address, the "zero" length transfer and
the DMA length unit, which is missing today. Newer SoC use a 1 bit /
byte and older ones, AST2400 and AST2500, use 1 bit / 4 bytes.

We can introduce a AspeedSMCClass::dma_len_unit and rework the loop to :

      do {

    

     if (s->regs[R_DMA_LEN]) {
      s->regs[R_DMA_LEN] -= 4 / asc->dma_len_unit;
      }
      } while (s->regs[R_DMA_LEN]);

It should fix the current implementation.



I checked what FW is doing on a QEMU ast2500-evb machine :

  U-Boot 2019.04-v00.04.12 (Sep 29 2022 - 10:40:37 +)
  ...

 Loading Kernel Image ... aspeed_smc_write @0x88 size 4:
0x80001000
  aspeed_smc_write @0x84 size 4: 0x20100130
  aspeed_smc_write @0x8c size 4: 0x3c6770
  aspeed_smc_write @0x80 size 4: 0x1
  aspeed_smc_dma_rw read flash:@0x00100130 dram:@0x1000
size:0x003c6774
  aspeed_smc_read @0x8 size 4: 0x800
  aspeed_smc_write @0x80 size 4: 0x0
  OK
 Loading Ramdisk to 8fef2000, end 8604 ... aspeed_smc_write
@0x88 size 4: 0x8fef2000
  aspeed_smc_write @0x84 size 4: 0x204cdde0
  aspeed_smc_write @0x8c size 4: 0x10d604
  aspeed_smc_write @0x80 size 4: 0x1
  aspeed_smc_dma_rw read flash:@0x004cdde0 dram:@0xfef2000
size:0x0010d608
  aspeed_smc_read @0x8 size 4: 0x800
  aspeed_smc_write @0x80 size 4: 0x0
  OK
 Loading Device Tree to 8fee7000, end 8fef135e ... aspeed_smc_write
@0x88 size 4: 0x8fee7000
  aspeed_smc_write @0x84 size 4: 0x204c69b4
  aspeed_smc_write @0x8c size 4: 0x7360
  aspeed_smc_write @0x80 size 4: 0x1
  aspeed_smc_dma_rw read flash:@0x004c69b4 dram:@0xfee7000
size:0x7364
  aspeed_smc_read @0x8 size 4: 0x800
  aspeed_smc_write @0x80 size 4: 0x0
  OK

  Starting kernel ...

It seems that the R_DMA_LEN register is set by FW without taking into account
the length unit (1 bit / 4 bytes). Would you know why?


https://github.com/AspeedTech-BMC/u-boot/blob/aspeed-master-v2019.04/lib/string.c#L559
This line make user input data length 4 bytes alignment.
https://github.com/AspeedTech-BMC/u-boot/blob/aspeed-master-v2019.04/arch/arm/mach-aspeed/ast2500/utils.S#L35


yes. I don't see any 1bit / 4bytes conversion when setting the DMA_LEN
register. Am I mistaken? That's not what the spec says.


This line sets the value of the count parameter to AST_FMC_DMA_LENGTH.
AST_FMC_DMA_LENGTH is a 4-byte-aligned value.
Example: input 4
((4+3)/4) * 4 --> (7/4) * 4 --> 4
If AST_FMC_DMA_LENGTH is 0, it means 4 bytes of data should be moved, and
AST_FMC_DMA_LENGTH does not need to be divided by 4.


ok. For that, I think you could replace aspeed_smc_dma_len() with

   return QEMU_ALIGN_UP(s->regs[R_DMA_LEN] + asc->dma_start_length, 4);

Thanks,

C.
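
To make the arithmetic behind that suggestion concrete, here is a standalone
sketch. It is not the aspeed_smc.c code: align_up and dma_len_bytes are
hypothetical stand-ins for QEMU_ALIGN_UP and the proposed helper, and the
start lengths (1 for AST2600/AST10x0, 4 for AST2500) are taken from the
discussion in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to a multiple of m (m > 0). */
static uint32_t align_up(uint32_t n, uint32_t m)
{
    assert(m > 0);
    return (n + m - 1) / m * m;
}

/*
 * reg:          raw value written to R_DMA_LEN
 * start_length: bytes moved when reg is 0 (1 or 4, depending on the SoC)
 * Returns the transfer size in bytes, rounded up to a 4-byte multiple.
 */
static uint32_t dma_len_bytes(uint32_t reg, uint32_t start_length)
{
    return align_up(reg + start_length, 4);
}
```

With start_length = 4, a register value of 0x3c6770 gives 0x3c6774, which
matches the size reported in the ast2500-evb trace quoted in this thread.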






If I change the model to match the 1 bit / 4 bytes unit of the R_DMA_LEN
register, Linux fails to boot. I didn't dig further and this is something we
need to understand before committing.


I don't think this is necessary to add a Fixes tag because the problem
has been there for ages and no one reported it. Probably because the
only place DMA transfers are used is in U-Boot and transfers have a
non-zero length.


Currently, only supports dma length 4 bytes aligned.


Is this 4 bytes alignment new for the AST2700 or is this something you added
because the mask of DMA_LENGTH is now relaxed to match all addresses ?

#define DMA_LENGTH(val) ((val) & 0x01FF)

For AST2700, AST2600 and AST1030 the length ranges from 1 byte to 1FF, so I
changed this macro to fix the data loss.
https://github.com/AspeedTech-BMC/u-boot/blob/aspeed-master-v2023.10/arch/arm/mach-aspeed/ast2700/spi.c#L186
Please see this line: it first decreases dma_len by 1 byte and then writes it
to the DMA_LEN register, because a DMA_LEN of 0 means 1 byte of data should be
moved when DMA is enabled on AST2600, AST1030 and AST2700.


Thanks,

C.


Only AST2500 needs 4-byte alignment for the DMA length. However, by design the
aspeed_smc_dma_rw function
uses the address_space_stl_le API, which only works with 4-byte-aligned data.
https://github.com/qemu/qemu/blob/master/hw/ssi/aspeed_smc.c#L889
For example,
if users want to move a data_length of 0x101, then after 0x100 bytes have been
moved, 1 byte remains to be moved.
Please see this line of the program:
https://github.com/qemu/qemu/blob/master/hw/ssi/aspeed_smc.c#L940
```
s->regs[R_DMA_LE

Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-05-03 Thread Peter Xu

On Fri, May 03, 2024 at 08:40:03AM +0200, Jinpu Wang wrote:
> I had a brief check in the rsocket changelog, there seems some
> improvement over time,
>  might be worth revisiting this. due to socket abstraction, we can't
> use some feature like
>  ODP, it won't be a small and easy task.

It would be good to know first whether Dan's suggestion works, before
rewriting everything.  Not sure whether some perf tests of the rsocket APIs
could help even without QEMU's involvement (or looking for test data that
supports / invalidates such a conversion).

Thanks,

-- 
Peter Xu

Re: [PATCH v3 00/19] Add a host IOMMU device abstraction to check with vIOMMU

2024-05-03 Thread Cédric Le Goater


On 5/3/24 16:10, Jason Gunthorpe wrote:

On Fri, May 03, 2024 at 04:04:25PM +0200, Cédric Le Goater wrote:

However, have you considered another/complementary approach which
would be to create an host IOMMU (iommufd) backend object and a vIOMMU
device object together for each vfio-pci device being plugged in the
machine ?

Something like,
 -device pcie-root-port,port=23,chassis=8,id=pci.8,bus=pcie.0 \
 -object iommufd,id=iommufd1 \
 -device 
intel-iommu,intremap=on,device-iotlb=on,caching-mode=on,iommufd=iommufd1 \
 -device vfio-pci,host=:08:10.0,bus=pci.1,iommufd=iommufd0


? The main point of this is to have a single iommufd FD open in
qemu. Not multiple.


Oops. The above example should have the same IOMMUFD object device
instance: iommufd0.

This is a bogus copy-paste of a command line with multiple vfio-pci
devices, each using its own IOMMUFD object device instance. That's
where the idea comes from. Sorry for the noise.

C.

Re: [PATCH v3 08/16] aspeed/smc: support 64 bits dma dram address

2024-05-03 Thread Cédric Le Goater


Hello Jamin,

On 4/30/24 09:56, Jamin Lin wrote:

Hi Cedric,


-Original Message-
From: Cédric Le Goater 
Sent: Tuesday, April 30, 2024 3:26 PM
To: Jamin Lin ; Peter Maydell
; Andrew Jeffery ;
Joel Stanley ; Alistair Francis ; Cleber
Rosa ; Philippe Mathieu-Daudé ;
Wainer dos Santos Moschetta ; Beraldo Leal
; open list:ASPEED BMCs ; open
list:All patches CC here 
Cc: Troy Lee ; Yunlin Tang

Subject: Re: [PATCH v3 08/16] aspeed/smc: support 64 bits dma dram address

On 4/19/24 08:00, Jamin Lin wrote:

Hi Cedric,


Hello Jamin,

On 4/16/24 11:18, Jamin Lin wrote:

The AST2700 supports a maximum DRAM size of 8 GiB and has a "DMA DRAM
Side Address High Part (0x7C)" register to support 64-bit DMA DRAM
addresses.
Add helper routines to compute the DMA DRAM address, add a new feature
flag, and update the trace event to support 64-bit DRAM addresses.

Signed-off-by: Troy Lee 
Signed-off-by: Jamin Lin 
---
hw/ssi/aspeed_smc.c | 66

+++--

hw/ssi/trace-events |  2 +-
2 files changed, 59 insertions(+), 9 deletions(-)

diff --git a/hw/ssi/aspeed_smc.c b/hw/ssi/aspeed_smc.c
index 71abc7a2d8..a67cac3d0f 100644
--- a/hw/ssi/aspeed_smc.c
+++ b/hw/ssi/aspeed_smc.c
@@ -132,6 +132,9 @@
#define   FMC_WDT2_CTRL_BOOT_SOURCE  BIT(4) /* O: primary 1: alternate */
#define   FMC_WDT2_CTRL_EN   BIT(0)

+/* DMA DRAM Side Address High Part (AST2700) */
+#define R_DMA_DRAM_ADDR_HIGH   (0x7c / 4)
+
/* DMA Control/Status Register */
#define R_DMA_CTRL(0x80 / 4)
#define   DMA_CTRL_REQUEST  (1 << 31)
@@ -187,6 +190,7 @@
 *   0x1FF: 32M bytes
 */
#define DMA_DRAM_ADDR(asc, val)   ((val) & (asc)->dma_dram_mask)
+#define DMA_DRAM_ADDR_HIGH(val)   ((val) & 0xf)
#define DMA_FLASH_ADDR(asc, val)  ((val) & (asc)->dma_flash_mask)
#define DMA_LENGTH(val) ((val) & 0x01FF)

@@ -207,6 +211,7 @@ static const AspeedSegments aspeed_2500_spi2_segments[];

#define ASPEED_SMC_FEATURE_DMA   0x1
#define ASPEED_SMC_FEATURE_DMA_GRANT 0x2
#define ASPEED_SMC_FEATURE_WDT_CONTROL 0x4
+#define ASPEED_SMC_FEATURE_DMA_DRAM_ADDR_HIGH 0x08

static inline bool aspeed_smc_has_dma(const AspeedSMCClass *asc)
{
@@ -218,6 +223,11 @@ static inline bool aspeed_smc_has_wdt_control(const AspeedSMCClass *asc)
return !!(asc->features & ASPEED_SMC_FEATURE_WDT_CONTROL);
}

+static inline bool aspeed_smc_has_dma_dram_addr_high(const
+AspeedSMCClass *asc)


To ease the reading, I would call the helper aspeed_smc_has_dma64()

Will fix it



+{
+return !!(asc->features & ASPEED_SMC_FEATURE_DMA_DRAM_ADDR_HIGH);
+}
+
#define aspeed_smc_error(fmt, ...) \
qemu_log_mask(LOG_GUEST_ERROR, "%s: " fmt "\n", __func__, ## __VA_ARGS__)

@@ -747,6 +757,9 @@ static uint64_t aspeed_smc_read(void *opaque, hwaddr addr, unsigned int size)
(aspeed_smc_has_dma(asc) && addr == R_DMA_CTRL) ||
(aspeed_smc_has_dma(asc) && addr == R_DMA_FLASH_ADDR) ||
(aspeed_smc_has_dma(asc) && addr == R_DMA_DRAM_ADDR) ||
+(aspeed_smc_has_dma(asc) &&
+ aspeed_smc_has_dma_dram_addr_high(asc) &&
+ addr == R_DMA_DRAM_ADDR_HIGH) ||
(aspeed_smc_has_dma(asc) && addr == R_DMA_LEN) ||
(aspeed_smc_has_dma(asc) && addr == R_DMA_CHECKSUM) ||
(addr >= R_SEG_ADDR0 &&
@@ -847,6 +860,23 @@ static bool aspeed_smc_inject_read_failure(AspeedSMCState *s)

}
}

+static uint64_t aspeed_smc_dma_dram_addr(AspeedSMCState *s) {
+AspeedSMCClass *asc = ASPEED_SMC_GET_CLASS(s);
+uint64_t dram_addr_high;
+uint64_t dma_dram_addr;
+
+if (aspeed_smc_has_dma_dram_addr_high(asc)) {
+dram_addr_high = s->regs[R_DMA_DRAM_ADDR_HIGH];
+dram_addr_high <<= 32;
+dma_dram_addr = dram_addr_high | s->regs[R_DMA_DRAM_ADDR];

Here is a proposal to shorten the routine:

   return ((uint64_t) s->regs[R_DMA_DRAM_ADDR_HIGH] << 32) |
          s->regs[R_DMA_DRAM_ADDR];



+} else {
+dma_dram_addr = s->regs[R_DMA_DRAM_ADDR];


and
   return s->regs[R_DMA_DRAM_ADDR];


+}
+
+return dma_dram_addr;
+}
+

Thanks for your suggestion. Will fix.
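
Putting the two review suggestions together, the shortened routine could look roughly like the sketch below. It is a standalone approximation, not the code that was merged: SketchSMC and the sketch_* names are hypothetical stand-ins for AspeedSMCState and its helpers, the feature bit reuses the 0x08 value from the patch, and the R_DMA_DRAM_ADDR offset (0x88) is an assumption because it is not shown in the quoted hunks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit value as ASPEED_SMC_FEATURE_DMA_DRAM_ADDR_HIGH in the patch. */
#define SKETCH_FEATURE_DMA64   0x08

/* Register indices; 0x7c comes from the patch, 0x88 is an assumption. */
#define R_DMA_DRAM_ADDR_HIGH   (0x7c / 4)
#define R_DMA_DRAM_ADDR        (0x88 / 4)

/* Hypothetical stand-in for AspeedSMCState: feature bits plus registers. */
typedef struct SketchSMC {
    uint32_t features;
    uint32_t regs[0x100 / 4];
} SketchSMC;

/* Capability test, using the shorter "dma64" name from the review. */
static inline bool sketch_has_dma64(const SketchSMC *s)
{
    return !!(s->features & SKETCH_FEATURE_DMA64);
}

/* Build the DRAM address: high part in bits 63..32 only when supported. */
static uint64_t sketch_dma_dram_addr(const SketchSMC *s)
{
    if (sketch_has_dma64(s)) {
        return ((uint64_t)s->regs[R_DMA_DRAM_ADDR_HIGH] << 32) |
               s->regs[R_DMA_DRAM_ADDR];
    }
    return s->regs[R_DMA_DRAM_ADDR];
}
```

The cast to uint64_t before the shift matters: shifting a 32-bit value left by 32 would otherwise be undefined behaviour in C. Returning directly from each branch also removes the two temporaries questioned in the review.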

static uint32_t aspeed_smc_dma_len(AspeedSMCState *s)
{
AspeedSMCClass *asc = ASPEED_SMC_GET_CLASS(s);
@@ -914,24 +944,34 @@ static void aspeed_smc_dma_checksum(AspeedSMCState *s)

static void aspeed_smc_dma_rw(AspeedSMCState *s)
{
+AspeedSMCClass *asc = ASPEED_SMC_GET_CLASS(s);
+uint64_t dram_addr_high;


This variable doesn't look very useful

Will try to remove it.



+uint64_t dma_dram_addr;
+uint64_t dram_addr;


and dram_addr is redundant with dma_dram_addr. Please use only one.

Please see my below description and please give us any suggestion.




MemTxResult result;
uint32_t dma_len;
uint32_t data;

dma_len = aspeed_smc_dma_len(s);
+dma_dram_addr = aspeed_s

Re: [PATCH v3 00/19] Add a host IOMMU device abstraction to check with vIOMMU

2024-05-03 Thread Jason Gunthorpe

On Fri, May 03, 2024 at 04:04:25PM +0200, Cédric Le Goater wrote:
> However, have you considered another/complementary approach which
> would be to create an host IOMMU (iommufd) backend object and a vIOMMU
> device object together for each vfio-pci device being plugged in the
> machine ?
> 
> Something like,
> -device pcie-root-port,port=23,chassis=8,id=pci.8,bus=pcie.0 \
> -object iommufd,id=iommufd1 \
> -device 
> intel-iommu,intremap=on,device-iotlb=on,caching-mode=on,iommufd=iommufd1 \
> -device vfio-pci,host=:08:10.0,bus=pci.1,iommufd=iommufd0

? The main point of this is to have a single iommufd FD open in
qemu. Not multiple. Would you achieve this with a iommufd0 and
iommufd1 ?

Jason

Re: [PATCH] hmp/migration: Fix documents for "migrate" command

2024-05-03 Thread Markus Armbruster

If there's still time, suggest to tweak the subject to

hmp/migration: Fix "migrate" command's documentation

Peter Xu  writes:

> On Fri, May 03, 2024 at 08:58:09AM +0200, Markus Armbruster wrote:
>> Peter Xu  writes:
>> 
>> > Peter missed the Sphinx HMP document for the "resume/-r" flag in commit
>> > 7a4da28b26 ("qmp: hmp: add migrate "resume" option").  Add it.  Avoid
>> > adding a Fixes to make life easier for the stable maintainer.
>> 
>> I'm curious: how does not adding Fixes: make life easier?
>
> Because if I attach Fixes then IIUC Michael will read it through and judge
> whether it should apply to stable, where I want to skip that for him
> because I think this doesn't apply to stable.  Reasons:
>
>   - This is a document update, IIUC we normally only keep the latest
> document up to date, not all the stable versions (especially for HMP,
> which isn't a stable ABI)?  I assume it applies the same when a qtest
> case got a slight fixup.
>
>   - This patch is even more special as it will need explicit backport due
> to the removal of block migration, and I really don't think any of us
> should spend time on that..

Right.  But Fixes: is also for downstreams, who may want to make their
own decisions.

I think I'd always add Fixes:.  When I think there's a need to steer
stable away from it, I'd say so in the commit message.  I doubt that's
needed here, as the subject states it's just a doc fix for HMP.

>> > When at it, slightly cleanup the lines, move "detach/-d" to a separate
>> > section rather than appending it at the end of the command description.
>> >
>> > Cc: Dr. David Alan Gilbert 
>> > Cc: Fabiano Rosas 
>> > Cc: Markus Armbruster 
>> > Signed-off-by: Peter Xu 
>> > ---
>> >
>> > Based-on: <20240430142737.29066-1-faro...@suse.de>
>> > ("[PATCH v3 0/6] migration removals & deprecations")
>> > ---
>> >  hmp-commands.hx | 9 +++--
>> >  1 file changed, 7 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/hmp-commands.hx b/hmp-commands.hx
>> > index ebca2cdced..484a8a1c3a 100644
>> > --- a/hmp-commands.hx
>> > +++ b/hmp-commands.hx
>> > @@ -918,8 +918,13 @@ ERST
>>{
>>.name   = "migrate",
>>.args_type  = "detach:-d,blk:-b,inc:-i,resume:-r,uri:s",
>>.params = "[-d] [-b] [-i] [-r] uri",
>>.help   = "migrate to URI (using -d to not wait for 
>> completion)"
>>  "\n\t\t\t -b for migration without shared storage 
>> with"
>>  " full copy of disk\n\t\t\t -i for migration 
>> without "
>>  "shared storage with incremental copy of disk "
>>  "(base image shared between src and destination)"
>>  "\n\t\t\t -r to resume a paused migration",
>>.cmd= hmp_migrate,
>>},
>> >  
>> >  
>> >  SRST
>> > -``migrate [-d]`` *uri*
>> > -  Migrate to *uri* (using -d to not wait for completion).
>> > +``migrate [-d] [-r]`` *uri*
>> > +  Migrate the current VM to *uri*.
>> 
>> Could there be any other VM than the current one?  Scratch "current"?
>
> I didn't have "current" until I generated the doc and read, then I see
> right below "migrate_cancel" has it:
>
> SRST
> ``migrate_cancel``
>   Cancel the current VM migration.
> ERST
>
> But maybe it means "current migration", not "current VM".. So yeah I can
> drop it.
>
>> 
>> > +
>> > +  ``-d``
>> > +Run this command asynchronously, so that the command doesn't wait for 
>> > completion.
>> 
>> What is run asynchronously, and what isn't waiting?  These are two
>> different entities, aren't they?  Calling them "this command" and "the
>> command" is confusing :)
>> 
>> Perhaps
>> 
>>Start the migration process, but do not wait for its completion.
>> 
>> Maybe add a hint on how to wait or poll for completion?
>
> Yes this reads better; I will add the hint too.
>
>> 
>> > +  ``-r``
>> > +Resume a paused postcopy migration.
>> 
>> .help doesn't have "postcopy".  Should it?
>
> It should.
>
> This is the fixup I'll squash when sending v2, let me know if there's other
> early comments, thanks.
>
> ===8<===
>
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index 484a8a1c3a..06746f0afc 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -912,17 +912,18 @@ ERST
>  .args_type  = "detach:-d,resume:-r,uri:s",
>  .params = "[-d] [-r] uri",
>  .help   = "migrate to URI (using -d to not wait for completion)"
> - "\n\t\t\t -r to resume a paused migration",
> + "\n\t\t\t -r to resume a paused postcopy migration",
>  .cmd= hmp_migrate,
>  },
>  
>  
>  SRST
>  ``migrate [-d] [-r]`` *uri*
> -  Migrate the current VM to *uri*.
> +  Migrate the VM to *uri*.
>  
>``-d``
> -Run this command asynchronously, so that the command doesn't wait for 
> completion.
> +Start the migration process, but do not wait for its completion.  To
> +
