date:20240422

Re: Detecting whether dom0 is in a VM

2024-04-22 Thread Jürgen Groß

On 22.04.24 09:12, Jan Beulich wrote:

On 19.04.2024 17:29, George Dunlap wrote:

On Fri, Jul 7, 2023 at 3:56 PM George Dunlap wrote:

Xen's public interface offers access to the featuresets known / found /
used by the hypervisor. See XEN_SYSCTL_get_cpu_featureset, accessible
via xc_get_cpu_featureset().

Are any of these exposed in dom0 via sysctl, or hypfs?

sysctl - yes (as the quoted name also says). hypfs no, afaict.

SYSCTLs are
unfortunately not stable interfaces, correct? So it wouldn't be practical
for systemd to use them.

Indeed, neither sysctl-s nor the libxc interfaces are stable.

Thinking of it, xen-cpuid is a wrapper tool around those. They may want
to look at its output (and, if they want to use it, advise distros to
also package it), which I think we try to keep reasonably stable,
albeit without providing any guarantees.

We haven't had any clear guidance yet on what the systemd team want in the question; I just sort of assumed they wanted the L-1
virtualization *if possible*. It sounds like `vm-other` would be acceptable,
particularly as a fall-back output if there's no way to get Xen's picture of the
cpuid.

It looks like xen-cpuid is available on Fedora, Debian, Ubuntu, and the old
Virt SIG CentOS packages; so I'd expect most packages to follow suit. That's a
place to start.

Just to take the discussion all the way to its conclusion:

- Supposing xen-cpuid isn't available, is there any other way to tell if Xen is
running in a VM from dom0?

- Would it make sense to expose that information somewhere, either in sysfs or
in hypfs (or both?), so that eventually even systems which may not get the memo
about packaging xen-cpuid will get support (or if the systemd guys would rather
avoid executing another process if possible)?

Resurrecting this thread.

To recap:

Currently, `systemd-detect-virt` has the following input / output table:

1. Xen on real hardware, domU: xen
2. Xen on real hardware, dom0: vm-other
3. Xen in a VM, domU: xen
4. Xen in aVM, dom0: vm-other

It's the dom0 lines (2 and 4) which are problematic; we'd ideally like
those to be `none` and `vm-other` (or the actual value, like `xen` or
`kvm`).

It looks like ATM, /proc/xen/capabilities will contain `control_d` if
it's a dom0. Simply unilaterally returning `none` if
/proc/xen/capabilities contains `control_d` would correct the vast
majority of instances (since the number of instances of Xen on real
hardware is presumably higher than the number running virtualized).

Is /proc/xen/capabilities expected to stay around? I don't see
anything equivalent in /dev/xen.

I believe it's intended to stay around, but a definitive answer can only
come from Linux folks. Jürgen, Stefano?

I have no plans to remove it.

Juergen

Jan

Other than adding a new interface to Xen, is there any way to tell if
Xen is running in a VM? If we do need to expose an interface, what
would be the best way to do that?

-George

[xen-unstable test] 185762: tolerable FAIL - PUSHED

2024-04-22 Thread osstest service owner

flight 185762 xen-unstable real [real]
flight 185766 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/185762/
http://logs.test-lab.xenproject.org/osstest/logs/185766/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-pvshim   13 debian-fixupfail pass in 185766-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 185754
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 185754
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 185754
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 185754
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 185754
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 185754
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-qcow214 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-qcow215 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-raw  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-raw  15 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  cccb7878f386fb8691b7e28957a562a66d9b875f
baseline version:
 xen  09cb855f33fb619c87d8d4250d3a980f568f151b

Last test of basis   185754  2024-04-22 01:53:44 Z1 days
Testing same since   185762  2024-04-22 17:08:52 Z0 days1 attempts


People who touched revisions under test:
  Jan Beulich 
  Jason Andryuk 
  Julien Grall 
  Nicola Vetrini 
  Oleksii Kurochko 
  Roger Pau Monné 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf

Re: [PATCH v2 1/3] hotplug: Update block-tap

2024-04-22 Thread Jason Andryuk

On Mon, Apr 22, 2024 at 11:11 AM Anthony PERARD
 wrote:
>
> On Sun, Apr 07, 2024 at 04:49:51PM -0400, Jason Andryuk wrote:
> > diff --git a/tools/hotplug/Linux/block-tap b/tools/hotplug/Linux/block-tap
> > index 89247921b9..126e472786 100755
> > --- a/tools/hotplug/Linux/block-tap
> > +++ b/tools/hotplug/Linux/block-tap
> > @@ -18,11 +18,11 @@
> >  #
> >  # Usage:
> >  #
> > -# Target should be specified using the following syntax:
> > +# Disks should be specified using the following syntax:
> >  #
> > -# script=block-tap,vdev=xvda,target=:
> > +# vdev=xvda,backendtype=tap,format=vhd,target=/srv/target.vhd
>
> I still have unanswered question from the previous round:
> Is `block-tap` still going to work with the current example given in
> the script header? That is:
> "script=block-tap,vdev=xvda,target=:"
> Or maybe, that example is already broken?

Oh, right.  Sorry about that.

> If it's not broken, there could be users which rely on it. But maybe
> it's not really broken, and the new syntax is better anyway.
>
> My guess is that using "script=block-tap,..." might still work, but
> we should say something in the CHANGELOG to encourage people to move to
> the new syntax, with "backendtype=tap" to avoid issues.

I think the old syntax with "backendtype=phy" would work except for this:
-write_dev $dev
...
+# dev, as a unix socket, would end up with major:minor 0:0 in
+# physical-device if write_dev were used.  tapback would be thrown off by
+# that incorrect minor, so just skip writing physical-device.
+xenstore_write "$XENBUS_PATH/physical-device-path" "$dev"

write_dev is needed for blkback to see physical-device and set things
up properly.

I could create a second script, but that's a little silly for just the
single line.  Another approach would be to differentiate off the
device type, vbd vs. vbd3, and use write_dev or not that way.  Should
I just do that?

block-tap will only support backendtype=phy as long as blktap uses the
kernel driver.  In that case tap-ctl create is returning the kernel
module's block dev path.  Once the kernel drive support is removed,
backendtype=tap will be the only option - without it there is no block
dev for blkback.

FYI, "backendtype=tap" is more performant because it is a shorter data path:
blkfront -> tapback/tapdisk
vs
blkfront -> blkback -> /dev/xen/blktap-2/tapdevN -> tapdisk

There are extra copies between blktap and the tapdev.

> In any case, the patch looks good:
> Reviewed-by: Anthony PERARD 

Thanks,
Jason

Re: (subset) [PATCH io_uring-next/net-next v2 0/4] implement io_uring notification (ubuf_info) stacking

2024-04-22 Thread Jens Axboe



On Fri, 19 Apr 2024 12:08:38 +0100, Pavel Begunkov wrote:
> Please, don't take directly, conflicts with io_uring.
> 
> To have per request buffer notifications each zerocopy io_uring send
> request allocates a new ubuf_info. However, as an skb can carry only
> one uarg, it may force the stack to create many small skbs hurting
> performance in many ways.
> 
> [...]

Applied, thanks!

[3/4] io_uring/notif: simplify io_notif_flush()
  commit: 5a569469b973cb7a6c58192a37dfb8418686e518
[4/4] io_uring/notif: implement notification stacking
  commit: 6fe4220912d19152a26ce19713ab232f4263018d

Best regards,
-- 
Jens Axboe

[ovmf test] 185764: all pass - PUSHED

2024-04-22 Thread osstest service owner

flight 185764 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/185764/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 86c8d69146310f24069701053a27153ae536ebba
baseline version:
 ovmf 7dd7b890582b4d696ca5fd436dbc5fb4bc30e385

Last test of basis   185759  2024-04-22 13:14:40 Z0 days
Testing same since   185764  2024-04-22 23:12:58 Z0 days1 attempts


People who touched revisions under test:
  Gua Guo 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   7dd7b89058..86c8d69146  86c8d69146310f24069701053a27153ae536ebba -> 
xen-tested-master

Re: [PATCH] x86/MTRR: avoid several indirect calls

2024-04-22 Thread Marek Marczykowski-Górecki

On Wed, Jan 17, 2024 at 10:32:53AM +0100, Jan Beulich wrote:
> --- a/xen/arch/x86/cpu/mtrr/main.c
> +++ b/xen/arch/x86/cpu/mtrr/main.c
> @@ -328,7 +316,7 @@ int mtrr_add_page(unsigned long base, un
>   }
>  
>   /*  If the type is WC, check that this processor supports it  */
> - if ((type == X86_MT_WC) && !have_wrcomb()) {
> + if ((type == X86_MT_WC) && mtrr_have_wrcomb()) {

Is reversing the condition intentional here? 

>   printk(KERN_WARNING
>  "mtrr: your processor doesn't support 
> write-combining\n");
>   return -EOPNOTSUPP;

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


signature.asc
Description: PGP signature

Re: [PATCH io_uring-next/net-next v2 0/4] implement io_uring notification (ubuf_info) stacking

2024-04-22 Thread patchwork-bot+netdevbpf

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski :

On Fri, 19 Apr 2024 12:08:38 +0100 you wrote:
> Please, don't take directly, conflicts with io_uring.
> 
> To have per request buffer notifications each zerocopy io_uring send
> request allocates a new ubuf_info. However, as an skb can carry only
> one uarg, it may force the stack to create many small skbs hurting
> performance in many ways.
> 
> [...]

Here is the summary with links:
  - [io_uring-next/net-next,v2,1/4] net: extend ubuf_info callback to ops 
structure
https://git.kernel.org/netdev/net-next/c/7ab4f16f9e24
  - [io_uring-next/net-next,v2,2/4] net: add callback for setting a ubuf_info 
to skb
https://git.kernel.org/netdev/net-next/c/65bada80dec1
  - [io_uring-next/net-next,v2,3/4] io_uring/notif: simplify io_notif_flush()
(no matching commit)
  - [io_uring-next/net-next,v2,4/4] io_uring/notif: implement notification 
stacking
(no matching commit)

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

Re: Detecting whether dom0 is in a VM

2024-04-22 Thread Stefano Stabellini

On Mon, 22 Apr 2024, Jan Beulich wrote:
> On 19.04.2024 17:29, George Dunlap wrote:
> > On Fri, Jul 7, 2023 at 3:56 PM George Dunlap  
> > wrote:
> >> Xen's public interface offers access to the featuresets known / found /
> >> used by the hypervisor. See XEN_SYSCTL_get_cpu_featureset, accessible
> >> via xc_get_cpu_featureset().
> >>
> >
> > Are any of these exposed in dom0 via sysctl, or hypfs?
> 
>  sysctl - yes (as the quoted name also says). hypfs no, afaict.
> 
> >  SYSCTLs are
> > unfortunately not stable interfaces, correct?  So it wouldn't be 
> > practical
> > for systemd to use them.
> 
>  Indeed, neither sysctl-s nor the libxc interfaces are stable.
> >>>
> >>> Thinking of it, xen-cpuid is a wrapper tool around those. They may want
> >>> to look at its output (and, if they want to use it, advise distros to
> >>> also package it), which I think we try to keep reasonably stable,
> >>> albeit without providing any guarantees.
> >>
> >>
> >> We haven't had any clear guidance yet on what the systemd team want in the 
> >>  question; I just sort of assumed they 
> >> wanted the L-1 virtualization *if possible*.  It sounds like `vm-other` 
> >> would be acceptable, particularly as a fall-back output if there's no way 
> >> to get Xen's picture of the cpuid.
> >>
> >> It looks like xen-cpuid is available on Fedora, Debian, Ubuntu, and the 
> >> old Virt SIG CentOS packages; so I'd expect most packages to follow suit.  
> >> That's a place to start.
> >>
> >> Just to take the discussion all the way to its conclusion:
> >>
> >> - Supposing xen-cpuid isn't available, is there any other way to tell if 
> >> Xen is running in a VM from dom0?
> >>
> >> - Would it make sense to expose that information somewhere, either in 
> >> sysfs or in hypfs (or both?), so that eventually even systems which may 
> >> not get the memo about packaging xen-cpuid will get support (or if the 
> >> systemd guys would rather avoid executing another process if possible)?
> > 
> > Resurrecting this thread.
> > 
> > To recap:
> > 
> > Currently, `systemd-detect-virt` has the following  input / output table:
> > 
> > 1. Xen on real hardware, domU: xen
> > 2. Xen on real hardware, dom0: vm-other
> > 3. Xen in a VM, domU: xen
> > 4. Xen in aVM, dom0: vm-other
> > 
> > It's the dom0 lines (2 and 4) which are problematic; we'd ideally like
> > those to be `none` and `vm-other` (or the actual value, like `xen` or
> > `kvm`).
> > 
> > It looks like ATM, /proc/xen/capabilities will contain `control_d` if
> > it's a dom0.  Simply unilaterally returning `none` if
> > /proc/xen/capabilities contains `control_d` would correct the vast
> > majority of instances (since the number of instances of Xen on real
> > hardware is presumably higher than the number running virtualized).
> > 
> > Is /proc/xen/capabilities expected to stay around?  I don't see
> > anything equivalent in /dev/xen.
> 
> I believe it's intended to stay around, but a definitive answer can only
> come from Linux folks. Jürgen, Stefano?

No plans to get rid of /proc/xen/capabilities that I am aware

[linux-5.4 test] 185761: regressions - trouble: broken/fail/pass

2024-04-22 Thread osstest service owner

flight 185761 linux-5.4 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/185761/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-libvirt broken
 build-arm64-pvopsbroken  in 185433
 build-arm64-xsm  broken  in 185433
 build-arm64-libvirt  broken  in 185433
 build-arm64-libvirt  5 host-build-prep fail in 185433 REGR. vs. 185168
 build-arm64-pvops5 host-build-prep fail in 185433 REGR. vs. 185168
 build-arm64-xsm  5 host-build-prep fail in 185433 REGR. vs. 185168
 test-arm64-arm64-libvirt-raw 17 guest-start/debian.repeat fail in 185751 REGR. 
vs. 185168

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-libvirt  5 host-install(5)  broken pass in 185755
 test-amd64-i386-qemut-rhel6hvm-amd 14 guest-start/redhat.repeat fail in 185433 
pass in 185761
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail in 
185751 pass in 185761
 test-armhf-armhf-xl-multivcpu 14 guest-start fail in 185755 pass in 185761
 test-amd64-i386-libvirt-raw  12 debian-di-install  fail pass in 185433
 test-amd64-i386-libvirt-qcow2 12 debian-di-install fail pass in 185433
 test-amd64-i386-xl-vhd   12 debian-di-install  fail pass in 185433
 test-arm64-arm64-libvirt-raw 13 guest-startfail pass in 185751
 test-armhf-armhf-xl-qcow2 8 xen-boot   fail pass in 185755

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked in 185433 n/a
 test-arm64-arm64-examine  1 build-check(1)   blocked in 185433 n/a
 test-arm64-arm64-xl-vhd   1 build-check(1)   blocked in 185433 n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked in 185433 n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked in 185433 n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked in 185433 n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked in 185433 n/a
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked in 185433 n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked in 185433 n/a
 test-armhf-armhf-xl-credit2  19 guest-start.2 fail in 185433 blocked in 185168
 test-amd64-i386-libvirt-raw 14 migrate-support-check fail in 185433 never pass
 test-amd64-i386-libvirt-qcow2 14 migrate-support-check fail in 185433 never 
pass
 test-armhf-armhf-xl-rtds 18 guest-start/debian.repeat fail in 185751 like 
185168
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail in 185751 never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail in 185751 never 
pass
 test-armhf-armhf-xl-credit1 18 guest-start/debian.repeat fail in 185755 
blocked in 185168
 test-armhf-armhf-libvirt 16 saverestore-support-check fail in 185755 like 
185168
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail in 185755 never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail in 185755 never 
pass
 test-armhf-armhf-libvirt15 migrate-support-check fail in 185755 never pass
 test-armhf-armhf-xl-qcow2   14 migrate-support-check fail in 185755 never pass
 test-armhf-armhf-xl-qcow2 15 saverestore-support-check fail in 185755 never 
pass
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 185168
 test-armhf-armhf-xl-credit1  14 guest-start  fail  like 185168
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 185168
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 185168
 test-armhf-armhf-xl-credit2  18 guest-start/debian.repeatfail  like 185168
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 185168
 test-armhf-armhf-xl-multivcpu 18 guest-start/debian.repeatfail like 185168
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 185168
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 185168
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 185168
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 185168
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 185168
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass

[linux-linus test] 185758: regressions - FAIL

2024-04-22 Thread osstest service owner

flight 185758 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/185758/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl-credit2   8 xen-boot fail REGR. vs. 185750
 test-armhf-armhf-libvirt-vhd  8 xen-boot fail REGR. vs. 185750

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-raw   8 xen-boot   fail pass in 185753

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt   16 saverestore-support-check fail blocked in 185750
 test-armhf-armhf-libvirt  8 xen-bootfail in 185753 like 185750
 test-armhf-armhf-xl-raw 14 migrate-support-check fail in 185753 never pass
 test-armhf-armhf-xl-raw 15 saverestore-support-check fail in 185753 never pass
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 185750
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 185750
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 185750
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 185750
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 185750
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-qcow214 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-qcow215 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass

version targeted for testing:
 linuxed30a4a51bb196781c8058073ea720133a65596f
baseline version:
 linux977b1ef51866aa7170409af80740788d4f9c4841

Last test of basis   185750  2024-04-21 12:13:57 Z1 days
Testing same since   185753  2024-04-21 22:44:09 Z0 days2 attempts


People who touched revisions under test:
  Alan Stern 
  Alexander Usyskin 
  Andy Shevchenko 
  Andy Shevchenko 
  AngeloGioacchino Del Regno 
  bolan wang 
  Borislav Petkov (AMD) 
  Carlos Llamas 
  Christian A. Ehrhardt 
  Chuanhong Guo 
  Coia Prant 
  Dan Carpenter 
  Daniele Palmas 
  Dave Hansen 
  Dave Hansen  # for x86
  Emil Kronborg 
  Eric Biggers 
  Fabio Estevam 
  Finn Thain 
  Georgi Djakov 
  Gil Fine 
  Greg Kroah-Hartman 
  H. Peter Anvin (Intel) 
  Hans de Goede 
  Hou Wenlong 
  Ingo Molnar 
  Jerry Meng 
  Johan

Re: [PATCH 3/6] x86/alternative: Intend the relocation logic

2024-04-22 Thread Andrew Cooper

This of course intended to say indent...

~Andrew

[PATCH 2/6] x86/alternative: Walk all replacements in debug builds

2024-04-22 Thread Andrew Cooper

In debug builds, walk all alternative replacements with x86_decode_lite().

This checks that we can decode all instructions, and also lets us check that
disp8's don't leave the replacement block.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
---
 xen/arch/x86/alternative.c | 49 ++
 1 file changed, 49 insertions(+)

diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
index 2e7ba6e0b833..5bd256365def 100644
--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define MAX_PATCH_LEN (255-1)
@@ -464,6 +465,54 @@ static void __init _alternative_instructions(bool force)
 void __init alternative_instructions(void)
 {
 arch_init_ideal_nops();
+
+/*
+ * Walk all replacement instructions with x86_decode_lite().  This checks
+ * both that we can decode all instructions within the replacement, and
+ * that any near branch with a disp8 stays within the alternative itself.
+ */
+if ( IS_ENABLED(CONFIG_DEBUG) )
+{
+struct alt_instr *a;
+
+for ( a = __alt_instructions;
+  a < __alt_instructions_end; ++a )
+{
+void *repl = ALT_REPL_PTR(a);
+void *ip = repl, *end = ip + a->repl_len;
+
+if ( !a->repl_len )
+continue;
+
+for ( x86_decode_lite_t res; ip < end; ip += res.len )
+{
+res = x86_decode_lite(ip, end);
+
+if ( res.len <= 0 )
+{
+printk("Alternative for %ps [%*ph]\n",
+   ALT_ORIG_PTR(a), a->repl_len, repl);
+panic("Unable to decode instruction at +%u in 
alternative\n",
+  (unsigned int)(unsigned long)(ip - repl));
+}
+
+if ( res.rel_type == REL_TYPE_d8 )
+{
+int8_t *d8 = res.rel;
+void *target = ip + res.len + *d8;
+
+if ( target < repl || target > end )
+{
+printk("Alternative for %ps [%*ph]\n",
+   ALT_ORIG_PTR(a), a->repl_len, repl);
+panic("'JMP/Jcc disp8' at +%u leaves alternative 
block\n",
+  (unsigned int)(unsigned long)(ip - repl));
+}
+}
+}
+}
+}
+
 _alternative_instructions(false);
 }
 
-- 
2.30.2

[PATCH 5/6] x86/alternative: Relocate all insn-relative fields

2024-04-22 Thread Andrew Cooper

Right now, relocation of displacements is restricted to finding 0xe8/e9 as the
first byte of the replacement, but this is overly restrictive.

Use x86_decode_lite() to find and adjust all insn-relative fields.

As with disp8's not leaving the replacemnet block, some disp32's don't either.
e.g. the RSB stuffing loop.  These stay unmodified.

For now, leave the altcall devirtualisation alone.  These require more care to
transform into the new scheme.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
---
 xen/arch/x86/alternative.c | 46 +++---
 1 file changed, 38 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
index c86ea235e865..4d7dc9418cf0 100644
--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -244,10 +244,31 @@ static void init_or_livepatch _apply_alternatives(struct 
alt_instr *start,
 
 memcpy(buf, repl, a->repl_len);
 
+/* Walk buf[] and adjust any insn-relative operands. */
+if ( a->repl_len )
 {
-/* 0xe8/0xe9 are relative branches; fix the offset. */
-if ( a->repl_len >= 5 && (*buf & 0xfe) == 0xe8 )
+uint8_t *ip = buf, *end = ip + a->repl_len;
+
+for ( x86_decode_lite_t res; ip < end; ip += res.len )
 {
+int32_t *d32;
+uint8_t *target;
+
+res = x86_decode_lite(ip, end);
+
+if ( res.len <= 0 )
+{
+printk("Alternative for %ps [%*ph]\n",
+   ALT_ORIG_PTR(a), a->repl_len, repl);
+printk("Unable to decode instruction in alternative - 
ignoring.\n");
+goto skip_this_alternative;
+}
+
+if ( res.rel_type != REL_TYPE_d32 )
+continue;
+
+d32 = res.rel;
+
 /*
  * Detect the special case of indirect-to-direct branch 
patching:
  * - replacement is a direct CALL/JMP (opcodes 0xE8/0xE9; 
already
@@ -317,14 +338,23 @@ static void init_or_livepatch _apply_alternatives(struct 
alt_instr *start,
  */
 goto skip_this_alternative;
 }
+
+continue;
 }
-else if ( force && system_state < SYS_STATE_active )
-ASSERT_UNREACHABLE();
-else
-*(int32_t *)(buf + 1) += repl - orig;
+
+target = ip + res.len + *d32;
+
+if ( target >= buf && target <= end )
+{
+/*
+ * Target doesn't leave the replacement block.  e.g. RSB
+ * stuffing.  Leave it unmodified.
+ */
+continue;
+}
+
+*d32 += repl - orig;
 }
-else if ( force && system_state < SYS_STATE_active  )
-ASSERT_UNREACHABLE();
 }
 
 a->priv = 1;
-- 
2.30.2

[PATCH 1/6] x86: Introduce x86_decode_lite()

2024-04-22 Thread Andrew Cooper

In order to relocate all IP-relative fields in an alternative replacement
block, we need to decode the instructions enough to obtain their length and
any relative fields.

Full x86_decode() is far too heavyweight, so introduce a minimal form which
can make several simplifying assumptions.

This logic is sufficient to decode all alternative blocks that exist in Xen
right now.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
---
 xen/arch/x86/x86_emulate/Makefile  |   1 +
 xen/arch/x86/x86_emulate/decode-lite.c | 245 +
 xen/arch/x86/x86_emulate/private.h |   2 +
 xen/arch/x86/x86_emulate/x86_emulate.h |  17 ++
 4 files changed, 265 insertions(+)
 create mode 100644 xen/arch/x86/x86_emulate/decode-lite.c

diff --git a/xen/arch/x86/x86_emulate/Makefile 
b/xen/arch/x86/x86_emulate/Makefile
index 2e20d65d788a..f06731913d67 100644
--- a/xen/arch/x86/x86_emulate/Makefile
+++ b/xen/arch/x86/x86_emulate/Makefile
@@ -3,6 +3,7 @@ obj-y += 0fae.o
 obj-y += 0fc7.o
 obj-y += blk.o
 obj-y += decode.o
+obj-y += decode-lite.o
 obj-$(CONFIG_HVM) += fpu.o
 obj-y += util.o
 obj-y += util-xen.o
diff --git a/xen/arch/x86/x86_emulate/decode-lite.c 
b/xen/arch/x86/x86_emulate/decode-lite.c
new file mode 100644
index ..050b0d8cf4a8
--- /dev/null
+++ b/xen/arch/x86/x86_emulate/decode-lite.c
@@ -0,0 +1,245 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include "private.h"
+
+#define Imm8   (1 << 0)
+#define Imm(1 << 1)
+#define Branch (1 << 5) /* ... that we care about */
+/*  ModRM  (1 << 6) */
+#define Known  (1 << 7)
+
+#define ALU_OPS \
+(Known|ModRM),  \
+(Known|ModRM),  \
+(Known|ModRM),  \
+(Known|ModRM),  \
+(Known|Imm8),   \
+(Known|Imm)
+
+static const uint8_t init_or_livepatch_const onebyte[256] = {
+[0x00] = ALU_OPS, /* ADD */ [0x08] = ALU_OPS, /* OR  */
+[0x10] = ALU_OPS, /* ADC */ [0x18] = ALU_OPS, /* SBB */
+[0x20] = ALU_OPS, /* AND */ [0x28] = ALU_OPS, /* SUB */
+[0x30] = ALU_OPS, /* XOR */ [0x38] = ALU_OPS, /* CMP */
+
+[0x50 ... 0x5f] = (Known), /* PUSH/POP %reg */
+
+[0x70 ... 0x7f] = (Known|Branch|Imm8), /* Jcc disp8 */
+[0x80]  = (Known|ModRM|Imm8),
+[0x81]  = (Known|ModRM|Imm),
+
+[0x83]  = (Known|ModRM|Imm8),
+[0x84 ... 0x8e] = (Known|ModRM),   /* TEST/XCHG/MOV/MOV-SREG/LEA r/rm 
*/
+
+[0xb0 ... 0xb7] = (Known|Imm8),/* MOV $imm8, %reg */
+[0xb8 ... 0xbf] = (Known|Imm), /* MOV $imm32, %reg */
+
+[0xcc]  = (Known), /* INT3 */
+[0xcd]  = (Known|Imm8),/* INT $imm8 */
+
+[0xe8 ... 0xe9] = (Known|Branch|Imm),  /* CALL/JMP disp32 */
+[0xeb]  = (Known|Branch|Imm8), /* JMP disp8 */
+
+[0xf1]  = (Known), /* ICEBP */
+[0xf4]  = (Known), /* HLT */
+[0xf5]  = (Known), /* CMC */
+[0xf6 ... 0xf7] = (Known|ModRM),   /* Grp3 */
+[0xf8 ... 0xfd] = (Known), /* CLC ... STD */
+[0xfe ... 0xff] = (Known|ModRM),   /* Grp4 */
+};
+static const uint8_t init_or_livepatch_const twobyte[256] = {
+[0x00 ... 0x01] = (Known|ModRM),   /* Grp6/Grp7 */
+
+[0x18 ... 0x1f] = (Known|ModRM),   /* Grp16 (Hint Nop) */
+
+[0x20 ... 0x23] = (Known|ModRM),   /* MOV CR/DR */
+
+[0x30 ... 0x34] = (Known), /* WRMSR ... RDPMC */
+[0x40 ... 0x4f] = (Known|ModRM),   /* CMOVcc */
+
+[0x80 ... 0x8f] = (Known|Branch|Imm),  /* Jcc disp32 */
+[0x90 ... 0x9f] = (Known|ModRM),   /* SETcc */
+
+[0xab]  = (Known|ModRM),   /* BTS */
+[0xac]  = (Known|ModRM|Imm8),  /* SHRD $imm8 */
+[0xad ... 0xaf] = (Known|ModRM),   /* SHRD %cl / Grp15 / IMUL */
+
+[0xb8 ... 0xb9] = (Known|ModRM),   /* POPCNT/Grp10 (UD1) */
+[0xba]  = (Known|ModRM|Imm8),  /* Grp8 */
+[0xbb ... 0xbf] = (Known|ModRM),   /* BSR/BSF/BSR/MOVSX */
+};
+
+/*
+ * Bare minimum x86 instruction decoder to parse the alternative replacement
+ * instructions and locate the IP-relative references that may need updating.
+ *
+ * These are:
+ *  - disp8/32 from near branches
+ *  - RIP-relative memory references
+ *
+ * The following simplifications are used:
+ *  - All code is 64bit, and the instruction stream is safe to read.
+ *  - The 67 prefix is not implemented, so the address size is only 64bit.
+ *
+ * Inputs:
+ *  @ip  The position to start decoding from.
+ *  @end End of the replacement block.  Exceeding this is considered an error.
+ *
+ * Returns: x86_decode_lite_t
+ *  - On failure, length of -1.
+ *  - On success, length > 0 and REL_TYPE_*.  For REL_TYPE != NONE, rel points
+ *at the relative field in the instruction stream.
+ */

[PATCH 6/6] x86/spec-ctrl: Introduce and use DO_COND_BHB_SEQ

2024-04-22 Thread Andrew Cooper

Now that alternatives can fix up call displacements even when they're not the
first instruction of the replacement, move the SCF_entry_bhb conditional
inside the replacement block.

This removes a conditional branch from the fastpaths of BHI-unaffected
hardware.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
---
 xen/arch/x86/hvm/vmx/entry.S | 12 +++
 xen/arch/x86/include/asm/spec_ctrl_asm.h | 43 +---
 2 files changed, 30 insertions(+), 25 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S
index 7233e771d88a..77bf9ea564ea 100644
--- a/xen/arch/x86/hvm/vmx/entry.S
+++ b/xen/arch/x86/hvm/vmx/entry.S
@@ -62,12 +62,12 @@ ENTRY(vmx_asm_vmexit_handler)
  * Clear the BHB to mitigate BHI.  Used on eIBRS parts, and uses RETs
  * itself so must be after we've perfomed all the RET-safety we can.
  */
-testb $SCF_entry_bhb, CPUINFO_scf(%rsp)
-jz .L_skip_bhb
-ALTERNATIVE_2 "",\
-"call clear_bhb_loops", X86_SPEC_BHB_LOOPS,  \
-"call clear_bhb_tsx", X86_SPEC_BHB_TSX
-.L_skip_bhb:
+.macro VMX_BHB_SEQ fn:req
+DO_COND_BHB_SEQ \fn scf=CPUINFO_scf(%rsp)
+.endm
+ALTERNATIVE_2 "", \
+"VMX_BHB_SEQ fn=clear_bhb_loops", X86_SPEC_BHB_LOOPS, \
+"VMX_BHB_SEQ fn=clear_bhb_tsx",   X86_SPEC_BHB_TSX
 
 ALTERNATIVE "lfence", "", X86_SPEC_NO_LFENCE_ENTRY_VMX
 /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h 
b/xen/arch/x86/include/asm/spec_ctrl_asm.h
index 729a830411eb..559dad88f967 100644
--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
@@ -92,6 +92,21 @@
 .L\@_skip:
 .endm
 
+.macro DO_COND_BHB_SEQ fn:req, scf=%bl
+/*
+ * Requires SCF (defaults to %rbx), fn=clear_bhb_{loops,tsx}
+ * Clobbers %rax, %rcx
+ *
+ * Conditionally use a BHB clearing software sequence.
+ */
+testb  $SCF_entry_bhb, \scf
+jz .L\@_skip_bhb
+
+call   \fn
+
+.L\@_skip_bhb:
+.endm
+
 .macro DO_OVERWRITE_RSB tmp=rax, xu
 /*
  * Requires nothing
@@ -277,12 +292,9 @@
  * Clear the BHB to mitigate BHI.  Used on eIBRS parts, and uses RETs
  * itself so must be after we've perfomed all the RET-safety we can.
  */
-testb $SCF_entry_bhb, %bl
-jz .L\@_skip_bhb
-ALTERNATIVE_2 "",\
-"call clear_bhb_loops", X86_SPEC_BHB_LOOPS,  \
-"call clear_bhb_tsx", X86_SPEC_BHB_TSX
-.L\@_skip_bhb:
+ALTERNATIVE_2 "",  \
+"DO_COND_BHB_SEQ clear_bhb_loops", X86_SPEC_BHB_LOOPS, \
+"DO_COND_BHB_SEQ clear_bhb_tsx",   X86_SPEC_BHB_TSX
 
 ALTERNATIVE "lfence", "", X86_SPEC_NO_LFENCE_ENTRY_PV
 .endm
@@ -322,12 +334,9 @@
 ALTERNATIVE "", __stringify(DO_SPEC_CTRL_ENTRY maybexen=1), \
 X86_FEATURE_SC_MSR_PV
 
-testb $SCF_entry_bhb, %bl
-jz .L\@_skip_bhb
-ALTERNATIVE_2 "",\
-"call clear_bhb_loops", X86_SPEC_BHB_LOOPS,  \
-"call clear_bhb_tsx", X86_SPEC_BHB_TSX
-.L\@_skip_bhb:
+ALTERNATIVE_2 "",  \
+"DO_COND_BHB_SEQ clear_bhb_loops", X86_SPEC_BHB_LOOPS, \
+"DO_COND_BHB_SEQ clear_bhb_tsx",   X86_SPEC_BHB_TSX
 
 ALTERNATIVE "lfence", "", X86_SPEC_NO_LFENCE_ENTRY_INTR
 .endm
@@ -433,13 +442,9 @@
  * Clear the BHB to mitigate BHI.  Used on eIBRS parts, and uses RETs
  * itself so must be after we've perfomed all the RET-safety we can.
  */
-testb $SCF_entry_bhb, %bl
-jz .L\@_skip_bhb
-
-ALTERNATIVE_2 "",\
-"call clear_bhb_loops", X86_SPEC_BHB_LOOPS,  \
-"call clear_bhb_tsx", X86_SPEC_BHB_TSX
-.L\@_skip_bhb:
+ALTERNATIVE_2 "",  \
+"DO_COND_BHB_SEQ clear_bhb_loops", X86_SPEC_BHB_LOOPS, \
+"DO_COND_BHB_SEQ clear_bhb_tsx",   X86_SPEC_BHB_TSX
 
 lfence
 .endm
-- 
2.30.2

[PATCH 3/6] x86/alternative: Intend the relocation logic

2024-04-22 Thread Andrew Cooper

... to make subsequent patches legible.

No functional change.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
---
 xen/arch/x86/alternative.c | 126 +++--
 1 file changed, 64 insertions(+), 62 deletions(-)

diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
index 5bd256365def..2ca4dfd569bc 100644
--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -244,78 +244,80 @@ static void init_or_livepatch _apply_alternatives(struct 
alt_instr *start,
 
 memcpy(buf, repl, a->repl_len);
 
-/* 0xe8/0xe9 are relative branches; fix the offset. */
-if ( a->repl_len >= 5 && (*buf & 0xfe) == 0xe8 )
 {
-/*
- * Detect the special case of indirect-to-direct branch patching:
- * - replacement is a direct CALL/JMP (opcodes 0xE8/0xE9; already
- *   checked above),
- * - replacement's displacement is -5 (pointing back at the very
- *   insn, which makes no sense in a real replacement insn),
- * - original is an indirect CALL/JMP (opcodes 0xFF/2 or 0xFF/4)
- *   using RIP-relative addressing.
- * Some branch destinations may still be NULL when we come here
- * the first time. Defer patching of those until the post-presmp-
- * initcalls re-invocation (with force set to true). If at that
- * point the branch destination is still NULL, insert "UD2; UD0"
- * (for ease of recognition) instead of CALL/JMP.
- */
-if ( a->cpuid == X86_FEATURE_ALWAYS &&
- *(int32_t *)(buf + 1) == -5 &&
- a->orig_len >= 6 &&
- orig[0] == 0xff &&
- orig[1] == (*buf & 1 ? 0x25 : 0x15) )
+/* 0xe8/0xe9 are relative branches; fix the offset. */
+if ( a->repl_len >= 5 && (*buf & 0xfe) == 0xe8 )
 {
-long disp = *(int32_t *)(orig + 2);
-const uint8_t *dest = *(void **)(orig + 6 + disp);
-
-if ( dest )
+/*
+ * Detect the special case of indirect-to-direct branch 
patching:
+ * - replacement is a direct CALL/JMP (opcodes 0xE8/0xE9; 
already
+ *   checked above),
+ * - replacement's displacement is -5 (pointing back at the 
very
+ *   insn, which makes no sense in a real replacement insn),
+ * - original is an indirect CALL/JMP (opcodes 0xFF/2 or 
0xFF/4)
+ *   using RIP-relative addressing.
+ * Some branch destinations may still be NULL when we come here
+ * the first time. Defer patching of those until the 
post-presmp-
+ * initcalls re-invocation (with force set to true). If at that
+ * point the branch destination is still NULL, insert "UD2; 
UD0"
+ * (for ease of recognition) instead of CALL/JMP.
+ */
+if ( a->cpuid == X86_FEATURE_ALWAYS &&
+ *(int32_t *)(buf + 1) == -5 &&
+ a->orig_len >= 6 &&
+ orig[0] == 0xff &&
+ orig[1] == (*buf & 1 ? 0x25 : 0x15) )
 {
-/*
- * When building for CET-IBT, all function pointer targets
- * should have an endbr64 instruction.
- *
- * If this is not the case, leave a warning because
- * something is probably wrong with the build.  A CET-IBT
- * enabled system might have exploded already.
- *
- * Otherwise, skip the endbr64 instruction.  This is a
- * marginal perf improvement which saves on instruction
- * decode bandwidth.
- */
-if ( IS_ENABLED(CONFIG_XEN_IBT) )
+long disp = *(int32_t *)(orig + 2);
+const uint8_t *dest = *(void **)(orig + 6 + disp);
+
+if ( dest )
 {
-if ( is_endbr64(dest) )
-dest += ENDBR64_LEN;
-else
-printk(XENLOG_WARNING
-   "altcall %ps dest %ps has no endbr64\n",
-   orig, dest);
+/*
+ * When building for CET-IBT, all function pointer 
targets
+ * should have an endbr64 instruction.
+ *
+ * If this is not the case, leave a warning because
+ * something is probably wrong with the build.  A 
CET-IBT
+ * enabled system might have exploded already.
+ *

[PATCH 0/6] x86/alternatives: Adjust all insn-relative fields

2024-04-22 Thread Andrew Cooper

Alternatives have had a reasonably severe restriction since their
introduction.  This has been the source of several bugs, and several
inefficiencies particularly in the speculative safety paths, and I've finally
gotten bored enough to fixing it.

Introduce the new infrastructure, and adjust the BHB scrubbing logic to use
it.

Andrew Cooper (6):
  x86: Introduce x86_decode_lite()
  x86/alternative: Walk all replacements in debug builds
  x86/alternative: Intend the relocation logic
  x86/alternative: Replace a continue with a goto
  x86/alternative: Relocate all insn-relative fields
  x86/spec-ctrl: Introduce and use DO_COND_BHB_SEQ

 xen/arch/x86/alternative.c   | 210 +--
 xen/arch/x86/hvm/vmx/entry.S |  12 +-
 xen/arch/x86/include/asm/spec_ctrl_asm.h |  43 ++--
 xen/arch/x86/x86_emulate/Makefile|   1 +
 xen/arch/x86/x86_emulate/decode-lite.c   | 245 +++
 xen/arch/x86/x86_emulate/private.h   |   2 +
 xen/arch/x86/x86_emulate/x86_emulate.h   |  17 ++
 7 files changed, 445 insertions(+), 85 deletions(-)
 create mode 100644 xen/arch/x86/x86_emulate/decode-lite.c

-- 
2.30.2

[PATCH 4/6] x86/alternative: Replace a continue with a goto

2024-04-22 Thread Andrew Cooper

A subsequent patch is going to insert a loop, which interferes with the
continue in the devirtualisation logic.

Replace it with a goto, and a paragraph explaining why we intentionally avoid
setting a->priv = 1.

No functional change.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
---
 xen/arch/x86/alternative.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
index 2ca4dfd569bc..c86ea235e865 100644
--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -308,7 +308,15 @@ static void init_or_livepatch _apply_alternatives(struct 
alt_instr *start,
 buf[4] = 0xff;
 }
 else
-continue;
+{
+/*
+ * The function pointer we're wanting to devirtualise
+ * is still NULL, and we're not sealing yet.  Leave
+ * the alternative fully un-processed, in order to
+ * process it the next time around.
+ */
+goto skip_this_alternative;
+}
 }
 else if ( force && system_state < SYS_STATE_active )
 ASSERT_UNREACHABLE();
@@ -323,6 +331,7 @@ static void init_or_livepatch _apply_alternatives(struct 
alt_instr *start,
 
 add_nops(buf + a->repl_len, total_len - a->repl_len);
 text_poke(orig, buf, total_len);
+skip_this_alternative:;
 }
 
 /*
-- 
2.30.2

Re: [PATCH v2 5/5] x86: Add --force option to xen-ucode to override microcode version check

2024-04-22 Thread Anthony PERARD

On Tue, Apr 16, 2024 at 10:15:46AM +0100, Fouad Hilly wrote:
> Introduce --force option to xen-ucode to force skipping microcode version
> check, which allows the user to update x86 microcode even if both versions
> are the same.
> 
> [v2]
> 1- Changed data type from uint32_t to unsigned int.
> 2- Corrected line length.
> 3- Removed XENPF_UCODE_FLAG_FORCE_NOT_SET.
> 4- Corrected indentations.
> 5- Changed command line options to have the file name as first argument when 
> applicable.
> 6- --force option doesn't require an argument anymore.
> 7- Used optint to access filename in argv.
> 
> Signed-off-by: Fouad Hilly 
> ---
> 
> Suggested-by: Andrew Cooper 

You might want to move that tag before the '---' separation otherwise it
wont be part of the commit message. `git am` discard every thing after
the '---' line. Also add the tag before your SOB.

> ---
>  tools/include/xenctrl.h   |  3 ++-
>  tools/libs/ctrl/xc_misc.c | 13 +++--
>  tools/misc/xen-ucode.c| 18 +-
>  3 files changed, 26 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/misc/xen-ucode.c b/tools/misc/xen-ucode.c
> index e3c1943e3633..4178fd2221ea 100644
> --- a/tools/misc/xen-ucode.c
> +++ b/tools/misc/xen-ucode.c
> @@ -24,7 +26,8 @@ static void usage(const char *name)
> "Usage: %s [microcode file] [options]\n"

Now, that usage line is wrong. The options needs to go before the file.
That could be fix on the previous patch. With that usage line, we would
want to run `./xen-ucode ucode.bin --force`, but I don't think that
would work.

> "Options:\n"
> "  -h, --helpdisplay this help and exit\n"
> -   "  -s, --show-cpu-info   show CPU information and exit\n",
> +   "  -s, --show-cpu-info   show CPU information and exit\n"
> +   "  -f, --force   force to skip micorocde version check\n",
> name, name);
>  }
>  

Thanks,

-- 
Anthony PERARD

Re: [PATCH v2 4/5] x86: Use getopt to handle command line args

2024-04-22 Thread Anthony PERARD

On Tue, Apr 16, 2024 at 10:15:45AM +0100, Fouad Hilly wrote:
> diff --git a/tools/misc/xen-ucode.c b/tools/misc/xen-ucode.c
> index 0c0b2337b4ea..e3c1943e3633 100644
> --- a/tools/misc/xen-ucode.c
> +++ b/tools/misc/xen-ucode.c
> @@ -11,6 +11,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  static xc_interface *xch;
>  
> @@ -22,7 +23,8 @@ static void usage(const char *name)
>  printf("%s: Xen microcode updating tool\n"
> "Usage: %s [microcode file] [options]\n"
> "Options:\n"
> -   "show-cou-info   show CPU information and exit\n",
> +   "  -h, --helpdisplay this help and exit\n"
> +   "  -s, --show-cpu-info   show CPU information and exit\n",
> name, name);
>  }
>  
> @@ -86,6 +88,13 @@ int main(int argc, char *argv[])
>  char *filename, *buf;
>  size_t len;
>  struct stat st;
> +int opt;
> +
> +static const struct option options[] = {
> +{"help", no_argument, NULL, 'h'},
> +{"show-cpu-info", no_argument, NULL, 's'},
> +{NULL, no_argument, NULL, 0}
> +};
>  
>  xch = xc_interface_open(NULL, NULL, 0);
>  if ( xch == NULL )
> @@ -95,19 +104,22 @@ int main(int argc, char *argv[])
>  exit(1);
>  }
>  
> -if ( argc < 2 )
> -{
> -fprintf(stderr,
> -"xen-ucode: Xen microcode updating tool\n"
> -"Usage: %s [ | show-cpu-info]\n", argv[0]);
> -show_curr_cpu(stderr);
> -exit(2);
> -}
> +if ( argc != 2 )
> +goto ext_err;

Why only two arguments allowed? And why check `argc` before calling
getopt_long() ?


>  
> -if ( !strcmp(argv[1], "show-cpu-info") )
> +while ( (opt = getopt_long(argc, argv, "hs", options, NULL)) != -1 )
>  {
> -show_curr_cpu(stdout);
> -return 0;
> +switch (opt)
> +{
> +case 'h':
> +usage(argv[0]);
> +exit(EXIT_SUCCESS);
> +case 's':
> +show_curr_cpu(stdout);
> +exit(EXIT_SUCCESS);
> +default:
> +goto ext_err;
> +}
>  }
>  
>  filename = argv[1];

So, after calling getopt_long(), the variable `optind` point to the next
argument that getopt_long() didn't recognize as an argument. It would be
a good time to start using it, and check that the are actually argument
left on the command line, even if in the current patch the only possible
outcome is that argv[1] has something that isn't an option.

> @@ -152,4 +164,11 @@ int main(int argc, char *argv[])
>  close(fd);
>  
>  return 0;
> +
> + ext_err:
> +fprintf(stderr,
> +"xen-ucode: Xen microcode updating tool\n"
> +"Usage: %s [microcode file] [options]\n", argv[0]);

Isn't there a usage() function that we could use?

> +show_curr_cpu(stderr);

Why call show_curr_cpu() on the error path?

> +exit(STDERR_FILENO);

Still not an exit value.

Thanks,

-- 
Anthony PERARD

Re: [PATCH v2 3/5] x86: Add usage() to print out usage message

2024-04-22 Thread Anthony PERARD

On Tue, Apr 16, 2024 at 10:15:44AM +0100, Fouad Hilly wrote:
> Refactor xen-ucode tool by adding usage() to handle usage\help messages
> As we add more command options this will keep help\usage messages in common 
> block
> 
> [v2]
> 1- Improved message description.
> 2- Fixed formatting and indentation.
> 3- Error message to print to stderr.
> 
> Signed-off-by: Fouad Hilly 
> ---
>  tools/misc/xen-ucode.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/tools/misc/xen-ucode.c b/tools/misc/xen-ucode.c
> index c6ae6498d659..0c0b2337b4ea 100644
> --- a/tools/misc/xen-ucode.c
> +++ b/tools/misc/xen-ucode.c
> @@ -17,6 +17,15 @@ static xc_interface *xch;
>  static const char intel_id[] = "GenuineIntel";
>  static const char   amd_id[] = "AuthenticAMD";
>  
> +static void usage(const char *name)
> +{
> +printf("%s: Xen microcode updating tool\n"
> +   "Usage: %s [microcode file] [options]\n"
> +   "Options:\n"
> +   "show-cou-info   show CPU information and exit\n",

Don't change the usage message just yet. It still is
"Usage: %s [ | show-cpu-info]"

The current one mean we can run one of:
./xen-ucode ucode.bin
./xen-ucode show-cpu-info

The proposed help message in this patch mean we could have one of:
./xen-ucode ucode.bin
./xen-ucode show-cpu-info
./xen-ucode ucode.bin show-cpu-info
But that last one is an error.

Thanks,

-- 
Anthony PERARD

Re: [PATCH v3 4/4] docs/man: Add xenwatchdog manual page

2024-04-22 Thread Anthony PERARD

On Thu, Apr 11, 2024 at 07:20:23PM +0100, le...@solinno.co.uk wrote:
> From: Leigh Brown 
> 
> Add a manual page for xenwatchdogd.
> 
> Signed-off-by: Leigh Brown 

Reviewed-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD

Re: [PATCH v3 2/4] tools/misc: xenwatchdogd enhancements

2024-04-22 Thread Anthony PERARD

On Thu, Apr 11, 2024 at 07:20:21PM +0100, le...@solinno.co.uk wrote:
> From: Leigh Brown 
> 
> Add usage() function, the ability to run in the foreground, and
> the ability to disarm the watchdog timer when exiting.
> 
> Add enhanced parameter parsing and validation, making use of
> getopt_long().  Check the number of parameters are correct, the
> timeout is at least two seconds (to allow a minimum sleep time of
> one second), and that the sleep time is at least one and less
> than the watchdog timeout.
> 
> With these changes, the daemon will no longer instantly reboot
> the domain if you enter a zero timeout (or non-numeric parameter),
> and prevent the daemon consuming 100% of a CPU due to zero sleep
> time.
> 
> Signed-off-by: Leigh Brown 

Reviewed-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD

Re: [PATCH v3 3/4] tools/misc: Add xenwatchdogd.c copyright notice

2024-04-22 Thread Anthony PERARD

On Thu, Apr 11, 2024 at 07:20:22PM +0100, le...@solinno.co.uk wrote:
> From: Leigh Brown 
> 
> Add copyright notice and description of the program.
> 
> Signed-off-by: Leigh Brown 
>
> ---
>  tools/misc/xenwatchdogd.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/tools/misc/xenwatchdogd.c b/tools/misc/xenwatchdogd.c
> index a16d1efc13..2884ca6ca9 100644
> --- a/tools/misc/xenwatchdogd.c
> +++ b/tools/misc/xenwatchdogd.c
> @@ -1,3 +1,13 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * xenwatchdogd.c
> + *
> + * Watchdog based on Xen hypercall watchdog interface.
> + *
> + * Copyright 2010 Citrix Ltd
> + * Copyright 2024 Leigh Brown 
> + *
> + */

Acked-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD

Re: [PATCH v3 1/4] tools/misc: xenwatchdogd: add parse_secs()

2024-04-22 Thread Anthony PERARD

On Thu, Apr 11, 2024 at 07:20:20PM +0100, le...@solinno.co.uk wrote:
> From: Leigh Brown 
> 
> Create a new parse_secs() function to parse the timeout and sleep
> parameters. This ensures that non-numeric parameters are not
> accidentally treated as numbers.
> 
> Signed-off-by: Leigh Brown 

Reviewed-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD

Re: [RFC PATCH v2 2/2] libs/light: expand device model start timeout use

2024-04-22 Thread Anthony PERARD

On Thu, Apr 11, 2024 at 03:28:20PM +0300, Manos Pitsidianakis wrote:
> Various timeout values that depend on the device model should also
> respect the device model start timeout setting. This commit adds the
> __libxl_device_model_start_timeout() value to those time outs without
> changing their default values.

I don't like much this patch. It multiplies all qmp_cmd timeouts by 7.

First, could you make two separate patches for the changes? One to
change LIBXL_QMP_CMD_TIMEOUT, and one to change
LIBXL_STUBDOM_START_TIMEOUT.

Second, when the environment variable is unset, the timeout should stay
at 10 for the qmp_cmd one. If for some reason that chosen value is to
low, we could always have a separate patch to adjust the value.

Cheers,

-- 
Anthony PERARD

[xen-unstable-smoke test] 185760: tolerable all pass - PUSHED

2024-04-22 Thread osstest service owner

flight 185760 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/185760/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  cccb7878f386fb8691b7e28957a562a66d9b875f
baseline version:
 xen  09cb855f33fb619c87d8d4250d3a980f568f151b

Last test of basis   185740  2024-04-19 13:02:10 Z3 days
Testing same since   185760  2024-04-22 14:02:06 Z0 days1 attempts


People who touched revisions under test:
  Jan Beulich 
  Jason Andryuk 
  Julien Grall 
  Nicola Vetrini 
  Oleksii Kurochko 
  Roger Pau Monné 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   09cb855f33..cccb7878f3  cccb7878f386fb8691b7e28957a562a66d9b875f -> smoke

Re: [RFC PATCH v2 1/2] libs/light: add device model start timeout env var

2024-04-22 Thread Anthony PERARD

On Thu, Apr 11, 2024 at 03:28:19PM +0300, Manos Pitsidianakis wrote:
> diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in
> index bed8393473..c159877094 100644
> --- a/docs/man/xl.1.pod.in
> +++ b/docs/man/xl.1.pod.in
> @@ -1993,6 +1993,17 @@ Otherwise the build time default in 
> LIBXL_BOOTLOADER_TIMEOUT will be used.
>  If defined the value must be an unsigned integer between 0 and INT_MAX,
>  otherwise behavior is undefined.  Setting to 0 disables the timeout.

I don't think a value of "0" would disable timeouts, did you try?
It's more likely that the timeout will fire immediately. I'll had some
comment later about this on case-by-case basis.

>  
> +=item LIBXL_DEVICE_MODEL_START_TIMEOUT
> +
> +Timeout in seconds for starting the device model process. Useful in case the
> +device model takes an unusual amount of time to start— for example in case of

Could you use only ASCII characters,   here ^
It's looks like it's the first UTF-8 chr in this document, and we can do
without it. I just want to avoid find out later that something break
because of that chr.

> +very slow I/O, in case of slow performance due to memory sanitizer usage, 
> etc.
> +
> +If undefined, the default hard-coded value of 60 seconds is used.
> +
> +If defined, the value must be an unsigned integer between 0 and INT_MAX,
> +otherwise behaviour is undefined.  Setting the value to 0 disables the 
> timeout.
> +
>  =back
>  
>  =head1 SEE ALSO
> diff --git a/tools/libs/light/libxl_9pfs.c b/tools/libs/light/libxl_9pfs.c
> index 48f894f070..950a464b45 100644
> --- a/tools/libs/light/libxl_9pfs.c
> +++ b/tools/libs/light/libxl_9pfs.c
> @@ -132,7 +132,7 @@ static int xen9pfsd_spawn(libxl__egc *egc, uint32_t 
> domid, libxl_device_p9 *p9,
>  aop9->spawn.ao = aodev->ao;
>  aop9->spawn.what = "xen-9pfs daemon";
>  aop9->spawn.xspath = GCSPRINTF("%s/state", path);
> -aop9->spawn.timeout_ms = LIBXL_DEVICE_MODEL_START_TIMEOUT * 1000;
> +aop9->spawn.timeout_ms = __libxl_device_model_start_timeout() * 1000;

So this value is going to be used by libxl__spawn_spawn() then
libxl__xswait_start(), and it's looks like a timeout of "0" will have
the timeout fire immediately. Instead, a value of "-1" might disable the
timeout, but I don't know how well it would work without a timeout.

Timeout is setup with libxl__ev_time_register_rel().

>  aop9->spawn.pidpath = GCSPRINTF("%s/pid", path);
>  aop9->spawn.midproc_cb = libxl__spawn_record_pid;
>  aop9->spawn.confirm_cb = xen9pfsd_confirm;
> diff --git a/tools/libs/light/libxl_device.c b/tools/libs/light/libxl_device.c
> index 6f0100d05e..452e55ba23 100644
> --- a/tools/libs/light/libxl_device.c
> +++ b/tools/libs/light/libxl_device.c
> @@ -1436,7 +1436,7 @@ int libxl__wait_for_device_model_deprecated(libxl__gc 
> *gc,
>  
>  path = DEVICE_MODEL_XS_PATH(gc, dm_domid, domid, "/state");
>  return libxl__xenstore_child_wait_deprecated(gc, domid,
> - LIBXL_DEVICE_MODEL_START_TIMEOUT,
> + __libxl_device_model_start_timeout(),

This function will return an error immediately, without trying, with a
timeout of "0", and "-1" isn't an option (well I guess it might be
INT_MAX because it an unsigned value).

>   "Device Model", path, state, spawning,
>   check_callback, 
> check_callback_userdata);
>  }
> diff --git a/tools/libs/light/libxl_dm.c b/tools/libs/light/libxl_dm.c
> index 0b03a7c747..4369fef161 100644
> --- a/tools/libs/light/libxl_dm.c
> +++ b/tools/libs/light/libxl_dm.c
> @@ -2629,7 +2629,7 @@ static void spawn_qmp_proxy(libxl__egc *egc,
>  sdss->qmp_proxy_spawn.pidpath = GCSPRINTF("%s/image/qmp-proxy-pid", 
> dom_path);
>  sdss->qmp_proxy_spawn.xspath = DEVICE_MODEL_XS_PATH(gc, 
> LIBXL_TOOLSTACK_DOMID,
>  dm_domid, 
> "/qmp-proxy-state");
> -sdss->qmp_proxy_spawn.timeout_ms = LIBXL_DEVICE_MODEL_START_TIMEOUT * 
> 1000;
> +sdss->qmp_proxy_spawn.timeout_ms = __libxl_device_model_start_timeout() 
> * 1000;

That's used by libxl__spawn_spawn(). Already commented on it.

>  sdss->qmp_proxy_spawn.midproc_cb = libxl__spawn_record_pid;
>  sdss->qmp_proxy_spawn.confirm_cb = qmp_proxy_confirm;
>  sdss->qmp_proxy_spawn.failure_cb = qmp_proxy_startup_failed;
> @@ -3011,7 +3011,7 @@ retry_transaction:
>  spawn->what = GCSPRINTF("domain %d device model", domid);
>  spawn->xspath = DEVICE_MODEL_XS_PATH(gc, LIBXL_TOOLSTACK_DOMID, domid,
>   "/state");
> -spawn->timeout_ms = LIBXL_DEVICE_MODEL_START_TIMEOUT * 1000;
> +spawn->timeout_ms = __libxl_device_model_start_timeout() * 1000;

That's used by libxl__spawn_spawn(). Already commented on it.

>  spawn->pidpath = GCSPRINTF("%s/image/device-model-pid", dom_path);
>  spawn->midproc_cb = libxl__spawn_record_pid;
>

Re: [PATCH] x86/rtc: Avoid UIP flag being set for longer than expected

2024-04-22 Thread Ross Lagerwall

On Thu, Apr 18, 2024 at 2:24 PM Jan Beulich  wrote:
>
> On 08.04.2024 18:26, Ross Lagerwall wrote:
> > In a test, OVMF reported an error initializing the RTC without
> > indicating the precise nature of the error. The only plausible
> > explanation I can find is as follows:
> >
> > As part of the initialization, OVMF reads register C and then reads
> > register A repatedly until the UIP flag is not set. If this takes longer
> > than 100 ms, OVMF fails and reports an error. This may happen with the
> > following sequence of events:
> >
> > At guest time=0s, rtc_init() calls check_update_timer() which schedules
> > update_timer for t=(1 - 244us).
> >
> > At t=1s, the update_timer function happens to have been called >= 244us
> > late.
>
> Any theory on why this timer runs so late? (It needs dealing with anyway,
> but I'm still curious.)

I don't know specifically why our testcase spotted it. I reproduced the
issue by repeatedly running "xl debug-key h" so that a lot of time was
spent writing to the serial console. That caused timer interrupts to
run late but I suppose there could be a lot of reasons why timer
interrupts occasionally run late (e.g. a live patch critical region).

>
> > In the timer callback, it sets the UIP flag and schedules
> > update_timer2 for t=1s.
> >
> > Before update_timer2 runs, the guest reads register C which calls
> > check_update_timer(). check_update_timer() stops the scheduled
> > update_timer2 and since the guest time is now outside of the update
> > cycle, it schedules update_timer for t=(2 - 244us).
> >
> > The UIP flag will therefore be set for a whole second from t=1 to t=2
> > while the guest repeatedly reads register A waiting for the UIP flag to
> > clear. Fix it by clearing the UIP flag when scheduling update_timer.
>
> I can accept this as a workaround (perhaps with a tweak, see below), but
> I wonder whether we couldn't do this in a less ad hoc way. If stop_timer()
> returned whether the timer was actually stopped, couldn't the clearing of
> UIP be made conditional upon that?

I think that would work though I'd need to spend some time convincing
myself that it doesn't introduce any other race conditions. I'm not
convinced it is really any better than just unconditionally clearing
the flag though.

>
> > --- a/xen/arch/x86/hvm/rtc.c
> > +++ b/xen/arch/x86/hvm/rtc.c
> > @@ -202,6 +202,7 @@ static void check_update_timer(RTCState *s)
> >  }
> >  else
> >  {
> > +s->hw.cmos_data[RTC_REG_A] &= ~RTC_UIP;
> >  next_update_time = (USEC_PER_SEC - guest_usec - 244) * 
> > NS_PER_USEC;
> >  expire_time = NOW() + next_update_time;
> >  s->next_update_time = expire_time;
>
> Is rtc_update_timer2() clearing UIP then still necessary, ahead of calling
> here? Hmm, yes, it is; the question is rather why the function calls
> check_update_timer() when all that'll do (due to UF now being set) is stop
> the other timer (in case it's also running) and clear ->use_timer.
>

Are you suggesting something like this?

diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c
index 4bb1c7505534..eb4901bf129a 100644
--- a/xen/arch/x86/hvm/rtc.c
+++ b/xen/arch/x86/hvm/rtc.c
@@ -240,7 +240,8 @@ static void cf_check rtc_update_timer2(void *opaque)
 s->hw.cmos_data[RTC_REG_C] |= RTC_UF;
 s->hw.cmos_data[RTC_REG_A] &= ~RTC_UIP;
 rtc_update_irq(s);
-check_update_timer(s);
+stop_timer(>update_timer);
+s->use_timer = 0;
 }
 spin_unlock(>lock);
 }

That may indeed be an improvement but I'm not sure it is really related
to this patch?

Thanks,
Ross

Re: [RFC PATCH v3 1/3] xen/memory : Add a stats_table resource type

2024-04-22 Thread Jan Beulich

On 31.10.2023 17:21, Matias Ezequiel Vara Larsen wrote:
> This commit proposes a new mechanism to query the RUNSTATE_running counter for
> a given vcpu from a dom0 userspace application. This commit proposes to expose
> that counter by using the acquire_resource interface. For this purpose, the
> commit adds a new resource named XENMEM_resource_stats_table and a
> type-specific resource named XENMEM_resource_stats_table_id_vcpustats. This
> type-specific resource is aiming at exposing per-VCPU counters.
> 
> The current mechanism relies on the XEN_DOMCTL_getvcpuinfo and holds a single
> global domctl_lock for the entire hypercall; and iterate over every vcpu in 
> the
> system for every update thus impacting operations that share that lock. This
> commit proposes to expose vcpu RUNSTATE_running via the xenforeignmemory
> interface thus preventing to issue the hypercall and holding the lock.

Throughout can you please avoid "this commit" and "proposes" and alike? You
want to plainly describe what you do and why.

> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -1092,6 +1092,27 @@ unsigned int ioreq_server_max_frames(const struct 
> domain *d)
>  return nr;
>  }
>  
> +unsigned int stats_table_max_frames(const struct domain *d, unsigned int id)

Any particular reason this isn't next to its sibling acquire_stats_table()?

> +{
> +unsigned int nr = 0;
> +unsigned int size;
> +
> +switch ( id )
> +{
> +case XENMEM_resource_stats_table_id_vcpustats:
> +size = DIV_ROUND_UP(sizeof(struct vcpu_shmem_stats), 
> SMP_CACHE_BYTES) +
> +   DIV_ROUND_UP(sizeof(struct vcpu_stats), SMP_CACHE_BYTES) * 
> d->max_vcpus;
> +
> +nr = DIV_ROUND_UP(size, PAGE_SIZE);
> +break;
> +
> +default:
> +break;
> +}
> +
> +return nr;
> +}
> +
>  /*
>   * Return 0 on any kind of error.  Caller converts to -EINVAL.
>   *
> @@ -1113,6 +1134,9 @@ static unsigned int resource_max_frames(const struct 
> domain *d,
>  case XENMEM_resource_vmtrace_buf:
>  return d->vmtrace_size >> PAGE_SHIFT;
>  
> +case XENMEM_resource_stats_table:
> +return stats_table_max_frames(d, id);
> +
>  default:
>  return -EOPNOTSUPP;
>  }
> @@ -1176,6 +1200,146 @@ static int acquire_vmtrace_buf(
>  return nr_frames;
>  }
>  
> +void stats_free_vcpu_mfn(struct domain * d)
> +{
> +struct page_info *pg = d->vcpustats_page.pg;
> +void * _va = d->vcpustats_page.va;

What use is the leading underscore here? Also (nit) stray blank after *.

> +unsigned int i;
> +unsigned int size;
> +unsigned int nr_frames;
> +
> +if ( !pg )
> +return;
> +
> +d->vcpustats_page.pg = NULL;
> +d->vcpustats_page.va = NULL;
> +
> +size = DIV_ROUND_UP(sizeof(struct vcpu_shmem_stats), SMP_CACHE_BYTES) +
> +   DIV_ROUND_UP(sizeof(struct vcpu_stats), SMP_CACHE_BYTES)
> +   * d->max_vcpus;
> +
> +nr_frames = DIV_ROUND_UP(size, PAGE_SIZE);
> +
> +vunmap(_va);
> +
> +for ( i = 0; i < nr_frames; i++ )
> +{
> +put_page_alloc_ref([i]);
> +put_page_and_type([i]);
> +}
> +}
> +
> +static int stats_vcpu_alloc_mfn(struct domain *d)

Both functions' names suggest a single MFN is being allocated / freed.

> +{
> +struct page_info *pg;
> +int order;

This can't be negative, can it?

> +unsigned int i;
> +unsigned int size;
> +unsigned int nr_frames;
> +void *_va;
> +mfn_t mfn;
> +struct vcpu_shmem_stats *hd;
> +
> +size = DIV_ROUND_UP(sizeof(struct vcpu_shmem_stats), SMP_CACHE_BYTES) +

Nit: Style is correct here, ...

> +   DIV_ROUND_UP(sizeof(struct vcpu_stats), SMP_CACHE_BYTES)
> +   * d->max_vcpus;

... but then not here (* belongs on the earlier line). Also this is the 3rd
instance of all the same expression, which thus likely wants putting in a
helper function.

> +nr_frames = DIV_ROUND_UP(size, PAGE_SIZE);
> +
> +order = get_order_from_bytes(size);
> +pg = alloc_domheap_pages(d, order, MEMF_no_refcount);
> +
> +if ( !pg )
> +return -ENOMEM;
> +
> +for ( i = 0; i < nr_frames; i++ )
> +{
> +if ( unlikely(!get_page_and_type([i], d, PGT_writable_page)) )
> +/*
> + * The domain can't possibly know about this page yet, so failure
> + * here is a clear indication of something fishy going on.
> + */
> +goto refcnt_err;
> +}
> +
> +mfn = page_to_mfn(pg);
> +_va = vmap(, nr_frames);

This won't work when nr_frames > 1. The first parameter of this function
points to an array of nr_frames elements. I think you mean vmap_contig(),
yet even better would be if you didn't needlessly allocate non-order-0
pages (and perhaps excess space) here.

> +if ( !_va )
> +goto refcnt_err;
> +
> +for ( i = 0; i < nr_frames; i++ )
> +clear_page(_va + i * PAGE_SIZE);
> +
> +/* Initialize vcpu_shmem_stats header */
> +

Re: [PATCH v3] xen/riscv: check whether the assembler has Zbb extension support

2024-04-22 Thread Oleksii

On Mon, 2024-04-22 at 11:43 +0200, Jan Beulich wrote:
> On 19.04.2024 16:23, Oleksii Kurochko wrote:
> > Update the argument of the as-insn for the Zbb case to verify that
> > Zbb is supported not only by a compiler, but also by an assembler.
> > 
> > Also, check-extenstion(ext_name, "insn") helper macro is introduced
> > to check whether extension is supported by a compiler and an
> > assembler.
> > 
> > Signed-off-by: Oleksii Kurochko 
> 
> Acked-by: Jan Beulich 
> despite ...
> 
> > --- a/xen/arch/riscv/arch.mk
> > +++ b/xen/arch/riscv/arch.mk
> > @@ -11,9 +11,14 @@ riscv-march-$(CONFIG_RISCV_ISA_C)   :=
> > $(riscv-march-y)c
> >  
> >  riscv-generic-flags := $(riscv-abi-y) -march=$(riscv-march-y)
> >  
> > -zbb := $(call as-insn,$(CC) $(riscv-generic-flags)_zbb,"",_zbb)
> > -zihintpause := $(call as-insn, \
> > -  $(CC) $(riscv-generic-
> > flags)_zihintpause,"pause",_zihintpause)
> > +# check-extension: Check whether extenstion is supported by a
> > compiler and
> > +#  an assembler.
> > +# Usage: $(call check-extension,extension_name,"instr")
> > +check-extension = $(call as-insn,$(CC) $(riscv-generic-
> > flags)_$(1),$(2),_$(1))
> > +
> > +zbb-insn := "andn t0, t0, t0"
> > +zbb := $(call check-extension,zbb,$(zbb-insn))
> > +zihintpause := $(call check-extension,zihintpause,"pause")
> 
> ... still not really being happy with this: Either, as said before,
> zbb-insn
> would better be avoided (by using $(comma) as necessary), or you
> should have
> gone yet a step further to fully address my "redundancy" concern.
> Note how
> the two ultimate lines still have 3 (zbb) and 2 (zihintpause)
> references
> respectively, when the goal ought to be to have exactly one. E.g.
> along the
> lines of
> 
> $(call check-extension,zbb)
> $(call check-extension,zihintpause)
> 
> suitably using $(eval ...) (to effect the variable definitions) and
> defining
> both zbb-insn and zihintpause-insn.
> 
> But I'll nevertheless put this in as is, unless you tell me you're up
> to
> going the extra step.
I am okay with all your suggestions. So the final version will look
like ( if I understood you correctly ):
diff --git a/xen/arch/riscv/arch.mk b/xen/arch/riscv/arch.mk
index dd242c91d1..f172187144 100644
--- a/xen/arch/riscv/arch.mk
+++ b/xen/arch/riscv/arch.mk
@@ -13,12 +13,19 @@ riscv-generic-flags := $(riscv-abi-y) -
march=$(riscv-march-y)
 
 # check-extension: Check whether extenstion is supported by a compiler
and
 #  an assembler.
-# Usage: $(call check-extension,extension_name,"instr")
-check-extension = $(call as-insn,$(CC) $(riscv-generic-
flags)_$(1),$(2),_$(1))
+# Usage: $(call check-extension,extension_name).
+#it should be defined variable with name: extension-name :=
"insn"
 
-zbb-insn := "andn t0, t0, t0"
-zbb := $(call check-extension,zbb,$(zbb-insn))
-zihintpause := $(call check-extension,zihintpause,"pause")
+define check-extension =
+$(eval $(1) := \
+   $(call as-insn,$(CC) $(riscv-generic-flags)_$(1),$(value $(1)-
insn),_$(1)))
+endef
+
+zbb-insn := "andn t0$(comma)t0$(comma)t0"
+$(call check-extension,zbb)
+
+zihintpause-insn := "pause"
+$(call check-extension,zihintpause)
 
 extensions := $(zbb) $(zihintpause)

If the diff above looks good, I'll sent a new patch version.

Thanks.

~ Oleksii

Re: [RFC,FUTURE 1/3] domctl/pci: add ability to provide/request a virtual SBDF

2024-04-22 Thread Jan Beulich

On 14.12.2023 00:44, Volodymyr Babchuk wrote:
> With CONFIG_HAS_VPCI_GUEST_SUPPORT enabled, hypervisor will assign a
> passed-through PCI device to a guest using virtual/guest SBDF
> number. Right now hypervisor automatically allocates next free
> SBDF. But there are cases mentioned in [1] when user should be able to
> control SBDF assigned to the passed-through device.
> 
> To enable this, extend assign_device domctl call with optional
> parameter virtual_sbdf. When this parameter is set to
> XEN_DOMCTL_DEV_SDBF_ANY, hypervisor will act as previously, but it
> will return allocated vSBDF back to the toolstack. Alternatively,
> toolstack might provide desired vSBDF and hypervisor will try to use
> it, if it is free and falls into permitted range.
> 
> [1] https://lore.kernel.org/all/d6a58e73-da51-40f1-a2f7-576274945...@xen.org/
> 
> Signed-off-by: Volodymyr Babchuk 

It's been a hile since this was sent, so comments below may have been given
by others already. I'm sorry for the redundancy if so.

> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -37,7 +37,7 @@ extern vpci_register_init_t *const __end_vpci_array[];
>  #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
>  
>  #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> -static int add_virtual_device(struct pci_dev *pdev)
> +static int add_virtual_device(struct pci_dev *pdev, pci_sbdf_t *vsbdf)
>  {
>  struct domain *d = pdev->domain;
>  unsigned int new_dev_number;
> @@ -57,13 +57,35 @@ static int add_virtual_device(struct pci_dev *pdev)
>   >sbdf);
>  return -EOPNOTSUPP;
>  }
> -new_dev_number = find_first_zero_bit(d->vpci_dev_assigned_map,
> - VPCI_MAX_VIRT_DEV);
> -if ( new_dev_number == VPCI_MAX_VIRT_DEV )
> -return -ENOSPC;
>  
> -__set_bit(new_dev_number, >vpci_dev_assigned_map);
> +if ( !vsbdf || vsbdf->sbdf == XEN_DOMCTL_DEV_SDBF_ANY )
> +{
> +new_dev_number = find_first_zero_bit(d->vpci_dev_assigned_map,
> + VPCI_MAX_VIRT_DEV);
> +if ( new_dev_number == VPCI_MAX_VIRT_DEV )
> +return -ENOSPC;
>  
> +if ( vsbdf )
> +*vsbdf = PCI_SBDF(0, 0, new_dev_number, 0);
> +}
> +else
> +{
> +if ( vsbdf->seg != 0 || vsbdf->bus != 0 || vsbdf->fn != 0 )
> +{
> +gdprintk(XENLOG_ERR,
> + "vSBDF %pp: segment, bus and function should be 0\n",
> + vsbdf);
> +return -EOPNOTSUPP;
> +}
> +new_dev_number = vsbdf->dev;
> +if ( test_bit(new_dev_number, >vpci_dev_assigned_map) )
> +{
> +gdprintk(XENLOG_ERR, "vSBDF %pp already assigned\n", vsbdf);
> +return -EOPNOTSUPP;
> +}
> +}
> +
> +__set_bit(new_dev_number, >vpci_dev_assigned_map);
>  /*
>   * Both segment and bus number are 0:
>   *  - we emulate a single host bridge for the guest, e.g. segment 0

Please can a blank line remain to live ahead of this comment?

> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -504,7 +504,12 @@ struct xen_domctl_sendtrigger {
>  
>  
>  /* Assign a device to a guest. Sets up IOMMU structures. */
> -/* XEN_DOMCTL_assign_device */
> +/* XEN_DOMCTL_assign_device
> + * when assigning a PCI device, it is possible to either request
> + * or provide a virtual SBDF. When virtual_sbdf equals to
> + * XEN_DOMCTL_DEV_SDBF_ANY, hypervisor will return allocated
> + * vSBDF back.
> + */
>  /*
>   * XEN_DOMCTL_test_assign_device: Pass DOMID_INVALID to find out whether the
>   * given device is assigned to any DomU at all. Pass a specific domain ID to
> @@ -528,6 +533,8 @@ struct xen_domctl_assign_device {
>  union {
>  struct {
>  uint32_t machine_sbdf;   /* machine PCI ID of assigned device */
> +uint32_t virtual_sbdf;   /* IN/OUT virtual SBDF of the device */
> +#define XEN_DOMCTL_DEV_SDBF_ANY 0x /* request a free SBDF */
>  } pci;

Such a struct change needs to come with an interface version bump, I
guess.

> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -33,7 +33,7 @@ typedef int vpci_register_init_t(struct pci_dev *dev);
> __used_section(".data.vpci." p) = x
>  
>  /* Assign vPCI to device by adding handlers to device. */
> -int __must_check vpci_assign_device(struct pci_dev *pdev);
> +int __must_check vpci_assign_device(struct pci_dev *pdev, pci_sbdf_t *vsbdf);
>  
>  /* Remove all handlers and free vpci related structures. */
>  void vpci_deassign_device(struct pci_dev *pdev);
> @@ -265,7 +265,7 @@ bool vpci_ecam_read(pci_sbdf_t sbdf, unsigned int reg, 
> unsigned int len,
>  #else /* !CONFIG_HAS_VPCI */
>  struct vpci_vcpu {};
>  
> -static inline int vpci_assign_device(struct pci_dev *pdev)
> +static inline int vpci_assign_device(struct pci_dev *pdev, pci_sbdf_t *vsbdf)
>  {
>  return 0;

Re: [PATCH] x86/cpuid-policy: Add AMD SVM CPUID leaf to featureset

2024-04-22 Thread Jan Beulich

On 22.04.2024 17:09, Jan Beulich wrote:
> On 17.04.2024 15:22, George Dunlap wrote:
>> --- a/xen/arch/x86/cpu/common.c
>> +++ b/xen/arch/x86/cpu/common.c
>> @@ -477,6 +477,8 @@ static void generic_identify(struct cpuinfo_x86 *c)
>>  c->x86_capability[FEATURESET_e7d] = cpuid_edx(0x8007);
>>  if (c->extended_cpuid_level >= 0x8008)
>>  c->x86_capability[FEATURESET_e8b] = cpuid_ebx(0x8008);
>> +if (c->extended_cpuid_level >= 0x800a)
>> +c->x86_capability[FEATURESET_ead] = cpuid_edx(0x800a);
>>  if (c->extended_cpuid_level >= 0x8021)
>>  c->x86_capability[FEATURESET_e21a] = cpuid_eax(0x8021);
> 
> Aiui this is needed right in this change because of calculate_host_policy()
> deriving from boot_cpu_data.x86_capability. What I'd have expected in
> addition (going forward: instead) is an adjustment to
> x86_cpu_policy_fill_native().

I'm sorry, but no, there should be no need to adjust that function.

Jan

Re: [PATCH v2 1/3] hotplug: Update block-tap

2024-04-22 Thread Anthony PERARD

On Sun, Apr 07, 2024 at 04:49:51PM -0400, Jason Andryuk wrote:
> diff --git a/tools/hotplug/Linux/block-tap b/tools/hotplug/Linux/block-tap
> index 89247921b9..126e472786 100755
> --- a/tools/hotplug/Linux/block-tap
> +++ b/tools/hotplug/Linux/block-tap
> @@ -18,11 +18,11 @@
>  #
>  # Usage:
>  #
> -# Target should be specified using the following syntax:
> +# Disks should be specified using the following syntax:
>  #
> -# script=block-tap,vdev=xvda,target=:
> +# vdev=xvda,backendtype=tap,format=vhd,target=/srv/target.vhd

I still have unanswered question from the previous round:
Is `block-tap` still going to work with the current example given in
the script header? That is:
"script=block-tap,vdev=xvda,target=:"
Or maybe, that example is already broken?

If it's not broken, there could be users which rely on it. But maybe
it's not really broken, and the new syntax is better anyway.

My guess is that using "script=block-tap,..." might still work, but
we should say something in the CHANGELOG to encourage people to move to
the new syntax, with "backendtype=tap" to avoid issues.

In any case, the patch looks good:
Reviewed-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD

Re: [PATCH] public: xen: Define missing guest handle for int32_t

2024-04-22 Thread Julien Grall


Hi Stefano,

On 17/04/2024 19:49, Stefano Stabellini wrote:

On Wed, 17 Apr 2024, Julien Grall wrote:

Hi Michal,

On 17/04/2024 13:14, Michal Orzel wrote:

Commit afab29d0882f ("public: s/int/int32_t") replaced int with int32_t
in XEN_GUEST_HANDLE() in memory.h but there is no guest handle defined
for it. This results in a build failure. Example on Arm:

./include/public/arch-arm.h:205:41: error: unknown type name
‘__guest_handle_64_int32_t’
205 | #define __XEN_GUEST_HANDLE(name)__guest_handle_64_ ## name
| ^~
./include/public/arch-arm.h:206:41: note: in expansion of macro
‘__XEN_GUEST_HANDLE’
206 | #define XEN_GUEST_HANDLE(name)  __XEN_GUEST_HANDLE(name)
| ^~
./include/public/memory.h:277:5: note: in expansion of macro
‘XEN_GUEST_HANDLE’
277 | XEN_GUEST_HANDLE(int32_t) errs;

Fix it. Also, drop guest handle definition for int given no further use.

Fixes: afab29d0882f ("public: s/int/int32_t")
Signed-off-by: Michal Orzel 


Reviewed-by: Stefano Stabellini 



So it turned out that I committed v1 from Stefano. I was meant to commit the
patch at all, but I think I started with a dirty staging :(. Sorry for that.

I have reverted Stefano's commit for now so we can take the correct patch.

Now, from my understanding, Andrew suggested on Matrix that this solution may
actually be a good way to handle GUEST_HANLDEs (they were removed in v2).
Maybe this can be folded in Stefano's patch?


v1 together with Michal's fix is correct. Also v2 alone is correct, or
v2 with Michal's fix is also correct.


I am slightly confused, v2 + Michal's fix means that 
XEN_GUEST_HANDLE(int) is removed and we introduce XEN_GUEST_INT(int32_t) 
with no user. So wouldn't this break the build?




My preference is v2 with Michal's fix, they can be committed as separate
patches. Also the others options are fine.


I am fine if you want to commit them separately. However, I am not sure 
your suggestion about using v2 + Michal's fix is actually correct.


Cheers,

--
Julien Grall

Re: [PATCH] x86/cpuid-policy: Add AMD SVM CPUID leaf to featureset

2024-04-22 Thread Jan Beulich

On 17.04.2024 15:22, George Dunlap wrote:
> Currently, the CPUID leaf for SVM features (extd 0xa.edx) is manually
> twiddled:
> 
>  - hvm_max_policy takes host_policy and clamps it to supported
>features (with some features unilaterally enabled because they're
>always emulated
> 
>  - hvm_default_policy is copied from there
> 
>  - When recalculate_policy() is called for a guest, if SVM is clear,
>then the entire leaf is zeroed out.
> 
> Move to a mode where the extended features are off by default, and
> enabled when nested_virt is enabled.
> 
> In cpufeatureset.h, define a new featureset word for the AMD SVM
> features, and declare all of the bits defined in
> x86/include/asm/hvm/svm/svm.h.  Mark the ones we currently pass
> through to the "max policy" as HAP-only and optional.
> 
> In cpu-policy.h, define FEATURESET_ead, and convert the un-named space
> in struct_cpu_policy into the appropriate union.  FIXME: Do this in a
> prerequisite patch, and change all references to p->extd.raw[0xa].

Just wondering: Did you mean to submit with this FIXME?

> Update x86_cpu_X_to_Y and Y_to_X to copy this into and out of the
> appropriate leaf.
> 
> Populate this during boot in generic_identify().
> 
> Add the new featureset definition into libxl_cpuid.c.
> 
> Update the code in calculate_hvm_max_policy() to do nothing with the
> "normal" CPUID bits, and use the feature bit to unconditionally enable
> VMCBCLEAN. FIXME Move this to a follow-up patch.
> 
> In recalculate_cpuid_policy(), enable max_fs when nested_hvm() is
> true.
> 
> Signed-off-by: George Dunlap 
> ---
> CC: Andrew Cooper 
> CC: Jan Beulich 
> CC: Roger Pau Monne 
> ---
>  tools/libs/light/libxl_cpuid.c  |  1 +
>  xen/arch/x86/cpu-policy.c   | 19 +--
>  xen/arch/x86/cpu/common.c   |  2 ++
>  xen/include/public/arch-x86/cpufeatureset.h | 16 
>  xen/include/xen/lib/x86/cpu-policy.h| 10 +-
>  xen/lib/x86/cpuid.c |  4 +++-
>  6 files changed, 40 insertions(+), 12 deletions(-)

tools/misc/xen-cpuid.c also wants adjusting, I think.

I further think the dependencies (on the SVM feature at the very least)
also want recording in xen/tools/gen-cpuid.py.

> @@ -909,6 +903,14 @@ void recalculate_cpuid_policy(struct domain *d)
>  __clear_bit(X86_FEATURE_VMX, max_fs);
>  __clear_bit(X86_FEATURE_SVM, max_fs);
>  }
> +else
> +{
> +/* 
> + * Enable SVM features.  This will be empty on VMX
> + * hosts. 
> + */
> +fs[FEATURESET_ead] = max_fs[FEATURESET_ead];
> +}
>  }

I'm afraid I don't understand this part: Why would you forcefully enable
everything, no matter what the tool stack set? Considering the if() part
above, wouldn't you want to mark the features non-optional, relying on
them being cleared (via dependencies) when SVM is clear?

> --- a/xen/arch/x86/cpu/common.c
> +++ b/xen/arch/x86/cpu/common.c
> @@ -477,6 +477,8 @@ static void generic_identify(struct cpuinfo_x86 *c)
>   c->x86_capability[FEATURESET_e7d] = cpuid_edx(0x8007);
>   if (c->extended_cpuid_level >= 0x8008)
>   c->x86_capability[FEATURESET_e8b] = cpuid_ebx(0x8008);
> + if (c->extended_cpuid_level >= 0x800a)
> + c->x86_capability[FEATURESET_ead] = cpuid_edx(0x800a);
>   if (c->extended_cpuid_level >= 0x8021)
>   c->x86_capability[FEATURESET_e21a] = cpuid_eax(0x8021);

Aiui this is needed right in this change because of calculate_host_policy()
deriving from boot_cpu_data.x86_capability. What I'd have expected in
addition (going forward: instead) is an adjustment to
x86_cpu_policy_fill_native().

> --- a/xen/include/public/arch-x86/cpufeatureset.h
> +++ b/xen/include/public/arch-x86/cpufeatureset.h
> @@ -357,6 +357,22 @@ XEN_CPUFEATURE(RFDS_CLEAR, 16*32+28) /*!A 
> Register File(s) cleared by VE
>  
>  /* Intel-defined CPU features, MSR_ARCH_CAPS 0x10a.edx, word 17 */
>  
> +/* AMD-defined CPU features, CPUID level 0x800a.edx, word 18 */
> +XEN_CPUFEATURE(NPT, 18*32+ 0) /*h  Nested page table support 
> */
> +XEN_CPUFEATURE(LBRV,18*32+ 1) /*h  LBR virtualization 
> support */
> +XEN_CPUFEATURE(SVML,18*32+ 2) /*   SVM locking MSR support */
> +XEN_CPUFEATURE(NRIPS,   18*32+ 3) /*h  Next RIP save on VMEXIT 
> support */
> +XEN_CPUFEATURE(TSCRATEMSR,  18*32+ 4) /*   TSC ratio MSR support */
> +XEN_CPUFEATURE(VMCBCLEAN,   18*32+ 5) /*h  VMCB clean bits support */
> +XEN_CPUFEATURE(FLUSHBYASID, 18*32+ 6) /*   TLB flush by ASID support 
> */
> +XEN_CPUFEATURE(DECODEASSISTS,   18*32+ 7) /*h  Decode assists support */
> +XEN_CPUFEATURE(PAUSEFILTER, 18*32+10) /*h  Pause intercept filter 
> support */
> +XEN_CPUFEATURE(PAUSETHRESH, 18*32+12) /*   Pause intercept

[ovmf test] 185759: all pass - PUSHED

2024-04-22 Thread osstest service owner

flight 185759 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/185759/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 7dd7b890582b4d696ca5fd436dbc5fb4bc30e385
baseline version:
 ovmf be92e09206c2e4bb388e7c9127f048689841dd01

Last test of basis   185756  2024-04-22 03:19:18 Z0 days
Testing same since   185759  2024-04-22 13:14:40 Z0 days1 attempts


People who touched revisions under test:
  Ard Biesheuvel 
  Gerd Hoffmann 
  Jonathan Cameron 
  Konstantin Kostiuk 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   be92e09206..7dd7b89058  7dd7b890582b4d696ca5fd436dbc5fb4bc30e385 -> 
xen-tested-master

[linux-5.4 test] 185755: regressions - FAIL

2024-04-22 Thread osstest service owner

flight 185755 linux-5.4 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/185755/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-arm64-libvirt  broken  in 185433
 build-arm64-pvopsbroken  in 185433
 build-arm64-xsm  broken  in 185433
 build-arm64-libvirt  5 host-build-prep fail in 185433 REGR. vs. 185168
 build-arm64-pvops5 host-build-prep fail in 185433 REGR. vs. 185168
 build-arm64-xsm  5 host-build-prep fail in 185433 REGR. vs. 185168
 test-arm64-arm64-libvirt-raw 17 guest-start/debian.repeat fail in 185751 REGR. 
vs. 185168

Tests which are failing intermittently (not blocking):
 test-amd64-i386-qemut-rhel6hvm-amd 14 guest-start/redhat.repeat fail in 185433 
pass in 185755
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail in 
185751 pass in 185755
 test-amd64-i386-libvirt-raw  12 debian-di-install  fail pass in 185433
 test-amd64-i386-libvirt-qcow2 12 debian-di-install fail pass in 185433
 test-amd64-i386-xl-vhd   12 debian-di-install  fail pass in 185433
 test-armhf-armhf-xl-multivcpu 14 guest-start   fail pass in 185751
 test-arm64-arm64-libvirt-raw 13 guest-startfail pass in 185751

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-credit1   1 build-check(1)   blocked in 185433 n/a
 test-arm64-arm64-examine  1 build-check(1)   blocked in 185433 n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked in 185433 n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked in 185433 n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked in 185433 n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked in 185433 n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked in 185433 n/a
 test-arm64-arm64-xl-thunderx  1 build-check(1)   blocked in 185433 n/a
 test-arm64-arm64-xl-vhd   1 build-check(1)   blocked in 185433 n/a
 test-armhf-armhf-xl-credit1 18 guest-start/debian.repeat fail blocked in 185168
 test-armhf-armhf-xl-credit2  19 guest-start.2 fail in 185433 blocked in 185168
 test-armhf-armhf-xl-credit1  14 guest-start fail in 185433 like 185168
 test-amd64-i386-libvirt-raw 14 migrate-support-check fail in 185433 never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail in 185433 never 
pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail in 185433 
never pass
 test-amd64-i386-libvirt-qcow2 14 migrate-support-check fail in 185433 never 
pass
 test-armhf-armhf-xl-multivcpu 18 guest-start/debian.repeat fail in 185751 like 
185168
 test-armhf-armhf-xl-rtds 18 guest-start/debian.repeat fail in 185751 like 
185168
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail in 185751 never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail in 185751 never 
pass
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 185168
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 185168
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 185168
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 185168
 test-armhf-armhf-xl-credit2  18 guest-start/debian.repeatfail  like 185168
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 185168
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 185168
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 185168
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 185168
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 185168
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 185168
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check

Re: [PATCH v2 2/5] x86: Refactor microcode_update() hypercall with flags

2024-04-22 Thread Jan Beulich

On 16.04.2024 11:15, Fouad Hilly wrote:
> Refactor microcode_update() hypercall by adding flags field.
> Introduce XENPF_microcode_update2 hypercall to handle flags field.
> struct xenpf_microcode_update updated to have uint32_t flags at
> the end of the sturcture.
> 
> [v2]
> 1- Update message description to highlight interface change.
> 2- Removed extra empty lines.
> 3- removed unnecessary define.
> 4- Corrected long lines.
> 5- Removed ternary operator.
> 6- Introduced static ucode_update_flags, which will be used later to 
> determine local ucode_force_flag.

Non-style comments on v1 have remained un-addressed. Specifically, to
give an example, while 1 says you now highlight the interface change,
the request was to explain why changing an existing structure is okay
(hint: it likely isn't, as the structure size changes for compat [aka
32-bit] callers).

I'm not going to give the same comments again; I'll rather expect you to
respond to them by either adjustments to the patch (or its description),
or by verbal replies.

Jan

Re: [PATCH v2 1/5] x86: Update x86 low level version check of microcode

2024-04-22 Thread Jan Beulich

On 16.04.2024 11:15, Fouad Hilly wrote:
> Update microcode version check at Intel and AMD Level by:
> Preventing the low level code from sending errors if the microcode
> version provided is not a newer version.

And why is this change (a) wanted and (b) correct?

> Other errors will be sent like before.
> When the provided microcode version is the same as the current one, code
> to point to microcode provided.

I'm afraid I can't interpret this sentence.

> Microcode version check happens at higher and common level in core.c.
> Keep all the required code at low level that checks for signature and CPU 
> compatibility
> 
> [v2]
> Update message description to better describe the changes

This belongs ...

> Signed-off-by: Fouad Hilly 
> ---

... below the separator.

> --- a/xen/arch/x86/cpu/microcode/amd.c
> +++ b/xen/arch/x86/cpu/microcode/amd.c
> @@ -383,12 +383,8 @@ static struct microcode_patch *cf_check 
> cpu_request_microcode(
>  goto skip;
>  }
>  
> -/*
> - * If the new ucode covers current CPU, compare ucodes and store 
> the
> - * one with higher revision.
> - */
> -if ( (microcode_fits(mc->patch) != MIS_UCODE) &&
> - (!saved || (compare_header(mc->patch, saved) == NEW_UCODE)) 
> )
> +/* If the provided ucode covers current CPU, then store its 
> revision. */
> +if ( (microcode_fits(mc->patch) != MIS_UCODE) && !saved )
>  {
>  saved = mc->patch;
>  saved_size = mc->len;
> --- a/xen/arch/x86/cpu/microcode/intel.c
> +++ b/xen/arch/x86/cpu/microcode/intel.c
> @@ -294,8 +294,7 @@ static int cf_check apply_microcode(const struct 
> microcode_patch *patch)
>  
>  result = microcode_update_match(patch);
>  
> -if ( result != NEW_UCODE &&
> - !(opt_ucode_allow_same && result == SAME_UCODE) )
> +if ( result != NEW_UCODE && result != SAME_UCODE )
>  return -EINVAL;

Unlike the other two adjustments this one results in still permitting
only same-or-newer. How does this fit with the AMD change above and
the other Intel change ...

> @@ -354,12 +353,8 @@ static struct microcode_patch *cf_check 
> cpu_request_microcode(
>  if ( error )
>  break;
>  
> -/*
> - * If the new update covers current CPU, compare updates and store 
> the
> - * one with higher revision.
> - */
> -if ( (microcode_update_match(mc) != MIS_UCODE) &&
> - (!saved || compare_revisions(saved->rev, mc->rev) == NEW_UCODE) 
> )
> +/* If the provided ucode covers current CPU, then store its 
> revision. */
> +if ( (microcode_update_match(mc) != MIS_UCODE) && !saved )
>  saved = mc;

... here?

Jan

Re: [PATCH v2 2/2] x86/spec: adjust logic to logic that elides lfence

2024-04-22 Thread Jan Beulich

On 22.04.2024 15:35, Roger Pau Monné wrote:
> On Fri, Apr 19, 2024 at 08:25:00AM +0200, Jan Beulich wrote:
>> On 18.04.2024 17:52, Roger Pau Monne wrote:
>>> --- a/xen/arch/x86/include/asm/cpufeature.h
>>> +++ b/xen/arch/x86/include/asm/cpufeature.h
>>> @@ -235,9 +235,6 @@ static inline bool boot_cpu_has(unsigned int feat)
>>>  #define cpu_bug_fpu_ptrsboot_cpu_has(X86_BUG_FPU_PTRS)
>>>  #define cpu_bug_null_segboot_cpu_has(X86_BUG_NULL_SEG)
>>>  
>>> -#define cpu_has_bhb_seq(boot_cpu_has(X86_SPEC_BHB_TSX) ||   \
>>> -boot_cpu_has(X86_SPEC_BHB_LOOPS))
>>
>> Might be worth also mentioning in the description that this construct was
>> lacking use of X86_SPEC_BHB_LOOPS_LONG (might even warrant a 2nd Fixes:
>> tag).
> 
> Heh, no, X86_SPEC_BHB_LOOPS_LONG is added in addition to
> X86_SPEC_BHB_LOOPS.   When using long loops we have both
> X86_SPEC_BHB_LOOPS and X86_SPEC_BHB_LOOPS_LONG set (I know it's
> confusing, I was also confused the first time and asked Andrew the
> same question).  See the fallthrough in bhi_calculations().

Oh, I see.

Andrew: This is a very good example of the separating blank line being
misleading when fall-through is intended.

Jan

Re: [PATCH v2 2/2] x86/spec: adjust logic to logic that elides lfence

2024-04-22 Thread Roger Pau Monné

On Fri, Apr 19, 2024 at 08:25:00AM +0200, Jan Beulich wrote:
> On 18.04.2024 17:52, Roger Pau Monne wrote:
> > It's currently too restrictive by just checking whether there's a BHB 
> > clearing
> > sequence selected.  It should instead check whether BHB clearing is used on
> > entry from PV or HVM specifically.
> > 
> > Switch to use opt_bhb_entry_{pv,hvm} instead, and then remove 
> > cpu_has_bhb_seq
> > since it no longer has any users.
> > 
> > Reported-by: Jan Beulich 
> > Fixes: 954c983abcee ('x86/spec-ctrl: Software BHB-clearing sequences')
> > Signed-off-by: Roger Pau Monné 
> 
> Except for the odd double "logic" in the title:
> Reviewed-by: Jan Beulich 

Thanks, title should be:

"adjust logic that elides lfence"

It was just a typo, I didn't intended to express anything additional.

> I can't really guess what is meant instead, so in order to possibly adjust
> while committing I'll need a hint. But committing will want to wait until
> Andrew has taken a look anyway, just like for patch 1.
> 
> > There (possibly) still a bit of overhead for dom0 if BHB clearing is not 
> > used
> > for dom0, as Xen would still add the lfence if domUs require it.
> 
> Right, but what do you do.
> 
> > --- a/xen/arch/x86/include/asm/cpufeature.h
> > +++ b/xen/arch/x86/include/asm/cpufeature.h
> > @@ -235,9 +235,6 @@ static inline bool boot_cpu_has(unsigned int feat)
> >  #define cpu_bug_fpu_ptrsboot_cpu_has(X86_BUG_FPU_PTRS)
> >  #define cpu_bug_null_segboot_cpu_has(X86_BUG_NULL_SEG)
> >  
> > -#define cpu_has_bhb_seq(boot_cpu_has(X86_SPEC_BHB_TSX) ||   \
> > -boot_cpu_has(X86_SPEC_BHB_LOOPS))
> 
> Might be worth also mentioning in the description that this construct was
> lacking use of X86_SPEC_BHB_LOOPS_LONG (might even warrant a 2nd Fixes:
> tag).

Heh, no, X86_SPEC_BHB_LOOPS_LONG is added in addition to
X86_SPEC_BHB_LOOPS.   When using long loops we have both
X86_SPEC_BHB_LOOPS and X86_SPEC_BHB_LOOPS_LONG set (I know it's
confusing, I was also confused the first time and asked Andrew the
same question).  See the fallthrough in bhi_calculations().

Thanks, Roger.

4.19 Release schedule - Reminder

2024-04-22 Thread Kelly Choi

Hello all,

*Please find below our 4.19 Release Schedule*
*(Original thread from Release Manager
)
*

** Proposed option: Wed Jul 10, 2024 **
(+9 months from Xen 4.18 release)

- Last posting date  Fri Apr 26, 2024

Patches adding new features are expected to be posted to the mailing
list by this date, although perhaps not in their final version.

- Feature freeze Fri May 17, 2024 (+3 weeks from Last
posting date)

Patches adding new features should be committed by this date.
Straightforward bug fixes may continue to be accepted by maintainers.

- Code freezeFri May 31, 2024 (+2 weeks from Feature
freeze)

Bugfixes only.

- Hard code freeze   Fri Jun 21, 2024 (+3 weeks from Code
freeze)

Bugfixes for serious bugs (including regressions), and low-risk fixes
only.

- Final commits  Fri Jul 05, 2024 (+2 weeks from Hard code
freeze)

Branch off staging-4.19.

- ReleaseWed Jul 10, 2024

Many thanks,
Kelly Choi

Community Manager
Xen Project

Re: [PATCH v2 2/2] livepatch: refuse to resolve symbols that belong to init sections

2024-04-22 Thread Jan Beulich

On 22.04.2024 13:34, Roger Pau Monné wrote:
> On Mon, Apr 22, 2024 at 12:57:55PM +0200, Jan Beulich wrote:
>> On 22.04.2024 12:49, Roger Pau Monné wrote:
>>> On Mon, Apr 22, 2024 at 10:25:40AM +0200, Jan Beulich wrote:
 On 22.04.2024 09:54, Roger Pau Monné wrote:
> On Fri, Apr 19, 2024 at 04:34:41PM +0200, Jan Beulich wrote:
>> On 19.04.2024 12:50, Roger Pau Monné wrote:
>>> On Fri, Apr 19, 2024 at 12:15:19PM +0200, Jan Beulich wrote:
 On 19.04.2024 12:02, Roger Pau Monne wrote:
> Livepatch payloads containing symbols that belong to init sections 
> can only
> lead to page faults later on, as by the time the livepatch is loaded 
> init
> sections have already been freed.
>
> Refuse to resolve such symbols and return an error instead.
>
> Note such resolutions are only relevant for symbols that point to 
> undefined
> sections (SHN_UNDEF), as that implies the symbol is not in the 
> current payload
> and hence must either be a Xen or a different livepatch payload 
> symbol.
>
> Do not allow to resolve symbols that point to __init_begin, as that 
> address is
> also unmapped.  On the other hand, __init_end is not unmapped, and 
> hence allow
> resolutions against it.
>
> Signed-off-by: Roger Pau Monné 
> ---
> Changes since v1:
>  - Fix off-by-one in range checking.

 Which means you addressed Andrew's comment while at the same time 
 extending
 the scope of ...

> @@ -310,6 +311,21 @@ int livepatch_elf_resolve_symbols(struct 
> livepatch_elf *elf)
>  break;
>  }
>  }
> +
> +/*
> + * Ensure not an init symbol.  Only applicable to Xen 
> symbols, as
> + * livepatch payloads don't have init sections or 
> equivalent.
> + */
> +else if ( st_value >= (uintptr_t)&__init_begin &&
> +  st_value < (uintptr_t)&__init_end )
> +{
> +printk(XENLOG_ERR LIVEPATCH
> +   "%s: symbol %s is in init section, not 
> resolving\n",
> +   elf->name, elf->sym[i].name);

 ... what I raised concern about (and I had not seen any verbal reply 
 to that,
 I don't think).
>>>
>>> I've extended the commit message to explicitly mention the handling of
>>> bounds for __init_{begin,end} checks.  Let me know if you are not fine
>>> with it (or maybe you expected something else?).
>>
>> Well, you mention the two symbols you care about, but you don't mention
>> at all that these two may alias other symbols which might be legal to
>> reference from a livepatch.
>
> I'm having a hard time understanding why a livepatch would want to
> reference those, or any symbol in the .init sections for that matter.
> __init_{begin,end} are exclusively used to unmap the init region after
> boot.  What's the point in livepatch referencing data that's no
> longer there?  The only application I would think of is to calculate
> some kind of offsets, but that would better be done using other
> symbols instead.
>
> Please bear with me, it would be easier for me to understand if you
> could provide a concrete example.

 The issue is that you do comparison by address, not by name. Let's assume
 that .rodata ends exactly where .init.* starts. Then &__init_begin ==
 &_erodata == &__2M_rodata_end. Access to the latter two symbols wants to
 be permitted; only __init_begin and whatever ends up being the very first
 symbol in .init.* ought to not be referenced.
>>>
>>> Hm, I guess I could add comparison by name additionally, but it looks
>>> fragile to me.
>>
>> It looks fragile, yes. Thing is that you're trying to deal with this when
>> crucial information was lost already. Or wait - isn't the section number
>> still available in ->st_shndx?
> 
> But that's the section number of the to be resolved symbol?  In the
> livepatch payload it would for example appear as:
> 
>    F *UND* .hidden 
> setup_boot_APIC_clock
> 
> With undefined section.
> 
> It's possible I'm not following, is there a way to get the section
> number of the current Xen symbols, and then correlate it with the
> .init section?

Hmm, yes, looks like I was forgetting that livepatch resolves symbols
using kallsyms data, not Xen's ELF symbol table.

Jan

Re: [XEN PATCH v2 5/5] xen/arm: ffa: support notification

2024-04-22 Thread Julien Grall


Hi Jens,

This is not a full review of the code. I will let Bertrand doing it.

On 22/04/2024 08:37, Jens Wiklander wrote:

+void ffa_notif_init(void)
+{
+const struct arm_smccc_1_2_regs arg = {
+.a0 = FFA_FEATURES,
+.a1 = FFA_FEATURE_SCHEDULE_RECV_INTR,
+};
+struct arm_smccc_1_2_regs resp;
+unsigned int irq;
+int ret;
+
+arm_smccc_1_2_smc(, );
+if ( resp.a0 != FFA_SUCCESS_32 )
+return;
+
+irq = resp.a2;
+if ( irq >= NR_GIC_SGI )
+irq_set_type(irq, IRQ_TYPE_EDGE_RISING);
+ret = request_irq(irq, 0, notif_irq_handler, "FF-A notif", NULL);


If I am not mistaken, ffa_notif_init() is only called once on the boot 
CPU. However, request_irq() needs to be called on every CPU so the 
callback is registered every where and the interrupt enabled.


I know the name of the function is rather confusing. So can you confirm 
this is what you expected?


[...]


diff --git a/xen/arch/arm/tee/ffa_private.h b/xen/arch/arm/tee/ffa_private.h
index 98236cbf14a3..ef8ffd4526bd 100644
--- a/xen/arch/arm/tee/ffa_private.h
+++ b/xen/arch/arm/tee/ffa_private.h
@@ -25,6 +25,7 @@
  #define FFA_RET_DENIED  -6
  #define FFA_RET_RETRY   -7
  #define FFA_RET_ABORTED -8
+#define FFA_RET_NO_DATA -9
  
  /* FFA_VERSION helpers */

  #define FFA_VERSION_MAJOR_SHIFT 16U
@@ -97,6 +98,18 @@
   */
  #define FFA_MAX_SHM_COUNT   32
  
+/*

+ * TODO How to manage the available SGIs? SGI 8-15 seem to be entirely
+ * unused, but that may change.


Are the value below intended for the guests? If so, can they be moved in 
public/arch-arm.h along with the others guest interrupts?



+ *
+ * SGI is the preferred delivery mechanism. SGIs 8-15 are normally not used
+ * by a guest as they in a non-virtualized system typically are assigned to
+ * the secure world. Here we're free to use SGI 8-15 since they are virtual
+ * and have nothing to do with the secure world.


Do you have a pointer to the specification?

[...]

Cheers,

--
Julien Grall

Re: [PATCH v2 2/2] livepatch: refuse to resolve symbols that belong to init sections

2024-04-22 Thread Roger Pau Monné

On Mon, Apr 22, 2024 at 12:57:55PM +0200, Jan Beulich wrote:
> On 22.04.2024 12:49, Roger Pau Monné wrote:
> > On Mon, Apr 22, 2024 at 10:25:40AM +0200, Jan Beulich wrote:
> >> On 22.04.2024 09:54, Roger Pau Monné wrote:
> >>> On Fri, Apr 19, 2024 at 04:34:41PM +0200, Jan Beulich wrote:
>  On 19.04.2024 12:50, Roger Pau Monné wrote:
> > On Fri, Apr 19, 2024 at 12:15:19PM +0200, Jan Beulich wrote:
> >> On 19.04.2024 12:02, Roger Pau Monne wrote:
> >>> Livepatch payloads containing symbols that belong to init sections 
> >>> can only
> >>> lead to page faults later on, as by the time the livepatch is loaded 
> >>> init
> >>> sections have already been freed.
> >>>
> >>> Refuse to resolve such symbols and return an error instead.
> >>>
> >>> Note such resolutions are only relevant for symbols that point to 
> >>> undefined
> >>> sections (SHN_UNDEF), as that implies the symbol is not in the 
> >>> current payload
> >>> and hence must either be a Xen or a different livepatch payload 
> >>> symbol.
> >>>
> >>> Do not allow to resolve symbols that point to __init_begin, as that 
> >>> address is
> >>> also unmapped.  On the other hand, __init_end is not unmapped, and 
> >>> hence allow
> >>> resolutions against it.
> >>>
> >>> Signed-off-by: Roger Pau Monné 
> >>> ---
> >>> Changes since v1:
> >>>  - Fix off-by-one in range checking.
> >>
> >> Which means you addressed Andrew's comment while at the same time 
> >> extending
> >> the scope of ...
> >>
> >>> @@ -310,6 +311,21 @@ int livepatch_elf_resolve_symbols(struct 
> >>> livepatch_elf *elf)
> >>>  break;
> >>>  }
> >>>  }
> >>> +
> >>> +/*
> >>> + * Ensure not an init symbol.  Only applicable to Xen 
> >>> symbols, as
> >>> + * livepatch payloads don't have init sections or 
> >>> equivalent.
> >>> + */
> >>> +else if ( st_value >= (uintptr_t)&__init_begin &&
> >>> +  st_value < (uintptr_t)&__init_end )
> >>> +{
> >>> +printk(XENLOG_ERR LIVEPATCH
> >>> +   "%s: symbol %s is in init section, not 
> >>> resolving\n",
> >>> +   elf->name, elf->sym[i].name);
> >>
> >> ... what I raised concern about (and I had not seen any verbal reply 
> >> to that,
> >> I don't think).
> >
> > I've extended the commit message to explicitly mention the handling of
> > bounds for __init_{begin,end} checks.  Let me know if you are not fine
> > with it (or maybe you expected something else?).
> 
>  Well, you mention the two symbols you care about, but you don't mention
>  at all that these two may alias other symbols which might be legal to
>  reference from a livepatch.
> >>>
> >>> I'm having a hard time understanding why a livepatch would want to
> >>> reference those, or any symbol in the .init sections for that matter.
> >>> __init_{begin,end} are exclusively used to unmap the init region after
> >>> boot.  What's the point in livepatch referencing data that's no
> >>> longer there?  The only application I would think of is to calculate
> >>> some kind of offsets, but that would better be done using other
> >>> symbols instead.
> >>>
> >>> Please bear with me, it would be easier for me to understand if you
> >>> could provide a concrete example.
> >>
> >> The issue is that you do comparison by address, not by name. Let's assume
> >> that .rodata ends exactly where .init.* starts. Then &__init_begin ==
> >> &_erodata == &__2M_rodata_end. Access to the latter two symbols wants to
> >> be permitted; only __init_begin and whatever ends up being the very first
> >> symbol in .init.* ought to not be referenced.
> > 
> > Hm, I guess I could add comparison by name additionally, but it looks
> > fragile to me.
> 
> It looks fragile, yes. Thing is that you're trying to deal with this when
> crucial information was lost already. Or wait - isn't the section number
> still available in ->st_shndx?

But that's the section number of the to be resolved symbol?  In the
livepatch payload it would for example appear as:

   F *UND*   .hidden setup_boot_APIC_clock

With undefined section.

It's possible I'm not following, is there a way to get the section
number of the current Xen symbols, and then correlate it with the
.init section?

Overall, I think it's unlikely for a livepatch to care about the
section start/end markers, albeit it would be good if we could make
this less ambiguous.

> 
> > So when st_value == __init_begin check if the name matches
> > "__init_begin" or "__2M_init_start" or "_sinittext", and fail
> > resolution, otherwise allow it?
> 
> Kind of, but that's not enough. For one, as said, the first

[PATCH v3.2 12/12] xen/arm: List static shared memory regions as /memory nodes

2024-04-22 Thread Luca Fancellu

Currently Xen is not exporting the static shared memory regions
to the device tree as /memory node, this commit is fixing this
issue.

Given that now make_memory_node needs a parameter 'struct kernel_info'
in order to call the new function shm_mem_node_fill_reg_range,
take the occasion to remove the unused struct domain parameter.

Signed-off-by: Luca Fancellu 
Reviewed-by: Michal Orzel 
---
v3.2:
 - changed u64 to uint64_t and used %u as format specifier for var i
   in make_memory_node, changed u64 to paddr_t in
   shm_mem_node_fill_reg_range, add Michal R-by
v3:
 - removed previous patch that was merging the intervals, rebase
   changes.
v2:
 - try to use make_memory_node, don't add overlapping ranges of
   memory, commit message changed.
v1:
 - new patch
---
---
 xen/arch/arm/dom0less-build.c   |  2 +-
 xen/arch/arm/domain_build.c | 34 +
 xen/arch/arm/include/asm/domain_build.h |  2 +-
 xen/arch/arm/include/asm/static-shmem.h | 15 +++
 xen/arch/arm/static-shmem.c | 23 +
 5 files changed, 63 insertions(+), 13 deletions(-)

diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index 51cf03221d56..74f053c242f4 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -642,7 +642,7 @@ static int __init prepare_dtb_domU(struct domain *d, struct 
kernel_info *kinfo)
 if ( ret )
 goto err;
 
-ret = make_memory_node(d, kinfo->fdt, addrcells, sizecells,
+ret = make_memory_node(kinfo, addrcells, sizecells,
kernel_info_get_mem(kinfo));
 if ( ret )
 goto err;
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 68532ddc084c..0784e4c5e315 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -756,15 +756,14 @@ int __init domain_fdt_begin_node(void *fdt, const char 
*name, uint64_t unit)
 return fdt_begin_node(fdt, buf);
 }
 
-int __init make_memory_node(const struct domain *d,
-void *fdt,
-int addrcells, int sizecells,
-const struct membanks *mem)
+int __init make_memory_node(const struct kernel_info *kinfo, int addrcells,
+int sizecells, const struct membanks *mem)
 {
+void *fdt = kinfo->fdt;
 unsigned int i;
 int res, reg_size = addrcells + sizecells;
 int nr_cells = 0;
-__be32 reg[NR_MEM_BANKS * 4 /* Worst case addrcells + sizecells */];
+__be32 reg[DT_MEM_NODE_REG_RANGE_SIZE];
 __be32 *cells;
 
 if ( mem->nr_banks == 0 )
@@ -797,14 +796,28 @@ int __init make_memory_node(const struct domain *d,
 if ( mem->bank[i].type == MEMBANK_STATIC_DOMAIN )
 continue;
 
-dt_dprintk("  Bank %d: %#"PRIx64"->%#"PRIx64"\n",
-   i, start, start + size);
-
 nr_cells += reg_size;
 BUG_ON(nr_cells >= ARRAY_SIZE(reg));
 dt_child_set_range(, addrcells, sizecells, start, size);
 }
 
+/*
+ * static shared memory banks need to be listed as /memory node, so when
+ * this function is handling the normal memory, add the banks.
+ */
+if ( mem == kernel_info_get_mem(kinfo) )
+shm_mem_node_fill_reg_range(kinfo, reg, _cells, addrcells,
+sizecells);
+
+for ( cells = reg, i = 0; cells < reg + nr_cells; i++, cells += reg_size )
+{
+uint64_t start = dt_read_number(cells, addrcells);
+uint64_t size = dt_read_number(cells + addrcells, sizecells);
+
+dt_dprintk("  Bank %u: %#"PRIx64"->%#"PRIx64"\n",
+   i, start, start + size);
+}
+
 dt_dprintk("(reg size %d, nr cells %d)\n", reg_size, nr_cells);
 
 res = fdt_property(fdt, "reg", reg, nr_cells * sizeof(*reg));
@@ -1783,7 +1796,7 @@ static int __init handle_node(struct domain *d, struct 
kernel_info *kinfo,
 if ( res )
 return res;
 
-res = make_memory_node(d, kinfo->fdt, addrcells, sizecells,
+res = make_memory_node(kinfo, addrcells, sizecells,
kernel_info_get_mem(kinfo));
 if ( res )
 return res;
@@ -1794,8 +1807,7 @@ static int __init handle_node(struct domain *d, struct 
kernel_info *kinfo,
  */
 if ( reserved_mem->nr_banks > 0 )
 {
-res = make_memory_node(d, kinfo->fdt, addrcells, sizecells,
-   reserved_mem);
+res = make_memory_node(kinfo, addrcells, sizecells, reserved_mem);
 if ( res )
 return res;
 }
diff --git a/xen/arch/arm/include/asm/domain_build.h 
b/xen/arch/arm/include/asm/domain_build.h
index 026d975da28e..45936212ca21 100644
--- a/xen/arch/arm/include/asm/domain_build.h
+++ b/xen/arch/arm/include/asm/domain_build.h
@@ -14,7 +14,7 @@ int make_chosen_node(const struct kernel_info *kinfo);
 int make_cpus_node(const

Re: [PATCH v2 2/2] livepatch: refuse to resolve symbols that belong to init sections

2024-04-22 Thread Jan Beulich

On 22.04.2024 12:49, Roger Pau Monné wrote:
> On Mon, Apr 22, 2024 at 10:25:40AM +0200, Jan Beulich wrote:
>> On 22.04.2024 09:54, Roger Pau Monné wrote:
>>> On Fri, Apr 19, 2024 at 04:34:41PM +0200, Jan Beulich wrote:
 On 19.04.2024 12:50, Roger Pau Monné wrote:
> On Fri, Apr 19, 2024 at 12:15:19PM +0200, Jan Beulich wrote:
>> On 19.04.2024 12:02, Roger Pau Monne wrote:
>>> Livepatch payloads containing symbols that belong to init sections can 
>>> only
>>> lead to page faults later on, as by the time the livepatch is loaded 
>>> init
>>> sections have already been freed.
>>>
>>> Refuse to resolve such symbols and return an error instead.
>>>
>>> Note such resolutions are only relevant for symbols that point to 
>>> undefined
>>> sections (SHN_UNDEF), as that implies the symbol is not in the current 
>>> payload
>>> and hence must either be a Xen or a different livepatch payload symbol.
>>>
>>> Do not allow to resolve symbols that point to __init_begin, as that 
>>> address is
>>> also unmapped.  On the other hand, __init_end is not unmapped, and 
>>> hence allow
>>> resolutions against it.
>>>
>>> Signed-off-by: Roger Pau Monné 
>>> ---
>>> Changes since v1:
>>>  - Fix off-by-one in range checking.
>>
>> Which means you addressed Andrew's comment while at the same time 
>> extending
>> the scope of ...
>>
>>> @@ -310,6 +311,21 @@ int livepatch_elf_resolve_symbols(struct 
>>> livepatch_elf *elf)
>>>  break;
>>>  }
>>>  }
>>> +
>>> +/*
>>> + * Ensure not an init symbol.  Only applicable to Xen 
>>> symbols, as
>>> + * livepatch payloads don't have init sections or 
>>> equivalent.
>>> + */
>>> +else if ( st_value >= (uintptr_t)&__init_begin &&
>>> +  st_value < (uintptr_t)&__init_end )
>>> +{
>>> +printk(XENLOG_ERR LIVEPATCH
>>> +   "%s: symbol %s is in init section, not 
>>> resolving\n",
>>> +   elf->name, elf->sym[i].name);
>>
>> ... what I raised concern about (and I had not seen any verbal reply to 
>> that,
>> I don't think).
>
> I've extended the commit message to explicitly mention the handling of
> bounds for __init_{begin,end} checks.  Let me know if you are not fine
> with it (or maybe you expected something else?).

 Well, you mention the two symbols you care about, but you don't mention
 at all that these two may alias other symbols which might be legal to
 reference from a livepatch.
>>>
>>> I'm having a hard time understanding why a livepatch would want to
>>> reference those, or any symbol in the .init sections for that matter.
>>> __init_{begin,end} are exclusively used to unmap the init region after
>>> boot.  What's the point in livepatch referencing data that's no
>>> longer there?  The only application I would think of is to calculate
>>> some kind of offsets, but that would better be done using other
>>> symbols instead.
>>>
>>> Please bear with me, it would be easier for me to understand if you
>>> could provide a concrete example.
>>
>> The issue is that you do comparison by address, not by name. Let's assume
>> that .rodata ends exactly where .init.* starts. Then &__init_begin ==
>> &_erodata == &__2M_rodata_end. Access to the latter two symbols wants to
>> be permitted; only __init_begin and whatever ends up being the very first
>> symbol in .init.* ought to not be referenced.
> 
> Hm, I guess I could add comparison by name additionally, but it looks
> fragile to me.

It looks fragile, yes. Thing is that you're trying to deal with this when
crucial information was lost already. Or wait - isn't the section number
still available in ->st_shndx?

> So when st_value == __init_begin check if the name matches
> "__init_begin" or "__2M_init_start" or "_sinittext", and fail
> resolution, otherwise allow it?

Kind of, but that's not enough. For one, as said, the first symbol in
the first .init.* section would also have this same address, and would
also want rejecting. And then the same would be needed for __init_end.

>> Worse (but contrived) would be cases of "constructed" symbols derived from
>> ones ahead of __init_begin, with an offset large enough to actually point
>> into .init.*. Such are _still_ valid to be used in calculations, as long
>> as no deref occurs for anything at or past __init_begin.
> 
> But one would have to craft such a symbol specifically, at which point
> I consider this out of the scope of what this patch is attempting to
> protect against.  The aim is not to prevent malicious livepatches, and
> there are far easier ways to trigger a page-fault from a livepatch.

I understand the latter is the case, but I'm afraid I then

Re: [XEN PATCH v2 4/5] xen/arm: allow dynamically assigned SGI handlers

2024-04-22 Thread Julien Grall


Hi Jens,

On 22/04/2024 08:37, Jens Wiklander wrote:

Updates so request_irq() can be used with a dynamically assigned SGI irq
as input. This prepares for a later patch where an FF-A schedule
receiver interrupt handler is installed for an SGI generated by the
secure world.


I would like to understand the use-case a bit more. Who is responsible 
to decide the SGI number? Is it Xen or the firmware?


If the later, how can we ever guarantee the ID is not going to clash 
with what the OS/hypervisor is using? Is it described in a 
specification? If so, please give a pointer.




gic_route_irq_to_xen() don't gic_set_irq_type() for SGIs since they are
always edge triggered.

gic_interrupt() is updated to route the dynamically assigned SGIs to
do_IRQ() instead of do_sgi(). The latter still handles the statically
assigned SGI handlers like for instance GIC_SGI_CALL_FUNCTION.

Signed-off-by: Jens Wiklander 
---
v1->v2
- Update patch description as requested
---
  xen/arch/arm/gic.c | 5 +++--
  xen/arch/arm/irq.c | 7 +--


I am not sure where to write the comment. But I think the comment on top 
of irq_set_affinity() in setup_irq() should also be updated.



  2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 44c40e86defe..e9aeb7138455 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -117,7 +117,8 @@ void gic_route_irq_to_xen(struct irq_desc *desc, unsigned 
int priority)
  
  desc->handler = gic_hw_ops->gic_host_irq_type;
  
-gic_set_irq_type(desc, desc->arch.type);

+if ( desc->irq >= NR_GIC_SGI)
+gic_set_irq_type(desc, desc->arch.type);


So above, you say that the SGIs are always edge-triggered interrupt. So 
I assume desc->arch.type. So are you skipping the call because it is 
unnessary or it could do the wrong thing?


Ideally, the outcome of the answer be part of the comment on top of the 
check.



  gic_set_irq_priority(desc, priority);
  }
  
@@ -375,7 +376,7 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)

  /* Reading IRQ will ACK it */
  irq = gic_hw_ops->read_irq();
  
-if ( likely(irq >= 16 && irq < 1020) )

+if ( likely(irq >= GIC_SGI_MAX && irq < 1020) )


This check is now rather confusing as one could think that do_IRQ() 
would still not be reached for dynamic SGI. I think it would be clearer 
GIC_SGI_MAX needs to be renamed to GIC_SGI_STATIC_MAX and do_sgi() to 
do_static_sgi().



  {
  isb();
  do_IRQ(regs, irq, is_fiq);
diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index bcce80a4d624..fdb214560978 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -224,9 +224,12 @@ void do_IRQ(struct cpu_user_regs *regs, unsigned int irq, 
int is_fiq)
  
  perfc_incr(irqs);
  
-ASSERT(irq >= 16); /* SGIs do not come down this path */

+/* Statically assigned SGIs do not come down this path */
+ASSERT(irq >= GIC_SGI_MAX);



With this change, I think the path with vgic_inject_irq() now needs to 
gain an ASSERT(irq >= NR_GIC_SGI) because the path is not supposed to be 
taken for SGIs.


  
-if ( irq < 32 )

+if ( irq < NR_GIC_SGI )
+perfc_incr(ipis);
+else if ( irq < NR_GIC_LOCAL_IRQS )
  perfc_incr(ppis);
  else
  perfc_incr(spis);


Cheers,

--
Julien Grall

Re: [PATCH v2 2/2] livepatch: refuse to resolve symbols that belong to init sections

2024-04-22 Thread Roger Pau Monné

On Mon, Apr 22, 2024 at 10:25:40AM +0200, Jan Beulich wrote:
> On 22.04.2024 09:54, Roger Pau Monné wrote:
> > On Fri, Apr 19, 2024 at 04:34:41PM +0200, Jan Beulich wrote:
> >> On 19.04.2024 12:50, Roger Pau Monné wrote:
> >>> On Fri, Apr 19, 2024 at 12:15:19PM +0200, Jan Beulich wrote:
>  On 19.04.2024 12:02, Roger Pau Monne wrote:
> > Livepatch payloads containing symbols that belong to init sections can 
> > only
> > lead to page faults later on, as by the time the livepatch is loaded 
> > init
> > sections have already been freed.
> >
> > Refuse to resolve such symbols and return an error instead.
> >
> > Note such resolutions are only relevant for symbols that point to 
> > undefined
> > sections (SHN_UNDEF), as that implies the symbol is not in the current 
> > payload
> > and hence must either be a Xen or a different livepatch payload symbol.
> >
> > Do not allow to resolve symbols that point to __init_begin, as that 
> > address is
> > also unmapped.  On the other hand, __init_end is not unmapped, and 
> > hence allow
> > resolutions against it.
> >
> > Signed-off-by: Roger Pau Monné 
> > ---
> > Changes since v1:
> >  - Fix off-by-one in range checking.
> 
>  Which means you addressed Andrew's comment while at the same time 
>  extending
>  the scope of ...
> 
> > @@ -310,6 +311,21 @@ int livepatch_elf_resolve_symbols(struct 
> > livepatch_elf *elf)
> >  break;
> >  }
> >  }
> > +
> > +/*
> > + * Ensure not an init symbol.  Only applicable to Xen 
> > symbols, as
> > + * livepatch payloads don't have init sections or 
> > equivalent.
> > + */
> > +else if ( st_value >= (uintptr_t)&__init_begin &&
> > +  st_value < (uintptr_t)&__init_end )
> > +{
> > +printk(XENLOG_ERR LIVEPATCH
> > +   "%s: symbol %s is in init section, not 
> > resolving\n",
> > +   elf->name, elf->sym[i].name);
> 
>  ... what I raised concern about (and I had not seen any verbal reply to 
>  that,
>  I don't think).
> >>>
> >>> I've extended the commit message to explicitly mention the handling of
> >>> bounds for __init_{begin,end} checks.  Let me know if you are not fine
> >>> with it (or maybe you expected something else?).
> >>
> >> Well, you mention the two symbols you care about, but you don't mention
> >> at all that these two may alias other symbols which might be legal to
> >> reference from a livepatch.
> > 
> > I'm having a hard time understanding why a livepatch would want to
> > reference those, or any symbol in the .init sections for that matter.
> > __init_{begin,end} are exclusively used to unmap the init region after
> > boot.  What's the point in livepatch referencing data that's no
> > longer there?  The only application I would think of is to calculate
> > some kind of offsets, but that would better be done using other
> > symbols instead.
> > 
> > Please bear with me, it would be easier for me to understand if you
> > could provide a concrete example.
> 
> The issue is that you do comparison by address, not by name. Let's assume
> that .rodata ends exactly where .init.* starts. Then &__init_begin ==
> &_erodata == &__2M_rodata_end. Access to the latter two symbols wants to
> be permitted; only __init_begin and whatever ends up being the very first
> symbol in .init.* ought to not be referenced.

Hm, I guess I could add comparison by name additionally, but it looks
fragile to me.

So when st_value == __init_begin check if the name matches
"__init_begin" or "__2M_init_start" or "_sinittext", and fail
resolution, otherwise allow it?

> Worse (but contrived) would be cases of "constructed" symbols derived from
> ones ahead of __init_begin, with an offset large enough to actually point
> into .init.*. Such are _still_ valid to be used in calculations, as long
> as no deref occurs for anything at or past __init_begin.

But one would have to craft such a symbol specifically, at which point
I consider this out of the scope of what this patch is attempting to
protect against.  The aim is not to prevent malicious livepatches, and
there are far easier ways to trigger a page-fault from a livepatch.

Thanks, Roger.

Re: [PATCH v3 12/12] xen/arm: List static shared memory regions as /memory nodes

2024-04-22 Thread Julien Grall


Hi Luca,

On 22/04/2024 11:39, Luca Fancellu wrote:




On 22 Apr 2024, at 11:24, Julien Grall  wrote:

Hi,

On 22/04/2024 10:26, Michal Orzel wrote:

On 22/04/2024 10:07, Luca Fancellu wrote:



Hi Michal,


+for ( cells = reg, i = 0; cells < reg + nr_cells; i++, cells += reg_size )
+{
+u64 start = dt_read_number(cells, addrcells);

We should no longer use Linux derived types like u64. Use uint64_t.


+u64 size = dt_read_number(cells + addrcells, sizecells);
+
+dt_dprintk("  Bank %d: %#"PRIx64"->%#"PRIx64"\n",
+   i, start, start + size);

i is unsigned so the correct format specifier should be %u


Right, should have been more careful when copying the code from above



+void __init shm_mem_node_fill_reg_range(const struct kernel_info *kinfo,
+__be32 *reg, int *nr_cells,
+int addrcells, int sizecells)
+{
+const struct membanks *mem = >shm_mem.common;
+unsigned int i;
+__be32 *cells;
+
+BUG_ON(!nr_cells || !reg);
+
+cells = [*nr_cells];
+for ( i = 0; i < mem->nr_banks; i++ )
+{
+u64 start = mem->bank[i].start;

ditto


Will fix, here paddr_t should be ok isn’t it?

yes




Rest LGTM:
Reviewed-by: Michal Orzel 


Thanks, I will send the next one shortly.

I don't think there is a need to respin the whole series just for these fixes.
You should wait for the committers opinion.


AFAICT, there are multiple changes requested in various line. So I would rather 
prefer if this is respinned.

If this is the only patch that requires to change. You could send a new one in 
reply-to this patch. I think b4 is clever enough to pick up the new version in 
that case.


All the other patches are already reviewed by Michal, if you agree with his 
review it’s ok for me to respin only this one


I didn't review the patch in details. But I agree with his comments on it.

Cheers,

--
Julien Grall

Re: [PATCH v3 12/12] xen/arm: List static shared memory regions as /memory nodes

2024-04-22 Thread Luca Fancellu



> On 22 Apr 2024, at 11:24, Julien Grall  wrote:
> 
> Hi,
> 
> On 22/04/2024 10:26, Michal Orzel wrote:
>> On 22/04/2024 10:07, Luca Fancellu wrote:
>>> 
>>> 
>>> Hi Michal,
>>> 
> +for ( cells = reg, i = 0; cells < reg + nr_cells; i++, cells += 
> reg_size )
> +{
> +u64 start = dt_read_number(cells, addrcells);
 We should no longer use Linux derived types like u64. Use uint64_t.
 
> +u64 size = dt_read_number(cells + addrcells, sizecells);
> +
> +dt_dprintk("  Bank %d: %#"PRIx64"->%#"PRIx64"\n",
> +   i, start, start + size);
 i is unsigned so the correct format specifier should be %u
>>> 
>>> Right, should have been more careful when copying the code from above
>>> 
> 
> +void __init shm_mem_node_fill_reg_range(const struct kernel_info *kinfo,
> +__be32 *reg, int *nr_cells,
> +int addrcells, int sizecells)
> +{
> +const struct membanks *mem = >shm_mem.common;
> +unsigned int i;
> +__be32 *cells;
> +
> +BUG_ON(!nr_cells || !reg);
> +
> +cells = [*nr_cells];
> +for ( i = 0; i < mem->nr_banks; i++ )
> +{
> +u64 start = mem->bank[i].start;
 ditto
>>> 
>>> Will fix, here paddr_t should be ok isn’t it?
>> yes
>>> 
 
 Rest LGTM:
 Reviewed-by: Michal Orzel 
>>> 
>>> Thanks, I will send the next one shortly.
>> I don't think there is a need to respin the whole series just for these 
>> fixes.
>> You should wait for the committers opinion.
> 
> AFAICT, there are multiple changes requested in various line. So I would 
> rather prefer if this is respinned.
> 
> If this is the only patch that requires to change. You could send a new one 
> in reply-to this patch. I think b4 is clever enough to pick up the new 
> version in that case.

All the other patches are already reviewed by Michal, if you agree with his 
review it’s ok for me to respin only this one

Cheers,
Luca

> 
> Cheers,
> 
> -- 
> Julien Grall

Re: [PATCH v2 2/2] x86/pvh: zero VGA information

2024-04-22 Thread Jan Beulich

On 22.04.2024 11:52, Roger Pau Monne wrote:
> PVH guests skip real mode VGA detection, and never have a VGA available, hence
> the default VGA selection is not applicable, and at worse can cause confusion
> when parsing Xen boot log.
> 
> Zero the boot_vid_info structure when Xen is booted from the PVH entry point.
> 
> This fixes Xen incorrectly reporting:
> 
> (XEN) Video information:
> (XEN)  VGA is text mode 80x25, font 8x16
> 
> When booted as a PVH guest.
> 
> Reported-by: Andrew Cooper 
> Signed-off-by: Roger Pau Monné 

Reviewed-by: Jan Beulich

Re: [PATCH v3 12/12] xen/arm: List static shared memory regions as /memory nodes

2024-04-22 Thread Julien Grall


Hi,

On 22/04/2024 10:26, Michal Orzel wrote:



On 22/04/2024 10:07, Luca Fancellu wrote:



Hi Michal,


+for ( cells = reg, i = 0; cells < reg + nr_cells; i++, cells += reg_size )
+{
+u64 start = dt_read_number(cells, addrcells);

We should no longer use Linux derived types like u64. Use uint64_t.


+u64 size = dt_read_number(cells + addrcells, sizecells);
+
+dt_dprintk("  Bank %d: %#"PRIx64"->%#"PRIx64"\n",
+   i, start, start + size);

i is unsigned so the correct format specifier should be %u


Right, should have been more careful when copying the code from above



+void __init shm_mem_node_fill_reg_range(const struct kernel_info *kinfo,
+__be32 *reg, int *nr_cells,
+int addrcells, int sizecells)
+{
+const struct membanks *mem = >shm_mem.common;
+unsigned int i;
+__be32 *cells;
+
+BUG_ON(!nr_cells || !reg);
+
+cells = [*nr_cells];
+for ( i = 0; i < mem->nr_banks; i++ )
+{
+u64 start = mem->bank[i].start;

ditto


Will fix, here paddr_t should be ok isn’t it?

yes





Rest LGTM:
Reviewed-by: Michal Orzel 


Thanks, I will send the next one shortly.

I don't think there is a need to respin the whole series just for these fixes.
You should wait for the committers opinion.


AFAICT, there are multiple changes requested in various line. So I would 
rather prefer if this is respinned.


If this is the only patch that requires to change. You could send a new 
one in reply-to this patch. I think b4 is clever enough to pick up the 
new version in that case.


Cheers,

--
Julien Grall

[PATCH v2 1/2] x86/video: add boot_video_info offset generation to asm-offsets

2024-04-22 Thread Roger Pau Monne

Currently the offsets into the boot_video_info struct are manually encoded in
video.S, which is fragile.  Generate them in asm-offsets.c and switch the
current code to use those instead.

No functional change intended.

Signed-off-by: Roger Pau Monné 
Reviewed-by: Andrew Cooper 
---
Changes since v1:
 - Style changes.
 - Calculate remaining BVI size without referencing specific fields.
---
 xen/arch/x86/boot/video.S | 83 ---
 xen/arch/x86/x86_64/asm-offsets.c | 26 ++
 2 files changed, 58 insertions(+), 51 deletions(-)

diff --git a/xen/arch/x86/boot/video.S b/xen/arch/x86/boot/video.S
index 0ae04f270f8c..f78906878a16 100644
--- a/xen/arch/x86/boot/video.S
+++ b/xen/arch/x86/boot/video.S
@@ -26,32 +26,13 @@
 /* Force 400 scan lines for standard modes (hack to fix bad BIOS behaviour */
 #undef CONFIG_VIDEO_400_HACK
 
-/* Positions of various video parameters passed to the kernel */
-/* (see also include/linux/tty.h) */
-#define PARAM_CURSOR_POS0x00
-#define PARAM_VIDEO_MODE0x02
-#define PARAM_VIDEO_COLS0x03
-#define PARAM_VIDEO_LINES   0x04
-#define PARAM_HAVE_VGA  0x05
-#define PARAM_FONT_POINTS   0x06
-#define PARAM_CAPABILITIES  0x08
-#define PARAM_LFB_LINELENGTH0x0c
-#define PARAM_LFB_WIDTH 0x0e
-#define PARAM_LFB_HEIGHT0x10
-#define PARAM_LFB_DEPTH 0x12
-#define PARAM_LFB_BASE  0x14
-#define PARAM_LFB_SIZE  0x18
-#define PARAM_LFB_COLORS0x1c
-#define PARAM_VESAPM_SEG0x24
-#define PARAM_VESAPM_OFF0x26
-#define PARAM_VESA_ATTRIB   0x28
 #define _param(param) bootsym(boot_vid_info)+(param)
 
 video:  xorw%ax, %ax
 movw%ax, %gs# GS is zero
 cld
 callbasic_detect# Basic adapter type testing (EGA/VGA/MDA/CGA)
-cmpb$0,_param(PARAM_HAVE_VGA)
+cmpb$0, _param(BVI_have_vga)
 je  1f# Bail if there's no VGA
 movwbootsym(boot_vid_mode), %ax # User selected video mode
 cmpw$ASK_VGA, %ax   # Bring up the menu
@@ -69,7 +50,7 @@ vid1:   callstore_edid
 
 # Detect if we have CGA, MDA, EGA or VGA and pass it to the kernel.
 basic_detect:
-movb$0, _param(PARAM_HAVE_VGA)
+movb$0, _param(BVI_have_vga)
 movb$0x12, %ah  # Check EGA/VGA
 movb$0x10, %bl
 int $0x10
@@ -79,7 +60,7 @@ basic_detect:
 int $0x10
 cmpb$0x1a, %al  # 1a means VGA...
 jne basret  # anything else is EGA.
-incb_param(PARAM_HAVE_VGA)  # We've detected a VGA
+incb_param(BVI_have_vga)# We've detected a VGA
 basret: ret
 
 # Store the video mode parameters for later usage by the kernel.
@@ -92,57 +73,57 @@ mode_params:
 movb$0x03, %ah  # Read cursor position
 xorb%bh, %bh
 int $0x10
-movw%dx, _param(PARAM_CURSOR_POS)
+movw%dx, _param(BVI_cursor_pos)
 movb$0x0f, %ah  # Read page/mode/width
 int $0x10
-movw%ax, _param(PARAM_VIDEO_MODE)   # Video mode and screen width
+movw%ax, _param(BVI_video_mode) # Video mode and screen width
 movw%gs:(0x485), %ax# Font size
-movw%ax, _param(PARAM_FONT_POINTS)  # (valid only on EGA/VGA)
+movw%ax, _param(BVI_font_points)# (valid only on EGA/VGA)
 movwbootsym(force_size), %ax# Forced size?
 orw %ax, %ax
 jz  mopar1
 
-movb%ah, _param(PARAM_VIDEO_COLS)
-movb%al, _param(PARAM_VIDEO_LINES)
+movb%ah, _param(BVI_video_cols)
+movb%al, _param(BVI_video_lines)
 ret
 
 mopar1: movb%gs:(0x484), %al# On EGA/VGA, use the EGA+ BIOS
 incb%al # location of max lines.
-mopar2: movb%al, _param(PARAM_VIDEO_LINES)
+mopar2: movb%al, _param(BVI_video_lines)
 ret
 
 # Fetching of VESA frame buffer parameters
 mopar_gr:
 movw$vesa_mode_info, %di
-movb$0x23, _param(PARAM_HAVE_VGA)
+movb$0x23, _param(BVI_have_vga)
 movw16(%di), %ax
-movw%ax, _param(PARAM_LFB_LINELENGTH)
+movw%ax, _param(BVI_lfb_linelength)
 movw18(%di), %ax
-movw%ax, _param(PARAM_LFB_WIDTH)
+movw%ax, _param(BVI_lfb_width)
 movw20(%di), %ax
-movw%ax, _param(PARAM_LFB_HEIGHT)
+movw%ax, _param(BVI_lfb_height)
 movzbw  25(%di), %ax
-movw%ax, _param(PARAM_LFB_DEPTH)
+movw%ax, _param(BVI_lfb_depth)
 movl40(%di), %eax
-movl%eax, _param(PARAM_LFB_BASE)
+movl%eax, _param(BVI_lfb_base)
 movl

[PATCH v2 0/2] x86/pvh: fix VGA reporting when booted as a PVH guest

2024-04-22 Thread Roger Pau Monne

Hello,

Following patches are kind of unconnected after v1, but patch 1/2 is
still a helpful improvement.

Patch 2 fixes the (wrong) reporting of VGA information when Xen is
booted as a PVH guest.

Thanks, Roger.

Roger Pau Monne (2):
  x86/video: add boot_video_info offset generation to asm-offsets
  x86/pvh: zero VGA information

 xen/arch/x86/boot/video.S | 83 ---
 xen/arch/x86/boot/video.h |  2 +
 xen/arch/x86/guest/xen/pvh-boot.c |  9 
 xen/arch/x86/setup.c  |  1 -
 xen/arch/x86/x86_64/asm-offsets.c | 26 ++
 5 files changed, 69 insertions(+), 52 deletions(-)

-- 
2.44.0

[PATCH v2 2/2] x86/pvh: zero VGA information

2024-04-22 Thread Roger Pau Monne

PVH guests skip real mode VGA detection, and never have a VGA available, hence
the default VGA selection is not applicable, and at worse can cause confusion
when parsing Xen boot log.

Zero the boot_vid_info structure when Xen is booted from the PVH entry point.

This fixes Xen incorrectly reporting:

(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16

When booted as a PVH guest.

Reported-by: Andrew Cooper 
Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/boot/video.h | 2 ++
 xen/arch/x86/guest/xen/pvh-boot.c | 9 +
 xen/arch/x86/setup.c  | 1 -
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/boot/video.h b/xen/arch/x86/boot/video.h
index 6a7775d24292..1203515f9e5b 100644
--- a/xen/arch/x86/boot/video.h
+++ b/xen/arch/x86/boot/video.h
@@ -67,6 +67,8 @@ struct boot_video_info {
 } vesapm;
 uint16_t vesa_attrib;/* 0x28 */
 };
+
+extern struct boot_video_info boot_vid_info;
 #endif /* __ASSEMBLY__ */
 
 #endif /* __BOOT_VIDEO_H__ */
diff --git a/xen/arch/x86/guest/xen/pvh-boot.c 
b/xen/arch/x86/guest/xen/pvh-boot.c
index 9cbe87b61bdd..cc57ab2cbcde 100644
--- a/xen/arch/x86/guest/xen/pvh-boot.c
+++ b/xen/arch/x86/guest/xen/pvh-boot.c
@@ -15,6 +15,10 @@
 
 #include 
 
+#ifdef CONFIG_VIDEO
+# include "../../boot/video.h"
+#endif
+
 /* Initialised in head.S, before .bss is zeroed. */
 bool __initdata pvh_boot;
 uint32_t __initdata pvh_start_info_pa;
@@ -95,6 +99,11 @@ void __init pvh_init(multiboot_info_t **mbi, module_t **mod)
 ASSERT(xen_guest);
 
 get_memory_map();
+
+#ifdef CONFIG_VIDEO
+/* No VGA available when booted from the PVH entry point. */
+memset((boot_vid_info), 0, sizeof(boot_vid_info));
+#endif
 }
 
 void __init pvh_print_info(void)
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 86cd8b999774..449a3476531e 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -646,7 +646,6 @@ static struct e820map __initdata boot_e820;
 
 #ifdef CONFIG_VIDEO
 # include "boot/video.h"
-extern struct boot_video_info boot_vid_info;
 #endif
 
 static void __init parse_video_info(void)
-- 
2.44.0

Re: [PATCH v3] xen/riscv: check whether the assembler has Zbb extension support

2024-04-22 Thread Jan Beulich

On 19.04.2024 16:23, Oleksii Kurochko wrote:
> Update the argument of the as-insn for the Zbb case to verify that
> Zbb is supported not only by a compiler, but also by an assembler.
> 
> Also, check-extenstion(ext_name, "insn") helper macro is introduced
> to check whether extension is supported by a compiler and an assembler.
> 
> Signed-off-by: Oleksii Kurochko 

Acked-by: Jan Beulich 
despite ...

> --- a/xen/arch/riscv/arch.mk
> +++ b/xen/arch/riscv/arch.mk
> @@ -11,9 +11,14 @@ riscv-march-$(CONFIG_RISCV_ISA_C)   := 
> $(riscv-march-y)c
>  
>  riscv-generic-flags := $(riscv-abi-y) -march=$(riscv-march-y)
>  
> -zbb := $(call as-insn,$(CC) $(riscv-generic-flags)_zbb,"",_zbb)
> -zihintpause := $(call as-insn, \
> -  $(CC) 
> $(riscv-generic-flags)_zihintpause,"pause",_zihintpause)
> +# check-extension: Check whether extenstion is supported by a compiler and
> +#  an assembler.
> +# Usage: $(call check-extension,extension_name,"instr")
> +check-extension = $(call as-insn,$(CC) 
> $(riscv-generic-flags)_$(1),$(2),_$(1))
> +
> +zbb-insn := "andn t0, t0, t0"
> +zbb := $(call check-extension,zbb,$(zbb-insn))
> +zihintpause := $(call check-extension,zihintpause,"pause")

... still not really being happy with this: Either, as said before, zbb-insn
would better be avoided (by using $(comma) as necessary), or you should have
gone yet a step further to fully address my "redundancy" concern. Note how
the two ultimate lines still have 3 (zbb) and 2 (zihintpause) references
respectively, when the goal ought to be to have exactly one. E.g. along the
lines of

$(call check-extension,zbb)
$(call check-extension,zihintpause)

suitably using $(eval ...) (to effect the variable definitions) and defining
both zbb-insn and zihintpause-insn.

But I'll nevertheless put this in as is, unless you tell me you're up to
going the extra step.

Jan

Re: [PATCH v3 12/12] xen/arm: List static shared memory regions as /memory nodes

2024-04-22 Thread Michal Orzel




On 22/04/2024 10:07, Luca Fancellu wrote:
> 
> 
> Hi Michal,
> 
>>> +for ( cells = reg, i = 0; cells < reg + nr_cells; i++, cells += 
>>> reg_size )
>>> +{
>>> +u64 start = dt_read_number(cells, addrcells);
>> We should no longer use Linux derived types like u64. Use uint64_t.
>>
>>> +u64 size = dt_read_number(cells + addrcells, sizecells);
>>> +
>>> +dt_dprintk("  Bank %d: %#"PRIx64"->%#"PRIx64"\n",
>>> +   i, start, start + size);
>> i is unsigned so the correct format specifier should be %u
> 
> Right, should have been more careful when copying the code from above
> 
>>>
>>> +void __init shm_mem_node_fill_reg_range(const struct kernel_info *kinfo,
>>> +__be32 *reg, int *nr_cells,
>>> +int addrcells, int sizecells)
>>> +{
>>> +const struct membanks *mem = >shm_mem.common;
>>> +unsigned int i;
>>> +__be32 *cells;
>>> +
>>> +BUG_ON(!nr_cells || !reg);
>>> +
>>> +cells = [*nr_cells];
>>> +for ( i = 0; i < mem->nr_banks; i++ )
>>> +{
>>> +u64 start = mem->bank[i].start;
>> ditto
> 
> Will fix, here paddr_t should be ok isn’t it?
yes

> 
>>
>> Rest LGTM:
>> Reviewed-by: Michal Orzel 
> 
> Thanks, I will send the next one shortly.
I don't think there is a need to respin the whole series just for these fixes.
You should wait for the committers opinion.

~Michal

[xen-unstable test] 185754: tolerable FAIL

2024-04-22 Thread osstest service owner

flight 185754 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/185754/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-arm64-arm64-xl  13 debian-fixup   fail pass in 185748

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl 15 migrate-support-check fail in 185748 never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail in 185748 never pass
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 185748
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 185748
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 185748
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 185748
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 185748
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 185748
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-qcow214 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-qcow215 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-raw  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-raw  15 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  09cb855f33fb619c87d8d4250d3a980f568f151b
baseline version:
 xen  09cb855f33fb619c87d8d4250d3a980f568f151b

Last test of basis   185754  2024-04-22 01:53:44 Z0 days
Testing same since  (not found) 0 attempts

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf  pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass

[ovmf test] 185756: all pass - PUSHED

2024-04-22 Thread osstest service owner

flight 185756 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/185756/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf be92e09206c2e4bb388e7c9127f048689841dd01
baseline version:
 ovmf 6780b3aba086395341d8476d43bef5e15c662c3a

Last test of basis   185745  2024-04-20 08:41:14 Z1 days
Testing same since   185756  2024-04-22 03:19:18 Z0 days1 attempts


People who touched revisions under test:
  Gerd Hoffmann 
  Min M Xu 
  Min Xu 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   6780b3aba0..be92e09206  be92e09206c2e4bb388e7c9127f048689841dd01 -> 
xen-tested-master

Re: [PATCH v2 2/2] livepatch: refuse to resolve symbols that belong to init sections

2024-04-22 Thread Jan Beulich

On 22.04.2024 09:54, Roger Pau Monné wrote:
> On Fri, Apr 19, 2024 at 04:34:41PM +0200, Jan Beulich wrote:
>> On 19.04.2024 12:50, Roger Pau Monné wrote:
>>> On Fri, Apr 19, 2024 at 12:15:19PM +0200, Jan Beulich wrote:
 On 19.04.2024 12:02, Roger Pau Monne wrote:
> Livepatch payloads containing symbols that belong to init sections can 
> only
> lead to page faults later on, as by the time the livepatch is loaded init
> sections have already been freed.
>
> Refuse to resolve such symbols and return an error instead.
>
> Note such resolutions are only relevant for symbols that point to 
> undefined
> sections (SHN_UNDEF), as that implies the symbol is not in the current 
> payload
> and hence must either be a Xen or a different livepatch payload symbol.
>
> Do not allow to resolve symbols that point to __init_begin, as that 
> address is
> also unmapped.  On the other hand, __init_end is not unmapped, and hence 
> allow
> resolutions against it.
>
> Signed-off-by: Roger Pau Monné 
> ---
> Changes since v1:
>  - Fix off-by-one in range checking.

 Which means you addressed Andrew's comment while at the same time extending
 the scope of ...

> @@ -310,6 +311,21 @@ int livepatch_elf_resolve_symbols(struct 
> livepatch_elf *elf)
>  break;
>  }
>  }
> +
> +/*
> + * Ensure not an init symbol.  Only applicable to Xen 
> symbols, as
> + * livepatch payloads don't have init sections or equivalent.
> + */
> +else if ( st_value >= (uintptr_t)&__init_begin &&
> +  st_value < (uintptr_t)&__init_end )
> +{
> +printk(XENLOG_ERR LIVEPATCH
> +   "%s: symbol %s is in init section, not 
> resolving\n",
> +   elf->name, elf->sym[i].name);

 ... what I raised concern about (and I had not seen any verbal reply to 
 that,
 I don't think).
>>>
>>> I've extended the commit message to explicitly mention the handling of
>>> bounds for __init_{begin,end} checks.  Let me know if you are not fine
>>> with it (or maybe you expected something else?).
>>
>> Well, you mention the two symbols you care about, but you don't mention
>> at all that these two may alias other symbols which might be legal to
>> reference from a livepatch.
> 
> I'm having a hard time understanding why a livepatch would want to
> reference those, or any symbol in the .init sections for that matter.
> __init_{begin,end} are exclusively used to unmap the init region after
> boot.  What's the point in livepatch referencing data that's no
> longer there?  The only application I would think of is to calculate
> some kind of offsets, but that would better be done using other
> symbols instead.
> 
> Please bear with me, it would be easier for me to understand if you
> could provide a concrete example.

The issue is that you do comparison by address, not by name. Let's assume
that .rodata ends exactly where .init.* starts. Then &__init_begin ==
&_erodata == &__2M_rodata_end. Access to the latter two symbols wants to
be permitted; only __init_begin and whatever ends up being the very first
symbol in .init.* ought to not be referenced.

Worse (but contrived) would be cases of "constructed" symbols derived from
ones ahead of __init_begin, with an offset large enough to actually point
into .init.*. Such are _still_ valid to be used in calculations, as long
as no deref occurs for anything at or past __init_begin.

Jan

Re: [PATCH v3 12/12] xen/arm: List static shared memory regions as /memory nodes

2024-04-22 Thread Luca Fancellu

Hi Michal,

>> +for ( cells = reg, i = 0; cells < reg + nr_cells; i++, cells += 
>> reg_size )
>> +{
>> +u64 start = dt_read_number(cells, addrcells);
> We should no longer use Linux derived types like u64. Use uint64_t.
> 
>> +u64 size = dt_read_number(cells + addrcells, sizecells);
>> +
>> +dt_dprintk("  Bank %d: %#"PRIx64"->%#"PRIx64"\n",
>> +   i, start, start + size);
> i is unsigned so the correct format specifier should be %u

Right, should have been more careful when copying the code from above

>> 
>> +void __init shm_mem_node_fill_reg_range(const struct kernel_info *kinfo,
>> +__be32 *reg, int *nr_cells,
>> +int addrcells, int sizecells)
>> +{
>> +const struct membanks *mem = >shm_mem.common;
>> +unsigned int i;
>> +__be32 *cells;
>> +
>> +BUG_ON(!nr_cells || !reg);
>> +
>> +cells = [*nr_cells];
>> +for ( i = 0; i < mem->nr_banks; i++ )
>> +{
>> +u64 start = mem->bank[i].start;
> ditto

Will fix, here paddr_t should be ok isn’t it?

> 
> Rest LGTM:
> Reviewed-by: Michal Orzel 

Thanks, I will send the next one shortly.

Cheers,
Luca

Re: [PATCH v3 12/12] xen/arm: List static shared memory regions as /memory nodes

2024-04-22 Thread Michal Orzel

Hi Luca,

On 18/04/2024 09:36, Luca Fancellu wrote:
> 
> 
> Currently Xen is not exporting the static shared memory regions
> to the device tree as /memory node, this commit is fixing this
> issue.
> 
> Given that now make_memory_node needs a parameter 'struct kernel_info'
> in order to call the new function shm_mem_node_fill_reg_range,
> take the occasion to remove the unused struct domain parameter.
> 
> Signed-off-by: Luca Fancellu 
> ---
> v3:
>  - removed previous patch that was merging the intervals, rebase
>changes.
> v2:
>  - try to use make_memory_node, don't add overlapping ranges of
>memory, commit message changed.
> v1:
>  - new patch
> ---
> ---
>  xen/arch/arm/dom0less-build.c   |  2 +-
>  xen/arch/arm/domain_build.c | 34 +
>  xen/arch/arm/include/asm/domain_build.h |  2 +-
>  xen/arch/arm/include/asm/static-shmem.h | 15 +++
>  xen/arch/arm/static-shmem.c | 23 +
>  5 files changed, 63 insertions(+), 13 deletions(-)
> 
> diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
> index 51cf03221d56..74f053c242f4 100644
> --- a/xen/arch/arm/dom0less-build.c
> +++ b/xen/arch/arm/dom0less-build.c
> @@ -642,7 +642,7 @@ static int __init prepare_dtb_domU(struct domain *d, 
> struct kernel_info *kinfo)
>  if ( ret )
>  goto err;
> 
> -ret = make_memory_node(d, kinfo->fdt, addrcells, sizecells,
> +ret = make_memory_node(kinfo, addrcells, sizecells,
> kernel_info_get_mem(kinfo));
>  if ( ret )
>  goto err;
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index 68532ddc084c..15f888169c4e 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -756,15 +756,14 @@ int __init domain_fdt_begin_node(void *fdt, const char 
> *name, uint64_t unit)
>  return fdt_begin_node(fdt, buf);
>  }
> 
> -int __init make_memory_node(const struct domain *d,
> -void *fdt,
> -int addrcells, int sizecells,
> -const struct membanks *mem)
> +int __init make_memory_node(const struct kernel_info *kinfo, int addrcells,
> +int sizecells, const struct membanks *mem)
>  {
> +void *fdt = kinfo->fdt;
>  unsigned int i;
>  int res, reg_size = addrcells + sizecells;
>  int nr_cells = 0;
> -__be32 reg[NR_MEM_BANKS * 4 /* Worst case addrcells + sizecells */];
> +__be32 reg[DT_MEM_NODE_REG_RANGE_SIZE];
>  __be32 *cells;
> 
>  if ( mem->nr_banks == 0 )
> @@ -797,14 +796,28 @@ int __init make_memory_node(const struct domain *d,
>  if ( mem->bank[i].type == MEMBANK_STATIC_DOMAIN )
>  continue;
> 
> -dt_dprintk("  Bank %d: %#"PRIx64"->%#"PRIx64"\n",
> -   i, start, start + size);
> -
>  nr_cells += reg_size;
>  BUG_ON(nr_cells >= ARRAY_SIZE(reg));
>  dt_child_set_range(, addrcells, sizecells, start, size);
>  }
> 
> +/*
> + * static shared memory banks need to be listed as /memory node, so when
> + * this function is handling the normal memory, add the banks.
> + */
> +if ( mem == kernel_info_get_mem(kinfo) )
> +shm_mem_node_fill_reg_range(kinfo, reg, _cells, addrcells,
> +sizecells);
> +
> +for ( cells = reg, i = 0; cells < reg + nr_cells; i++, cells += reg_size 
> )
> +{
> +u64 start = dt_read_number(cells, addrcells);
We should no longer use Linux derived types like u64. Use uint64_t.

> +u64 size = dt_read_number(cells + addrcells, sizecells);
> +
> +dt_dprintk("  Bank %d: %#"PRIx64"->%#"PRIx64"\n",
> +   i, start, start + size);
i is unsigned so the correct format specifier should be %u

> +}
> +
>  dt_dprintk("(reg size %d, nr cells %d)\n", reg_size, nr_cells);
> 
>  res = fdt_property(fdt, "reg", reg, nr_cells * sizeof(*reg));
> @@ -1783,7 +1796,7 @@ static int __init handle_node(struct domain *d, struct 
> kernel_info *kinfo,
>  if ( res )
>  return res;
> 
> -res = make_memory_node(d, kinfo->fdt, addrcells, sizecells,
> +res = make_memory_node(kinfo, addrcells, sizecells,
> kernel_info_get_mem(kinfo));
>  if ( res )
>  return res;
> @@ -1794,8 +1807,7 @@ static int __init handle_node(struct domain *d, struct 
> kernel_info *kinfo,
>   */
>  if ( reserved_mem->nr_banks > 0 )
>  {
> -res = make_memory_node(d, kinfo->fdt, addrcells, sizecells,
> -   reserved_mem);
> +res = make_memory_node(kinfo, addrcells, sizecells, 
> reserved_mem);
>  if ( res )
>  return res;
>  }
> diff --git a/xen/arch/arm/include/asm/domain_build.h 
> b/xen/arch/arm/include/asm/domain_build.h
> index

Re: [PATCH v2 2/2] livepatch: refuse to resolve symbols that belong to init sections

2024-04-22 Thread Roger Pau Monné

On Fri, Apr 19, 2024 at 04:34:41PM +0200, Jan Beulich wrote:
> On 19.04.2024 12:50, Roger Pau Monné wrote:
> > On Fri, Apr 19, 2024 at 12:15:19PM +0200, Jan Beulich wrote:
> >> On 19.04.2024 12:02, Roger Pau Monne wrote:
> >>> Livepatch payloads containing symbols that belong to init sections can 
> >>> only
> >>> lead to page faults later on, as by the time the livepatch is loaded init
> >>> sections have already been freed.
> >>>
> >>> Refuse to resolve such symbols and return an error instead.
> >>>
> >>> Note such resolutions are only relevant for symbols that point to 
> >>> undefined
> >>> sections (SHN_UNDEF), as that implies the symbol is not in the current 
> >>> payload
> >>> and hence must either be a Xen or a different livepatch payload symbol.
> >>>
> >>> Do not allow to resolve symbols that point to __init_begin, as that 
> >>> address is
> >>> also unmapped.  On the other hand, __init_end is not unmapped, and hence 
> >>> allow
> >>> resolutions against it.
> >>>
> >>> Signed-off-by: Roger Pau Monné 
> >>> ---
> >>> Changes since v1:
> >>>  - Fix off-by-one in range checking.
> >>
> >> Which means you addressed Andrew's comment while at the same time extending
> >> the scope of ...
> >>
> >>> @@ -310,6 +311,21 @@ int livepatch_elf_resolve_symbols(struct 
> >>> livepatch_elf *elf)
> >>>  break;
> >>>  }
> >>>  }
> >>> +
> >>> +/*
> >>> + * Ensure not an init symbol.  Only applicable to Xen 
> >>> symbols, as
> >>> + * livepatch payloads don't have init sections or equivalent.
> >>> + */
> >>> +else if ( st_value >= (uintptr_t)&__init_begin &&
> >>> +  st_value < (uintptr_t)&__init_end )
> >>> +{
> >>> +printk(XENLOG_ERR LIVEPATCH
> >>> +   "%s: symbol %s is in init section, not 
> >>> resolving\n",
> >>> +   elf->name, elf->sym[i].name);
> >>
> >> ... what I raised concern about (and I had not seen any verbal reply to 
> >> that,
> >> I don't think).
> > 
> > I've extended the commit message to explicitly mention the handling of
> > bounds for __init_{begin,end} checks.  Let me know if you are not fine
> > with it (or maybe you expected something else?).
> 
> Well, you mention the two symbols you care about, but you don't mention
> at all that these two may alias other symbols which might be legal to
> reference from a livepatch.

I'm having a hard time understanding why a livepatch would want to
reference those, or any symbol in the .init sections for that matter.
__init_{begin,end} are exclusively used to unmap the init region after
boot.  What's the point in livepatch referencing data that's no
longer there?  The only application I would think of is to calculate
some kind of offsets, but that would better be done using other
symbols instead.

Please bear with me, it would be easier for me to understand if you
could provide a concrete example.

Thanks, Roger.

[linux-linus test] 185753: regressions - FAIL

2024-04-22 Thread osstest service owner

flight 185753 linux-linus real [real]
flight 185757 linux-linus real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/185753/
http://logs.test-lab.xenproject.org/osstest/logs/185757/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-libvirt-vhd  8 xen-boot fail REGR. vs. 185750

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-credit2   8 xen-bootfail pass in 185757-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail in 185757 never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail in 185757 never 
pass
 test-armhf-armhf-libvirt  8 xen-boot fail  like 185750
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 185750
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 185750
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 185750
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 185750
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 185750
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-qcow214 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-qcow215 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-raw  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-raw  15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass

version targeted for testing:
 linuxed30a4a51bb196781c8058073ea720133a65596f
baseline version:
 linux977b1ef51866aa7170409af80740788d4f9c4841

Last test of basis   185750  2024-04-21 12:13:57 Z0 days
Testing same since   185753  2024-04-21 22:44:09 Z0 days1 attempts


People who touched revisions under test:
  Alan Stern 
  Alexander Usyskin 
  Andy Shevchenko 
  Andy Shevchenko 
  AngeloGioacchino Del Regno 
  bolan wang 
  Borislav Petkov (AMD) 
  Carlos Llamas 
  Christian A. Ehrhardt 
  Chuanhong Guo 
  Coia Prant 
  Dan Carpenter 
  Daniele Palmas 
  Dave Hansen 
  Dave Hansen  # for x86
  Emil Kronborg 
  Eric Biggers 
  Fabio Estevam 
  Finn Thain 
  Georgi Djakov 
  Gil Fine 
  Greg Kroah-Hartman 
  H. Peter Anvin (Intel) 
  Hans de Goede 
  Hou Wenlong 
  Ingo

[XEN PATCH v2 4/5] xen/arm: allow dynamically assigned SGI handlers

2024-04-22 Thread Jens Wiklander

Updates so request_irq() can be used with a dynamically assigned SGI irq
as input. This prepares for a later patch where an FF-A schedule
receiver interrupt handler is installed for an SGI generated by the
secure world.

gic_route_irq_to_xen() don't gic_set_irq_type() for SGIs since they are
always edge triggered.

gic_interrupt() is updated to route the dynamically assigned SGIs to
do_IRQ() instead of do_sgi(). The latter still handles the statically
assigned SGI handlers like for instance GIC_SGI_CALL_FUNCTION.

Signed-off-by: Jens Wiklander 
---
v1->v2
- Update patch description as requested
---
 xen/arch/arm/gic.c | 5 +++--
 xen/arch/arm/irq.c | 7 +--
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 44c40e86defe..e9aeb7138455 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -117,7 +117,8 @@ void gic_route_irq_to_xen(struct irq_desc *desc, unsigned 
int priority)
 
 desc->handler = gic_hw_ops->gic_host_irq_type;
 
-gic_set_irq_type(desc, desc->arch.type);
+if ( desc->irq >= NR_GIC_SGI)
+gic_set_irq_type(desc, desc->arch.type);
 gic_set_irq_priority(desc, priority);
 }
 
@@ -375,7 +376,7 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
 /* Reading IRQ will ACK it */
 irq = gic_hw_ops->read_irq();
 
-if ( likely(irq >= 16 && irq < 1020) )
+if ( likely(irq >= GIC_SGI_MAX && irq < 1020) )
 {
 isb();
 do_IRQ(regs, irq, is_fiq);
diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index bcce80a4d624..fdb214560978 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -224,9 +224,12 @@ void do_IRQ(struct cpu_user_regs *regs, unsigned int irq, 
int is_fiq)
 
 perfc_incr(irqs);
 
-ASSERT(irq >= 16); /* SGIs do not come down this path */
+/* Statically assigned SGIs do not come down this path */
+ASSERT(irq >= GIC_SGI_MAX);
 
-if ( irq < 32 )
+if ( irq < NR_GIC_SGI )
+perfc_incr(ipis);
+else if ( irq < NR_GIC_LOCAL_IRQS )
 perfc_incr(ppis);
 else
 perfc_incr(spis);
-- 
2.34.1

[XEN PATCH v2 5/5] xen/arm: ffa: support notification

2024-04-22 Thread Jens Wiklander

Add support for FF-A notifications, currently limited to an SP (Secure
Partition) sending an asynchronous notification to a guest.

Guests and Xen itself are made aware of pending notifications with an
interrupt. The interrupt handler retrieves the notifications using the
FF-A ABI and deliver them to their destinations.

Update ffa_partinfo_domain_init() to return error code like
ffa_notif_domain_init().

Signed-off-by: Jens Wiklander 
---
v1->v2:
- Addressing review comments
- Change ffa_handle_notification_{bind,unbind,set}() to take struct
  cpu_user_regs *regs as argument.
- Update ffa_partinfo_domain_init() and ffa_notif_domain_init() to return
  an error code.
- Fixing a bug in handle_features() for FFA_FEATURE_SCHEDULE_RECV_INTR.
---
 xen/arch/arm/tee/Makefile   |   1 +
 xen/arch/arm/tee/ffa.c  |  55 +-
 xen/arch/arm/tee/ffa_notif.c| 331 
 xen/arch/arm/tee/ffa_partinfo.c |   9 +-
 xen/arch/arm/tee/ffa_private.h  |  68 ++-
 5 files changed, 457 insertions(+), 7 deletions(-)
 create mode 100644 xen/arch/arm/tee/ffa_notif.c

diff --git a/xen/arch/arm/tee/Makefile b/xen/arch/arm/tee/Makefile
index f0112a2f922d..7c0f46f7f446 100644
--- a/xen/arch/arm/tee/Makefile
+++ b/xen/arch/arm/tee/Makefile
@@ -2,5 +2,6 @@ obj-$(CONFIG_FFA) += ffa.o
 obj-$(CONFIG_FFA) += ffa_shm.o
 obj-$(CONFIG_FFA) += ffa_partinfo.o
 obj-$(CONFIG_FFA) += ffa_rxtx.o
+obj-$(CONFIG_FFA) += ffa_notif.o
 obj-y += tee.o
 obj-$(CONFIG_OPTEE) += optee.o
diff --git a/xen/arch/arm/tee/ffa.c b/xen/arch/arm/tee/ffa.c
index 5209612963e1..aa171c0b61f0 100644
--- a/xen/arch/arm/tee/ffa.c
+++ b/xen/arch/arm/tee/ffa.c
@@ -39,6 +39,9 @@
  *   - at most 32 shared memory regions per guest
  * o FFA_MSG_SEND_DIRECT_REQ:
  *   - only supported from a VM to an SP
+ * o FFA_NOTIFICATION_*:
+ *   - only supports global notifications, that is, per vCPU notifications
+ * are not supported
  *
  * There are some large locked sections with ffa_tx_buffer_lock and
  * ffa_rx_buffer_lock. Especially the ffa_tx_buffer_lock spinlock used
@@ -194,6 +197,8 @@ out:
 
 static void handle_features(struct cpu_user_regs *regs)
 {
+struct domain *d = current->domain;
+struct ffa_ctx *ctx = d->arch.tee;
 uint32_t a1 = get_user_reg(regs, 1);
 unsigned int n;
 
@@ -240,6 +245,30 @@ static void handle_features(struct cpu_user_regs *regs)
 BUILD_BUG_ON(PAGE_SIZE != FFA_PAGE_SIZE);
 ffa_set_regs_success(regs, 0, 0);
 break;
+case FFA_FEATURE_NOTIF_PEND_INTR:
+if ( ctx->notif.enabled )
+ffa_set_regs_success(regs, FFA_NOTIF_PEND_INTR_ID, 0);
+else
+ffa_set_regs_error(regs, FFA_RET_NOT_SUPPORTED);
+break;
+case FFA_FEATURE_SCHEDULE_RECV_INTR:
+if ( ctx->notif.enabled )
+ffa_set_regs_success(regs, FFA_SCHEDULE_RECV_INTR_ID, 0);
+else
+ffa_set_regs_error(regs, FFA_RET_NOT_SUPPORTED);
+break;
+
+case FFA_NOTIFICATION_BIND:
+case FFA_NOTIFICATION_UNBIND:
+case FFA_NOTIFICATION_GET:
+case FFA_NOTIFICATION_SET:
+case FFA_NOTIFICATION_INFO_GET_32:
+case FFA_NOTIFICATION_INFO_GET_64:
+if ( ctx->notif.enabled )
+ffa_set_regs_success(regs, 0, 0);
+else
+ffa_set_regs_error(regs, FFA_RET_NOT_SUPPORTED);
+break;
 default:
 ffa_set_regs_error(regs, FFA_RET_NOT_SUPPORTED);
 break;
@@ -305,6 +334,22 @@ static bool ffa_handle_call(struct cpu_user_regs *regs)
  get_user_reg(regs, 1)),
get_user_reg(regs, 3));
 break;
+case FFA_NOTIFICATION_BIND:
+e = ffa_handle_notification_bind(regs);
+break;
+case FFA_NOTIFICATION_UNBIND:
+e = ffa_handle_notification_unbind(regs);
+break;
+case FFA_NOTIFICATION_INFO_GET_32:
+case FFA_NOTIFICATION_INFO_GET_64:
+ffa_handle_notification_info_get(regs);
+return true;
+case FFA_NOTIFICATION_GET:
+ffa_handle_notification_get(regs);
+return true;
+case FFA_NOTIFICATION_SET:
+e = ffa_handle_notification_set(regs);
+break;
 
 default:
 gprintk(XENLOG_ERR, "ffa: unhandled fid 0x%x\n", fid);
@@ -322,6 +367,7 @@ static bool ffa_handle_call(struct cpu_user_regs *regs)
 static int ffa_domain_init(struct domain *d)
 {
 struct ffa_ctx *ctx;
+int ret;
 
 if ( !ffa_version )
 return -ENODEV;
@@ -345,10 +391,11 @@ static int ffa_domain_init(struct domain *d)
  * error, so no need for cleanup in this function.
  */
 
-if ( !ffa_partinfo_domain_init(d) )
-return -EIO;
+ret = ffa_partinfo_domain_init(d);
+if ( ret )
+return ret;
 
-return 0;
+return ffa_notif_domain_init(d);
 }
 
 static void ffa_domain_teardown_continue(struct ffa_ctx *ctx, bool first_time)
@@ -423,6 +470,7 @@ static int ffa_domain_teardown(struct domain *d)

[XEN PATCH v2 2/5] xen/arm: ffa: use ACCESS_ONCE()

2024-04-22 Thread Jens Wiklander

Replace read_atomic() with ACCESS_ONCE() to match the intended use, that
is, to prevent the compiler from (via optimization) reading shared
memory more than once.

Signed-off-by: Jens Wiklander 
Reviewed-by: Bertrand Marquis 
---
 xen/arch/arm/tee/ffa_shm.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/xen/arch/arm/tee/ffa_shm.c b/xen/arch/arm/tee/ffa_shm.c
index eed9ad2d2986..75a5b66aeb4c 100644
--- a/xen/arch/arm/tee/ffa_shm.c
+++ b/xen/arch/arm/tee/ffa_shm.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -171,8 +172,8 @@ static int get_shm_pages(struct domain *d, struct 
ffa_shm_mem *shm,
 
 for ( n = 0; n < range_count; n++ )
 {
-page_count = read_atomic([n].page_count);
-addr = read_atomic([n].address);
+page_count = ACCESS_ONCE(range[n].page_count);
+addr = ACCESS_ONCE(range[n].address);
 for ( m = 0; m < page_count; m++ )
 {
 if ( pg_idx >= shm->page_count )
@@ -527,13 +528,13 @@ void ffa_handle_mem_share(struct cpu_user_regs *regs)
 goto out_unlock;
 
 mem_access = ctx->tx + trans.mem_access_offs;
-if ( read_atomic(_access->access_perm.perm) != FFA_MEM_ACC_RW )
+if ( ACCESS_ONCE(mem_access->access_perm.perm) != FFA_MEM_ACC_RW )
 {
 ret = FFA_RET_NOT_SUPPORTED;
 goto out_unlock;
 }
 
-region_offs = read_atomic(_access->region_offs);
+region_offs = ACCESS_ONCE(mem_access->region_offs);
 if ( sizeof(*region_descr) + region_offs > frag_len )
 {
 ret = FFA_RET_NOT_SUPPORTED;
@@ -541,8 +542,8 @@ void ffa_handle_mem_share(struct cpu_user_regs *regs)
 }
 
 region_descr = ctx->tx + region_offs;
-range_count = read_atomic(_descr->address_range_count);
-page_count = read_atomic(_descr->total_page_count);
+range_count = ACCESS_ONCE(region_descr->address_range_count);
+page_count = ACCESS_ONCE(region_descr->total_page_count);
 
 if ( !page_count )
 {
@@ -557,7 +558,7 @@ void ffa_handle_mem_share(struct cpu_user_regs *regs)
 goto out_unlock;
 }
 shm->sender_id = trans.sender_id;
-shm->ep_id = read_atomic(_access->access_perm.endpoint_id);
+shm->ep_id = ACCESS_ONCE(mem_access->access_perm.endpoint_id);
 
 /*
  * Check that the Composite memory region descriptor fits.
-- 
2.34.1

[XEN PATCH v2 0/5] FF-A notifications

2024-04-22 Thread Jens Wiklander

Hi,

This patch set adds support for FF-A notifications. We only support
global notifications, per vCPU notifications remain unsupported.

The first three patches are further cleanup and can be merged before the
rest if desired.

A physical SGI is used to make Xen aware of pending FF-A notifications. The
physical SGI is selected by the SPMC in the secure world. Since it must not
already be used by Xen the SPMC is in practice forced to donate one of the
secure SGIs, but that's normally not a problem. The SGI handling in Xen is
updated to support registration of handlers for SGIs that aren't statically
assigned, that is, SGI IDs above GIC_SGI_MAX.

Thanks,
Jens

v1->v2:
- "xen/arm: ffa: support notification" and
  "xen/arm: allow dynamically assigned SGI handlers" updated as requestsed,
  details in each patch.
- Added Bertrands R-B for "xen/arm: ffa: refactor ffa_handle_call()",
  "xen/arm: ffa: use ACCESS_ONCE()", and
  "xen/arm: ffa: simplify ffa_handle_mem_share()"

Jens Wiklander (5):
  xen/arm: ffa: refactor ffa_handle_call()
  xen/arm: ffa: use ACCESS_ONCE()
  xen/arm: ffa: simplify ffa_handle_mem_share()
  xen/arm: allow dynamically assigned SGI handlers
  xen/arm: ffa: support notification

 xen/arch/arm/gic.c  |   5 +-
 xen/arch/arm/irq.c  |   7 +-
 xen/arch/arm/tee/Makefile   |   1 +
 xen/arch/arm/tee/ffa.c  |  83 +---
 xen/arch/arm/tee/ffa_notif.c| 331 
 xen/arch/arm/tee/ffa_partinfo.c |   9 +-
 xen/arch/arm/tee/ffa_private.h  |  68 ++-
 xen/arch/arm/tee/ffa_shm.c  |  33 ++--
 8 files changed, 488 insertions(+), 49 deletions(-)
 create mode 100644 xen/arch/arm/tee/ffa_notif.c

-- 
2.34.1

[XEN PATCH v2 3/5] xen/arm: ffa: simplify ffa_handle_mem_share()

2024-04-22 Thread Jens Wiklander

Simplify ffa_handle_mem_share() by removing the start_page_idx and
last_page_idx parameters from get_shm_pages() and check that the number
of pages matches expectations at the end of get_shm_pages().

Signed-off-by: Jens Wiklander 
Reviewed-by: Bertrand Marquis 
---
 xen/arch/arm/tee/ffa_shm.c | 18 ++
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/xen/arch/arm/tee/ffa_shm.c b/xen/arch/arm/tee/ffa_shm.c
index 75a5b66aeb4c..370d83ec5cf8 100644
--- a/xen/arch/arm/tee/ffa_shm.c
+++ b/xen/arch/arm/tee/ffa_shm.c
@@ -159,10 +159,9 @@ static int32_t ffa_mem_reclaim(uint32_t handle_lo, 
uint32_t handle_hi,
  */
 static int get_shm_pages(struct domain *d, struct ffa_shm_mem *shm,
  const struct ffa_address_range *range,
- uint32_t range_count, unsigned int start_page_idx,
- unsigned int *last_page_idx)
+ uint32_t range_count)
 {
-unsigned int pg_idx = start_page_idx;
+unsigned int pg_idx = 0;
 gfn_t gfn;
 unsigned int n;
 unsigned int m;
@@ -191,7 +190,9 @@ static int get_shm_pages(struct domain *d, struct 
ffa_shm_mem *shm,
 }
 }
 
-*last_page_idx = pg_idx;
+/* The ranges must add up */
+if ( pg_idx < shm->page_count )
+return FFA_RET_INVALID_PARAMETERS;
 
 return FFA_RET_OK;
 }
@@ -460,7 +461,6 @@ void ffa_handle_mem_share(struct cpu_user_regs *regs)
 struct domain *d = current->domain;
 struct ffa_ctx *ctx = d->arch.tee;
 struct ffa_shm_mem *shm = NULL;
-unsigned int last_page_idx = 0;
 register_t handle_hi = 0;
 register_t handle_lo = 0;
 int ret = FFA_RET_DENIED;
@@ -570,15 +570,9 @@ void ffa_handle_mem_share(struct cpu_user_regs *regs)
 goto out;
 }
 
-ret = get_shm_pages(d, shm, region_descr->address_range_array, range_count,
-0, _page_idx);
+ret = get_shm_pages(d, shm, region_descr->address_range_array, 
range_count);
 if ( ret )
 goto out;
-if ( last_page_idx != shm->page_count )
-{
-ret = FFA_RET_INVALID_PARAMETERS;
-goto out;
-}
 
 /* Note that share_shm() uses our tx buffer */
 spin_lock(_tx_buffer_lock);
-- 
2.34.1

[XEN PATCH v2 1/5] xen/arm: ffa: refactor ffa_handle_call()

2024-04-22 Thread Jens Wiklander

Refactors the large switch block in ffa_handle_call() to use common code
for the simple case where it's either an error code or success with no
further parameters.

Signed-off-by: Jens Wiklander 
Reviewed-by: Bertrand Marquis 
---
 xen/arch/arm/tee/ffa.c | 30 ++
 1 file changed, 10 insertions(+), 20 deletions(-)

diff --git a/xen/arch/arm/tee/ffa.c b/xen/arch/arm/tee/ffa.c
index 8665201e34a9..5209612963e1 100644
--- a/xen/arch/arm/tee/ffa.c
+++ b/xen/arch/arm/tee/ffa.c
@@ -273,18 +273,10 @@ static bool ffa_handle_call(struct cpu_user_regs *regs)
 case FFA_RXTX_MAP_64:
 e = ffa_handle_rxtx_map(fid, get_user_reg(regs, 1),
get_user_reg(regs, 2), get_user_reg(regs, 3));
-if ( e )
-ffa_set_regs_error(regs, e);
-else
-ffa_set_regs_success(regs, 0, 0);
-return true;
+break;
 case FFA_RXTX_UNMAP:
 e = ffa_handle_rxtx_unmap();
-if ( e )
-ffa_set_regs_error(regs, e);
-else
-ffa_set_regs_success(regs, 0, 0);
-return true;
+break;
 case FFA_PARTITION_INFO_GET:
 e = ffa_handle_partition_info_get(get_user_reg(regs, 1),
   get_user_reg(regs, 2),
@@ -299,11 +291,7 @@ static bool ffa_handle_call(struct cpu_user_regs *regs)
 return true;
 case FFA_RX_RELEASE:
 e = ffa_handle_rx_release();
-if ( e )
-ffa_set_regs_error(regs, e);
-else
-ffa_set_regs_success(regs, 0, 0);
-return true;
+break;
 case FFA_MSG_SEND_DIRECT_REQ_32:
 case FFA_MSG_SEND_DIRECT_REQ_64:
 handle_msg_send_direct_req(regs, fid);
@@ -316,17 +304,19 @@ static bool ffa_handle_call(struct cpu_user_regs *regs)
 e = ffa_handle_mem_reclaim(regpair_to_uint64(get_user_reg(regs, 2),
  get_user_reg(regs, 1)),
get_user_reg(regs, 3));
-if ( e )
-ffa_set_regs_error(regs, e);
-else
-ffa_set_regs_success(regs, 0, 0);
-return true;
+break;
 
 default:
 gprintk(XENLOG_ERR, "ffa: unhandled fid 0x%x\n", fid);
 ffa_set_regs_error(regs, FFA_RET_NOT_SUPPORTED);
 return true;
 }
+
+if ( e )
+ffa_set_regs_error(regs, e);
+else
+ffa_set_regs_success(regs, 0, 0);
+return true;
 }
 
 static int ffa_domain_init(struct domain *d)
-- 
2.34.1

Re: Detecting whether dom0 is in a VM

2024-04-22 Thread Jan Beulich

On 19.04.2024 17:29, George Dunlap wrote:
> On Fri, Jul 7, 2023 at 3:56 PM George Dunlap  wrote:
>> Xen's public interface offers access to the featuresets known / found /
>> used by the hypervisor. See XEN_SYSCTL_get_cpu_featureset, accessible
>> via xc_get_cpu_featureset().
>>
>
> Are any of these exposed in dom0 via sysctl, or hypfs?

 sysctl - yes (as the quoted name also says). hypfs no, afaict.

>  SYSCTLs are
> unfortunately not stable interfaces, correct?  So it wouldn't be practical
> for systemd to use them.

 Indeed, neither sysctl-s nor the libxc interfaces are stable.
>>>
>>> Thinking of it, xen-cpuid is a wrapper tool around those. They may want
>>> to look at its output (and, if they want to use it, advise distros to
>>> also package it), which I think we try to keep reasonably stable,
>>> albeit without providing any guarantees.
>>
>>
>> We haven't had any clear guidance yet on what the systemd team want in the 
>>  question; I just sort of assumed they 
>> wanted the L-1 virtualization *if possible*.  It sounds like `vm-other` 
>> would be acceptable, particularly as a fall-back output if there's no way to 
>> get Xen's picture of the cpuid.
>>
>> It looks like xen-cpuid is available on Fedora, Debian, Ubuntu, and the old 
>> Virt SIG CentOS packages; so I'd expect most packages to follow suit.  
>> That's a place to start.
>>
>> Just to take the discussion all the way to its conclusion:
>>
>> - Supposing xen-cpuid isn't available, is there any other way to tell if Xen 
>> is running in a VM from dom0?
>>
>> - Would it make sense to expose that information somewhere, either in sysfs 
>> or in hypfs (or both?), so that eventually even systems which may not get 
>> the memo about packaging xen-cpuid will get support (or if the systemd guys 
>> would rather avoid executing another process if possible)?
> 
> Resurrecting this thread.
> 
> To recap:
> 
> Currently, `systemd-detect-virt` has the following  input / output table:
> 
> 1. Xen on real hardware, domU: xen
> 2. Xen on real hardware, dom0: vm-other
> 3. Xen in a VM, domU: xen
> 4. Xen in aVM, dom0: vm-other
> 
> It's the dom0 lines (2 and 4) which are problematic; we'd ideally like
> those to be `none` and `vm-other` (or the actual value, like `xen` or
> `kvm`).
> 
> It looks like ATM, /proc/xen/capabilities will contain `control_d` if
> it's a dom0.  Simply unilaterally returning `none` if
> /proc/xen/capabilities contains `control_d` would correct the vast
> majority of instances (since the number of instances of Xen on real
> hardware is presumably higher than the number running virtualized).
> 
> Is /proc/xen/capabilities expected to stay around?  I don't see
> anything equivalent in /dev/xen.

I believe it's intended to stay around, but a definitive answer can only
come from Linux folks. Jürgen, Stefano?

Jan

> Other than adding a new interface to Xen, is there any way to tell if
> Xen is running in a VM?  If we do need to expose an interface, what
> would be the best way to do that?
> 
>  -George

Re: Detecting whether dom0 is in a VM

2024-04-22 Thread Jan Beulich

On 19.04.2024 17:51, George Dunlap wrote:
> On Fri, Apr 19, 2024 at 4:29 PM George Dunlap  wrote:
>>
>> On Fri, Jul 7, 2023 at 3:56 PM George Dunlap  wrote:
>>> Xen's public interface offers access to the featuresets known / found /
>>> used by the hypervisor. See XEN_SYSCTL_get_cpu_featureset, accessible
>>> via xc_get_cpu_featureset().
>>>
>>
>> Are any of these exposed in dom0 via sysctl, or hypfs?
>
> sysctl - yes (as the quoted name also says). hypfs no, afaict.
>
>>  SYSCTLs are
>> unfortunately not stable interfaces, correct?  So it wouldn't be 
>> practical
>> for systemd to use them.
>
> Indeed, neither sysctl-s nor the libxc interfaces are stable.

 Thinking of it, xen-cpuid is a wrapper tool around those. They may want
 to look at its output (and, if they want to use it, advise distros to
 also package it), which I think we try to keep reasonably stable,
 albeit without providing any guarantees.
>>>
>>>
>>> We haven't had any clear guidance yet on what the systemd team want in the 
>>>  question; I just sort of assumed they 
>>> wanted the L-1 virtualization *if possible*.  It sounds like `vm-other` 
>>> would be acceptable, particularly as a fall-back output if there's no way 
>>> to get Xen's picture of the cpuid.
>>>
>>> It looks like xen-cpuid is available on Fedora, Debian, Ubuntu, and the old 
>>> Virt SIG CentOS packages; so I'd expect most packages to follow suit.  
>>> That's a place to start.
>>>
>>> Just to take the discussion all the way to its conclusion:
>>>
>>> - Supposing xen-cpuid isn't available, is there any other way to tell if 
>>> Xen is running in a VM from dom0?
>>>
>>> - Would it make sense to expose that information somewhere, either in sysfs 
>>> or in hypfs (or both?), so that eventually even systems which may not get 
>>> the memo about packaging xen-cpuid will get support (or if the systemd guys 
>>> would rather avoid executing another process if possible)?
>>
>> Resurrecting this thread.
>>
>> To recap:
>>
>> Currently, `systemd-detect-virt` has the following  input / output table:
>>
>> 1. Xen on real hardware, domU: xen
>> 2. Xen on real hardware, dom0: vm-other
>> 3. Xen in a VM, domU: xen
>> 4. Xen in aVM, dom0: vm-other
>>
>> It's the dom0 lines (2 and 4) which are problematic; we'd ideally like
>> those to be `none` and `vm-other` (or the actual value, like `xen` or
>> `kvm`).
>>
>> It looks like ATM, /proc/xen/capabilities will contain `control_d` if
>> it's a dom0.  Simply unilaterally returning `none` if
>> /proc/xen/capabilities contains `control_d` would correct the vast
>> majority of instances (since the number of instances of Xen on real
>> hardware is presumably higher than the number running virtualized).
>>
>> Is /proc/xen/capabilities expected to stay around?  I don't see
>> anything equivalent in /dev/xen.
>>
>> Other than adding a new interface to Xen, is there any way to tell if
>> Xen is running in a VM?  If we do need to expose an interface, what
>> would be the best way to do that?
> 
> Actually, the entire code for detection is here:
> 
> https://github.com/systemd/systemd/blob/f5fefec786e35ef7606e2b14e2530878b74ca879/src/basic/virt.c
> 
> The core issue seems to be this bit:
> 
> /* Detect from CPUID */
> v = detect_vm_cpuid();
> if (v < 0)
> return v;
> if (v == VIRTUALIZATION_VM_OTHER)
> other = true;
> else if (v != VIRTUALIZATION_NONE)
> goto finish;
> 
> /* If we are in Dom0 and have not yet finished, finish with
> the result of detect_vm_cpuid */
> if (xen_dom0 > 0)
> goto finish;
> 
> Basically, the code expects that dom0 will be able to read the raw
> virtualization CPUID leaves, and that the result will be
> VIRTUALIZATION_NONE on real hardware.
> 
> But in fact, the result appears to be VIRTUALIZATION_VM_OTHER -- which
> would mean that 1) the CPUID instruction for reading CPUID leaf 0x1
> succeeded, 2) the 'hypervisor' bit was set, but 3) the signature at
> the hypervisor information leaf (0x4000) didn't match any
> hypervisor (including XenVMMXenVMM, which it's looking for).

Which in turn suggests that they (a) issue plain CPUID (not the "Xen
escaped" form of it) and (b) there's no CPUID faulting available in the
underlying hardware (or the Xen version used is too old). Since (b) is
nothing userspace can do anything about, using the "Xen escaped" form
of CPUID may be the way to go in order to actually see the hypervisor
leaves. Beyond that ...

> What do we do in regard to these for dom0?

... I'm afraid I have no good idea at the moment.

Jan