Re: [PATCH for-4.19 v3 2/3] xen: enable altp2m at create domain domctl

2024-05-22 Thread Jan Beulich
On 22.05.2024 18:21, Roger Pau Monné wrote:
> On Wed, May 22, 2024 at 03:34:29PM +0200, Jan Beulich wrote:
>> On 22.05.2024 15:16, Roger Pau Monné wrote:
>>> On Tue, May 21, 2024 at 12:30:32PM +0200, Jan Beulich wrote:
 On 17.05.2024 15:33, Roger Pau Monne wrote:
> Enabling it using an HVM param is fragile, and complicates the logic when
> deciding whether options that interact with altp2m can also be enabled.
>
> Leave the HVM param value for consumption by the guest, but prevent it from
> being set.  Enabling is now done using an additional altp2m specific field in
> xen_domctl_createdomain.
>
> Note that albeit only currently implemented in x86, altp2m could be implemented
> in other architectures, hence why the field is added to xen_domctl_createdomain
> instead of xen_arch_domainconfig.
>
> Signed-off-by: Roger Pau Monné 

 Reviewed-by: Jan Beulich  # hypervisor
 albeit with one question:

> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -637,6 +637,8 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
>  bool hap = config->flags & XEN_DOMCTL_CDF_hap;
>  bool nested_virt = config->flags & XEN_DOMCTL_CDF_nested_virt;
>  unsigned int max_vcpus;
> +unsigned int altp2m_mode = MASK_EXTR(config->altp2m_opts,
> + XEN_DOMCTL_ALTP2M_mode_mask);
>  
>  if ( hvm ? !hvm_enabled : !IS_ENABLED(CONFIG_PV) )
>  {
> @@ -715,6 +717,26 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
>  return -EINVAL;
>  }
>  
> +if ( config->altp2m_opts & ~XEN_DOMCTL_ALTP2M_mode_mask )
> +{
> +dprintk(XENLOG_INFO, "Invalid altp2m options selected: %#x\n",
> +config->flags);
> +return -EINVAL;
> +}
> +
> +if ( altp2m_mode && nested_virt )
> +{
> +dprintk(XENLOG_INFO,
> +"Nested virt and altp2m are not supported together\n");
> +return -EINVAL;
> +}
> +
> +if ( altp2m_mode && !hap )
> +{
> +dprintk(XENLOG_INFO, "altp2m is only supported with HAP\n");
> +return -EINVAL;
> +}

 Should this last one perhaps be further extended to permit altp2m with EPT
 only?
>>>
>>> Hm, yes, that would be more accurate as:
>>>
>>> if ( altp2m_mode && (!hap || !hvm_altp2m_supported()) )
>>
>> Wouldn't
>>
>>if ( altp2m_mode && !hvm_altp2m_supported() )
>>
>> suffice? hvm_funcs.caps.altp2m is not supposed to be set when no HAP,
>> as long as HAP continues to be a pre-condition?
> 
> No, `hap` here signals whether the domain is using HAP, and we need to
> take this into account, otherwise we would allow enabling altp2m for
> domains using shadow.

Oh, right. But then the original form is good enough HAP-wise, as a request
to use HAP when HAP isn't available is dealt with elsewhere. The
!hvm_altp2m_supported() is still wanted imo (for there potentially being
other restrictions), but then in a separate check, not resulting in a HAP-
specific log message. I'll commit the patch in its original form, and that
further addition can then be an incremental change.

Jan
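
The outcome of the exchange above can be sketched as a small standalone C model: the HAP check stays as in the original patch, and the hardware capability check is a separate condition. This is only an illustration under stated assumptions. The `model_*` and `MODEL_*` names, the flag bit positions, and the mask value are hypothetical stand-ins, not the real Xen definitions, and `model_altp2m_supported` merely plays the role of hvm_altp2m_supported().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for XEN_DOMCTL_CDF_* and the altp2m mode mask. */
#define MODEL_CDF_HAP          (1u << 0)
#define MODEL_CDF_NESTED_VIRT  (1u << 1)
#define MODEL_ALTP2M_MODE_MASK 0x3u

struct model_createdomain {
    uint32_t flags;
    uint32_t altp2m_opts;
};

/* Stub for the hardware capability query (assumption, not Xen code). */
static bool model_altp2m_supported = true;

static int model_sanitise_altp2m(const struct model_createdomain *config)
{
    bool hap, nested_virt;
    unsigned int altp2m_mode;

    assert(config != NULL);

    hap = config->flags & MODEL_CDF_HAP;
    nested_virt = config->flags & MODEL_CDF_NESTED_VIRT;
    altp2m_mode = config->altp2m_opts & MODEL_ALTP2M_MODE_MASK;

    /* Reject any option bits outside the mode field. */
    if ( config->altp2m_opts & ~MODEL_ALTP2M_MODE_MASK )
        return -EINVAL;

    /* altp2m and nested virt are mutually exclusive. */
    if ( altp2m_mode && nested_virt )
        return -EINVAL;

    /* Per-domain check: the domain itself must be using HAP. */
    if ( altp2m_mode && !hap )
        return -EINVAL;

    /* Separate, non-HAP-specific check for hardware support. */
    if ( altp2m_mode && !model_altp2m_supported )
        return -EINVAL;

    return 0;
}
```

Keeping the last two conditions apart means each failure can carry its own log message, which is the point made in the reply above.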



Re: [PATCH 5.10] x86/xen: Drop USERGS_SYSRET64 paravirt call

2024-05-22 Thread Greg Kroah-Hartman
On Wed, May 22, 2024 at 06:20:15PM -0700, Pawan Gupta wrote:
> From: Juergen Gross 
> 
> commit afd30525a659ac0ae0904f0cb4a2ca75522c3123 upstream.

Now queued up, thanks.

greg k-h



[linux-linus test] 186072: regressions - FAIL

2024-05-22 Thread osstest service owner
flight 186072 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186072/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-xsm               6 xen-build  fail REGR. vs. 186052
 build-amd64                   6 xen-build  fail REGR. vs. 186052
 build-i386                    6 xen-build  fail REGR. vs. 186052
 test-armhf-armhf-xl           8 xen-boot   fail REGR. vs. 186052
 build-i386-xsm                6 xen-build  fail REGR. vs. 186052
 test-armhf-armhf-libvirt-vhd  8 xen-boot   fail REGR. vs. 186052

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-vhd   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-shadow  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-rtds  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-raw   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ws16-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemut-ws16-amd64  1 build-check(1) blocked n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemut-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemut-debianhvm-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qcow2 1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvshim  1 build-check(1)   blocked  n/a
 build-i386-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvhv2-intel  1 build-check(1)   blocked  n/a
 test-amd64-amd64-dom0pvh-xl-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvhv2-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-dom0pvh-xl-intel  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-amd64-amd64-examine  1 build-check(1)   blocked  n/a
 test-amd64-amd64-examine-bios  1 build-check(1)   blocked  n/a
 test-amd64-amd64-examine-uefi  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-credit2   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-credit1   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-xl   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-nested-intel  1 build-check(1)  blocked n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-amd64-pair 1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-nested-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-pygrub   1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-freebsd11-amd64  1 build-check(1)   blocked n/a
 test-amd64-amd64-qemuu-freebsd12-amd64  1 build-check(1)   blocked n/a
 test-amd64-amd64-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-coresched-amd64-xl  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt   16 saverestore-support-check fail blocked in 186052
 test-armhf-armhf-xl-qcow2 8 xen-boot fail  like 186052
 test-arm64-arm64-xl  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl  16 saverestore-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkf

[xen-4.17-testing bisection] complete build-i386-xsm

2024-05-22 Thread osstest service owner
branch xen-4.17-testing
xenbranch xen-4.17-testing
job build-i386-xsm
testid xen-build

Tree: ovmf git://xenbits.xen.org/osstest/ovmf.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: seabios git://xenbits.xen.org/osstest/seabios.git
Tree: xen git://xenbits.xen.org/xen.git

*** Found and reproduced problem changeset ***

  Bug is in tree:  ovmf git://xenbits.xen.org/osstest/ovmf.git
  Bug introduced:  750d763623fd1ff4a69d2e350310333dcbc19d4f
  Bug not present: c3f615a1bd7d64f42e7962f5a4d53f1f1a4423e6
  Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/186102/


  commit 750d763623fd1ff4a69d2e350310333dcbc19d4f
  Author: Wenxing Hou 
  Date:   Thu Apr 18 17:28:15 2024 +0800
  
  SecurityPkg: add DeviceSecurity support
  
  This patch implement the SpdmSecurityLib,
  which is the core of DeviceSecurity.
  And the SpdmSecurityLib include Device Authentication and Measurement.
  The other library is to support SpdmSecurityLib.
  
  Cc: Jiewen Yao 
  Signed-off-by: Wenxing Hou 
  Reviewed-by: Jiewen Yao 


For bisection revision-tuple graph see:
   
http://logs.test-lab.xenproject.org/osstest/results/bisect/xen-4.17-testing/build-i386-xsm.xen-build.html
Revision IDs in each graph node refer, respectively, to the Trees above.


Running cs-bisection-step 
--graph-out=/home/logs/results/bisect/xen-4.17-testing/build-i386-xsm.xen-build 
--summary-out=tmp/186102.bisection-summary --basis-template=185864 
--blessings=real,real-bisect,real-retry xen-4.17-testing build-i386-xsm 
xen-build
Searching for failure / basis pass:
 186069 fail [host=pinot0] / 186063 [host=albana0] 185864 [host=pinot1] 185711 
[host=nobling0] 185446 [host=huxelrebe0] 185400 [host=italia0] 185318 
[host=huxelrebe0] 185300 [host=italia0] 185284 [host=huxelrebe0] 185217 ok.
Failure / basis pass flights: 186069 / 185217
(tree with no url: minios)
Tree: ovmf git://xenbits.xen.org/osstest/ovmf.git
Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: seabios git://xenbits.xen.org/osstest/seabios.git
Tree: xen git://xenbits.xen.org/xen.git
Latest 7142e648416ff5d3eac6c6d607874805f5de0ca8 
3d273dd05e51e5a1ffba3d98c7437ee84e8f8764 
ffb451126550b22b43b62fb8731a0d78e3376c03 
e5f2e4c69643bc3cd385306a9e5d29e11578148c 
3c7c9225ffa5605bf0603f9dd1666f3f786e2c44
Basis pass 8f698f0a646124ede518d3e255ef725de1239639 
3d273dd05e51e5a1ffba3d98c7437ee84e8f8764 
ffb451126550b22b43b62fb8731a0d78e3376c03 
1588fd1437960d94cadc30c42243671e8c0f1281 
9bc40dbcf9eafccc1923b2555286bf6a2af03b7a
Generating revisions with ./adhoc-revtuple-generator  
git://xenbits.xen.org/osstest/ovmf.git#8f698f0a646124ede518d3e255ef725de1239639-7142e648416ff5d3eac6c6d607874805f5de0ca8
 
git://xenbits.xen.org/qemu-xen-traditional.git#3d273dd05e51e5a1ffba3d98c7437ee84e8f8764-3d273dd05e51e5a1ffba3d98c7437ee84e8f8764
 
git://xenbits.xen.org/qemu-xen.git#ffb451126550b22b43b62fb8731a0d78e3376c03-ffb451126550b22b43b62fb8731a0d78e3376c03
 
git://xenbits.xen.org/osstest/seabios.git#1588fd1437960d94cadc30c42243671e8c0f1281-e5f2e4c69643bc3cd385306a9e5d29e11578148c
git://xenbits.xen.org/xen.git#9bc40dbcf9eafccc1923b2555286bf6a2af03b7a-3c7c9225ffa5605bf0603f9dd1666f3f786e2c44
Loaded 12696 nodes in revision graph
Searching for test results:
 185217 pass 8f698f0a646124ede518d3e255ef725de1239639 
3d273dd05e51e5a1ffba3d98c7437ee84e8f8764 
ffb451126550b22b43b62fb8731a0d78e3376c03 
1588fd1437960d94cadc30c42243671e8c0f1281 
9bc40dbcf9eafccc1923b2555286bf6a2af03b7a
 185284 [host=huxelrebe0]
 185300 [host=italia0]
 185318 [host=huxelrebe0]
 185400 [host=italia0]
 185446 [host=huxelrebe0]
 185494 [host=huxelrebe0]
 185514 [host=huxelrebe0]
 185536 [host=fiano0]
 185711 [host=nobling0]
 185864 [host=pinot1]
 186063 [host=albana0]
 186069 fail 7142e648416ff5d3eac6c6d607874805f5de0ca8 
3d273dd05e51e5a1ffba3d98c7437ee84e8f8764 
ffb451126550b22b43b62fb8731a0d78e3376c03 
e5f2e4c69643bc3cd385306a9e5d29e11578148c 
3c7c9225ffa5605bf0603f9dd1666f3f786e2c44
 186088 pass 8f698f0a646124ede518d3e255ef725de1239639 
3d273dd05e51e5a1ffba3d98c7437ee84e8f8764 
ffb451126550b22b43b62fb8731a0d78e3376c03 
1588fd1437960d94cadc30c42243671e8c0f1281 
9bc40dbcf9eafccc1923b2555286bf6a2af03b7a
 186089 fail 7142e648416ff5d3eac6c6d607874805f5de0ca8 
3d273dd05e51e5a1ffba3d98c7437ee84e8f8764 
ffb451126550b22b43b62fb8731a0d78e3376c03 
e5f2e4c69643bc3cd385306a9e5d29e11578148c 
3c7c9225ffa5605bf0603f9dd1666f3f786e2c44
 186090 pass 2b330b57dbe8014c5fa9f10d4cf4ae5923e3b143 
3d273dd05e51e5a1ffba3d98c7437ee84e8f8764 
ffb451126550b22b43b62fb8731a0d78e3376c03 
e5f2e4c69643bc3cd385306a9e5d29e11578148c 
5d9a931fe2c1310dbfd946bbc1e22a177add4f5c
 186091 fail 5f783827bbaa1552edf4386bb71d8d8f471340f5 
3d273dd05e51e5a1ffba3d98c7437ee84e8f8764 
ffb451126550b22b43b62fb8731a0d78e3376c03 
e5f2e4c69643bc3cd385306a9e5d29e11578148c 
effcf70f020ff12d34c80e2

[PATCH] docs/misra: rules for mass adoption

2024-05-22 Thread Stefano Stabellini
This patch adds a bunch of rules to rules.rst that are uncontroversial
and have zero violations in Xen. As such, they have been approved for
adoption.

All the ones that regard the standard library have the link to the
existing footnote in the notes.

Signed-off-by: Stefano Stabellini 

diff --git a/docs/misra/rules.rst b/docs/misra/rules.rst
index 80e5e972ad..d67c74a083 100644
--- a/docs/misra/rules.rst
+++ b/docs/misra/rules.rst
@@ -580,6 +580,11 @@ maintainers if you want to suggest a change.
  - The relational operators > >= < and <= shall not be applied to objects 
of pointer type except where they point into the same object
  -
 
+   * - `Rule 18.8 
`_
+ - Required
+ - Variable-length array types shall not be used
+ -
+
* - `Rule 19.1 
`_
  - Mandatory
  - An object shall not be assigned or copied to an overlapping
@@ -589,11 +594,29 @@ maintainers if you want to suggest a change.
instances where Eclair is unable to verify that the code is valid
in regard to Rule 19.1. Caution reports are not violations.
 
+   * - `Rule 20.2 
`_
+ - Required
+ - The ', " or \ characters and the /* or // character sequences
+   shall not occur in a header file name
+ -
+
+   * - `Rule 20.3 
`_
+ - Required
+ - The #include directive shall be followed by either a <filename>
+   or "filename" sequence
+ -
+
* - `Rule 20.4 
`_
  - Required
  - A macro shall not be defined with the same name as a keyword
  -
 
+   * - `Rule 20.6 
`_
+ - Required
+ - Tokens that look like a preprocessing directive shall not occur
+   within a macro argument
+ -
+
* - `Rule 20.7 
`_
  - Required
  - Expressions resulting from the expansion of macro parameters
@@ -609,6 +632,12 @@ maintainers if you want to suggest a change.
evaluation
  -
 
+   * - `Rule 20.11 
`_
+ - Required
+ - A macro parameter immediately following a # operator shall not
+   immediately be followed by a ## operator
+ -
+
* - `Rule 20.12 
`_
  - Required
  - A macro parameter used as an operand to the # or ## operators,
@@ -651,11 +680,39 @@ maintainers if you want to suggest a change.
declared
  - See comment for Rule 21.1
 
+   * - `Rule 21.3 
`_
+ - Required
+ - The memory allocation and deallocation functions of <stdlib.h>
+   shall not be used
+ - Xen doesn't provide, use, or link against a Standard Library 
[#xen-stdlib]_
+
+   * - `Rule 21.4 
`_
+ - Required
+ - The standard header file <setjmp.h> shall not be used
+ - Xen doesn't provide, use, or link against a Standard Library 
[#xen-stdlib]_
+
+   * - `Rule 21.5 
`_
+ - Required
+ - The standard header file <signal.h> shall not be used
+ - Xen doesn't provide, use, or link against a Standard Library 
[#xen-stdlib]_
+
* - `Rule 21.6 
`_
  - Required
  - The Standard Library input/output routines shall not be used
  - Xen doesn't provide, use, or link against a Standard Library 
[#xen-stdlib]_
 
+   * - `Rule 21.7 
`_
+ - Required
+ - The Standard Library functions atof, atoi, atol and atoll of
+   <stdlib.h> shall not be used
+ - Xen doesn't provide, use, or link against a Standard Library 
[#xen-stdlib]_
+
+   * - `Rule 21.8 
`_
+ - Required
+ - The Standard Library functions abort, exit and system of
+   <stdlib.h> shall not be used
+ - Xen doesn't provide, use, or link against a Standard Library 
[#xen-stdlib]_
+
* - `Rule 21.9 
`_
  - Required
  - The library functions bsearch and qsort of <stdlib.h> shall not be used
@@ -666,6 +723,11 @@ maintainers if you want to sugges

[PATCH 5.10] x86/xen: Drop USERGS_SYSRET64 paravirt call

2024-05-22 Thread Pawan Gupta
  */
-   movq %rsp, PER_CPU_VAR(cpu_tss_rw + TSS_sp2)
-   movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
-
-   pushq $__USER_DS
-   pushq PER_CPU_VAR(cpu_tss_rw + TSS_sp2)
-   pushq %r11
-   pushq $__USER_CS
-   pushq %rcx
-
-   pushq $VGCF_in_syscall
-   jmp hypercall_iret
-SYM_CODE_END(xen_sysret64)
-
 /*
  * XEN pv doesn't use trampoline stack, PER_CPU_VAR(cpu_tss_rw + TSS_sp0) is
  * also the kernel stack.  Reusing swapgs_restore_regs_and_return_to_usermode()
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 8695809b88f0..98242430d07e 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -138,8 +138,6 @@ __visible unsigned long xen_read_cr2_direct(void);
 
 /* These are not functions, and cannot be called normally */
 __visible void xen_iret(void);
-__visible void xen_sysret32(void);
-__visible void xen_sysret64(void);
 
 extern int xen_panic_handler_init(void);
 

---
base-commit: ce3838dbefdccfb95a63f81fe6cf77592ae9138c
change-id: 20240522-verw-xen-pv-fix-e638729ac3ca

Best regards,
-- 
Thanks,
Pawan




Re: [PATCH v3 5/8] xen/arm/gic: Allow routing/removing interrupt to running VMs

2024-05-22 Thread Henry Wang

Hi Julien, Stefano,

On 5/22/2024 9:03 PM, Julien Grall wrote:

Hi Henry,

On 22/05/2024 02:22, Henry Wang wrote:
Also, while looking at the locking, I noticed that we are not doing anything
with GIC_IRQ_GUEST_MIGRATING. In gic_update_one_lr(), we seem to assume that
if the flag is set, then p->desc cannot be NULL.

Can we reach vgic_connect_hw_irq() with the flag set?
I think even from the perspective of making the code extra safe, we should
also check GIC_IRQ_GUEST_MIGRATING as the LR is allocated for this case. I
will also add the check of GIC_IRQ_GUEST_MIGRATING here.

Yes. I think it might be easier to check for GIC_IRQ_GUEST_MIGRATING
early and return error immediately in that case. Otherwise, we can
continue and take spin_lock(&v_target->arch.vgic.lock) because no
migration is in progress


Ok, this makes sense to me, I will add

 if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
 {
 vgic_unlock_rank(v_target, rank, flags);
 return -EBUSY;
 }

right after taking the vgic rank lock.
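
As a rough illustration of the control flow proposed here, the following standalone C model bails out with -EBUSY while a migration is pending and makes sure the rank lock is dropped on that path. It is a sketch under assumptions, not Xen code: the `model_*` types, the bit number, and the lock-depth counter are hypothetical and only stand in for the vgic rank lock and p->status.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical bit number; the real GIC_IRQ_GUEST_MIGRATING may differ. */
#define MODEL_IRQ_GUEST_MIGRATING 4

struct model_rank {
    int lock_depth;          /* 0 = unlocked, 1 = held */
};

struct model_pending_irq {
    unsigned long status;    /* bit field, like p->status */
};

static int model_connect_hw_irq(struct model_rank *rank,
                                struct model_pending_irq *p)
{
    /* The caller must not already hold the rank lock in this model. */
    assert(rank->lock_depth == 0);

    rank->lock_depth++;      /* models vgic_lock_rank() */

    /* Early exit: refuse to touch an interrupt that is being migrated. */
    if ( p->status & (1UL << MODEL_IRQ_GUEST_MIGRATING) )
    {
        rank->lock_depth--;  /* models vgic_unlock_rank() */
        return -EBUSY;
    }

    /* ... the actual routing work would go here ... */

    rank->lock_depth--;      /* models vgic_unlock_rank() */
    return 0;
}
```

The property worth checking in such a model is that every return path leaves the lock released.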


Summary of our yesterday's discussion on Matrix:
For the split of patch mentioned in...

I think that would be ok. I have to admit, I am still a bit wary about 
allowing to remove interrupts when the domain is running.


I am less concerned about the add part. Do you need the remove part 
now? If not, I would suggest to split in two so we can get the most of 
this series merged for 4.19 and continue to deal with the remove path 
in the background.


...here, I will do that in the next version.


I will answer here to the other reply:

> I don't think so, if I am not mistaken, no LR will be allocated with 
other flags set.


I wasn't necessarily thinking about the LR allocation. I was more 
thinking whether there are any flags that could still be set.


IOW, will the vIRQ be like new once vgic_connect_hw_irq() is successful?

Also, while looking at the flags, I noticed we clear _IRQ_INPROGRESS 
before vgic_connect_hw_irq(). Shouldn't we only clear *after*?


This is a good catch. With the logic of vgic_connect_hw_irq() extended 
to reject the invalid cases, it is indeed safer to clear 
_IRQ_INPROGRESS after a successful vgic_connect_hw_irq(). I will move 
it after.


This brings me to another question. You don't special case a dying 
domain. If the domain is crashing, wouldn't this mean it wouldn't be 
possible to destroy it?


Another good point, thanks. I will try to make a special case of the 
dying domain.


Kind regards,
Henry




Cheers,






[xen-4.17-testing test] 186069: regressions - FAIL

2024-05-22 Thread osstest service owner
flight 186069 xen-4.17-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186069/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-i386-xsm                6 xen-build  fail REGR. vs. 185864

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-check fail  like 185864
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 185864
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 185864
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 185864
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 185864
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 185864
 test-amd64-amd64-libvirt 15 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail  never pass
 test-armhf-armhf-xl  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-raw  14 migrate-support-check fail   never pass
 test-armhf-armhf-xl-raw  15 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check fail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-check fail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-qcow2 14 migrate-support-check fail   never pass
 test-armhf-armhf-xl-qcow2 15 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt-vhd 14 migrate-support-check fail   never pass
 test-armhf-armhf-libvirt-vhd 15 saverestore-support-check fail   never pass

version targeted for testing:
 xen  3c7c9225ffa5605bf0603f9dd1666f3f786e2c44
baseline version:
 xen  effcf70f020ff12d34c80e2abde0ecb00ce92bda

Last test of basis   185864  2024-04-29 08:08:55 Z   23 days
Failing since        186063  2024-05-21 10:06:36 Z    1 days    2 attempts
Testing same since   186069  2024-05-22 01:58:18 Z    0 days    1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Daniel P. Smith 
  Demi Marie Obenour 
  Jan Beulich 
  Jason Andryuk 
  Jason Andryuk 
  Juergen Gross 
  Leigh Brown 
  Roger Pau Monné 
  Ross Lagerwall 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   fail
 bu

[libvirt test] 186070: tolerable all pass - PUSHED

2024-05-22 Thread osstest service owner
flight 186070 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186070/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-check fail  like 186057
 test-amd64-amd64-libvirt 15 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-libvirt 15 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt 16 saverestore-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check fail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail   never pass
 test-arm64-arm64-libvirt-qcow2 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-qcow2 15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-vhd 14 migrate-support-check fail   never pass
 test-armhf-armhf-libvirt-vhd 15 saverestore-support-check fail   never pass

version targeted for testing:
 libvirt  7dda4a03ac77bbe14b12b7b8f3a509a0e09f3129
baseline version:
 libvirt  7c8e606b64c73ca56d7134cb16d01257f39c53ef

Last test of basis   186057  2024-05-21 04:18:53 Z    1 days
Testing same since   186070  2024-05-22 04:20:52 Z    0 days    1 attempts


People who touched revisions under test:
  Han Han 
  Michal Privoznik 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops  pass
 build-arm64-pvops  pass
 build-armhf-pvops  pass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-amd64-libvirt-xsm pass
 test-arm64-arm64-libvirt-xsm pass
 test-amd64-amd64-libvirt pass
 test-arm64-arm64-libvirt pass
 test-armhf-armhf-libvirt pass
 test-amd64-amd64-libvirt-pair pass
 test-amd64-amd64-libvirt-qcow2   pass
 test-arm64-arm64-libvirt-qcow2   pass
 test-amd64-amd64-libvirt-raw pass
 test-arm64-arm64-libvirt-raw pass
 test-amd64-amd64-libvirt-vhd pass
 test-armhf-armhf-libvirt-vhd pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/libvirt.git
   7c8e606b64..7dda4a03ac  7dda4a03ac77bbe14b12b7b8f3a509a0e09f3129 -> 
xen-tested-master



Re: [PATCH v16 1/5] arm/vpci: honor access size when returning an error

2024-05-22 Thread Stewart Hildebrand
On 5/22/24 18:59, Stewart Hildebrand wrote:
> From: Volodymyr Babchuk 
> 
> Guest can try to read config space using different access sizes: 8,
> 16, 32, 64 bits. We need to take this into account when we are
> returning an error back to MMIO handler, otherwise it is possible to
> provide more data than requested: i.e. guest issues LDRB instruction
> to read one byte, but we are writing 0x in the target
> register.
> 
> Signed-off-by: Volodymyr Babchuk 
> Signed-off-by: Stewart Hildebrand 

I forgot to pick up Julien's ack [0].

[0] 
https://lore.kernel.org/xen-devel/8fa02e06-d8dc-4e73-a58e-e4d84b090...@xen.org/

> ---
> v14->v15:
> * re-order so this patch comes before ("xen/arm: translate virtual PCI
>   bus topology for guests")
> * s/access_mask/invalid/
> * add U suffix to 1
> * s/uint8_t/unsigned int/
> * s/uint64_t/register_t/
> * although Julien gave an Acked-by on v14, I omitted it due to the
>   changes made in v15
> 
> v9->10:
> * New patch in v10.
> ---
>  xen/arch/arm/vpci.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
> index 3bc4bb55082a..b63a356bb4a8 100644
> --- a/xen/arch/arm/vpci.c
> +++ b/xen/arch/arm/vpci.c
> @@ -29,6 +29,8 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
>  {
>  struct pci_host_bridge *bridge = p;
>  pci_sbdf_t sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
> +const unsigned int access_size = (1U << info->dabt.size) * 8;
> +const register_t invalid = GENMASK_ULL(access_size - 1, 0);
>  /* data is needed to prevent a pointer cast on 32bit */
>  unsigned long data;
>  
> @@ -39,7 +41,7 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
>  return 1;
>  }
>  
> -*r = ~0ul;
> +*r = invalid;
>  
>  return 0;
>  }
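
To see what the two added lines in the hunk above compute, here is a standalone C sketch that derives the all-ones error value from the access size. It assumes, as the diff suggests, that the size field encodes log2 of the access width in bytes (0 to 3). `MODEL_GENMASK_ULL` and `model_invalid_read_value` are hypothetical names local to this example, and `uint64_t` stands in for register_t.

```c
#include <assert.h>
#include <stdint.h>

/* Local equivalent of a GENMASK_ULL(h, l): bits h..l set, others clear. */
#define MODEL_GENMASK_ULL(h, l) \
    (((~0ULL) << (l)) & (~0ULL >> (63 - (h))))

/* dabt_size is log2 of the access width in bytes: 0, 1, 2 or 3. */
static uint64_t model_invalid_read_value(unsigned int dabt_size)
{
    unsigned int access_size;

    assert(dabt_size <= 3);

    access_size = (1U << dabt_size) * 8;   /* width in bits: 8..64 */

    return MODEL_GENMASK_ULL(access_size - 1, 0);
}
```

With this, a one-byte read that fails yields 0xff rather than a full register of ones, which matches the intent stated in the commit message.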




[PATCH v16 5/5] xen/arm: account IO handlers for emulated PCI MSI-X

2024-05-22 Thread Stewart Hildebrand
From: Oleksandr Andrushchenko 

At the moment, we always allocate an extra 16 slots for IO handlers
(see MAX_IO_HANDLER). So while adding IO trap handlers for the emulated
MSI-X registers we need to explicitly tell that we have additional IO
handlers, so those are accounted.

Signed-off-by: Oleksandr Andrushchenko 
Acked-by: Julien Grall 
Signed-off-by: Volodymyr Babchuk 
Signed-off-by: Stewart Hildebrand 
---
This depends on a constant defined in ("vpci: add initial support for
virtual PCI bus topology"), so cannot be committed without the
dependency.

Since v5:
- optimize with IS_ENABLED(CONFIG_HAS_PCI_MSI) since VPCI_MAX_VIRT_DEV is
  defined unconditionally
New in v5
---
 xen/arch/arm/vpci.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index 516933bebfb3..4779bbfa9be3 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -132,6 +132,8 @@ static int vpci_get_num_handlers_cb(struct domain *d,
 
 unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d)
 {
+unsigned int count;
+
 if ( !has_vpci(d) )
 return 0;
 
@@ -152,7 +154,17 @@ unsigned int domain_vpci_get_num_mmio_handlers(struct domain *d)
  * For guests each host bridge requires one region to cover the
  * configuration space. At the moment, we only expose a single host bridge.
  */
-return 1;
+count = 1;
+
+/*
+ * There's a single MSI-X MMIO handler that deals with both PBA
+ * and MSI-X tables per each PCI device being passed through.
+ * Maximum number of emulated virtual devices is VPCI_MAX_VIRT_DEV.
+ */
+if ( IS_ENABLED(CONFIG_HAS_PCI_MSI) )
+count += VPCI_MAX_VIRT_DEV;
+
+return count;
 }
 
 /*
-- 
2.45.1
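
The accounting done by the guest path of this patch can be sketched in standalone C as follows. This is an illustration under assumptions only: `MODEL_MAX_VIRT_DEV` uses an illustrative value rather than the real VPCI_MAX_VIRT_DEV, the boolean parameters replace has_vpci(d) and IS_ENABLED(CONFIG_HAS_PCI_MSI), and the hardware domain path, which is not visible in the hunk, is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative value only; the real constant is VPCI_MAX_VIRT_DEV. */
#define MODEL_MAX_VIRT_DEV 32u

/* Models the guest (non hardware domain) path shown in the diff. */
static unsigned int model_guest_mmio_handlers(bool has_vpci, bool has_pci_msi)
{
    unsigned int count;

    if ( !has_vpci )
        return 0;

    /* One region covers the config space of the single host bridge. */
    count = 1;

    /* One MSI-X handler (PBA and table) per possible virtual device. */
    if ( has_pci_msi )
        count += MODEL_MAX_VIRT_DEV;

    assert(count >= 1);

    return count;
}
```

The result is an upper bound: it reserves a handler slot for every possible virtual device, whether or not that many devices are actually passed through.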




[PATCH v16 4/5] xen/arm: translate virtual PCI bus topology for guests

2024-05-22 Thread Stewart Hildebrand
From: Oleksandr Andrushchenko 

There are three originators for the PCI configuration space access:
1. The domain that owns physical host bridge: MMIO handlers are
there so we can update vPCI register handlers with the values
written by the hardware domain, e.g. physical view of the registers
vs guest's view on the configuration space.
2. Guest access to the passed through PCI devices: we need to properly
map virtual bus topology to the physical one, e.g. pass the configuration
space access to the corresponding physical devices.
3. Emulated host PCI bridge access. It doesn't exist in the physical
topology, e.g. it can't be mapped to some physical host bridge.
So, all access to the host bridge itself needs to be trapped and
emulated.
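
As a rough illustration of the address decoding that all three cases
rely on, the sketch below splits an offset into an ECAM window into
bus, device, function and register using the standard PCIe ECAM bit
layout. The struct, the function name and the base address used in any
call are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for a decoded config-space address. */
struct ecam_addr {
    uint8_t bus;   /* bits 27:20 of the ECAM offset */
    uint8_t dev;   /* bits 19:15 */
    uint8_t fn;    /* bits 14:12 */
    uint16_t reg;  /* bits 11:0, byte offset in the function's config space */
};

/* Split (gpa - ecam_base) into bus/device/function/register. */
static struct ecam_addr ecam_decode(uint64_t gpa, uint64_t ecam_base)
{
    struct ecam_addr a;
    uint64_t off;

    /* The address must lie at or above the start of the window. */
    assert(gpa >= ecam_base);
    off = gpa - ecam_base;

    a.bus = (off >> 20) & 0xff;
    a.dev = (off >> 15) & 0x1f;
    a.fn  = (off >> 12) & 0x7;
    a.reg = off & 0xfff;

    return a;
}
```

For a guest access (case 2), the decoded bus/device/function would be a
virtual one that still has to be translated to a physical device before
the access can be forwarded.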

Signed-off-by: Oleksandr Andrushchenko 
Signed-off-by: Volodymyr Babchuk 
Signed-off-by: Stewart Hildebrand 
---
In v15:
- base on top of ("arm/vpci: honor access size when returning an error")
In v11:
- Fixed format issues
- Added ASSERT_UNREACHABLE() to the dummy implementation of
vpci_translate_virtual_device()
- Moved variable in vpci_sbdf_from_gpa(), now it is easier to follow
the logic in the function
Since v9:
- Comment about required lock replaced with ASSERT()
- Style fixes
- call to vpci_translate_virtual_device folded into vpci_sbdf_from_gpa
Since v8:
- locks moved out of vpci_translate_virtual_device()
Since v6:
- add pcidevs locking to vpci_translate_virtual_device
- update wrt to the new locking scheme
Since v5:
- add vpci_translate_virtual_device for #ifndef CONFIG_HAS_VPCI_GUEST_SUPPORT
  case to simplify ifdefery
- add ASSERT(!is_hardware_domain(d)); to vpci_translate_virtual_device
- reset output register on failed virtual SBDF translation
Since v4:
- indentation fixes
- constify struct domain
- updated commit message
- updates to the new locking scheme (pdev->vpci_lock)
Since v3:
- revisit locking
- move code to vpci.c
Since v2:
 - pass struct domain instead of struct vcpu
 - constify arguments where possible
 - gate relevant code with CONFIG_HAS_VPCI_GUEST_SUPPORT
New in v2
---
 xen/arch/arm/vpci.c | 45 -
 xen/drivers/vpci/vpci.c | 24 ++
 xen/include/xen/vpci.h  | 12 +++
 3 files changed, 71 insertions(+), 10 deletions(-)

diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index b63a356bb4a8..516933bebfb3 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -7,33 +7,53 @@
 
 #include 
 
-static pci_sbdf_t vpci_sbdf_from_gpa(const struct pci_host_bridge *bridge,
- paddr_t gpa)
+static bool vpci_sbdf_from_gpa(struct domain *d,
+   const struct pci_host_bridge *bridge,
+   paddr_t gpa, pci_sbdf_t *sbdf)
 {
-pci_sbdf_t sbdf;
+bool translated = true;
+
+ASSERT(sbdf);
 
 if ( bridge )
 {
-sbdf.sbdf = VPCI_ECAM_BDF(gpa - bridge->cfg->phys_addr);
-sbdf.seg = bridge->segment;
-sbdf.bus += bridge->cfg->busn_start;
+sbdf->sbdf = VPCI_ECAM_BDF(gpa - bridge->cfg->phys_addr);
+sbdf->seg = bridge->segment;
+sbdf->bus += bridge->cfg->busn_start;
 }
 else
-sbdf.sbdf = VPCI_ECAM_BDF(gpa - GUEST_VPCI_ECAM_BASE);
+{
+/*
+ * For the passed through devices we need to map their virtual SBDF
+ * to the physical PCI device being passed through.
+ */
+sbdf->sbdf = VPCI_ECAM_BDF(gpa - GUEST_VPCI_ECAM_BASE);
+read_lock(&d->pci_lock);
+translated = vpci_translate_virtual_device(d, sbdf);
+read_unlock(&d->pci_lock);
+}
 
-return sbdf;
+return translated;
 }
 
 static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
   register_t *r, void *p)
 {
 struct pci_host_bridge *bridge = p;
-pci_sbdf_t sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
+pci_sbdf_t sbdf;
 const unsigned int access_size = (1U << info->dabt.size) * 8;
 const register_t invalid = GENMASK_ULL(access_size - 1, 0);
 /* data is needed to prevent a pointer cast on 32bit */
 unsigned long data;
 
+ASSERT(!bridge == !is_hardware_domain(v->domain));
+
+if ( !vpci_sbdf_from_gpa(v->domain, bridge, info->gpa, &sbdf) )
+{
+*r = invalid;
+return 1;
+}
+
 if ( vpci_ecam_read(sbdf, ECAM_REG_OFFSET(info->gpa),
 1U << info->dabt.size, &data) )
 {
@@ -50,7 +70,12 @@ static int vpci_mmio_write(struct vcpu *v, mmio_info_t *info,
register_t r, void *p)
 {
 struct pci_host_bridge *bridge = p;
-pci_sbdf_t sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
+pci_sbdf_t sbdf;
+
+ASSERT(!bridge == !is_hardware_domain(v->domain));
+
+if ( !vpci_sbdf_from_gpa(v->domain, bridge, info->gpa, &sbdf) )
+return 1;
 
 return vpci_ecam_write(sbdf, ECAM_REG_OFFSET(info->gpa),
1U << info->dabt.size, r);
diff --git a/xen/driv

[PATCH v16 3/5] vpci: add initial support for virtual PCI bus topology

2024-05-22 Thread Stewart Hildebrand
From: Oleksandr Andrushchenko 

Assign SBDF to the PCI devices being passed through with bus 0.
The resulting topology is where PCIe devices reside on the bus 0 of the
root complex itself (embedded endpoints).
This implementation is limited to 32 devices which are allowed on
a single PCI bus.

Please note that at the moment only function 0 of a multifunction
device can be passed through.
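
The slot allocation described above can be sketched in standalone C as
follows. This is only an illustration under simplifying assumptions: a
plain 32-bit word stands in for the per-domain bitmap, and the names
guest_bus, assign_virtual_slot and MAX_VIRT_DEV are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_VIRT_DEV 32U  /* a single PCI bus has 32 device slots */

/* Hypothetical per-guest state: bit N set means virtual slot N is in use. */
struct guest_bus {
    uint32_t assigned_map;
};

/*
 * Pick the lowest free slot on virtual bus 0 and mark it as used.
 * Returns the slot number, or -1 if the physical function is not 0
 * or all 32 slots are already taken.
 */
static int assign_virtual_slot(struct guest_bus *g, unsigned int phys_fn)
{
    unsigned int slot;

    assert(g);

    /* Only function 0 can be passed through in this scheme. */
    if ( phys_fn != 0 )
        return -1;

    for ( slot = 0; slot < MAX_VIRT_DEV; slot++ )
    {
        if ( !(g->assigned_map & (UINT32_C(1) << slot)) )
        {
            g->assigned_map |= UINT32_C(1) << slot;
            return (int)slot;
        }
    }

    return -1;
}
```

The guest-visible address of the device is then segment 0, bus 0, the
returned slot, function 0.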

Signed-off-by: Oleksandr Andrushchenko 
Signed-off-by: Volodymyr Babchuk 
Signed-off-by: Stewart Hildebrand 
Acked-by: Jan Beulich 
---
In v16:
- s/add_virtual_device/assign_virtual_sbdf/
- move ASSERT(rw_is_write_locked(&pdev->domain->pci_lock)) earlier
- add #define INVALID_GUEST_SBDF
In v15:
- add Jan's A-b
In v13:
- s/depends on/select/ in Kconfig
- check pdev->sbdf.fn instead of two booleans in add_virtual_device()
- comment #endifs in sched.h
- clarify comment about limits in vpci.h with seg/bus limit
In v11:
- Fixed code formatting
- Removed bogus write_unlock() call
- Fixed type for new_dev_number
In v10:
- Removed ASSERT(pcidevs_locked())
- Removed redundant code (local sbdf variable, clearing sbdf during
device removal, etc)
- Added __maybe_unused attribute to "out:" label
- Introduced HAS_VPCI_GUEST_SUPPORT Kconfig option, as this is the
  first patch where it is used (previously was in "vpci: add hooks for
  PCI device assign/de-assign")
In v9:
- Lock in add_virtual_device() replaced with ASSERT (thanks, Stewart)
In v8:
- Added write lock in add_virtual_device
Since v6:
- re-work wrt new locking scheme
- OT: add ASSERT(pcidevs_write_locked()); to add_virtual_device()
Since v5:
- s/vpci_add_virtual_device/add_virtual_device and make it static
- call add_virtual_device from vpci_assign_device and do not use
  REGISTER_VPCI_INIT machinery
- add pcidevs_locked ASSERT
- use DECLARE_BITMAP for vpci_dev_assigned_map
Since v4:
- moved and re-worked guest sbdf initializers
- s/set_bit/__set_bit
- s/clear_bit/__clear_bit
- minor comment fix s/Virtual/Guest/
- added VPCI_MAX_VIRT_DEV constant (PCI_SLOT(~0) + 1) which will be used
  later for counting the number of MMIO handlers required for a guest
  (Julien)
Since v3:
 - make use of VPCI_INIT
 - moved all new code to vpci.c which belongs to it
 - changed open-coded 31 to PCI_SLOT(~0)
 - added comments and code to reject multifunction devices with
   functions other than 0
 - updated comment about vpci_dev_next and made it unsigned int
 - implement roll back in case of error while assigning/deassigning devices
 - s/dom%pd/%pd
Since v2:
 - remove casts that are (a) malformed and (b) unnecessary
 - add new line for better readability
 - remove CONFIG_HAS_VPCI_GUEST_SUPPORT ifdef's as the relevant vPCI
functions are now completely gated with this config
 - gate common code with CONFIG_HAS_VPCI_GUEST_SUPPORT
New in v2
---
 xen/drivers/Kconfig |  4 +++
 xen/drivers/vpci/vpci.c | 57 +
 xen/include/xen/sched.h | 10 +++-
 xen/include/xen/vpci.h  | 13 ++
 4 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/Kconfig b/xen/drivers/Kconfig
index db94393f47a6..20050e9bb8b3 100644
--- a/xen/drivers/Kconfig
+++ b/xen/drivers/Kconfig
@@ -15,4 +15,8 @@ source "drivers/video/Kconfig"
 config HAS_VPCI
bool
 
+config HAS_VPCI_GUEST_SUPPORT
+   bool
+   select HAS_VPCI
+
 endmenu
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 97e115dc5798..1e6aa5d799b9 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -40,6 +40,49 @@ extern vpci_register_init_t *const __start_vpci_array[];
 extern vpci_register_init_t *const __end_vpci_array[];
 #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
 
+#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
+static int assign_virtual_sbdf(struct pci_dev *pdev)
+{
+struct domain *d = pdev->domain;
+unsigned int new_dev_number;
+
+ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
+if ( is_hardware_domain(d) )
+return 0;
+
+/*
+ * Each PCI bus supports 32 devices/slots at max or up to 256 when
+ * there are multi-function ones which are not yet supported.
+ */
+if ( pdev->sbdf.fn )
+{
+gdprintk(XENLOG_ERR, "%pp: only function 0 passthrough supported\n",
+ &pdev->sbdf);
+return -EOPNOTSUPP;
+}
+new_dev_number = find_first_zero_bit(d->vpci_dev_assigned_map,
+ VPCI_MAX_VIRT_DEV);
+if ( new_dev_number == VPCI_MAX_VIRT_DEV )
+return -ENOSPC;
+
+__set_bit(new_dev_number, &d->vpci_dev_assigned_map);
+
+/*
+ * Both segment and bus number are 0:
+ *  - we emulate a single host bridge for the guest, e.g. segment 0
+ *  - with bus 0 the virtual devices are seen as embedded
+ *endpoints behind the root complex
+ *
+ * TODO: add support for multi-function devices.
+ */
+pdev->vpci->guest_sbdf = PCI_SBDF(0, 0, new_dev_number, 0);
+
+return 0;
+}
+
+#endif /* 

[PATCH v16 2/5] vpci/header: emulate PCI_COMMAND register for guests

2024-05-22 Thread Stewart Hildebrand
From: Oleksandr Andrushchenko 

Xen and/or Dom0 may have put values in PCI_COMMAND which they expect
to remain unaltered. PCI_COMMAND_SERR bit is a good example: while the
guest's (domU) view of this will want to be zero (for now), the host
having set it to 1 should be preserved, or else we'd effectively be
giving the domU control of the bit. Thus, PCI_COMMAND register needs
proper emulation in order to honor host's settings.

According to "PCI LOCAL BUS SPECIFICATION, REV. 3.0", section "6.2.2
Device Control" the reset state of the command register is typically 0,
so when assigning a PCI device use 0 as the initial state for the
guest's (domU) view of the command register.

Here is the full list of command register bits with notes about
PCI/PCIe specification, and how Xen handles the bit. QEMU's behavior is
also documented here since that is our current reference implementation
for PCI passthrough.

PCI_COMMAND_IO (bit 0)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware. QEMU sets this bit to 1 in
hardware if an I/O BAR is exposed to the guest.
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP for now since we
don't yet support I/O BARs for domUs.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_MEMORY (bit 1)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware. QEMU sets this bit to 1 in
hardware if a Memory BAR is exposed to the guest.
  Xen domU/dom0: We handle writes to this bit by mapping/unmapping BAR
regions.
  Xen domU: For devices assigned to DomUs, memory decoding will be
disabled at the time of initialization.

PCI_COMMAND_MASTER (bit 2)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: Pass through writes to hardware.
  Xen domU/dom0: Pass through writes to hardware.

PCI_COMMAND_SPECIAL (bit 3)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: RW
  QEMU: Pass through writes to hardware.
  Xen domU/dom0: Pass through writes to hardware.

PCI_COMMAND_INVALIDATE (bit 4)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: RW
  QEMU: Pass through writes to hardware.
  Xen domU/dom0: Pass through writes to hardware.

PCI_COMMAND_VGA_PALETTE (bit 5)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: RW
  QEMU: Pass through writes to hardware.
  Xen domU/dom0: Pass through writes to hardware.

PCI_COMMAND_PARITY (bit 6)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware.
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_WAIT (bit 7)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: hardwire to 0
  QEMU: res_mask
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_SERR (bit 8)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware.
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_FAST_BACK (bit 9)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware.
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_INTX_DISABLE (bit 10)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware. QEMU checks if INTx was mapped
for a device. If it is not, then guest can't control
PCI_COMMAND_INTX_DISABLE bit.
  Xen domU: We prohibit a guest from enabling INTx if MSI(X) is enabled.
  Xen dom0: We allow dom0 to control this bit freely.

Bits 11-15
  PCIe 6.1: RsvdP
  PCI LB 3.0: Reserved
  QEMU: res_mask
  Xen domU: rsvdp_mask
  Xen dom0: We allow dom0 to control these bits freely.
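
To make the RsvdP handling for domUs more concrete, here is a minimal
standalone sketch of how a guest write could be merged with the value
currently in hardware so that the bits listed as rsvdp_mask keep the
host's setting. The macro and function names are hypothetical, and the
sketch deliberately ignores the extra handling described above for
PCI_COMMAND_MEMORY (bit 1) and PCI_COMMAND_INTX_DISABLE (bit 10).

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as given in the list above. */
#define CMD_IO         (1U << 0)
#define CMD_PARITY     (1U << 6)
#define CMD_WAIT       (1U << 7)
#define CMD_SERR       (1U << 8)
#define CMD_FAST_BACK  (1U << 9)
#define CMD_RESERVED   0xf800U   /* bits 11-15 */

/* Bits the list above marks as rsvdp_mask for a domU. */
#define DOMU_RSVDP_MASK \
    (CMD_IO | CMD_PARITY | CMD_WAIT | CMD_SERR | CMD_FAST_BACK | CMD_RESERVED)

static_assert(DOMU_RSVDP_MASK == 0xfbc1U, "unexpected domU RsvdP mask");

/* Value to write to hardware: host bits preserved, the rest from the guest. */
static uint16_t merge_guest_write(uint16_t hw_cur, uint16_t guest_val,
                                  uint16_t rsvdp_mask)
{
    return (uint16_t)((hw_cur & rsvdp_mask) |
                      (guest_val & (uint16_t)~rsvdp_mask));
}

/* Value the guest reads back: RsvdP bits always appear as zero. */
static uint16_t guest_view(uint16_t guest_cmd, uint16_t rsvdp_mask)
{
    return (uint16_t)(guest_cmd & (uint16_t)~rsvdp_mask);
}
```

With the host having set SERR (0x0100), a guest write of 0xffff would
leave SERR set in hardware while the guest never sees or controls it.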

Signed-off-by: Oleksandr Andrushchenko 
Signed-off-by: Volodymyr Babchuk 
Signed-off-by: Stewart Hildebrand 
Reviewed-by: Jan Beulich 
Reviewed-by: Roger Pau Monné 
---
In v16:
- allow dom0 to freely control RsvdP bits (11-15)
- add Roger's R-b

In v15:
- add Jan's R-b
- add blank line after declaration in msi.c:control_write()

In v14:
- check for 0->1 transition in INTX_DISABLE-setting logic in
  msi.c:control_write() to match msix.c:control_write()
- clear domU-controllable bits in header.c:init_header()

In v13:
- Update right away (don't defer) PCI_COMMAND_MEMORY bit in guest_cmd
  variable in cmd_write()
- Make comment single line in xen/drivers/vpci/msi.c:control_write()
- Rearrange memory decoding disabling snippet in init_header()

In v12:
- Rework patch using vpci_add_register_mask()
- Add bitmask #define in pci_regs.h according to PCIe 6.1 spec, except
  don't add the RO bits because they were RW in PCI LB 3.0 spec.

[PATCH v16 1/5] arm/vpci: honor access size when returning an error

2024-05-22 Thread Stewart Hildebrand
From: Volodymyr Babchuk 

A guest can try to read config space using different access sizes: 8,
16, 32, 64 bits. We need to take this into account when we are
returning an error back to the MMIO handler, otherwise it is possible
to provide more data than requested: e.g. the guest issues an LDRB
instruction to read one byte, but we are writing ~0ul (all ones) in
the target register.

Signed-off-by: Volodymyr Babchuk 
Signed-off-by: Stewart Hildebrand 
---
v14->v15:
* re-order so this patch comes before ("xen/arm: translate virtual PCI
  bus topology for guests")
* s/access_mask/invalid/
* add U suffix to 1
* s/uint8_t/unsigned int/
* s/uint64_t/register_t/
* although Julien gave an Acked-by on v14, I omitted it due to the
  changes made in v15

v9->10:
* New patch in v10.
---
 xen/arch/arm/vpci.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/vpci.c b/xen/arch/arm/vpci.c
index 3bc4bb55082a..b63a356bb4a8 100644
--- a/xen/arch/arm/vpci.c
+++ b/xen/arch/arm/vpci.c
@@ -29,6 +29,8 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
 {
 struct pci_host_bridge *bridge = p;
 pci_sbdf_t sbdf = vpci_sbdf_from_gpa(bridge, info->gpa);
+const unsigned int access_size = (1U << info->dabt.size) * 8;
+const register_t invalid = GENMASK_ULL(access_size - 1, 0);
 /* data is needed to prevent a pointer cast on 32bit */
 unsigned long data;
 
@@ -39,7 +41,7 @@ static int vpci_mmio_read(struct vcpu *v, mmio_info_t *info,
 return 1;
 }
 
-*r = ~0ul;
+*r = invalid;
 
 return 0;
 }
-- 
2.45.1
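
The masking done by the patch above can be sketched in standalone C as
follows. The function name is hypothetical, and the mask is open-coded
here as a stand-in for GENMASK_ULL(); size_log2 plays the role of
info->dabt.size, which the patch uses as a log2 byte count.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the "all ones" value for a failed read of the given width.
 * size_log2: 0 -> 1 byte, 1 -> 2 bytes, 2 -> 4 bytes, 3 -> 8 bytes.
 */
static uint64_t invalid_read_value(unsigned int size_log2)
{
    unsigned int bits;

    assert(size_log2 <= 3);
    bits = (1U << size_log2) * 8;

    /* Shifting a 64-bit value by 64 is undefined in C, so special-case it. */
    return (bits == 64) ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}
```

A one-byte read that fails therefore yields 0xff rather than a fully
set 64-bit register.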




[PATCH v16 0/5] PCI devices passthrough on Arm, part 3

2024-05-22 Thread Stewart Hildebrand
This is the next version of the vPCI rework. The aim of this series is
to prepare the ground for introducing PCI support on the ARM platform.

in v16:
 - minor updates - see individual patches

in v15:
 - reorder so ("arm/vpci: honor access size when returning an error")
   comes first

in v14:
 - drop first 9 patches as they were committed
 - updated ("vpci/header: emulate PCI_COMMAND register for guests")

in v13:
 - drop ("xen/arm: vpci: permit access to guest vpci space") as it was
   unnecessary

in v12:
 - I (Stewart) coordinated with Volodymyr to send this whole series. So,
   add my (Stewart) Signed-off-by to all patches.
 - The biggest change is to re-work the PCI_COMMAND register patch.
   Additional feedback has also been addressed - see individual patches.
 - Drop ("pci: msi: pass pdev to pci_enable_msi() function") and
   ("pci: introduce per-domain PCI rwlock") as they were committed
 - Rename ("rangeset: add rangeset_empty() function")
   to ("rangeset: add rangeset_purge() function")
 - Rename ("vpci/header: rework exit path in init_bars")
   to ("vpci/header: rework exit path in init_header()")

in v11:
 - Added my (Volodymyr) Signed-off-by tag to all patches
 - Patch "vpci/header: emulate PCI_COMMAND register for guests" is in
   intermediate state, because it was agreed to rework it once Stewart's
   series on register handling are in.
 - Addressed comments, please see patch descriptions for details.

in v10:

 - Removed patch ("xen/arm: vpci: check guest range"), proper fix
   for the issue is part of ("vpci/header: emulate PCI_COMMAND
   register for guests")
 - Removed patch ("pci/header: reset the command register when adding
   devices")
 - Added patch ("rangeset: add rangeset_empty() function") because
   this function is needed in ("vpci/header: handle p2m range sets
   per BAR")
 - Added ("vpci/header: handle p2m range sets per BAR") which addressed
   an issue discovered by Andrii Chepurnyi during virtio integration
 - Added ("pci: msi: pass pdev to pci_enable_msi() function"), which is
   prereq for ("pci: introduce per-domain PCI rwlock")
 - Fixed "Since v9/v8/... " comments in changelogs to reduce confusion.
   I left "Since" entries for older versions, because they were added
   by original author of the patches.

in v9:

v9 addresses comments from the previous version. It also introduces a
couple of patches from Stewart. These patches are related to vPCI use
on ARM. Patch "vpci/header: rework exit path in init_bars" was
factored out from "vpci/header: handle p2m range sets per BAR".

in v8:

The biggest change from previous, mistakenly named, v7 series is how
locking is implemented. Instead of d->vpci_rwlock we introduce
d->pci_lock which has broader scope, as it protects not only domain's
vpci state, but domain's list of PCI devices as well.

As we discussed on IRC with Roger, it is not feasible to rework all
the existing code to use the new lock right away. It was agreed that
any write access to d->pdev_list will be protected by **both**
d->pci_lock in write mode and pcidevs_lock(). Read access, on the
other hand, should be protected by either d->pci_lock in read mode or
pcidevs_lock(). It is expected that existing code will use
pcidevs_lock() and new users will use the new rw lock. Of course, this
does not mean that new users shall not use pcidevs_lock() when it is
appropriate.

Changes from previous versions are described in each separate patch.

Oleksandr Andrushchenko (4):
  vpci/header: emulate PCI_COMMAND register for guests
  vpci: add initial support for virtual PCI bus topology
  xen/arm: translate virtual PCI bus topology for guests
  xen/arm: account IO handlers for emulated PCI MSI-X

Volodymyr Babchuk (1):
  arm/vpci: honor access size when returning an error

 xen/arch/arm/vpci.c| 63 +++--
 xen/drivers/Kconfig|  4 ++
 xen/drivers/vpci/header.c  | 60 +---
 xen/drivers/vpci/msi.c |  9 +
 xen/drivers/vpci/msix.c|  7 
 xen/drivers/vpci/vpci.c| 81 ++
 xen/include/xen/pci_regs.h |  1 +
 xen/include/xen/sched.h| 10 -
 xen/include/xen/vpci.h | 28 +
 9 files changed, 244 insertions(+), 19 deletions(-)


base-commit: ced21fbb2842ac4655048bdee56232974ff9ff9c
-- 
2.45.1




Re: [XEN PATCH v4 2/3] x86/MCE: add default switch case in init_nonfatal_mce_checker()

2024-05-22 Thread Stefano Stabellini
On Wed, 22 May 2024, Sergiy Kibrik wrote:
> The default switch case block is wanted here, to handle situation
> e.g. of unexpected c->x86_vendor value -- then no mcheck init is done, but
> misleading message still gets logged anyway.
> 
> Signed-off-by: Sergiy Kibrik 
> CC: Jan Beulich 

Reviewed-by: Stefano Stabellini 


> ---
> changes in v4:
>  - return 0 instead of -ENODEV and put a comment
>  - update description a bit
> ---
>  xen/arch/x86/cpu/mcheck/non-fatal.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/xen/arch/x86/cpu/mcheck/non-fatal.c 
> b/xen/arch/x86/cpu/mcheck/non-fatal.c
> index 33cacd15c2..5a53bcd0b7 100644
> --- a/xen/arch/x86/cpu/mcheck/non-fatal.c
> +++ b/xen/arch/x86/cpu/mcheck/non-fatal.c
> @@ -29,9 +29,14 @@ static int __init cf_check init_nonfatal_mce_checker(void)
>   /* Assume we are on K8 or newer AMD or Hygon CPU here */
>   amd_nonfatal_mcheck_init(c);
>   break;
> +
>   case X86_VENDOR_INTEL:
>   intel_nonfatal_mcheck_init(c);
>   break;
> +
> + default:
> + /* unhandled vendor isn't really an error */
> + return 0;
>   }
>   printk(KERN_INFO "mcheck_poll: Machine check polling timer started.\n");
>   return 0;
> -- 
> 2.25.1
> 



[for-4.19] Re: [XEN PATCH v3] arm/mem_access: add conditional build of mem_access.c

2024-05-22 Thread Julien Grall

Hi,

Adding Oleksii as the release manager.

On 22/05/2024 19:27, Tamas K Lengyel wrote:

On Fri, May 10, 2024 at 8:32 AM Alessandro Zucchelli
 wrote:


In order to comply to MISRA C:2012 Rule 8.4 for ARM the following
changes are done:
revert preprocessor conditional changes to xen/mem_access.h which
had it build unconditionally, add conditional build for xen/mem_access.c
as well and provide stubs in asm/mem_access.h for the users of this
header.

Signed-off-by: Alessandro Zucchelli 


Acked-by: Tamas K Lengyel 


Oleksii, would you be happy if this patch is committed for 4.19?

BTW, do you want to release-ack every bug until the hard code freeze?
Or would you be fine to leave the decision to the maintainers?


Cheers,

--
Julien Grall



Re: [PATCH v3 3/7] xen/p2m: put reference for level 2 superpage

2024-05-22 Thread Julien Grall




On 22/05/2024 14:47, Luca Fancellu wrote:

Hi Julien,


Hi Luca,


On 22 May 2024, at 14:25, Julien Grall  wrote:


diff --git a/xen/arch/arm/mmu/p2m.c b/xen/arch/arm/mmu/p2m.c
index 41fcca011cf4..b496266deef6 100644
--- a/xen/arch/arm/mmu/p2m.c
+++ b/xen/arch/arm/mmu/p2m.c
@@ -753,17 +753,9 @@ static int p2m_mem_access_radix_set(struct p2m_domain 
*p2m, gfn_t gfn,
  return rc;
  }
  -/*
- * Put any references on the single 4K page referenced by pte.
- * TODO: Handle superpages, for now we only take special references for leaf
- * pages (specifically foreign ones, which can't be super mapped today).
- */
-static void p2m_put_l3_page(const lpae_t pte)
+/* Put any references on the single 4K page referenced by mfn. */
+static void p2m_put_l3_page(mfn_t mfn, p2m_type_t type)
  {
-mfn_t mfn = lpae_get_mfn(pte);
-
-ASSERT(p2m_is_valid(pte));
-
  /*
   * TODO: Handle other p2m types
   *
@@ -771,16 +763,43 @@ static void p2m_put_l3_page(const lpae_t pte)
   * flush the TLBs if the page is reallocated before the end of
   * this loop.
   */
-if ( p2m_is_foreign(pte.p2m.type) )
+if ( p2m_is_foreign(type) )
  {
  ASSERT(mfn_valid(mfn));
  put_page(mfn_to_page(mfn));
  }
  /* Detect the xenheap page and mark the stored GFN as invalid. */
-else if ( p2m_is_ram(pte.p2m.type) && is_xen_heap_mfn(mfn) )
+else if ( p2m_is_ram(type) && is_xen_heap_mfn(mfn) )
  page_set_xenheap_gfn(mfn_to_page(mfn), INVALID_GFN);
  }


All the pages within a 2MB mapping should be the same type. So...


  +/* Put any references on the superpage referenced by mfn. */
+static void p2m_put_l2_superpage(mfn_t mfn, p2m_type_t type)
+{
+unsigned int i;
+
+for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ )
+{
+p2m_put_l3_page(mfn, type);
+
+mfn = mfn_add(mfn, 1);
+}


... this solution is a bit wasteful as we will now call p2m_put_l3_page() 512 
times even though there is nothing to do.

So instead can we move the checks outside to optimize the path a bit?


You mean this?

diff --git a/xen/arch/arm/mmu/p2m.c b/xen/arch/arm/mmu/p2m.c
index b496266deef6..d40cddda48f3 100644
--- a/xen/arch/arm/mmu/p2m.c
+++ b/xen/arch/arm/mmu/p2m.c
@@ -794,7 +794,8 @@ static void p2m_put_page(const lpae_t pte, unsigned int 
level)
  ASSERT(p2m_is_valid(pte));
  
  /* We have a second level 2M superpage */

-if ( p2m_is_superpage(pte, level) && (level == 2) )
+if ( p2m_is_superpage(pte, level) && (level == 2) &&
+ p2m_is_foreign(pte.p2m.type) )
  return p2m_put_l2_superpage(mfn, pte.p2m.type);
  else if ( level == 3 )
  return p2m_put_l3_page(mfn, pte.p2m.type);


I meant something like below. This is untested and to apply on top of 
this patch:


diff --git a/xen/arch/arm/mmu/p2m.c b/xen/arch/arm/mmu/p2m.c
index b496266deef6..60c4d680b417 100644
--- a/xen/arch/arm/mmu/p2m.c
+++ b/xen/arch/arm/mmu/p2m.c
@@ -753,20 +753,27 @@ static int p2m_mem_access_radix_set(struct 
p2m_domain *p2m, gfn_t gfn,

 return rc;
 }

+static void p2m_put_foreign_page(struct page_info *pg)
+{
+/*
+ * It's safe to do the put_page here because page_alloc will
+ * flush the TLBs if the page is reallocated before the end of
+ * this loop.
+ */
+put_page(pg);
+}
+
 /* Put any references on the single 4K page referenced by mfn. */
 static void p2m_put_l3_page(mfn_t mfn, p2m_type_t type)
 {
 /*
  * TODO: Handle other p2m types
  *
- * It's safe to do the put_page here because page_alloc will
- * flush the TLBs if the page is reallocated before the end of
- * this loop.
  */
 if ( p2m_is_foreign(type) )
 {
 ASSERT(mfn_valid(mfn));
-put_page(mfn_to_page(mfn));
+p2m_put_foreign_page(mfn_to_page(mfn));
 }
 /* Detect the xenheap page and mark the stored GFN as invalid. */
 else if ( p2m_is_ram(type) && is_xen_heap_mfn(mfn) )
@@ -777,13 +784,18 @@ static void p2m_put_l3_page(mfn_t mfn, p2m_type_t 
type)

 static void p2m_put_l2_superpage(mfn_t mfn, p2m_type_t type)
 {
 unsigned int i;
+struct page_info *pg;

-for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ )
-{
-p2m_put_l3_page(mfn, type);
+/* TODO: Handle other p2m types */
+if ( !p2m_is_foreign(type) )
+return;

-mfn = mfn_add(mfn, 1);
-}
+ASSERT(mfn_valid(mfn));
+
+pg = mfn_to_page(mfn);
+
+for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++, pg++ )
+p2m_put_foreign_page(pg);
 }

 /* Put any references on the page referenced by pte. */

The type check only happens once. Also, I moved mfn_to_page(...) outside 
of the loop because the operation is expensive. Yet, if the MFNs are 
contiguous, then the page_info structures will be too.
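
The idea can be illustrated outside of Xen with a small standalone
sketch. Everything here (struct page, put_one, put_superpage and the
frame_table array) is a hypothetical stand-in for the real page_info
machinery; it only shows the shape of the loop: one type check, one
lookup, then a walk over contiguous entries.

```c
#include <assert.h>
#include <stddef.h>

#define ENTRIES_PER_TABLE 512U  /* 4K pages covered by one 2M superpage */

/* Hypothetical stand-in for struct page_info. */
struct page {
    unsigned int refcount;
};

/* Drop one reference on a single page. */
static void put_one(struct page *pg)
{
    assert(pg->refcount > 0);
    pg->refcount--;
}

/*
 * Drop one reference on each page backing a superpage, but only when
 * the mapping is foreign. Returns the number of pages processed.
 */
static unsigned int put_superpage(struct page *frame_table, size_t first,
                                  int is_foreign)
{
    struct page *pg;
    unsigned int i;

    /* The type is the same for the whole superpage: check it once. */
    if ( !is_foreign )
        return 0;

    /* Translate the first frame once; the rest are contiguous. */
    pg = &frame_table[first];

    for ( i = 0; i < ENTRIES_PER_TABLE; i++, pg++ )
        put_one(pg);

    return i;
}
```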






Otherwise...


+}
+
+/* Put any references on the page referenced by pte. */
+static void p2m_put_page(const lpae_t pte, unsigned int level)
+{
+mfn_t mfn = lpae_get_mfn(pte);
+
+ASSERT(p2m_i

[xen-4.18-testing test] 186067: tolerable FAIL - PUSHED

2024-05-22 Thread osstest service owner
flight 186067 xen-4.18-testing real [real]
flight 186083 xen-4.18-testing real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/186067/
http://logs.test-lab.xenproject.org/osstest/logs/186083/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-multivcpu 10 host-ping-check-xen fail pass in 186083-retest

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds 10 host-ping-check-xen  fail REGR. vs. 186060

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check  fail in 186083  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check  fail in 186083  never pass
 test-armhf-armhf-libvirt 16 saverestore-support-check  fail  like 186060
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop  fail  like 186060
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop  fail  like 186060
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop  fail  like 186060
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop  fail  like 186060
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2  fail  like 186060
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt 15 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl 15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl 16 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check  fail  never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl 15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl 16 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check  fail  never pass
 test-armhf-armhf-libvirt 15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check  fail  never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-qcow2 14 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-qcow2 15 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check  fail  never pass
 test-armhf-armhf-libvirt-vhd 14 migrate-support-check  fail  never pass
 test-armhf-armhf-libvirt-vhd 15 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-raw 14 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-raw 15 saverestore-support-check  fail  never pass

version targeted for testing:
 xen  01f7a3c792241d348a4e454a30afdf6c0d6cd71c
baseline version:
 xen  7cdb1fa2ab0b5e11f66cada0370770404153c824

Last test of basis   186060  2024-05-21 08:38:31 Z    1 days
Testing same since   186067  2024-05-21 23:40:31 Z    0 days    1 attempts


People who touched revisions under test:
  Jan Beulich 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf  pass
 build-amd64  pass
 build-arm64

Re: [XEN PATCH v3] arm/mem_access: add conditional build of mem_access.c

2024-05-22 Thread Tamas K Lengyel
On Fri, May 10, 2024 at 8:32 AM Alessandro Zucchelli
 wrote:
>
> In order to comply to MISRA C:2012 Rule 8.4 for ARM the following
> changes are done:
> revert preprocessor conditional changes to xen/mem_access.h which
> had it build unconditionally, add conditional build for xen/mem_access.c
> as well and provide stubs in asm/mem_access.h for the users of this
> header.
>
> Signed-off-by: Alessandro Zucchelli 

Acked-by: Tamas K Lengyel 



Re: [XEN PATCH v3] arm/mem_access: add conditional build of mem_access.c

2024-05-22 Thread Nicola Vetrini

On 2024-05-10 22:59, Julien Grall wrote:

Hi,


Hi,



On 10/05/2024 13:32, Alessandro Zucchelli wrote:

In order to comply to MISRA C:2012 Rule 8.4 for ARM the following
changes are done:
revert preprocessor conditional changes to xen/mem_access.h which
had it build unconditionally, add conditional build for 
xen/mem_access.c


I am afraid, I don't understand this one as you don't seem to modify 
xen/mem_access.h. Is this meant to be part of the changelog?


You also don't seem to mention the change in Makefile. This is the one 
I was asking for in the previous version. So what about:


"xen/arm: mem_access: Conditionally compile mem_access.c

Commit 634cfc8beb ("Make MEM_ACCESS configurable") intended to make 
MEM_ACCESS configurable on Arm to reduce the code size when the user 
doesn't need it.


However, this didn't cover the arch specific code. None of the code in 
arm/mem_access.c is necessary when MEM_ACCESS=n, so it can be compiled 
out. This will require to provide some stub for functions called by the 
common code.


This is also fixing violation of the MISRA C:2012 Rule 8.4 reported by 
ECLAIR.

"

The patch itself looks good so once we agree on the commit message, then 
I am happy to update it on commit.


Cheers,


since Julien is ok with the patch, with the commit message he proposed, 
I think this needs an R-by or an A-by in order to commit for 4.19.


--
Nicola Vetrini, BSc
Software Engineer, BUGSENG srl (https://bugseng.com)



Re: [PATCH for-4.19 v3 2/3] xen: enable altp2m at create domain domctl

2024-05-22 Thread Roger Pau Monné
On Wed, May 22, 2024 at 03:34:29PM +0200, Jan Beulich wrote:
> On 22.05.2024 15:16, Roger Pau Monné wrote:
> > On Tue, May 21, 2024 at 12:30:32PM +0200, Jan Beulich wrote:
> >> On 17.05.2024 15:33, Roger Pau Monne wrote:
> >>> Enabling it using an HVM param is fragile, and complicates the logic when
> >>> deciding whether options that interact with altp2m can also be enabled.
> >>>
> >>> Leave the HVM param value for consumption by the guest, but prevent it from
> >>> being set.  Enabling is now done using an additional altp2m specific field in
> >>> xen_domctl_createdomain.
> >>>
> >>> Note that albeit only currently implemented in x86, altp2m could be implemented
> >>> in other architectures, hence why the field is added to xen_domctl_createdomain
> >>> instead of xen_arch_domainconfig.
> >>>
> >>> Signed-off-by: Roger Pau Monné 
> >>
> >> Reviewed-by: Jan Beulich  # hypervisor
> >> albeit with one question:
> >>
> >>> --- a/xen/arch/x86/domain.c
> >>> +++ b/xen/arch/x86/domain.c
> >>> @@ -637,6 +637,8 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
> >>>  bool hap = config->flags & XEN_DOMCTL_CDF_hap;
> >>>  bool nested_virt = config->flags & XEN_DOMCTL_CDF_nested_virt;
> >>>  unsigned int max_vcpus;
> >>> +unsigned int altp2m_mode = MASK_EXTR(config->altp2m_opts,
> >>> + XEN_DOMCTL_ALTP2M_mode_mask);
> >>>  
> >>>  if ( hvm ? !hvm_enabled : !IS_ENABLED(CONFIG_PV) )
> >>>  {
> >>> @@ -715,6 +717,26 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
> >>>  return -EINVAL;
> >>>  }
> >>>  
> >>> +if ( config->altp2m_opts & ~XEN_DOMCTL_ALTP2M_mode_mask )
> >>> +{
> >>> +dprintk(XENLOG_INFO, "Invalid altp2m options selected: %#x\n",
> >>> +config->flags);
> >>> +return -EINVAL;
> >>> +}
> >>> +
> >>> +if ( altp2m_mode && nested_virt )
> >>> +{
> >>> +dprintk(XENLOG_INFO,
> >>> +"Nested virt and altp2m are not supported together\n");
> >>> +return -EINVAL;
> >>> +}
> >>> +
> >>> +if ( altp2m_mode && !hap )
> >>> +{
> >>> +dprintk(XENLOG_INFO, "altp2m is only supported with HAP\n");
> >>> +return -EINVAL;
> >>> +}
> >>
> >> Should this last one perhaps be further extended to permit altp2m with EPT
> >> only?
> > 
> > Hm, yes, that would be more accurate as:
> > 
> > if ( altp2m_mode && (!hap || !hvm_altp2m_supported()) )
> 
> Wouldn't
> 
>if ( altp2m_mode && !hvm_altp2m_supported() )
> 
> suffice? hvm_funcs.caps.altp2m is not supposed to be set when no HAP,
> as long as HAP continues to be a pre-condition?

No, `hap` here signals whether the domain is using HAP, and we need to
take this into account, otherwise we would allow enabling altp2m for
domains using shadow.

Thanks, Roger.
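
The distinction drawn in this reply can be modelled in a few lines. This is a minimal stand-alone sketch, not the Xen code: `hw_altp2m_supported` is a hypothetical stand-in for hvm_altp2m_supported() (a host-wide capability), `hap` models the per-domain flag derived from XEN_DOMCTL_CDF_hap, and `check_altp2m()` is a name invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for hvm_altp2m_supported(): host-wide capability. */
static bool hw_altp2m_supported = true;

/*
 * Model of the check under discussion.  'hap' is a per-domain property,
 * so a host that supports altp2m can still be asked to create a shadow
 * domain, for which altp2m has to be refused.
 */
static int check_altp2m(unsigned int altp2m_mode, bool hap)
{
    if ( altp2m_mode && (!hap || !hw_altp2m_supported) )
        return -EINVAL;

    return 0;
}
```

With only the host-wide test, the call `check_altp2m(1, false)` would succeed on capable hardware, which is the shadow-domain case the reply wants rejected.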



[PATCH v4 2/2] drivers/char: Use sub-page ro API to make just xhci dbc cap RO

2024-05-22 Thread Marek Marczykowski-Górecki
Not the whole page, which may contain other registers too. The XHCI
specification describes DbC as designed to be controlled by a different
driver, but does not mandate placing registers on a separate page. In fact
on Tiger Lake and newer (at least), this page does contain other registers
that Linux tries to use. And with share=yes, a domU would use them too.
Without this patch, PV dom0 would fail to initialize the controller,
while HVM would be killed on EPT violation.

With `share=yes`, this patch gives domU more access to the emulator
(although a HVM with any emulated device already has plenty of it). This
configuration is already documented as unsafe with untrusted guests and
not security supported.

Signed-off-by: Marek Marczykowski-Górecki 
---
Changes in v4:
- restore mmio_ro_ranges in the fallback case
- set XHCI_SHARE_NONE in the fallback case
Changes in v3:
- indentation fix
- remove stale comment
- fallback to pci_ro_device() if subpage_mmio_ro_add() fails
- extend commit message
Changes in v2:
 - adjust for simplified subpage_mmio_ro_add() API
---
 xen/drivers/char/xhci-dbc.c | 36 ++--
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/xen/drivers/char/xhci-dbc.c b/xen/drivers/char/xhci-dbc.c
index 8e2037f1a5f7..c45e4b6825cc 100644
--- a/xen/drivers/char/xhci-dbc.c
+++ b/xen/drivers/char/xhci-dbc.c
@@ -1216,20 +1216,28 @@ static void __init cf_check dbc_uart_init_postirq(struct serial_port *port)
 break;
 }
 #ifdef CONFIG_X86
-/*
- * This marks the whole page as R/O, which may include other registers
- * unrelated to DbC. Xen needs only DbC area protected, but it seems
- * Linux's XHCI driver (as of 5.18) works without writting to the whole
- * page, so keep it simple.
- */
-if ( rangeset_add_range(mmio_ro_ranges,
-PFN_DOWN((uart->dbc.bar_val & PCI_BASE_ADDRESS_MEM_MASK) +
- uart->dbc.xhc_dbc_offset),
-PFN_UP((uart->dbc.bar_val & PCI_BASE_ADDRESS_MEM_MASK) +
-   uart->dbc.xhc_dbc_offset +
-sizeof(*uart->dbc.dbc_reg)) - 1) )
-printk(XENLOG_INFO
-   "Error while adding MMIO range of device to mmio_ro_ranges\n");
+if ( subpage_mmio_ro_add(
+ (uart->dbc.bar_val & PCI_BASE_ADDRESS_MEM_MASK) +
+  uart->dbc.xhc_dbc_offset,
+ sizeof(*uart->dbc.dbc_reg)) )
+{
+printk(XENLOG_WARNING
+   "Error while marking MMIO range of XHCI console as R/O, "
+   "making the whole device R/O (share=no)\n");
+uart->dbc.share = XHCI_SHARE_NONE;
+if ( pci_ro_device(0, uart->dbc.sbdf.bus, uart->dbc.sbdf.devfn) )
+printk(XENLOG_WARNING
+   "Failed to mark read-only %pp used for XHCI console\n",
+   &uart->dbc.sbdf);
+if ( rangeset_add_range(mmio_ro_ranges,
+ PFN_DOWN((uart->dbc.bar_val & PCI_BASE_ADDRESS_MEM_MASK) +
+  uart->dbc.xhc_dbc_offset),
+ PFN_UP((uart->dbc.bar_val & PCI_BASE_ADDRESS_MEM_MASK) +
+uart->dbc.xhc_dbc_offset +
+sizeof(*uart->dbc.dbc_reg)) - 1) )
+printk(XENLOG_INFO
+   "Error while adding MMIO range of device to mmio_ro_ranges\n");
+}
 #endif
 }
 
-- 
git-series 0.9.1



[PATCH v4 0/2] Add API for making parts of a MMIO page R/O and use it in XHCI console

2024-05-22 Thread Marek Marczykowski-Górecki
On older systems, the XHCI xcap layout placed no other (interesting) registers
on the same page as the debug capability, so Linux was fine with
making the whole page R/O. But at least on Tiger Lake and Alder Lake, Linux
needs to write to some other registers on the same page too.

Add a generic API for making just parts of an MMIO page R/O and use it to fix
USB3 console with share=yes or share=hwdom options. More details in commit
messages.

Marek Marczykowski-Górecki (2):
  x86/mm: add API for marking only part of a MMIO page read only
  drivers/char: Use sub-page ro API to make just xhci dbc cap RO

 xen/arch/x86/hvm/emulate.c  |   2 +-
 xen/arch/x86/hvm/hvm.c  |   4 +-
 xen/arch/x86/include/asm/mm.h   |  25 +++-
 xen/arch/x86/mm.c   | 273 +-
 xen/arch/x86/pv/ro-page-fault.c |   6 +-
 xen/drivers/char/xhci-dbc.c |  36 ++--
 6 files changed, 327 insertions(+), 19 deletions(-)

base-commit: b0082b908391b29b7c4dd5e6c389ebd6481926f8
-- 
git-series 0.9.1



[PATCH v4 1/2] x86/mm: add API for marking only part of a MMIO page read only

2024-05-22 Thread Marek Marczykowski-Górecki
In some cases, only a few registers on a page need to be write-protected.
Examples include USB3 console (64 bytes worth of registers) or MSI-X's
PBA table (which doesn't need to span the whole table either), although
in the latter case the spec forbids placing other registers on the same
page. Current API allows only marking whole pages read-only,
which sometimes may cover other registers that guest may need to
write into.

Currently, when a guest tries to write to an MMIO page on the
mmio_ro_ranges, it's either immediately crashed on EPT violation - if
that's HVM, or if PV, it gets #PF. In case of Linux PV, if access was
from userspace (like, /dev/mem), it will try to fixup by updating page
tables (that Xen again will force to read-only) and will hit that #PF
again (looping endlessly). Both behaviors are undesirable if guest could
actually be allowed the write.

Introduce an API that allows marking part of a page read-only. Since
sub-page permissions are not a thing in page tables (they are in EPT,
but not granular enough), do this via emulation (or simply page fault
handler for PV) that handles writes that are supposed to be allowed.
The new subpage_mmio_ro_add() takes a start physical address and the
region size in bytes. Both start address and the size need to be 8-byte
aligned, as a practical simplification (allows using smaller bitmask,
and a smaller granularity isn't really necessary right now).
It will internally add relevant pages to mmio_ro_ranges, but if either
start or end address is not page-aligned, it additionally adds that page
to a list for sub-page R/O handling. The list holds a bitmask of which
qwords are supposed to be read-only and an address where page is mapped
for write emulation - this mapping is done only on the first access. A
plain list is used instead of more efficient structure, because there
aren't supposed to be many pages needing this precise r/o control.

The mechanism this API is plugged in is slightly different for PV and
HVM. For both paths, it's plugged into mmio_ro_emulated_write(). For PV,
it's already called for #PF on read-only MMIO page. For HVM however, EPT
violation on p2m_mmio_direct page results in a direct domain_crash() for
non hardware domains.  To reach mmio_ro_emulated_write(), change how
write violations for p2m_mmio_direct are handled - specifically, check
if they relate to such partially protected page via
subpage_mmio_write_accept() and if so, call hvm_emulate_one_mmio() for
them too. This decodes what guest is trying to write and finally calls
mmio_ro_emulated_write(). The EPT write violation is detected as
npfec.write_access and npfec.present both being true (similar to other
places), which may cover some other (future?) cases - if that happens,
emulator might get involved unnecessarily, but since it's limited to
pages marked with subpage_mmio_ro_add() only, the impact is minimal.
Both of those paths need an MFN to which guest tried to write (to check
which part of the page is supposed to be read-only, and where
the page is mapped for writes). This information currently isn't
available directly in mmio_ro_emulated_write(), but in both cases it is
already resolved somewhere higher in the call tree. Pass it down to
mmio_ro_emulated_write() via new mmio_ro_emulate_ctxt.mfn field.

This may give a bit more access to the instruction emulator to HVM
guests (the change in hvm_hap_nested_page_fault()), but only for pages
explicitly marked with subpage_mmio_ro_add() - so, only if the guest has a
passed-through device that is partially used by Xen.
As of the next patch, it applies only to a configuration explicitly
documented as not security supported.

The subpage_mmio_ro_add() function cannot be called with overlapping
ranges, or on pages already added to mmio_ro_ranges separately.
Successful calls would result in correct handling, but error paths may
result in incorrect state (like pages removed from mmio_ro_ranges too
early). Debug build has asserts for relevant cases.

Signed-off-by: Marek Marczykowski-Górecki 
---
Shadow mode is not tested, but I don't expect it to work differently than
HAP in areas related to this patch.
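
The per-page bitmask described in the commit message can be sketched as below. This is an illustration under stated assumptions, not the code from the patch: `struct subpage_ro`, `subpage_mark_ro()` and `subpage_write_allowed()` are hypothetical names, the 8-byte granularity follows the commit message (the v4 changelog names the constant MMIO_RO_SUBPAGE_GRAN), and a 4 KiB page is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define GRAN      8u                    /* models MMIO_RO_SUBPAGE_GRAN */
#define RO_ELEMS  (PAGE_SIZE / GRAN)    /* 512 elements per page */

/* Hypothetical per-page record: one bit per 8-byte element. */
struct subpage_ro {
    uint8_t ro_elems[RO_ELEMS / 8];
};

/* Mark [offset, offset + size) within the page as read-only. */
static void subpage_mark_ro(struct subpage_ro *p, unsigned int offset,
                            unsigned int size)
{
    assert(!(offset % GRAN) && !(size % GRAN));
    assert(offset + size <= PAGE_SIZE);

    for ( unsigned int i = offset / GRAN; i < (offset + size) / GRAN; i++ )
        p->ro_elems[i / 8] |= 1u << (i % 8);
}

/* A write is let through only if it touches no read-only element. */
static bool subpage_write_allowed(const struct subpage_ro *p,
                                  unsigned int offset, unsigned int len)
{
    if ( !len || offset + len > PAGE_SIZE )
        return false;

    for ( unsigned int i = offset / GRAN; i <= (offset + len - 1) / GRAN; i++ )
        if ( p->ro_elems[i / 8] & (1u << (i % 8)) )
            return false;

    return true;
}
```

For a 64-byte protected region at offset 0x100, a 4-byte write at 0x140 is allowed, while an 8-byte write at 0xfc is refused because it straddles into the first protected element.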

Changes in v4:
- rename SUBPAGE_MMIO_RO_ALIGN to MMIO_RO_SUBPAGE_GRAN
- guard subpage_mmio_write_accept with CONFIG_HVM, as it's used only
  there
- rename ro_qwords to ro_elems
- use unsigned arguments for subpage_mmio_ro_remove_page()
- use volatile for __iomem
- do not set mmio_ro_ctxt.mfn for mmcfg case
- comment where fields of mmio_ro_ctxt are used
- use bool for result of __test_and_set_bit
- do not open-code mfn_to_maddr()
- remove leftover RCU
- mention hvm_hap_nested_page_fault() explicitly in the commit message
Changes in v3:
- use unsigned int for loop iterators
- use __set_bit/__clear_bit when under spinlock
- avoid ioremap() under spinlock
- do not cast away const
- handle unaligned parameters in release build
- comment fixes
- remove RCU - the add functions are __init and actual usage is only
  much later after domains are running
- add

Re: [PATCH v2 0/3] Clean the policy manipulation path in domain creation

2024-05-22 Thread Alejandro Vallejo
On 20/05/2024 13:45, Roger Pau Monné wrote:
> On Fri, May 17, 2024 at 05:08:33PM +0100, Alejandro Vallejo wrote:
>> v2:
>>   * Removed xc_cpu_policy from xenguest.h
>>   * Added accessors for xc_cpu_policy so the serialised form can be extracted.
>>   * Modified xen-cpuid to use accessors.
>>
>>  Original cover letter 
>>
>> In the context of creating a domain, we currently issue a lot of hypercalls
>> redundantly while populating its CPU policy; likely a side effect of
>> organic growth more than anything else.
>>
>> However, the worst part is not the overhead (this is a glacially cold
>> path), but the insane amounts of boilerplate that make it really hard to
>> pick apart what's going on. One major contributor to this situation is the
>> fact that what's effectively "setup" and "teardown" phases in policy
>> manipulation are not factored out from the functions that perform said
>> manipulations, leading to the same getters and setter being invoked many
>> times, when once each would do.
>>
>> Another big contributor is the code being unaware of when a policy is
>> serialised and when it's not.
>>
>> This patch attempts to alleviate this situation, yielding over 200 LoC
>> reduction.
>>
>> Patch 1: Mechanical change. Makes xc_cpu_policy_t public so it's usable
>>  from clients of libxc/libxg.
>> Patch 2: Changes the (de)serialization wrappers in xenguest so they always
>>  serialise to/from the internal buffers of xc_cpu_policy_t. The
>>  struct is suitably expanded to hold extra information required.
>> Patch 3: Performs the refactor of the policy manipulation code so that it
>>  follows a strict: PULL_POLICIES, MUTATE_POLICY (n times), PUSH_POLICY.
> 
> I think patch 3 is no longer part of the set?  I don't see anything
> in the review of v1 that suggests patch 3 was not going to be part of
> the next submission?
> 
> Thanks, Roger.

It's there, there was just a shift. The implication of the first line of
the changelog is that v1/patch1 was dropped. Sorry, should've been clearer.

v1/patch1 => dropped
v1/patch2 => v2/patch1
v1/patch3 => v2/patch2

Cheers,
Alejandro



Re: [PATCH v2 2/8] xen/x86: Simplify header dependencies in x86/hvm

2024-05-22 Thread Alejandro Vallejo
On 22/05/2024 10:33, Jan Beulich wrote:
> On 08.05.2024 14:39, Alejandro Vallejo wrote:
>> Otherwise it's not possible to call functions described in hvm/vlapic.h from the
>> inline functions of hvm/hvm.h.
>>
>> This is because a static inline in vlapic.h depends on hvm.h, and pulls it
>> transitively through vpt.h. The ultimate cause is having hvm.h included in any
>> of the "v*.h" headers, so break the cycle moving the guilty inline into hvm.h.
>>
>> No functional change.
>>
>> Signed-off-by: Alejandro Vallejo 
> 
> In principle:
> Reviewed-by: Jan Beulich 
> But see below for one possible adjustment.
> 
>> ---
>> v2:
>>   * New patch. Prereq to moving vlapic_cpu_policy_changed() onto hvm.h
> 
> That hook invocation living outside of hvm/hvm.h was an outlier anyway,
> so even without the planned further work this is probably a good move.
> 
>> --- a/xen/arch/x86/include/asm/hvm/hvm.h
>> +++ b/xen/arch/x86/include/asm/hvm/hvm.h
>> @@ -798,6 +798,12 @@ static inline void hvm_update_vlapic_mode(struct vcpu *v)
>>  alternative_vcall(hvm_funcs.update_vlapic_mode, v);
>>  }
>>  
>> +static inline void hvm_vlapic_sync_pir_to_irr(struct vcpu *v)
>> +{
>> +if ( hvm_funcs.sync_pir_to_irr )
>> +alternative_vcall(hvm_funcs.sync_pir_to_irr, v);
>> +}
> 
> The hook doesn't have "vlapic" in its name. Therefore instead of prepending
> hvm_ to the original name or the wrapper, how about replacing the vlapic_
> that was there. That would then also fit better with the naming scheme used
> for other hooks and their wrappers. Happy to adjust while committing, so
> long as you don't disagree.
> 
> Jan

Sounds reasonable. I wasn't sure whether vlapic was adding anything more
than a namespace prefix to the function name. Are you happy to adjust
that on commit?

If so, I'm good with it in the form you propose.

Cheers,
Alejandro



Re: [PATCH v3 1/2] x86/mm: add API for marking only part of a MMIO page read only

2024-05-22 Thread Marek Marczykowski-Górecki
On Wed, May 22, 2024 at 03:29:51PM +0200, Jan Beulich wrote:
> On 22.05.2024 15:22, Marek Marczykowski-Górecki wrote:
> > On Wed, May 22, 2024 at 09:52:44AM +0200, Jan Beulich wrote:
> >> On 21.05.2024 04:54, Marek Marczykowski-Górecki wrote:
> >>> +static void subpage_mmio_write_emulate(
> >>> +mfn_t mfn,
> >>> +unsigned int offset,
> >>> +const void *data,
> >>> +unsigned int len)
> >>> +{
> >>> +struct subpage_ro_range *entry;
> >>> +void __iomem *addr;
> >>
> >> Wouldn't this better be pointer-to-volatile, with ...
> > 
> > Shouldn't then most other uses of __iomem in the code base be this way
> > too? I see volatile only in few places...
> 
> Quite likely, yet being consistent at least in new code is going to be
> at least desirable.

I tried. Build fails because iounmap() doesn't declare its argument as
volatile, so it triggers -Werror=discarded-qualifiers...

I'll change it just in subpage_mmio_write_emulate(), but leave
subpage_mmio_map_page() (was _get_page) without volatile.
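
The split described above (volatile only where the write happens, a plain pointer where the mapping is stored and later unmapped) can be sketched as follows. Everything here is a hypothetical model: `fake_iounmap()` merely mimics a prototype that takes a non-volatile pointer, and `struct ro_entry`, `write_emulate32()` and `unmap_entry()` are invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unmap helper whose prototype takes a non-volatile pointer. */
static unsigned int unmap_calls;

static void fake_iounmap(void *va)
{
    (void)va;
    unmap_calls++;
}

/* Hypothetical list entry; the stored mapping stays non-volatile. */
struct ro_entry {
    void *mapped;
};

/* Only the emulation path views the mapping through a volatile pointer. */
static void write_emulate32(const struct ro_entry *e, unsigned int offset,
                            uint32_t val)
{
    volatile uint8_t *addr = e->mapped;   /* adding a qualifier is implicit */

    assert(!(offset % sizeof(val)));
    *(volatile uint32_t *)(addr + offset) = val;
}

static void unmap_entry(struct ro_entry *e)
{
    fake_iounmap(e->mapped);   /* no qualifier is discarded here */
    e->mapped = NULL;
}
```

Had `mapped` itself been declared pointer-to-volatile, the call in `unmap_entry()` would need a cast or would warn about the discarded qualifier, which is the build failure reported above.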

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab




Re: [PATCH v2 1/2] tools/xg: Streamline cpu policy serialise/deserialise calls

2024-05-22 Thread Alejandro Vallejo
On 20/05/2024 14:47, Roger Pau Monné wrote:
>> @@ -917,17 +922,14 @@ int xc_cpu_policy_set_domain(xc_interface *xch, uint32_t domid,
>>   xc_cpu_policy_t *policy)
>>  {
>>  uint32_t err_leaf = -1, err_subleaf = -1, err_msr = -1;
>> -unsigned int nr_leaves = ARRAY_SIZE(policy->leaves);
>> -unsigned int nr_msrs = ARRAY_SIZE(policy->msrs);
>>  int rc;
>>  
>> -rc = xc_cpu_policy_serialise(xch, policy, policy->leaves, &nr_leaves,
>> - policy->msrs, &nr_msrs);
>> +rc = xc_cpu_policy_serialise(xch, policy);
>>  if ( rc )
>>  return rc;
>>  
>> -rc = xc_set_domain_cpu_policy(xch, domid, nr_leaves, policy->leaves,
>> -  nr_msrs, policy->msrs,
>> +rc = xc_set_domain_cpu_policy(xch, domid, policy->nr_leaves, policy->leaves,
>> +  policy->nr_msrs, policy->msrs,
> 
> I would be tempted to just pass the policy to
> xc_set_domain_cpu_policy() and get rid of the separate cpuid and msrs
> serialized arrays, but that hides (or makes it less obvious) that the
> policy needs to be serialized before providing to
> xc_set_domain_cpu_policy().  Just a rant, no need to change it here.

I'm still pondering what to do about that. I'd like to refactor all that
faff away as well, but I'm not sure how to do it cleanly yet. The
biggest danger I see is modifying one side of the policy and then wiping
those changes by mistake reserializing or deserializing at the wrong time.

Not for this series, I reckon.


>> +int xc_cpu_policy_get_msrs(xc_interface *xch,
>> +   const xc_cpu_policy_t *policy,
>> +   const xen_msr_entry_t **msrs,
>> +   uint32_t *nr)
>> +{
>> +if ( !policy )
>> +{
>> +ERROR("Failed to fetch MSRs from policy object");
>> +errno = -EINVAL;
>> +return -1;
>> +}
>> +
>> +*msrs = policy->msrs;
>> +*nr = policy->nr_msrs;
>> +
>> +return 0;
>> +}
> 
> My preference would probably be to return NULL or
> xen_{leaf,msr}_entry_t * from those, as we can then avoid an extra
> leaves/msrs parameter.  Again I'm fine with leaving it like this.
> 

It didn't feel right to have an output parameter as the return value
doubling as status code when another output is in the parameter list. I
can perfectly imagine someone grabbing "nr" and ignoring "msrs" because
"msrs" doesn't happen to be needed for them.

I think there's extra safety in making it harder to ignore the error.
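
The accessor shape being defended here, a status code as the return value with both outputs passed by pointer, can be sketched like this. The types are simplified stand-ins, not the libxenguest definitions: `msr_entry_t`, `struct policy` and `policy_get_msrs()` are hypothetical. The sketch sets errno to the positive `EINVAL`, which is the usual C library convention.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the real entry and policy types. */
typedef struct {
    uint32_t idx;
    uint64_t val;
} msr_entry_t;

struct policy {
    msr_entry_t msrs[4];
    uint32_t nr_msrs;
};

/*
 * Status is the return value and both outputs are parameters, so a caller
 * cannot pick up 'nr' without also having an error code to check.
 */
static int policy_get_msrs(const struct policy *policy,
                           const msr_entry_t **msrs, uint32_t *nr)
{
    if ( !policy || !msrs || !nr )
    {
        errno = EINVAL;   /* errno values are positive */
        return -1;
    }

    *msrs = policy->msrs;
    *nr = policy->nr_msrs;

    return 0;
}
```

The pointer-returning alternative would save one parameter, at the cost of making NULL double as the error indication while the count still travels through an out parameter.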

>> -cpuid.length = nr_leaves * sizeof(xen_cpuid_leaf_t);
>> -if ( cpuid.length )
>> +record = (struct xc_sr_record) {
>> +.type = REC_TYPE_X86_CPUID_POLICY,
>> +.data = policy->leaves,
>> +.length = policy->nr_leaves * sizeof(*policy->leaves),
>> +};
>> +if ( record.length )
>>  {
>> -rc = write_record(ctx, &cpuid);
>> +rc = write_record(ctx, &record);
>>  if ( rc )
>>  goto out;
>>  }
> 
> 
> You could maybe write this as:
> 
> if ( policy->nr_leaves )
> {
> const struct xc_sr_record r = {
> .type = REC_TYPE_X86_CPUID_POLICY,
> .data = policy->leaves,
> .length = policy->nr_leaves * sizeof(*policy->leaves),
> };
> 
> rc = write_record(ctx, &r);
> }
> 
> (same for the msr record)
> 

Ack. Looks nicer that way.
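
A stand-alone sketch of the agreed pattern, a const record scoped to the branch that actually writes it, is shown below. The names are assumptions for illustration: `struct record`, `REC_CPUID`, `write_record()` and `write_leaves()` are simplified stand-ins, and `write_record()` here only counts calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified record type and writer. */
struct record {
    uint32_t type;
    uint32_t length;
    const void *data;
};

#define REC_CPUID 1u

static unsigned int records_written;

static int write_record(const struct record *r)
{
    assert(r->length);   /* empty records are never emitted */
    records_written++;
    return 0;
}

static int write_leaves(const uint32_t *leaves, uint32_t nr_leaves)
{
    int rc = 0;

    if ( nr_leaves )
    {
        /* The record exists only in the scope that uses it, and is const. */
        const struct record r = {
            .type = REC_CPUID,
            .data = leaves,
            .length = nr_leaves * sizeof(*leaves),
        };

        rc = write_record(&r);
    }

    return rc;
}
```

Testing the element count rather than the computed length also keeps the emptiness check independent of how the record is built.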

>>  
>> -msrs.length = nr_msrs * sizeof(xen_msr_entry_t);
>> -if ( msrs.length )
>> +record = (struct xc_sr_record) {
>> +.type = REC_TYPE_X86_MSR_POLICY,
>> +.data = policy->msrs,
>> +.length = policy->nr_msrs * sizeof(*policy->msrs),
>> +};
>> +if ( record.length )
>>  {
>> -rc = write_record(ctx, &msrs);
>> +rc = write_record(ctx, &record);
>>  if ( rc )
>>  goto out;
>>  }
>> @@ -100,8 +84,6 @@ int write_x86_cpu_policy_records(struct xc_sr_context *ctx)
>>  rc = 0;
>>  
>>   out:
>> -free(cpuid.data);
>> -free(msrs.data);
>>  xc_cpu_policy_destroy(policy);
>>  
>>  return rc;
>> diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
>> index 8893547bebce..1c9ba6d32060 100644
>> --- a/tools/misc/xen-cpuid.c
>> +++ b/tools/misc/xen-cpuid.c
>> @@ -409,17 +409,21 @@ static void dump_info(xc_interface *xch, bool detail)
>>  free(fs);
>>  }
>>  
>> -static void print_policy(const char *name,
>> - xen_cpuid_leaf_t *leaves, uint32_t nr_leaves,
>> - xen_msr_entry_t *msrs, uint32_t nr_msrs)
>> +static void print_policy(xc_interface *xch, const char *name, const xc_cpu_policy_t *policy)
> 
> Line length.

Ack

> 
>>  {
>> -unsigned int l;
>> +const xen_cpuid_leaf_t *leaves;
>> +const xen_msr_entry_t *msrs;
>> +uint32_t nr_leaves, nr_msrs;
>> +
>> +if ( xc_cpu_policy_get_leaves(xch, policy, &leaves, &nr_leaves) ||
>> + xc_cpu_policy_get_msrs(xch, policy,

Re: [PATCH v2 2/2] tools/xg: Clean up xend-style overrides for CPU policies

2024-05-22 Thread Alejandro Vallejo
On 20/05/2024 16:02, Roger Pau Monné wrote:
>> -
>>  static int xc_msr_policy(xc_interface *xch, domid_t domid,
>> - const struct xc_msr *msr)
>> + const struct xc_msr *msr,
>> + xc_cpu_policy_t *host,
>> + xc_cpu_policy_t *def,
> 
> host and def should likely be const?

I tried, but I can't. All policies go through find_msr(), which takes a
non-const policy, and must be non-const because it's also used for the
cur policy.

I did the next best thing (I think) by const-ifying the result of
find_msr inside the loop for host and def. Same thing on the cpuid function.

>> -if ( rc )
>> -{
>> -PERROR("Failed to obtain host policy");
>> -rc = -errno;
>> -goto out;
>> -}
>> +if ( !msrs )
> 
> Does this build?  Where is 'msrs' defined in this context?  The
> function parameter is 'msr' AFAICT.

Ugh. I fixed that while adjusting it for testing within XenServer and
then neglected to make the change in the actual for-upstream patches.

You're right.

> 
>> +return 0;
> 
> Should we also check for host, def, cur != NULL?

It's already done by the caller, but can do out of paranoia; returning
-EINVAL.

>> @@ -583,14 +436,16 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
>>  int rc;
>>  bool hvm;
>>  xc_domaininfo_t di;
>> -struct xc_cpu_policy *p = xc_cpu_policy_init();
>> -unsigned int i, nr_leaves = ARRAY_SIZE(p->leaves), nr_msrs = 0;
>> -uint32_t err_leaf = -1, err_subleaf = -1, err_msr = -1;
>> -uint32_t host_featureset[FEATURESET_NR_ENTRIES] = {};
>> -uint32_t len = ARRAY_SIZE(host_featureset);
>>  
>> -if ( !p )
>> -return -ENOMEM;
>> +struct xc_cpu_policy *host = xc_cpu_policy_init();
>> +struct xc_cpu_policy *def = xc_cpu_policy_init();
> 
> It would be helpful to have some kind of mechanism to allocate + init a
> policy at the same time, so that the resulting object could be made
> const here.  (Not that you need to do it in this patch).

That would seem sensible, but we'd also need a way to clone it to avoid
repeating hypercalls when they aren't required. I had a patch that did
that, but was quite complicated for other reasons. I might get back to
it at some point now that per-vCPU policies don't seem to be required.

>> @@ -695,24 +542,24 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
>>   !(dfs = x86_cpu_policy_lookup_deep_deps(b)) )
>>  continue;
>>  
>> -for ( i = 0; i < ARRAY_SIZE(disabled_features); ++i )
>> +for ( size_t i = 0; i < ARRAY_SIZE(disabled_features); ++i )
> 
> All this loop index type changes could be done as a separate patch,
> you are not even touching the surrounding lines.  It adds a lot of
> churn to this patch for no reason IMO.

I got carried away. Let me revert that. I still want to get rid of all
those overscoped indices, but this is not the patch for it.

>> @@ -772,49 +619,45 @@ int xc_cpuid_apply_policy(xc_interface *xch, uint32_t domid, bool restore,
>>   * apic_id_size values greater than 7.  Limit the value to
>>   * 7 for now.
>>   */
>> -if ( p->policy.extd.nc < 0x7f )
>> +if ( cur->policy.extd.nc < 0x7f )
>>  {
>> -if ( p->policy.extd.apic_id_size != 0 && p->policy.extd.apic_id_size < 0x7 )
>> -p->policy.extd.apic_id_size++;
>> +if ( cur->policy.extd.apic_id_size != 0 && cur->policy.extd.apic_id_size < 0x7 )
> 
> I would split the line while there, it's overly long.

Ack

> 
> Thanks, Roger.

Cheers,
Alejandro



Re: [PATCH v2 2/4] x86/shadow: Introduce sh_trace_gl1e_va()

2024-05-22 Thread Andrew Cooper
On 22/05/2024 2:47 pm, Jan Beulich wrote:
> On 22.05.2024 15:17, Andrew Cooper wrote:
>> trace_shadow_fixup() and trace_not_shadow_fault() both write out identical
>> trace records.  Reimplement them in terms of a common sh_trace_gl1e_va().
>>
>> There's no need to pack the trace record, even in the case of PAE paging.
> Isn't this altering the generated trace record for the 4-level case, in
> size changing from 20 to 24 bytes?

Oh, eww.  Yes it does.

I'll need to rework with __packed still in place.
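
The size change can be reproduced with two stand-alone structs. The field layout is an assumption modelled on the 4-level case (a 64-bit gl1e, a 64-bit va and a 32-bit flags word); the struct names are hypothetical, and the natural-alignment figure holds on ABIs where uint64_t is 8-byte aligned, such as x86-64.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4-level payload: 8 + 8 + 4 bytes. */
struct rec_packed {
    uint64_t gl1e;
    uint64_t va;
    uint32_t flags;
} __attribute__((packed));

/* Same fields with natural alignment: tail padding rounds the size up. */
struct rec_natural {
    uint64_t gl1e;
    uint64_t va;
    uint32_t flags;
};

/* Record sizes as a trace consumer would see them. */
static unsigned int packed_size(void)
{
    return sizeof(struct rec_packed);
}

static unsigned int natural_size(void)
{
    return sizeof(struct rec_natural);
}
```

The four extra bytes are tail padding added so that arrays of the struct keep the 64-bit members aligned, which is why dropping the packed attribute changes the emitted record.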

~Andrew



Re: [PATCH v3 4/7] xen/arm: Parse xen,shared-mem when host phys address is not provided

2024-05-22 Thread Luca Fancellu
Hi Michal,

>> for ( i = 0; i < mem->nr_banks; i++ )
>> {
>> /*
>>  * Meet the following check:
>> - * 1) The shm ID matches and the region exactly match
>> - * 2) The shm ID doesn't match and the region doesn't overlap
>> - * with an existing one
>> + * - when host address is provided:
>> + *   1) The shm ID matches and the region exactly match
>> + *   2) The shm ID doesn't match and the region doesn't overlap
>> + *  with an existing one
>> + * - when host address is not provided:
>> + *   1) The shm ID matches and the region size exactly match
>>  */
>> -if ( paddr == mem->bank[i].start && size == mem->bank[i].size )
>> +bool paddr_assigned = (INVALID_PADDR == paddr);
> Shouldn't it be INVALID_PADDR != paddr to indicate that paddr was assigned?
> Otherwise, looking at the code below you would allow a configuration where
> the shm_id matches but the phys addresses don't.

You are right, good catch, somehow it escaped testing, I’ll fix in the next push
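
For reference, the corrected same-ID check can be modelled as below. This is a sketch of the rule stated in the quoted comment, not the patch itself: `struct shm_bank` and `same_id_region_ok()` are hypothetical names, and the all-ones definition of `INVALID_PADDR` is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t paddr_t;

#define INVALID_PADDR (~(paddr_t)0)   /* assumed definition */

/* Hypothetical, simplified view of an already registered bank. */
struct shm_bank {
    paddr_t start;
    paddr_t size;
};

/*
 * Compatibility of a new request with an existing bank of the same shm ID:
 * with a host address the region must match exactly, without one only the
 * size has to match.
 */
static bool same_id_region_ok(const struct shm_bank *bank, paddr_t paddr,
                              paddr_t size)
{
    bool paddr_assigned = (paddr != INVALID_PADDR);

    if ( paddr_assigned && paddr != bank->start )
        return false;

    return size == bank->size;
}
```

With the comparison inverted, a request carrying a real but different host address would skip the start comparison and pass on size alone.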

> 
> ~Michal



Re: [PATCH v3 2/2] x86: detect PIT aliasing on ports other than 0x4[0-3]

2024-05-22 Thread Jan Beulich
On 22.05.2024 15:57, Jason Andryuk wrote:
> On 2024-05-22 08:59, Jan Beulich wrote:
>> --- a/xen/arch/x86/time.c
>> +++ b/xen/arch/x86/time.c
>> @@ -427,6 +427,74 @@ static struct platform_timesource __init
>>   .resume = resume_pit,
>>   };
>>   
>> +unsigned int __initdata pit_alias_mask;
>> +
>> +static void __init probe_pit_alias(void)
>> +{
>> +unsigned int mask = 0x1c;
>> +uint8_t val = 0;
>> +
>> +if ( !opt_probe_port_aliases )
>> +return;
>> +
>> +/*
>> + * Use channel 2 in mode 0 for probing.  In this mode even a non-initial
>> + * count is loaded independent of counting being / becoming enabled.  Thus
>> + * we have a 16-bit value fully under our control, to write and then check
>> + * whether we can also read it back unaltered.
>> + */
>> +
>> +/* Turn off speaker output and disable channel 2 counting. */
>> +outb(inb(0x61) & 0x0c, 0x61);
>> +
>> +outb(PIT_LTCH_CH(2) | PIT_RW_LSB_MSB | PIT_MODE_EOC | PIT_BINARY,
>> + PIT_MODE);
>> +
>> +do {
>> +uint8_t val2;
>> +unsigned int offs;
>> +
>> +outb(val, PIT_CH2);
>> +outb(val ^ 0xff, PIT_CH2);
>> +
>> +/* Wait for the Null Count bit to clear. */
>> +do {
>> +/* Latch status. */
>> +outb(PIT_RDB | PIT_RDB_NO_COUNT | PIT_RDB_CH2, PIT_MODE);
>> +
>> +/* Try to make sure we're actually having a PIT here. */
>> +val2 = inb(PIT_CH2);
>> +if ( (val2 & ~(PIT_STATUS_OUT_PIN | PIT_STATUS_NULL_COUNT)) !=
>> + (PIT_RW_LSB_MSB | PIT_MODE_EOC | PIT_BINARY) )
>> +return;
>> +} while ( val2 & (1 << 6) );
> 
> You can use PIT_STATUS_NULL_COUNT here.

Indeed, and I meant to but then forgot. Thanks for noticing.
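
A small sketch of the loop condition with the named constant follows. The two macro names appear in the patch; their values here are assumptions taken from the 8254 read-back status byte layout (bit 7 for the OUT pin, bit 6 for Null Count), and `pit_count_pending()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, following the 8254 read-back status byte layout. */
#define PIT_STATUS_OUT_PIN    (1u << 7)
#define PIT_STATUS_NULL_COUNT (1u << 6)

/*
 * True while a newly written count has not yet been loaded into the
 * counting element, i.e. the condition the inner loop waits on.
 */
static bool pit_count_pending(uint8_t status)
{
    return (status & PIT_STATUS_NULL_COUNT) != 0;
}
```

Using the macro in the `while` condition keeps it in step with the mask applied in the sanity check just above it.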

> With that:
> Reviewed-by: Jason Andryuk 

Thanks.

Jan



Re: [PATCH v3 2/2] x86: detect PIT aliasing on ports other than 0x4[0-3]

2024-05-22 Thread Jason Andryuk

On 2024-05-22 08:59, Jan Beulich wrote:

... in order to also deny Dom0 access through the alias ports (commonly
observed on Intel chipsets). Without this it is only giving the
impression of denying access to PIT. Unlike for CMOS/RTC, do detection
pretty early, to avoid disturbing normal operation later on (even if
typically we won't use much of the PIT).

Like for CMOS/RTC a fundamental assumption of the probing is that reads
from the probed alias port won't have side effects (beyond such that PIT
reads have anyway) in case it does not alias the PIT's.

As to the port 0x61 accesses: Unlike other accesses we do, this masks
off the top four bits (in addition to the bottom two ones), following
Intel chipset documentation saying that these (read-only) bits should
only be written with zero.

Signed-off-by: Jan Beulich 



--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -427,6 +427,74 @@ static struct platform_timesource __init
  .resume = resume_pit,
  };
  
+unsigned int __initdata pit_alias_mask;

+
+static void __init probe_pit_alias(void)
+{
+unsigned int mask = 0x1c;
+uint8_t val = 0;
+
+if ( !opt_probe_port_aliases )
+return;
+
+/*
+ * Use channel 2 in mode 0 for probing.  In this mode even a non-initial
+ * count is loaded independent of counting being / becoming enabled.  Thus
+ * we have a 16-bit value fully under our control, to write and then check
+ * whether we can also read it back unaltered.
+ */
+
+/* Turn off speaker output and disable channel 2 counting. */
+outb(inb(0x61) & 0x0c, 0x61);
+
+outb(PIT_LTCH_CH(2) | PIT_RW_LSB_MSB | PIT_MODE_EOC | PIT_BINARY,
+ PIT_MODE);
+
+do {
+uint8_t val2;
+unsigned int offs;
+
+outb(val, PIT_CH2);
+outb(val ^ 0xff, PIT_CH2);
+
+/* Wait for the Null Count bit to clear. */
+do {
+/* Latch status. */
+outb(PIT_RDB | PIT_RDB_NO_COUNT | PIT_RDB_CH2, PIT_MODE);
+
+/* Try to make sure we're actually having a PIT here. */
+val2 = inb(PIT_CH2);
+if ( (val2 & ~(PIT_STATUS_OUT_PIN | PIT_STATUS_NULL_COUNT)) !=
+ (PIT_RW_LSB_MSB | PIT_MODE_EOC | PIT_BINARY) )
+return;
+} while ( val2 & (1 << 6) );


You can use PIT_STATUS_NULL_COUNT here.
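
For reference, a minimal sketch of what the suggestion amounts to. The two status constants are assumptions here, following the 8254 read-back status byte layout (bit 7 is the OUT pin, bit 6 is Null Count); they should match whatever patch 1/2 defines. null_count_pending() is a hypothetical helper used only for illustration, not something in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, following the 8254 status byte layout. */
#define PIT_STATUS_OUT_PIN    (1 << 7)
#define PIT_STATUS_NULL_COUNT (1 << 6)

/* Hypothetical helper: true while the written count is not yet loaded. */
static bool null_count_pending(uint8_t status)
{
    return status & PIT_STATUS_NULL_COUNT;
}
```

Under that assumption the loop condition would read `} while ( val2 & PIT_STATUS_NULL_COUNT );`, which tests the same bit as the open-coded `(1 << 6)`.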

With that:
Reviewed-by: Jason Andryuk 

Thanks,
Jason



Re: [PATCH v2 4/4] x86/shadow: Don't leave trace record field uninitialized

2024-05-22 Thread Jan Beulich
On 22.05.2024 15:17, Andrew Cooper wrote:
> From: Jan Beulich 
> 
> The emulation_count field is set only conditionally right now. Convert
> all field setting to an initializer, thus guaranteeing that field to be
> set to 0 (default initialized) when GUEST_PAGING_LEVELS != 3.
> 
> Rework trace_shadow_emulate() to be consistent with the other trace helpers.
> 
> Coverity-ID: 1598430
> Fixes: 9a86ac1aa3d2 ("xentrace 5/7: Additional tracing for the shadow code")
> Signed-off-by: Jan Beulich 
> Acked-by: Roger Pau Monné 

Your additional changes look pretty much independent of what my original
patch did. I don't mind the folding though, yet I think you need to add
your own S-o-b as well. Then in turn
Acked-by: Jan Beulich 

Jan



Re: [PATCH v2 3/4] x86/shadow: Rework trace_shadow_emulate_other() as sh_trace_gfn_va()

2024-05-22 Thread Jan Beulich
On 22.05.2024 15:17, Andrew Cooper wrote:
> sh_trace_gfn_va() is very similar to sh_trace_gl1e_va(), and a rather shorter
> name than trace_shadow_emulate_other().  Like sh_trace_gl1e_va(), there is no
> need to pack the trace record.

Provided record size can freely change (here for the 3-level case) without
breaking consumers, i.e. similar to patch 2.

> --- a/xen/arch/x86/mm/shadow/multi.c
> +++ b/xen/arch/x86/mm/shadow/multi.c
> @@ -2010,29 +2010,30 @@ static void sh_trace_gl1e_va(uint32_t event, 
> guest_l1e_t gl1e, guest_va_t va)
>  }
>  }
>  
> -static inline void trace_shadow_emulate_other(u32 event,
> - guest_va_t va,
> - gfn_t gfn)
> +/* Shadow trace event with a gfn, linear address and flags. */
> +static void sh_trace_gfn_va(uint32_t event, gfn_t gfn, guest_va_t va)
>  {
>  if ( tb_init_done )
>  {
> -struct __packed {
> -/* for PAE, guest_l1e may be 64 while guest_va may be 32;
> -   so put it first for alignment sake. */
> +struct {
> +/*
> + * For GUEST_PAGING_LEVELS=3 (PAE paging), gfn is 64 while
> + * guest_va is 32.  Put it first to avoid padding.
> + */
>  #if GUEST_PAGING_LEVELS == 2
> -u32 gfn;
> +uint32_t gfn;
>  #else
> -u64 gfn;
> +uint64_t gfn;
>  #endif
>  guest_va_t va;
> -} d;
> -
> -event |= ((GUEST_PAGING_LEVELS-2)<<8);
> -
> -d.gfn=gfn_x(gfn);
> -d.va = va;
> +uint32_t flags;
> +} d = {
> +.gfn = gfn_x(gfn),
> +.va = va,
> +.flags = this_cpu(trace_shadow_path_flags),
> +};

There's again no function call involved here, so having tb_init_done checked
only in (inlined) sh_trace() ought to again be enough?

Jan



Re: [PATCH v2 1/4] x86/shadow: Rework trace_shadow_gen() into sh_trace_va()

2024-05-22 Thread Jan Beulich
On 22.05.2024 15:47, Andrew Cooper wrote:
> On 22/05/2024 2:40 pm, Jan Beulich wrote:
>> On 22.05.2024 15:17, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/mm/shadow/multi.c
>>> +++ b/xen/arch/x86/mm/shadow/multi.c
>>> @@ -1974,13 +1974,17 @@ typedef u32 guest_va_t;
>>>  typedef u32 guest_pa_t;
>>>  #endif
>>>  
>>> -static inline void trace_shadow_gen(u32 event, guest_va_t va)
>>> +/* Shadow trace event with GUEST_PAGING_LEVELS folded into the event 
>>> field. */
>>> +static void sh_trace(uint32_t event, unsigned int extra, const void 
>>> *extra_data)
>>> +{
>>> +trace(event | ((GUEST_PAGING_LEVELS - 2) << 8), extra, extra_data);
>>> +}
>>> +
>>> +/* Shadow trace event with the guest's linear address. */
>>> +static void sh_trace_va(uint32_t event, guest_va_t va)
>>>  {
>>>  if ( tb_init_done )
>>> -{
>>> -event |= (GUEST_PAGING_LEVELS-2)<<8;
>>> -trace(event, sizeof(va), &va);
>>> -}
>>> +sh_trace(event, sizeof(va), &va);
>>>  }
>> If any tb_init_done check, then perhaps rather in sh_trace()? With that
>> (and provided you agree)
>> Reviewed-by: Jan Beulich 
> 
> Sadly not.  That leads to double reads of tb_init_done when tracing is
> compiled in.

Not here, but I can see how that could happen in principle, when ...

> When GCC can't fully inline the structure initialisation, it can't prove
> that a function call modified tb_init_done.  This is why I arranged all
> the trace cleanup in this way.

... inlining indeed doesn't happen. Patch 2 fits the one here in this regard
(no function calls); I have yet to look at patch 3, though.

But anyway, the present placement, while likely a little redundant, is not
the end of the world, so my R-b holds either way.

Jan



Re: [PATCH v3 4/7] xen/arm: Parse xen,shared-mem when host phys address is not provided

2024-05-22 Thread Michal Orzel
Hi Luca,

On 22/05/2024 09:51, Luca Fancellu wrote:
> 
> 
> Handle the parsing of the 'xen,shared-mem' property when the host physical
> address is not provided, this commit is introducing the logic to parse it,
> but the functionality is still not implemented and will be part of future
> commits.
> 
> Rework the logic inside process_shm_node to check the shm_id before doing
> the other checks, because it eases the logic itself, and add more comments
> on the logic.
> Now when the host physical address is not provided, the value
> INVALID_PADDR is chosen to signal this condition and it is stored as
> start of the bank, due to that change also early_print_info_shmem and
> init_sharedmem_pages are changed, to not handle banks with start equal
> to INVALID_PADDR.
> 
> Another change is done inside meminfo_overlap_check, to skip banks that
> are starting with the start address INVALID_PADDR, that function is used
> to check banks from reserved memory, shared memory and ACPI and since
> the comment above the function states that wrapping around is not handled,
> it's unlikely for these banks to have the start address as INVALID_PADDR.
> Same change is done inside consider_modules, find_unallocated_memory and
> dt_unreserved_regions functions, in order to skip banks that start with
> INVALID_PADDR from any computation.
> The changes above hold because of this consideration.
> 
> Signed-off-by: Luca Fancellu 
> Reviewed-by: Michal Orzel 
> ---
> v3 changes:
>  - fix typo in commit msg, add R-by Michal
> v2 changes:
>  - fix comments, add parenthesis to some conditions, remove unneeded
>variables, remove else branch, increment counter in the for loop,
>skip INVALID_PADDR start banks from also consider_modules,
>find_unallocated_memory and dt_unreserved_regions. (Michal)
> ---
>  xen/arch/arm/arm32/mmu/mm.c |  11 +++-
>  xen/arch/arm/domain_build.c |   5 ++
>  xen/arch/arm/setup.c|  14 +++-
>  xen/arch/arm/static-shmem.c | 125 +---
>  4 files changed, 111 insertions(+), 44 deletions(-)
> 
> diff --git a/xen/arch/arm/arm32/mmu/mm.c b/xen/arch/arm/arm32/mmu/mm.c
> index be480c31ea05..30a7aa1e8e51 100644
> --- a/xen/arch/arm/arm32/mmu/mm.c
> +++ b/xen/arch/arm/arm32/mmu/mm.c
> @@ -101,8 +101,15 @@ static paddr_t __init consider_modules(paddr_t s, 
> paddr_t e,
>  nr += reserved_mem->nr_banks;
>  for ( ; i - nr < shmem->nr_banks; i++ )
>  {
> -paddr_t r_s = shmem->bank[i - nr].start;
> -paddr_t r_e = r_s + shmem->bank[i - nr].size;
> +paddr_t r_s, r_e;
> +
> +r_s = shmem->bank[i - nr].start;
> +
> +/* Shared memory banks can contain INVALID_PADDR as start */
> +if ( INVALID_PADDR == r_s )
> +continue;
> +
> +r_e = r_s + shmem->bank[i - nr].size;
> 
>  if ( s < r_e && r_s < e )
>  {
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index 968c497efc78..02e741685102 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -927,6 +927,11 @@ static int __init find_unallocated_memory(const struct 
> kernel_info *kinfo,
>  for ( j = 0; j < mem_banks[i]->nr_banks; j++ )
>  {
>  start = mem_banks[i]->bank[j].start;
> +
> +/* Shared memory banks can contain INVALID_PADDR as start */
> +if ( INVALID_PADDR == start )
> +continue;
> +
>  end = mem_banks[i]->bank[j].start + mem_banks[i]->bank[j].size;
>  res = rangeset_remove_range(unalloc_mem, PFN_DOWN(start),
>  PFN_DOWN(end - 1));
> diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
> index c4e5c19b11d6..0c2fdaceaf21 100644
> --- a/xen/arch/arm/setup.c
> +++ b/xen/arch/arm/setup.c
> @@ -240,8 +240,15 @@ static void __init dt_unreserved_regions(paddr_t s, 
> paddr_t e,
>  offset = reserved_mem->nr_banks;
>  for ( ; i - offset < shmem->nr_banks; i++ )
>  {
> -paddr_t r_s = shmem->bank[i - offset].start;
> -paddr_t r_e = r_s + shmem->bank[i - offset].size;
> +paddr_t r_s, r_e;
> +
> +r_s = shmem->bank[i - offset].start;
> +
> +/* Shared memory banks can contain INVALID_PADDR as start */
> +if ( INVALID_PADDR == r_s )
> +continue;
> +
> +r_e = r_s + shmem->bank[i - offset].size;
> 
>  if ( s < r_e && r_s < e )
>  {
> @@ -272,7 +279,8 @@ static bool __init meminfo_overlap_check(const struct 
> membanks *mem,
>  bank_start = mem->bank[i].start;
>  bank_end = bank_start + mem->bank[i].size;
> 
> -if ( region_end <= bank_start || region_start >= bank_end )
> +if ( INVALID_PADDR == bank_start || region_end <= bank_start ||
> + region_start >= bank_end )
>  continue;
>  else
>  {
> diff --git a/xen/arch/arm/static-shmem.c b/xen/arch/arm/static-shmem.c
> index c15a65130659..74c81904b8a4 100644
> ---

Re: New Defects reported by Coverity Scan for XenProject

2024-05-22 Thread Andrew Cooper
On 22/05/2024 11:05 am, Jan Beulich wrote:
> On 22.05.2024 11:56, scan-ad...@coverity.com wrote:
>> ** CID 1598431:  Memory - corruptions  (OVERRUN)
>>
>>
>> 
>> *** CID 1598431:  Memory - corruptions  (OVERRUN)
>> /xen/common/trace.c: 798 in trace()
>> 792 }
>> 793 
>> 794 if ( rec_size > bytes_to_wrap )
>> 795 insert_wrap_record(buf, rec_size);
>> 796 
>> 797 /* Write the original record */
> CID 1598431:  Memory - corruptions  (OVERRUN)
> Overrunning callee's array of size 28 by passing argument "extra" 
> (which evaluates to 31) in call to "__insert_record".
>> 798 __insert_record(buf, event, extra, cycles, rec_size, extra_data);
>> 799 
>> 800 unlock:
>> 801 spin_unlock_irqrestore(&this_cpu(t_lock), flags);
>> 802 
>> 803 /* Notify trace buffer consumer that we've crossed the high 
>> water mark. */
> How does the tool conclude "extra" evaluating to 31, when at the top of
> the function it is clearly checked to be less than 28?

Which "top" ?

The reasoning is:

 2. Condition extra % 4UL /* sizeof (uint32_t) */, taking false branch.
 3. Condition extra / 4UL /* sizeof (uint32_t) */ > 7, taking false branch.
 4. cond_at_most: Checking extra / 4UL > 7UL implies that extra may be
up to 31 on the false branch.

which is where 31 comes from.

What Coverity hasn't done is equate "<=31 && multiple of 4" to mean
"<=28".  I don't think this is unreasonable; analysis has to prune the
reasoning somewhere...
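
As a sanity check on that arithmetic, here is a small brute-force sketch. The two conditions are the ones Coverity lists above; the function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the two conditions Coverity reports for trace(). */
static bool extra_is_valid(unsigned int extra)
{
    return !(extra % sizeof(uint32_t)) && (extra / sizeof(uint32_t)) <= 7;
}

/* Largest byte count that passes both checks, found by brute force. */
static unsigned int max_valid_extra(void)
{
    unsigned int extra, max = 0;

    for ( extra = 0; extra < 64; extra++ )
        if ( extra_is_valid(extra) )
            max = extra;

    return max;
}
```

Taken alone, the division check admits values up to 31, but combined with the multiple-of-4 check the largest accepted value is 28, which fits the callee's 28-byte array.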

This is (fundamentally) a dumb-ABI problem where we're passing a byte
count but only ever wanting to use it as a unit-of-uint32_t's count.

But it's also a problem that we're passing both extra and rec_size into
__insert_record() when one is calculated from the other.

I had decided to leave this alone for now, but maybe it could do with
some improvements (simplifications) to the code.

~Andrew



Re: [PATCH v3 3/7] xen/p2m: put reference for level 2 superpage

2024-05-22 Thread Luca Fancellu
Hi Julien,

> On 22 May 2024, at 14:25, Julien Grall  wrote:
> 
>> diff --git a/xen/arch/arm/mmu/p2m.c b/xen/arch/arm/mmu/p2m.c
>> index 41fcca011cf4..b496266deef6 100644
>> --- a/xen/arch/arm/mmu/p2m.c
>> +++ b/xen/arch/arm/mmu/p2m.c
>> @@ -753,17 +753,9 @@ static int p2m_mem_access_radix_set(struct p2m_domain 
>> *p2m, gfn_t gfn,
>>  return rc;
>>  }
>>  -/*
>> - * Put any references on the single 4K page referenced by pte.
>> - * TODO: Handle superpages, for now we only take special references for leaf
>> - * pages (specifically foreign ones, which can't be super mapped today).
>> - */
>> -static void p2m_put_l3_page(const lpae_t pte)
>> +/* Put any references on the single 4K page referenced by mfn. */
>> +static void p2m_put_l3_page(mfn_t mfn, p2m_type_t type)
>>  {
>> -mfn_t mfn = lpae_get_mfn(pte);
>> -
>> -ASSERT(p2m_is_valid(pte));
>> -
>>  /*
>>   * TODO: Handle other p2m types
>>   *
>> @@ -771,16 +763,43 @@ static void p2m_put_l3_page(const lpae_t pte)
>>   * flush the TLBs if the page is reallocated before the end of
>>   * this loop.
>>   */
>> -if ( p2m_is_foreign(pte.p2m.type) )
>> +if ( p2m_is_foreign(type) )
>>  {
>>  ASSERT(mfn_valid(mfn));
>>  put_page(mfn_to_page(mfn));
>>  }
>>  /* Detect the xenheap page and mark the stored GFN as invalid. */
>> -else if ( p2m_is_ram(pte.p2m.type) && is_xen_heap_mfn(mfn) )
>> +else if ( p2m_is_ram(type) && is_xen_heap_mfn(mfn) )
>>  page_set_xenheap_gfn(mfn_to_page(mfn), INVALID_GFN);
>>  }
> 
> All the pages within a 2MB mapping should be the same type. So...
> 
>>  +/* Put any references on the superpage referenced by mfn. */
>> +static void p2m_put_l2_superpage(mfn_t mfn, p2m_type_t type)
>> +{
>> +unsigned int i;
>> +
>> +for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ )
>> +{
>> +p2m_put_l3_page(mfn, type);
>> +
>> +mfn = mfn_add(mfn, 1);
>> +}
> 
> ... this solution is a bit wasteful as we will now call p2m_put_l3_page() 512 
> times even though there is nothing to do.
> 
> So instead can we move the checks outside to optimize the path a bit?

You mean this?

diff --git a/xen/arch/arm/mmu/p2m.c b/xen/arch/arm/mmu/p2m.c
index b496266deef6..d40cddda48f3 100644
--- a/xen/arch/arm/mmu/p2m.c
+++ b/xen/arch/arm/mmu/p2m.c
@@ -794,7 +794,8 @@ static void p2m_put_page(const lpae_t pte, unsigned int 
level)
 ASSERT(p2m_is_valid(pte));
 
 /* We have a second level 2M superpage */
-if ( p2m_is_superpage(pte, level) && (level == 2) )
+if ( p2m_is_superpage(pte, level) && (level == 2) &&
+ p2m_is_foreign(pte.p2m.type) )
 return p2m_put_l2_superpage(mfn, pte.p2m.type);
 else if ( level == 3 )
 return p2m_put_l3_page(mfn, pte.p2m.type);
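
To make the cost argument concrete, a self-contained sketch of hoisting the type check out of the 512-iteration loop. Everything below is a simplified stand-in rather than the real Xen definitions: the p2m types, the counters and the put_* helpers are hypothetical, and the xenheap case handled by p2m_put_l3_page() is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

#define XEN_PT_LPAE_ENTRIES 512   /* 4K entries covered by one 2MB mapping */

/* Hypothetical stand-ins for Xen's p2m types; not the real definitions. */
typedef enum { p2m_ram_rw, p2m_map_foreign_rw } p2m_type_t;

static unsigned int nr_put_calls;     /* how often the per-page helper ran */
static unsigned int nr_refs_dropped;  /* references "put" in this sketch */

static bool p2m_is_foreign(p2m_type_t t)
{
    return t == p2m_map_foreign_rw;
}

/* Per-4K-page helper; only foreign pages hold a reference here. */
static void put_l3_page(unsigned long mfn, p2m_type_t type)
{
    (void)mfn;
    nr_put_calls++;
    if ( p2m_is_foreign(type) )
        nr_refs_dropped++;
}

/* Check the type once, then walk the 512 pages only if there is work. */
static void put_l2_superpage(unsigned long mfn, p2m_type_t type)
{
    unsigned int i;

    if ( !p2m_is_foreign(type) )
        return;

    for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ )
        put_l3_page(mfn + i, type);
}
```

With the check hoisted, a non-foreign superpage costs no per-page calls at all, while a foreign one still drops all 512 references. Whether the check belongs in the caller (as in the diff above) or in the superpage helper is a separate question.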


> Otherwise...
> 
>> +}
>> +
>> +/* Put any references on the page referenced by pte. */
>> +static void p2m_put_page(const lpae_t pte, unsigned int level)
>> +{
>> +mfn_t mfn = lpae_get_mfn(pte);
>> +
>> +ASSERT(p2m_is_valid(pte));
>> +
>> +/* We have a second level 2M superpage */
>> +if ( p2m_is_superpage(pte, level) && (level == 2) )
>> +return p2m_put_l2_superpage(mfn, pte.p2m.type);
>> +else if ( level == 3 )
>> +return p2m_put_l3_page(mfn, pte.p2m.type);
>> +}
>> +
>>  /* Free lpae sub-tree behind an entry */
>>  static void p2m_free_entry(struct p2m_domain *p2m,
>> lpae_t entry, unsigned int level)
>> @@ -809,9 +828,16 @@ static void p2m_free_entry(struct p2m_domain *p2m,
>>  #endif
>>p2m->stats.mappings[level]--;
>> -/* Nothing to do if the entry is a super-page. */
>> -if ( level == 3 )
>> -p2m_put_l3_page(entry);
>> +/*
>> + * TODO: Currently we don't handle 1GB super-page, Xen is not
>> + * preemptible and therefore some work is needed to handle such
>> + * superpages, for which at some point Xen might end up freeing 
>> memory
>> + * and therefore for such a big mapping it could end up in a very 
>> long
>> + * operation.
>> + */
>> +if ( level >= 2 )
>> +p2m_put_page(entry, level);
>> +
>>  return;
>>  }
>>  @@ -1558,9 +1584,12 @@ int relinquish_p2m_mapping(struct domain *d)
>>count++;
>>  /*
>> - * Arbitrarily preempt every 512 iterations.
>> + * Arbitrarily preempt every 512 iterations or when type is foreign
>> + * mapping and the order is above 9 (2MB).
>>   */
>> -if ( !(count % 512) && hypercall_preempt_check() )
>> +if ( (!(count % 512) ||
>> +  (p2m_is_foreign(t) && (order > XEN_PT_LEVEL_ORDER(2 &&
> 
> ... we would need to preempt for every 2MB rather than just for the 
> p2m_is_foreign().

Ok otherwise you are suggesting that if we don’t go for the solution above we 
drop p2m_is_foreign(t) from
the condition here, am I right?

> 
> BTW, p2m_put_l3_page() has also another case.

Re: [PATCH v2 1/4] x86/shadow: Rework trace_shadow_gen() into sh_trace_va()

2024-05-22 Thread Andrew Cooper
On 22/05/2024 2:40 pm, Jan Beulich wrote:
> On 22.05.2024 15:17, Andrew Cooper wrote:
>> --- a/xen/arch/x86/mm/shadow/multi.c
>> +++ b/xen/arch/x86/mm/shadow/multi.c
>> @@ -1974,13 +1974,17 @@ typedef u32 guest_va_t;
>>  typedef u32 guest_pa_t;
>>  #endif
>>  
>> -static inline void trace_shadow_gen(u32 event, guest_va_t va)
>> +/* Shadow trace event with GUEST_PAGING_LEVELS folded into the event field. 
>> */
>> +static void sh_trace(uint32_t event, unsigned int extra, const void 
>> *extra_data)
>> +{
>> +trace(event | ((GUEST_PAGING_LEVELS - 2) << 8), extra, extra_data);
>> +}
>> +
>> +/* Shadow trace event with the guest's linear address. */
>> +static void sh_trace_va(uint32_t event, guest_va_t va)
>>  {
>>  if ( tb_init_done )
>> -{
>> -event |= (GUEST_PAGING_LEVELS-2)<<8;
>> -trace(event, sizeof(va), &va);
>> -}
>> +sh_trace(event, sizeof(va), &va);
>>  }
> If any tb_init_done check, then perhaps rather in sh_trace()? With that
> (and provided you agree)
> Reviewed-by: Jan Beulich 

Sadly not.  That leads to double reads of tb_init_done when tracing is
compiled in.

When GCC can't fully inline the structure initialisation, it can't prove
that a function call didn't modify tb_init_done.  This is why I arranged all
the trace cleanup in this way.
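
A rough sketch of the arrangement being described, with the check kept in the per-event helper. The globals and trace() below are simplified stand-ins rather than Xen's real definitions, and GUEST_PAGING_LEVELS is fixed at 4 purely for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GUEST_PAGING_LEVELS 4    /* assumed for this sketch */

static bool tb_init_done;        /* stand-in for Xen's global flag */
static unsigned int nr_records;  /* records "emitted" so far */
static uint32_t last_event;      /* event field of the last record */

/* Stand-in for trace(): just remembers what it was given. */
static void trace(uint32_t event, unsigned int extra, const void *extra_data)
{
    (void)extra;
    (void)extra_data;
    last_event = event;
    nr_records++;
}

/* Folds GUEST_PAGING_LEVELS into the event field, as in the patch. */
static void sh_trace(uint32_t event, unsigned int extra, const void *extra_data)
{
    trace(event | ((GUEST_PAGING_LEVELS - 2) << 8), extra, extra_data);
}

/* Check in the caller: nothing is set up or called when tracing is off. */
static void sh_trace_va(uint32_t event, uint32_t va)
{
    if ( tb_init_done )
        sh_trace(event, sizeof(va), &va);
}
```

The point of keeping the check here is that any record setup sits entirely behind a single read of the flag; moving the check into sh_trace() is only equivalent if the compiler can inline everything in between.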

~Andrew



Re: [PATCH v2 2/4] x86/shadow: Introduce sh_trace_gl1e_va()

2024-05-22 Thread Jan Beulich
On 22.05.2024 15:17, Andrew Cooper wrote:
> trace_shadow_fixup() and trace_not_shadow_fault() both write out identical
> trace records.  Reimplement them in terms of a common sh_trace_gl1e_va().
> 
> There's no need to pack the trace record, even in the case of PAE paging.

Isn't this altering the generated trace record for the 4-level case, in
size changing from 20 to 24 bytes?
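
The size difference can be reproduced with a small standalone sketch. The typedefs are simplified assumptions for the 4-level case (both gl1e and va 64 bits wide), the struct names are hypothetical, and the 24-byte figure assumes an ABI where uint64_t is 8-byte aligned, such as x86-64. `__attribute__((packed))` is the GCC/Clang spelling of __packed.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified 4-level types; Xen wraps guest_l1e_t in a struct. */
typedef uint64_t guest_l1e_t;
typedef uint64_t guest_va_t;

/* Old layout: packed, so no tail padding after flags. */
struct __attribute__((packed)) rec_packed {
    guest_l1e_t gl1e;
    guest_va_t  va;
    uint32_t    flags;
};

/* New layout: natural alignment pads the struct to a multiple of 8. */
struct rec_unpacked {
    guest_l1e_t gl1e;
    guest_va_t  va;
    uint32_t    flags;
};
```

Here sizeof(struct rec_packed) is 8 + 8 + 4 = 20, while the unpacked variant gains 4 bytes of tail padding and becomes 24, so sizeof(d) passed to sh_trace() changes accordingly.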

> --- a/xen/arch/x86/mm/shadow/multi.c
> +++ b/xen/arch/x86/mm/shadow/multi.c
> @@ -1987,51 +1987,26 @@ static void sh_trace_va(uint32_t event, guest_va_t va)
>  sh_trace(event, sizeof(va), &va);
>  }
>  
> -static inline void trace_shadow_fixup(guest_l1e_t gl1e,
> -  guest_va_t va)
> +/* Shadow trace event with a gl1e, linear address and flags. */
> +static void sh_trace_gl1e_va(uint32_t event, guest_l1e_t gl1e, guest_va_t va)
>  {
>  if ( tb_init_done )
>  {
> -struct __packed {
> -/* for PAE, guest_l1e may be 64 while guest_va may be 32;
> -   so put it first for alignment sake. */
> -guest_l1e_t gl1e;
> -guest_va_t va;
> -u32 flags;
> -} d;
> -u32 event;
> -
> -event = TRC_SHADOW_FIXUP | ((GUEST_PAGING_LEVELS-2)<<8);
> -
> -d.gl1e = gl1e;
> -d.va = va;
> -d.flags = this_cpu(trace_shadow_path_flags);
> -
> -trace(event, sizeof(d), &d);
> -}
> -}
> -
> -static inline void trace_not_shadow_fault(guest_l1e_t gl1e,
> -  guest_va_t va)
> -{
> -if ( tb_init_done )
> -{
> -struct __packed {
> -/* for PAE, guest_l1e may be 64 while guest_va may be 32;
> -   so put it first for alignment sake. */
> +struct {
> +/*
> + * For GUEST_PAGING_LEVELS=3 (PAE paging), guest_l1e is 64 while
> + * guest_va is 32.  Put it first to avoid padding.
> + */
>  guest_l1e_t gl1e;
>  guest_va_t va;
> -u32 flags;
> -} d;
> -u32 event;
> -
> -event = TRC_SHADOW_NOT_SHADOW | ((GUEST_PAGING_LEVELS-2)<<8);
> -
> -d.gl1e = gl1e;
> -d.va = va;
> -d.flags = this_cpu(trace_shadow_path_flags);
> -
> -trace(event, sizeof(d), &d);
> +uint32_t flags;
> +} d = {
> +.gl1e = gl1e,
> +.va = va,
> +.flags = this_cpu(trace_shadow_path_flags),
> +};
> +
> +sh_trace(event, sizeof(d), &d);
>  }
>  }

Unlike in patch 1, it's less clear here whether leaving the tb_init_done
check is actually better to keep where it is. In principle the compiler
should be able to re-arrange code enough to make it identical no matter
which way it's written, at which point it might again be more desirable
to have the check solely in sh_trace().

Jan



Re: [PATCH v3 1/2] x86/PIT: supply and use #define-s

2024-05-22 Thread Jason Andryuk

On 2024-05-22 08:59, Jan Beulich wrote:

Help reading of code programming the PIT by introducing constants for
control word, read back and latch commands, as well as status.

Requested-by: Jason Andryuk 
Signed-off-by: Jan Beulich 


Reviewed-by: Jason Andryuk 

Thanks for making the switch.

Regards,
Jason



Re: [PATCH v2 1/4] x86/shadow: Rework trace_shadow_gen() into sh_trace_va()

2024-05-22 Thread Jan Beulich
On 22.05.2024 15:17, Andrew Cooper wrote:
> --- a/xen/arch/x86/mm/shadow/multi.c
> +++ b/xen/arch/x86/mm/shadow/multi.c
> @@ -1974,13 +1974,17 @@ typedef u32 guest_va_t;
>  typedef u32 guest_pa_t;
>  #endif
>  
> -static inline void trace_shadow_gen(u32 event, guest_va_t va)
> +/* Shadow trace event with GUEST_PAGING_LEVELS folded into the event field. 
> */
> +static void sh_trace(uint32_t event, unsigned int extra, const void 
> *extra_data)
> +{
> +trace(event | ((GUEST_PAGING_LEVELS - 2) << 8), extra, extra_data);
> +}
> +
> +/* Shadow trace event with the guest's linear address. */
> +static void sh_trace_va(uint32_t event, guest_va_t va)
>  {
>  if ( tb_init_done )
> -{
> -event |= (GUEST_PAGING_LEVELS-2)<<8;
> -trace(event, sizeof(va), &va);
> -}
> +sh_trace(event, sizeof(va), &va);
>  }

If any tb_init_done check, then perhaps rather in sh_trace()? With that
(and provided you agree)
Reviewed-by: Jan Beulich 

Jan



Re: [PATCH for-4.19 v3 2/3] xen: enable altp2m at create domain domctl

2024-05-22 Thread Jan Beulich
On 22.05.2024 15:16, Roger Pau Monné wrote:
> On Tue, May 21, 2024 at 12:30:32PM +0200, Jan Beulich wrote:
>> On 17.05.2024 15:33, Roger Pau Monne wrote:
>>> Enabling it using an HVM param is fragile, and complicates the logic when
>>> deciding whether options that interact with altp2m can also be enabled.
>>>
>>> Leave the HVM param value for consumption by the guest, but prevent it from
>>> being set.  Enabling is now done using an additional altp2m specific field
>>> in
>>> xen_domctl_createdomain.
>>>
>>> Note that albeit only currently implemented in x86, altp2m could be 
>>> implemented
>>> in other architectures, hence why the field is added to 
>>> xen_domctl_createdomain
>>> instead of xen_arch_domainconfig.
>>>
>>> Signed-off-by: Roger Pau Monné 
>>
>> Reviewed-by: Jan Beulich  # hypervisor
>> albeit with one question:
>>
>>> --- a/xen/arch/x86/domain.c
>>> +++ b/xen/arch/x86/domain.c
>>> @@ -637,6 +637,8 @@ int arch_sanitise_domain_config(struct 
>>> xen_domctl_createdomain *config)
>>>  bool hap = config->flags & XEN_DOMCTL_CDF_hap;
>>>  bool nested_virt = config->flags & XEN_DOMCTL_CDF_nested_virt;
>>>  unsigned int max_vcpus;
>>> +unsigned int altp2m_mode = MASK_EXTR(config->altp2m_opts,
>>> + XEN_DOMCTL_ALTP2M_mode_mask);
>>>  
>>>  if ( hvm ? !hvm_enabled : !IS_ENABLED(CONFIG_PV) )
>>>  {
>>> @@ -715,6 +717,26 @@ int arch_sanitise_domain_config(struct 
>>> xen_domctl_createdomain *config)
>>>  return -EINVAL;
>>>  }
>>>  
>>> +if ( config->altp2m_opts & ~XEN_DOMCTL_ALTP2M_mode_mask )
>>> +{
>>> +dprintk(XENLOG_INFO, "Invalid altp2m options selected: %#x\n",
>>> +config->flags);
>>> +return -EINVAL;
>>> +}
>>> +
>>> +if ( altp2m_mode && nested_virt )
>>> +{
>>> +dprintk(XENLOG_INFO,
>>> +"Nested virt and altp2m are not supported together\n");
>>> +return -EINVAL;
>>> +}
>>> +
>>> +if ( altp2m_mode && !hap )
>>> +{
>>> +dprintk(XENLOG_INFO, "altp2m is only supported with HAP\n");
>>> +return -EINVAL;
>>> +}
>>
>> Should this last one perhaps be further extended to permit altp2m with EPT
>> only?
> 
> Hm, yes, that would be more accurate as:
> 
> if ( altp2m_mode && (!hap || !hvm_altp2m_supported()) )

Wouldn't

   if ( altp2m_mode && !hvm_altp2m_supported() )

suffice? hvm_funcs.caps.altp2m is not supposed to be set when no HAP,
as long as HAP continues to be a pre-condition?
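
A small truth-table sketch of when the two conditions coincide. This is purely illustrative: the function names are hypothetical, and "supported implies hap" is modelled as a simple precondition on the inputs, which is exactly the assumption the question above hinges on.

```c
#include <assert.h>
#include <stdbool.h>

/* The longer form of the rejection condition. */
static bool reject_long(bool altp2m_mode, bool hap, bool supported)
{
    return altp2m_mode && (!hap || !supported);
}

/* The shorter form, ignoring hap. */
static bool reject_short(bool altp2m_mode, bool hap, bool supported)
{
    (void)hap;
    return altp2m_mode && !supported;
}

/*
 * Count input combinations where the two forms disagree.  When
 * assume_implication is set, inputs with supported && !hap are skipped.
 */
static unsigned int count_disagreements(bool assume_implication)
{
    unsigned int n = 0;

    for ( int m = 0; m < 2; m++ )
        for ( int h = 0; h < 2; h++ )
            for ( int s = 0; s < 2; s++ )
            {
                if ( assume_implication && s && !h )
                    continue;
                n += reject_long(m, h, s) != reject_short(m, h, s);
            }

    return n;
}
```

With the implication assumed the two forms never disagree; without it they differ in exactly one case (altp2m requested, hap clear, altp2m reported as supported), so the shorter check is only sufficient while that case cannot occur.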

> Would you be fine adjusting at commit, or would you prefer me to send
> an updated version?

I'd be happy to fold in whatever we settle on.

Jan




Re: [PATCH v3 1/2] x86/mm: add API for marking only part of a MMIO page read only

2024-05-22 Thread Jan Beulich
On 22.05.2024 15:22, Marek Marczykowski-Górecki wrote:
> On Wed, May 22, 2024 at 09:52:44AM +0200, Jan Beulich wrote:
>> On 21.05.2024 04:54, Marek Marczykowski-Górecki wrote:
>>> +static void subpage_mmio_write_emulate(
>>> +mfn_t mfn,
>>> +unsigned int offset,
>>> +const void *data,
>>> +unsigned int len)
>>> +{
>>> +struct subpage_ro_range *entry;
>>> +void __iomem *addr;
>>
>> Wouldn't this better be pointer-to-volatile, with ...
> 
> Shouldn't then most other uses of __iomem in the code base be this way
> too? I see volatile only in few places...

Quite likely, yet being consistent in new code is going to be
at least desirable.

Jan



Re: [PATCH] xen/livepatch: make .livepatch.funcs read-only for in-tree tests

2024-05-22 Thread Ross Lagerwall
On Fri, Dec 1, 2023 at 10:16 AM Roger Pau Monne  wrote:
>
> This matches the flags of the .livepatch.funcs section when generated using
> livepatch-build-tools, which only sets the SHT_ALLOC flag.
>
> Also constify the definitions of the livepatch_func variables in the tests
> themselves, in order to better match the resulting output.  Note that just
> making those variables constant is not enough to force the generated sections
> to be read-only.
>
> Signed-off-by: Roger Pau Monné 

Reviewed-by: Ross Lagerwall 



Re: [PATCH v3 3/7] xen/p2m: put reference for level 2 superpage

2024-05-22 Thread Julien Grall

Hi Luca,

On 22/05/2024 08:51, Luca Fancellu wrote:

From: Penny Zheng 

We are doing foreign memory mapping for static shared memory, and
there is a great possibility that it could be super mapped.
But today, p2m_put_l3_page could not handle superpages.

This commit implements a new function p2m_put_l2_superpage to handle
2MB superpages, specifically for helping put extra references for
foreign superpages.

Modify relinquish_p2m_mapping as well to take into account preemption
when type is foreign memory and order is above 9 (2MB).

Currently 1GB superpages are not handled because Xen is not preemptible
and therefore some work is needed to handle such superpages, for which
at some point Xen might end up freeing memory and therefore for such a
big mapping it could end up in a very long operation.

Signed-off-by: Penny Zheng 
Signed-off-by: Luca Fancellu 
---
v3:
  - Add reasoning why we don't support now 1GB superpage, remove level_order
variable from p2m_put_l2_superpage, update TODO comment inside
p2m_free_entry, use XEN_PT_LEVEL_ORDER(2) instead of value 9 inside
relinquish_p2m_mapping. (Michal)
v2:
  - Do not handle 1GB super page as there might be some issue where
a lot of calls to put_page(...) might be issued which could lead
to free memory that is a long operation.
v1:
  - patch from 
https://patchwork.kernel.org/project/xen-devel/patch/20231206090623.1932275-9-penny.zh...@arm.com/
---
  xen/arch/arm/mmu/p2m.c | 63 ++
  1 file changed, 46 insertions(+), 17 deletions(-)

diff --git a/xen/arch/arm/mmu/p2m.c b/xen/arch/arm/mmu/p2m.c
index 41fcca011cf4..b496266deef6 100644
--- a/xen/arch/arm/mmu/p2m.c
+++ b/xen/arch/arm/mmu/p2m.c
@@ -753,17 +753,9 @@ static int p2m_mem_access_radix_set(struct p2m_domain 
*p2m, gfn_t gfn,
  return rc;
  }
  
-/*

- * Put any references on the single 4K page referenced by pte.
- * TODO: Handle superpages, for now we only take special references for leaf
- * pages (specifically foreign ones, which can't be super mapped today).
- */
-static void p2m_put_l3_page(const lpae_t pte)
+/* Put any references on the single 4K page referenced by mfn. */
+static void p2m_put_l3_page(mfn_t mfn, p2m_type_t type)
  {
-mfn_t mfn = lpae_get_mfn(pte);
-
-ASSERT(p2m_is_valid(pte));
-
  /*
   * TODO: Handle other p2m types
   *
@@ -771,16 +763,43 @@ static void p2m_put_l3_page(const lpae_t pte)
   * flush the TLBs if the page is reallocated before the end of
   * this loop.
   */
-if ( p2m_is_foreign(pte.p2m.type) )
+if ( p2m_is_foreign(type) )
  {
  ASSERT(mfn_valid(mfn));
  put_page(mfn_to_page(mfn));
  }
  /* Detect the xenheap page and mark the stored GFN as invalid. */
-else if ( p2m_is_ram(pte.p2m.type) && is_xen_heap_mfn(mfn) )
+else if ( p2m_is_ram(type) && is_xen_heap_mfn(mfn) )
  page_set_xenheap_gfn(mfn_to_page(mfn), INVALID_GFN);
  }


All the pages within a 2MB mapping should be the same type. So...

  
+/* Put any references on the superpage referenced by mfn. */

+static void p2m_put_l2_superpage(mfn_t mfn, p2m_type_t type)
+{
+unsigned int i;
+
+for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ )
+{
+p2m_put_l3_page(mfn, type);
+
+mfn = mfn_add(mfn, 1);
+}


... this solution is a bit wasteful as we will now call 
p2m_put_l3_page() 512 times even though there is nothing to do.


So instead can we move the checks outside to optimize the path a bit? 
Otherwise...



+}
+
+/* Put any references on the page referenced by pte. */
+static void p2m_put_page(const lpae_t pte, unsigned int level)
+{
+mfn_t mfn = lpae_get_mfn(pte);
+
+ASSERT(p2m_is_valid(pte));
+
+/* We have a second level 2M superpage */
+if ( p2m_is_superpage(pte, level) && (level == 2) )
+return p2m_put_l2_superpage(mfn, pte.p2m.type);
+else if ( level == 3 )
+return p2m_put_l3_page(mfn, pte.p2m.type);
+}
+
  /* Free lpae sub-tree behind an entry */
  static void p2m_free_entry(struct p2m_domain *p2m,
 lpae_t entry, unsigned int level)
@@ -809,9 +828,16 @@ static void p2m_free_entry(struct p2m_domain *p2m,
  #endif
  
  p2m->stats.mappings[level]--;

-/* Nothing to do if the entry is a super-page. */
-if ( level == 3 )
-p2m_put_l3_page(entry);
+/*
+ * TODO: Currently we don't handle 1GB super-page, Xen is not
+ * preemptible and therefore some work is needed to handle such
+ * superpages, for which at some point Xen might end up freeing memory
+ * and therefore for such a big mapping it could end up in a very long
+ * operation.
+ */
+if ( level >= 2 )
+p2m_put_page(entry, level);
+
  return;
  }
  
@@ -1558,9 +1584,12 @@ int relinquish_p2m_mapping(struct domain *d)
  
  count++;

  /*
- * Arbitrarily preempt every 512 iterations.
+ 

Re: [PATCH v3 1/2] x86/mm: add API for marking only part of a MMIO page read only

2024-05-22 Thread Marek Marczykowski-Górecki
On Wed, May 22, 2024 at 09:52:44AM +0200, Jan Beulich wrote:
> On 21.05.2024 04:54, Marek Marczykowski-Górecki wrote:
> > +static void subpage_mmio_write_emulate(
> > +mfn_t mfn,
> > +unsigned int offset,
> > +const void *data,
> > +unsigned int len)
> > +{
> > +struct subpage_ro_range *entry;
> > +void __iomem *addr;
> 
> Wouldn't this better be pointer-to-volatile, with ...

Shouldn't then most other uses of __iomem in the code base be this way
too? I see volatile only in few places...

> > +list_for_each_entry(entry, &subpage_ro_ranges, list)
> > +{
> > +if ( mfn_eq(entry->mfn, mfn) )
> > +{
> > +if ( test_bit(offset / SUBPAGE_MMIO_RO_ALIGN, 
> > entry->ro_qwords) )
> > +{
> > + write_ignored:
> > +gprintk(XENLOG_WARNING,
> > +"ignoring write to R/O MMIO 0x%"PRI_mfn"%03x len 
> > %u\n",
> > +mfn_x(mfn), offset, len);
> > +return;
> > +}
> > +
> > +addr = subpage_mmio_get_page(entry);
> > +if ( !addr )
> > +{
> > +gprintk(XENLOG_ERR,
> > +"Failed to map page for MMIO write at 
> > 0x%"PRI_mfn"%03x\n",
> > +mfn_x(mfn), offset);
> > +return;
> > +}
> > +
> > +switch ( len )
> > +{
> > +case 1:
> > +writeb(*(const uint8_t*)data, addr);
> > +break;
> > +case 2:
> > +writew(*(const uint16_t*)data, addr);
> > +break;
> > +case 4:
> > +writel(*(const uint32_t*)data, addr);
> > +break;
> > +case 8:
> > +writeq(*(const uint64_t*)data, addr);
> > +break;
> 
> ... this being how it's written? (If so, volatile suitably carried through to
> other places as well.)

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab




Re: [PATCH v3 6/7] xen/arm: Implement the logic for static shared memory from Xen heap

2024-05-22 Thread Michal Orzel
Hi Luca,

On 22/05/2024 09:51, Luca Fancellu wrote:
> 
> 
> This commit implements the logic to have the static shared memory banks
> from the Xen heap instead of having the host physical address passed from
> the user.
> 
> When the host physical address is not supplied, the physical memory is
> taken from the Xen heap using allocate_domheap_memory, the allocation
> needs to occur at the first handled DT node and the allocated banks
> need to be saved somewhere.
> 
> Introduce the 'shm_heap_banks' for that reason, a struct that will hold
> the banks allocated from the heap, its field bank[].shmem_extra will be
> used to point to the bootinfo shared memory banks .shmem_extra space, so
> that there is not further allocation of memory and every bank in
> shm_heap_banks can be safely identified by the shm_id to reconstruct its
> traceability and if it was allocated or not.
> 
> A search into 'shm_heap_banks' will reveal if the banks were allocated
> or not, in case the host address is not passed, and the callback given
> to allocate_domheap_memory will store the banks in the structure and
> map them to the current domain, to do that, some changes to
> acquire_shared_memory_bank are made to let it differentiate if the bank
> is from the heap and if it is, then assign_pages is called for every
> bank.
> 
> When the bank is already allocated, for every bank allocated with the
> corresponding shm_id, handle_shared_mem_bank is called and the mapping
> are done.
> 
> Signed-off-by: Luca Fancellu 
Reviewed-by: Michal Orzel 

~Michal




[PATCH v2 0/4] x86/shadow: Trace fixes and cleanup

2024-05-22 Thread Andrew Cooper
Patches 1-3 new, following review of Jan's bugfix (patch 4)

Andrew Cooper (3):
  x86/shadow: Rework trace_shadow_gen() into sh_trace_va()
  x86/shadow: Introduce sh_trace_gl1e_va()
  x86/shadow: Rework trace_shadow_emulate_other() as sh_trace_gfn_va()

Jan Beulich (1):
  x86/shadow: Don't leave trace record field uninitialized

 xen/arch/x86/mm/shadow/multi.c | 150 ++---
 1 file changed, 63 insertions(+), 87 deletions(-)


base-commit: 9c5444b01ad51369bc09197a442a93d87b4b76f2
-- 
2.30.2




[PATCH v2 1/4] x86/shadow: Rework trace_shadow_gen() into sh_trace_va()

2024-05-22 Thread Andrew Cooper
The ((GUEST_PAGING_LEVELS - 2) << 8) expression in the event field is common
to all shadow trace events, so introduce sh_trace() as a very thin wrapper
around trace().

Then, rename trace_shadow_gen() to sh_trace_va() to better describe what it is
doing, and to be more consistent with later cleanup.

No functional change.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: George Dunlap 

v2:
 * New
---
 xen/arch/x86/mm/shadow/multi.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index bcd02b2d0037..1775952d7e18 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -1974,13 +1974,17 @@ typedef u32 guest_va_t;
 typedef u32 guest_pa_t;
 #endif
 
-static inline void trace_shadow_gen(u32 event, guest_va_t va)
+/* Shadow trace event with GUEST_PAGING_LEVELS folded into the event field. */
+static void sh_trace(uint32_t event, unsigned int extra, const void 
*extra_data)
+{
+trace(event | ((GUEST_PAGING_LEVELS - 2) << 8), extra, extra_data);
+}
+
+/* Shadow trace event with the guest's linear address. */
+static void sh_trace_va(uint32_t event, guest_va_t va)
 {
 if ( tb_init_done )
-{
-event |= (GUEST_PAGING_LEVELS-2)<<8;
-trace(event, sizeof(va), &va);
-}
+sh_trace(event, sizeof(va), &va);
 }
 
 static inline void trace_shadow_fixup(guest_l1e_t gl1e,
@@ -2239,7 +2243,7 @@ static int cf_check sh_page_fault(
 sh_reset_early_unshadow(v);
 perfc_incr(shadow_fault_fast_gnp);
 SHADOW_PRINTK("fast path not-present\n");
-trace_shadow_gen(TRC_SHADOW_FAST_PROPAGATE, va);
+sh_trace_va(TRC_SHADOW_FAST_PROPAGATE, va);
 return 0;
 }
 #ifdef CONFIG_HVM
@@ -2250,7 +2254,7 @@ static int cf_check sh_page_fault(
 perfc_incr(shadow_fault_fast_mmio);
 SHADOW_PRINTK("fast path mmio %#"PRIpaddr"\n", gpa);
 sh_reset_early_unshadow(v);
-trace_shadow_gen(TRC_SHADOW_FAST_MMIO, va);
+sh_trace_va(TRC_SHADOW_FAST_MMIO, va);
 return handle_mmio_with_translation(va, gpa >> PAGE_SHIFT, access)
? EXCRET_fault_fixed : 0;
 #else
@@ -2265,7 +2269,7 @@ static int cf_check sh_page_fault(
  * Retry and let the hardware give us the right fault next time. */
 perfc_incr(shadow_fault_fast_fail);
 SHADOW_PRINTK("fast path false alarm!\n");
-trace_shadow_gen(TRC_SHADOW_FALSE_FAST_PATH, va);
+sh_trace_va(TRC_SHADOW_FALSE_FAST_PATH, va);
 return EXCRET_fault_fixed;
 }
 }
@@ -2481,7 +2485,7 @@ static int cf_check sh_page_fault(
 #endif
 paging_unlock(d);
 put_gfn(d, gfn_x(gfn));
-trace_shadow_gen(TRC_SHADOW_DOMF_DYING, va);
+sh_trace_va(TRC_SHADOW_DOMF_DYING, va);
 return 0;
 }
 
@@ -2569,7 +2573,7 @@ static int cf_check sh_page_fault(
 put_gfn(d, gfn_x(gfn));
 
 perfc_incr(shadow_fault_mmio);
-trace_shadow_gen(TRC_SHADOW_MMIO, va);
+sh_trace_va(TRC_SHADOW_MMIO, va);
 
 return handle_mmio_with_translation(va, gpa >> PAGE_SHIFT, access)
? EXCRET_fault_fixed : 0;
-- 
2.30.2
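
The event encoding described in the commit message can be illustrated outside of Xen. The following is a minimal sketch under stated assumptions: fold_paging_levels() and EXAMPLE_EVENT are hypothetical names and values, and only the ((levels - 2) << 8) expression is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical event ID, chosen only for illustration. */
#define EXAMPLE_EVENT 0x0020f003u

/*
 * Fold the guest paging level (2, 3 or 4) into bits 8-9 of the event,
 * mirroring the ((GUEST_PAGING_LEVELS - 2) << 8) expression in the patch.
 */
static uint32_t fold_paging_levels(uint32_t event, unsigned int levels)
{
    assert(levels >= 2 && levels <= 4);
    return event | ((levels - 2) << 8);
}
```

Keeping this in one thin wrapper means each helper only has to supply the base event and its payload.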




[PATCH v2 3/4] x86/shadow: Rework trace_shadow_emulate_other() as sh_trace_gfn_va()

2024-05-22 Thread Andrew Cooper
sh_trace_gfn_va() is very similar to sh_trace_gl1e_va(), and a rather shorter
name than trace_shadow_emulate_other().  Like sh_trace_gl1e_va(), there is no
need to pack the trace record.

No functional change.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: George Dunlap 

v2:
 * New
---
 xen/arch/x86/mm/shadow/multi.c | 40 --
 1 file changed, 19 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index f0a9cc527c0b..d2fe4e148fe0 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -2010,29 +2010,30 @@ static void sh_trace_gl1e_va(uint32_t event, 
guest_l1e_t gl1e, guest_va_t va)
 }
 }
 
-static inline void trace_shadow_emulate_other(u32 event,
- guest_va_t va,
- gfn_t gfn)
+/* Shadow trace event with a gfn, linear address and flags. */
+static void sh_trace_gfn_va(uint32_t event, gfn_t gfn, guest_va_t va)
 {
 if ( tb_init_done )
 {
-struct __packed {
-/* for PAE, guest_l1e may be 64 while guest_va may be 32;
-   so put it first for alignment sake. */
+struct {
+/*
+ * For GUEST_PAGING_LEVELS=3 (PAE paging), gfn is 64 while
+ * guest_va is 32.  Put it first to avoid padding.
+ */
 #if GUEST_PAGING_LEVELS == 2
-u32 gfn;
+uint32_t gfn;
 #else
-u64 gfn;
+uint64_t gfn;
 #endif
 guest_va_t va;
-} d;
-
-event |= ((GUEST_PAGING_LEVELS-2)<<8);
-
-d.gfn=gfn_x(gfn);
-d.va = va;
+uint32_t flags;
+} d = {
+.gfn = gfn_x(gfn),
+.va = va,
+.flags = this_cpu(trace_shadow_path_flags),
+};
 
-trace(event, sizeof(d), &d);
+sh_trace(event, sizeof(d), &d);
 }
 }
 
@@ -2603,8 +2604,7 @@ static int cf_check sh_page_fault(
   mfn_x(gmfn));
 perfc_incr(shadow_fault_emulate_failed);
 shadow_remove_all_shadows(d, gmfn);
-trace_shadow_emulate_other(TRC_SHADOW_EMULATE_UNSHADOW_USER,
-  va, gfn);
+sh_trace_gfn_va(TRC_SHADOW_EMULATE_UNSHADOW_USER, gfn, va);
 goto done;
 }
 
@@ -2683,8 +2683,7 @@ static int cf_check sh_page_fault(
 }
 #endif
 shadow_remove_all_shadows(d, gmfn);
-trace_shadow_emulate_other(TRC_SHADOW_EMULATE_UNSHADOW_EVTINJ,
-   va, gfn);
+sh_trace_gfn_va(TRC_SHADOW_EMULATE_UNSHADOW_EVTINJ, gfn, va);
 return EXCRET_fault_fixed;
 }
 
@@ -2739,8 +2738,7 @@ static int cf_check sh_page_fault(
  * though, this is a hint that this page should not be shadowed. */
 shadow_remove_all_shadows(d, gmfn);
 
-trace_shadow_emulate_other(TRC_SHADOW_EMULATE_UNSHADOW_UNHANDLED,
-   va, gfn);
+sh_trace_gfn_va(TRC_SHADOW_EMULATE_UNSHADOW_UNHANDLED, gfn, va);
 goto emulate_done;
 }
 
-- 
2.30.2




[PATCH v2 2/4] x86/shadow: Introduce sh_trace_gl1e_va()

2024-05-22 Thread Andrew Cooper
trace_shadow_fixup() and trace_not_shadow_fault() both write out identical
trace records.  Reimplement them in terms of a common sh_trace_gl1e_va().

There's no need to pack the trace record, even in the case of PAE paging.

No functional change.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: George Dunlap 

v2:
 * New
---
 xen/arch/x86/mm/shadow/multi.c | 59 ++
 1 file changed, 17 insertions(+), 42 deletions(-)

diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 1775952d7e18..f0a9cc527c0b 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -1987,51 +1987,26 @@ static void sh_trace_va(uint32_t event, guest_va_t va)
 sh_trace(event, sizeof(va), &va);
 }
 
-static inline void trace_shadow_fixup(guest_l1e_t gl1e,
-  guest_va_t va)
+/* Shadow trace event with a gl1e, linear address and flags. */
+static void sh_trace_gl1e_va(uint32_t event, guest_l1e_t gl1e, guest_va_t va)
 {
 if ( tb_init_done )
 {
-struct __packed {
-/* for PAE, guest_l1e may be 64 while guest_va may be 32;
-   so put it first for alignment sake. */
-guest_l1e_t gl1e;
-guest_va_t va;
-u32 flags;
-} d;
-u32 event;
-
-event = TRC_SHADOW_FIXUP | ((GUEST_PAGING_LEVELS-2)<<8);
-
-d.gl1e = gl1e;
-d.va = va;
-d.flags = this_cpu(trace_shadow_path_flags);
-
-trace(event, sizeof(d), &d);
-}
-}
-
-static inline void trace_not_shadow_fault(guest_l1e_t gl1e,
-  guest_va_t va)
-{
-if ( tb_init_done )
-{
-struct __packed {
-/* for PAE, guest_l1e may be 64 while guest_va may be 32;
-   so put it first for alignment sake. */
+struct {
+/*
+ * For GUEST_PAGING_LEVELS=3 (PAE paging), guest_l1e is 64 while
+ * guest_va is 32.  Put it first to avoid padding.
+ */
 guest_l1e_t gl1e;
 guest_va_t va;
-u32 flags;
-} d;
-u32 event;
-
-event = TRC_SHADOW_NOT_SHADOW | ((GUEST_PAGING_LEVELS-2)<<8);
-
-d.gl1e = gl1e;
-d.va = va;
-d.flags = this_cpu(trace_shadow_path_flags);
-
-trace(event, sizeof(d), &d);
+uint32_t flags;
+} d = {
+.gl1e = gl1e,
+.va = va,
+.flags = this_cpu(trace_shadow_path_flags),
+};
+
+sh_trace(event, sizeof(d), &d);
 }
 }
 
@@ -2603,7 +2578,7 @@ static int cf_check sh_page_fault(
 d->arch.paging.log_dirty.fault_count++;
 sh_reset_early_unshadow(v);
 
-trace_shadow_fixup(gw.l1e, va);
+sh_trace_gl1e_va(TRC_SHADOW_FIXUP, gw.l1e, va);
  done: __maybe_unused;
 sh_audit_gw(v, &gw);
 SHADOW_PRINTK("fixed\n");
@@ -2857,7 +2832,7 @@ static int cf_check sh_page_fault(
 put_gfn(d, gfn_x(gfn));
 
 propagate:
-trace_not_shadow_fault(gw.l1e, va);
+sh_trace_gl1e_va(TRC_SHADOW_NOT_SHADOW, gw.l1e, va);
 
 return 0;
 }
-- 
2.30.2
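
The claim that the PAE record needs no packing can be checked with a standalone sketch. The struct below is a stand-in, not the Xen type: it assumes guest_l1e_t is 64 bits wide and guest_va_t is 32 bits wide, as the comment in the patch describes for GUEST_PAGING_LEVELS=3. Other paging levels are not covered here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the PAE (GUEST_PAGING_LEVELS == 3) trace record. */
struct example_pae_record {
    uint64_t gl1e;   /* 64-bit guest L1 entry, placed first */
    uint32_t va;     /* 32-bit guest linear address */
    uint32_t flags;  /* 32-bit path flags */
};

/* With the 64-bit field first, the compiler inserts no padding. */
static_assert(offsetof(struct example_pae_record, va) == 8,
              "no padding after gl1e");
static_assert(offsetof(struct example_pae_record, flags) == 12,
              "no padding after va");
static_assert(sizeof(struct example_pae_record) == 16,
              "no tail padding");

/* Size of the record as it would be handed to a trace function. */
static size_t example_pae_record_size(void)
{
    return sizeof(struct example_pae_record);
}
```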




[PATCH v2 4/4] x86/shadow: Don't leave trace record field uninitialized

2024-05-22 Thread Andrew Cooper
From: Jan Beulich 

The emulation_count field is set only conditionally right now. Convert
all field setting to an initializer, thus guaranteeing that field to be
set to 0 (default initialized) when GUEST_PAGING_LEVELS != 3.

Rework trace_shadow_emulate() to be consistent with the other trace helpers.

Coverity-ID: 1598430
Fixes: 9a86ac1aa3d2 ("xentrace 5/7: Additional tracing for the shadow code")
Signed-off-by: Jan Beulich 
Acked-by: Roger Pau Monné 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: George Dunlap 

v2:
 * Rebase over packing/sh_trace() cleanup.
---
 xen/arch/x86/mm/shadow/multi.c | 31 +++
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index d2fe4e148fe0..47dd1cc626b2 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -2063,30 +2063,29 @@ static void cf_check trace_emulate_write_val(
 #endif
 }
 
-static inline void trace_shadow_emulate(guest_l1e_t gl1e, unsigned long va)
+static inline void sh_trace_emulate(guest_l1e_t gl1e, unsigned long va)
 {
 if ( tb_init_done )
 {
-struct __packed {
-/* for PAE, guest_l1e may be 64 while guest_va may be 32;
-   so put it first for alignment sake. */
+struct {
+/*
+ * For GUEST_PAGING_LEVELS=3 (PAE paging), guest_l1e is 64 while
+ * guest_va is 32.  Put it first to avoid padding.
+ */
 guest_l1e_t gl1e, write_val;
 guest_va_t va;
 uint32_t flags:29, emulation_count:3;
-} d;
-u32 event;
-
-event = TRC_SHADOW_EMULATE | ((GUEST_PAGING_LEVELS-2)<<8);
-
-d.gl1e = gl1e;
-d.write_val.l1 = this_cpu(trace_emulate_write_val);
-d.va = va;
+} d = {
+.gl1e = gl1e,
+.write_val.l1 = this_cpu(trace_emulate_write_val),
+.va = va,
 #if GUEST_PAGING_LEVELS == 3
-d.emulation_count = this_cpu(trace_extra_emulation_count);
+.emulation_count = this_cpu(trace_extra_emulation_count),
 #endif
-d.flags = this_cpu(trace_shadow_path_flags);
+.flags = this_cpu(trace_shadow_path_flags),
+};
 
-trace(event, sizeof(d), &d);
+sh_trace(TRC_SHADOW_EMULATE, sizeof(d), &d);
 }
 }
 #endif /* CONFIG_HVM */
@@ -2815,7 +2814,7 @@ static int cf_check sh_page_fault(
 }
 #endif /* PAE guest */
 
-trace_shadow_emulate(gw.l1e, va);
+sh_trace_emulate(gw.l1e, va);
  emulate_done:
 SHADOW_PRINTK("emulated\n");
 return EXCRET_fault_fixed;
-- 
2.30.2
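
The fix relies on a C language rule: when an aggregate is given an initializer, members that are not named in it are initialized as if they had static storage duration, i.e. to zero. A minimal sketch, using a hypothetical record type rather than the one in multi.c:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record with the same bit-field split as in the patch. */
struct example_record {
    uint32_t va;
    uint32_t flags:29, emulation_count:3;
};

static struct example_record make_record(uint32_t va, uint32_t flags)
{
    struct example_record d = {
        .va = va,
        .flags = flags,
        /* emulation_count is not named, so it is initialized to 0. */
    };

    return d;
}
```

With field-by-field assignment after a plain declaration, by contrast, any field that is skipped keeps an indeterminate value.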




Re: [PATCH v3 3/7] xen/p2m: put reference for level 2 superpage

2024-05-22 Thread Michal Orzel
Hi Luca,

On 22/05/2024 09:51, Luca Fancellu wrote:
> 
> 
> From: Penny Zheng 
> 
> We are doing foreign memory mapping for static shared memory, and
> there is a great possibility that it could be super mapped.
> But today, p2m_put_l3_page could not handle superpages.
> 
> This commits implements a new function p2m_put_l2_superpage to handle
> 2MB superpages, specifically for helping put extra references for
> foreign superpages.
> 
> Modify relinquish_p2m_mapping as well to take into account preemption
> when type is foreign memory and order is above 9 (2MB).
> 
> Currently 1GB superpages are not handled because Xen is not preemptible
> and therefore some work is needed to handle such superpages, for which
> at some point Xen might end up freeing memory and therefore for such a
> big mapping it could end up in a very long operation.
> 
> Signed-off-by: Penny Zheng 
> Signed-off-by: Luca Fancellu 
Reviewed-by: Michal Orzel 

~Michal




Re: [PATCH for-4.19 v3 2/3] xen: enable altp2m at create domain domctl

2024-05-22 Thread Roger Pau Monné
On Tue, May 21, 2024 at 12:30:32PM +0200, Jan Beulich wrote:
> On 17.05.2024 15:33, Roger Pau Monne wrote:
> > Enabling it using an HVM param is fragile, and complicates the logic when
> > deciding whether options that interact with altp2m can also be enabled.
> > 
> > Leave the HVM param value for consumption by the guest, but prevent it from
> > being set.  Enabling is now done using and additional altp2m specific field 
> > in
> > xen_domctl_createdomain.
> > 
> > Note that albeit only currently implemented in x86, altp2m could be 
> > implemented
> > in other architectures, hence why the field is added to 
> > xen_domctl_createdomain
> > instead of xen_arch_domainconfig.
> > 
> > Signed-off-by: Roger Pau Monné 
> 
> Reviewed-by: Jan Beulich  # hypervisor
> albeit with one question:
> 
> > --- a/xen/arch/x86/domain.c
> > +++ b/xen/arch/x86/domain.c
> > @@ -637,6 +637,8 @@ int arch_sanitise_domain_config(struct 
> > xen_domctl_createdomain *config)
> >  bool hap = config->flags & XEN_DOMCTL_CDF_hap;
> >  bool nested_virt = config->flags & XEN_DOMCTL_CDF_nested_virt;
> >  unsigned int max_vcpus;
> > +unsigned int altp2m_mode = MASK_EXTR(config->altp2m_opts,
> > + XEN_DOMCTL_ALTP2M_mode_mask);
> >  
> >  if ( hvm ? !hvm_enabled : !IS_ENABLED(CONFIG_PV) )
> >  {
> > @@ -715,6 +717,26 @@ int arch_sanitise_domain_config(struct 
> > xen_domctl_createdomain *config)
> >  return -EINVAL;
> >  }
> >  
> > +if ( config->altp2m_opts & ~XEN_DOMCTL_ALTP2M_mode_mask )
> > +{
> > +dprintk(XENLOG_INFO, "Invalid altp2m options selected: %#x\n",
> > +config->flags);
> > +return -EINVAL;
> > +}
> > +
> > +if ( altp2m_mode && nested_virt )
> > +{
> > +dprintk(XENLOG_INFO,
> > +"Nested virt and altp2m are not supported together\n");
> > +return -EINVAL;
> > +}
> > +
> > +if ( altp2m_mode && !hap )
> > +{
> > +dprintk(XENLOG_INFO, "altp2m is only supported with HAP\n");
> > +return -EINVAL;
> > +}
> 
> Should this last one perhaps be further extended to permit altp2m with EPT
> only?

Hm, yes, that would be more accurate as:

if ( altp2m_mode && (!hap || !hvm_altp2m_supported()) )

Would you be fine adjusting at commit, or would you prefer me to send
an updated version?

Thanks, Roger.
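
For reference, the checks discussed in this message can be sketched as a standalone function. This is an illustration, not the Xen code: the mask value, EXAMPLE_MASK_EXTR() and example_check_altp2m() are assumptions made for the example, and the altp2m_supported parameter stands in for the hvm_altp2m_supported() call proposed above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed mask value, for illustration only. */
#define EXAMPLE_ALTP2M_MODE_MASK 0x3u

/* Extract a field: mask it, then divide by the mask's lowest set bit. */
#define EXAMPLE_MASK_EXTR(v, m) (((v) & (m)) / ((m) & -(m)))

static int example_check_altp2m(uint32_t altp2m_opts, bool hap,
                                bool nested_virt, bool altp2m_supported)
{
    unsigned int mode = EXAMPLE_MASK_EXTR(altp2m_opts,
                                          EXAMPLE_ALTP2M_MODE_MASK);

    /* Reject any bits outside of the mode field. */
    if ( altp2m_opts & ~EXAMPLE_ALTP2M_MODE_MASK )
        return -EINVAL;

    /* altp2m and nested virt are mutually exclusive. */
    if ( mode && nested_virt )
        return -EINVAL;

    /* altp2m needs HAP and hardware/hypervisor support. */
    if ( mode && (!hap || !altp2m_supported) )
        return -EINVAL;

    return 0;
}
```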



Re: [PATCH v3 5/8] xen/arm/gic: Allow routing/removing interrupt to running VMs

2024-05-22 Thread Julien Grall

Hi Henry,

On 22/05/2024 02:22, Henry Wang wrote:

On 5/22/2024 9:16 AM, Stefano Stabellini wrote:

On Wed, 22 May 2024, Henry Wang wrote:

Hi Julien,

On 5/21/2024 8:30 PM, Julien Grall wrote:

Hi,

On 21/05/2024 05:35, Henry Wang wrote:

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index 56490dbc43..956c11ba13 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -439,24 +439,33 @@ int vgic_connect_hw_irq(struct domain *d, struct
vcpu *v, unsigned int virq,
     /* We are taking to rank lock to prevent parallel 
connections. */

   vgic_lock_rank(v_target, rank, flags);
+    spin_lock(&v_target->arch.vgic.lock);

I know this is what Stefano suggested, but v_target would point to the
current affinity whereas the interrupt may be pending/active on the
"previous" vCPU. So it is a little unclear whether v_target is the 
correct

lock. Do you have more pointer to show this is correct?
No I think you are correct, we have discussed this in the initial 
version of

this patch. Sorry.

I followed the way from that discussion to note down the vcpu ID and 
retrieve

here, below is the diff, would this make sense to you?

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index 956c11ba13..134ed4e107 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -439,7 +439,7 @@ int vgic_connect_hw_irq(struct domain *d, struct 
vcpu *v,

unsigned int virq,

  /* We are taking to rank lock to prevent parallel connections. */
  vgic_lock_rank(v_target, rank, flags);
-    spin_lock(&v_target->arch.vgic.lock);
+ spin_lock(&d->vcpu[p->spi_vcpu_id]->arch.vgic.lock);

  if ( connect )
  {
@@ -465,7 +465,7 @@ int vgic_connect_hw_irq(struct domain *d, struct 
vcpu *v,

unsigned int virq,
  p->desc = NULL;
  }

-    spin_unlock(&v_target->arch.vgic.lock);
+ spin_unlock(&d->vcpu[p->spi_vcpu_id]->arch.vgic.lock);
  vgic_unlock_rank(v_target, rank, flags);

  return ret;
diff --git a/xen/arch/arm/include/asm/vgic.h 
b/xen/arch/arm/include/asm/vgic.h

index 79b73a0dbb..f4075d3e75 100644
--- a/xen/arch/arm/include/asm/vgic.h
+++ b/xen/arch/arm/include/asm/vgic.h
@@ -85,6 +85,7 @@ struct pending_irq
  uint8_t priority;
  uint8_t lpi_priority;   /* Caches the priority if this is 
an LPI. */

  uint8_t lpi_vcpu_id;    /* The VCPU for an LPI. */
+    uint8_t spi_vcpu_id;    /* The VCPU for an SPI. */
  /* inflight is used to append instances of pending_irq to
   * vgic.inflight_irqs */
  struct list_head inflight;
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index c04fc4f83f..e852479f13 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -632,6 +632,7 @@ void vgic_inject_irq(struct domain *d, struct 
vcpu *v,

unsigned int virq,
  }
  list_add_tail(&n->inflight, &v->arch.vgic.inflight_irqs);
  out:
+    n->spi_vcpu_id = v->vcpu_id;
  spin_unlock_irqrestore(&v->arch.vgic.lock, flags);

  /* we have a new higher priority irq, inject it into the guest */
  vcpu_kick(v);


Also, while looking at the locking, I noticed that we are not doing 
anything
with GIC_IRQ_GUEST_MIGRATING. In gic_update_one_lr(), we seem to 
assume that

if the flag is set, then p->desc cannot be NULL.

Can we reach vgic_connect_hw_irq() with the flag set?
I think even from the perspective of making the code extra safe, we 
should
also check GIC_IRQ_GUEST_MIGRATING as the LR is allocated for this 
case. I

will also add the check of GIC_IRQ_GUEST_MIGRATING here.

Yes. I think it might be easier to check for GIC_IRQ_GUEST_MIGRATING
early and return error immediately in that case. Otherwise, we can
continue and take spin_lock(&v_target->arch.vgic.lock) because no
migration is in progress


Ok, this makes sense to me, I will add

     if( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
     {
     vgic_unlock_rank(v_target, rank, flags);
     return -EBUSY;
     }

right after taking the vgic rank lock.


I think that would be ok. I have to admit, I am still a bit wary about 
allowing to remove interrupts when the domain is running.


I am less concerned about the add part. Do you need the remove part now? 
If not, I would suggest to split in two so we can get the most of this 
series merged for 4.19 and continue to deal with the remove path in the 
background.


I will answer here to the other reply:

> I don't think so, if I am not mistaken, no LR will be allocated with
> other flags set.


I wasn't necessarily thinking about the LR allocation. I was more 
thinking whether there are any flags that could still be set.


IOW, will the vIRQ be like new once vgic_connect_hw_irq() is successful?

Also, while looking at the flags, I noticed we clear _IRQ_INPROGRESS 
before vgic_connect_hw_irq(). Shouldn't we only clear *after*?


This brings me to another question. You don't special case a dying domain. 
If the domain is crashing, wouldn't this mean it wouldn't be possible to 
destroy it?


Cheers,

[PATCH v3 2/2] x86: detect PIT aliasing on ports other than 0x4[0-3]

2024-05-22 Thread Jan Beulich
... in order to also deny Dom0 access through the alias ports (commonly
observed on Intel chipsets). Without this it is only giving the
impression of denying access to PIT. Unlike for CMOS/RTC, do detection
pretty early, to avoid disturbing normal operation later on (even if
typically we won't use much of the PIT).

Like for CMOS/RTC a fundamental assumption of the probing is that reads
from the probed alias port won't have side effects (beyond such that PIT
reads have anyway) in case it does not alias the PIT's.

As to the port 0x61 accesses: Unlike other accesses we do, this masks
off the top four bits (in addition to the bottom two ones), following
Intel chipset documentation saying that these (read-only) bits should
only be written with zero.

Signed-off-by: Jan Beulich 
---
If Xen was running on top of another instance of itself (in HVM mode,
not PVH, i.e. not as a shim), prior to 14f42af3f52d ('x86/vPIT: account
for "counter stopped" time') I'm afraid our vPIT logic would not have
allowed the "Try to further make sure ..." check to pass in the Xen
running on top: We don't respect the gate bit being clear when handling
counter reads. (There are more unhandled [and unmentioned as being so]
aspects of PIT behavior though, yet it's unclear in how far addressing
at least some of them would be useful.)
---
v3: Use PIT_* in dom0_setup_permissions(). Use #define-s introduced by
new earlier patch.
v2: Use new command line option. Re-base over changes to earlier
patches. Use ISOLATE_LSB().

--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct memsize {
 long nr_pages;
@@ -495,7 +496,11 @@ int __init dom0_setup_permissions(struct
 rc |= ioports_deny_access(d, 0x4D0, 0x4D1);
 
 /* Interval Timer (PIT). */
-rc |= ioports_deny_access(d, 0x40, 0x43);
+for ( offs = 0, i = ISOLATE_LSB(pit_alias_mask) ?: 4;
+  offs <= pit_alias_mask; offs += i )
+if ( !(offs & ~pit_alias_mask) )
+rc |= ioports_deny_access(d, PIT_CH0 + offs, PIT_MODE + offs);
+
 /* PIT Channel 2 / PC Speaker Control. */
 rc |= ioports_deny_access(d, 0x61, 0x61);
 
--- a/xen/arch/x86/include/asm/setup.h
+++ b/xen/arch/x86/include/asm/setup.h
@@ -49,6 +49,7 @@ extern unsigned long highmem_start;
 #endif
 
 extern unsigned int i8259A_alias_mask;
+extern unsigned int pit_alias_mask;
 
 extern int8_t opt_smt;
 extern int8_t opt_probe_port_aliases;
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -427,6 +427,74 @@ static struct platform_timesource __init
 .resume = resume_pit,
 };
 
+unsigned int __initdata pit_alias_mask;
+
+static void __init probe_pit_alias(void)
+{
+unsigned int mask = 0x1c;
+uint8_t val = 0;
+
+if ( !opt_probe_port_aliases )
+return;
+
+/*
+ * Use channel 2 in mode 0 for probing.  In this mode even a non-initial
+ * count is loaded independent of counting being / becoming enabled.  Thus
+ * we have a 16-bit value fully under our control, to write and then check
+ * whether we can also read it back unaltered.
+ */
+
+/* Turn off speaker output and disable channel 2 counting. */
+outb(inb(0x61) & 0x0c, 0x61);
+
+outb(PIT_LTCH_CH(2) | PIT_RW_LSB_MSB | PIT_MODE_EOC | PIT_BINARY,
+ PIT_MODE);
+
+do {
+uint8_t val2;
+unsigned int offs;
+
+outb(val, PIT_CH2);
+outb(val ^ 0xff, PIT_CH2);
+
+/* Wait for the Null Count bit to clear. */
+do {
+/* Latch status. */
+outb(PIT_RDB | PIT_RDB_NO_COUNT | PIT_RDB_CH2, PIT_MODE);
+
+/* Try to make sure we're actually having a PIT here. */
+val2 = inb(PIT_CH2);
+if ( (val2 & ~(PIT_STATUS_OUT_PIN | PIT_STATUS_NULL_COUNT)) !=
+ (PIT_RW_LSB_MSB | PIT_MODE_EOC | PIT_BINARY) )
+return;
+} while ( val2 & (1 << 6) );
+
+/*
+ * Try to further make sure we're actually having a PIT here.
+ *
+ * NB: Deliberately |, not ||, as we always want both reads.
+ */
+val2 = inb(PIT_CH2);
+if ( (val2 ^ val) | (inb(PIT_CH2) ^ val ^ 0xff) )
+return;
+
+for ( offs = ISOLATE_LSB(mask); offs <= mask; offs <<= 1 )
+{
+if ( !(mask & offs) )
+continue;
+val2 = inb(PIT_CH2 + offs);
+if ( (val2 ^ val) | (inb(PIT_CH2 + offs) ^ val ^ 0xff) )
+mask &= ~offs;
+}
+} while ( mask && (val += 0x0b) );  /* Arbitrary uneven number. */
+
+if ( mask )
+{
+dprintk(XENLOG_INFO, "PIT aliasing mask: %02x\n", mask);
+pit_alias_mask = mask;
+}
+}
+
 /
  * PLATFORM TIMER 2: HIGH PRECISION EVENT TIMER (HPET)
  */
@@ -2416,6 +2484,8 @@ void __init early_time_init(void)
 }
 
 preinit_pit();
+probe_pit_alias();
+
 tmp = init_platf
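
The alias loop added to dom0_setup_permissions() above can be traced with a standalone sketch. The names below are hypothetical; the port numbers 0x40 and 0x43 follow the 0x4[0-3] range in the subject line, and the sketch uses a plain conditional in place of the GNU "?:" shorthand.

```c
#include <assert.h>
#include <stddef.h>

/* Base PIT port range, per the 0x4[0-3] range named in the subject. */
#define EXAMPLE_PIT_CH0  0x40u
#define EXAMPLE_PIT_MODE 0x43u

struct port_range {
    unsigned int first, last;
};

/*
 * Enumerate the port ranges the loop would deny for a given alias mask.
 * Stores up to max ranges in out and returns the total number found.
 */
static size_t example_pit_alias_ranges(unsigned int mask,
                                       struct port_range *out, size_t max)
{
    unsigned int lsb = mask & -mask;      /* lowest set bit, 0 if no alias */
    unsigned int step = lsb ? lsb : 4;    /* any step > 0 works for mask 0 */
    size_t n = 0;

    for ( unsigned int offs = 0; offs <= mask; offs += step )
    {
        /* Skip offsets with bits set outside of the alias mask. */
        if ( offs & ~mask )
            continue;

        if ( n < max )
        {
            out[n].first = EXAMPLE_PIT_CH0 + offs;
            out[n].last = EXAMPLE_PIT_MODE + offs;
        }
        n++;
    }

    return n;
}
```

For a mask of 0 only the base range is produced, so behaviour is unchanged when no aliasing was detected.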

[PATCH v3 1/2] x86/PIT: supply and use #define-s

2024-05-22 Thread Jan Beulich
Help reading of code programming the PIT by introducing constants for
control word, read back and latch commands, as well as status.

Requested-by: Jason Andryuk 
Signed-off-by: Jan Beulich 
---
v3: New.

--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -983,7 +983,7 @@ static unsigned int __init get_8254_time
 
 /*spin_lock_irqsave(&i8253_lock, flags);*/
 
-outb_p(0x00, PIT_MODE);
+outb_p(PIT_LTCH_CH(0), PIT_MODE);
 count = inb_p(PIT_CH0);
 count |= inb_p(PIT_CH0) << 8;
 
--- a/xen/arch/x86/include/asm/time.h
+++ b/xen/arch/x86/include/asm/time.h
@@ -58,4 +58,38 @@ struct time_scale;
 void set_time_scale(struct time_scale *ts, u64 ticks_per_sec);
 u64 scale_delta(u64 delta, const struct time_scale *scale);
 
+/* Programmable Interval Timer (8254) */
+
+/* Timer Control Word */
+#define PIT_TCW_CH(n)         ((n) << 6)
+/* Lower bits also Timer Status. */
+#define PIT_RW_MSB            (1 << 5)
+#define PIT_RW_LSB            (1 << 4)
+#define PIT_RW_LSB_MSB        (PIT_RW_LSB | PIT_RW_MSB)
+#define PIT_MODE_EOC          (0 << 1)
+#define PIT_MODE_ONESHOT      (1 << 1)
+#define PIT_MODE_RATE_GEN     (2 << 1)
+#define PIT_MODE_SQUARE_WAVE  (3 << 1)
+#define PIT_MODE_SW_STROBE    (4 << 1)
+#define PIT_MODE_HW_STROBE    (5 << 1)
+#define PIT_BINARY            (0 << 0)
+#define PIT_BCD               (1 << 0)
+
+/* Read Back Command */
+#define PIT_RDB               PIT_TCW_CH(3)
+#define PIT_RDB_NO_COUNT      (1 << 5)
+#define PIT_RDB_NO_STATUS     (1 << 4)
+#define PIT_RDB_CH2           (1 << 3)
+#define PIT_RDB_CH1           (1 << 2)
+#define PIT_RDB_CH0           (1 << 1)
+#define PIT_RDB_RSVD          (1 << 0)
+
+/* Counter Latch Command */
+#define PIT_LTCH_CH(n)        PIT_TCW_CH(n)
+
+/* Timer Status */
+#define PIT_STATUS_OUT_PIN    (1 << 7)
+#define PIT_STATUS_NULL_COUNT (1 << 6)
+/* Lower bits match Timer Control Word. */
+
 #endif /* __X86_TIME_H__ */
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -222,7 +222,7 @@ static void cf_check timer_interrupt(int
 
 spin_lock_irq(&pit_lock);
 
-outb(0x80, PIT_MODE);
+outb(PIT_LTCH_CH(2), PIT_MODE);
 count  = inb(PIT_CH2);
 count |= inb(PIT_CH2) << 8;
 
@@ -245,7 +245,8 @@ static void preinit_pit(void)
 {
 /* Set PIT channel 0 to HZ Hz. */
 #define LATCH (((CLOCK_TICK_RATE)+(HZ/2))/HZ)
-outb_p(0x34, PIT_MODE);/* binary, mode 2, LSB/MSB, ch 0 */
+outb_p(PIT_TCW_CH(0) | PIT_RW_LSB_MSB | PIT_MODE_RATE_GEN | PIT_BINARY,
+   PIT_MODE);
 outb_p(LATCH & 0xff, PIT_CH0); /* LSB */
 outb(LATCH >> 8, PIT_CH0); /* MSB */
 #undef LATCH
@@ -356,7 +357,7 @@ static u64 cf_check read_pit_count(void)
 
 spin_lock_irqsave(&pit_lock, flags);
 
-outb(0x80, PIT_MODE);
+outb(PIT_LTCH_CH(2), PIT_MODE);
 count16  = inb(PIT_CH2);
 count16 |= inb(PIT_CH2) << 8;
 
@@ -383,7 +384,8 @@ static s64 __init cf_check init_pit(stru
  */
 #define CALIBRATE_LATCH CALIBRATE_VALUE(CLOCK_TICK_RATE)
 BUILD_BUG_ON(CALIBRATE_LATCH >> 16);
-outb(0xb0, PIT_MODE);  /* binary, mode 0, LSB/MSB, Ch 2 */
+outb(PIT_TCW_CH(2) | PIT_RW_LSB_MSB | PIT_MODE_EOC | PIT_BINARY,
+ PIT_MODE);
 outb(CALIBRATE_LATCH & 0xff, PIT_CH2); /* LSB of count */
 outb(CALIBRATE_LATCH >> 8, PIT_CH2);   /* MSB of count */
 #undef CALIBRATE_LATCH
@@ -408,7 +410,8 @@ static s64 __init cf_check init_pit(stru
 static void cf_check resume_pit(struct platform_timesource *pts)
 {
 /* Set CTC channel 2 to mode 0 again; initial value does not matter. */
-outb(0xb0, PIT_MODE); /* binary, mode 0, LSB/MSB, Ch 2 */
+outb(PIT_TCW_CH(2) | PIT_RW_LSB_MSB | PIT_MODE_EOC | PIT_BINARY,
+ PIT_MODE);
 outb(0, PIT_CH2); /* LSB of count */
 outb(0, PIT_CH2); /* MSB of count */
 }
@@ -2456,7 +2459,8 @@ static int _disable_pit_irq(bool init)
 }
 
 /* Disable PIT CH0 timer interrupt. */
-outb_p(0x30, PIT_MODE);
+outb_p(PIT_TCW_CH(0) | PIT_RW_LSB_MSB | PIT_MODE_EOC | PIT_BINARY,
+   PIT_MODE);
 outb_p(0, PIT_CH0);
 outb_p(0, PIT_CH0);
 
@@ -2562,17 +2566,18 @@ int hwdom_pit_access(struct ioreq *ioreq
 case PIT_MODE:
 if ( ioreq->dir == IOREQ_READ )
 return 0; /* urk! */
-switch ( ioreq->data & 0xc0 )
+switch ( ioreq->data & PIT_TCW_CH(3) )
 {
-case 0xc0: /* Read Back */
-if ( ioreq->data & 0x08 )/* Select Channel 2? */
-outb(ioreq->data & 0xf8, PIT_MODE);
-if ( !(ioreq->data & 0x06) ) /* Select Channel 0/1? */
+case PIT_RDB: /* Read Back */
+if ( ioreq->data & PIT_RDB_CH2 )
+outb(ioreq->data & ~(PIT_RDB_CH1 | PIT_RDB_CH0 | PIT_RDB_RSVD),
+ PIT_MODE);
+if ( !(ioreq->data & (PIT_RDB_CH0 | PIT_RDB_CH1)) )
 return 1; /* no - we're done */
 /* Filter Channel 2 and reserved bit 0. */
-ioreq->data &= ~0x09;
+  
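
As a cross-check of the substitutions in this patch, the new constants can be compared against the literals they replace. The sketch below copies a subset of the #define-s from the time.h hunk; example_rate_gen_word() is a hypothetical helper added only for illustration.

```c
#include <assert.h>

/* Subset of the constants introduced by the patch. */
#define PIT_TCW_CH(n)      ((n) << 6)
#define PIT_RW_MSB         (1 << 5)
#define PIT_RW_LSB         (1 << 4)
#define PIT_RW_LSB_MSB     (PIT_RW_LSB | PIT_RW_MSB)
#define PIT_MODE_EOC       (0 << 1)
#define PIT_MODE_RATE_GEN  (2 << 1)
#define PIT_BINARY         (0 << 0)
#define PIT_RDB            PIT_TCW_CH(3)
#define PIT_RDB_CH2        (1 << 3)
#define PIT_RDB_CH1        (1 << 2)
#define PIT_RDB_CH0        (1 << 1)
#define PIT_RDB_RSVD       (1 << 0)
#define PIT_LTCH_CH(n)     PIT_TCW_CH(n)

/* Each symbolic expression must equal the literal it replaces. */
static_assert(PIT_LTCH_CH(0) == 0x00, "latch channel 0");
static_assert(PIT_LTCH_CH(2) == 0x80, "latch channel 2");
static_assert((PIT_TCW_CH(0) | PIT_RW_LSB_MSB | PIT_MODE_RATE_GEN |
               PIT_BINARY) == 0x34, "ch 0, LSB/MSB, mode 2, binary");
static_assert((PIT_TCW_CH(0) | PIT_RW_LSB_MSB | PIT_MODE_EOC |
               PIT_BINARY) == 0x30, "ch 0, LSB/MSB, mode 0, binary");
static_assert((PIT_TCW_CH(2) | PIT_RW_LSB_MSB | PIT_MODE_EOC |
               PIT_BINARY) == 0xb0, "ch 2, LSB/MSB, mode 0, binary");
static_assert(PIT_RDB == 0xc0, "read back command");
static_assert((PIT_RDB_CH0 | PIT_RDB_CH1) == 0x06, "channels 0 and 1");
static_assert((PIT_RDB_CH2 | PIT_RDB_RSVD) == 0x09, "channel 2 and rsvd");
static_assert((~(PIT_RDB_CH1 | PIT_RDB_CH0 | PIT_RDB_RSVD) & 0xff) == 0xf8,
              "read back filter mask");

/* Hypothetical helper: rate generator control word for a channel. */
static unsigned int example_rate_gen_word(unsigned int ch)
{
    return PIT_TCW_CH(ch) | PIT_RW_LSB_MSB | PIT_MODE_RATE_GEN | PIT_BINARY;
}
```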

[PATCH v3 0/2] x86: detect PIT aliasing on ports other than 0x4[0-3]

2024-05-22 Thread Jan Beulich
1: PIT: supply and use #define-s
2: detect PIT aliasing on ports other than 0x4[0-3]

No functional change from v2, just the introduction of the new prereq
patch to help overall readability.

Jan



Re: [PATCH v3 1/2] x86/mm: add API for marking only part of a MMIO page read only

2024-05-22 Thread Jan Beulich
On 22.05.2024 12:36, Marek Marczykowski-Górecki wrote:
> On Wed, May 22, 2024 at 09:52:44AM +0200, Jan Beulich wrote:
>> On 21.05.2024 04:54, Marek Marczykowski-Górecki wrote:
>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -2009,6 +2009,14 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned 
>>> long gla,
>>>  goto out_put_gfn;
>>>  }
>>>  
>>> +if ( (p2mt == p2m_mmio_direct) && npfec.write_access && npfec.present 
>>> &&
>>> + subpage_mmio_write_accept(mfn, gla) &&
>>
>> Afaics subpage_mmio_write_accept() is unreachable then when CONFIG_HVM=n?
> 
> Right, the PV path hits mmio_ro_emulated_write() without my changes
> already.
> Do you suggest to make subpage_mmio_write_accept() under #ifdef
> CONFIG_HVM?

That's not just me, but also Misra.

>>> + (hvm_emulate_one_mmio(mfn_x(mfn), gla) == X86EMUL_OKAY) )
>>> +{
>>> +rc = 1;
>>> +goto out_put_gfn;
>>> +}
>>
>> Overall this new if() is pretty similar to the immediate preceding one.
>> So similar that I wonder whether the two shouldn't be folded. 
> 
> I can do that if you prefer.
> 
>> In fact
>> it looks as if the new one is needed only for the case where you'd pass
>> through (to a DomU) a device partially used by Xen. That could certainly
>> do with mentioning explicitly.
> 
> Well, the change in mmio_ro_emulated_write() is relevant to both dom0
> and domU. It simply wasn't reachable (in this case) for HVM domU before
> (but was for PV already).

The remark was about the code here only. Of course that other change you
talk about is needed for both, and I wasn't meaning to suggest Dom0 had
worked (in this regard) prior to your change.

>>> +static void __iomem *subpage_mmio_get_page(struct subpage_ro_range *entry)
>>> +{
>>> +void __iomem *mapped_page;
>>> +
>>> +if ( entry->mapped )
>>> +return entry->mapped;
>>> +
>>> +mapped_page = ioremap(mfn_x(entry->mfn) << PAGE_SHIFT, PAGE_SIZE);
>>> +
>>> +spin_lock(&subpage_ro_lock);
>>> +/* Re-check under the lock */
>>> +if ( entry->mapped )
>>> +{
>>> +spin_unlock(&subpage_ro_lock);
>>> +iounmap(mapped_page);
>>
>> The only unmap is on an error path here and on another error path elsewhere.
>> IOW it looks as if devices with such marked pages are meant to never be hot
>> unplugged. I can see that being intentional for the XHCI console, but imo
>> such a restriction also needs prominently calling out in a comment next to
>> e.g. the function declaration.
> 
> The v1 included subpage_mmio_ro_remove() function (which would need to
> be used in case of hot-unplug of such device, if desirable), but since
> this series doesn't introduce any use of it (as you say, it isn't
> desirable for XHCI console specifically), you asked me to remove it...
> 
> Should I add an explicit comment about the limitation, instead of having
> it implicit by not having subpage_mmio_ro_remove() there?

That's what I was asking for in my earlier comment, yes.

>>> --- a/xen/arch/x86/pv/ro-page-fault.c
>>> +++ b/xen/arch/x86/pv/ro-page-fault.c
>>> @@ -330,6 +330,7 @@ static int mmio_ro_do_page_fault(struct 
>>> x86_emulate_ctxt *ctxt,
>>>  return X86EMUL_UNHANDLEABLE;
>>>  }
>>>  
>>> +mmio_ro_ctxt.mfn = mfn;
>>>  ctxt->data = &mmio_ro_ctxt;
>>>  if ( pci_ro_mmcfg_decode(mfn_x(mfn), &mmio_ro_ctxt.seg, 
>>> &mmio_ro_ctxt.bdf) )
>>>  return x86_emulate(ctxt, &mmcfg_intercept_ops);
>>
>> Wouldn't you better set .mfn only on the "else" path, just out of context?
>> Suggesting that the new field in the struct could actually overlay the
>> (seg,bdf) tuple (being of relevance only to MMCFG intercept handling).
>> This would be more for documentation purposes than to actually save space.
>> (If so, perhaps the "else" itself would also better be dropped while making
>> the adjustment.)
> 
> I can do that if you prefer. But personally, I find such use of a
> union risky (without some means for a compiler to actually enforce their
> proper use) - while for correct code it may save some space, it makes
> the impact of a type confusion bug potentially worse - now that the
> unexpected value would be potentially attacker controlled.
> For a documentation purpose I can simply add a comment.

Well, I'm not going to insist on using a union. But I am pretty firm on
expecting the setting of .mfn to move down. Not using a union will then
mean static analysis tools may point out that .mfn is left uninitialized
for the above visible 1st invocation of x86_emulate().
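
For illustration only, a minimal standalone sketch of what "moving the
setting of .mfn down" could look like. The struct, the decode stub and the
0xe0000 threshold are hypothetical stand-ins, not the actual Xen code; only
the field names follow the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the emulation context discussed in the thread. */
struct ro_ctxt {
    unsigned int seg, bdf;  /* consumed by the MMCFG intercept path only */
    unsigned long mfn;      /* consumed by the plain r/o MMIO path only */
};

enum path { PATH_MMCFG, PATH_MMIO_RO };

/* Stub decode (assumption): frames at or above 0xe0000 count as MMCFG space. */
static bool decode_mmcfg(unsigned long mfn, unsigned int *seg,
                         unsigned int *bdf)
{
    if ( mfn < 0xe0000UL )
        return false;

    *seg = 0;
    *bdf = (unsigned int)(mfn - 0xe0000UL);

    return true;
}

static enum path prepare_ctxt(struct ro_ctxt *ctxt, unsigned long mfn)
{
    if ( decode_mmcfg(mfn, &ctxt->seg, &ctxt->bdf) )
        return PATH_MMCFG;   /* .mfn is deliberately left untouched here */

    ctxt->mfn = mfn;         /* set only where the value is actually used */

    return PATH_MMIO_RO;
}
```

With this arrangement each path fills in only the fields it reads, which is
the documentation effect the union would otherwise provide.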

Jan



[xen-unstable test] 186066: tolerable FAIL - PUSHED

2024-05-22 Thread osstest service owner
flight 186066 xen-unstable real [real]
flight 186075 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/186066/
http://logs.test-lab.xenproject.org/osstest/logs/186075/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-qcow2 8 xen-boot fail pass in 186075-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-qcow2 14 migrate-support-check fail in 186075 never pass
 test-armhf-armhf-xl-qcow2 15 saverestore-support-check fail in 186075 never pass
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 186058
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 186058
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 186058
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 186058
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 186058
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 186058
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-raw 15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass

version targeted for testing:
 xen  ced21fbb2842ac4655048bdee56232974ff9ff9c
baseline version:
 xen  26b122e3bf8f3921d87312fbf5e7e13872ae92b0

Last test of basis   186058  2024-05-21 05:10:55 Z    1 days
Testing same since   186066  2024-05-21 19:40:37 Z    0 days    1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Henry Wang 
  Jan Beulich 
  Nicola Vetrini 
  Oleksii Kurochko 
  Petr Beneš 
  Roger Pau Monné 
  Tamas K Lengyel 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf  

Re: [PATCH] x86/shadow: don't leave trace record field uninitialized

2024-05-22 Thread Andrew Cooper
On 22/05/2024 11:17 am, Jan Beulich wrote:
> The emulation_count field is set only conditionally right now. Convert
> all field setting to an initializer, thus guaranteeing that field to be
> set to 0 (default initialized) when GUEST_PAGING_LEVELS != 3.
>
> While there also drop the "event" local variable, thus eliminating an
> instance of the being phased out u32 type.
>
> Coverity ID: 1598430
> Fixes: 9a86ac1aa3d2 ("xentrace 5/7: Additional tracing for the shadow code")
> Signed-off-by: Jan Beulich 

This is an improvement, but there's a related mess right next to it.

I think this would be a whole lot better with a couple of tweaks, if
you're willing to wait a little for me to try.

~Andrew



Re: [PATCH] x86/shadow: don't leave trace record field uninitialized

2024-05-22 Thread Roger Pau Monné
On Wed, May 22, 2024 at 12:17:30PM +0200, Jan Beulich wrote:
> The emulation_count field is set only conditionally right now. Convert
> all field setting to an initializer, thus guaranteeing that field to be
> set to 0 (default initialized) when GUEST_PAGING_LEVELS != 3.
> 
> While there also drop the "event" local variable, thus eliminating an
> instance of the being phased out u32 type.
> 
> Coverity ID: 1598430
> Fixes: 9a86ac1aa3d2 ("xentrace 5/7: Additional tracing for the shadow code")
> Signed-off-by: Jan Beulich 

Acked-by: Roger Pau Monné 

Thanks, Roger.



Re: [PATCH v3 2/2] drivers/char: Use sub-page ro API to make just xhci dbc cap RO

2024-05-22 Thread Marek Marczykowski-Górecki
On Wed, May 22, 2024 at 10:05:05AM +0200, Jan Beulich wrote:
> On 21.05.2024 04:54, Marek Marczykowski-Górecki wrote:
> > --- a/xen/drivers/char/xhci-dbc.c
> > +++ b/xen/drivers/char/xhci-dbc.c
> > @@ -1216,20 +1216,19 @@ static void __init cf_check 
> > dbc_uart_init_postirq(struct serial_port *port)
> >  break;
> >  }
> >  #ifdef CONFIG_X86
> > -/*
> > - * This marks the whole page as R/O, which may include other registers
> > - * unrelated to DbC. Xen needs only DbC area protected, but it seems
> > - * Linux's XHCI driver (as of 5.18) works without writting to the whole
> > - * page, so keep it simple.
> > - */
> > -if ( rangeset_add_range(mmio_ro_ranges,
> > -PFN_DOWN((uart->dbc.bar_val & PCI_BASE_ADDRESS_MEM_MASK) +
> > - uart->dbc.xhc_dbc_offset),
> > -PFN_UP((uart->dbc.bar_val & PCI_BASE_ADDRESS_MEM_MASK) +
> > -   uart->dbc.xhc_dbc_offset +
> > -sizeof(*uart->dbc.dbc_reg)) - 1) )
> > -printk(XENLOG_INFO
> > -   "Error while adding MMIO range of device to 
> > mmio_ro_ranges\n");
> > +if ( subpage_mmio_ro_add(
> > + (uart->dbc.bar_val & PCI_BASE_ADDRESS_MEM_MASK) +
> > +  uart->dbc.xhc_dbc_offset,
> > + sizeof(*uart->dbc.dbc_reg)) )
> > +{
> > +printk(XENLOG_WARNING
> > +   "Error while marking MMIO range of XHCI console as R/O, "
> > +   "making the whole device R/O (share=no)\n");
> 
> Since you mention "share=no" here, wouldn't you then better also update the
> respective struct field, even if (right now) there may be nothing subsequently
> using that? Except that dbc_ensure_running() actually is looking at it, and
> that's not an __init function.

That case is just an optimization - if pci_ro_device() is used, nobody
else could write to PCI_COMMAND behind the driver's back, so there is no
point checking. Anyway, yes, makes sense to adjust dbc->share too.

> > +if ( pci_ro_device(0, uart->dbc.sbdf.bus, uart->dbc.sbdf.devfn) )
> > +printk(XENLOG_WARNING
> > +   "Failed to mark read-only %pp used for XHCI console\n",
> > +   &uart->dbc.sbdf);
> > +}
> >  #endif
> >  }
> 
> It's been a long time since v2 and the description doesn't say anything in
> this regard: Is there a reason not to retain the rangeset addition alongside
> the pci_ro_device() on the fallback path?

pci_ro_device() prevents the device from being assigned to domU at all, so
that case is covered already. Dom0 would fail to load any driver (if
nothing else - because it can't size the BARs with R/O config space), so
a _well behaving_ Dom0 would also not touch the device in this case.
But otherwise, yes, it makes sense to keep adding to mmio_ro_ranges in the
fallback path.
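
To make the agreed fallback order concrete, here is a rough standalone
sketch. All helpers are hypothetical stubs standing in for
subpage_mmio_ro_add(), pci_ro_device() and the mmio_ro_ranges addition, and
the "share" setting is simplified to a boolean, so treat it as an outline of
the control flow only, not as the actual driver code.

```c
#include <stdbool.h>

/* Simplified, hypothetical model of the state touched by the fallback. */
struct dbc_state {
    bool share;    /* false models "share=no" */
    bool dev_ro;   /* whole device marked read-only */
    bool page_ro;  /* whole page(s) added to the r/o MMIO ranges */
};

/* Stub for the sub-page API: returns 0 on success, non-zero on failure. */
static int stub_subpage_ro_add(bool ok)
{
    return ok ? 0 : -1;
}

/* Stub for marking the whole device read-only. */
static int stub_pci_ro_device(struct dbc_state *s)
{
    s->dev_ro = true;
    return 0;
}

/* Stub for adding the whole page range to the r/o MMIO rangeset. */
static int stub_ro_range_add(struct dbc_state *s)
{
    s->page_ro = true;
    return 0;
}

static void protect_dbc(struct dbc_state *s, bool subpage_ok)
{
    if ( stub_subpage_ro_add(subpage_ok) == 0 )
        return;                 /* only the DbC registers are protected */

    /* Fallback: stop sharing and protect at device and page granularity. */
    s->share = false;
    stub_pci_ro_device(s);
    stub_ro_range_add(s);
}
```

The point of the sketch is merely that the fallback updates the sharing
state and applies both protections, rather than relying on one of them alone.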

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab




Re: [PATCH v3 1/2] x86/mm: add API for marking only part of a MMIO page read only

2024-05-22 Thread Marek Marczykowski-Górecki
On Wed, May 22, 2024 at 09:52:44AM +0200, Jan Beulich wrote:
> On 21.05.2024 04:54, Marek Marczykowski-Górecki wrote:
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -2009,6 +2009,14 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned 
> > long gla,
> >  goto out_put_gfn;
> >  }
> >  
> > +if ( (p2mt == p2m_mmio_direct) && npfec.write_access && npfec.present 
> > &&
> > + subpage_mmio_write_accept(mfn, gla) &&
> 
> Afaics subpage_mmio_write_accept() is unreachable then when CONFIG_HVM=n?

Right, the PV path hits mmio_ro_emulated_write() without my changes
already.
Do you suggest to make subpage_mmio_write_accept() under #ifdef
CONFIG_HVM?

> > + (hvm_emulate_one_mmio(mfn_x(mfn), gla) == X86EMUL_OKAY) )
> > +{
> > +rc = 1;
> > +goto out_put_gfn;
> > +}
> 
> Overall this new if() is pretty similar to the immediate preceding one.
> So similar that I wonder whether the two shouldn't be folded. 

I can do that if you prefer.

> In fact
> it looks as if the new one is needed only for the case where you'd pass
> through (to a DomU) a device partially used by Xen. That could certainly
> do with mentioning explicitly.

Well, the change in mmio_ro_emulated_write() is relevant to both dom0
and domU. It simply wasn't reachable (in this case) for HVM domU before
(but was for PV already).

> > +static void __iomem *subpage_mmio_get_page(struct subpage_ro_range *entry)
> 
> Considering what the function does and what it returns, perhaps better
> s/get/map/? The "get_page" part of the name generally has a different
> meaning in Xen's memory management.

Ok.

> > +{
> > +void __iomem *mapped_page;
> > +
> > +if ( entry->mapped )
> > +return entry->mapped;
> > +
> > +mapped_page = ioremap(mfn_x(entry->mfn) << PAGE_SHIFT, PAGE_SIZE);
> > +
> > +spin_lock(&subpage_ro_lock);
> > +/* Re-check under the lock */
> > +if ( entry->mapped )
> > +{
> > +spin_unlock(&subpage_ro_lock);
> > +iounmap(mapped_page);
> 
> The only unmap is on an error path here and on another error path elsewhere.
> IOW it looks as if devices with such marked pages are meant to never be hot
> unplugged. I can see that being intentional for the XHCI console, but imo
> such a restriction also needs prominently calling out in a comment next to
> e.g. the function declaration.

The v1 included subpage_mmio_ro_remove() function (which would need to
be used in case of hot-unplug of such device, if desirable), but since
this series doesn't introduce any use of it (as you say, it isn't
desirable for XHCI console specifically), you asked me to remove it...

Should I add an explicit comment about the limitation, instead of having
it implicit by not having subpage_mmio_ro_remove() there?

> > +return entry->mapped;
> > +}
> > +
> > +entry->mapped = mapped_page;
> > +spin_unlock(&subpage_ro_lock);
> > +return entry->mapped;
> > +}
> > +
> > +static void subpage_mmio_write_emulate(
> > +mfn_t mfn,
> > +unsigned int offset,
> > +const void *data,
> > +unsigned int len)
> > +{
> > +struct subpage_ro_range *entry;
> > +void __iomem *addr;
> 
> Wouldn't this better be pointer-to-volatile, with ...
> 
> > +list_for_each_entry(entry, &subpage_ro_ranges, list)
> > +{
> > +if ( mfn_eq(entry->mfn, mfn) )
> > +{
> > +if ( test_bit(offset / SUBPAGE_MMIO_RO_ALIGN, 
> > entry->ro_qwords) )
> > +{
> > + write_ignored:
> > +gprintk(XENLOG_WARNING,
> > +"ignoring write to R/O MMIO 0x%"PRI_mfn"%03x len 
> > %u\n",
> > +mfn_x(mfn), offset, len);
> > +return;
> > +}
> > +
> > +addr = subpage_mmio_get_page(entry);
> > +if ( !addr )
> > +{
> > +gprintk(XENLOG_ERR,
> > +"Failed to map page for MMIO write at 
> > 0x%"PRI_mfn"%03x\n",
> > +mfn_x(mfn), offset);
> > +return;
> > +}
> > +
> > +switch ( len )
> > +{
> > +case 1:
> > +writeb(*(const uint8_t*)data, addr);
> > +break;
> > +case 2:
> > +writew(*(const uint16_t*)data, addr);
> > +break;
> > +case 4:
> > +writel(*(const uint32_t*)data, addr);
> > +break;
> > +case 8:
> > +writeq(*(const uint64_t*)data, addr);
> > +break;
> 
> ... this being how it's written? (If so, volatile suitably carried through to
> other places as well.)
> 
> > +default:
> > +/* mmio_ro_emulated_write() already validated the size */
> > +ASSERT_UNREACHABLE();
> > +goto write_ignored;
> > +}
> > +return;
> > +}
> > +}
> > +/* D

Re: [PATCH v15 3/5] vpci: add initial support for virtual PCI bus topology

2024-05-22 Thread Roger Pau Monné
On Fri, May 17, 2024 at 01:06:13PM -0400, Stewart Hildebrand wrote:
> From: Oleksandr Andrushchenko 
> 
> Assign SBDF to the PCI devices being passed through with bus 0.
> The resulting topology is where PCIe devices reside on the bus 0 of the
> root complex itself (embedded endpoints).
> This implementation is limited to 32 devices which are allowed on
> a single PCI bus.
> 
> Please note, that at the moment only function 0 of a multifunction
> device can be passed through.
> 
> Signed-off-by: Oleksandr Andrushchenko 
> Signed-off-by: Volodymyr Babchuk 
> Signed-off-by: Stewart Hildebrand 
> Acked-by: Jan Beulich 
> ---
> In v15:
> - add Jan's A-b
> In v13:
> - s/depends on/select/ in Kconfig
> - check pdev->sbdf.fn instead of two booleans in add_virtual_device()
> - comment #endifs in sched.h
> - clarify comment about limits in vpci.h with seg/bus limit
> In v11:
> - Fixed code formatting
> - Removed bogus write_unlock() call
> - Fixed type for new_dev_number
> In v10:
> - Removed ASSERT(pcidevs_locked())
> - Removed redundant code (local sbdf variable, clearing sbdf during
> device removal, etc)
> - Added __maybe_unused attribute to "out:" label
> - Introduced HAS_VPCI_GUEST_SUPPORT Kconfig option, as this is the
>   first patch where it is used (previously was in "vpci: add hooks for
>   PCI device assign/de-assign")
> In v9:
> - Lock in add_virtual_device() replaced with ASSERT (thanks, Stewart)
> In v8:
> - Added write lock in add_virtual_device
> Since v6:
> - re-work wrt new locking scheme
> - OT: add ASSERT(pcidevs_write_locked()); to add_virtual_device()
> Since v5:
> - s/vpci_add_virtual_device/add_virtual_device and make it static
> - call add_virtual_device from vpci_assign_device and do not use
>   REGISTER_VPCI_INIT machinery
> - add pcidevs_locked ASSERT
> - use DECLARE_BITMAP for vpci_dev_assigned_map
> Since v4:
> - moved and re-worked guest sbdf initializers
> - s/set_bit/__set_bit
> - s/clear_bit/__clear_bit
> - minor comment fix s/Virtual/Guest/
> - added VPCI_MAX_VIRT_DEV constant (PCI_SLOT(~0) + 1) which will be used
>   later for counting the number of MMIO handlers required for a guest
>   (Julien)
> Since v3:
>  - make use of VPCI_INIT
>  - moved all new code to vpci.c which belongs to it
>  - changed open-coded 31 to PCI_SLOT(~0)
>  - added comments and code to reject multifunction devices with
>functions other than 0
>  - updated comment about vpci_dev_next and made it unsigned int
>  - implement roll back in case of error while assigning/deassigning devices
>  - s/dom%pd/%pd
> Since v2:
>  - remove casts that are (a) malformed and (b) unnecessary
>  - add new line for better readability
>  - remove CONFIG_HAS_VPCI_GUEST_SUPPORT ifdef's as the relevant vPCI
> functions are now completely gated with this config
>  - gate common code with CONFIG_HAS_VPCI_GUEST_SUPPORT
> New in v2
> ---
>  xen/drivers/Kconfig |  4 +++
>  xen/drivers/vpci/vpci.c | 57 +
>  xen/include/xen/sched.h | 10 +++-
>  xen/include/xen/vpci.h  | 12 +
>  4 files changed, 82 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/Kconfig b/xen/drivers/Kconfig
> index db94393f47a6..20050e9bb8b3 100644
> --- a/xen/drivers/Kconfig
> +++ b/xen/drivers/Kconfig
> @@ -15,4 +15,8 @@ source "drivers/video/Kconfig"
>  config HAS_VPCI
>   bool
>  
> +config HAS_VPCI_GUEST_SUPPORT
> + bool
> + select HAS_VPCI
> +
>  endmenu
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index 97e115dc5798..23722634d50b 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -40,6 +40,49 @@ extern vpci_register_init_t *const __start_vpci_array[];
>  extern vpci_register_init_t *const __end_vpci_array[];
>  #define NUM_VPCI_INIT (__end_vpci_array - __start_vpci_array)
>  
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +static int add_virtual_device(struct pci_dev *pdev)

This seems quite generic, IMO it would be better named
`assign_{guest,virtual}_sbdf()` or similar, unless there are plans to
add more code here that's not strictly only about setting the guest
SBDF.

> +{
> +struct domain *d = pdev->domain;
> +unsigned int new_dev_number;
> +
> +if ( is_hardware_domain(d) )
> +return 0;
> +
> +ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));

Shouldn't the assert be done before the is_hardware_domain() check, so
that we assert that all possible paths (even those from dom0) have
taken the correct lock?

> +
> +/*
> + * Each PCI bus supports 32 devices/slots at max or up to 256 when
> + * there are multi-function ones which are not yet supported.
> + */
> +if ( pdev->sbdf.fn )
> +{
> +gdprintk(XENLOG_ERR, "%pp: only function 0 passthrough supported\n",
> + &pdev->sbdf);
> +return -EOPNOTSUPP;
> +}
> +new_dev_number = find_first_zero_bit(d->vpci_dev_assigned_map,
> + VPCI_MAX_VIRT_DEV);
> +if ( new_dev_n

[PATCH] x86/shadow: don't leave trace record field uninitialized

2024-05-22 Thread Jan Beulich
The emulation_count field is set only conditionally right now. Convert
all field setting to an initializer, thus guaranteeing that field to be
set to 0 (default initialized) when GUEST_PAGING_LEVELS != 3.

While there also drop the "event" local variable, thus eliminating an
instance of the being phased out u32 type.

Coverity ID: 1598430
Fixes: 9a86ac1aa3d2 ("xentrace 5/7: Additional tracing for the shadow code")
Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -2093,20 +2093,18 @@ static inline void trace_shadow_emulate(
 guest_l1e_t gl1e, write_val;
 guest_va_t va;
 uint32_t flags:29, emulation_count:3;
-} d;
-u32 event;
-
-event = TRC_SHADOW_EMULATE | ((GUEST_PAGING_LEVELS-2)<<8);
-
-d.gl1e = gl1e;
-d.write_val.l1 = this_cpu(trace_emulate_write_val);
-d.va = va;
+} d = {
+.gl1e = gl1e,
+.write_val.l1 = this_cpu(trace_emulate_write_val),
+.va = va,
 #if GUEST_PAGING_LEVELS == 3
-d.emulation_count = this_cpu(trace_extra_emulation_count);
+.emulation_count = this_cpu(trace_extra_emulation_count),
 #endif
-d.flags = this_cpu(trace_shadow_path_flags);
+.flags = this_cpu(trace_shadow_path_flags),
+};
 
-trace(event, sizeof(d), &d);
+trace(TRC_SHADOW_EMULATE | ((GUEST_PAGING_LEVELS - 2) << 8),
+  sizeof(d), &d);
 }
 }
 #endif /* CONFIG_HVM */
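
For illustration only (not part of the patch): a small standalone sketch of
the C rule the change relies on, namely that members omitted from a
designated initializer are initialized to zero. The struct and function
names are made up for the example, and GUEST_PAGING_LEVELS is given an
arbitrary default other than 3.

```c
#include <stdint.h>

#ifndef GUEST_PAGING_LEVELS
#define GUEST_PAGING_LEVELS 4   /* assumption: any value other than 3 */
#endif

/* Hypothetical, cut-down version of the trace record from the patch. */
struct emul_rec {
    uint64_t va;
    unsigned int flags:29, emulation_count:3;
};

static struct emul_rec make_rec(uint64_t va, unsigned int flags,
                                unsigned int count)
{
    struct emul_rec d = {
        .va = va,
#if GUEST_PAGING_LEVELS == 3
        .emulation_count = count,
#endif
        .flags = flags,
    };

    (void)count;   /* unused when GUEST_PAGING_LEVELS != 3 */

    /* With the #if branch compiled out, d.emulation_count is still 0. */
    return d;
}
```

Field-by-field assignment to an uninitialized local gives no such guarantee
for the fields that are skipped, which is what Coverity flagged.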



Re: New Defects reported by Coverity Scan for XenProject

2024-05-22 Thread Jan Beulich
On 22.05.2024 11:56, scan-ad...@coverity.com wrote:
> ** CID 1598431:  Memory - corruptions  (OVERRUN)
> 
> 
> 
> *** CID 1598431:  Memory - corruptions  (OVERRUN)
> /xen/common/trace.c: 798 in trace()
> 792 }
> 793 
> 794 if ( rec_size > bytes_to_wrap )
> 795 insert_wrap_record(buf, rec_size);
> 796 
> 797 /* Write the original record */
 CID 1598431:  Memory - corruptions  (OVERRUN)
 Overrunning callee's array of size 28 by passing argument "extra" 
 (which evaluates to 31) in call to "__insert_record".
> 798 __insert_record(buf, event, extra, cycles, rec_size, extra_data);
> 799 
> 800 unlock:
> 801 spin_unlock_irqrestore(&this_cpu(t_lock), flags);
> 802 
> 803 /* Notify trace buffer consumer that we've crossed the high water 
> mark. */

How does the tool conclude "extra" evaluating to 31, when at the top of
the function it is clearly checked to be less than 28?

> ** CID 1598430:  Uninitialized variables  (UNINIT)
> 
> 
> 
> *** CID 1598430:  Uninitialized variables  (UNINIT)
> /xen/arch/x86/mm/shadow/multi.c: 2109 in trace_shadow_emulate()
> 2103 d.va = va;
> 2104 #if GUEST_PAGING_LEVELS == 3
> 2105 d.emulation_count = this_cpu(trace_extra_emulation_count);
> 2106 #endif
> 2107 d.flags = this_cpu(trace_shadow_path_flags);
> 2108 
 CID 1598430:  Uninitialized variables  (UNINIT)
 Using uninitialized value "d". Field "d.emulation_count" is 
 uninitialized when calling "trace".
> 2109 trace(event, sizeof(d), &d);
> 2110 }
> 2111 }
> 2112 #endif /* CONFIG_HVM */
> 2113 
> 2114 
> /**/

This, otoh, looks to be a valid (but long-standing) issue, which I'll make
a patch for.

Jan



Virtual Xen Summit 2024

2024-05-22 Thread Kelly Choi
Hi all,

To make sure we encourage as many people to attend Xen Summit 2024, the community
will be helping to host the event virtually via Jitsi. This will be free to
join!

I will send the links and instructions to join our event, early next week.
Keep your eyes peeled on the mailing list for the announcement.

Please note there will be no professional AV or video equipment and this is
an effort to encourage participation.

As a reminder, virtual attendees can still get involved in design sessions,
so please register and propose a session today!
https://design-sessions.xenproject.org/

Many thanks,
Kelly Choi

Community Manager
Xen Project


Re: [PATCH v15 2/5] vpci/header: emulate PCI_COMMAND register for guests

2024-05-22 Thread Jan Beulich
On 22.05.2024 11:28, Roger Pau Monné wrote:
> On Fri, May 17, 2024 at 01:06:12PM -0400, Stewart Hildebrand wrote:
>> @@ -754,9 +774,23 @@ static int cf_check init_header(struct pci_dev *pdev)
>>  return -EOPNOTSUPP;
>>  }
>>  
>> -/* Setup a handler for the command register. */
>> -rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write, 
>> PCI_COMMAND,
>> -   2, header);
>> +/*
>> + * Setup a handler for the command register.
>> + *
>> + * TODO: If support for emulated bits is added, re-visit how to handle
>> + * PCI_COMMAND_PARITY, PCI_COMMAND_SERR, and PCI_COMMAND_FAST_BACK.
>> + */
>> +rc = vpci_add_register_mask(pdev->vpci,
>> +is_hwdom ? vpci_hw_read16 : guest_cmd_read,
>> +cmd_write, PCI_COMMAND, 2, header, 0, 0,
>> +PCI_COMMAND_RSVDP_MASK |
>> +(is_hwdom ? 0
>> +  : PCI_COMMAND_IO |
>> +PCI_COMMAND_PARITY |
>> +PCI_COMMAND_WAIT |
>> +PCI_COMMAND_SERR |
>> +PCI_COMMAND_FAST_BACK),
> 
> We want to allow full access to the hw domain and only apply the
> PCI_COMMAND_RSVDP_MASK when !is_hwdom in order to keep the current
> behavior for dom0.
> 
> I don't think it makes a difference in practice, but we are very lax
> in explicitly not applying any of such restrictions to dom0.
> 
> With that fixed:
> 
> Reviewed-by: Roger Pau Monné 

Makes sense to me, so please feel free to retain my R-b with that adjustment.
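
As a rough sketch of the adjustment (standalone, with local stand-ins for
the PCI_COMMAND_* constants; the bit positions follow the Command register
layout listed in the patch description, and the helper names are
hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the Command register bits named in the thread. */
#define CMD_IO         0x0001   /* bit 0 */
#define CMD_PARITY     0x0040   /* bit 6 */
#define CMD_WAIT       0x0080   /* bit 7 */
#define CMD_SERR       0x0100   /* bit 8 */
#define CMD_FAST_BACK  0x0200   /* bit 9 */
#define CMD_RSVDP_MASK 0xf800   /* bits 11-15 */

/* No restrictions for the hardware domain, full mask for other domains. */
static uint16_t cmd_rsvdp_mask(bool is_hwdom)
{
    if ( is_hwdom )
        return 0;

    return CMD_RSVDP_MASK | CMD_IO | CMD_PARITY | CMD_WAIT | CMD_SERR |
           CMD_FAST_BACK;
}

/*
 * Assumed RsvdP semantics for the sketch: bits covered by the mask keep
 * their current value, all other bits take the newly written value.
 */
static uint16_t apply_write(uint16_t cur, uint16_t val, uint16_t rsvdp_mask)
{
    return (uint16_t)((val & ~rsvdp_mask) | (cur & rsvdp_mask));
}
```

For example, with SERR set by the host, a domU write of 0x0006 leaves SERR
in place, while the same write from the hardware domain clears it.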

Jan



Re: [PATCH v2 2/8] xen/x86: Simplify header dependencies in x86/hvm

2024-05-22 Thread Jan Beulich
On 08.05.2024 14:39, Alejandro Vallejo wrote:
> Otherwise it's not possible to call functions described in hvm/vlapic.h from 
> the
> inline functions of hvm/hvm.h.
> 
> This is because a static inline in vlapic.h depends on hvm.h, and pulls it
> transitively through vpt.h. The ultimate cause is having hvm.h included in any
> of the "v*.h" headers, so break the cycle moving the guilty inline into hvm.h.
> 
> No functional change.
> 
> Signed-off-by: Alejandro Vallejo 

In principle:
Reviewed-by: Jan Beulich 
But see below for one possible adjustment.

> ---
> v2:
>   * New patch. Prereq to moving vlapic_cpu_policy_changed() onto hvm.h

That hook invocation living outside of hvm/hvm.h was an outlier anyway,
so even without the planned further work this is probably a good move.

> --- a/xen/arch/x86/include/asm/hvm/hvm.h
> +++ b/xen/arch/x86/include/asm/hvm/hvm.h
> @@ -798,6 +798,12 @@ static inline void hvm_update_vlapic_mode(struct vcpu *v)
>  alternative_vcall(hvm_funcs.update_vlapic_mode, v);
>  }
>  
> +static inline void hvm_vlapic_sync_pir_to_irr(struct vcpu *v)
> +{
> +if ( hvm_funcs.sync_pir_to_irr )
> +alternative_vcall(hvm_funcs.sync_pir_to_irr, v);
> +}

The hook doesn't have "vlapic" in its name. Therefore instead of prepending
hvm_ to the original name or the wrapper, how about replacing the vlapic_
that was there. That would then also fit better with the naming scheme used
for other hooks and their wrappers. Happy to adjust while committing, so
long as you don't disagree.
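
To illustrate the naming being suggested, a minimal standalone sketch of an
optional-hook wrapper named after the hook itself. The types are cut-down
stand-ins, and a plain indirect call is used where Xen would use
alternative_vcall().

```c
#include <stddef.h>

/* Cut-down, hypothetical stand-ins for the real Xen types. */
struct vcpu {
    unsigned int synced;
};

struct hvm_function_table {
    void (*sync_pir_to_irr)(struct vcpu *v);   /* optional, may be NULL */
};

static struct hvm_function_table hvm_funcs;

/* Wrapper name is "hvm_" plus the hook name, matching other wrappers. */
static inline void hvm_sync_pir_to_irr(struct vcpu *v)
{
    if ( hvm_funcs.sync_pir_to_irr )
        hvm_funcs.sync_pir_to_irr(v);
}

/* Example hook implementation, for demonstration only. */
static void demo_sync(struct vcpu *v)
{
    v->synced++;
}
```

Calling the wrapper while the hook is unset is a no-op; once demo_sync is
installed in hvm_funcs, each call increments the counter.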

Jan



Re: [PATCH v15 2/5] vpci/header: emulate PCI_COMMAND register for guests

2024-05-22 Thread Roger Pau Monné
On Fri, May 17, 2024 at 01:06:12PM -0400, Stewart Hildebrand wrote:
> From: Oleksandr Andrushchenko 
> 
> Xen and/or Dom0 may have put values in PCI_COMMAND which they expect
> to remain unaltered. PCI_COMMAND_SERR bit is a good example: while the
> guest's (domU) view of this will want to be zero (for now), the host
> having set it to 1 should be preserved, or else we'd effectively be
> giving the domU control of the bit. Thus, PCI_COMMAND register needs
> proper emulation in order to honor host's settings.
> 
> According to "PCI LOCAL BUS SPECIFICATION, REV. 3.0", section "6.2.2
> Device Control" the reset state of the command register is typically 0,
> so when assigning a PCI device use 0 as the initial state for the
> guest's (domU) view of the command register.
> 
> Here is the full list of command register bits with notes about
> PCI/PCIe specification, and how Xen handles the bit. QEMU's behavior is
> also documented here since that is our current reference implementation
> for PCI passthrough.
> 
> PCI_COMMAND_IO (bit 0)
>   PCIe 6.1: RW
>   PCI LB 3.0: RW
>   QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
> writes do not propagate to hardware. QEMU sets this bit to 1 in
> hardware if an I/O BAR is exposed to the guest.
>   Xen domU: (rsvdp_mask) We treat this bit as RsvdP for now since we
> don't yet support I/O BARs for domUs.
>   Xen dom0: We allow dom0 to control this bit freely.
> 
> PCI_COMMAND_MEMORY (bit 1)
>   PCIe 6.1: RW
>   PCI LB 3.0: RW
>   QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
> writes do not propagate to hardware. QEMU sets this bit to 1 in
> hardware if a Memory BAR is exposed to the guest.
>   Xen domU/dom0: We handle writes to this bit by mapping/unmapping BAR
> regions.
>   Xen domU: For devices assigned to DomUs, memory decoding will be
> disabled at the time of initialization.
> 
> PCI_COMMAND_MASTER (bit 2)
>   PCIe 6.1: RW
>   PCI LB 3.0: RW
>   QEMU: Pass through writes to hardware.
>   Xen domU/dom0: Pass through writes to hardware.
> 
> PCI_COMMAND_SPECIAL (bit 3)
>   PCIe 6.1: RO, hardwire to 0
>   PCI LB 3.0: RW
>   QEMU: Pass through writes to hardware.
>   Xen domU/dom0: Pass through writes to hardware.
> 
> PCI_COMMAND_INVALIDATE (bit 4)
>   PCIe 6.1: RO, hardwire to 0
>   PCI LB 3.0: RW
>   QEMU: Pass through writes to hardware.
>   Xen domU/dom0: Pass through writes to hardware.
> 
> PCI_COMMAND_VGA_PALETTE (bit 5)
>   PCIe 6.1: RO, hardwire to 0
>   PCI LB 3.0: RW
>   QEMU: Pass through writes to hardware.
>   Xen domU/dom0: Pass through writes to hardware.
> 
> PCI_COMMAND_PARITY (bit 6)
>   PCIe 6.1: RW
>   PCI LB 3.0: RW
>   QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
> writes do not propagate to hardware.
>   Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
>   Xen dom0: We allow dom0 to control this bit freely.
> 
> PCI_COMMAND_WAIT (bit 7)
>   PCIe 6.1: RO, hardwire to 0
>   PCI LB 3.0: hardwire to 0
>   QEMU: res_mask
>   Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
>   Xen dom0: We allow dom0 to control this bit freely.
> 
> PCI_COMMAND_SERR (bit 8)
>   PCIe 6.1: RW
>   PCI LB 3.0: RW
>   QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
> writes do not propagate to hardware.
>   Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
>   Xen dom0: We allow dom0 to control this bit freely.
> 
> PCI_COMMAND_FAST_BACK (bit 9)
>   PCIe 6.1: RO, hardwire to 0
>   PCI LB 3.0: RW
>   QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
> writes do not propagate to hardware.
>   Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
>   Xen dom0: We allow dom0 to control this bit freely.
> 
> PCI_COMMAND_INTX_DISABLE (bit 10)
>   PCIe 6.1: RW
>   PCI LB 3.0: RW
>   QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
> writes do not propagate to hardware. QEMU checks if INTx was mapped
> for a device. If it is not, then guest can't control
> PCI_COMMAND_INTX_DISABLE bit.
>   Xen domU: We prohibit a guest from enabling INTx if MSI(X) is enabled.
>   Xen dom0: We allow dom0 to control this bit freely.
> 
> Bits 11-15
>   PCIe 6.1: RsvdP
>   PCI LB 3.0: Reserved
>   QEMU: res_mask
>   Xen domU/dom0: rsvdp_mask
> 
> Signed-off-by: Oleksandr Andrushchenko 
> Signed-off-by: Volodymyr Babchuk 
> Signed-off-by: Stewart Hildebrand 
> Reviewed-by: Jan Beulich 
> ---
> RFC: There is an unaddressed question for Roger: should we update the
>  guest view of the PCI_COMMAND_INTX_DISABLE bit in
>  msi.c/msix.c:control_write()? See prior discussion at [1].
>  In my opinion, I think we should make sure that hardware state and
>  the guest view are consistent (i.e. don't lie to the guest).
> 
> [1] https://lore.kernel.org/xen-devel/86b25777-788c-4b9a-8166-a6f8174be...@suse.com/

I think updating the guest view is helpful in case we need to debug
issues in the guest.

> 
> In v15:
> - add Jan's R-b
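
For readers following the per-bit table above, the domU/dom0 handling of writes to PCI_COMMAND can be sketched roughly as below. This is only an illustration under stated assumptions, not the vPCI code from the patch: filter_cmd_write() and PCI_COMMAND_RSVDP_MASK are hypothetical names, and the bit values are the standard ones from the PCI specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard PCI_COMMAND bit values. */
#define PCI_COMMAND_PARITY        0x0040 /* bit 6 */
#define PCI_COMMAND_WAIT          0x0080 /* bit 7 */
#define PCI_COMMAND_SERR          0x0100 /* bit 8 */
#define PCI_COMMAND_FAST_BACK     0x0200 /* bit 9 */
#define PCI_COMMAND_INTX_DISABLE  0x0400 /* bit 10 */
#define PCI_COMMAND_RSVDP_MASK    0xf800 /* bits 11-15 (hypothetical name) */

/*
 * Hypothetical helper: compute the value that would reach hardware when a
 * domain writes 'val' while the register currently holds 'hw'.
 */
static uint16_t filter_cmd_write(uint16_t hw, uint16_t val,
                                 bool is_domu, bool msi_enabled)
{
    uint16_t rsvdp = PCI_COMMAND_RSVDP_MASK;

    /* For a domU, bits 6-9 are additionally treated as RsvdP. */
    if ( is_domu )
        rsvdp |= PCI_COMMAND_PARITY | PCI_COMMAND_WAIT |
                 PCI_COMMAND_SERR | PCI_COMMAND_FAST_BACK;

    /* RsvdP: keep the current hardware value, ignore the written one. */
    val = (val & ~rsvdp) | (hw & rsvdp);

    /* A domU may not enable INTx (clear the bit) while MSI(-X) is enabled. */
    if ( is_domu && msi_enabled )
        val |= PCI_COMMAND_INTX_DISABLE;

    return val;
}
```

With this sketch, a domU write of 0xffff to a register holding 0 yields 0x043f, whereas the same write from dom0 yields 0x07ff.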

[XEN PATCH v4 3/3] x86/MCE: optional build of AMD/Intel MCE code

2024-05-22 Thread Sergiy Kibrik
Separate Intel/AMD-specific MCE code using CONFIG_{INTEL,AMD} config options.
Now we can avoid building mcheck code if support for a specific platform is
intentionally disabled by configuration.

Also, the global variables lmce_support & cmci_support from Intel-specific
mce_intel.c have to be moved to common mce.c, as they get checked in common
code.

Signed-off-by: Sergiy Kibrik 
Reviewed-by: Stefano Stabellini 
Acked-by: Jan Beulich 
---
changes in v4:
 - attribute {lmce_support,cmci_support} with __ro_after_init
changes in v3:
 - default return value of init_nonfatal_mce_checker() done in separate patch
 - move lmce_support & cmci_support to common mce.c code
 - changed patch description
changes in v2:
 - fallback to original ordering in Makefile
 - redefine lmce_support & cmci_support global vars to false when !INTEL
 - changed patch description
---
 xen/arch/x86/cpu/mcheck/Makefile| 8 
 xen/arch/x86/cpu/mcheck/mce.c   | 4 
 xen/arch/x86/cpu/mcheck/mce_intel.c | 4 
 xen/arch/x86/cpu/mcheck/non-fatal.c | 4 
 4 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/cpu/mcheck/Makefile b/xen/arch/x86/cpu/mcheck/Makefile
index f927f10b4d..e6cb4dd503 100644
--- a/xen/arch/x86/cpu/mcheck/Makefile
+++ b/xen/arch/x86/cpu/mcheck/Makefile
@@ -1,12 +1,12 @@
-obj-y += amd_nonfatal.o
-obj-y += mce_amd.o
+obj-$(CONFIG_AMD) += amd_nonfatal.o
+obj-$(CONFIG_AMD) += mce_amd.o
 obj-y += mcaction.o
 obj-y += barrier.o
-obj-y += intel-nonfatal.o
+obj-$(CONFIG_INTEL) += intel-nonfatal.o
 obj-y += mctelem.o
 obj-y += mce.o
 obj-y += mce-apei.o
-obj-y += mce_intel.o
+obj-$(CONFIG_INTEL) += mce_intel.o
 obj-y += non-fatal.o
 obj-y += util.o
 obj-y += vmce.o
diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
index fb9dec5b89..1664ca6412 100644
--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -38,6 +38,10 @@ DEFINE_PER_CPU_READ_MOSTLY(unsigned int, nr_mce_banks);
 unsigned int __read_mostly firstbank;
 unsigned int __read_mostly ppin_msr;
 uint8_t __read_mostly cmci_apic_vector;
+bool __ro_after_init cmci_support;
+
+/* If mce_force_broadcast == 1, lmce_support will be disabled forcibly. */
+bool __ro_after_init lmce_support;
 
 DEFINE_PER_CPU_READ_MOSTLY(struct mca_banks *, poll_bankmask);
 DEFINE_PER_CPU_READ_MOSTLY(struct mca_banks *, no_cmci_banks);
diff --git a/xen/arch/x86/cpu/mcheck/mce_intel.c b/xen/arch/x86/cpu/mcheck/mce_intel.c
index af43281cc6..dd812f4b8a 100644
--- a/xen/arch/x86/cpu/mcheck/mce_intel.c
+++ b/xen/arch/x86/cpu/mcheck/mce_intel.c
@@ -26,16 +26,12 @@
 #include "mcaction.h"
 
 static DEFINE_PER_CPU_READ_MOSTLY(struct mca_banks *, mce_banks_owned);
-bool __read_mostly cmci_support;
 static bool __read_mostly ser_support;
 static bool __read_mostly mce_force_broadcast;
 boolean_param("mce_fb", mce_force_broadcast);
 
 static int __read_mostly nr_intel_ext_msrs;
 
-/* If mce_force_broadcast == 1, lmce_support will be disabled forcibly. */
-bool __read_mostly lmce_support;
-
 /* Intel SDM define bit15~bit0 of IA32_MCi_STATUS as the MC error code */
 #define INTEL_MCCOD_MASK 0x
 
diff --git a/xen/arch/x86/cpu/mcheck/non-fatal.c b/xen/arch/x86/cpu/mcheck/non-fatal.c
index 5a53bcd0b7..a9ee9bb94f 100644
--- a/xen/arch/x86/cpu/mcheck/non-fatal.c
+++ b/xen/arch/x86/cpu/mcheck/non-fatal.c
@@ -24,15 +24,19 @@ static int __init cf_check init_nonfatal_mce_checker(void)
 * Check for non-fatal errors every MCE_RATE s
 */
switch (c->x86_vendor) {
+#ifdef CONFIG_AMD
case X86_VENDOR_AMD:
case X86_VENDOR_HYGON:
/* Assume we are on K8 or newer AMD or Hygon CPU here */
amd_nonfatal_mcheck_init(c);
break;
+#endif
 
+#ifdef CONFIG_INTEL
case X86_VENDOR_INTEL:
intel_nonfatal_mcheck_init(c);
break;
+#endif
 
default:
/* unhandled vendor isn't really an error */
-- 
2.25.1
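
As an illustration of how the #ifdef'd vendor cases interact with the default case added in patch 2/3, here is a minimal standalone sketch. The CONFIG_* defines stand in for Kconfig symbols, and the enum and function names are hypothetical simplifications rather than the Xen code itself.

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for Kconfig symbols: in this sketch only Intel support is built. */
#define CONFIG_INTEL 1
/* #define CONFIG_AMD 1 */

enum vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_HYGON, VENDOR_OTHER };

/* Returns true when a polling timer would have been started. */
static bool init_nonfatal_checker(enum vendor v)
{
    switch ( v )
    {
#ifdef CONFIG_AMD
    case VENDOR_AMD:
    case VENDOR_HYGON:
        /* The AMD/Hygon specific init would be called here. */
        break;
#endif

#ifdef CONFIG_INTEL
    case VENDOR_INTEL:
        /* The Intel specific init would be called here. */
        break;
#endif

    default:
        /* Unhandled or compiled-out vendor: not an error, but stay silent. */
        return false;
    }

    printf("mcheck_poll: Machine check polling timer started.\n");
    return true;
}
```

With CONFIG_AMD undefined, an AMD or Hygon CPU falls through to the default case, so the "timer started" message is only printed when an init routine was actually reachable.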




[XEN PATCH v4 2/3] x86/MCE: add default switch case in init_nonfatal_mce_checker()

2024-05-22 Thread Sergiy Kibrik
The default switch case block is wanted here to handle e.g. an unexpected
c->x86_vendor value: in that case no mcheck init is done, but a misleading
message still gets logged anyway.

Signed-off-by: Sergiy Kibrik 
CC: Jan Beulich 
---
changes in v4:
 - return 0 instead of -ENODEV and put a comment
 - update description a bit
---
 xen/arch/x86/cpu/mcheck/non-fatal.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/xen/arch/x86/cpu/mcheck/non-fatal.c b/xen/arch/x86/cpu/mcheck/non-fatal.c
index 33cacd15c2..5a53bcd0b7 100644
--- a/xen/arch/x86/cpu/mcheck/non-fatal.c
+++ b/xen/arch/x86/cpu/mcheck/non-fatal.c
@@ -29,9 +29,14 @@ static int __init cf_check init_nonfatal_mce_checker(void)
/* Assume we are on K8 or newer AMD or Hygon CPU here */
amd_nonfatal_mcheck_init(c);
break;
+
case X86_VENDOR_INTEL:
intel_nonfatal_mcheck_init(c);
break;
+
+   default:
+   /* unhandled vendor isn't really an error */
+   return 0;
}
printk(KERN_INFO "mcheck_poll: Machine check polling timer started.\n");
return 0;
-- 
2.25.1




[XEN PATCH v4 1/3] x86/intel: move vmce_has_lmce() routine to header

2024-05-22 Thread Sergiy Kibrik
Moving this function out of mce_intel.c will make it possible to disable
build of Intel MCE code later on, because the function gets called from
common x86 code.

Also replace boilerplate code that checks for MCG_LMCE_P flag with
vmce_has_lmce(), which might contribute to readability a bit.

Signed-off-by: Sergiy Kibrik 
Reviewed-by: Stefano Stabellini 
CC: Jan Beulich 
---
changes in v4:
 - changed description a bit
changes in v3:
 - do not check for CONFIG_INTEL
 - remove CONFIG_INTEL from patch description
changes in v2:
 - move vmce_has_lmce() to cpu/mcheck/mce.h
 - move IS_ENABLED(CONFIG_INTEL) check inside vmce_has_lmce()
 - changed description
---
 xen/arch/x86/cpu/mcheck/mce.h   | 5 +
 xen/arch/x86/cpu/mcheck/mce_intel.c | 4 
 xen/arch/x86/cpu/mcheck/vmce.c  | 5 ++---
 xen/arch/x86/include/asm/mce.h  | 1 -
 xen/arch/x86/msr.c  | 2 ++
 5 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/cpu/mcheck/mce.h b/xen/arch/x86/cpu/mcheck/mce.h
index 4806405f96..eba4b536c7 100644
--- a/xen/arch/x86/cpu/mcheck/mce.h
+++ b/xen/arch/x86/cpu/mcheck/mce.h
@@ -170,6 +170,11 @@ static inline int mce_bank_msr(const struct vcpu *v, uint32_t msr)
 return 0;
 }
 
+static inline bool vmce_has_lmce(const struct vcpu *v)
+{
+return v->arch.vmce.mcg_cap & MCG_LMCE_P;
+}
+
 struct mce_callbacks {
 void (*handler)(const struct cpu_user_regs *regs);
 bool (*check_addr)(uint64_t status, uint64_t misc, int addr_type);
diff --git a/xen/arch/x86/cpu/mcheck/mce_intel.c b/xen/arch/x86/cpu/mcheck/mce_intel.c
index 3f5199b531..af43281cc6 100644
--- a/xen/arch/x86/cpu/mcheck/mce_intel.c
+++ b/xen/arch/x86/cpu/mcheck/mce_intel.c
@@ -1050,7 +1050,3 @@ int vmce_intel_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
 return 1;
 }
 
-bool vmce_has_lmce(const struct vcpu *v)
-{
-return v->arch.vmce.mcg_cap & MCG_LMCE_P;
-}
diff --git a/xen/arch/x86/cpu/mcheck/vmce.c b/xen/arch/x86/cpu/mcheck/vmce.c
index 4da6f4a3e4..5abdf4cb5f 100644
--- a/xen/arch/x86/cpu/mcheck/vmce.c
+++ b/xen/arch/x86/cpu/mcheck/vmce.c
@@ -203,7 +203,7 @@ int vmce_rdmsr(uint32_t msr, uint64_t *val)
  * bits are always set in guest MSR_IA32_FEATURE_CONTROL by Xen, so it
  * does not need to check them here.
  */
-if ( cur->arch.vmce.mcg_cap & MCG_LMCE_P )
+if ( vmce_has_lmce(cur) )
 {
 *val = cur->arch.vmce.mcg_ext_ctl;
 mce_printk(MCE_VERBOSE, "MCE: %pv: rd MCG_EXT_CTL %#"PRIx64"\n",
@@ -332,8 +332,7 @@ int vmce_wrmsr(uint32_t msr, uint64_t val)
 break;
 
 case MSR_IA32_MCG_EXT_CTL:
-if ( (cur->arch.vmce.mcg_cap & MCG_LMCE_P) &&
- !(val & ~MCG_EXT_CTL_LMCE_EN) )
+if ( vmce_has_lmce(cur) && !(val & ~MCG_EXT_CTL_LMCE_EN) )
 cur->arch.vmce.mcg_ext_ctl = val;
 else
 ret = -1;
diff --git a/xen/arch/x86/include/asm/mce.h b/xen/arch/x86/include/asm/mce.h
index 6ce56b5b85..2ec47a71ae 100644
--- a/xen/arch/x86/include/asm/mce.h
+++ b/xen/arch/x86/include/asm/mce.h
@@ -41,7 +41,6 @@ extern void vmce_init_vcpu(struct vcpu *v);
 extern int vmce_restore_vcpu(struct vcpu *v, const struct hvm_vmce_vcpu *ctxt);
 extern int vmce_wrmsr(uint32_t msr, uint64_t val);
 extern int vmce_rdmsr(uint32_t msr, uint64_t *val);
-extern bool vmce_has_lmce(const struct vcpu *v);
 extern int vmce_enable_mca_cap(struct domain *d, uint64_t cap);
 
 DECLARE_PER_CPU(unsigned int, nr_mce_banks);
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 9babd441f9..b0ec96f021 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -24,6 +24,8 @@
 
 #include 
 
+#include "cpu/mcheck/mce.h"
+
 DEFINE_PER_CPU(uint32_t, tsc_aux);
 
 int init_vcpu_msr_policy(struct vcpu *v)
-- 
2.25.1
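
To show what the helper buys at its call sites, here is a reduced sketch of the MCG_EXT_CTL write check from vmce_wrmsr(). The struct is a simplified stand-in for the per-vCPU vMCE state and write_mcg_ext_ctl() is a hypothetical name; the bit positions (LMCE_P at bit 27 of IA32_MCG_CAP, LMCE_EN at bit 0 of IA32_MCG_EXT_CTL) follow the Intel SDM.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCG_LMCE_P           (1ULL << 27) /* IA32_MCG_CAP: LMCE present */
#define MCG_EXT_CTL_LMCE_EN  (1ULL << 0)  /* IA32_MCG_EXT_CTL: LMCE enable */

/* Simplified stand-in for the vMCE part of struct vcpu. */
struct vmce_state {
    uint64_t mcg_cap;
    uint64_t mcg_ext_ctl;
};

static inline bool vmce_has_lmce(const struct vmce_state *s)
{
    return s->mcg_cap & MCG_LMCE_P;
}

/* Returns 0 when the write is accepted, -1 when it is rejected. */
static int write_mcg_ext_ctl(struct vmce_state *s, uint64_t val)
{
    /* Accept only if LMCE is advertised and no bit other than LMCE_EN is set. */
    if ( vmce_has_lmce(s) && !(val & ~MCG_EXT_CTL_LMCE_EN) )
    {
        s->mcg_ext_ctl = val;
        return 0;
    }

    return -1;
}
```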




[XEN PATCH v4 0/3] x86: make Intel/AMD vPMU & MCE support configurable

2024-05-22 Thread Sergiy Kibrik
Three remaining patches to separate support of Intel & AMD CPUs in Xen build.
Most of the related patches from the previous series have already been merged.
Specific changes since v3 are provided per-patch.

v3 series here:
https://lore.kernel.org/xen-devel/cover.1715673586.git.sergiy_kib...@epam.com/

  -Sergiy

Sergiy Kibrik (3):
  x86/intel: move vmce_has_lmce() routine to header
  x86/MCE: add default switch case in init_nonfatal_mce_checker()
  x86/MCE: optional build of AMD/Intel MCE code

 xen/arch/x86/cpu/mcheck/Makefile| 8 
 xen/arch/x86/cpu/mcheck/mce.c   | 4 
 xen/arch/x86/cpu/mcheck/mce.h   | 5 +
 xen/arch/x86/cpu/mcheck/mce_intel.c | 8 
 xen/arch/x86/cpu/mcheck/non-fatal.c | 9 +
 xen/arch/x86/cpu/mcheck/vmce.c  | 5 ++---
 xen/arch/x86/include/asm/mce.h  | 1 -
 xen/arch/x86/msr.c  | 2 ++
 8 files changed, 26 insertions(+), 16 deletions(-)

-- 
2.25.1




Re: [PATCH v10 03/14] xen/bitops: implement fls{l}() in common logic

2024-05-22 Thread Jan Beulich
On 22.05.2024 09:37, Oleksii K. wrote:
> On Tue, 2024-05-21 at 13:18 +0200, Jan Beulich wrote:
>> On 17.05.2024 15:54, Oleksii Kurochko wrote:
>>> To avoid the compilation error below, two places in common/page_alloc.c
>>> where flsl() is used need to be updated, as flsl() now returns
>>> unsigned int:
>>>
>>> ./include/xen/kernel.h:18:21: error: comparison of distinct pointer
>>> types lacks a cast [-Werror]
>>>    18 | (void) (&_x == &_y);    \
>>>   | ^~
>>>     common/page_alloc.c:1843:34: note: in expansion of macro 'min'
>>>  1843 | unsigned int inc_order = min(MAX_ORDER, flsl(e
>>> - s) - 1);
>>>
>>> generic_fls{l} was used instead of __builtin_clz{l}(x) because the
>>> result is undefined if x is 0.
>>>
>>> The prototype of the per-architecture fls{l}() functions was changed to
>>> return 'unsigned int' to align with the generic implementation of these
>>> functions and avoid introducing signed/unsigned mismatches.
>>>
>>> Signed-off-by: Oleksii Kurochko 
>>> ---
>>>  The patch is almost independent from Andrew's patch series
>>>  (https://lore.kernel.org/xen-devel/20240313172716.2325427-1-andrew.coop...@citrix.com/T/#t)
>>>  except for the test_fls() function, which IMO can be merged as a
>>>  separate patch after Andrew's patch is fully ready.
>>
>> If there wasn't this dependency (I don't think it's "almost
>> independent"),
>> I'd be offering R-b with again one nit below.
> 
> Aren't all changes, except those in xen/common/bitops.c, independent? I
> could move these changes in xen/common/bitops.c to a separate commit. I
> think it is safe to commit them ( an introduction of common logic for
> fls{l}() and tests ) separately since the CI tests have passed.

Technically they might be, but contextually there are further conflicts.
Just try "patch --dry-run" on top of a plain staging tree. You really
need to settle, perhaps consulting Andrew, whether you want to go on top
of his change, or ahead of it. I'm not willing to approve a patch that's
presented one way but then is (kind of) expected to go in the other way.

Jan
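
For context on the zero-input point raised in the quoted description, a loop-based fls{l}() returning unsigned int can be sketched as follows. The sketch_* and builtin_flsl names are hypothetical and this is not Xen's generic_fls{l}() implementation; the builtin variant assumes a GCC or Clang compatible compiler.

```c
#include <limits.h>

/* Find the last (most significant) set bit, counting from 1; 0 if x == 0. */
static unsigned int sketch_flsl(unsigned long x)
{
    unsigned int r = 0;

    while ( x )
    {
        x >>= 1;
        r++;
    }

    return r;
}

static unsigned int sketch_fls(unsigned int x)
{
    return sketch_flsl(x);
}

/*
 * Builtin-based variant: zero must be special-cased, because the result of
 * __builtin_clzl(0) is undefined.
 */
static unsigned int builtin_flsl(unsigned long x)
{
    if ( !x )
        return 0;

    return (unsigned int)(sizeof(x) * CHAR_BIT) -
           (unsigned int)__builtin_clzl(x);
}
```

Because both variants return unsigned int, an expression such as flsl(e - s) - 1 is unsigned as well, which is why a type-checking min() then needs its other operand to be unsigned too.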



Re: [PATCH v3 0/2] Add API for making parts of a MMIO page R/O and use it in XHCI console

2024-05-22 Thread Jan Beulich
On 21.05.2024 04:54, Marek Marczykowski-Górecki wrote:
> On older systems, XHCI xcap had a layout in which no other (interesting)
> registers were placed on the same page as the debug capability, so Linux was fine with
> making the whole page R/O. But at least on Tiger Lake and Alder Lake, Linux
> needs to write to some other registers on the same page too.
> 
> Add a generic API for making just parts of an MMIO page R/O and use it to fix
> USB3 console with share=yes or share=hwdom options. More details in commit
> messages.
> 
> Technically it may still qualify for 4.19, since v1 was sent well before
> last posting date. But I realize it's quite late and it isn't top
> priority series, so if it won't hit 4.19, it's okay with me too.
> 
> Marek Marczykowski-Górecki (2):
>   x86/mm: add API for marking only part of a MMIO page read only
>   drivers/char: Use sub-page ro API to make just xhci dbc cap RO
> 
>  xen/arch/x86/hvm/emulate.c  |   2 +-
>  xen/arch/x86/hvm/hvm.c  |   8 +-
>  xen/arch/x86/include/asm/mm.h   |  18 ++-
>  xen/arch/x86/mm.c   | 268 +-
>  xen/arch/x86/pv/ro-page-fault.c |   1 +-
>  xen/drivers/char/xhci-dbc.c |  27 +--
>  6 files changed, 309 insertions(+), 15 deletions(-)

Just to mention it here again, with v2 having been quite some time ago:
Like for that other work of yours I'm not really convinced the complexity
is worth the gain. Ultimately this may once again mean that I'll demand
a 2nd maintainer's ack, once technically I may be okay to offer R-b.

Jan



Re: [PATCH v3 2/2] drivers/char: Use sub-page ro API to make just xhci dbc cap RO

2024-05-22 Thread Jan Beulich
On 21.05.2024 04:54, Marek Marczykowski-Górecki wrote:
> --- a/xen/drivers/char/xhci-dbc.c
> +++ b/xen/drivers/char/xhci-dbc.c
> @@ -1216,20 +1216,19 @@ static void __init cf_check 
> dbc_uart_init_postirq(struct serial_port *port)
>  break;
>  }
>  #ifdef CONFIG_X86
> -/*
> - * This marks the whole page as R/O, which may include other registers
> - * unrelated to DbC. Xen needs only DbC area protected, but it seems
> - * Linux's XHCI driver (as of 5.18) works without writting to the whole
> - * page, so keep it simple.
> - */
> -if ( rangeset_add_range(mmio_ro_ranges,
> -PFN_DOWN((uart->dbc.bar_val & PCI_BASE_ADDRESS_MEM_MASK) +
> - uart->dbc.xhc_dbc_offset),
> -PFN_UP((uart->dbc.bar_val & PCI_BASE_ADDRESS_MEM_MASK) +
> -   uart->dbc.xhc_dbc_offset +
> -sizeof(*uart->dbc.dbc_reg)) - 1) )
> -printk(XENLOG_INFO
> -   "Error while adding MMIO range of device to 
> mmio_ro_ranges\n");
> +if ( subpage_mmio_ro_add(
> + (uart->dbc.bar_val & PCI_BASE_ADDRESS_MEM_MASK) +
> +  uart->dbc.xhc_dbc_offset,
> + sizeof(*uart->dbc.dbc_reg)) )
> +{
> +printk(XENLOG_WARNING
> +   "Error while marking MMIO range of XHCI console as R/O, "
> +   "making the whole device R/O (share=no)\n");

Since you mention "share=no" here, wouldn't you then better also update the
respective struct field, even if (right now) there may be nothing subsequently
using that? Except that dbc_ensure_running() actually is looking at it, and
that's not an __init function.

> +if ( pci_ro_device(0, uart->dbc.sbdf.bus, uart->dbc.sbdf.devfn) )
> +printk(XENLOG_WARNING
> +   "Failed to mark read-only %pp used for XHCI console\n",
> +   &uart->dbc.sbdf);
> +}
>  #endif
>  }

It's been a long time since v2 and the description doesn't say anything in
this regard: Is there a reason not to retain the rangeset addition alongside
the pci_ro_device() on the fallback path?

Jan
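
To make the sub-page idea discussed in this thread concrete, here is a minimal sketch of per-page bookkeeping with one bit per aligned element. All names are hypothetical, and the 8-byte granularity is an assumption taken from the "qwords" naming in patch 1/2; locking, page mapping and the actual write emulation are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE  4096u
#define RO_ALIGN   8u                      /* assumed granularity (qword) */
#define RO_ELEMS   (PAGE_SIZE / RO_ALIGN)  /* 512 elements per page */

/* One bit per RO_ALIGN-sized element of the page; a set bit means R/O. */
struct subpage_ro {
    uint64_t ro_elems[RO_ELEMS / 64];
};

/* Mark [offset, offset + size) within the page as read-only. */
static bool subpage_mark_ro(struct subpage_ro *p,
                            unsigned int offset, unsigned int size)
{
    unsigned int i;

    /* Only aligned, non-empty ranges fully inside the page are accepted. */
    if ( (offset % RO_ALIGN) || (size % RO_ALIGN) || !size ||
         offset >= PAGE_SIZE || size > PAGE_SIZE - offset )
        return false;

    for ( i = offset / RO_ALIGN; i < (offset + size) / RO_ALIGN; i++ )
        p->ro_elems[i / 64] |= 1ULL << (i % 64);

    return true;
}

/* A write is let through only if the element it starts in is not R/O. */
static bool subpage_write_allowed(const struct subpage_ro *p,
                                  unsigned int offset)
{
    unsigned int i = (offset % PAGE_SIZE) / RO_ALIGN;

    return !(p->ro_elems[i / 64] & (1ULL << (i % 64)));
}
```

Like the test_bit() lookup in the patch, this only checks the element containing the start of the access.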



Re: [PATCH v3 1/2] x86/mm: add API for marking only part of a MMIO page read only

2024-05-22 Thread Jan Beulich
On 21.05.2024 04:54, Marek Marczykowski-Górecki wrote:
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -2009,6 +2009,14 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned 
> long gla,
>  goto out_put_gfn;
>  }
>  
> +if ( (p2mt == p2m_mmio_direct) && npfec.write_access && npfec.present &&
> + subpage_mmio_write_accept(mfn, gla) &&

Afaics subpage_mmio_write_accept() is unreachable then when CONFIG_HVM=n?

> + (hvm_emulate_one_mmio(mfn_x(mfn), gla) == X86EMUL_OKAY) )
> +{
> +rc = 1;
> +goto out_put_gfn;
> +}

Overall this new if() is pretty similar to the immediate preceding one.
So similar that I wonder whether the two shouldn't be folded. In fact
it looks as if the new one is needed only for the case where you'd pass
through (to a DomU) a device partially used by Xen. That could certainly
do with mentioning explicitly.

> +static void __iomem *subpage_mmio_get_page(struct subpage_ro_range *entry)

Considering what the function does and what it returns, perhaps better
s/get/map/? The "get_page" part of the name generally has a different
meaning in Xen's memory management.

> +{
> +void __iomem *mapped_page;
> +
> +if ( entry->mapped )
> +return entry->mapped;
> +
> +mapped_page = ioremap(mfn_x(entry->mfn) << PAGE_SHIFT, PAGE_SIZE);
> +
> +spin_lock(&subpage_ro_lock);
> +/* Re-check under the lock */
> +if ( entry->mapped )
> +{
> +spin_unlock(&subpage_ro_lock);
> +iounmap(mapped_page);

The only unmap is on an error path here and on another error path elsewhere.
IOW it looks as if devices with such marked pages are meant to never be hot
unplugged. I can see that being intentional for the XHCI console, but imo
such a restriction also needs prominently calling out in a comment next to
e.g. the function declaration.

> +return entry->mapped;
> +}
> +
> +entry->mapped = mapped_page;
> +spin_unlock(&subpage_ro_lock);
> +return entry->mapped;
> +}
> +
> +static void subpage_mmio_write_emulate(
> +mfn_t mfn,
> +unsigned int offset,
> +const void *data,
> +unsigned int len)
> +{
> +struct subpage_ro_range *entry;
> +void __iomem *addr;

Wouldn't this better be pointer-to-volatile, with ...

> +list_for_each_entry(entry, &subpage_ro_ranges, list)
> +{
> +if ( mfn_eq(entry->mfn, mfn) )
> +{
> +if ( test_bit(offset / SUBPAGE_MMIO_RO_ALIGN, entry->ro_qwords) )
> +{
> + write_ignored:
> +gprintk(XENLOG_WARNING,
> +"ignoring write to R/O MMIO 0x%"PRI_mfn"%03x len 
> %u\n",
> +mfn_x(mfn), offset, len);
> +return;
> +}
> +
> +addr = subpage_mmio_get_page(entry);
> +if ( !addr )
> +{
> +gprintk(XENLOG_ERR,
> +"Failed to map page for MMIO write at 
> 0x%"PRI_mfn"%03x\n",
> +mfn_x(mfn), offset);
> +return;
> +}
> +
> +switch ( len )
> +{
> +case 1:
> +writeb(*(const uint8_t*)data, addr);
> +break;
> +case 2:
> +writew(*(const uint16_t*)data, addr);
> +break;
> +case 4:
> +writel(*(const uint32_t*)data, addr);
> +break;
> +case 8:
> +writeq(*(const uint64_t*)data, addr);
> +break;

... this being how it's written? (If so, volatile suitably carried through to
other places as well.)

> +default:
> +/* mmio_ro_emulated_write() already validated the size */
> +ASSERT_UNREACHABLE();
> +goto write_ignored;
> +}
> +return;
> +}
> +}
> +/* Do not print message for pages without any writable parts. */
> +}
> +
> +bool subpage_mmio_write_accept(mfn_t mfn, unsigned long gla)
> +{
> +unsigned int offset = PAGE_OFFSET(gla);
> +const struct subpage_ro_range *entry;
> +
> +list_for_each_entry_rcu(entry, &subpage_ro_ranges, list)

Considering the other remark about respective devices impossible to go
away, is the RCU form here really needed? Its use gives the (false)
impression of entry removal being possible.

> +if ( mfn_eq(entry->mfn, mfn) &&
> + !test_bit(offset / SUBPAGE_MMIO_RO_ALIGN, entry->ro_qwords) )

Btw, "qwords" in the field name is kind of odd when SUBPAGE_MMIO_RO_ALIGN
in principle suggests that changing granularity ought to be possible by
simply adjusting that #define. Maybe "->ro_elems"?

> --- a/xen/arch/x86/pv/ro-page-fault.c
> +++ b/xen/arch/x86/pv/ro-page-fault.c
> @@ -330,6 +330,7 @@ static int mmio_ro_do_page_fault(struct x86_emulate_ctxt 
> *ctxt,
>  return X86EMUL_UNHANDLEABLE;
>  }
>  
> +mmio_ro_ctxt.mfn = mfn;
>  ctxt->data = 

[PATCH v3 6/7] xen/arm: Implement the logic for static shared memory from Xen heap

2024-05-22 Thread Luca Fancellu
This commit implements the logic to have the static shared memory banks
from the Xen heap instead of having the host physical address passed from
the user.

When the host physical address is not supplied, the physical memory is
taken from the Xen heap using allocate_domheap_memory, the allocation
needs to occur at the first handled DT node and the allocated banks
need to be saved somewhere.

Introduce 'shm_heap_banks' for that purpose: a struct that holds the
banks allocated from the heap. Its field bank[].shmem_extra is used to
point to the .shmem_extra space of the bootinfo shared memory banks, so
that no further memory allocation is needed and every bank in
shm_heap_banks can be safely identified by its shm_id, both to trace it
back and to tell whether it was allocated or not.

When the host address is not passed, a search in 'shm_heap_banks'
reveals whether the banks were already allocated. If they were not, the
callback given to allocate_domheap_memory stores the banks in the
structure and maps them to the current domain. To do that,
acquire_shared_memory_bank is changed so that it can tell whether the
bank comes from the heap and, if it does, assign_pages is called for
every bank.

When the banks are already allocated, handle_shared_mem_bank is called
for every bank allocated with the corresponding shm_id and the mappings
are done.
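
The "allocate at the first handled node, reuse afterwards" flow described above can be sketched as a lookup keyed by shm_id. This is a deliberately reduced illustration: the names are hypothetical, each shm_id maps to a single bank (the patch allows several), and the 'start' argument stands in for whatever address the heap allocation returned.

```c
#include <stddef.h>
#include <string.h>

#define NR_BANKS  4
#define ID_LEN    16

struct heap_bank {
    char shm_id[ID_LEN];
    unsigned long start;
    unsigned long size;
};

static struct heap_bank heap_banks[NR_BANKS];
static unsigned int nr_heap_banks;

/* Return the bank recorded for shm_id, or NULL if none was allocated yet. */
static struct heap_bank *find_heap_bank(const char *shm_id)
{
    unsigned int i;

    for ( i = 0; i < nr_heap_banks; i++ )
        if ( strcmp(heap_banks[i].shm_id, shm_id) == 0 )
            return &heap_banks[i];

    return NULL;
}

/* Record the bank on first use; later callers get the existing record. */
static struct heap_bank *get_heap_bank(const char *shm_id,
                                       unsigned long start,
                                       unsigned long size)
{
    struct heap_bank *b = find_heap_bank(shm_id);

    if ( b )
        return b;

    if ( nr_heap_banks == NR_BANKS || strlen(shm_id) >= ID_LEN )
        return NULL;

    b = &heap_banks[nr_heap_banks++];
    strcpy(b->shm_id, shm_id);
    b->start = start;
    b->size = size;

    return b;
}
```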

Signed-off-by: Luca Fancellu 
---
v3 changes:
 - reworded commit msg section, swap role_str and gbase in
   alloc_heap_pages_cb_extra to avoid padding hole in arm32, remove
   not needed printk, modify printk to print KB instead of MB, swap
   strncmp for strcmp, reduced memory footprint for shm_heap_banks.
   (Michal)
v2 changes:
 - add static inline get_shmem_heap_banks(), given the changes to the
   struct membanks interface. Rebase changes due to removal of
   owner_dom_io arg from handle_shared_mem_bank.
   Change save_map_heap_pages return type given the changes to the
   allocate_domheap_memory callback type.
---
 xen/arch/arm/static-shmem.c | 187 ++--
 1 file changed, 155 insertions(+), 32 deletions(-)

diff --git a/xen/arch/arm/static-shmem.c b/xen/arch/arm/static-shmem.c
index 74c81904b8a4..53e8d3ecf030 100644
--- a/xen/arch/arm/static-shmem.c
+++ b/xen/arch/arm/static-shmem.c
@@ -9,6 +9,25 @@
 #include 
 #include 
 
+typedef struct {
+struct domain *d;
+const char *role_str;
+paddr_t gbase;
+struct shmem_membank_extra *bank_extra_info;
+} alloc_heap_pages_cb_extra;
+
+static struct {
+struct membanks_hdr common;
+struct membank bank[NR_SHMEM_BANKS];
+} shm_heap_banks __initdata = {
+.common.max_banks = NR_SHMEM_BANKS
+};
+
+static inline struct membanks *get_shmem_heap_banks(void)
+{
+return container_of(&shm_heap_banks.common, struct membanks, common);
+}
+
 static void __init __maybe_unused build_assertions(void)
 {
 /*
@@ -63,7 +82,8 @@ static bool __init is_shm_allocated_to_domio(paddr_t pbase)
 }
 
 static mfn_t __init acquire_shared_memory_bank(struct domain *d,
-   paddr_t pbase, paddr_t psize)
+   paddr_t pbase, paddr_t psize,
+   bool bank_from_heap)
 {
 mfn_t smfn;
 unsigned long nr_pfns;
@@ -83,19 +103,31 @@ static mfn_t __init acquire_shared_memory_bank(struct domain *d,
 d->max_pages += nr_pfns;
 
 smfn = maddr_to_mfn(pbase);
-res = acquire_domstatic_pages(d, smfn, nr_pfns, 0);
+if ( bank_from_heap )
+/*
+ * When host address is not provided, static shared memory is
+ * allocated from heap and shall be assigned to owner domain.
+ */
+res = assign_pages(maddr_to_page(pbase), nr_pfns, d, 0);
+else
+res = acquire_domstatic_pages(d, smfn, nr_pfns, 0);
+
 if ( res )
 {
-printk(XENLOG_ERR
-   "%pd: failed to acquire static memory: %d.\n", d, res);
-d->max_pages -= nr_pfns;
-return INVALID_MFN;
+printk(XENLOG_ERR "%pd: failed to %s static memory: %d.\n", d,
+   bank_from_heap ? "assign" : "acquire", res);
+goto fail;
 }
 
 return smfn;
+
+ fail:
+d->max_pages -= nr_pfns;
+return INVALID_MFN;
 }
 
 static int __init assign_shared_memory(struct domain *d, paddr_t gbase,
+   bool bank_from_heap,
const struct membank *shm_bank)
 {
 mfn_t smfn;
@@ -108,10 +140,7 @@ static int __init assign_shared_memory(struct domain *d, paddr_t gbase,
 psize = shm_bank->size;
 nr_borrowers = shm_bank->shmem_extra->nr_shm_borrowers;
 
-printk("%pd: allocate static shared memory BANK %#"PRIpaddr"-%#"PRIpaddr".\n",
-   d, pbase, pbase + psize);
-
-smfn = acquire_shared_memory_bank(d, pbase, psize);
+smfn = acquire_shared_memory_bank(d, pbase, psize, bank_from_heap);
 if ( mfn_eq(smfn, INVALID_MFN) )
   

[PATCH v3 5/7] xen/arm: Rework heap page allocation outside allocate_bank_memory

2024-05-22 Thread Luca Fancellu
The function allocate_bank_memory allocates pages from the heap and
maps them to the guest using guest_physmap_add_page.

As preparation work to support static shared memory banks when the
host physical address is not provided, Xen needs to allocate memory
from the heap, so rework allocate_bank_memory by moving the page
allocation out into a new function called allocate_domheap_memory.

The function allocate_domheap_memory takes a callback function and
a pointer to some extra information passed to the callback and this
function will be called for every region, until a defined size is
reached.

In order to keep allocate_bank_memory functionality, the callback
passed to allocate_domheap_memory is a wrapper for
guest_physmap_add_page.

Let allocate_domheap_memory be externally visible, in order to use
it in the future from the static shared memory module.

Take the opportunity to change the signature of allocate_bank_memory
and remove the 'struct domain' parameter, which can be retrieved from
'struct kernel_info'.

No functional change is intended.

Signed-off-by: Luca Fancellu 
Reviewed-by: Michal Orzel 
---
v3 changes:
 - Add R-by Michal
v2:
 - Reduced scope of pg var in allocate_domheap_memory, removed not
   necessary BUG_ON(), changed callback to return bool and fix
   comment. (Michal)
---
 xen/arch/arm/dom0less-build.c   |  4 +-
 xen/arch/arm/domain_build.c | 79 +
 xen/arch/arm/include/asm/domain_build.h |  9 ++-
 3 files changed, 62 insertions(+), 30 deletions(-)

diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index 74f053c242f4..20ddf6f8f250 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -60,12 +60,12 @@ static void __init allocate_memory(struct domain *d, struct kernel_info *kinfo)
 
 mem->nr_banks = 0;
 bank_size = MIN(GUEST_RAM0_SIZE, kinfo->unassigned_mem);
-if ( !allocate_bank_memory(d, kinfo, gaddr_to_gfn(GUEST_RAM0_BASE),
+if ( !allocate_bank_memory(kinfo, gaddr_to_gfn(GUEST_RAM0_BASE),
bank_size) )
 goto fail;
 
 bank_size = MIN(GUEST_RAM1_SIZE, kinfo->unassigned_mem);
-if ( !allocate_bank_memory(d, kinfo, gaddr_to_gfn(GUEST_RAM1_BASE),
+if ( !allocate_bank_memory(kinfo, gaddr_to_gfn(GUEST_RAM1_BASE),
bank_size) )
 goto fail;
 
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 02e741685102..669970c86fd5 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -417,30 +417,15 @@ static void __init allocate_memory_11(struct domain *d,
 }
 
 #ifdef CONFIG_DOM0LESS_BOOT
-bool __init allocate_bank_memory(struct domain *d, struct kernel_info *kinfo,
- gfn_t sgfn, paddr_t tot_size)
+bool __init allocate_domheap_memory(struct domain *d, paddr_t tot_size,
+alloc_domheap_mem_cb cb, void *extra)
 {
-struct membanks *mem = kernel_info_get_mem(kinfo);
-int res;
-struct page_info *pg;
-struct membank *bank;
-unsigned int max_order = ~0;
-
-/*
- * allocate_bank_memory can be called with a tot_size of zero for
- * the second memory bank. It is not an error and we can safely
- * avoid creating a zero-size memory bank.
- */
-if ( tot_size == 0 )
-return true;
-
-bank = &mem->bank[mem->nr_banks];
-bank->start = gfn_to_gaddr(sgfn);
-bank->size = tot_size;
+unsigned int max_order = UINT_MAX;
 
 while ( tot_size > 0 )
 {
 unsigned int order = get_allocation_size(tot_size);
+struct page_info *pg;
 
 order = min(max_order, order);
 
@@ -463,17 +448,61 @@ bool __init allocate_bank_memory(struct domain *d, struct kernel_info *kinfo,
 continue;
 }
 
-res = guest_physmap_add_page(d, sgfn, page_to_mfn(pg), order);
-if ( res )
-{
-dprintk(XENLOG_ERR, "Failed map pages to DOMU: %d", res);
+if ( !cb(d, pg, order, extra) )
 return false;
-}
 
-sgfn = gfn_add(sgfn, 1UL << order);
 tot_size -= (1ULL << (PAGE_SHIFT + order));
 }
 
+return true;
+}
+
+static bool __init guest_map_pages(struct domain *d, struct page_info *pg,
+   unsigned int order, void *extra)
+{
+gfn_t *sgfn = (gfn_t *)extra;
+int res;
+
+BUG_ON(!sgfn);
+res = guest_physmap_add_page(d, *sgfn, page_to_mfn(pg), order);
+if ( res )
+{
+dprintk(XENLOG_ERR, "Failed map pages to DOMU: %d", res);
+return false;
+}
+
+*sgfn = gfn_add(*sgfn, 1UL << order);
+
+return true;
+}
+
+bool __init allocate_bank_memory(struct kernel_info *kinfo, gfn_t sgfn,
+ paddr_t tot_size)
+{
+struct membanks *mem = kernel_info_get_mem(kinfo);
+struct domain *d = kinfo->d;
+struct membank *bank;
+
+/*
+ * allocate_bank_memory can

[PATCH v3 7/7] xen/docs: Describe static shared memory when host address is not provided

2024-05-22 Thread Luca Fancellu
From: Penny Zheng 

This commit describes the new scenario where the host address is not
provided in the "xen,shared-mem" property; a new example is added to the
page to explain it in detail.

Take the opportunity to fix some typos in the page.

Signed-off-by: Penny Zheng 
Signed-off-by: Luca Fancellu 
Reviewed-by: Michal Orzel 
---
v2:
 - Add Michal R-by
v1:
 - patch from https://patchwork.kernel.org/project/xen-devel/patch/20231206090623.1932275-10-penny.zh...@arm.com/
   with some changes in the commit message.
---
 docs/misc/arm/device-tree/booting.txt | 52 ---
 1 file changed, 39 insertions(+), 13 deletions(-)

diff --git a/docs/misc/arm/device-tree/booting.txt b/docs/misc/arm/device-tree/booting.txt
index bbd955e9c2f6..ac4bad6fe5e0 100644
--- a/docs/misc/arm/device-tree/booting.txt
+++ b/docs/misc/arm/device-tree/booting.txt
@@ -590,7 +590,7 @@ communication.
 An array takes a physical address, which is the base address of the
 shared memory region in host physical address space, a size, and a guest
 physical address, as the target address of the mapping.
-e.g. xen,shared-mem = < [host physical address] [guest address] [size] >
+e.g. xen,shared-mem = < [host physical address] [guest address] [size] >;
 
 It shall also meet the following criteria:
 1) If the SHM ID matches with an existing region, the address range of the
@@ -601,8 +601,8 @@ communication.
 The number of cells for the host address (and size) is the same as the
 guest pseudo-physical address and they are inherited from the parent node.
 
-Host physical address is optional, when missing Xen decides the location
-(currently unimplemented).
+Host physical address is optional, when missing Xen decides the location.
+e.g. xen,shared-mem = < [guest address] [size] >;
 
 - role (Optional)
 
@@ -629,7 +629,7 @@ chosen {
 role = "owner";
 xen,shm-id = "my-shared-mem-0";
 xen,shared-mem = <0x1000 0x1000 0x1000>;
-}
+};
 
 domU1 {
 compatible = "xen,domain";
@@ -640,25 +640,36 @@ chosen {
 vpl011;
 
 /*
- * shared memory region identified as 0x0(xen,shm-id = <0x0>)
- * is shared between Dom0 and DomU1.
+ * shared memory region "my-shared-mem-0" is shared
+ * between Dom0 and DomU1.
  */
 domU1-shared-mem@1000 {
 compatible = "xen,domain-shared-memory-v1";
 role = "borrower";
 xen,shm-id = "my-shared-mem-0";
 xen,shared-mem = <0x1000 0x5000 0x1000>;
-}
+};
 
 /*
- * shared memory region identified as 0x1(xen,shm-id = <0x1>)
- * is shared between DomU1 and DomU2.
+ * shared memory region "my-shared-mem-1" is shared between
+ * DomU1 and DomU2.
  */
 domU1-shared-mem@5000 {
 compatible = "xen,domain-shared-memory-v1";
 xen,shm-id = "my-shared-mem-1";
 xen,shared-mem = <0x5000 0x6000 0x2000>;
-}
+};
+
+/*
+ * shared memory region "my-shared-mem-2" is shared between
+ * DomU1 and DomU2.
+ */
+domU1-shared-mem-2 {
+compatible = "xen,domain-shared-memory-v1";
+xen,shm-id = "my-shared-mem-2";
+role = "owner";
+xen,shared-mem = <0x8000 0x2000>;
+};
 
 ..
 
@@ -672,14 +683,21 @@ chosen {
 cpus = <1>;
 
 /*
- * shared memory region identified as 0x1(xen,shm-id = <0x1>)
- * is shared between domU1 and domU2.
+ * shared memory region "my-shared-mem-1" is shared between
+ * domU1 and domU2.
  */
 domU2-shared-mem@5000 {
 compatible = "xen,domain-shared-memory-v1";
 xen,shm-id = "my-shared-mem-1";
 xen,shared-mem = <0x5000 0x7000 0x2000>;
-}
+};
+
+domU2-shared-mem-2 {
+compatible = "xen,domain-shared-memory-v1";
+xen,shm-id = "my-shared-mem-2";
+role = "borrower";
+xen,shared-mem = <0x9000 0x2000>;
+};
 
 ..
 };
@@ -699,3 +717,11 @@ shared between DomU1 and DomU2. It will get mapped at 0x6000 in DomU1 guest
 physical address space, and at 0x7000 in DomU2 guest physical address space.
 DomU1 and DomU2 are both the borrower domain, the owner domain is the default
 owner domain DOMID_IO.
+
+For the static shared memory region "my-shared-mem-2", since host physical
+address is not provided by user, Xen will automatically allocate 512MB
+from heap as static shared memory to be shared between DomU1 and DomU2.
+The automatically allocated static shared memory will get mapped at
+0x8000 in DomU1 guest physical address space, and at 0x9000 in DomU2
+guest physical address space. DomU1 is explicitly defined as the owner domain,
+and DomU2 

[PATCH v3 0/7] Static shared memory followup v2 - pt2

2024-05-22 Thread Luca Fancellu
This series is a partial rework of this other series:
https://patchwork.kernel.org/project/xen-devel/cover/20231206090623.1932275-1-penny.zh...@arm.com/

The original series addresses three issues with the static shared memory
feature: the first impacts the memory footprint of other components when the
feature is enabled, the second impacts the device tree generation for the
guests when the feature is enabled and used, and the last one is a missing
feature, namely the option to have a static shared memory region that is not
from the host address space.

This series handles some comments on the original series and splits the
rework in two parts. The first part addresses the memory footprint issue
and the device tree generation, and is currently fully merged
(https://patchwork.kernel.org/project/xen-devel/cover/20240418073652.3622828-1-luca.fance...@arm.com/);
this series addresses the static shared memory allocation from the Xen heap.

Luca Fancellu (5):
  xen/arm: Lookup bootinfo shm bank during the mapping
  xen/arm: Wrap shared memory mapping code in one function
  xen/arm: Parse xen,shared-mem when host phys address is not provided
  xen/arm: Rework heap page allocation outside allocate_bank_memory
  xen/arm: Implement the logic for static shared memory from Xen heap

Penny Zheng (2):
  xen/p2m: put reference for level 2 superpage
  xen/docs: Describe static shared memory when host address is not
provided

 docs/misc/arm/device-tree/booting.txt   |  52 ++-
 xen/arch/arm/arm32/mmu/mm.c |  11 +-
 xen/arch/arm/dom0less-build.c   |   4 +-
 xen/arch/arm/domain_build.c |  84 +++--
 xen/arch/arm/include/asm/domain_build.h |   9 +-
 xen/arch/arm/mmu/p2m.c  |  63 +++-
 xen/arch/arm/setup.c|  14 +-
 xen/arch/arm/static-shmem.c | 432 +---
 8 files changed, 486 insertions(+), 183 deletions(-)

-- 
2.34.1




[PATCH v3 4/7] xen/arm: Parse xen,shared-mem when host phys address is not provided

2024-05-22 Thread Luca Fancellu
Handle the parsing of the 'xen,shared-mem' property when the host physical
address is not provided. This commit introduces the logic to parse it,
but the functionality is still not implemented and will be part of future
commits.
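
As a rough illustration of the parsing decision (not the code from this
patch), the host address can be treated as absent when the property holds
only a guest address and a size. The names shm_prop_layout and
classify_shared_mem_prop are hypothetical; the sketch assumes the property
length is expressed in 32-bit cells and that the host and guest addresses
use the same number of cells, as booting.txt states.

```c
#include <stdint.h>

/* Hypothetical result type, for illustration only. */
enum shm_prop_layout {
    SHM_PROP_INVALID,        /* length matches neither layout */
    SHM_PROP_HOST_PROVIDED,  /* < [host address] [guest address] [size] > */
    SHM_PROP_HOST_MISSING,   /* < [guest address] [size] > */
};

/* Classify a "xen,shared-mem" property from its length in 32-bit cells. */
enum shm_prop_layout classify_shared_mem_prop(uint32_t prop_cells,
                                              uint32_t addr_cells,
                                              uint32_t size_cells)
{
    /* With zero-sized fields the two layouts cannot be told apart. */
    if ( addr_cells == 0 || size_cells == 0 )
        return SHM_PROP_INVALID;

    if ( prop_cells == 2 * addr_cells + size_cells )
        return SHM_PROP_HOST_PROVIDED;

    if ( prop_cells == addr_cells + size_cells )
        return SHM_PROP_HOST_MISSING;

    return SHM_PROP_INVALID;
}
```

With one address cell and one size cell, a three-cell property would be
classified as carrying a host address and a two-cell property as missing it.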

Rework the logic inside process_shm_node to check the shm_id before doing
the other checks, because it simplifies the logic itself, and add more
comments on the logic.
Now, when the host physical address is not provided, the value
INVALID_PADDR is chosen to signal this condition and is stored as the
start of the bank. Due to that change, early_print_info_shmem and
init_sharedmem_pages are also changed so that they do not handle banks
with start equal to INVALID_PADDR.

Another change is done inside meminfo_overlap_check, to skip banks whose
start address is INVALID_PADDR. That function is used to check banks from
reserved memory, shared memory and ACPI, and since the comment above the
function states that wrapping around is not handled, it's unlikely for
these banks to have the start address as INVALID_PADDR.
The same change is done inside the consider_modules, find_unallocated_memory
and dt_unreserved_regions functions, in order to skip banks that start with
INVALID_PADDR from any computation.
The changes above hold because of this consideration.
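
The skip described above can be sketched in isolation as follows. This is a
simplified stand-in, not the Xen code: struct sketch_bank, sketch_overlaps
and SKETCH_INVALID_PADDR are hypothetical names, and the all-ones sentinel
value is an assumption made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t paddr_t;

/* Stand-in for Xen's INVALID_PADDR; the all-ones value is an assumption. */
#define SKETCH_INVALID_PADDR (~(paddr_t)0)

/* Simplified bank descriptor, much smaller than Xen's struct membank. */
struct sketch_bank {
    paddr_t start;
    paddr_t size;
};

/*
 * Return true if [region_start, region_start + region_size) overlaps any
 * bank that has a real start address. Banks whose start is the sentinel
 * have no host address yet, so they are skipped.
 */
bool sketch_overlaps(const struct sketch_bank *banks, size_t nr_banks,
                     paddr_t region_start, paddr_t region_size)
{
    paddr_t region_end = region_start + region_size;
    size_t i;

    for ( i = 0; i < nr_banks; i++ )
    {
        paddr_t bank_start = banks[i].start;
        paddr_t bank_end;

        if ( bank_start == SKETCH_INVALID_PADDR )
            continue;

        bank_end = bank_start + banks[i].size;

        if ( region_end <= bank_start || region_start >= bank_end )
            continue;

        return true;
    }

    return false;
}
```

Checking the sentinel before computing the end address also avoids relying
on the wrapped-around sum of start and size.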

Signed-off-by: Luca Fancellu 
Reviewed-by: Michal Orzel 
---
v3 changes:
 - fix typo in commit msg, add R-by Michal
v2 changes:
 - fix comments, add parenthesis to some conditions, remove unneeded
   variables, remove else branch, increment counter in the for loop,
   skip INVALID_PADDR start banks from also consider_modules,
   find_unallocated_memory and dt_unreserved_regions. (Michal)
---
 xen/arch/arm/arm32/mmu/mm.c |  11 +++-
 xen/arch/arm/domain_build.c |   5 ++
 xen/arch/arm/setup.c|  14 +++-
 xen/arch/arm/static-shmem.c | 125 +---
 4 files changed, 111 insertions(+), 44 deletions(-)

diff --git a/xen/arch/arm/arm32/mmu/mm.c b/xen/arch/arm/arm32/mmu/mm.c
index be480c31ea05..30a7aa1e8e51 100644
--- a/xen/arch/arm/arm32/mmu/mm.c
+++ b/xen/arch/arm/arm32/mmu/mm.c
@@ -101,8 +101,15 @@ static paddr_t __init consider_modules(paddr_t s, paddr_t e,
 nr += reserved_mem->nr_banks;
 for ( ; i - nr < shmem->nr_banks; i++ )
 {
-paddr_t r_s = shmem->bank[i - nr].start;
-paddr_t r_e = r_s + shmem->bank[i - nr].size;
+paddr_t r_s, r_e;
+
+r_s = shmem->bank[i - nr].start;
+
+/* Shared memory banks can contain INVALID_PADDR as start */
+if ( INVALID_PADDR == r_s )
+continue;
+
+r_e = r_s + shmem->bank[i - nr].size;
 
 if ( s < r_e && r_s < e )
 {
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 968c497efc78..02e741685102 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -927,6 +927,11 @@ static int __init find_unallocated_memory(const struct kernel_info *kinfo,
 for ( j = 0; j < mem_banks[i]->nr_banks; j++ )
 {
 start = mem_banks[i]->bank[j].start;
+
+/* Shared memory banks can contain INVALID_PADDR as start */
+if ( INVALID_PADDR == start )
+continue;
+
 end = mem_banks[i]->bank[j].start + mem_banks[i]->bank[j].size;
 res = rangeset_remove_range(unalloc_mem, PFN_DOWN(start),
 PFN_DOWN(end - 1));
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index c4e5c19b11d6..0c2fdaceaf21 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -240,8 +240,15 @@ static void __init dt_unreserved_regions(paddr_t s, paddr_t e,
 offset = reserved_mem->nr_banks;
 for ( ; i - offset < shmem->nr_banks; i++ )
 {
-paddr_t r_s = shmem->bank[i - offset].start;
-paddr_t r_e = r_s + shmem->bank[i - offset].size;
+paddr_t r_s, r_e;
+
+r_s = shmem->bank[i - offset].start;
+
+/* Shared memory banks can contain INVALID_PADDR as start */
+if ( INVALID_PADDR == r_s )
+continue;
+
+r_e = r_s + shmem->bank[i - offset].size;
 
 if ( s < r_e && r_s < e )
 {
@@ -272,7 +279,8 @@ static bool __init meminfo_overlap_check(const struct membanks *mem,
 bank_start = mem->bank[i].start;
 bank_end = bank_start + mem->bank[i].size;
 
-if ( region_end <= bank_start || region_start >= bank_end )
+if ( INVALID_PADDR == bank_start || region_end <= bank_start ||
+ region_start >= bank_end )
 continue;
 else
 {
diff --git a/xen/arch/arm/static-shmem.c b/xen/arch/arm/static-shmem.c
index c15a65130659..74c81904b8a4 100644
--- a/xen/arch/arm/static-shmem.c
+++ b/xen/arch/arm/static-shmem.c
@@ -264,6 +264,12 @@ int __init process_shm(struct domain *d, struct kernel_info *kinfo,
 pbase = boot_shm_bank->start;
 psize = boot_shm_bank->size;
 
+if ( INVALID_PADDR == pbase )
+{
+  

[PATCH v3 2/7] xen/arm: Wrap shared memory mapping code in one function

2024-05-22 Thread Luca Fancellu
Wrap the code and logic that calls assign_shared_memory
and map_regions_p2mt into a new function, 'handle_shared_mem_bank'.
It will become useful later, when the code will allow the user
not to pass the host physical address.
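
The decision taken by the wrapped logic can be summarised with a small
standalone sketch. It mirrors the conditions visible in the diff below but
performs no allocation or mapping; struct shm_actions and decide_shm_actions
are hypothetical names, and already_in_dom_io stands in for the result of
is_shm_allocated_to_domio().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of what has to happen for one bank. */
struct shm_actions {
    bool assign_pages;     /* corresponds to calling assign_shared_memory() */
    bool assign_to_dom_io; /* pages go to dom_io rather than this domain */
    bool map_foreign;      /* corresponds to calling map_regions_p2mt() */
};

/* role_str is NULL when the optional "role" property is absent. */
struct shm_actions decide_shm_actions(const char *role_str,
                                      bool already_in_dom_io)
{
    struct shm_actions a = { false, false, false };
    bool owner_dom_io = (role_str == NULL);

    /* First borrower of a dom_io owned region, or an explicit owner. */
    if ( (owner_dom_io && !already_in_dom_io) ||
         (!owner_dom_io && strcmp(role_str, "owner") == 0) )
    {
        a.assign_pages = true;
        a.assign_to_dom_io = owner_dom_io;
    }

    /* Borrowers, implicit or explicit, get a foreign mapping. */
    if ( owner_dom_io || strcmp(role_str, "borrower") == 0 )
        a.map_foreign = true;

    return a;
}
```

For example, a node without "role" whose region was not yet given to dom_io
leads to both an assignment to dom_io and a foreign mapping, while a node
with role = "owner" only gets the assignment.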

Signed-off-by: Luca Fancellu 
Reviewed-by: Michal Orzel 
---
v3 changes:
 - check return value of dt_property_read_string, add R-by Michal
v2 changes:
 - add blank line, move owner_dom_io computation inside
   handle_shared_mem_bank in order to reduce args count, remove
   not needed BUGON(). (Michal)
---
 xen/arch/arm/static-shmem.c | 86 +++--
 1 file changed, 53 insertions(+), 33 deletions(-)

diff --git a/xen/arch/arm/static-shmem.c b/xen/arch/arm/static-shmem.c
index 0a1c327e90ea..c15a65130659 100644
--- a/xen/arch/arm/static-shmem.c
+++ b/xen/arch/arm/static-shmem.c
@@ -180,6 +180,53 @@ append_shm_bank_to_domain(struct kernel_info *kinfo, paddr_t start,
 return 0;
 }
 
+static int __init handle_shared_mem_bank(struct domain *d, paddr_t gbase,
+ const char *role_str,
+ const struct membank *shm_bank)
+{
+bool owner_dom_io = true;
+paddr_t pbase, psize;
+int ret;
+
+pbase = shm_bank->start;
+psize = shm_bank->size;
+
+/*
+ * "role" property is optional and if it is defined explicitly,
+ * then the owner domain is not the default "dom_io" domain.
+ */
+if ( role_str != NULL )
+owner_dom_io = false;
+
+/*
+ * DOMID_IO is a fake domain and is not described in the Device-Tree.
+ * Therefore when the owner of the shared region is DOMID_IO, we will
+ * only find the borrowers.
+ */
+if ( (owner_dom_io && !is_shm_allocated_to_domio(pbase)) ||
+ (!owner_dom_io && strcmp(role_str, "owner") == 0) )
+{
+/*
+ * We found the first borrower of the region, the owner was not
+ * specified, so they should be assigned to dom_io.
+ */
+ret = assign_shared_memory(owner_dom_io ? dom_io : d, gbase, shm_bank);
+if ( ret )
+return ret;
+}
+
+if ( owner_dom_io || (strcmp(role_str, "borrower") == 0) )
+{
+/* Set up P2M foreign mapping for borrower domain. */
+ret = map_regions_p2mt(d, _gfn(PFN_UP(gbase)), PFN_DOWN(psize),
+   _mfn(PFN_UP(pbase)), p2m_map_foreign_rw);
+if ( ret )
+return ret;
+}
+
+return 0;
+}
+
 int __init process_shm(struct domain *d, struct kernel_info *kinfo,
const struct dt_device_node *node)
 {
@@ -196,7 +243,6 @@ int __init process_shm(struct domain *d, struct kernel_info *kinfo,
 unsigned int i;
 const char *role_str;
 const char *shm_id;
-bool owner_dom_io = true;
 
 if ( !dt_device_is_compatible(shm_node, "xen,domain-shared-memory-v1") )
 continue;
@@ -237,39 +283,13 @@ int __init process_shm(struct domain *d, struct kernel_info *kinfo,
 return -EINVAL;
 }
 
-/*
- * "role" property is optional and if it is defined explicitly,
- * then the owner domain is not the default "dom_io" domain.
- */
-if ( dt_property_read_string(shm_node, "role", &role_str) == 0 )
-owner_dom_io = false;
+/* "role" property is optional */
+if ( dt_property_read_string(shm_node, "role", &role_str) != 0 )
+role_str = NULL;
 
-/*
- * DOMID_IO is a fake domain and is not described in the Device-Tree.
- * Therefore when the owner of the shared region is DOMID_IO, we will
- * only find the borrowers.
- */
-if ( (owner_dom_io && !is_shm_allocated_to_domio(pbase)) ||
- (!owner_dom_io && strcmp(role_str, "owner") == 0) )
-{
-/*
- * We found the first borrower of the region, the owner was not
- * specified, so they should be assigned to dom_io.
- */
-ret = assign_shared_memory(owner_dom_io ? dom_io : d, gbase,
-   boot_shm_bank);
-if ( ret )
-return ret;
-}
-
-if ( owner_dom_io || (strcmp(role_str, "borrower") == 0) )
-{
-/* Set up P2M foreign mapping for borrower domain. */
-ret = map_regions_p2mt(d, _gfn(PFN_UP(gbase)), PFN_DOWN(psize),
-   _mfn(PFN_UP(pbase)), p2m_map_foreign_rw);
-if ( ret )
-return ret;
-}
+ret = handle_shared_mem_bank(d, gbase, role_str, boot_shm_bank);
+if ( ret )
+return ret;
 
 /*
  * Record static shared memory region info for later setting
-- 
2.34.1




[PATCH v3 3/7] xen/p2m: put reference for level 2 superpage

2024-05-22 Thread Luca Fancellu
From: Penny Zheng 

We are doing foreign memory mapping for static shared memory, and
there is a great possibility that it could be super mapped.
But today, p2m_put_l3_page cannot handle superpages.

This commit implements a new function p2m_put_l2_superpage to handle
2MB superpages, specifically to help put the extra references for
foreign superpages.
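
The idea can be sketched outside of Xen as a loop over the 4K frames
covered by one 2MB mapping. This is only an illustration under the
assumption of a 4K granule (512 entries per table); the sketch_ names and
the callback type are hypothetical and replace the real reference dropping
with a counter.

```c
#include <stdint.h>

#define SKETCH_LPAE_ENTRIES 512u  /* 4K frames covered by one 2MB mapping */

/* Hypothetical callback standing in for the per-frame put operation. */
typedef void (*put_frame_fn)(uint64_t mfn, void *ctx);

/*
 * Invoke put_frame once for every 4K frame of the 2MB superpage that
 * starts at first_mfn, and return the number of frames visited.
 */
unsigned int sketch_put_l2_superpage(uint64_t first_mfn,
                                     put_frame_fn put_frame, void *ctx)
{
    unsigned int i;

    for ( i = 0; i < SKETCH_LPAE_ENTRIES; i++ )
        put_frame(first_mfn + i, ctx);

    return i;
}

/* Example callback: remember the last frame and count the calls. */
struct sketch_put_stats {
    uint64_t last_mfn;
    unsigned int calls;
};

void sketch_count_put(uint64_t mfn, void *ctx)
{
    struct sketch_put_stats *stats = ctx;

    stats->last_mfn = mfn;
    stats->calls++;
}
```

512 frames of 4KB each add up to the 2MB covered by a level 2 block entry,
which is why one superpage results in 512 individual put operations.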

Modify relinquish_p2m_mapping as well to take into account preemption
when type is foreign memory and order is above 9 (2MB).
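
The resulting preemption condition can be written as a small predicate.
This is a sketch rather than the patched code: sketch_should_check_preempt
and SKETCH_L2_ORDER are hypothetical names, and the value 9 assumes a 4K
granule, where a level 2 mapping spans 2^9 pages.

```c
#include <stdbool.h>

/* Order of a level 2 (2MB) mapping with 4K pages: 2^9 pages. */
#define SKETCH_L2_ORDER 9u

/*
 * Decide whether the relinquish loop should check for preemption at this
 * iteration: every 512 iterations, or right away for a foreign mapping
 * whose order is above the 2MB one.
 */
bool sketch_should_check_preempt(unsigned long count, bool is_foreign,
                                 unsigned int order)
{
    return !(count % 512) || (is_foreign && (order > SKETCH_L2_ORDER));
}
```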

Currently 1GB superpages are not handled: Xen is not preemptible, so some
work is needed to handle such superpages, because putting the references
of such a big mapping might end up freeing memory and therefore turn into
a very long operation.

Signed-off-by: Penny Zheng 
Signed-off-by: Luca Fancellu 
---
v3:
 - Add reasoning why we don't support 1GB superpages for now, remove level_order
   variable from p2m_put_l2_superpage, update TODO comment inside
   p2m_free_entry, use XEN_PT_LEVEL_ORDER(2) instead of value 9 inside
   relinquish_p2m_mapping. (Michal)
v2:
 - Do not handle 1GB superpages, as there might be some issue where
   a lot of calls to put_page(...) might be issued, which could lead
   to freeing memory, which is a long operation.
v1:
 - patch from 
https://patchwork.kernel.org/project/xen-devel/patch/20231206090623.1932275-9-penny.zh...@arm.com/
---
 xen/arch/arm/mmu/p2m.c | 63 ++
 1 file changed, 46 insertions(+), 17 deletions(-)

diff --git a/xen/arch/arm/mmu/p2m.c b/xen/arch/arm/mmu/p2m.c
index 41fcca011cf4..b496266deef6 100644
--- a/xen/arch/arm/mmu/p2m.c
+++ b/xen/arch/arm/mmu/p2m.c
@@ -753,17 +753,9 @@ static int p2m_mem_access_radix_set(struct p2m_domain *p2m, gfn_t gfn,
 return rc;
 }
 
-/*
- * Put any references on the single 4K page referenced by pte.
- * TODO: Handle superpages, for now we only take special references for leaf
- * pages (specifically foreign ones, which can't be super mapped today).
- */
-static void p2m_put_l3_page(const lpae_t pte)
+/* Put any references on the single 4K page referenced by mfn. */
+static void p2m_put_l3_page(mfn_t mfn, p2m_type_t type)
 {
-mfn_t mfn = lpae_get_mfn(pte);
-
-ASSERT(p2m_is_valid(pte));
-
 /*
  * TODO: Handle other p2m types
  *
@@ -771,16 +763,43 @@ static void p2m_put_l3_page(const lpae_t pte)
  * flush the TLBs if the page is reallocated before the end of
  * this loop.
  */
-if ( p2m_is_foreign(pte.p2m.type) )
+if ( p2m_is_foreign(type) )
 {
 ASSERT(mfn_valid(mfn));
 put_page(mfn_to_page(mfn));
 }
 /* Detect the xenheap page and mark the stored GFN as invalid. */
-else if ( p2m_is_ram(pte.p2m.type) && is_xen_heap_mfn(mfn) )
+else if ( p2m_is_ram(type) && is_xen_heap_mfn(mfn) )
 page_set_xenheap_gfn(mfn_to_page(mfn), INVALID_GFN);
 }
 
+/* Put any references on the superpage referenced by mfn. */
+static void p2m_put_l2_superpage(mfn_t mfn, p2m_type_t type)
+{
+unsigned int i;
+
+for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ )
+{
+p2m_put_l3_page(mfn, type);
+
+mfn = mfn_add(mfn, 1);
+}
+}
+
+/* Put any references on the page referenced by pte. */
+static void p2m_put_page(const lpae_t pte, unsigned int level)
+{
+mfn_t mfn = lpae_get_mfn(pte);
+
+ASSERT(p2m_is_valid(pte));
+
+/* We have a second level 2M superpage */
+if ( p2m_is_superpage(pte, level) && (level == 2) )
+return p2m_put_l2_superpage(mfn, pte.p2m.type);
+else if ( level == 3 )
+return p2m_put_l3_page(mfn, pte.p2m.type);
+}
+
 /* Free lpae sub-tree behind an entry */
 static void p2m_free_entry(struct p2m_domain *p2m,
lpae_t entry, unsigned int level)
@@ -809,9 +828,16 @@ static void p2m_free_entry(struct p2m_domain *p2m,
 #endif
 
 p2m->stats.mappings[level]--;
-/* Nothing to do if the entry is a super-page. */
-if ( level == 3 )
-p2m_put_l3_page(entry);
+/*
+ * TODO: Currently we don't handle 1GB super-page, Xen is not
+ * preemptible and therefore some work is needed to handle such
+ * superpages, for which at some point Xen might end up freeing memory
+ * and therefore for such a big mapping it could end up in a very long
+ * operation.
+ */
+if ( level >= 2 )
+p2m_put_page(entry, level);
+
 return;
 }
 
@@ -1558,9 +1584,12 @@ int relinquish_p2m_mapping(struct domain *d)
 
 count++;
 /*
- * Arbitrarily preempt every 512 iterations.
+ * Arbitrarily preempt every 512 iterations or when type is foreign
+ * mapping and the order is above 9 (2MB).
  */
-if ( !(count % 512) && hypercall_preempt_check() )
+if ( (!(count % 512) ||
+  (p2m_is_foreign(t) && (order > XEN_PT_LEVEL_ORDER(2)))) &&
+ hypercall_preempt_check() )
 {
 rc = -ERES

[PATCH v3 1/7] xen/arm: Lookup bootinfo shm bank during the mapping

2024-05-22 Thread Luca Fancellu
The current static shared memory code is using bootinfo banks when it
needs to find the number of borrowers, so every time assign_shared_memory
is called, the bank is searched in the bootinfo.shmem structure.

There is nothing wrong with it, however the bank can also be used to
retrieve the start address and size, and to pass fewer arguments to
assign_shared_memory. When retrieving the information from the bootinfo
bank, it's also possible to move the checks on alignment to
process_shm_node in the early stages.

So create a new function find_shm_bank_by_id() which takes a
'struct shared_meminfo' structure and the shared memory ID, to look for a
bank with a matching ID, take the physical host address and size from the
bank, pass the bank to assign_shared_memory() removing the now unnecessary
arguments and finally remove the acquire_nr_borrower_domain() function
since now the information can be extracted from the passed bank.
Move the "xen,shm-id" parsing early in process_shm to bail out quickly in
case of errors (unlikely) and, as said above, move the checks on alignment
to process_shm_node.

The drawback of this change is that the bootinfo is now also used when the
bank doesn't need to be allocated; however, it will be convenient later
to use it as an argument for assign_shared_memory when dealing with
the use case where the host physical address is not supplied by the user.
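
The lookup by ID can be sketched in a self-contained form as follows. The
structure is a hypothetical, trimmed-down stand-in for the bootinfo bank
(in the patch the ID lives in the bank's shmem_extra data), and the sketch_
names are not part of Xen.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down bank: only the fields the lookup needs. */
struct sketch_shm_bank {
    const char *shm_id;
    unsigned long long start;
    unsigned long long size;
};

/*
 * Return the first bank whose ID matches shm_id, or NULL if there is none.
 * The caller can then read start and size from the returned bank instead
 * of passing them around as separate arguments.
 */
const struct sketch_shm_bank *
sketch_find_bank_by_id(const struct sketch_shm_bank *banks, size_t nr_banks,
                       const char *shm_id)
{
    size_t i;

    for ( i = 0; i < nr_banks; i++ )
    {
        if ( strcmp(shm_id, banks[i].shm_id) == 0 )
            return &banks[i];
    }

    return NULL;
}
```

Returning a pointer to the bank, rather than an error code plus output
parameters, is what allows the callers to drop the pbase/psize arguments.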

Signed-off-by: Luca Fancellu 
Reviewed-by: Michal Orzel 
---
v3 changes:
 - switch strncmp with strcmp in find_shm_bank_by_id, fix commit msg typo,
   add R-by Michal.
v2 changes:
 - fix typo commit msg, renamed find_shm() to find_shm_bank_by_id(),
   swap region size check different from zero and size alignment, remove
   not necessary BUGON(). (Michal)
---
 xen/arch/arm/static-shmem.c | 100 +++-
 1 file changed, 53 insertions(+), 47 deletions(-)

diff --git a/xen/arch/arm/static-shmem.c b/xen/arch/arm/static-shmem.c
index 78881dd1d3f7..0a1c327e90ea 100644
--- a/xen/arch/arm/static-shmem.c
+++ b/xen/arch/arm/static-shmem.c
@@ -19,29 +19,21 @@ static void __init __maybe_unused build_assertions(void)
  offsetof(struct shared_meminfo, bank)));
 }
 
-static int __init acquire_nr_borrower_domain(struct domain *d,
- paddr_t pbase, paddr_t psize,
- unsigned long *nr_borrowers)
+static const struct membank __init *
+find_shm_bank_by_id(const struct membanks *shmem, const char *shm_id)
 {
-const struct membanks *shmem = bootinfo_get_shmem();
 unsigned int bank;
 
-/* Iterate reserved memory to find requested shm bank. */
 for ( bank = 0 ; bank < shmem->nr_banks; bank++ )
 {
-paddr_t bank_start = shmem->bank[bank].start;
-paddr_t bank_size = shmem->bank[bank].size;
-
-if ( (pbase == bank_start) && (psize == bank_size) )
+if ( strcmp(shm_id, shmem->bank[bank].shmem_extra->shm_id) == 0 )
 break;
 }
 
 if ( bank == shmem->nr_banks )
-return -ENOENT;
+return NULL;
 
-*nr_borrowers = shmem->bank[bank].shmem_extra->nr_shm_borrowers;
-
-return 0;
+return &shmem->bank[bank];
 }
 
 /*
@@ -103,14 +95,18 @@ static mfn_t __init acquire_shared_memory_bank(struct domain *d,
 return smfn;
 }
 
-static int __init assign_shared_memory(struct domain *d,
-   paddr_t pbase, paddr_t psize,
-   paddr_t gbase)
+static int __init assign_shared_memory(struct domain *d, paddr_t gbase,
+   const struct membank *shm_bank)
 {
 mfn_t smfn;
 int ret = 0;
 unsigned long nr_pages, nr_borrowers, i;
 struct page_info *page;
+paddr_t pbase, psize;
+
+pbase = shm_bank->start;
+psize = shm_bank->size;
+nr_borrowers = shm_bank->shmem_extra->nr_shm_borrowers;
 
 printk("%pd: allocate static shared memory BANK %#"PRIpaddr"-%#"PRIpaddr".\n",
d, pbase, pbase + psize);
@@ -135,14 +131,6 @@ static int __init assign_shared_memory(struct domain *d,
 }
 }
 
-/*
- * Get the right amount of references per page, which is the number of
- * borrower domains.
- */
-ret = acquire_nr_borrower_domain(d, pbase, psize, &nr_borrowers);
-if ( ret )
-return ret;
-
 /*
  * Instead of letting borrower domain get a page ref, we add as many
  * additional reference as the number of borrowers when the owner
@@ -199,6 +187,7 @@ int __init process_shm(struct domain *d, struct kernel_info *kinfo,
 
 dt_for_each_child_node(node, shm_node)
 {
+const struct membank *boot_shm_bank;
 const struct dt_property *prop;
 const __be32 *cells;
 uint32_t addr_cells, size_cells;
@@ -212,6 +201,23 @@ int __init process_shm(struct domain *d, struct kernel_info *kinfo,
 if ( !dt_device_is_compatible(shm_node, "xen,domain-shared-memory-v1") 

[linux-linus test] 186065: regressions - FAIL

2024-05-22 Thread osstest service owner
flight 186065 linux-linus real [real]
flight 186071 linux-linus real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/186065/
http://logs.test-lab.xenproject.org/osstest/logs/186071/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-examine  8 reboot   fail REGR. vs. 186052

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds  8 xen-boot fail REGR. vs. 186052

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt   16 saverestore-support-check fail blocked in 186052
 test-armhf-armhf-xl-qcow2 8 xen-boot fail  like 186052
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 186052
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 186052
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 186052
 test-armhf-armhf-xl-credit1   8 xen-boot fail  like 186052
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 186052
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 186052
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl  16 saverestore-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check fail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-check fail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt-vhd 14 migrate-support-check fail   never pass
 test-armhf-armhf-libvirt-vhd 15 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail  never pass
 test-armhf-armhf-xl  15 migrate-support-check fail   never pass
 test-armhf-armhf-xl  16 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-raw  14 migrate-support-check fail   never pass
 test-armhf-armhf-xl-raw  15 saverestore-support-check fail   never pass

version targeted for testing:
 linux                5ad8b6ad9a08abdbc8c57a51a5faaf2ef1afc547
baseline version:
 linux                8f6a15f095a63a83b096d9b29aaff4f0fbe6f6e6

Last test of basis   186052  2024-05-21 01:42:42 Z    1 days
Testing same since   186065  2024-05-21 16:10:24 Z    0 days    1 attempts


People who touched revisions under test:
  Al Viro 
  Christian Brauner 
  Kent Overstreet 
  Linus Torvalds 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   

Re: [PATCH v10 03/14] xen/bitops: implement fls{l}() in common logic

2024-05-22 Thread Oleksii K.
On Tue, 2024-05-21 at 13:18 +0200, Jan Beulich wrote:
> On 17.05.2024 15:54, Oleksii Kurochko wrote:
> > To avoid the compilation error below, it is needed to update to
> > places
> > in common/page_alloc.c where flsl() is used as now flsl() returns
> > unsigned int:
> > 
> > ./include/xen/kernel.h:18:21: error: comparison of distinct pointer
> > types lacks a cast [-Werror]
> >    18 | (void) (&_x == &_y);    \
> >   | ^~
> >     common/page_alloc.c:1843:34: note: in expansion of macro 'min'
> >  1843 | unsigned int inc_order = min(MAX_ORDER, flsl(e
> > - s) - 1);
> > 
> > generic_fls{l} was used instead of __builtin_clz{l}(x) as if x is
> > 0,
> > the result in undefined.
> > 
> > The prototype of the per-architecture fls{l}() functions was
> > changed to
> > return 'unsigned int' to align with the generic implementation of
> > these
> > functions and avoid introducing signed/unsigned mismatches.
> > 
> > Signed-off-by: Oleksii Kurochko 
> > ---
> >  The patch is almost independent from Andrew's patch series
> >  (
> > https://lore.kernel.org/xen-devel/20240313172716.2325427-1-andrew.coop...@citrix.com/T/#t
> > )
> >  except test_fls() function which IMO can be merged as a separate
> > patch after Andrew's patch
> >  will be fully ready.
> 
> If there wasn't this dependency (I don't think it's "almost
> independent"),
> I'd be offering R-b with again one nit below.

Aren't all changes, except those in xen/common/bitops.c, independent? I
could move these changes in xen/common/bitops.c to a separate commit. I
think it is safe to commit them (an introduction of common logic for
fls{l}() and tests) separately since the CI tests have passed.
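
For readers following the quoted point about __builtin_clz{l}(x): its
result is undefined when x is 0, so a generic fls has to treat that input
explicitly. A minimal loop-based sketch, which is not Xen's generic_fls{l}
implementation and uses the hypothetical name sketch_flsl, could look like:

```c
/*
 * Return the 1-based index of the most significant set bit of x,
 * or 0 when x is 0.
 */
unsigned int sketch_flsl(unsigned long x)
{
    unsigned int r = 0;

    while ( x )
    {
        x >>= 1;
        r++;
    }

    return r;
}
```

Returning unsigned int matches the direction taken in the patch, where the
fls{l}() prototypes are changed to avoid signed/unsigned mismatches.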

~ Oleksii

> 
> > --- a/xen/arch/x86/include/asm/bitops.h
> > +++ b/xen/arch/x86/include/asm/bitops.h
> > @@ -425,20 +425,21 @@ static always_inline unsigned int
> > arch_ffsl(unsigned long x)
> >   *
> >   * This is defined the same way as ffs.
> >   */
> > -static inline int flsl(unsigned long x)
> > +static always_inline unsigned int arch_flsl(unsigned long x)
> >  {
> > -    long r;
> > +    unsigned long r;
> >  
> >  asm ( "bsr %1,%0\n\t"
> >    "jnz 1f\n\t"
> >    "mov $-1,%0\n"
> >    "1:" : "=r" (r) : "rm" (x));
> > -    return (int)r+1;
> > +    return (unsigned int)r+1;
> 
> Since you now touch this, you'd better tidy it at the same time:
> 
>     return r + 1;
> 
> (i.e. style and no need for a cast).
> 
> Jan




Re: [PATCH for-4.19 0/3] xen: Misc MISRA changes

2024-05-22 Thread Oleksii K.
Hi Andrew,

We can consider this patch series to be in Xen 4.19:
 Release-acked-by: Oleksii Kurochko 

~ Oleksii
On Tue, 2024-05-21 at 18:15 +0100, Andrew Cooper wrote:
> Misc fixes collected during today's call.
> 
> Andrew Cooper (3):
>   xen/lzo: Implement COPY{4,8} using memcpy()
>   xen/x86: Drop useless non-Kconfig CONFIG_* variables
>   xen/x86: Address two misc MISRA 17.7 violations
> 
>  xen/arch/x86/alternative.c    |  4 ++--
>  xen/arch/x86/include/asm/config.h |  4 
>  xen/arch/x86/nmi.c    |  5 ++---
>  xen/common/lzo.c  | 11 ++-
>  xen/include/xen/acpi.h    |  9 -
>  xen/include/xen/watchdog.h    | 13 +
>  6 files changed, 7 insertions(+), 39 deletions(-)
> 
> 
> base-commit: 26b122e3bf8f3921d87312fbf5e7e13872ae92b0