Re: [XEN PATCH] xen: cache clearing and invalidation helpers refactoring
On 19.02.2024 16:14, Nicola Vetrini wrote:
> The cache clearing and invalidation helpers in x86 and Arm didn't
> comply with MISRA C Rule 17.7: "The value returned by a function
> having non-void return type shall be used". On Arm they
> were always returning 0, while some in x86 returned -EOPNOTSUPP
> and in common/grant_table the return value is saved.
>
> As a consequence, a common helper arch_grant_cache_flush that returns
> an integer is introduced, so that each architecture can choose whether to
> return an error value on certain conditions, and the helpers have either
> been changed to return void (on Arm) or deleted entirely (on x86).
>
> Signed-off-by: Julien Grall
> Signed-off-by: Nicola Vetrini
> ---
> The original refactor idea came from Julien Grall in [1]; I edited that
> proposal to fix build errors.
>
> I did introduce a cast to void for the call to flush_area_local on x86,
> because even before this patch the return value of that function wasn't
> checked in all but one use in x86/smp.c, and in this context the helper
> (perhaps incidentally) ignored the return value of flush_area_local.

I object to such casting to void, at least until there's an overriding
decision that for Misra purposes such casts may be needed.

> --- a/xen/arch/arm/include/asm/page.h
> +++ b/xen/arch/arm/include/asm/page.h
> @@ -123,6 +123,7 @@
>
>  #ifndef __ASSEMBLY__
>
> +#include

This is a no-go, imo (also on x86): Adding this include here effectively
means that nearly every CU will have a dependency on that header, no
matter that most are entirely agnostic of grants. Each arch has a
grant_table.h - is there any reason the new, grant-specific helper can't
be put there?
> @@ -182,21 +183,21 @@ void flush_area_mask(const cpumask_t *mask, const void *va,
>  }
>
>  static inline void flush_page_to_ram(unsigned long mfn, bool sync_icache) {}
> -static inline int invalidate_dcache_va_range(const void *p,
> -                                             unsigned long size)
> -{ return -EOPNOTSUPP; }
> -static inline int clean_and_invalidate_dcache_va_range(const void *p,
> -                                                       unsigned long size)
> +
> +static inline int arch_grant_cache_flush(unsigned int op, const void *p,
> +                                         unsigned long size)
>  {
> -    unsigned int order = get_order_from_bytes(size);
> +    unsigned int order;
> +
> +    if ( !(op & GNTTAB_CACHE_CLEAN) )
> +        return -EOPNOTSUPP;
> +
> +    order = get_order_from_bytes(size);
>      /* sub-page granularity support needs to be added if necessary */
> -    flush_area_local(p, FLUSH_CACHE|FLUSH_ORDER(order));
> +    (void) flush_area_local(p, FLUSH_CACHE|FLUSH_ORDER(order));

As to my objection to the addition of a cast, did you actually check what
this function returns, before saying "incidentally" in the respective
remark? Already the return type being "unsigned int" is indicative of the
return value not being suitable here for handing on. In addition there
shouldn't be a blank after a cast. Instead, if already you were to touch
this line, it would have been nice if you took the opportunity and added
the missing blanks around the binary operator involved.

Jan
Re: [XEN PATCH v2 2/3] x86/uaccess: replace __{get,put}_user_bad() with STATIC_ASSERT_UNREACHABLE()
On 19.02.2024 21:43, Stefano Stabellini wrote:
> On Mon, 19 Feb 2024, Federico Serafini wrote:
>> On 15/02/24 11:32, Jan Beulich wrote:
>>> The important difference is: Here we're told that there was a use of
>>> __put_user_bad, which is easy to grep for, and thus see how the
>>> supplied function / file / line(?) relate to the ultimate problem.
>>>
>>> I'm afraid I'm meanwhile confused enough by the various replies
>>> containing results of experimentation that I can't really tell
>>> anymore what case is best. Hence I can only restate my expectation
>>> for an eventual v3: Diagnosing what the issue is, no matter whether
>>> the new macro is used in another macro or in an inline function,
>>> should not become meaningfully more difficult. In how far this is
>>> the case wants clarifying in the description of the change.
>>
>> I think the best thing at the moment is to deviate
>> __{get,put}_user_bad() for Rule 16.3.
>> I'll let maintainers further explore the possibility of having a
>> compile-time assertion based on the assembler error.
>
> OK. I hope Jan is OK to deviate by in-code comment.

Hmm, the follow-on suggestion was to add break statements? Followed by
me asking whether adding noreturn to the decls wouldn't also help. (Then
again I was under the impression that there was more than just the
"missing" break statements which Misra thought was an issue here.)

Jan
Re: [PATCH 1/2] hw/xen: detect when running inside stubdomain
On 19/2/24 19:16, Marek Marczykowski-Górecki wrote:
> Introduce a global xen_is_stubdomain variable, set when qemu is running
> inside a stubdomain instead of dom0. This will be relevant for
> subsequent patches, as a few things like accessing PCI config space
> need to be done differently.
>
> Signed-off-by: Marek Marczykowski-Górecki
> ---
>  hw/xen/xen-legacy-backend.c | 15 +++
>  include/hw/xen/xen.h        |  1 +
>  system/globals.c            |  1 +
>  3 files changed, 17 insertions(+)
>
> diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
> index 37ecc91fc3..ecb89ecfc1 100644
> --- a/include/hw/xen/xen.h
> +++ b/include/hw/xen/xen.h
> @@ -36,6 +36,7 @@ enum xen_mode {
>  extern uint32_t xen_domid;
>  extern enum xen_mode xen_mode;
>  extern bool xen_domid_restrict;
> +extern bool xen_is_stubdomain;
>
>  int xen_pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num);
>  int xen_set_pci_link_route(uint8_t link, uint8_t irq);
>
> diff --git a/system/globals.c b/system/globals.c
> index b6d4e72530..ac27d88bd4 100644
> --- a/system/globals.c
> +++ b/system/globals.c
> @@ -62,6 +62,7 @@ bool qemu_uuid_set;
>  uint32_t xen_domid;
>  enum xen_mode xen_mode = XEN_DISABLED;
>  bool xen_domid_restrict;
> +bool xen_is_stubdomain;

Note for myself, Paolo and Claudio, IIUC these fields belong to
TYPE_XEN_ACCEL in accel/xen/xen-all.c.
Maybe resulting in smth like:

-- >8 --
diff --git a/accel/xen/xen-all.c b/accel/xen/xen-all.c
index 5ff0cb8bd9..fc25d8c912 100644
--- a/accel/xen/xen-all.c
+++ b/accel/xen/xen-all.c
@@ -24,11 +24,31 @@
 #include "migration/global_state.h"
 #include "hw/boards.h"

-bool xen_allowed;
+struct XenAccelState
+{
+    AccelState parent_obj;
+
+    bool xen_allowed;
+
+    enum xen_mode xen_mode;
+
+    uint32_t xen_domid;
+    bool xen_domid_restrict;

 xc_interface *xen_xc;
 xenforeignmemory_handle *xen_fmem;
 xendevicemodel_handle *xen_dmod;
+};
+
+struct XenAccelOpsClass
+{
+    AccelOpsClass parent_class;
+
+    struct evtchn_backend_ops *xen_evtchn_ops;
+    struct gnttab_backend_ops *xen_gnttab_ops;
+    struct foreignmem_backend_ops *xen_foreignmem_ops;
+    struct xenstore_backend_ops *xen_xenstore_ops;
+};

 static void xenstore_record_dm_state(const char *state)
 {
@@ -114,6 +134,13 @@ static int xen_init(MachineState *ms)
     return 0;
 }

+static void xen_accel_init(Object *obj)
+{
+    XenAccelState *s = XEN_ACCEL(obj);
+
+    s->xen_mode = XEN_DISABLED;
+}
+
 static void xen_accel_class_init(ObjectClass *oc, void *data)
 {
     AccelClass *ac = ACCEL_CLASS(oc);
@@ -142,6 +169,8 @@ static void xen_accel_class_init(ObjectClass *oc, void *data)
 static const TypeInfo xen_accel_type = {
     .name = TYPE_XEN_ACCEL,
     .parent = TYPE_ACCEL,
+    .instance_size = sizeof(XenAccelState),
+    .instance_init = xen_accel_init,
     .class_init = xen_accel_class_init,
 };

@@ -157,6 +186,7 @@ static const TypeInfo xen_accel_ops_type = {
     .parent = TYPE_ACCEL_OPS,
     .class_init = xen_accel_ops_class_init,
+    .class_size = sizeof(XenAccelOpsClass),
     .abstract = true,
 };
---
[xen-unstable test] 184705: tolerable FAIL - PUSHED
flight 184705 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184705/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt                16 saverestore-support-check  fail like 184699
 test-amd64-amd64-xl-qemut-win7-amd64    19 guest-stop                 fail like 184699
 test-amd64-i386-xl-qemuu-win7-amd64     19 guest-stop                 fail like 184699
 test-amd64-amd64-xl-qemuu-ws16-amd64    19 guest-stop                 fail like 184699
 test-amd64-i386-xl-qemut-ws16-amd64     19 guest-stop                 fail like 184699
 test-amd64-i386-xl-qemut-win7-amd64     19 guest-stop                 fail like 184699
 test-amd64-amd64-xl-qemut-ws16-amd64    19 guest-stop                 fail like 184699
 test-armhf-armhf-libvirt-qcow2          15 saverestore-support-check  fail like 184699
 test-armhf-armhf-libvirt-raw            15 saverestore-support-check  fail like 184699
 test-amd64-amd64-qemuu-nested-amd       20 debian-hvm-install/l1/l2   fail like 184699
 test-amd64-i386-xl-qemuu-ws16-amd64     19 guest-stop                 fail like 184699
 test-amd64-amd64-xl-qemuu-win7-amd64    19 guest-stop                 fail like 184699
 test-amd64-i386-libvirt-xsm             15 migrate-support-check      fail never pass
 test-amd64-i386-libvirt                 15 migrate-support-check      fail never pass
 test-amd64-i386-xl-pvshim               14 guest-start                fail never pass
 test-amd64-amd64-libvirt                15 migrate-support-check      fail never pass
 test-amd64-amd64-libvirt-xsm            15 migrate-support-check      fail never pass
 test-arm64-arm64-xl                     15 migrate-support-check      fail never pass
 test-arm64-arm64-xl                     16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-credit2             15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-credit2             16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-credit1             15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-credit1             16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-thunderx            15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-thunderx            16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-xsm                 15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-xsm                 16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-libvirt                15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-rtds                15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-rtds                16 saverestore-support-check  fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm            15 migrate-support-check      fail never pass
 test-arm64-arm64-libvirt-xsm            16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl                     15 migrate-support-check      fail never pass
 test-armhf-armhf-xl                     16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit2             15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit2             16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-arndale             15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-arndale             16 saverestore-support-check  fail never pass
 test-amd64-i386-libvirt-raw             14 migrate-support-check      fail never pass
 test-amd64-amd64-libvirt-vhd            14 migrate-support-check      fail never pass
 test-arm64-arm64-xl-vhd                 14 migrate-support-check      fail never pass
 test-arm64-arm64-xl-vhd                 15 saverestore-support-check  fail never pass
 test-arm64-arm64-libvirt-raw            14 migrate-support-check      fail never pass
 test-arm64-arm64-libvirt-raw            15 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-vhd                 14 migrate-support-check      fail never pass
 test-armhf-armhf-xl-vhd                 15 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-multivcpu           15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-multivcpu           16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit1             15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit1             16 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt-qcow2          14 migrate-support-check      fail never pass
 test-armhf-armhf-libvirt-raw            14 migrate-support-check      fail never pass

version targeted for testing:
 xen                  c144b9e32427ba37e0e0443a0d3fa53e9fb878b6
baseline version:
 xen                  0441c3acc7e9e72e984ce49d32e61827894ae4a3

Last test of basis   184699  2024-02-19 01:52:12 Z    1 days
Testing same since   184703  2024-02-19 14:38:55 Z    0 days    2 attempts

People who touched revisions under test:
  Andrew Cooper
  Anthony PERARD
  Frediano Ziglio
Re: [PATCH] ns16550: add Asix AX99100 serial card
On Mon, Feb 19, 2024 at 12:13:18PM +0100, Marek Marczykowski-Górecki wrote:
> On Mon, Feb 19, 2024 at 09:57:49AM +0100, Jan Beulich wrote:
> > On 18.02.2024 02:34, Marek Marczykowski-Górecki wrote:
> > > @@ -1170,6 +1177,11 @@ static const struct ns16550_config __initconst uart_config[] =
> > >          .dev_id = 0x7adc,
> > >          .param = param_intel_lpss
> > >      },
> > > +    {
> > > +        .vendor_id = PCI_VENDOR_ID_ASIX,
> > > +        .dev_id = 9100,
> >
> > As per Linux this is 0x9100.
>
> Right... but then, maybe the patch isn't needed at all, as it does work
> for me. Maybe what's needed instead is some other patch already in
> staging. Initial attempt that did not work was with 4.17.something.
> I guess setting the fifo size isn't that important.

Indeed, the patch is not needed after all, plain "master" from today
works.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Re: [PATCH v6 02/15] xen/arm: add initial support for LLC coloring on arm64
Hi,

On 14/02/2024 13:52, Carlo Nonato wrote:
> On Wed, Feb 14, 2024 at 11:14 AM Michal Orzel wrote:
>> diff --git a/xen/arch/arm/include/asm/processor.h b/xen/arch/arm/include/asm/processor.h
>> index 8e02410465..336933ee62 100644
>> --- a/xen/arch/arm/include/asm/processor.h
>> +++ b/xen/arch/arm/include/asm/processor.h
>> @@ -18,6 +18,22 @@
>>  #define CTR_IDC_SHIFT 28
>>  #define CTR_DIC_SHIFT 29
>>
>> +/* CCSIDR Current Cache Size ID Register */
>> +#define CCSIDR_LINESIZE_MASK _AC(0x7, ULL)
>>
>> Why ULL and not UL? ccsidr is of register_t type
>
> Julien, while reviewing an earlier version:
>
>   Please use ULL here otherwise someone using MASK << SHIFT will have
>   the expected result.
>
> https://patchew.org/Xen/20220826125111.152261-1-carlo.non...@minervasys.tech/20220826125111.152261-2-carlo.non...@minervasys.tech/#08956082-c194-8bae-cb25-44e4e3227...@xen.org

Michal is right. This should be UL. Not sure why I suggested ULL back
then. Sorry.

Cheers,

-- 
Julien Grall
Re: [PATCH 2/2] xen: fix stubdom PCI addr
On Mon, Feb 19, 2024 at 07:16:06PM +0100, Marek Marczykowski-Górecki wrote:
> From: Frédéric Pierret (fepitre)

This shouldn't be here, it's my patch.

> When running in a stubdomain, the config space access via sysfs needs to
> use BDF as seen inside stubdomain (connected via xen-pcifront), which is
> different from the real BDF. For other purposes (hypercall parameters
> etc), the real BDF needs to be used.
> Get the in-stubdomain BDF by looking up relevant PV PCI xenstore
> entries.
>
> Signed-off-by: Marek Marczykowski-Górecki
> ---
>  hw/xen/xen-host-pci-device.c | 77 +++-
>  hw/xen/xen-host-pci-device.h |  6 +++
>  2 files changed, 82 insertions(+), 1 deletion(-)
>
> diff --git a/hw/xen/xen-host-pci-device.c b/hw/xen/xen-host-pci-device.c
> index 8c6e9a1716..3f8a6f84a8 100644
> --- a/hw/xen/xen-host-pci-device.c
> +++ b/hw/xen/xen-host-pci-device.c
> @@ -9,6 +9,7 @@
>  #include "qemu/osdep.h"
>  #include "qapi/error.h"
>  #include "qemu/cutils.h"
> +#include "hw/xen/xen-legacy-backend.h"
>  #include "xen-host-pci-device.h"
>
>  #define XEN_HOST_PCI_MAX_EXT_CAP \
> @@ -33,13 +34,76 @@
>  #define IORESOURCE_PREFETCH 0x1000  /* No side effects */
>  #define IORESOURCE_MEM_64   0x0010
>
> +static void xen_host_pci_fill_local_addr(XenHostPCIDevice *d, Error **errp)
> +{
> +    unsigned int num_devs, len, i;
> +    unsigned int domain, bus, dev, func;
> +    char *be_path = NULL;
> +    char path[80];
> +    char *msg = NULL;
> +
> +    be_path = qemu_xen_xs_read(xenstore, 0, "device/pci/0/backend", &len);
> +    if (!be_path)
> +        goto err_out;
> +    snprintf(path, sizeof(path), "%s/num_devs", be_path);
> +    msg = qemu_xen_xs_read(xenstore, 0, path, &len);
> +    if (!msg)
> +        goto err_out;
> +
> +    if (sscanf(msg, "%u", &num_devs) != 1) {
> +        error_setg(errp, "Failed to parse %s (%s)\n", msg, path);
> +        goto err_out;
> +    }
> +    free(msg);
> +
> +    for (i = 0; i < num_devs; i++) {
> +        snprintf(path, sizeof(path), "%s/dev-%u", be_path, i);
> +        msg = qemu_xen_xs_read(xenstore, 0, path, &len);
> +        if (!msg) {
> +            error_setg(errp, "Failed to read %s\n", path);
> +            goto err_out;
> +        }
> +        if (sscanf(msg, "%x:%x:%x.%x", &domain, &bus, &dev, &func) != 4) {
> +            error_setg(errp, "Failed to parse %s (%s)\n", msg, path);
> +            goto err_out;
> +        }
> +        free(msg);
> +        if (domain != d->domain ||
> +            bus != d->bus ||
> +            dev != d->dev ||
> +            func != d->func)
> +            continue;
> +        snprintf(path, sizeof(path), "%s/vdev-%u", be_path, i);
> +        msg = qemu_xen_xs_read(xenstore, 0, path, &len);
> +        if (!msg) {
> +            error_setg(errp, "Failed to read %s\n", path);
> +            goto out;
> +        }
> +        if (sscanf(msg, "%x:%x:%x.%x", &domain, &bus, &dev, &func) != 4) {
> +            error_setg(errp, "Failed to parse %s (%s)\n", msg, path);
> +            goto err_out;
> +        }
> +        free(msg);
> +        d->local_domain = domain;
> +        d->local_bus = bus;
> +        d->local_dev = dev;
> +        d->local_func = func;
> +        goto out;
> +    }
> +
> +err_out:
> +    free(msg);
> +out:
> +    free(be_path);
> +}
> +
>  static void xen_host_pci_sysfs_path(const XenHostPCIDevice *d,
>                                      const char *name, char *buf, ssize_t size)
>  {
>      int rc;
>
>      rc = snprintf(buf, size, "/sys/bus/pci/devices/%04x:%02x:%02x.%d/%s",
> -                  d->domain, d->bus, d->dev, d->func, name);
> +                  d->local_domain, d->local_bus, d->local_dev,
> +                  d->local_func, name);
>      assert(rc >= 0 && rc < size);
>  }
>
> @@ -342,6 +406,17 @@ void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
>      d->dev = dev;
>      d->func = func;
>
> +    if (xen_is_stubdomain) {
> +        xen_host_pci_fill_local_addr(d, errp);
> +        if (*errp)
> +            goto error;
> +    } else {
> +        d->local_domain = d->domain;
> +        d->local_bus = d->bus;
> +        d->local_dev = d->dev;
> +        d->local_func = d->func;
> +    }
> +
>      xen_host_pci_config_open(d, errp);
>      if (*errp) {
>          goto error;
> diff --git a/hw/xen/xen-host-pci-device.h b/hw/xen/xen-host-pci-device.h
> index 4d8d34ecb0..270dcb27f7 100644
> --- a/hw/xen/xen-host-pci-device.h
> +++ b/hw/xen/xen-host-pci-device.h
> @@ -23,6 +23,12 @@ typedef struct XenHostPCIDevice {
>      uint8_t dev;
>      uint8_t func;
>
> +    /* different from the above in case of stubdomain */
> +    uint16_t local_domain;
> +    uint8_t local_bus;
> +    uint8_t local_dev;
> +    uint8_t local_func;
> +
>      uint16_t vendor_id;
>      uint16_t device_id;
>      uint32_t class_code;
> --
> 2.43.0

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Re: [PATCH 2/2] almost fully ignore zero-size flush requests
Hi Jan,

Title: I would add 'gnttab:' to clarify which subsystem you are
modifying.

On 05/02/2024 11:03, Jan Beulich wrote:
> Along the line with observations in the context of XSA-448, besides
> "op" no field is relevant when the range to be flushed is empty, much
> like e.g. the pointers passed to memcpy() are irrelevant (and would
> never be "validated") when the passed length is zero.
>
> Split the existing condition validating "op", "offset", and "length",
> leaving only the "op" part ahead of the check for length being zero
> (or no flushing to be performed).

I am probably missing something here. I understand the theory behind
reducing the number of checks when len == 0. But an OS cannot rely on
it (at least):
 1) older hypervisor would still return an error if the check doesn't
    pass
 2) it does feel odd to allow "invalid" offset when len == 0

So to me, it is better to keep those checks early. That said, I agree
this is a matter of opinion, so I will not Nack it but also I will not
Ack it.

> In the course of splitting also simplify the moved part of the
> condition from 3 to 2 conditionals, potentially (depending on the
> architecture) requiring one less (conditional) branch.
>
> Signed-off-by: Jan Beulich
>
> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -3528,15 +3528,16 @@ static int _cache_flush(const gnttab_cac
>      void *v;
>      int ret;
>
> -    if ( (cflush->offset >= PAGE_SIZE) ||
> -         (cflush->length > PAGE_SIZE) ||
> -         (cflush->offset + cflush->length > PAGE_SIZE) ||
> -         (cflush->op & ~(GNTTAB_CACHE_INVAL | GNTTAB_CACHE_CLEAN)) )
> +    if ( cflush->op & ~(GNTTAB_CACHE_INVAL | GNTTAB_CACHE_CLEAN) )
>          return -EINVAL;
>
>      if ( cflush->length == 0 || cflush->op == 0 )
>          return !*cur_ref ? 0 : -EILSEQ;
>
> +    if ( (cflush->offset | cflush->length) > PAGE_SIZE ||

This is confusing. I understand you are trying to force the compiler to
optimize. But is it really worth it? After all, the rest of the
operation will outweigh this check (cache flushes are quite expensive).
We probably should take a more generic decision (and encode it in our
policy) because you seem to like this pattern and I dislike it :). Not
sure what the others think.

Cheers,

-- 
Julien Grall
[xen-unstable test] 184703: regressions - FAIL
flight 184703 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184703/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-dom0pvh-xl-amd         14 guest-start          fail REGR. vs. 184699
 build-arm64-pvops                        6 kernel-build         fail REGR. vs. 184699

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-examine                 1 build-check(1)             blocked  n/a
 test-arm64-arm64-libvirt-raw             1 build-check(1)             blocked  n/a
 test-arm64-arm64-libvirt-xsm             1 build-check(1)             blocked  n/a
 test-arm64-arm64-xl                      1 build-check(1)             blocked  n/a
 test-arm64-arm64-xl-credit1              1 build-check(1)             blocked  n/a
 test-arm64-arm64-xl-credit2              1 build-check(1)             blocked  n/a
 test-arm64-arm64-xl-thunderx             1 build-check(1)             blocked  n/a
 test-arm64-arm64-xl-vhd                  1 build-check(1)             blocked  n/a
 test-arm64-arm64-xl-xsm                  1 build-check(1)             blocked  n/a
 test-armhf-armhf-libvirt                16 saverestore-support-check  fail like 184699
 test-amd64-amd64-xl-qemut-win7-amd64    19 guest-stop                 fail like 184699
 test-amd64-i386-xl-qemuu-win7-amd64     19 guest-stop                 fail like 184699
 test-amd64-amd64-xl-qemuu-ws16-amd64    19 guest-stop                 fail like 184699
 test-amd64-i386-xl-qemut-ws16-amd64     19 guest-stop                 fail like 184699
 test-amd64-i386-xl-qemut-win7-amd64     19 guest-stop                 fail like 184699
 test-amd64-amd64-xl-qemut-ws16-amd64    19 guest-stop                 fail like 184699
 test-armhf-armhf-libvirt-qcow2          15 saverestore-support-check  fail like 184699
 test-armhf-armhf-libvirt-raw            15 saverestore-support-check  fail like 184699
 test-amd64-amd64-qemuu-nested-amd       20 debian-hvm-install/l1/l2   fail like 184699
 test-amd64-i386-xl-qemuu-ws16-amd64     19 guest-stop                 fail like 184699
 test-amd64-amd64-xl-qemuu-win7-amd64    19 guest-stop                 fail like 184699
 test-amd64-i386-libvirt-xsm             15 migrate-support-check      fail never pass
 test-amd64-i386-libvirt                 15 migrate-support-check      fail never pass
 test-amd64-i386-xl-pvshim               14 guest-start                fail never pass
 test-amd64-amd64-libvirt                15 migrate-support-check      fail never pass
 test-amd64-amd64-libvirt-xsm            15 migrate-support-check      fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-libvirt                15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-rtds                15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-rtds                16 saverestore-support-check  fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl                     15 migrate-support-check      fail never pass
 test-armhf-armhf-xl                     16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit2             15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit2             16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-arndale             15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-arndale             16 saverestore-support-check  fail never pass
 test-amd64-i386-libvirt-raw             14 migrate-support-check      fail never pass
 test-amd64-amd64-libvirt-vhd            14 migrate-support-check      fail never pass
 test-armhf-armhf-xl-vhd                 14 migrate-support-check      fail never pass
 test-armhf-armhf-xl-vhd                 15 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-multivcpu           15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-multivcpu           16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit1             15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit1             16 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt-qcow2          14 migrate-support-check      fail never pass
 test-armhf-armhf-libvirt-raw            14 migrate-support-check      fail never pass

version targeted for testing:
 xen                  c144b9e32427ba37e0e0443a0d3fa53e9fb878b6
baseline version:
 xen                  0441c3acc7e9e72e984ce49d32e61827894ae4a3

Last test of basis   184699  2024-02-19 01:52:12 Z    0 days
Testing same since   184703  2024-02-19 14:38:55 Z    0 days    1 attempts

People who touched revisions under test:
  Andrew Cooper
  Anthony PERARD
  Frediano Ziglio
  Julien Grall
  Marek Marczykowski-Górecki
  Oleksandr Tyshchenko
  Roger Pau Monné

jobs:
 build-amd64-xsm                pass
 build-arm64-xsm                pass
 build-i386-xsm                 pass
 build-amd64-xtf                pass
Re: [PATCH 1/2] gnttab: fully ignore zero-size copy requests
Hi Jan,

On 05/02/2024 11:03, Jan Beulich wrote:
> Along the line with observations in the context of XSA-448, no field
> in struct gnttab_copy_ptr is relevant when no data is to be copied,
> much like e.g. the pointers passed to memcpy() are irrelevant (and
> would never be "validated") when the passed length is zero.
>
> Signed-off-by: Jan Beulich

Reviewed-by: Julien Grall

Cheers,

-- 
Julien Grall
Re: [XEN PATCH v2 2/3] x86/uaccess: replace __{get,put}_user_bad() with STATIC_ASSERT_UNREACHABLE()
On Mon, 19 Feb 2024, Federico Serafini wrote:
> On 15/02/24 11:32, Jan Beulich wrote:
> > The important difference is: Here we're told that there was a use of
> > __put_user_bad, which is easy to grep for, and thus see how the
> > supplied function / file / line(?) relate to the ultimate problem.
> >
> > I'm afraid I'm meanwhile confused enough by the various replies
> > containing results of experimentation that I can't really tell
> > anymore what case is best. Hence I can only restate my expectation
> > for an eventual v3: Diagnosing what the issue is, no matter whether
> > the new macro is used in another macro or in an inline function,
> > should not become meaningfully more difficult. In how far this is
> > the case wants clarifying in the description of the change.
>
> I think the best thing at the moment is to deviate
> __{get,put}_user_bad() for Rule 16.3.
> I'll let maintainers further explore the possibility of having a
> compile-time assertion based on the assembler error.

OK. I hope Jan is OK to deviate by in-code comment.
Re: [PATCH v9 4/7] xen/asm-generic: introduce generic device.h
Hi Oleksii,

On 16/02/2024 12:39, Oleksii Kurochko wrote:
> Arm, PPC and RISC-V introduce the same things in asm/device.h, so
> generic device.h was introduced.
>
> Arm's device.h was taken as a base with the following changes:
>  - #ifdef ACPI related things.
>  - Rename #ifdef guards.
>  - Add SPDX tag.
>  - #ifdef CONFIG_HAS_DEVICE_TREE related things.
>  - #ifdef-ing iommu related things with CONFIG_HAS_PASSTHROUGH.
>
> Signed-off-by: Oleksii Kurochko
> Acked-by: Jan Beulich

Acked-by: Julien Grall

Cheers,

-- 
Julien Grall
Re: [PATCH v8 5/7] xen/asm-generic: introduce generic device.h
Hi Oleksii,

On 15/02/2024 16:54, Oleksii wrote:
On 14/02/2024 09:32, Oleksii wrote:
On Tue, 2024-02-13 at 18:09 +, Julien Grall wrote:

+#ifdef CONFIG_HAS_PASSTHROUGH
+    struct iommu_fwspec *iommu_fwspec; /* per-device IOMMU instance data */
+#endif
+};
+
+typedef struct device device_t;
+
+#ifdef CONFIG_HAS_DEVICE_TREE
+
+#include
+
+#define dev_is_dt(dev) ((dev)->type == DEV_DT)
+
+/**
+ * device_init - Initialize a device
+ * @dev: device to initialize
+ * @class: class of the device (serial, network...)
+ * @data: specific data for initializing the device
+ *
+ * Return 0 on success.
+ */
+int device_init(struct dt_device_node *dev, enum device_class class,
+                const void *data);
+
+/**
+ * device_get_type - Get the type of the device
+ * @dev: device to match
+ *
+ * Return the device type on success or DEVICE_ANY on failure
+ */
+enum device_class device_get_class(const struct dt_device_node *dev);
+
+#define DT_DEVICE_START(name_, namestr_, class_)          \
+static const struct device_desc __dev_desc_##name_ __used \
+__section(".dev.info") = {                                \
+    .name = namestr_,                                     \
+    .class = class_,
+
+#define DT_DEVICE_END \
+};
+
+#else /* !CONFIG_HAS_DEVICE_TREE */
+#define dev_is_dt(dev) ((void)(dev), false)
+#endif /* CONFIG_HAS_DEVICE_TREE */
+
+#define dev_is_pci(dev) ((dev)->type == DEV_PCI)
+
+struct device_desc {
+    /* Device name */
+    const char *name;
+    /* Device class */
+    enum device_class class;
+
+#ifdef CONFIG_HAS_DEVICE_TREE
+
+    /* List of devices supported by this driver */
+    const struct dt_device_match *dt_match;
+    /*
+     * Device initialization.
+     *
+     * -EAGAIN is used to indicate that device probing is deferred.
+     */
+    int (*init)(struct dt_device_node *dev, const void *data);
+
+#endif
+};

I am not sure I fully understand why "device_desc" is not protected by
CONFIG_HAS_DEVICE_TREE. The structure doesn't mean much when the config
is disabled. Can you clarify?

I thought that one day struct device_desc and acpi_device_desc will be
"merged", and so decided just to #ifdef only DEVICE_TREE specific
fields.

It might be possible to merge the two if we were using a union for the
ACPI/DT specific part. However the majority of the parsing code needs
to differ. So I am not convinced there would be any value to merge the
two structures.

In this case, let's have two separate structures.

This is not the current situation, and I don't have a specific example.
It appears that all architectures will use Device Tree or ACPI.
However, does it make sense to keep 'struct device_desc' more generic
to accommodate non-DT or non-ACPI cases?

I am not entirely sure what else to say. As I wrote before yes it could
be made generic. But right now I don't see any value. If you have any
idea how to share the structure, then feel free to make a proposal and
I will review it.

Cheers,

-- 
Julien Grall
Re: [PATCH v9 7/7] xen/asm-generic: fold struct devarch into struct dev
Hi,

On 16/02/2024 12:39, Oleksii Kurochko wrote:
> The current patch is a follow-up to the patch titled:
>
>   xen/asm-generic: introduce generic device.h
>
> Also, a prerequisite for this patch is, without which a compilation
> error will occur:
>
>   xen/arm: switch Arm to use asm-generic/device.h
>
> The 'struct dev_archdata' is exclusively used within 'struct device',
> so it could be merged into 'struct device.' After the merger, it is
> necessary to update the 'dev_archdata()' macros and the comments above
> 'struct arm_smmu_xen_device' in drivers/passthrough/arm/smmu.c.
> Additionally, it is required to update instances of
> "dev->archdata->iommu" to "dev->iommu".
>
> Suggested-by: Julien Grall
> Signed-off-by: Oleksii Kurochko
> ---
> This patch can be merged with patches 4 and 5 of this patch series.

I am a bit puzzled with this comment. If this is the case, then why was
it done in a separate patch? I know I suggested to create the separate
patch but this was only in the case you decided to handle it after this
series is merged. So this should have been merged when sending. Maybe I
should have been clearer.

Anyway, regardless of that:

Reviewed-by: Julien Grall

Cheers,

-- 
Julien Grall
Re: [PATCH v9 7/7] xen/asm-generic: fold struct devarch into struct dev
Hi Jan,

On 19/02/2024 11:26, Jan Beulich wrote:
> On 16.02.2024 13:39, Oleksii Kurochko wrote:
>> The current patch is a follow-up to the patch titled:
>>
>>   xen/asm-generic: introduce generic device.h
>>
>> Also, a prerequisite for this patch is, without which a compilation
>> error will occur:
>>
>>   xen/arm: switch Arm to use asm-generic/device.h
>>
>> The 'struct dev_archdata' is exclusively used within 'struct device',
>> so it could be merged into 'struct device.' After the merger, it is
>> necessary to update the 'dev_archdata()' macros and the comments
>> above 'struct arm_smmu_xen_device' in
>> drivers/passthrough/arm/smmu.c. Additionally, it is required to
>> update instances of "dev->archdata->iommu" to "dev->iommu".
>>
>> Suggested-by: Julien Grall
>> Signed-off-by: Oleksii Kurochko
>> ---
>> This patch can be merged with patches 4 and 5 of this patch series.
>> ---
>> Changes in V9:
>>  - newly introduced patch.
>> ---
>>  xen/drivers/passthrough/arm/smmu.c | 12 ++--
>>  xen/include/asm-generic/device.h   |  8 +---
>>  2 files changed, 7 insertions(+), 13 deletions(-)
>>
>> diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
>> index 32e2ff279b..4a272c8779 100644
>> --- a/xen/drivers/passthrough/arm/smmu.c
>> +++ b/xen/drivers/passthrough/arm/smmu.c
>> @@ -227,9 +227,9 @@ struct arm_smmu_xen_domain {
>>  };
>>
>>  /*
>> - * Xen: Information about each device stored in dev->archdata.iommu
>> + * Xen: Information about each device stored in dev->iommu
>>   *
>> - * Initially dev->archdata.iommu only stores the iommu_domain (runtime
>> + * Initially dev->iommu only stores the iommu_domain (runtime
>>   * configuration of the SMMU) but, on Xen, we also have to store the
>>   * iommu_group (list of streamIDs associated to the device).
>>   *
>> @@ -242,7 +242,7 @@ struct arm_smmu_xen_device {
>>      struct iommu_group *group;
>>  };
>>
>> -#define dev_archdata(dev) ((struct arm_smmu_xen_device *)dev->archdata.iommu)
>> +#define dev_archdata(dev) ((struct arm_smmu_xen_device *)dev->iommu)
>
> I find in particular the naming here odd. But I'll let Julien judge
> whether this really is along the lines of what he suggested.

It is. The naming is not great, but the SMMU code is intended to stay
close to Linux. So we want to do the minimum amount of change (at least
until we decide we should diverge).

Cheers,

-- 
Julien Grall
Re: [XEN v4 2/3] xen/arm: arm64: Add emulation of Debug Data Transfer Registers
Hi Michal, On 19/02/2024 15:43, Michal Orzel wrote: On 19/02/2024 15:45, Ayan Kumar Halder wrote: On 06/02/2024 19:05, Julien Grall wrote: Hi Ayan, Hi Julien/Michal, On 31/01/2024 12:10, Ayan Kumar Halder wrote: From: Michal Orzel Currently, if user enables HVC_DCC config option in Linux, it invokes access to debug data transfer registers (i.e. DBGDTRTX_EL0 on arm64, DBGDTRTXINT on arm32). As these registers are not emulated, Xen injects an undefined exception to the guest and Linux crashes. To prevent this crash, introduce a partial emulation of DBGDTR[TR]X_EL0 (these registers share the same encoding) as RAZ/WI and MDCCSR_EL0 as TXfull. Refer ARM DDI 0487J.a ID042523, D19.3.8, DBGDTRTX_EL0 "If TXfull is set to 1, set DTRRX and DTRTX to UNKNOWN". Thus, any OS is expected to read MDCCSR_EL0 and check for TXfull before using DBGDTRTX_EL0. Linux does it via hvc_dcc_init() ---> hvc_dcc_check(), and returns -ENODEV in case TXfull bit is still set after writing a test character. This way we prevent the guest from making use of HVC DCC as a console. Signed-off-by: Michal Orzel Signed-off-by: Ayan Kumar Halder --- Changes from v1 :- 1. DBGDTR_EL0 does not emulate RXfull. This is to avoid giving the OS any indication that the RX buffer is full and is waiting to be read. 2. In Arm32, DBGOSLSR is emulated. Also DBGDTRTXINT is emulated at EL0 only. 3. Fixed the commit message and inline code comments. v2 :- 1. Split the patch into two (separate patches for arm64 and arm32). 2. Removed the "fail" label. 3. Fixed the commit message. v3 :- 1. "HSR_SYSREG_MDCCSR_EL0" emulation differs based on whether partial_emulation_enabled is true or not. 2. If partial_emulation_enabled is false, then access to HSR_SYSREG_DBGDTR_EL0, HSR_SYSREG_DBGDTRTX_EL0 would lead to undefined exception. 
xen/arch/arm/arm64/vsysreg.c | 28 xen/arch/arm/include/asm/arm64/hsr.h | 3 +++ 2 files changed, 27 insertions(+), 4 deletions(-) diff --git a/xen/arch/arm/arm64/vsysreg.c b/xen/arch/arm/arm64/vsysreg.c index b5d54c569b..94f0a6c384 100644 --- a/xen/arch/arm/arm64/vsysreg.c +++ b/xen/arch/arm/arm64/vsysreg.c @@ -159,9 +159,6 @@ void do_sysreg(struct cpu_user_regs *regs, * * Unhandled: * MDCCINT_EL1 - * DBGDTR_EL0 - * DBGDTRRX_EL0 - * DBGDTRTX_EL0 * OSDTRRX_EL1 * OSDTRTX_EL1 * OSECCR_EL1 @@ -173,10 +170,32 @@ void do_sysreg(struct cpu_user_regs *regs, return handle_raz_wi(regs, regidx, hsr.sysreg.read, hsr, 1); case HSR_SYSREG_MDCCSR_EL0: /* + * Xen doesn't expose a real (or emulated) Debug Communications Channel + * (DCC) to a domain. Yet the Arm ARM implies this is not an optional + * feature. So some domains may start to probe it. For instance, the + * HVC_DCC driver in Linux (since f35dc083 and at least up to v6.7), + * will try to write some characters and check if the transmit buffer + * has emptied. + * + * By setting TX status bit (only if partial emulation is enabled) to + * indicate the transmit buffer is full, we would hint the OS that the + * DCC is probably not working. + * + * Bit 29: TX full + * * Accessible at EL0 only if MDSCR_EL1.TDCC is set to 0. We emulate that * register as RAZ/WI above. So RO at both EL0 and EL1. The sentence "we emulate that register as ..." seems to be stale? I can see that you tried to handle Julien's remark here. But I disagree. This statement is not stale. It explains that because MDSCR_EL1 is RAZ/WI, MDCCSR_EL0 is RO at both EL0 and EL1. This patch does not change this behavior. Indeed. I misread the comment. So what I wrote can be ignored here. */ - return handle_ro_raz(regs, regidx, hsr.sysreg.read, hsr, 0); + return handle_ro_read_val(regs, regidx, hsr.sysreg.read, hsr, 0, + partial_emulation ? 
(1U << 29) : 0); + + case HSR_SYSREG_DBGDTR_EL0: + /* DBGDTR[TR]X_EL0 share the same encoding */ + case HSR_SYSREG_DBGDTRTX_EL0: + if ( !partial_emulation ) + goto fail; + return handle_raz_wi(regs, regidx, hsr.sysreg.read, hsr, 0); AFAICT, all the emulation helpers have an explanation why we are using them. But here this is not the case. Can you add one? This and.. + HSR_SYSREG_DBG_CASES(DBGBVR): HSR_SYSREG_DBG_CASES(DBGBCR): HSR_SYSREG_DBG_CASES(DBGWVR): @@ -394,6 +413,7 @@ void do_sysreg(struct cpu_user_regs *regs, * And all other unknown registers. */ default: + fail: AFAICT, this would violate MISRA 15.3 [1]. We don't seem to have accepted the rule yet (?), but I don't see why we would not, given that I feel this is similar to what Rule 16.2 is trying to prevent, and we accepted that one. In which case, I would move all the code within default outside. And then
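For illustration, the restructuring suggested here (moving the body of `default:` out of the switch so that no `goto fail` jumps into a case label) could look like the following standalone sketch. The encoding value and the helpers are simplified stand-ins, not the real Xen code:

```c
#include <stdbool.h>

/* Stand-ins for the real Xen helpers and register encodings. */
static int inject_undef_exception(void) { return -1; }
static int handle_raz_wi(void)          { return  0; }

#define REG_DBGDTR_EL0 1u  /* hypothetical stand-in encoding */

static int emulate_sysreg(unsigned int regidx, bool partial_emulation)
{
    switch ( regidx )
    {
    case REG_DBGDTR_EL0:
        if ( !partial_emulation )
            break;             /* leave the switch; common path below */
        return handle_raz_wi();
    default:
        break;                 /* all other unknown registers */
    }

    /* Formerly the body of `default:`; now shared without any goto. */
    return inject_undef_exception();
}
```

This keeps a single exit path for "unhandled register" while satisfying the MISRA restriction on jumping into a nested block.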
[PATCH 2/2] xen: fix stubdom PCI addr
From: Frédéric Pierret (fepitre) When running in a stubdomain, the config space access via sysfs needs to use BDF as seen inside stubdomain (connected via xen-pcifront), which is different from the real BDF. For other purposes (hypercall parameters etc), the real BDF needs to be used. Get the in-stubdomain BDF by looking up relevant PV PCI xenstore entries. Signed-off-by: Marek Marczykowski-Górecki --- hw/xen/xen-host-pci-device.c | 77 +++- hw/xen/xen-host-pci-device.h | 6 +++ 2 files changed, 82 insertions(+), 1 deletion(-) diff --git a/hw/xen/xen-host-pci-device.c b/hw/xen/xen-host-pci-device.c index 8c6e9a1716..3f8a6f84a8 100644 --- a/hw/xen/xen-host-pci-device.c +++ b/hw/xen/xen-host-pci-device.c @@ -9,6 +9,7 @@ #include "qemu/osdep.h" #include "qapi/error.h" #include "qemu/cutils.h" +#include "hw/xen/xen-legacy-backend.h" #include "xen-host-pci-device.h" #define XEN_HOST_PCI_MAX_EXT_CAP \ @@ -33,13 +34,76 @@ #define IORESOURCE_PREFETCH 0x1000 /* No side effects */ #define IORESOURCE_MEM_64 0x0010 +static void xen_host_pci_fill_local_addr(XenHostPCIDevice *d, Error **errp) +{ +unsigned int num_devs, len, i; +unsigned int domain, bus, dev, func; +char *be_path = NULL; +char path[80]; +char *msg = NULL; + +be_path = qemu_xen_xs_read(xenstore, 0, "device/pci/0/backend", &len); +if (!be_path) +goto err_out; +snprintf(path, sizeof(path), "%s/num_devs", be_path); +msg = qemu_xen_xs_read(xenstore, 0, path, &len); +if (!msg) +goto err_out; + +if (sscanf(msg, "%u", &num_devs) != 1) { +error_setg(errp, "Failed to parse %s (%s)\n", msg, path); +goto err_out; +} +free(msg); + +for (i = 0; i < num_devs; i++) { +snprintf(path, sizeof(path), "%s/dev-%u", be_path, i); +msg = qemu_xen_xs_read(xenstore, 0, path, &len); +if (!msg) { +error_setg(errp, "Failed to read %s\n", path); +goto err_out; +} +if (sscanf(msg, "%x:%x:%x.%x", &domain, &bus, &dev, &func) != 4) { +error_setg(errp, "Failed to parse %s (%s)\n", msg, path); +goto err_out; +} +free(msg); +if (domain != d->domain || +bus != d->bus || +dev != d->dev || +func != d->func) +continue; +snprintf(path, sizeof(path), "%s/vdev-%u", be_path, i); +msg = qemu_xen_xs_read(xenstore, 0, path, &len); +if (!msg) { +error_setg(errp, "Failed to read %s\n", path); +goto out; +} +if (sscanf(msg, "%x:%x:%x.%x", &domain, &bus, &dev, &func) != 4) { +error_setg(errp, "Failed to parse %s (%s)\n", msg, path); +goto err_out; +} +free(msg); +d->local_domain = domain; +d->local_bus = bus; +d->local_dev = dev; +d->local_func = func; +goto out; +} + +err_out: +free(msg); +out: +free(be_path); +} + static void xen_host_pci_sysfs_path(const XenHostPCIDevice *d, const char *name, char *buf, ssize_t size) { int rc; rc = snprintf(buf, size, "/sys/bus/pci/devices/%04x:%02x:%02x.%d/%s", - d->domain, d->bus, d->dev, d->func, name); + d->local_domain, d->local_bus, d->local_dev, d->local_func, name); assert(rc >= 0 && rc < size); } @@ -342,6 +406,17 @@ void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain, d->dev = dev; d->func = func; +if (xen_is_stubdomain) { +xen_host_pci_fill_local_addr(d, errp); +if (*errp) +goto error; +} else { +d->local_domain = d->domain; +d->local_bus = d->bus; +d->local_dev = d->dev; +d->local_func = d->func; +} + xen_host_pci_config_open(d, errp); if (*errp) { goto error; diff --git a/hw/xen/xen-host-pci-device.h b/hw/xen/xen-host-pci-device.h index 4d8d34ecb0..270dcb27f7 100644 --- a/hw/xen/xen-host-pci-device.h +++ b/hw/xen/xen-host-pci-device.h @@ -23,6 +23,12 @@ typedef struct XenHostPCIDevice { uint8_t dev; uint8_t func; +/* different from the above in case of stubdomain */ +uint16_t local_domain; +uint8_t local_bus; +uint8_t local_dev; +uint8_t local_func; + uint16_t vendor_id; uint16_t device_id; uint32_t class_code; -- 2.43.0
[PATCH 1/2] hw/xen: detect when running inside stubdomain
Introduce a global xen_is_stubdomain variable, set when qemu is running inside a stubdomain instead of dom0. This will be relevant for subsequent patches, as a few things like accessing PCI config space need to be done differently. Signed-off-by: Marek Marczykowski-Górecki --- hw/xen/xen-legacy-backend.c | 15 +++ include/hw/xen/xen.h | 1 + system/globals.c | 1 + 3 files changed, 17 insertions(+) diff --git a/hw/xen/xen-legacy-backend.c b/hw/xen/xen-legacy-backend.c index 124dd5f3d6..4a09ea2561 100644 --- a/hw/xen/xen-legacy-backend.c +++ b/hw/xen/xen-legacy-backend.c @@ -603,6 +603,19 @@ static void xen_set_dynamic_sysbus(void) machine_class_allow_dynamic_sysbus_dev(mc, TYPE_XENSYSDEV); } +static bool xen_check_stubdomain(void) +{ +char *dm_path = g_strdup_printf("/local/domain/%d/image", xen_domid); +uint32_t dm_domid; +bool is_stubdom = false; + +if (!xenstore_read_int(dm_path, "device-model-domid", &dm_domid)) +is_stubdom = dm_domid != 0; + +g_free(dm_path); +return is_stubdom; +} + void xen_be_init(void) { xenstore = qemu_xen_xs_open(); @@ -616,6 +629,8 @@ void xen_be_init(void) exit(1); } +xen_is_stubdomain = xen_check_stubdomain(); + xen_sysdev = qdev_new(TYPE_XENSYSDEV); sysbus_realize_and_unref(SYS_BUS_DEVICE(xen_sysdev), &error_fatal); xen_sysbus = qbus_new(TYPE_XENSYSBUS, xen_sysdev, "xen-sysbus"); diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h index 37ecc91fc3..ecb89ecfc1 100644 --- a/include/hw/xen/xen.h +++ b/include/hw/xen/xen.h @@ -36,6 +36,7 @@ enum xen_mode { extern uint32_t xen_domid; extern enum xen_mode xen_mode; extern bool xen_domid_restrict; +extern bool xen_is_stubdomain; int xen_pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num); int xen_set_pci_link_route(uint8_t link, uint8_t irq); diff --git a/system/globals.c b/system/globals.c index b6d4e72530..ac27d88bd4 100644 --- a/system/globals.c +++ b/system/globals.c @@ -62,6 +62,7 @@ bool qemu_uuid_set; uint32_t xen_domid; enum xen_mode xen_mode = XEN_DISABLED; bool xen_domid_restrict; +bool xen_is_stubdomain; struct evtchn_backend_ops *xen_evtchn_ops; struct gnttab_backend_ops *xen_gnttab_ops; struct foreignmem_backend_ops *xen_foreignmem_ops; -- 2.43.0
Re: [PATCH v2] Constify some parameters
For XSM changes, Ack-by: Daniel P. Smith V/r, Daniel P. Smith Apertus Solutions, LLC On 2/14/24 04:47, Frediano Ziglio wrote: Make clear they are not changed in the functions. Signed-off-by: Frediano Ziglio Reviewed-by: Jan Beulich -- v2: - fixed typo in commit message. --- xen/arch/x86/pv/callback.c | 4 ++-- xen/common/sched/compat.c | 2 +- xen/common/sched/core.c| 2 +- xen/xsm/flask/flask_op.c | 8 4 files changed, 8 insertions(+), 8 deletions(-) diff --git a/xen/arch/x86/pv/callback.c b/xen/arch/x86/pv/callback.c index 17829304fe..caec4fb16f 100644 --- a/xen/arch/x86/pv/callback.c +++ b/xen/arch/x86/pv/callback.c @@ -48,7 +48,7 @@ static void unregister_guest_nmi_callback(void) memset(t, 0, sizeof(*t)); } -static long register_guest_callback(struct callback_register *reg) +static long register_guest_callback(const struct callback_register *reg) { long ret = 0; struct vcpu *curr = current; @@ -102,7 +102,7 @@ static long register_guest_callback(struct callback_register *reg) return ret; } -static long unregister_guest_callback(struct callback_unregister *unreg) +static long unregister_guest_callback(const struct callback_unregister *unreg) { long ret; diff --git a/xen/common/sched/compat.c b/xen/common/sched/compat.c index dd97593630..a02204ec9a 100644 --- a/xen/common/sched/compat.c +++ b/xen/common/sched/compat.c @@ -26,7 +26,7 @@ CHECK_sched_shutdown; CHECK_sched_remote_shutdown; #undef xen_sched_remote_shutdown -static int compat_poll(struct compat_sched_poll *compat) +static int compat_poll(const struct compat_sched_poll *compat) { struct sched_poll native; diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c index d177c675c8..c5db373972 100644 --- a/xen/common/sched/core.c +++ b/xen/common/sched/core.c @@ -1431,7 +1431,7 @@ static void vcpu_block_enable_events(void) vcpu_block(); } -static long do_poll(struct sched_poll *sched_poll) +static long do_poll(const struct sched_poll *sched_poll) { struct vcpu *v = current; struct domain *d = v->domain; 
diff --git a/xen/xsm/flask/flask_op.c b/xen/xsm/flask/flask_op.c index b866e8d05f..ea7dd10dc8 100644 --- a/xen/xsm/flask/flask_op.c +++ b/xen/xsm/flask/flask_op.c @@ -469,7 +469,7 @@ static int flask_security_load(struct xen_flask_load *load) return ret; } -static int flask_devicetree_label(struct xen_flask_devicetree_label *arg) +static int flask_devicetree_label(const struct xen_flask_devicetree_label *arg) { int rv; char *buf; @@ -492,7 +492,7 @@ static int flask_devicetree_label(struct xen_flask_devicetree_label *arg) #ifndef COMPAT -static int flask_ocontext_del(struct xen_flask_ocontext *arg) +static int flask_ocontext_del(const struct xen_flask_ocontext *arg) { int rv; @@ -506,7 +506,7 @@ static int flask_ocontext_del(struct xen_flask_ocontext *arg) return security_ocontext_del(arg->ocon, arg->low, arg->high); } -static int flask_ocontext_add(struct xen_flask_ocontext *arg) +static int flask_ocontext_add(const struct xen_flask_ocontext *arg) { int rv; @@ -550,7 +550,7 @@ static int flask_get_peer_sid(struct xen_flask_peersid *arg) return rv; } -static int flask_relabel_domain(struct xen_flask_relabel *arg) +static int flask_relabel_domain(const struct xen_flask_relabel *arg) { int rc; struct domain *d;
Re: [PATCH 6/6] svm/nestedvm: Introduce nested capabilities bit
On 06.02.2024 02:20, George Dunlap wrote: > --- /dev/null > +++ b/docs/designs/nested-svm-cpu-features.md > @@ -0,0 +1,110 @@ > +# Nested SVM (AMD) CPUID requirements > + > +The first step in making nested SVM production-ready is to make sure > +that all features are implemented and well-tested. To make this > +tractable, we will initially be limiting the "supported" range of > +nested virt to a specific subset of host and guest features. This > +document describes the criteria for deciding on features, and the > +rationale behind each feature. > + > +For AMD, all virtualization-related features can be found in CPUID > +leaf 800A:edx > + > +# Criteria > + > +- Processor support: At a minimum we want to support processors from > + the last 5 years. All things being equal, older processors are > + better. Nit: Perhaps missing "covering"? Generally I hope newer processors are "better". > Bits 0:7 were available in the very earliest processors; > + and even through bit 15 we should be pretty good support-wise. > + > +- Faithfulness to hardware: We need the behavior of the "virtual cpu" > + from the L1 hypervisor's perspective to be as close as possible to > + the original hardware. In particular, the behavior of the hardware > + on error paths 1) is not easy to understand or test, 2) can be the > + source of surprising vulnerabilities. (See XSA-7 for an example of a > + case where subtle error-handling differences can open up a privilege > + escalation.) We should avoid emulating any bit of the hardware with > + complex error paths if we can at all help it. > + > +- Cost of implementation: We want to minimize the cost of > + implementation (where this includes bringing an existing sub-par > + implementation up to speed). All things being equal, we'll favor a > + configuration which does not require any new implementation. > + > +- Performance: All things being equal, we'd prefer to choose a set of > + L0 / L1 CPUID bits that are faster than slower. 
> + > + > +# Bits > + > +- 0 `NP` *Nested Paging*: Required both for L0 and L1. > + > + Based primarily on faithfulness and performance, as well as > + potential cost of implementation. Available on earliest hardware, > + so no compatibility issues. > + > +- 1 `LbrVirt` *LBR / debugging virtualization*: Required for L0 and L1. > + > + For L0 this is required for performance: There's no way to tell the > + guests not to use the LBR-related registers; and if the guest does, > + then you have to save and restore all LBR-related registers on > + context switch, which is prohibitive. "prohibitive" is too strong imo; maybe "undesirable"? > --- a/xen/arch/x86/hvm/svm/nestedhvm.h > +++ b/xen/arch/x86/hvm/svm/nestedhvm.h > @@ -35,6 +35,7 @@ enum nestedhvm_vmexits > nestedsvm_check_intercepts(struct vcpu *v, struct cpu_user_regs *regs, > uint64_t exitcode); > void svm_nested_features_on_efer_update(struct vcpu *v); > +void __init start_nested_svm(struct hvm_function_table *svm_function_table); No section placement attributes on declarations, please. > --- a/xen/arch/x86/hvm/svm/nestedsvm.c > +++ b/xen/arch/x86/hvm/svm/nestedsvm.c > @@ -1666,3 +1666,17 @@ void svm_nested_features_on_efer_update(struct vcpu *v) > } > } > } > + > +void __init start_nested_svm(struct hvm_function_table *svm_function_table) > +{ > +/* > + * Required host functionality to support nested virt. See > + * docs/designs/nested-svm-cpu-features.md for rationale. > + */ > +svm_function_table->caps.nested_virt = > +cpu_has_svm_nrips && > +cpu_has_svm_lbrv && > +cpu_has_svm_nrips && nrips twice? Was the earlier one meant to be npt? 
> --- a/xen/arch/x86/hvm/vmx/vmx.c > +++ b/xen/arch/x86/hvm/vmx/vmx.c > @@ -3021,6 +3021,9 @@ const struct hvm_function_table * __init start_vmx(void) > if ( cpu_has_vmx_tsc_scaling ) > vmx_function_table.tsc_scaling.ratio_frac_bits = 48; > > +/* TODO: Require hardware support before enabling nested virt */ > +vmx_function_table.caps.nested_virt = vmx_function_table.caps.hap; This won't have the intended effect if hap_supported() ends up clearing the bit (used as input here) due to a command line option override. I wonder if instead this wants doing e.g. in a new hook to be called from nestedhvm_setup(). On the SVM side the hook function would then be the start_nested_svm() that you already introduce, with a caps.hap check added. Since you leave the other nested-related if() in place in arch_sanitise_domain_config(), all ought to be well, but I think that other if() really wants replacing by the one you presently add. Jan
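The ordering problem described here can be illustrated with a minimal model. All names and shapes below are assumptions for illustration, not the real Xen interfaces: the point is simply that the per-vendor hook runs only after any command-line HAP override has been folded into `caps.hap`, so nested virt can never be reported on a configuration where HAP ended up disabled.

```c
#include <stdbool.h>

struct hvm_function_table {
    struct { bool hap; bool nested_virt; } caps;
    void (*start_nested)(struct hvm_function_table *t);
};

/* SVM-side hook; the hardware feature checks (nrips, lbrv, ...) are
 * elided, leaving only the caps.hap input that matters for ordering. */
static void start_nested_svm(struct hvm_function_table *t)
{
    t->caps.nested_virt = t->caps.hap;
}

/* Model of nestedhvm_setup(): the override is applied first, then the
 * vendor hook computes nested_virt from the final HAP state. */
static void nestedhvm_setup(struct hvm_function_table *t, bool hap_cmdline)
{
    t->caps.hap = t->caps.hap && hap_cmdline;
    if ( t->start_nested )
        t->start_nested(t);
}

static bool nested_virt_after_setup(bool hw_hap, bool hap_cmdline)
{
    struct hvm_function_table t = {
        .caps.hap = hw_hap,
        .start_nested = start_nested_svm,
    };

    nestedhvm_setup(&t, hap_cmdline);
    return t.caps.nested_virt;
}
```

With this ordering, a `hap=0` command-line override correctly suppresses nested virt even on HAP-capable hardware, which is exactly what computing `nested_virt` at `start_vmx()`/`start_svm()` time could not guarantee.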
Re: [PATCH 3/6] xen/hvm: Move other hvm_function_table booleans into the caps bitfield
On 06.02.2024 02:20, George Dunlap wrote: > --- a/xen/arch/x86/include/asm/hvm/hvm.h > +++ b/xen/arch/x86/include/asm/hvm/hvm.h > @@ -86,20 +86,19 @@ struct hvm_vcpu_nonreg_state { > struct hvm_function_table { > const char *name; > > -/* Support Hardware-Assisted Paging? */ > -bool hap_supported; > - > -/* Necessary hardware support for alternate p2m's? */ > -bool altp2m_supported; > -bool singlestep_supported; > - > -/* Hardware virtual interrupt delivery enable? */ > -bool virtual_intr_delivery_enabled; > - > struct { > /* Indicate HAP capabilities. */ > -bool hap_superpage_1gb:1, > -hap_superpage_2mb:1; > +bool hap:1, > + hap_superpage_1gb:1, > + hap_superpage_2mb:1, > + > +/* Altp2m capabilities */ > +altp2m:1, > +singlestep:1, > + > +/* Hardware virtual interrupt delivery enable? */ > +virtual_intr_delivery; > + > } caps; Nit (spotted only while looking at patch 6): You're adding a stray blank line at the end of the structure. Further I expect virtual_intr_delivery would also want to be just a single bit? Jan
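For illustration, a standalone sketch of the flags block with both remarks addressed (this is a hypothetical reconstruction, not the actual Xen header): every flag is a single bit, including `virtual_intr_delivery`, and the stray trailing blank line is gone.

```c
#include <stdbool.h>

struct caps {
    bool hap:1,
         hap_superpage_1gb:1,
         hap_superpage_2mb:1,
         altp2m:1,
         singlestep:1,
         virtual_intr_delivery:1;  /* now one bit, like the others */
};

/* Quick sanity check: the six flags pack into a single byte (on GCC
 * and Clang) and each behaves as an independent boolean. */
static bool caps_roundtrip(void)
{
    struct caps c = { 0 };

    c.virtual_intr_delivery = true;
    return c.virtual_intr_delivery && !c.hap && sizeof(c) == 1;
}
```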
Re: [PATCH 5/6] nestedsvm: Remove bogus debug message from nestedsvm_check_intercepts
On 06.02.2024 02:20, George Dunlap wrote: > Changeset ef3e8db8068 ("x86/hvm: Corrections and improvements to > unhandled vmexit logging") introduced a printk to the default path of > the switch statement in nestedsvm_check_intercepts(), complaining of > an unknown exit reason. > > Unfortunately, the "core" switch statement which is meant to handle > all vmexit reasons is in nsvm_vmcb_guest_intercepts_exitcode(); the > switch statement in nestedsvm_check_intercepts() is only meant to > superimpose on top of that some special-casing for how to handle the > interaction between L1 and L0 vmexits. > > Remove the printk, and add a comment to prevent future confusion. > > Signed-off-by: George Dunlap Reviewed-by: Jan Beulich I wonder if a Fixes: tag is warranted here. Jan
Re: [XEN v4 2/3] xen/arm: arm64: Add emulation of Debug Data Transfer Registers
Hi Ayan, On 19/02/2024 15:45, Ayan Kumar Halder wrote: > > On 06/02/2024 19:05, Julien Grall wrote: >> Hi Ayan, > Hi Julien/Michal, >> >> On 31/01/2024 12:10, Ayan Kumar Halder wrote: >>> From: Michal Orzel >>> >>> Currently, if user enables HVC_DCC config option in Linux, it invokes >>> access >>> to debug data transfer registers (i.e. DBGDTRTX_EL0 on arm64, >>> DBGDTRTXINT on >>> arm32). As these registers are not emulated, Xen injects an undefined >>> exception to the guest and Linux crashes. >>> >>> To prevent this crash, introduce a partial emulation of DBGDTR[TR]X_EL0 >>> (these registers share the same encoding) as RAZ/WI and MDCCSR_EL0 as >>> TXfull. >>> >>> Refer ARM DDI 0487J.a ID042523, D19.3.8, DBGDTRTX_EL0 >>> "If TXfull is set to 1, set DTRRX and DTRTX to UNKNOWN". >>> >>> Thus, any OS is expected to read MDCCSR_EL0 and check for TXfull before >>> using DBGDTRTX_EL0. Linux does it via hvc_dcc_init() ---> >>> hvc_dcc_check(), >>> and returns -ENODEV in case TXfull bit is still set after writing a test >>> character. This way we prevent the guest from making use of HVC DCC as a >>> console. >>> >>> Signed-off-by: Michal Orzel >>> Signed-off-by: Ayan Kumar Halder >>> --- >>> Changes from >>> >>> v1 :- 1. DBGDTR_EL0 does not emulate RXfull. This is to avoid giving >>> the OS any >>> indication that the RX buffer is full and is waiting to be read. >>> >>> 2. In Arm32, DBGOSLSR is emulated. Also DBGDTRTXINT is emulated at >>> EL0 only. >>> >>> 3. Fixed the commit message and inline code comments. >>> >>> v2 :- 1. Split the patch into two (separate patches for arm64 and >>> arm32). >>> 2. Removed the "fail" label. >>> 3. Fixed the commit message. >>> >>> v3 :- 1. "HSR_SYSREG_MDCCSR_EL0" emulation differs based on whether >>> partial_emulation_enabled is true or not. >>> >>> 2. If partial_emulation_enabled is false, then access to >>> HSR_SYSREG_DBGDTR_EL0, >>> HSR_SYSREG_DBGDTRTX_EL0 would lead to undefined exception. 
>>> >>> xen/arch/arm/arm64/vsysreg.c | 28 >>> xen/arch/arm/include/asm/arm64/hsr.h | 3 +++ >>> 2 files changed, 27 insertions(+), 4 deletions(-) >>> >>> diff --git a/xen/arch/arm/arm64/vsysreg.c b/xen/arch/arm/arm64/vsysreg.c >>> index b5d54c569b..94f0a6c384 100644 >>> --- a/xen/arch/arm/arm64/vsysreg.c >>> +++ b/xen/arch/arm/arm64/vsysreg.c >>> @@ -159,9 +159,6 @@ void do_sysreg(struct cpu_user_regs *regs, >>> * >>> * Unhandled: >>> * MDCCINT_EL1 >>> - * DBGDTR_EL0 >>> - * DBGDTRRX_EL0 >>> - * DBGDTRTX_EL0 >>> * OSDTRRX_EL1 >>> * OSDTRTX_EL1 >>> * OSECCR_EL1 >>> @@ -173,10 +170,32 @@ void do_sysreg(struct cpu_user_regs *regs, >>> return handle_raz_wi(regs, regidx, hsr.sysreg.read, hsr, 1); >>> case HSR_SYSREG_MDCCSR_EL0: >>> /* >>> + * Xen doesn't expose a real (or emulated) Debug >>> Communications Channel >>> + * (DCC) to a domain. Yet the Arm ARM implies this is not an >>> optional >>> + * feature. So some domains may start to probe it. For >>> instance, the >>> + * HVC_DCC driver in Linux (since f35dc083 and at least >>> up to v6.7), >>> + * will try to write some characters and check if the >>> transmit buffer >>> + * has emptied. >>> + * >>> + * By setting TX status bit (only if partial emulation is >>> enabled) to >>> + * indicate the transmit buffer is full, we would hint the >>> OS that the >>> + * DCC is probably not working. >>> + * >>> + * Bit 29: TX full >>> + * >>> * Accessible at EL0 only if MDSCR_EL1.TDCC is set to 0. We >>> emulate that >>> * register as RAZ/WI above. So RO at both EL0 and EL1. >> >> The sentence "we emulate that register as ..." seems to be stale? I can see that you tried to handle Julien's remark here. But I disagree. This statement is not stale. It explains that because MDSCR_EL1 is RAZ/WI, MDCCSR_EL0 is RO at both EL0 and EL1. This patch does not change this behavior. 
>> >>> */ >>> - return handle_ro_raz(regs, regidx, hsr.sysreg.read, hsr, 0); >>> + return handle_ro_read_val(regs, regidx, hsr.sysreg.read, >>> hsr, 0, >>> + partial_emulation ? (1U << 29) : 0); >>> + >>> + case HSR_SYSREG_DBGDTR_EL0: >>> + /* DBGDTR[TR]X_EL0 share the same encoding */ >>> + case HSR_SYSREG_DBGDTRTX_EL0: >>> + if ( !partial_emulation ) >>> + goto fail; >>> + return handle_raz_wi(regs, regidx, hsr.sysreg.read, hsr, 0); >> >> AFAICT, all the emulation helpers have an explanation why we are using >> them. But here this is not the case. Can you add one? > This and.. >> >>> + >>> HSR_SYSREG_DBG_CASES(DBGBVR): >>> HSR_SYSREG_DBG_CASES(DBGBCR): >>> HSR_SYSREG_DBG_CASES(DBGWVR): >>> @@ -394,6 +413,7 @@ void do_sysreg(struct cpu_user_regs *regs, >>> * And all other
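Aside: the Linux probing sequence this emulation targets (write a test character, then check whether TXfull ever clears) can be modelled in a few lines of plain C. This is an illustration of the logic only, with the register access abstracted into a parameter; it is not Linux's actual hvc_dcc code.

```c
/* Bit 29 of MDCCSR_EL0, as in the quoted comment. */
#define MDCCSR_TXFULL (1U << 29)
#define DCC_ENODEV    19

static int hvc_dcc_probe(unsigned int mdccsr_after_write)
{
    /* A real driver writes a test character first and waits briefly;
     * here the resulting MDCCSR_EL0 value is simply passed in. */
    return (mdccsr_after_write & MDCCSR_TXFULL) ? -DCC_ENODEV : 0;
}
```

Because Xen's partial emulation keeps TXfull permanently set, such a probe concludes the DCC console is absent, which is exactly the hint the patch intends to give.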
Re: Stats on Xen tarball downloads
On Mon, Feb 19, 2024 at 06:01:54PM +0800, George Dunlap wrote: > > Looking at the *non*-4.18 downloads, nearly all of them have user > agents that make it clear they're part of automated build systems: > user agents like curl and wget, but also "Go-http-client", "libfetch", I reject this claim. `curl` or `wget` could be part of an interactive operation. Telling a browser to copy a URL into the paste buffer, then using `wget`/`curl` is entirely possible. I may be the outlier, but I routinely do this. I don't know whether Gentoo's `emerge` uses `wget`/`curl`, but that could be semi-interactive. > It's not really clear to me why we'd be getting 300-ish people > downloading the Xen 4.18.0 tarball, 2/3 of which are on Windows. But > then I'm also not sure why someone would *fake* hundreds of downloads > a week from unique IP addresses; and in particular, if you were going > to fake hundreds of downloads a week, I'm not sure why you'd only fake > the most recent release. Remember the browser wars? At one point many sites were looking for IE/Windows and sending back error messages without those. Getting the tarball on Windows doesn't seem too likely, faking the browser was pretty common for a while. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Re: [PATCH v4 12/30] xen/riscv: introduce cmpxchg.h
On 19.02.2024 16:20, Oleksii wrote: > On Mon, 2024-02-19 at 15:12 +0100, Jan Beulich wrote: >> On 19.02.2024 15:00, Oleksii wrote: >>> On Sun, 2024-02-18 at 19:00 +, Julien Grall wrote: On 05/02/2024 15:32, Oleksii Kurochko wrote: > --- /dev/null > +++ b/xen/arch/riscv/include/asm/cmpxchg.h > @@ -0,0 +1,237 @@ > +/* SPDX-License-Identifier: GPL-2.0-only */ > +/* Copyright (C) 2014 Regents of the University of California > */ > + > +#ifndef _ASM_RISCV_CMPXCHG_H > +#define _ASM_RISCV_CMPXCHG_H > + > +#include > +#include > + > +#include > +#include > +#include > + > +#define ALIGN_DOWN(addr, size) ((addr) & (~((size) - 1))) > + > +#define __amoswap_generic(ptr, new, ret, sfx, release_barrier, > acquire_barrier) \ > +({ \ > + asm volatile( \ > + release_barrier \ > + " amoswap" sfx " %0, %2, %1\n" \ > + acquire_barrier \ > + : "=r" (ret), "+A" (*ptr) \ > + : "r" (new) \ > + : "memory" ); \ > +}) > + > +#define emulate_xchg_1_2(ptr, new, ret, release_barrier, > acquire_barrier) \ > +({ \ > + uint32_t *ptr_32b_aligned = (uint32_t > *)ALIGN_DOWN((unsigned > long)ptr, 4); \ > + uint8_t mask_l = ((unsigned long)(ptr) & (0x8 - > sizeof(*ptr))) > * BITS_PER_BYTE; \ > + uint8_t mask_size = sizeof(*ptr) * BITS_PER_BYTE; \ > + uint8_t mask_h = mask_l + mask_size - 1; \ > + unsigned long mask = GENMASK(mask_h, mask_l); \ > + unsigned long new_ = (unsigned long)(new) << mask_l; \ > + unsigned long ret_; \ > + unsigned long rc; \ > + \ > + asm volatile( \ > + release_barrier \ > + "0: lr.d %0, %2\n" \ I was going to ask why this is lr.d rather than lr.w. But I see Jan already asked. I agree with him that it should probably be a lr.w and ... > + " and %1, %0, %z4\n" \ > + " or %1, %1, %z3\n" \ > + " sc.d %1, %1, %2\n" \ ... respectively sc.w. The same applies for cmpxchg. 
>>> >>> I agree that it would be better, and my initial attempt was to >>> handle >>> 4-byte or 8-byte boundary crossing during 2-byte access: >>> >>> 0 1 2 3 4 5 6 7 8 >>> X X X 1 1 X X X X >>> >>> In this case, if I align address 3 to address 0 and then read 4 >>> bytes >>> instead of 8 bytes, I will not process the byte at address 4. This >>> was >>> the reason why I started to read 8 bytes. >>> >>> I also acknowledge that there could be an issue in the following >>> case: >>> >>> X 4094 4095 4096 >>> X 1 1 X >>> In this situation, when I read 8 bytes, there could be an issue >>> where >>> the new page (which starts at 4096) will not be mapped. It seems >>> correct in this case to check that variable is within one page and >>> read >>> 4 bytes instead of 8. >>> >>> One more thing I am uncertain about is if we change everything to >>> read >>> 4 bytes with 4-byte alignment, what should be done with the first >>> case? >>> Should we panic? (I am not sure if this is an option.) >> >> Counter question (raised elsewhere already): What if a 4-byte access >> crosses a word / cache line / page boundary? Ideally exactly the >> same would happen for a 2-byte access crossing a respective boundary. >> (Which you can achieve relatively easily by masking off only address >> bit 1, keeping address bit 0 unaltered.) > But if we align down on a 4-byte boundary and then access 4 bytes, we > can't cross a boundary. I agree that the algorithm is not correct, as > it can ignore some values in certain situations. For example: > 0 1 2 3 4 5 6 7 8 > X X X 1 1 X X X X > In this case, the value at address 4 won't be updated. 
> > I agree that introducing a new macro to check if a variable crosses a > boundary is necessary or as an option we can check that addr is 2-byte > aligned: > > #define CHECK_BOUNDARY_CROSSING(start, end, boundary_size) > ASSERT((start / boundary_size) != (end / boundary_size)) > > Then, it is necessary to check: > > CHECK_BOUNDARY_CROSSING(start, end, 4) > CHECK_BOUNDARY_CROSSING(start, end, PAGE_SIZE) > > But why do we need to check the cache line boundary? In the case of the > cache, the question will only be about performance, but it should still > work, shouldn't it? You don't need to check for any of these boundaries. You can simply leverage what the hardware does for misaligned accesses. See the various other replies I've sent - I thought things should have become pretty much crystal clear by now: For 1-byte accesses you access the containing word, by clearing the low two bits. For 2-byte accesses you also access the containing word, by clearing only bit 1 (which then naturally leaves no bit that needs clearing for the projected [but not necessary] case of handling a 4-byte access). If the resulting 4-byte access then is still misaligned, it'll fault
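The masking rule described above can be captured in a small self-contained helper (names are illustrative, not Xen's): 1-byte accesses clear address bits 0 and 1 to reach the containing word, 2-byte accesses clear only bit 1, so a genuinely misaligned halfword keeps bit 0 set and still faults the way hardware would.

```c
#include <stdint.h>

static uintptr_t containing_word(uintptr_t addr, unsigned int size)
{
    switch ( size )
    {
    case 1:
        return addr & ~(uintptr_t)3;  /* clear bits 0 and 1 */
    case 2:
        return addr & ~(uintptr_t)2;  /* clear bit 1 only; bit 0 is
                                       * kept, so a misaligned halfword
                                       * still faults like hardware */
    default:
        return addr;                  /* 4-byte: used as-is */
    }
}
```

No explicit page or cache-line boundary checks are needed: any residual misalignment in the resulting word access is left to the hardware to fault on.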
Re: [PATCH 2/6] svm: Improve type of cpu_has_svm_feature
On 06.02.2024 02:20, George Dunlap wrote: > --- a/xen/arch/x86/include/asm/hvm/svm/svm.h > +++ b/xen/arch/x86/include/asm/hvm/svm/svm.h > @@ -38,7 +38,10 @@ extern u32 svm_feature_flags; > #define SVM_FEATURE_SSS 19 /* NPT Supervisor Shadow Stacks */ > #define SVM_FEATURE_SPEC_CTRL 20 /* MSR_SPEC_CTRL virtualisation */ > > -#define cpu_has_svm_feature(f) (svm_feature_flags & (1u << (f))) > +static inline bool cpu_has_svm_feature(unsigned int feat) > +{ > +return svm_feature_flags & (1u << (feat)); > +} > #define cpu_has_svm_npt cpu_has_svm_feature(SVM_FEATURE_NPT) > #define cpu_has_svm_lbrv cpu_has_svm_feature(SVM_FEATURE_LBRV) > #define cpu_has_svm_svml cpu_has_svm_feature(SVM_FEATURE_SVML) Having seen patch 4 now, I have to raise the question here as well: Why do we need a separate variable (svm_feature_flags) when we could use the host policy (provided it isn't abused; see comments on patch 4)? Jan
Re: [PATCH 4/6] nestedsvm: Disable TscRateMSR
On 06.02.2024 02:20, George Dunlap wrote: > For now, just disable the functionality entirely until we can > implement it properly: > > - Don't set TSCRATEMSR in the host CPUID policy This goes too far: This way you would (in principle) also affect guests with nesting disabled. According to the earlier parts of the description there's also no issue with it in that case. What you want to make sure is that in the HVM policy the bit isn't set. While presently resolving to cpu_has_svm_feature(), I think cpu_has_tsc_ratio really ought to resolve to the host policy field. Of course then requiring the host policy to reflect reality rather than having what is "always emulated". IOW ... > --- a/xen/arch/x86/cpu-policy.c > +++ b/xen/arch/x86/cpu-policy.c > @@ -407,8 +407,7 @@ static void __init calculate_host_policy(void) > (1u << SVM_FEATURE_PAUSEFILTER) | > (1u << SVM_FEATURE_DECODEASSISTS)); > /* Enable features which are always emulated. */ > -p->extd.raw[0xa].d |= ((1u << SVM_FEATURE_VMCBCLEAN) | > - (1u << SVM_FEATURE_TSCRATEMSR)); > +p->extd.raw[0xa].d |= (1u << SVM_FEATURE_VMCBCLEAN); ... this likely wants replacing altogether by not overriding what we found in hardware, which would apparently mean moving the two bit masks to the earlier "clamping" expression. But then of course Andrew may know of reasons why all of this is done in calculate_host_policy() in the first place, rather than in HVM policy calculation. Jan
Re: [PATCH v4 12/30] xen/riscv: introduce cmpxchg.h
On Mon, 2024-02-19 at 15:12 +0100, Jan Beulich wrote: > On 19.02.2024 15:00, Oleksii wrote: > > On Sun, 2024-02-18 at 19:00 +, Julien Grall wrote: > > > On 05/02/2024 15:32, Oleksii Kurochko wrote: > > > > --- /dev/null > > > > +++ b/xen/arch/riscv/include/asm/cmpxchg.h > > > > @@ -0,0 +1,237 @@ > > > > +/* SPDX-License-Identifier: GPL-2.0-only */ > > > > +/* Copyright (C) 2014 Regents of the University of California > > > > */ > > > > + > > > > +#ifndef _ASM_RISCV_CMPXCHG_H > > > > +#define _ASM_RISCV_CMPXCHG_H > > > > + > > > > +#include > > > > +#include > > > > + > > > > +#include > > > > +#include > > > > +#include > > > > + > > > > +#define ALIGN_DOWN(addr, size) ((addr) & (~((size) - 1))) > > > > + > > > > +#define __amoswap_generic(ptr, new, ret, sfx, release_barrier, > > > > acquire_barrier) \ > > > > +({ \ > > > > + asm volatile( \ > > > > + release_barrier \ > > > > + " amoswap" sfx " %0, %2, %1\n" \ > > > > + acquire_barrier \ > > > > + : "=r" (ret), "+A" (*ptr) \ > > > > + : "r" (new) \ > > > > + : "memory" ); \ > > > > +}) > > > > + > > > > +#define emulate_xchg_1_2(ptr, new, ret, release_barrier, > > > > acquire_barrier) \ > > > > +({ \ > > > > + uint32_t *ptr_32b_aligned = (uint32_t > > > > *)ALIGN_DOWN((unsigned > > > > long)ptr, 4); \ > > > > + uint8_t mask_l = ((unsigned long)(ptr) & (0x8 - > > > > sizeof(*ptr))) > > > > * BITS_PER_BYTE; \ > > > > + uint8_t mask_size = sizeof(*ptr) * BITS_PER_BYTE; \ > > > > + uint8_t mask_h = mask_l + mask_size - 1; \ > > > > + unsigned long mask = GENMASK(mask_h, mask_l); \ > > > > + unsigned long new_ = (unsigned long)(new) << mask_l; \ > > > > + unsigned long ret_; \ > > > > + unsigned long rc; \ > > > > + \ > > > > + asm volatile( \ > > > > + release_barrier \ > > > > + "0: lr.d %0, %2\n" \ > > > > > > I was going to ask why this is lr.d rather than lr.w. But I see > > > Jan > > > already asked. I agree with him that it should probably be a lr.w > > > and > > > ... 
> > > > > > > + " and %1, %0, %z4\n" \ > > > > + " or %1, %1, %z3\n" \ > > > > + " sc.d %1, %1, %2\n" \ > > > > > > ... respectively sc.w. The same applies for cmpxchg. > > > > I agree that it would be better, and my initial attempt was to > > handle > > 4-byte or 8-byte boundary crossing during 2-byte access: > > > > 0 1 2 3 4 5 6 7 8 > > X X X 1 1 X X X X > > > > In this case, if I align address 3 to address 0 and then read 4 > > bytes > > instead of 8 bytes, I will not process the byte at address 4. This > > was > > the reason why I started to read 8 bytes. > > > > I also acknowledge that there could be an issue in the following > > case: > > > > X 4094 4095 4096 > > X 1 1 X > > In this situation, when I read 8 bytes, there could be an issue > > where > > the new page (which starts at 4096) will not be mapped. It seems > > correct in this case to check that variable is within one page and > > read > > 4 bytes instead of 8. > > > > One more thing I am uncertain about is if we change everything to > > read > > 4 bytes with 4-byte alignment, what should be done with the first > > case? > > Should we panic? (I am not sure if this is an option.) > > Counter question (raised elsewhere already): What if a 4-byte access > crosses a word / cache line / page boundary? Ideally exactly the > same would happen for a 2-byte access crossing a respective boundary. > (Which you can achieve relatively easily by masking off only address > bit 1, keeping address bit 0 unaltered.) But if we align down on a 4-byte boundary and then access 4 bytes, we can't cross a boundary. I agree that the algorithm is not correct, as it can ignore some values in certain situations. For example: 0 1 2 3 4 5 6 7 8 X X X 1 1 X X X X In this case, the value at address 4 won't be updated. 
I agree that introducing a new macro to check if a variable crosses a boundary is necessary or as an option we can check that addr is 2-byte aligned: #define CHECK_BOUNDARY_CROSSING(start, end, boundary_size) ASSERT((start / boundary_size) != (end / boundary_size)) Then, it is necessary to check: CHECK_BOUNDARY_CROSSING(start, end, 4) CHECK_BOUNDARY_CROSSING(start, end, PAGE_SIZE) But why do we need to check the cache line boundary? In the case of the cache, the question will only be about performance, but it should still work, shouldn't it? ~ Oleksii
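The word-at-a-time emulation debated in this thread can be modelled in portable C, with a 4-byte compare-and-swap standing in for the lr.w/sc.w loop. This is a sketch under stated assumptions: the names are invented (it is not the emulate_xchg_1_2 macro itself), a little-endian byte layout is assumed (as on RISC-V), and the GCC/Clang __atomic builtins model the reservation; the type-punned word access is for illustration only.

```c
#include <stdint.h>

/*
 * Model of a 2-byte xchg emulated via a 4-byte CAS on the containing
 * word.  The pointer is aligned down to a 4-byte boundary, the target
 * halfword is isolated with a shifted mask, and the CAS retries until
 * no other writer has touched the word (the lr/sc analogue).
 */
static uint16_t xchg_u16_emulated(volatile uint16_t *p, uint16_t newv)
{
    uintptr_t addr = (uintptr_t)p;
    volatile uint32_t *word = (volatile uint32_t *)(addr & ~(uintptr_t)0x3);
    unsigned int shift = (addr & 0x2) * 8;   /* little-endian layout */
    uint32_t mask = 0xffffu << shift;
    uint32_t old = *word, val;

    do {
        val = (old & ~mask) | ((uint32_t)newv << shift);
    } while ( !__atomic_compare_exchange_n(word, &old, val, 1,
                                           __ATOMIC_ACQ_REL,
                                           __ATOMIC_RELAXED) );

    return (uint16_t)((old & mask) >> shift);
}
```

Note that, as in the thread, nothing here enforces that `p` does not cross the aligned word, so a misbehaving caller would silently update the wrong bytes; the real hardware scheme avoids this by letting the masked access fault.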
[XEN PATCH] xen: cache clearing and invalidation helpers refactoring
The cache clearing and invalidation helpers in x86 and Arm didn't comply with MISRA C Rule 17.7: "The value returned by a function having non-void return type shall be used". On Arm they were always returning 0, while some in x86 returned -EOPNOTSUPP and in common/grant_table the return value is saved. As a consequence, a common helper arch_grant_cache_flush that returns an integer is introduced, so that each architecture can choose whether to return an error value on certain conditions, and the helpers have either been changed to return void (on Arm) or deleted entirely (on x86). Signed-off-by: Julien Grall Signed-off-by: Nicola Vetrini --- The original refactor idea came from Julien Grall in [1]; I edited that proposal to fix build errors. I did introduce a cast to void for the call to flush_area_local on x86, because even before this patch the return value of that function wasn't checked in all but one use in x86/smp.c, and in this context the helper (perhaps incidentally) ignored the return value of flush_area_local. [1] https://lore.kernel.org/xen-devel/09589e8f-77b6-47f7-b5bd-cf485e4b6...@xen.org/ --- xen/arch/arm/include/asm/page.h | 33 ++--- xen/arch/x86/include/asm/flushtlb.h | 23 ++-- xen/common/grant_table.c| 9 +--- 3 files changed, 34 insertions(+), 31 deletions(-) diff --git a/xen/arch/arm/include/asm/page.h b/xen/arch/arm/include/asm/page.h index 69f817d1e68a..e90c9de3616e 100644 --- a/xen/arch/arm/include/asm/page.h +++ b/xen/arch/arm/include/asm/page.h @@ -123,6 +123,7 @@ #ifndef __ASSEMBLY__ +#include #include #include #include @@ -159,13 +160,13 @@ static inline size_t read_dcache_line_bytes(void) * if 'range' is large enough we might want to use model-specific * full-cache flushes. 
*/ -static inline int invalidate_dcache_va_range(const void *p, unsigned long size) +static inline void invalidate_dcache_va_range(const void *p, unsigned long size) { size_t cacheline_mask = dcache_line_bytes - 1; unsigned long idx = 0; if ( !size ) -return 0; +return; /* Passing a region that wraps around is illegal */ ASSERT(((uintptr_t)p + size - 1) >= (uintptr_t)p); @@ -188,17 +189,15 @@ static inline int invalidate_dcache_va_range(const void *p, unsigned long size) asm volatile (__clean_and_invalidate_dcache_one(0) : : "r" (p + idx)); dsb(sy); /* So we know the flushes happen before continuing */ - -return 0; } -static inline int clean_dcache_va_range(const void *p, unsigned long size) +static inline void clean_dcache_va_range(const void *p, unsigned long size) { size_t cacheline_mask = dcache_line_bytes - 1; unsigned long idx = 0; if ( !size ) -return 0; +return; /* Passing a region that wraps around is illegal */ ASSERT(((uintptr_t)p + size - 1) >= (uintptr_t)p); @@ -211,18 +210,16 @@ static inline int clean_dcache_va_range(const void *p, unsigned long size) idx += dcache_line_bytes, size -= dcache_line_bytes ) asm volatile (__clean_dcache_one(0) : : "r" (p + idx)); dsb(sy); /* So we know the flushes happen before continuing */ -/* ARM callers assume that dcache_* functions cannot fail. 
*/ -return 0; } -static inline int clean_and_invalidate_dcache_va_range +static inline void clean_and_invalidate_dcache_va_range (const void *p, unsigned long size) { size_t cacheline_mask = dcache_line_bytes - 1; unsigned long idx = 0; if ( !size ) -return 0; +return; /* Passing a region that wraps around is illegal */ ASSERT(((uintptr_t)p + size - 1) >= (uintptr_t)p); @@ -235,8 +232,6 @@ static inline int clean_and_invalidate_dcache_va_range idx += dcache_line_bytes, size -= dcache_line_bytes ) asm volatile (__clean_and_invalidate_dcache_one(0) : : "r" (p + idx)); dsb(sy); /* So we know the flushes happen before continuing */ -/* ARM callers assume that dcache_* functions cannot fail. */ -return 0; } /* Macros for flushing a single small item. The predicate is always @@ -266,6 +261,20 @@ static inline int clean_and_invalidate_dcache_va_range : : "r" (_p), "m" (*_p)); \ } while (0) +static inline int arch_grant_cache_flush(unsigned int op, const void *p, + unsigned long size) +{ +if ( (op & GNTTAB_CACHE_INVAL) && (op & GNTTAB_CACHE_CLEAN) ) +clean_and_invalidate_dcache_va_range(p, size); +else if ( op & GNTTAB_CACHE_INVAL ) +invalidate_dcache_va_range(p, size); +else if ( op & GNTTAB_CACHE_CLEAN ) +clean_dcache_va_range(p, size); + +/* ARM callers assume that dcache_* functions cannot fail. */ +return 0; +} + /* * Write a pagetable entry. * diff --git a/xen/arch/x86/include/asm/flushtlb.h
Re: [XEN PATCH] automation/eclair: add deviation for MISRA C:2012 Rule 16.3
On 19.02.2024 15:59, Federico Serafini wrote: > On 19/02/24 14:43, Jan Beulich wrote: >> On 19.02.2024 14:24, Federico Serafini wrote: >>> Update ECLAIR configuration to consider safe switch clauses ending >>> with __{get,put}_user_bad(). >>> >>> Update docs/misra/deviations.rst accordingly. >>> >>> Signed-off-by: Federico Serafini >> >> As mentioned I'm not happy with this, not the least because of it being >> unclear why these two would be deviated, when there's no sign of a >> similar deviation for, say, __bad_atomic_size(). Imo this approach >> doesn't scale, and that's already leaving aside that the purpose of >> identically named (pseudo-)helpers could differ between architectures, >> thus putting under question ... >> >>> --- a/automation/eclair_analysis/ECLAIR/deviations.ecl >>> +++ b/automation/eclair_analysis/ECLAIR/deviations.ecl >>> @@ -368,6 +368,10 @@ safe." >>> -config=MC3R1.R16.3,reports+={safe, >>> "any_area(end_loc(any_exp(text(/BUG\\(\\);/"} >>> -doc_end >>> >>> +-doc_begin="Switch clauses ending with constructs \"__get_user_bad()\" and >>> \"__put_user_bad()\" are safe: they denote an unreachable program point." >>> +-config=MC3R1.R16.3,reports+={safe, >>> "any_area(end_loc(any_exp(text(/__(put|get)_user_bad\\(\\);/"} >>> +-doc_end >> >> ... the global scope of such a deviation. While it may not be a good idea, >> even within an arch such (pseudo-)helpers could be used for multiple >> distinct purposes. > > Would you agree with adding the missing break statement after > the uses of __{put,get}_user_bad() (as it is already happening for > uses of __bad_atomic_size())? I probably wouldn't mind that (despite being a little pointless). Perhaps declaring them as noreturn would also help? Jan
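The noreturn idea floated above could look roughly like the sketch below. A stand-in helper is used here because the real __get_user_bad() is a deliberately undefined symbol that triggers a link-time error; this only illustrates the control-flow property that makes the trailing break unnecessary.

```c
#include <stdlib.h>

/* Hypothetical stand-in for __get_user_bad(): once declared noreturn,
 * a switch clause ending in it visibly cannot fall through, so no
 * break is required to satisfy MISRA C:2012 Rule 16.3. */
static void __attribute__((__noreturn__)) get_user_bad_stub(void)
{
    abort();
}

static unsigned int get_user_size(unsigned int size)
{
    switch ( size )
    {
    case 1: return 1;
    case 2: return 2;
    case 4: return 4;
    case 8: return 8;
    default:
        get_user_bad_stub(); /* control never returns: no break needed */
    }
}
```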
Re: [PATCH v4 12/30] xen/riscv: introduce cmpxchg.h
On 19.02.2024 15:29, Oleksii wrote: > On Mon, 2024-02-19 at 12:22 +0100, Jan Beulich wrote: >> On 15.02.2024 14:41, Oleksii wrote: >>> As I mentioned above with 4-byte alignment and then reading and >>> working >>> with 8-byte then crossing a word or double-word boundary shouldn't >>> be >>> an issue. >>> >>> I am not sure that I know how to check that we are crossing cache >>> line >>> boundary. >>> >>> Regarding page boundary, if the next page is mapped then all should >>> work fine, otherwise it will be an exception. >> >> Are you sure lr.d / sc.d are happy to access across such a boundary, >> when both pages are mapped? > If they are mapped, my expectation that lr.d and sc.d should be happy. How does this expectation of yours fit with the A extension doc having this: "For LR and SC, the A extension requires that the address held in rs1 be naturally aligned to the size of the operand (i.e., eight-byte aligned for 64-bit words and four-byte aligned for 32-bit words). If the address is not naturally aligned, an address-misaligned exception or an access-fault exception will be generated." It doesn't even say "may"; it says "will". >> To me it seems pretty clear that for atomic accesses you want to >> demand natural alignment, i.e. 2-byte alignment for 2-byte accesses. >> This way you can be sure no potentially problematic boundaries will >> be crossed. > It makes sense, but I am not sure that I can guarantee that a user of > macros will always have 2-byte alignment (except during a panic) in the > future. > > Even now, I am uncertain that everyone will be willing to add > __alignment(...) to struct vcpu->is_urgent > (xen/include/xen/sched.h:218) and other possible cases to accommodate > RISC-V requirements. ->is_urgent is bool, i.e. 1 byte and hence okay at any address. For all normal variables and fields the compiler will guarantee suitable (natural) alignment. What you prohibit by requiring aligned items is use of fields of e.g. packed structures. 
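The point about field alignment can be checked directly. A minimal sketch (the struct names are invented; `__attribute__((__packed__))` is the GCC/Clang spelling behind Xen's packed-attribute wrappers):

```c
#include <stdint.h>
#include <stddef.h>

/* An ordinary struct gets natural alignment for its fields, so a
 * 2-byte member can never straddle a problematic boundary; only a
 * packed struct can place it at an odd offset. */
struct normal {
    uint8_t  flag;
    uint16_t counter;   /* padded to offset 2 by the compiler */
};

struct packed_s {
    uint8_t  flag;
    uint16_t counter;   /* forced to offset 1: misaligned */
} __attribute__((__packed__));
```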
>>> As 1- and 2-byte cases are emulated I decided that is not to >>> provide >>> sfx argument for emulation macros as it will not have to much >>> affect on >>> emulated types and just consume more performance on acquire and >>> release >>> version of sc/ld instructions. >> >> Question is whether the common case (4- and 8-byte accesses) >> shouldn't >> be valued higher, with 1- and 2-byte emulation being there just to >> allow things to not break altogether. > If I understand you correctly, it would make sense to add the 'sfx' > argument for the 1/2-byte access case, ensuring that all options are > available for 1/2-byte access case as well. That's one of the possibilities. As said, I'm not overly worried about the emulated cases. For the initial implementation I'd recommend going with what is easiest there, yielding the best possible result for the 4- and 8-byte cases. If later it turns out repeated acquire/release accesses are a problem in the emulation loop, things can be changed to explicit barriers, without touching the 4- and 8-byte cases. Jan
Re: [XEN PATCH] automation/eclair: add deviation for MISRA C:2012 Rule 16.3
On 19/02/24 14:43, Jan Beulich wrote: On 19.02.2024 14:24, Federico Serafini wrote: Update ECLAIR configuration to consider safe switch clauses ending with __{get,put}_user_bad(). Update docs/misra/deviations.rst accordingly. Signed-off-by: Federico Serafini As mentioned I'm not happy with this, not the least because of it being unclear why these two would be deviated, when there's no sign of a similar deviation for, say, __bad_atomic_size(). Imo this approach doesn't scale, and that's already leaving aside that the purpose of identically named (pseudo-)helpers could differ between architectures, thus putting under question ... --- a/automation/eclair_analysis/ECLAIR/deviations.ecl +++ b/automation/eclair_analysis/ECLAIR/deviations.ecl @@ -368,6 +368,10 @@ safe." -config=MC3R1.R16.3,reports+={safe, "any_area(end_loc(any_exp(text(/BUG\\(\\);/"} -doc_end +-doc_begin="Switch clauses ending with constructs \"__get_user_bad()\" and \"__put_user_bad()\" are safe: they denote an unreachable program point." +-config=MC3R1.R16.3,reports+={safe, "any_area(end_loc(any_exp(text(/__(put|get)_user_bad\\(\\);/"} +-doc_end ... the global scope of such a deviation. While it may not be a good idea, even within an arch such (pseudo-)helpers could be used for multiple distinct purposes. Would you agree with adding the missing break statement after the uses of __{put,get}_user_bad() (as it is already happening for uses of __bad_atomic_size())? -- Federico Serafini, M.Sc. Software Engineer, BUGSENG (http://bugseng.com)
Re: [XEN v4 2/3] xen/arm: arm64: Add emulation of Debug Data Transfer Registers
On 06/02/2024 19:05, Julien Grall wrote: Hi Ayan, Hi Julien/Michal, On 31/01/2024 12:10, Ayan Kumar Halder wrote: From: Michal Orzel Currently, if user enables HVC_DCC config option in Linux, it invokes access to debug data transfer registers (i.e. DBGDTRTX_EL0 on arm64, DBGDTRTXINT on arm32). As these registers are not emulated, Xen injects an undefined exception to the guest and Linux crashes. To prevent this crash, introduce a partial emulation of DBGDTR[TR]X_EL0 (these registers share the same encoding) as RAZ/WI and MDCCSR_EL0 as TXfull. Refer ARM DDI 0487J.a ID042523, D19.3.8, DBGDTRTX_EL0 "If TXfull is set to 1, set DTRRX and DTRTX to UNKNOWN". Thus, any OS is expected to read MDCCSR_EL0 and check for TXfull before using DBGDTRTX_EL0. Linux does it via hvc_dcc_init() ---> hvc_dcc_check(), and returns -ENODEV in case TXfull bit is still set after writing a test character. This way we prevent the guest from making use of HVC DCC as a console. Signed-off-by: Michal Orzel Signed-off-by: Ayan Kumar Halder --- Changes from v1 :- 1. DBGDTR_EL0 does not emulate RXfull. This is to avoid giving the OS any indication that the RX buffer is full and is waiting to be read. 2. In Arm32, DBGOSLSR is emulated. Also DBGDTRTXINT is emulated at EL0 only. 3. Fixed the commit message and inline code comments. v2 :- 1. Split the patch into two (separate patches for arm64 and arm32). 2. Removed the "fail" label. 3. Fixed the commit message. v3 :- 1. "HSR_SYSREG_MDCCSR_EL0" emulation differs based on whether partial_emulation_enabled is true or not. 2. If partial_emulation_enabled is false, then access to HSR_SYSREG_DBGDTR_EL0, HSR_SYSREG_DBGDTRTX_EL0 would lead to undefined exception. 
xen/arch/arm/arm64/vsysreg.c | 28 xen/arch/arm/include/asm/arm64/hsr.h | 3 +++ 2 files changed, 27 insertions(+), 4 deletions(-) diff --git a/xen/arch/arm/arm64/vsysreg.c b/xen/arch/arm/arm64/vsysreg.c index b5d54c569b..94f0a6c384 100644 --- a/xen/arch/arm/arm64/vsysreg.c +++ b/xen/arch/arm/arm64/vsysreg.c @@ -159,9 +159,6 @@ void do_sysreg(struct cpu_user_regs *regs, * * Unhandled: * MDCCINT_EL1 - * DBGDTR_EL0 - * DBGDTRRX_EL0 - * DBGDTRTX_EL0 * OSDTRRX_EL1 * OSDTRTX_EL1 * OSECCR_EL1 @@ -173,10 +170,32 @@ void do_sysreg(struct cpu_user_regs *regs, return handle_raz_wi(regs, regidx, hsr.sysreg.read, hsr, 1); case HSR_SYSREG_MDCCSR_EL0: /* + * Xen doesn't expose a real (or emulated) Debug Communications Channel + * (DCC) to a domain. Yet the Arm ARM implies this is not an optional + * feature. So some domains may start to probe it. For instance, the + * HVC_DCC driver in Linux (since f35dc083 and at least up to v6.7), + * will try to write some characters and check if the transmit buffer + * has emptied. + * + * By setting TX status bit (only if partial emulation is enabled) to + * indicate the transmit buffer is full, we would hint the OS that the + * DCC is probably not working. + * + * Bit 29: TX full + * * Accessible at EL0 only if MDSCR_EL1.TDCC is set to 0. We emulate that * register as RAZ/WI above. So RO at both EL0 and EL1. The sentence "we emulate that register as ..." seems to be stale? */ - return handle_ro_raz(regs, regidx, hsr.sysreg.read, hsr, 0); + return handle_ro_read_val(regs, regidx, hsr.sysreg.read, hsr, 0, + partial_emulation ? (1U << 29) : 0); + + case HSR_SYSREG_DBGDTR_EL0: + /* DBGDTR[TR]X_EL0 share the same encoding */ + case HSR_SYSREG_DBGDTRTX_EL0: + if ( !partial_emulation ) + goto fail; + return handle_raz_wi(regs, regidx, hsr.sysreg.read, hsr, 0); AFAICT, all the emulation helpers have an explanation why we are using them. But here this is not the case. Can you add one? This and.. 
+ HSR_SYSREG_DBG_CASES(DBGBVR): HSR_SYSREG_DBG_CASES(DBGBCR): HSR_SYSREG_DBG_CASES(DBGWVR): @@ -394,6 +413,7 @@ void do_sysreg(struct cpu_user_regs *regs, * And all other unknown registers. */ default: + fail: AFAICT, this would violate MISRA 15.3 [1]. We don't seem to have accepted the rule yet (?), but I don't see why we would not, given I feel this is similar to what Rule 16.2 is trying to prevent and we accepted it. In that case, I moved all the code within default outside, and then call "goto fail" from the default label. I am not sure if I have interpreted this correctly. Is it ok if you can take a look at the attached patch and let me know if the explanation and the code change look sane? - Ayan { const struct hsr_sysreg sysreg = hsr.sysreg; diff --git a/xen/arch/arm/include/asm/arm64/hsr.h
Re: [PATCH v4 17/30] xen/riscv: introduce regs.h
Hi Julien, On Sun, 2024-02-18 at 18:22 +, Julien Grall wrote: > Hi, > > On 05/02/2024 15:32, Oleksii Kurochko wrote: > > Signed-off-by: Oleksii Kurochko > > Acked-by: Jan Beulich > > -- > > Changes in V4: > > - add Acked-by: Jan Beulich > > - s/BUG()/BUG_ON("unimplemented") > > --- > > Changes in V3: > > - update the commit message > > - add Acked-by: Jan Beulich > > - remove "include " and use a forward declaration > > instead. > > --- > > Changes in V2: > > - change xen/lib.h to xen/bug.h > > - remove unnecessary empty line > > --- > > xen/arch/riscv/include/asm/regs.h | 29 > > + > > 1 file changed, 29 insertions(+) > > create mode 100644 xen/arch/riscv/include/asm/regs.h > > > > diff --git a/xen/arch/riscv/include/asm/regs.h > > b/xen/arch/riscv/include/asm/regs.h > > new file mode 100644 > > index 00..c70ea2aa0c > > --- /dev/null > > +++ b/xen/arch/riscv/include/asm/regs.h > > @@ -0,0 +1,29 @@ > > +/* SPDX-License-Identifier: GPL-2.0-only */ > > +#ifndef __ARM_RISCV_REGS_H__ > > +#define __ARM_RISCV_REGS_H__ > > + > > +#ifndef __ASSEMBLY__ > > + > > +#include > > + > > +#define hyp_mode(r) (0) > > I don't understand where here you return 0 (which should really be > false) but ... > > > + > > +struct cpu_user_regs; > > + > > +static inline bool guest_mode(const struct cpu_user_regs *r) > > +{ > > + BUG_ON("unimplemented"); > > +} > > ... here you return BUG_ON(). But I couldn't find any user of both > guest_mode() and hyp_mode(). So isn't it a bit premature to > introduce > the helpers? I agree regarding hyp_mode(), it can be dropped, but guest_mode() is used by common/keyhandler.c:142. ~ Oleksii
Re: [PATCH 3/6] xen/hvm: Move other hvm_function_table booleans into the caps bitfield
On 06.02.2024 02:20, George Dunlap wrote: > @@ -144,7 +144,7 @@ static bool __init hap_supported(struct > hvm_function_table *fns) > > if ( !opt_hap_enabled ) > { > -fns->hap_supported = 0; > +fns->caps.hap = 0; As you touch such, would you mind switching to true/false instead of 1/0 at this occasion? > @@ -3000,7 +3000,7 @@ const struct hvm_function_table * __init start_vmx(void) > vmx_function_table.update_eoi_exit_bitmap = > vmx_update_eoi_exit_bitmap; > vmx_function_table.process_isr = vmx_process_isr; > vmx_function_table.handle_eoi = vmx_handle_eoi; > -vmx_function_table.virtual_intr_delivery_enabled = true; > +vmx_function_table.caps.virtual_intr_delivery = true; I'm unsure about this one - it had "enabled" in its name for a good reason. Then again its (somewhat more involved) derivation from other capability bits (and a command line option) isn't fundamentally different from that of, say, hap_supported. Hence I guess with the other item taken care of Reviewed-by: Jan Beulich Jan
Re: [PATCH v4 14/30] xen/riscv: introduce atomic.h
Hi Julien, On Sun, 2024-02-18 at 19:22 +, Julien Grall wrote: > Hi, > > On 05/02/2024 15:32, Oleksii Kurochko wrote: > > From: Bobby Eshleman > > > > Additionally, this patch introduces macros in fence.h, > > which are utilized in atomic.h. > > > > atomic##prefix##_*xchg_*(atomic##prefix##_t *v, c_t n) > > were updated to use __*xchg_generic(). > > > > Signed-off-by: Oleksii Kurochko > > The author is Bobby, but I don't see a Signed-off-by. Did you forget > it? I forgot to add it, as I thought it would be enough to change the commit author. > > > --- > > Changes in V4: > > - do changes related to the updates of [PATCH v3 13/34] > > xen/riscv: introduce cmpxchg.h > > - drop casts in read_atomic_size(), write_atomic(), add_sized() > > - tabs -> spaces > > - drop #ifdef CONFIG_SMP ... #endif in fence.h as it is simpler > > to handle NR_CPUS=1 > > the same as NR_CPUS>1 with accepting less than ideal > > performance. > > --- > > Changes in V3: > > - update the commit message > > - add SPDX for fence.h > > - code style fixes > > - Remove /* TODO: ... */ for add_sized macros. It looks correct > > to me. > > - re-order the patch > > - merge to this patch fence.h > > --- > > Changes in V2: > > - Change an author of commit. I got this header from Bobby's old > > repo. > > --- > > xen/arch/riscv/include/asm/atomic.h | 395 > > > > xen/arch/riscv/include/asm/fence.h | 8 + > > 2 files changed, 403 insertions(+) > > create mode 100644 xen/arch/riscv/include/asm/atomic.h > > create mode 100644 xen/arch/riscv/include/asm/fence.h > > > > diff --git a/xen/arch/riscv/include/asm/atomic.h > > b/xen/arch/riscv/include/asm/atomic.h > > new file mode 100644 > > index 00..267d3c0803 > > --- /dev/null > > +++ b/xen/arch/riscv/include/asm/atomic.h > > @@ -0,0 +1,395 @@ > > +/* SPDX-License-Identifier: GPL-2.0-only */ > > +/* > > + * Taken and modified from Linux. > > Which version of Linux? Can you also spell out what are the big > changes? > This would be helpful if we need to re-sync.
Sure, I'll add the changes here. > > > + * > > + * atomic##prefix##_*xchg_*(atomic##prefix##_t *v, c_t n) were > > updated to use > > + * __*xchg_generic() > > + * > > + * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved. > > + * Copyright (C) 2012 Regents of the University of California > > + * Copyright (C) 2017 SiFive > > + * Copyright (C) 2021 Vates SAS > > + */ > > + > > +#ifndef _ASM_RISCV_ATOMIC_H > > +#define _ASM_RISCV_ATOMIC_H > > + > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > +void __bad_atomic_size(void); > > + > > +static always_inline void read_atomic_size(const volatile void *p, > > + void *res, > > + unsigned int size) > > +{ > > + switch ( size ) > > + { > > + case 1: *(uint8_t *)res = readb(p); break; > > + case 2: *(uint16_t *)res = readw(p); break; > > + case 4: *(uint32_t *)res = readl(p); break; > > + case 8: *(uint32_t *)res = readq(p); break; > > + default: __bad_atomic_size(); break; > > + } > > +} > > + > > +#define read_atomic(p) ({ \ > > + union { typeof(*p) val; char c[0]; } x_; \ > > + read_atomic_size(p, x_.c, sizeof(*p)); \ > > + x_.val; \ > > +}) > > + > > +#define write_atomic(p, x) \ > > +({ \ > > + typeof(*p) x__ = (x); \ > > + switch ( sizeof(*p) ) \ > > + { \ > > + case 1: writeb((uint8_t)x__, p); break; \ > > + case 2: writew((uint16_t)x__, p); break; \ > > + case 4: writel((uint32_t)x__, p); break; \ > > + case 8: writeq((uint64_t)x__, p); break; \ > > + default: __bad_atomic_size(); break; \ > > + } \ > > + x__; \ > > +}) > > + > > +#define add_sized(p, x) \ > > +({ \ > > + typeof(*(p)) x__ = (x); \ > > + switch ( sizeof(*(p)) ) \ > > + { \ > > + case 1: writeb(read_atomic(p) + x__, p); break; \ > > + case 2: writew(read_atomic(p) + x__, p); break; \ > > + case 4: writel(read_atomic(p) + x__, p); break; \ > > + default: __bad_atomic_size(); break; \ > > + } \ > > +}) > > + > > +/* > > + * __unqual_scalar_typeof(x) - Declare an unqualified scalar > > type, leaving > > + *
Re: [PATCH v4 13/30] xen/riscv: introduce io.h
On Sun, 2024-02-18 at 19:07 +, Julien Grall wrote: > > > +/* > > + * Unordered I/O memory access primitives. These are even more > > relaxed than > > + * the relaxed versions, as they don't even order accesses between > > successive > > + * operations to the I/O regions. > > + */ > > +#define readb_cpu(c) ({ uint8_t __r = __raw_readb(c); > > __r; }) > > +#define readw_cpu(c) ({ uint16_t __r = > > le16_to_cpu((__force __le16)__raw_readw(c)); __r; }) > > +#define readl_cpu(c) ({ uint32_t __r = > > le32_to_cpu((__force __le32)__raw_readl(c)); __r; }) > > + > > +#define writeb_cpu(v,c)((void)__raw_writeb(v,c)) > > +#define > > writew_cpu(v,c) ((void)__raw_writew((__force > > uint16_t)cpu_to_le16(v),c)) > > +#define > > writel_cpu(v,c) ((void)__raw_writel((__force > > uint32_t)cpu_to_le32(v),c)) > > NIT: __raw_write*() are already returning void. So I am not sure I > understand the point of the cast. IIUC, this is coming from Linux; do > you intend to keep the code as-is (including style)? If not, then I > would consider dropping the cast on the three lines above and ... Changes have already been made in this header, so it makes sense to remove these casts. Thanks. ~ Oleksii
[xen-unstable-smoke test] 184701: tolerable all pass - PUSHED
flight 184701 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/184701/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 15 migrate-support-check fail never pass test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass test-armhf-armhf-xl 15 migrate-support-check fail never pass test-armhf-armhf-xl 16 saverestore-support-check fail never pass version targeted for testing: xen c144b9e32427ba37e0e0443a0d3fa53e9fb878b6 baseline version: xen 0441c3acc7e9e72e984ce49d32e61827894ae4a3 Last test of basis 184685 2024-02-16 18:03:46 Z 2 days Testing same since 184701 2024-02-19 12:03:49 Z 0 days 1 attempts People who touched revisions under test: Andrew Cooper Anthony PERARD Frediano Ziglio Julien Grall Marek Marczykowski-Górecki Oleksandr Tyshchenko Roger Pau Monné jobs: build-arm64-xsm pass build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-arm64-arm64-xl-xsm pass test-amd64-amd64-xl-qemuu-debianhvm-amd64 pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git 0441c3acc7..c144b9e324 c144b9e32427ba37e0e0443a0d3fa53e9fb878b6 -> smoke
Re: [PATCH v4 12/30] xen/riscv: introduce cmpxchg.h
On Mon, 2024-02-19 at 12:22 +0100, Jan Beulich wrote: > On 15.02.2024 14:41, Oleksii wrote: > > > > + : "=r" (ret), "+A" (*ptr) \ > > > > + : "r" (new) \ > > > > + : "memory" ); \ > > > > +}) > > > > + > > > > +#define emulate_xchg_1_2(ptr, new, ret, release_barrier, > > > > acquire_barrier) \ > > > > +({ \ > > > > + uint32_t *ptr_32b_aligned = (uint32_t > > > > *)ALIGN_DOWN((unsigned > > > > long)ptr, 4); \ > > > > > > You now appear to assume that this macro is only used with inputs > > > not > > > crossing word boundaries. That's okay as long as suitably > > > guaranteed > > > at the use sites, but imo wants saying in a comment. > > > > > > > + uint8_t mask_l = ((unsigned long)(ptr) & (0x8 - > > > > sizeof(*ptr))) > > > > * BITS_PER_BYTE; \ > > > > > > Why 0x8 (i.e. spanning 64 bits), not 4 (matching the uint32_t use > > > above)? > > The idea to read 8 bytes was to deal with crossing word boundary. > > So if > > our address is 0x3 and we have to xchg() 2 bytes, what will cross 4 > > byte boundary. Instead we align add 0x3, so it will become 0x0 and > > then > > just always work with 8 bytes. > > Then what if my 2-byte access crosses a dword boundary? A cache line > one? A page one? Everything looks okay to me, except in the case of a page boundary. In the scenario of a dword boundary: 0 1 2 3 4 5 6 7 8 9 ... X X X X X X X 1 1 X Assuming a variable starts at address 7, 4-byte alignment will be enforced, and 8 bytes will be processed starting from address 4. Concerning a cache line, it should still work, with potential performance issues arising only if a part of the variable is cached while another part is not. Regarding page crossing, I acknowledge that it could be problematic if the variable is entirely located at the end of a page, as there is no guarantee that the next page exists. In this case, it would be preferable to consistently read 4 bytes with 4-byte alignment: X 4094 4095 4096? X 11? 
However, if the variable spans two pages, proper page mapping should be ensured. It appears sensible to reconsider the macros and implement 4-byte alignment and 4-byte access, but then this is not clear how better to deal with first case ( dword boundary ). Panic ? or use the macros twice for address 4, and address 8? > > > > > + unsigned long new_ = (unsigned long)(new) << mask_l; \ > > > > + unsigned long ret_; \ > > > > + unsigned long rc; \ > > > > > > Similarly, why unsigned long here? > > sizeof(unsigned long) is 8 bytes and it was chosen as we are > > working > > with lc/sc.d which are working with 8 bytes. > > > > > > > > I also wonder about the mix of underscore suffixed (or not) > > > variable > > > names here. > > If the question about ret_, then the same as before size of ret > > argument of the macros will be 1 or 2, but {lc/sc}.d expected to > > work > > with 8 bytes. > > Then what's the uint32_t * about? Agree, then it should be also unsigned long. > > > > > > + __typeof__(*(ptr)) new__ = (new); \ > > > > + __typeof__(*(ptr)) ret__; \ > > > > + switch (size) \ > > > > + { \ > > > > + case 1: \ > > > > + case 2: \ > > > > + emulate_xchg_1_2(ptr__, new__, ret__, release_barrier, > > > > acquire_barrier); \ > > > > > > ... this would become > > > > > > ret__ = emulate_xchg_1_2(ptr__, new__, release_barrier, > > > acquire_barrier); \ > > > > > > But, unlike assumed above, there's no enforcement here that a 2- > > > byte > > > quantity won't cross a word, double-word, cache line, or even > > > page > > > boundary. That might be okay if then the code would simply crash > > > (like > > > the AMO insns emitted further down would), but aiui silent > > > misbehavior > > > would result. > > As I mentioned above with 4-byte alignment and then reading and > > working > > with 8-byte then crossing a word or double-word boundary shouldn't > > be > > an issue. > > > > I am not sure that I know how to check that we are crossing cache > > line > > boundary. 
> > > > Regarding page boundary, if the next page is mapped then all should > > work fine, otherwise it will be an exception. > > Are you sure lr.d / sc.d are happy to access across such a boundary, > when both pages are mapped? If they are mapped, my expectation that lr.d and sc.d should be happy. > > To me it seems pretty clear that for atomic accesses you want to > demand natural alignment, i.e. 2-byte alignment for 2-byte accesses. > This way you can be sure no potentially problematic boundaries will > be crossed. It makes sense, but I am not sure that I can guarantee that a user of macros will always have 2-byte alignment (except during a panic) in the future. Even now, I am uncertain that everyone will be willing to add __alignment(...) to struct vcpu->is_urgent (xen/include/xen/sched.h:218) and other possible cases to accommodate RISC-V requirements. > > > > > + break; \ > > > > + case 4: \ > > > > + __amoswap_generic(ptr__,
Re: [PATCH v4 12/30] xen/riscv: introduce cmpxchg.h
Hi Oleksii, On 19/02/2024 14:00, Oleksii wrote: On Sun, 2024-02-18 at 19:00 +, Julien Grall wrote: On 05/02/2024 15:32, Oleksii Kurochko wrote: The header was taken from Linux kernl 6.4.0-rc1. Addionally, were updated: * add emulation of {cmp}xchg for 1/2 byte types This explaination is a little bit light. IIUC, you are implementing them using 32-bit atomic access. Is that correct? If so, please spell it out. Sure, I'll update commit message. Also, I wonder whether it would be better to try to get rid of the 1/2 bytes access. Do you know where they are used? Right now, the issue is with test_and_clear_bool() which is used in common/sched/core.c:840 [https://gitlab.com/xen-project/xen/-/blob/staging/xen/common/sched/core.c?ref_type=heads#L840 ] I don't remember details, but in xen-devel chat someone told me that grant table requires 1/2 bytes access. Ok :/. This would be part of the ABI then and therefore can't be easily changed. * replace tabs with spaces Does this mean you are not planning to backport any Linux fixes? If it will be any fixes for sure I'll back port them, but it looks like this code is stable enough and not to many fixes will be done there, so it shouldn't be hard to backport them and switch to spaces. Fair enough. + " and %1, %0, %z4\n" \ + " or %1, %1, %z3\n" \ + " sc.d %1, %1, %2\n" \ ... respectively sc.w. The same applies for cmpxchg. I agree that it would be better, and my initial attempt was to handle 4-byte or 8-byte boundary crossing during 2-byte access: 0 1 2 3 4 5 6 7 8 X X X 1 1 X X X X In this case, if I align address 3 to address 0 and then read 4 bytes instead of 8 bytes, I will not process the byte at address 4. This was the reason why I started to read 8 bytes. At least on Arm, the architecture doesn't support atomic operations if the access is not aligned to its size (this will send a data abort). On some architecture, this is supported but potentially very slow. So all the common code should already use properly aligned address. 
Therefore, I don't really see the reason to add support for unaligned access. Cheers, -- Julien Grall
[linux-linus test] 184700: tolerable FAIL - PUSHED
flight 184700 linux-linus real [real]
flight 184702 linux-linus real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/184700/
http://logs.test-lab.xenproject.org/osstest/logs/184702/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-freebsd11-amd64 19 guest-localmigrate/x10 fail pass in 184702-retest

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64        19 guest-stop                 fail like 184697
 test-amd64-amd64-xl-qemuu-win7-amd64        19 guest-stop                 fail like 184697
 test-amd64-amd64-xl-qemuu-ws16-amd64        19 guest-stop                 fail like 184697
 test-armhf-armhf-libvirt-raw                15 saverestore-support-check  fail like 184697
 test-armhf-armhf-libvirt                    16 saverestore-support-check  fail like 184697
 test-amd64-amd64-xl-qemut-ws16-amd64        19 guest-stop                 fail like 184697
 test-armhf-armhf-libvirt-qcow2              15 saverestore-support-check  fail like 184697
 test-amd64-amd64-qemuu-nested-amd           20 debian-hvm-install/l1/l2   fail like 184697
 test-amd64-amd64-libvirt-xsm                15 migrate-support-check      fail never pass
 test-amd64-amd64-libvirt                    15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-thunderx                15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-thunderx                16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-xsm                     15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-xsm                     16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-credit2                 15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-credit2                 16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-credit1                 15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-credit1                 16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl                         15 migrate-support-check      fail never pass
 test-arm64-arm64-xl                         16 saverestore-support-check  fail never pass
 test-arm64-arm64-libvirt-xsm                15 migrate-support-check      fail never pass
 test-arm64-arm64-libvirt-xsm                16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2                 15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit2                 16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt-qcow2              14 migrate-support-check      fail never pass
 test-armhf-armhf-xl-arndale                 15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-arndale                 16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt-raw                14 migrate-support-check      fail never pass
 test-arm64-arm64-libvirt-raw                14 migrate-support-check      fail never pass
 test-arm64-arm64-libvirt-raw                15 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-vhd                     14 migrate-support-check      fail never pass
 test-arm64-arm64-xl-vhd                     15 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt-raw                14 migrate-support-check      fail never pass
 test-armhf-armhf-xl-vhd                     14 migrate-support-check      fail never pass
 test-armhf-armhf-xl-vhd                     15 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-multivcpu               15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-multivcpu               16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl                         15 migrate-support-check      fail never pass
 test-armhf-armhf-xl                         16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-rtds                    15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-rtds                    16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit1                 15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit1                 16 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt                    15 migrate-support-check      fail never pass
 test-armhf-armhf-libvirt-qcow2              14 migrate-support-check      fail never pass

version targeted for testing:
 linux                b401b621758e46812da61fa58a67c3fd8d91de0d
baseline version:
 linux                6c160f16be5df1f66f6afe186c961ad446d7f94b

Last test of basis   184697  2024-02-18 18:43:56 Z    0 days
Testing same since   184700  2024-02-19 03:44:55 Z    0 days    1 attempts

People who touched revisions under test:
  Linus Torvalds

jobs:
 build-amd64-xsm                                              pass
 build-arm64-xsm                                              pass
 build-i386-xsm                                               pass
 build-amd64                                                  pass
 build-arm64                                                  pass
 build-armhf                                                  pass
 build-i386
Re: [PATCH v13.2 01/14] vpci: use per-domain PCI lock to protect vpci structure
On 2/19/24 08:12, Jan Beulich wrote:
> On 19.02.2024 13:47, Stewart Hildebrand wrote:
>> On 2/19/24 07:10, Jan Beulich wrote:
>>> On 19.02.2024 12:47, Stewart Hildebrand wrote:
>>>> @@ -895,6 +891,15 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>>>>  {
>>>>      unsigned int i;
>>>>
>>>> +    /*
>>>> +     * Assert that d->pdev_list doesn't change. ASSERT_PDEV_LIST_IS_READ_LOCKED
>>>> +     * is not suitable here because it may allow either pcidevs_lock() or
>>>> +     * d->pci_lock to be held, but here we rely on d->pci_lock being held, not
>>>> +     * pcidevs_lock().
>>>> +     */
>>>> +    ASSERT(rw_is_locked(&msix->pdev->domain->pci_lock));
>>>> +    ASSERT(spin_is_locked(&msix->pdev->vpci->lock));
>>>
>>> There's no "d" in sight here, so it's a little odd that "d" is being talked
>>> about. But I guess people can infer what's meant without too much trouble.
>>
>> I can s/d->pci_lock/msix->pdev->domain->pci_lock/ for the next rev.
>
> Or simply drop the d-s? That would be better for readability's sake,
> I think.

OK

>>>> @@ -313,17 +316,36 @@ void vpci_dump_msi(void)
>>>>                  {
>>>>                      /*
>>>>                       * On error vpci_msix_arch_print will always return without
>>>> -                     * holding the lock.
>>>> +                     * holding the locks.
>>>>                       */
>>>>                      printk("unable to print all MSI-X entries: %d\n", rc);
>>>> -                    process_pending_softirqs();
>>>> -                    continue;
>>>> +                    goto pdev_done;
>>>>                  }
>>>>              }
>>>>
>>>> +            /*
>>>> +             * Unlock locks to process pending softirqs. This is
>>>> +             * potentially unsafe, as d->pdev_list can be changed in
>>>> +             * meantime.
>>>> +             */
>>>>              spin_unlock(&pdev->vpci->lock);
>>>> +            read_unlock(&d->pci_lock);
>>>> + pdev_done:
>>>>              process_pending_softirqs();
>>>> +            if ( !read_trylock(&d->pci_lock) )
>>>> +            {
>>>> +                printk("unable to access other devices for the domain\n");
>>>> +                goto domain_done;
>>>> +            }
>>>>          }
>>>> +        read_unlock(&d->pci_lock);
>>>> + domain_done:
>>>> +        /*
>>>> +         * We need this label at the end of the loop, but some
>>>> +         * compilers might not be happy about label at the end of the
>>>> +         * compound statement so we adding an empty statement here.
>>>> +         */
>>>> +        ;
>>>
>>> As to "some compilers": Are there any which accept a label not followed
>>> by a statement? Depending on the answer, this comment may be viewed as
>>> superfluous. Or else I'd ask about wording: Besides a grammar issue I
>>> also don't view it as appropriate that a comment talks about "adding"
>>> something when its adjacent code that is meant. That something is there
>>> when the comment is there, hence respective wording should imo be used.
>>
>> It seems like hit or miss whether gcc would accept it or not (prior
>> discussion at [1]). I agree the comment is rather lengthy for what it's
>> trying to convey. I'd be happy to either remove the comment or reduce
>> it to:
>>
>> domain_done:
>>     ; /* Empty statement to make some compilers happy */
>>
>> [1] https://lore.kernel.org/xen-devel/98b8c131-b0b9-f46c-5f46-c2136f2e3...@amd.com/
>
> This earlier discussion only proves that there is at least one compiler
> objecting. There's no proof there that any compiler exists which, as a
> language extension, actually permits such syntax. Yet if the comment
> was purely about normal language syntax, then imo it should be zapped
> altogether, not just be shrunk.

I'll zap it
Re: [PATCH v4 12/30] xen/riscv: introduce cmpxchg.h
On 19.02.2024 15:00, Oleksii wrote: > On Sun, 2024-02-18 at 19:00 +, Julien Grall wrote: >> On 05/02/2024 15:32, Oleksii Kurochko wrote: >>> --- /dev/null >>> +++ b/xen/arch/riscv/include/asm/cmpxchg.h >>> @@ -0,0 +1,237 @@ >>> +/* SPDX-License-Identifier: GPL-2.0-only */ >>> +/* Copyright (C) 2014 Regents of the University of California */ >>> + >>> +#ifndef _ASM_RISCV_CMPXCHG_H >>> +#define _ASM_RISCV_CMPXCHG_H >>> + >>> +#include >>> +#include >>> + >>> +#include >>> +#include >>> +#include >>> + >>> +#define ALIGN_DOWN(addr, size) ((addr) & (~((size) - 1))) >>> + >>> +#define __amoswap_generic(ptr, new, ret, sfx, release_barrier, >>> acquire_barrier) \ >>> +({ \ >>> + asm volatile( \ >>> + release_barrier \ >>> + " amoswap" sfx " %0, %2, %1\n" \ >>> + acquire_barrier \ >>> + : "=r" (ret), "+A" (*ptr) \ >>> + : "r" (new) \ >>> + : "memory" ); \ >>> +}) >>> + >>> +#define emulate_xchg_1_2(ptr, new, ret, release_barrier, >>> acquire_barrier) \ >>> +({ \ >>> + uint32_t *ptr_32b_aligned = (uint32_t *)ALIGN_DOWN((unsigned >>> long)ptr, 4); \ >>> + uint8_t mask_l = ((unsigned long)(ptr) & (0x8 - sizeof(*ptr))) >>> * BITS_PER_BYTE; \ >>> + uint8_t mask_size = sizeof(*ptr) * BITS_PER_BYTE; \ >>> + uint8_t mask_h = mask_l + mask_size - 1; \ >>> + unsigned long mask = GENMASK(mask_h, mask_l); \ >>> + unsigned long new_ = (unsigned long)(new) << mask_l; \ >>> + unsigned long ret_; \ >>> + unsigned long rc; \ >>> + \ >>> + asm volatile( \ >>> + release_barrier \ >>> + "0: lr.d %0, %2\n" \ >> >> I was going to ask why this is lr.d rather than lr.w. But I see Jan >> already asked. I agree with him that it should probably be a lr.w and >> ... >> >>> + " and %1, %0, %z4\n" \ >>> + " or %1, %1, %z3\n" \ >>> + " sc.d %1, %1, %2\n" \ >> >> ... respectively sc.w. The same applies for cmpxchg. 
> I agree that it would be better, and my initial attempt was to handle
> 4-byte or 8-byte boundary crossing during 2-byte access:
>
>   0 1 2 3 4 5 6 7 8
>   X X X 1 1 X X X X
>
> In this case, if I align address 3 to address 0 and then read 4 bytes
> instead of 8 bytes, I will not process the byte at address 4. This was
> the reason why I started to read 8 bytes.
>
> I also acknowledge that there could be an issue in the following case:
>
>   X 4094 4095 4096
>   X  1    1    X
>
> In this situation, when I read 8 bytes, there could be an issue where
> the new page (which starts at 4096) will not be mapped. It seems
> correct in this case to check that variable is within one page and read
> 4 bytes instead of 8.
>
> One more thing I am uncertain about is if we change everything to read
> 4 bytes with 4-byte alignment, what should be done with the first case?
> Should we panic? (I am not sure if this is an option.)

Counter question (raised elsewhere already): What if a 4-byte access
crosses a word / cache line / page boundary? Ideally exactly the same
would happen for a 2-byte access crossing a respective boundary. (Which
you can achieve relatively easily by masking off only address bit 1,
keeping address bit 0 unaltered.)

> Should we perform the operation twice for addresses 0x0 and 0x4?

That wouldn't be atomic then.

Jan
Re: [PATCH 2/6] svm: Improve type of cpu_has_svm_feature
On 06.02.2024 02:20, George Dunlap wrote: > The "effective type" of the cpu_has_svm_feature macro is effectively > an unsigned log with one bit set (or not); at least one place someone > felt compelled to do a !! to make sure that they got a boolean out of > it. > > Ideally the whole of this would be folded into the cpufeature.h > infrastructure. But for now, duplicate the more type-safe static > inlines in that file, and remove the !!. > > Signed-off-by: George Dunlap Acked-by: Jan Beulich albeit preferably with ... > --- a/xen/arch/x86/include/asm/hvm/svm/svm.h > +++ b/xen/arch/x86/include/asm/hvm/svm/svm.h > @@ -38,7 +38,10 @@ extern u32 svm_feature_flags; > #define SVM_FEATURE_SSS 19 /* NPT Supervisor Shadow Stacks */ > #define SVM_FEATURE_SPEC_CTRL 20 /* MSR_SPEC_CTRL virtualisation */ > > -#define cpu_has_svm_feature(f) (svm_feature_flags & (1u << (f))) > +static inline bool cpu_has_svm_feature(unsigned int feat) > +{ > +return svm_feature_flags & (1u << (feat)); ... the inner pair of parentheses dropped. Jan
Re: [PATCH v4 12/30] xen/riscv: introduce cmpxchg.h
On Sun, 2024-02-18 at 19:00 +, Julien Grall wrote: > > > On 05/02/2024 15:32, Oleksii Kurochko wrote: > > The header was taken from Linux kernl 6.4.0-rc1. > > > > Addionally, were updated: > > * add emulation of {cmp}xchg for 1/2 byte types > > This explaination is a little bit light. IIUC, you are implementing > them > using 32-bit atomic access. Is that correct? If so, please spell it > out. Sure, I'll update commit message. > > Also, I wonder whether it would be better to try to get rid of the > 1/2 > bytes access. Do you know where they are used? Right now, the issue is with test_and_clear_bool() which is used in common/sched/core.c:840 [https://gitlab.com/xen-project/xen/-/blob/staging/xen/common/sched/core.c?ref_type=heads#L840 ] I don't remember details, but in xen-devel chat someone told me that grant table requires 1/2 bytes access. > > > * replace tabs with spaces > Does this mean you are not planning to backport any Linux fixes? If it will be any fixes for sure I'll back port them, but it looks like this code is stable enough and not to many fixes will be done there, so it shouldn't be hard to backport them and switch to spaces. > > > * replace __* varialbed with *__ > > s/varialbed/variable/ > > > * introduce generic version of xchg_* and cmpxchg_*. > > > > Signed-off-by: Oleksii Kurochko > > --- > > Changes in V4: > > - Code style fixes. > > - enforce in __xchg_*() has the same type for new and *ptr, also > > "\n" > > was removed at the end of asm instruction. > > - dependency from > > https://lore.kernel.org/xen-devel/cover.1706259490.git.federico.seraf...@bugseng.com/ > > - switch from ASSERT_UNREACHABLE to STATIC_ASSERT_UNREACHABLE(). > > - drop xchg32(ptr, x) and xchg64(ptr, x) as they aren't used. > > - drop cmpxcg{32,64}_{local} as they aren't used. > > - introduce generic version of xchg_* and cmpxchg_*. > > - update the commit message. > > --- > > Changes in V3: > > - update the commit message > > - add emulation of {cmp}xchg_... 
for 1 and 2 bytes types > > --- > > Changes in V2: > > - update the comment at the top of the header. > > - change xen/lib.h to xen/bug.h. > > - sort inclusion of headers properly. > > --- > > xen/arch/riscv/include/asm/cmpxchg.h | 237 > > +++ > > 1 file changed, 237 insertions(+) > > create mode 100644 xen/arch/riscv/include/asm/cmpxchg.h > > > > diff --git a/xen/arch/riscv/include/asm/cmpxchg.h > > b/xen/arch/riscv/include/asm/cmpxchg.h > > new file mode 100644 > > index 00..b751a50cbf > > --- /dev/null > > +++ b/xen/arch/riscv/include/asm/cmpxchg.h > > @@ -0,0 +1,237 @@ > > +/* SPDX-License-Identifier: GPL-2.0-only */ > > +/* Copyright (C) 2014 Regents of the University of California */ > > + > > +#ifndef _ASM_RISCV_CMPXCHG_H > > +#define _ASM_RISCV_CMPXCHG_H > > + > > +#include > > +#include > > + > > +#include > > +#include > > +#include > > + > > +#define ALIGN_DOWN(addr, size) ((addr) & (~((size) - 1))) > > + > > +#define __amoswap_generic(ptr, new, ret, sfx, release_barrier, > > acquire_barrier) \ > > +({ \ > > + asm volatile( \ > > + release_barrier \ > > + " amoswap" sfx " %0, %2, %1\n" \ > > + acquire_barrier \ > > + : "=r" (ret), "+A" (*ptr) \ > > + : "r" (new) \ > > + : "memory" ); \ > > +}) > > + > > +#define emulate_xchg_1_2(ptr, new, ret, release_barrier, > > acquire_barrier) \ > > +({ \ > > + uint32_t *ptr_32b_aligned = (uint32_t *)ALIGN_DOWN((unsigned > > long)ptr, 4); \ > > + uint8_t mask_l = ((unsigned long)(ptr) & (0x8 - sizeof(*ptr))) > > * BITS_PER_BYTE; \ > > + uint8_t mask_size = sizeof(*ptr) * BITS_PER_BYTE; \ > > + uint8_t mask_h = mask_l + mask_size - 1; \ > > + unsigned long mask = GENMASK(mask_h, mask_l); \ > > + unsigned long new_ = (unsigned long)(new) << mask_l; \ > > + unsigned long ret_; \ > > + unsigned long rc; \ > > + \ > > + asm volatile( \ > > + release_barrier \ > > + "0: lr.d %0, %2\n" \ > > I was going to ask why this is lr.d rather than lr.w. But I see Jan > already asked. 
I agree with him that it should probably be a lr.w and > ... > > > + " and %1, %0, %z4\n" \ > > + " or %1, %1, %z3\n" \ > > + " sc.d %1, %1, %2\n" \ > > ... respectively sc.w. The same applies for cmpxchg. I agree that it would be better, and my initial attempt was to handle 4-byte or 8-byte boundary crossing during 2-byte access: 0 1 2 3 4 5 6 7 8 X X X 1 1 X X X X In this case, if I align address 3 to address 0 and then read 4 bytes instead of 8 bytes, I will not process the byte at address 4. This was the reason why I started to read 8 bytes. I also acknowledge that there could be an issue in the following case: X 4094 4095 4096 X1 1X In this situation, when I read 8 bytes, there could be an issue where the new page (which starts at 4096) will not be mapped. It seems correct in this case to check that variable is
[PATCH] tools/9pfsd: add missing va_end() in fill_data()
In xen-9pfsd fill_data() va_end() needs to be called before returning.

Coverity Id CID 1592145
Fixes: bcec59cf7ff4 ("tools/xen-9pfsd: add 9pfs version request support")
Signed-off-by: Juergen Gross
---
 tools/9pfsd/io.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/tools/9pfsd/io.c b/tools/9pfsd/io.c
index ebc4102713..adb887c7d9 100644
--- a/tools/9pfsd/io.c
+++ b/tools/9pfsd/io.c
@@ -445,7 +445,7 @@ static int fill_data(struct ring *ring, const char *fmt, ...)
             if ( !*f || array_sz )
                 fmt_err(fmt);
             if ( !chk_data(ring, data, sizeof(uint16_t)) )
-                return pars;
+                goto out;
             array_sz = get_unaligned((uint16_t *)data);
             data += sizeof(uint16_t);
             *(unsigned int *)par = array_sz;
@@ -455,10 +455,10 @@ static int fill_data(struct ring *ring, const char *fmt, ...)

         case 'b':
             if ( !chk_data(ring, data, sizeof(uint8_t)) )
-                return pars;
+                goto out;
             if ( !fill_data_elem(&par, array, &array_sz, sizeof(uint8_t), data) )
-                return pars;
+                goto out;
             data += sizeof(uint8_t);
             break;
@@ -466,48 +466,48 @@ static int fill_data(struct ring *ring, const char *fmt, ...)
             if ( array_sz )
                 fmt_err(fmt);
             if ( !chk_data(ring, data, sizeof(uint32_t)) )
-                return pars;
+                goto out;
             len = get_unaligned((uint32_t *)data);
             data += sizeof(uint32_t);
             *(unsigned int *)par = len;
             par = va_arg(ap, void *);
             if ( !chk_data(ring, data, len) )
-                return pars;
+                goto out;
             memcpy(par, data, len);
             data += len;
             break;

         case 'L':
             if ( !chk_data(ring, data, sizeof(uint64_t)) )
-                return pars;
+                goto out;
             if ( !fill_data_elem(&par, array, &array_sz, sizeof(uint64_t), data) )
-                return pars;
+                goto out;
             data += sizeof(uint64_t);
             break;

         case 'S':
             if ( !chk_data(ring, data, sizeof(uint16_t)) )
-                return pars;
+                goto out;
             len = get_unaligned((uint16_t *)data);
             data += sizeof(uint16_t);
             if ( !chk_data(ring, data, len) )
-                return pars;
+                goto out;
             str_off = add_string(ring, data, len);
             if ( str_off == ~0 )
-                return pars;
+                goto out;
             if ( !fill_data_elem(&par, array, &array_sz, sizeof(unsigned int), &str_off) )
-                return pars;
+                goto out;
             data += len;
             break;

         case 'U':
             if ( !chk_data(ring, data, sizeof(uint32_t)) )
-                return pars;
+                goto out;
             if ( !fill_data_elem(&par, array, &array_sz, sizeof(uint32_t), data) )
-                return pars;
+                goto out;
             data += sizeof(uint32_t);
             break;
@@ -520,6 +520,9 @@ static int fill_data(struct ring *ring, const char *fmt, ...)
         pars++;
     }

+ out:
+    va_end(ap);
+
     return pars;
 }
-- 
2.35.3
Re: [XEN PATCH] automation/eclair: add deviation for MISRA C:2012 Rule 16.3
On 19.02.2024 14:24, Federico Serafini wrote: > Update ECLAIR configuration to consider safe switch clauses ending > with __{get,put}_user_bad(). > > Update docs/misra/deviations.rst accordingly. > > Signed-off-by: Federico Serafini As mentioned I'm not happy with this, not the least because of it being unclear why these two would be deviated, when there's no sign of a similar deviation for, say, __bad_atomic_size(). Imo this approach doesn't scale, and that's already leaving aside that the purpose of identically named (pseudo-)helpers could differ between architectures, thus putting under question ... > --- a/automation/eclair_analysis/ECLAIR/deviations.ecl > +++ b/automation/eclair_analysis/ECLAIR/deviations.ecl > @@ -368,6 +368,10 @@ safe." > -config=MC3R1.R16.3,reports+={safe, > "any_area(end_loc(any_exp(text(/BUG\\(\\);/"} > -doc_end > > +-doc_begin="Switch clauses ending with constructs \"__get_user_bad()\" and > \"__put_user_bad()\" are safe: they denote an unreachable program point." > +-config=MC3R1.R16.3,reports+={safe, > "any_area(end_loc(any_exp(text(/__(put|get)_user_bad\\(\\);/"} > +-doc_end ... the global scope of such a deviation. While it may not be a good idea, even within an arch such (pseudo-)helpers could be used for multiple distinct purposes. Jan
Re: [PATCH 1/6] xen/hvm: Convert hap_capabilities into a bitfield
On 06.02.2024 02:20, George Dunlap wrote: > hvm_function_table is an internal structure; rather than manually > |-ing and &-ing bits, just make it a boolean bitfield and let the > compiler do all the work. This makes everything easier to read, and > presumably allows the compiler more flexibility in producing efficient > code. > > No functional change intended. > > Signed-off-by: George Dunlap Reviewed-by: Jan Beulich > --- > Questions: > > * Should hap_superpage_2m really be set unconditionally, or should we > condition it on cpu_has_svm_npt? That's HAP capabilities; there's not going to be any use of HAP when there's no NPT (on an AMD system). IOW - all is fine as is, imo. > * Do we really need to "!!cpu_has_svm_npt"? If so, wouldn't it be > better to put the "!!" in the #define, rather than requiring the > user to know that it's needed? Considering that hap_supported is bool now, the !! can simply be dropped. We've been doing so as code was touched anyway, not in a concerted effort. > --- a/xen/arch/x86/hvm/vmx/vmcs.c > +++ b/xen/arch/x86/hvm/vmx/vmcs.c > @@ -113,8 +113,8 @@ static int cf_check parse_ept_param_runtime(const char *s) > int val; > > if ( !cpu_has_vmx_ept || !hvm_funcs.hap_supported || > - !(hvm_funcs.hap_capabilities & > - (HVM_HAP_SUPERPAGE_2MB | HVM_HAP_SUPERPAGE_1GB)) ) > + !(hvm_funcs.caps.hap_superpage_2mb || > + hvm_funcs.caps.hap_superpage_1gb) ) > { > printk("VMX: EPT not available, or not in use - ignoring\n"); Just to mention it: The conditional and the log message don't really fit together. (I was first wondering what the 2mb/1gb checks had to do here at all, but that's immediately clear when seeing that the only sub-option here is "exec-sp".) > @@ -104,8 +96,11 @@ struct hvm_function_table { > /* Hardware virtual interrupt delivery enable? */ > bool virtual_intr_delivery_enabled; > > -/* Indicate HAP capabilities. */ > -unsigned int hap_capabilities; > +struct { > +/* Indicate HAP capabilities. 
*/ > +bool hap_superpage_1gb:1, > +hap_superpage_2mb:1; Nit: Would be nice imo if the two identifiers aligned vertically with one another. Jan
[XEN PATCH] automation/eclair: add deviation for MISRA C:2012 Rule 16.3
Update ECLAIR configuration to consider safe switch clauses ending
with __{get,put}_user_bad().

Update docs/misra/deviations.rst accordingly.

Signed-off-by: Federico Serafini
---
 automation/eclair_analysis/ECLAIR/deviations.ecl | 4 ++++
 docs/misra/deviations.rst                        | 6 ++++++
 2 files changed, 10 insertions(+)

diff --git a/automation/eclair_analysis/ECLAIR/deviations.ecl b/automation/eclair_analysis/ECLAIR/deviations.ecl
index fd32ff8a9c..831b5d4c67 100644
--- a/automation/eclair_analysis/ECLAIR/deviations.ecl
+++ b/automation/eclair_analysis/ECLAIR/deviations.ecl
@@ -368,6 +368,10 @@ safe."
 -config=MC3R1.R16.3,reports+={safe, "any_area(end_loc(any_exp(text(/BUG\\(\\);/))))"}
 -doc_end

+-doc_begin="Switch clauses ending with constructs \"__get_user_bad()\" and \"__put_user_bad()\" are safe: they denote an unreachable program point."
+-config=MC3R1.R16.3,reports+={safe, "any_area(end_loc(any_exp(text(/__(put|get)_user_bad\\(\\);/))))"}
+-doc_end
+
 -doc_begin="Switch clauses not ending with the break statement are safe if an
 explicit comment indicating the fallthrough intention is present."
 -config=MC3R1.R16.3,reports+={safe, "any_area(end_loc(any_exp(text(^(?s).*/\\* [fF]all ?through.? \\*/.*$,0..1))))"}
 -doc_end

diff --git a/docs/misra/deviations.rst b/docs/misra/deviations.rst
index 123c78e20a..58f4fac18e 100644
--- a/docs/misra/deviations.rst
+++ b/docs/misra/deviations.rst
@@ -307,6 +307,12 @@ Deviations related to MISRA C:2012 Rules:
      - Switch clauses ending with failure method \"BUG()\" are safe.
      - Tagged as `safe` for ECLAIR.

+   * - R16.3
+     - Switch clauses ending with constructs
+       \"__get_user_bad()\" and \"__put_user_bad()\" are safe:
+       they denote an unreachable program point.
+     - Tagged as `safe` for ECLAIR.
+
    * - R16.3
      - Existing switch clauses not ending with the break statement are
        safe if an explicit comment indicating the fallthrough intention
        is present.
-- 
2.34.1
Re: [PATCH v13.2 01/14] vpci: use per-domain PCI lock to protect vpci structure
On 19.02.2024 13:47, Stewart Hildebrand wrote:
> On 2/19/24 07:10, Jan Beulich wrote:
>> On 19.02.2024 12:47, Stewart Hildebrand wrote:
>>> @@ -895,6 +891,15 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>>>  {
>>>      unsigned int i;
>>>
>>> +    /*
>>> +     * Assert that d->pdev_list doesn't change. ASSERT_PDEV_LIST_IS_READ_LOCKED
>>> +     * is not suitable here because it may allow either pcidevs_lock() or
>>> +     * d->pci_lock to be held, but here we rely on d->pci_lock being held, not
>>> +     * pcidevs_lock().
>>> +     */
>>> +    ASSERT(rw_is_locked(&msix->pdev->domain->pci_lock));
>>> +    ASSERT(spin_is_locked(&msix->pdev->vpci->lock));
>>
>> There's no "d" in sight here, so it's a little odd that "d" is being talked
>> about. But I guess people can infer what's meant without too much trouble.
>
> I can s/d->pci_lock/msix->pdev->domain->pci_lock/ for the next rev.

Or simply drop the d-s? That would be better for readability's sake,
I think.

>>> @@ -313,17 +316,36 @@ void vpci_dump_msi(void)
>>>                  {
>>>                      /*
>>>                       * On error vpci_msix_arch_print will always return without
>>> -                     * holding the lock.
>>> +                     * holding the locks.
>>>                       */
>>>                      printk("unable to print all MSI-X entries: %d\n", rc);
>>> -                    process_pending_softirqs();
>>> -                    continue;
>>> +                    goto pdev_done;
>>>                  }
>>>              }
>>>
>>> +            /*
>>> +             * Unlock locks to process pending softirqs. This is
>>> +             * potentially unsafe, as d->pdev_list can be changed in
>>> +             * meantime.
>>> +             */
>>>              spin_unlock(&pdev->vpci->lock);
>>> +            read_unlock(&d->pci_lock);
>>> + pdev_done:
>>>              process_pending_softirqs();
>>> +            if ( !read_trylock(&d->pci_lock) )
>>> +            {
>>> +                printk("unable to access other devices for the domain\n");
>>> +                goto domain_done;
>>> +            }
>>>          }
>>> +        read_unlock(&d->pci_lock);
>>> + domain_done:
>>> +        /*
>>> +         * We need this label at the end of the loop, but some
>>> +         * compilers might not be happy about label at the end of the
>>> +         * compound statement so we adding an empty statement here.
>>> +         */
>>> +        ;
>>
>> As to "some compilers": Are there any which accept a label not followed
>> by a statement? Depending on the answer, this comment may be viewed as
>> superfluous. Or else I'd ask about wording: Besides a grammar issue I
>> also don't view it as appropriate that a comment talks about "adding"
>> something when its adjacent code that is meant. That something is there
>> when the comment is there, hence respective wording should imo be used.
>
> It seems like hit or miss whether gcc would accept it or not (prior
> discussion at [1]). I agree the comment is rather lengthy for what it's
> trying to convey. I'd be happy to either remove the comment or reduce
> it to:
>
> domain_done:
>     ; /* Empty statement to make some compilers happy */
>
> [1] https://lore.kernel.org/xen-devel/98b8c131-b0b9-f46c-5f46-c2136f2e3...@amd.com/

This earlier discussion only proves that there is at least one compiler
objecting. There's no proof there that any compiler exists which, as a
language extension, actually permits such syntax. Yet if the comment
was purely about normal language syntax, then imo it should be zapped
altogether, not just be shrunk.

Jan
Re: [RESEND RFC] kernel/ksysfs.c: restrict /sys/kernel/notes to root access
On 2024/2/18 17:04, Kees Cook wrote:

On Sun, Feb 18, 2024 at 08:47:03AM +0100, Greg KH wrote:

On Sun, Feb 18, 2024 at 03:35:01PM +0800, Guixiong Wei wrote:
From: Guixiong Wei

Restrict non-privileged user access to /sys/kernel/notes to avoid
security attack. The non-privileged users have read access to notes.
The notes expose the load address of startup_xen. This address could be
used to bypass KASLR.

How can it be used to bypass it? KASLR is, for local users, pretty much
not an issue, as that's not what it protects from, only remote ones.

For example, the startup_xen is built at 0x82465180 and commit_creds is
built at 0x810ad570, which could be read from /boot/System.map. And the
loaded address of startup_xen is 0xbc265180, which is read from
/sys/kernel/notes. So the loaded address of commit_creds is
0xbc265180 - (0x82465180 - 0x810ad570) = 0xbaead570.

I've cc'd the hardening list on this, I'm sure the developers there have
opinions about this.

Oh eww, why is Xen spewing addresses into the notes section? (This must
be how it finds its entry point? But that would be before relocations
happen...)

But yes, I can confirm that relocations are done against the .notes
section at boot, so the addresses exposed in .notes are an immediate
KASLR offset exposure. In /sys/kernel/notes (are there any tools to
read this? I wrote my own...)

type: 1 name: Xen desc: 0xb4a711c0 0x

which matches a privileged read of /proc/kallsyms:

b4a711c0 T startup_xen

(and the hypercall_page too)

These are all coming from arch/x86/xen/xen-head.S:

	ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS,      .asciz "linux")
	ELFNOTE(Xen, XEN_ELFNOTE_GUEST_VERSION, .asciz "2.6")
	ELFNOTE(Xen, XEN_ELFNOTE_XEN_VERSION,   .asciz "xen-3.0")
#ifdef CONFIG_XEN_PV
	ELFNOTE(Xen, XEN_ELFNOTE_VIRT_BASE,     _ASM_PTR __START_KERNEL_map)
	/* Map the p2m table to a 512GB-aligned user address. */
	ELFNOTE(Xen, XEN_ELFNOTE_INIT_P2M,      .quad (PUD_SIZE * PTRS_PER_PUD))
	ELFNOTE(Xen, XEN_ELFNOTE_ENTRY,         _ASM_PTR startup_xen)
	...

Introduced in commit 5ead97c84fa7 ("xen: Core Xen implementation")
Exposed in commit da1a679cde9b ("Add /sys/kernel/notes")

Amazingly these both went in on the same release (v2.6.23, 2007). This
has been exposed for longer than KASLR has been upstream. :P

Signed-off-by: Guixiong Wei
---
 kernel/ksysfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
index b1292a57c2a5..09bc0730239b 100644
--- a/kernel/ksysfs.c
+++ b/kernel/ksysfs.c
@@ -199,7 +199,7 @@ static ssize_t notes_read(struct file *filp, struct kobject *kobj,
 static struct bin_attribute notes_attr __ro_after_init = {
 	.attr = {
 		.name = "notes",
-		.mode = S_IRUGO,
+		.mode = S_IRUSR,
 	},
 	.read = &notes_read,
 };

Yes please. Reviewed-by: Kees Cook

I wonder if we should also remove relocations that are aimed at the
.notes section for good measure? If that had already been true, this
would have just given the same info as System.map.

That's a good idea, but it depends on whether the user space tools can
accept the removed relocation addresses.

No objection from me, but what userspace tool requires access to this
file today? Will it break if permissions are changed on it?

From the exposed content, it seems that the main users are Xen-related
tools. I've added the Xen list, developers should be able to provide
some information.

And what about the module notes files? If you change one, shouldn't you
change all?

From what I currently know, the module note files do not expose any
kernel symbol address, so there is no need for modification.

Luckily all of _those_ contain what I'd expect: the Linux and GNU
build-id notes, which are harmless. But if we're going to suddenly have
things appearing in here, let's make those root-only too.

Yes, but I am also not sure whether the user space tools using this
file can accept this permission modification.
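Kees's aside above ("are there any tools to read this? I wrote my own...") is easy to reproduce: /sys/kernel/notes is a plain concatenation of native-endian ELF note records, each a 12-byte namesz/descsz/type header followed by the 4-byte-padded name and descriptor. The sketch below is a hypothetical minimal parser for such a buffer, not Kees's actual tool; the function and callback names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout of one record in /sys/kernel/notes (native-endian ELF note). */
struct elf_note_hdr {
    uint32_t namesz;
    uint32_t descsz;
    uint32_t type;
};

#define NOTE_ALIGN(x) (((x) + 3u) & ~3u)

/*
 * Walk a buffer of concatenated ELF notes, calling cb (if non-NULL) for
 * each record.  Returns the number of notes parsed, or -1 if a record
 * header claims more data than the buffer holds.
 */
static int parse_elf_notes(const uint8_t *buf, size_t len,
                           void (*cb)(const char *name, uint32_t type,
                                      const uint8_t *desc, uint32_t descsz))
{
    size_t off = 0;
    int count = 0;

    while (off + sizeof(struct elf_note_hdr) <= len) {
        struct elf_note_hdr hdr;

        memcpy(&hdr, buf + off, sizeof(hdr));
        off += sizeof(hdr);

        size_t name_len = NOTE_ALIGN(hdr.namesz);
        size_t desc_len = NOTE_ALIGN(hdr.descsz);

        if (name_len + desc_len > len - off)
            return -1;   /* truncated/malformed record */

        if (cb)
            cb((const char *)(buf + off), hdr.type,
               buf + off + name_len, hdr.descsz);

        off += name_len + desc_len;
        count++;
    }
    return count;
}
```

Feeding it the bytes of the file would yield each record's name, type, and descriptor; on a Xen PV kernel that includes the `Xen` notes (and raw addresses) discussed above.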
Re: [PATCH v13.2 01/14] vpci: use per-domain PCI lock to protect vpci structure
On 2/19/24 07:10, Jan Beulich wrote:
> On 19.02.2024 12:47, Stewart Hildebrand wrote:
>> @@ -895,6 +891,15 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>>  {
>>      unsigned int i;
>>
>> +    /*
>> +     * Assert that d->pdev_list doesn't change. ASSERT_PDEV_LIST_IS_READ_LOCKED
>> +     * is not suitable here because it may allow either pcidevs_lock() or
>> +     * d->pci_lock to be held, but here we rely on d->pci_lock being held, not
>> +     * pcidevs_lock().
>> +     */
>> +    ASSERT(rw_is_locked(&msix->pdev->domain->pci_lock));
>> +    ASSERT(spin_is_locked(&msix->pdev->vpci->lock));
>
> There's no "d" in sight here, so it's a little odd that "d" is being talked
> about. But I guess people can infer what's meant without too much trouble.

I can s/d->pci_lock/msix->pdev->domain->pci_lock/ for the next rev.

>> @@ -313,17 +316,36 @@ void vpci_dump_msi(void)
>>              {
>>                  /*
>>                   * On error vpci_msix_arch_print will always return without
>> -                 * holding the lock.
>> +                 * holding the locks.
>>                   */
>>                  printk("unable to print all MSI-X entries: %d\n", rc);
>> -                process_pending_softirqs();
>> -                continue;
>> +                goto pdev_done;
>>              }
>>          }
>>
>> +        /*
>> +         * Unlock locks to process pending softirqs. This is
>> +         * potentially unsafe, as d->pdev_list can be changed in
>> +         * meantime.
>> +         */
>>          spin_unlock(&pdev->vpci->lock);
>> +        read_unlock(&d->pci_lock);
>> + pdev_done:
>>          process_pending_softirqs();
>> +        if ( !read_trylock(&d->pci_lock) )
>> +        {
>> +            printk("unable to access other devices for the domain\n");
>> +            goto domain_done;
>> +        }
>>      }
>> +    read_unlock(&d->pci_lock);
>> + domain_done:
>> +    /*
>> +     * We need this label at the end of the loop, but some
>> +     * compilers might not be happy about label at the end of the
>> +     * compound statement so we adding an empty statement here.
>> +     */
>> +    ;
>
> As to "some compilers": Are there any which accept a label not followed
> by a statement? Depending on the answer, this comment may be viewed as
> superfluous. Or else I'd ask about wording: Besides a grammar issue I
> also don't view it as appropriate that a comment talks about "adding"
> something when its adjacent code that is meant. That something is there
> when the comment is there, hence respective wording should imo be used.

It seems like hit or miss whether gcc would accept it or not (prior
discussion at [1]). I agree the comment is rather lengthy for what it's
trying to convey. I'd be happy to either remove the comment or reduce
it to:

domain_done:
    ; /* Empty statement to make some compilers happy */

[1] https://lore.kernel.org/xen-devel/98b8c131-b0b9-f46c-5f46-c2136f2e3...@amd.com/

>> --- a/xen/include/xen/pci.h
>> +++ b/xen/include/xen/pci.h
>> @@ -171,6 +171,19 @@ void pcidevs_lock(void);
>>  void pcidevs_unlock(void);
>>  bool __must_check pcidevs_locked(void);
>>
>> +#ifndef NDEBUG
>> +/*
>> + * Check to ensure there will be no changes to the entries in d->pdev_list (but
>> + * not the contents of each entry).
>> + * This check is not suitable for protecting other state or critical regions.
>> + */
>> +#define ASSERT_PDEV_LIST_IS_READ_LOCKED(d) \
>> +    /* NB: d may be evaluated multiple times, or not at all */ \
>> +    ASSERT(pcidevs_locked() || ((d) && rw_is_locked(&(d)->pci_lock)))
>
> Is there actually any case where d can be NULL here?

Yes, when called from ns16550 driver, if the driver failed to make the
device RO.

>> +#else
>> +#define ASSERT_PDEV_LIST_IS_READ_LOCKED(d) ({ (void)(d); })
>
> Evaluating d here isn't very useful when the assertion expression doesn't
> guarantee single evaluation. Plus even if it needed evaluating, there would
> be no need to use a compiler extension here:
>
> #define ASSERT_PDEV_LIST_IS_READ_LOCKED(d) ((void)(d))

OK, I can make this change.
Re: [PATCH v4 07/30] xen/asm-generic: introdure nospec.h
On 19.02.2024 12:59, Oleksii wrote: > Hi Julien, > > On Sun, 2024-02-18 at 18:30 +, Julien Grall wrote: >> Hi Oleksii, >> >> Title: Typo s/introdure/introduce/ >> >> On 05/02/2024 15:32, Oleksii Kurochko wrote: >>> The header is similar between Arm, PPC, and RISC-V, >>> so it has been moved to asm-generic. >> >> I am not 100% convinced that moving this header to asm-generic is a >> good >> idea. At least for Arm, those helpers ought to be non-empty, what >> about >> RISC-V? > For Arm, they are not taking any action, are they? There are no > specific fences or other mechanisms inside > evaluate_nospec()/block_speculation() to address speculation. The question isn't the status quo, but how things should be looking like if everything was in place that's (in principle) needed. > For RISC-V, it can be implemented in a similar manner, at least for > now. Since these functions are only used in the grant tables code ( for > Arm and so for RISC-V ), which is not supported by RISC-V. Same here - the question is whether long term, when gnttab is also supported, RISC-V would get away without doing anything. Still ... >> If the answer is they should be non-empty. Then I would consider to >> keep >> the duplication to make clear that each architecture should take >> their >> own decision in term of security. >> >> The alternative, is to have a generic implementation that is safe by >> default (if that's even possible). > I am not certain that we can have a generic implementation, as each > architecture may have specific speculation issues. ... it's theoretically possible that there'd be an arch with no speculation issues, maybe simply because of not speculating. Jan
Re: [PATCH v13.2 01/14] vpci: use per-domain PCI lock to protect vpci structure
On 19.02.2024 12:47, Stewart Hildebrand wrote:
> @@ -895,6 +891,15 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>  {
>      unsigned int i;
>
> +    /*
> +     * Assert that d->pdev_list doesn't change. ASSERT_PDEV_LIST_IS_READ_LOCKED
> +     * is not suitable here because it may allow either pcidevs_lock() or
> +     * d->pci_lock to be held, but here we rely on d->pci_lock being held, not
> +     * pcidevs_lock().
> +     */
> +    ASSERT(rw_is_locked(&msix->pdev->domain->pci_lock));
> +    ASSERT(spin_is_locked(&msix->pdev->vpci->lock));

There's no "d" in sight here, so it's a little odd that "d" is being
talked about. But I guess people can infer what's meant without too
much trouble.

> @@ -313,17 +316,36 @@ void vpci_dump_msi(void)
>              {
>                  /*
>                   * On error vpci_msix_arch_print will always return without
> -                 * holding the lock.
> +                 * holding the locks.
>                   */
>                  printk("unable to print all MSI-X entries: %d\n", rc);
> -                process_pending_softirqs();
> -                continue;
> +                goto pdev_done;
>              }
>          }
>
> +        /*
> +         * Unlock locks to process pending softirqs. This is
> +         * potentially unsafe, as d->pdev_list can be changed in
> +         * meantime.
> +         */
>          spin_unlock(&pdev->vpci->lock);
> +        read_unlock(&d->pci_lock);
> + pdev_done:
>          process_pending_softirqs();
> +        if ( !read_trylock(&d->pci_lock) )
> +        {
> +            printk("unable to access other devices for the domain\n");
> +            goto domain_done;
> +        }
>      }
> +    read_unlock(&d->pci_lock);
> + domain_done:
> +    /*
> +     * We need this label at the end of the loop, but some
> +     * compilers might not be happy about label at the end of the
> +     * compound statement so we adding an empty statement here.
> +     */
> +    ;

As to "some compilers": Are there any which accept a label not followed
by a statement? Depending on the answer, this comment may be viewed as
superfluous. Or else I'd ask about wording: Besides a grammar issue I
also don't view it as appropriate that a comment talks about "adding"
something when its adjacent code that is meant. That something is there
when the comment is there, hence respective wording should imo be used.

> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -171,6 +171,19 @@ void pcidevs_lock(void);
>  void pcidevs_unlock(void);
>  bool __must_check pcidevs_locked(void);
>
> +#ifndef NDEBUG
> +/*
> + * Check to ensure there will be no changes to the entries in d->pdev_list (but
> + * not the contents of each entry).
> + * This check is not suitable for protecting other state or critical regions.
> + */
> +#define ASSERT_PDEV_LIST_IS_READ_LOCKED(d) \
> +    /* NB: d may be evaluated multiple times, or not at all */ \
> +    ASSERT(pcidevs_locked() || ((d) && rw_is_locked(&(d)->pci_lock)))

Is there actually any case where d can be NULL here?

> +#else
> +#define ASSERT_PDEV_LIST_IS_READ_LOCKED(d) ({ (void)(d); })

Evaluating d here isn't very useful when the assertion expression doesn't
guarantee single evaluation. Plus even if it needed evaluating, there
would be no need to use a compiler extension here:

#define ASSERT_PDEV_LIST_IS_READ_LOCKED(d) ((void)(d))

Jan
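For reference, the language rule at issue: in ISO C (before C23) a label must be attached to a statement, so a label immediately before a closing brace needs an empty statement after it. A contrived stand-alone sketch of the same pattern (not the Xen code; names are invented):

```c
#include <assert.h>

/*
 * Return 1 if stop occurs in vals[0..n-1].  The "next" label sits at
 * the end of the loop body, mirroring the pdev_done/domain_done labels
 * in vpci_dump_msi(): without the bare ";" the label would directly
 * precede "}", which ISO C (prior to C23) rejects.
 */
static int find_stop(const int *vals, int n, int stop)
{
    int found = 0;

    for ( int i = 0; i < n; i++ )
    {
        if ( vals[i] != stop )
            goto next;
        found = 1;
        break;
    next:
        ;   /* empty statement: a label cannot end a compound statement */
    }

    return found;
}
```

C23 relaxes this (labels may appear before declarations and at the end of a block), which is why newer compilers are "hit or miss" about accepting the label without the semicolon.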
Re: [PATCH v4 07/30] xen/asm-generic: introdure nospec.h
Hi Julien, On Sun, 2024-02-18 at 18:30 +, Julien Grall wrote: > Hi Oleksii, > > Title: Typo s/introdure/introduce/ > > On 05/02/2024 15:32, Oleksii Kurochko wrote: > > The header is similar between Arm, PPC, and RISC-V, > > so it has been moved to asm-generic. > > I am not 100% convinced that moving this header to asm-generic is a > good > idea. At least for Arm, those helpers ought to be non-empty, what > about > RISC-V? For Arm, they are not taking any action, are they? There are no specific fences or other mechanisms inside evaluate_nospec()/block_speculation() to address speculation. For RISC-V, it can be implemented in a similar manner, at least for now. Since these functions are only used in the grant tables code ( for Arm and so for RISC-V ), which is not supported by RISC-V. > > If the answer is they should be non-empty. Then I would consider to > keep > the duplication to make clear that each architecture should take > their > own decision in term of security. > > The alternative, is to have a generic implementation that is safe by > default (if that's even possible). I am not certain that we can have a generic implementation, as each architecture may have specific speculation issues. ~ Oleksii
Re: QEMU features useful for Xen development?
On Thu, 31 Aug 2023 at 11:32, Ayan Kumar Halder wrote: > On 31/08/2023 11:03, Peter Maydell wrote: > > On Thu, 31 Aug 2023 at 10:53, Alex Bennée wrote: > >> Peter Maydell writes: > >>> On Thu, 31 Aug 2023 at 01:57, Stefano Stabellini > >>> wrote: > As Xen is gaining R52 and R82 support, it would be great to be able to > use QEMU for development and testing there as well, but I don't think > QEMU can emulate EL2 properly for the Cortex-R architecture. We would > need EL2 support in the GIC/timer for R52/R82 as well. > >>> (What sort of board model would Xen want to use it with?) > >> We already model a bunch of the mps2/mps3 images so I'm assuming adding > >> the mps3-an536 would be a fairly simple step to do (mps2tz.c is mostly > >> tweaking config values). The question is would it be a useful target for > >> Xen? > Yes, it will be helpful if Qemu can model this board. We have a > downstream port of Xen on R52 (upstreaming is in progress). > > So, we can test the Qemu model with Xen. Hi, all. I just wanted to provide an update on this. We've now completed the mps3-an536 board model, and you can find it if you check out the head-of-git QEMU. The new board will be in the 9.0 QEMU release, but if you have a chance to give it a spin now we'll be able to fix any bugs or problems with it before the release. The documentation for the board is here: https://www.qemu.org/docs/master/system/arm/mps2.html and it lists the limitations/missing features. (If any of those are important let me know and we can look at scheduling the work to fill them in.) I'd also like to draw your attention to the note about the UART ordering on the AN536; unfortunately the hardware setup is a bit awkward here, so if you have an "I don't see any output" problem make sure your guest is sending to the same UART you're asking QEMU to show you the output from :-) thanks -- PMM
[PATCH v13.2 01/14] vpci: use per-domain PCI lock to protect vpci structure
From: Oleksandr Andrushchenko Use the per-domain PCI read/write lock to protect the presence of the pci device vpci field. This lock can be used (and in a few cases is used right away) so that vpci removal can be performed while holding the lock in write mode. Previously such removal could race with vpci_read for example. When taking both d->pci_lock and pdev->vpci->lock, they should be taken in this exact order: d->pci_lock then pdev->vpci->lock to avoid possible deadlock situations. 1. Per-domain's pci_lock is used to protect pdev->vpci structure from being removed. 2. Writing the command register and ROM BAR register may trigger modify_bars to run, which in turn may access multiple pdevs while checking for the existing BAR's overlap. The overlapping check, if done under the read lock, requires vpci->lock to be acquired on both devices being compared, which may produce a deadlock. It is not possible to upgrade read lock to write lock in such a case. So, in order to prevent the deadlock, use d->pci_lock in write mode instead. All other code, which doesn't lead to pdev->vpci destruction and does not access multiple pdevs at the same time, can still use a combination of the read lock and pdev->vpci->lock. 3. Drop const qualifier where the new rwlock is used and this is appropriate. 4. Do not call process_pending_softirqs with any locks held. For that unlock prior the call and re-acquire the locks after. After re-acquiring the lock there is no need to check if pdev->vpci exists: - in apply_map because of the context it is called (no race condition possible) - for MSI/MSI-X debug code because it is called at the end of pdev->vpci access and no further access to pdev->vpci is made 5. Use d->pci_lock around for_each_pdev and pci_get_pdev() while accessing pdevs in vpci code. 6. Switch vPCI functions to use per-domain pci_lock for ensuring pdevs do not go away. The vPCI functions call several MSI-related functions which already have existing non-vPCI callers. 
Change those MSI-related functions to allow using either pcidevs_lock()
or d->pci_lock for ensuring pdevs do not go away. Holding d->pci_lock in
read mode is sufficient. Note that this pdev protection mechanism does
not protect other state or critical sections. These MSI-related
functions already have other race condition and state protection
mechanisms (e.g. d->event_lock and msixtbl RCU), so we deduce that the
use of the global pcidevs_lock() is to ensure that pdevs do not go away.

7. Introduce wrapper construct, pdev_list_is_read_locked(), for checking
that pdevs do not go away. The purpose of this wrapper is to aid
readability and document the intent of the pdev protection mechanism.

8. When possible, the existing non-vPCI callers of these MSI-related
functions haven't been switched to use the newly introduced per-domain
pci_lock, and will continue to use the global pcidevs_lock(). This is
done to reduce the risk of the new locking scheme introducing
regressions. Those users will be adjusted in due time. One exception is
where the pcidevs_lock() in allocate_and_map_msi_pirq() is moved to the
caller, physdev_map_pirq(): this instance is switched to
read_lock(&d->pci_lock) right away.
Suggested-by: Roger Pau Monné
Suggested-by: Jan Beulich
Signed-off-by: Oleksandr Andrushchenko
Signed-off-by: Volodymyr Babchuk
Signed-off-by: Stewart Hildebrand
---
Changes in v13.2:
- clarify comment in vpci_msix_arch_print()
- add parenthesis around arg in macro definition
- make the helper macro an ASSERT itself

Changes in v13.1:
- move pdev_list_is_read_locked() to pci.h
- use pdev_list_is_read_locked() in more places
- use d directly in pdev_list_is_read_locked()
- wrap pdev_list_is_read_locked() helper in #ifndef NDEBUG, and add
  declaration in the #else case with no implementation
- replace pcidevs_lock() with read_lock(&d->pci_lock) in physdev.c

Changes in v13:
- hold off adding Roger's R-b tag even though it was provided on v12.2
- use a wrapper construct to ease readability of odd-looking ASSERTs
- new placement of ASSERT in __pci_enable_msix(), __pci_enable_msi(),
  and pci_enable_msi(). Rearrange/add pdev NULL check.
- expand commit description with details about using either
  pcidevs_lock() or d->pci_lock

Changes in v12.2:
- drop Roger's R-b
- drop both locks on error paths in vpci_msix_arch_print()
- add another ASSERT in vpci_msix_arch_print(), to enforce the
  expectation both locks are held before calling vpci_msix_arch_print()
- move pdev_done label in vpci_dump_msi()
- update comments in vpci_dump_msi() to say locks (plural)

Changes in v12.1:
- use read_trylock() in vpci_msix_arch_print()
- fixup in-code comments (revert double space, use DomXEN) in
  vpci_{read,write}()
- minor updates in commit message
- add Roger's R-b

Changes in v12:
- s/pci_rwlock/pci_lock/ in commit message
- expand comment about scope of pci_lock in sched.h
- in vpci_{read,write}, if hwdom is trying to access a device assigned
[PATCH v4] Reduce assembly code size of entry points
On many entries we push 8-bytes zero and exception constants are small
so we can just write a single byte saving 3 bytes for instruction.
With ENDBR64 this reduces the size of many entry points from 32 to 16
bytes (due to alignment).

The push and the mov are overlapping stores either way. Swapping between
movl and movb will make no difference at all on performance.

Similar code is already used in autogen_stubs.

Signed-off-by: Frediano Ziglio
Reviewed-by: Jan Beulich
---
v4:
- minor space adjustments;
- quoted possible error message;
- update commit subject removing "exception";
- added "Reviewed-by", I hope I got everything asked for.
v3:
- add other missing entries;
- reduce code when TRAP_syscall is used;
- improved commit message.
v2:
- added missing entry points;
- add mention to autogen_stubs code, as suggested.
---
 xen/arch/x86/x86_64/entry.S | 66 ++++++++++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 40 insertions(+), 26 deletions(-)

diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index ecdd6e5b47..a7bd8f0ca5 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -22,6 +22,14 @@
 #endif
 .endm

+.macro BUILD_BUG_ON condstr cond:vararg
+        .if \cond
+        .error "Condition \"\condstr\" not satisfied"
+        .endif
+.endm
+/* preprocessor macro to make error message more user friendly */
+#define BUILD_BUG_ON(cond) BUILD_BUG_ON #cond cond
+
 #ifdef CONFIG_PV
 /* %rbx: struct vcpu */
 FUNC_LOCAL(switch_to_kernel)
@@ -187,7 +195,8 @@ FUNC_LOCAL(restore_all_guest)
         SPEC_CTRL_EXIT_TO_PV    /* Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: cd */

         RESTORE_ALL
-        testw $TRAP_syscall,4(%rsp)
+        BUILD_BUG_ON(TRAP_syscall & 0xff)
+        testb $TRAP_syscall >> 8, 4+1(%rsp)
         jz    iret_exit_to_guest

         movq  24(%rsp),%r11           # RFLAGS
@@ -254,7 +263,8 @@ FUNC(lstar_enter)
         pushq $FLAT_KERNEL_CS64
         pushq %rcx
         pushq $0
-        movl  $TRAP_syscall, 4(%rsp)
+        BUILD_BUG_ON(TRAP_syscall & 0xff)
+        movb  $TRAP_syscall >> 8, 4+1(%rsp)
         SAVE_ALL

         SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
@@ -292,7 +302,8 @@ FUNC(cstar_enter)
         pushq $FLAT_USER_CS32
         pushq %rcx
         pushq $0
-        movl  $TRAP_syscall, 4(%rsp)
+        BUILD_BUG_ON(TRAP_syscall & 0xff)
+        movb  $TRAP_syscall >> 8, 4+1(%rsp)
         SAVE_ALL

         SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
@@ -334,7 +345,8 @@ LABEL(sysenter_eflags_saved, 0)
         pushq $3 /* ring 3 null cs */
         pushq $0 /* null rip */
         pushq $0
-        movl  $TRAP_syscall, 4(%rsp)
+        BUILD_BUG_ON(TRAP_syscall & 0xff)
+        movb  $TRAP_syscall >> 8, 4+1(%rsp)
         SAVE_ALL

         SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
@@ -389,7 +401,7 @@ FUNC(entry_int80)
         ENDBR64
         ALTERNATIVE "", clac, X86_FEATURE_XEN_SMAP
         pushq $0
-        movl  $0x80, 4(%rsp)
+        movb  $0x80, 4(%rsp)
         SAVE_ALL

         SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
@@ -561,7 +573,8 @@ __UNLIKELY_END(create_bounce_frame_bad_sp)
         /* Rewrite our stack frame and return to guest-OS mode. */
         /* IA32 Ref. Vol. 3: TF, VM, RF and NT flags are cleared on trap. */
         /* Also clear AC: alignment checks shouldn't trigger in kernel mode. */
-        orl   $TRAP_syscall,UREGS_entry_vector+8(%rsp)
+        BUILD_BUG_ON(TRAP_syscall & 0xff)
+        orb   $TRAP_syscall >> 8, UREGS_entry_vector+8+1(%rsp)
         andl  $~(X86_EFLAGS_AC|X86_EFLAGS_VM|X86_EFLAGS_RF|\
                  X86_EFLAGS_NT|X86_EFLAGS_TF),UREGS_eflags+8(%rsp)
         movq  $FLAT_KERNEL_SS,UREGS_ss+8(%rsp)
@@ -653,7 +666,7 @@ END(ret_from_intr)
 .section .init.text, "ax", @progbits
 FUNC(early_page_fault)
         ENDBR64
-        movl  $X86_EXC_PF, 4(%rsp)
+        movb  $X86_EXC_PF, 4(%rsp)
         SAVE_ALL
         movq  %rsp, %rdi
         call  do_early_page_fault
@@ -722,7 +735,7 @@ END(common_interrupt)

 FUNC(entry_PF)
         ENDBR64
-        movl  $X86_EXC_PF, 4(%rsp)
+        movb  $X86_EXC_PF, 4(%rsp)
 END(entry_PF)
 /* No special register assumptions. */
 FUNC(handle_exception, 0)
@@ -898,105 +911,106 @@ END(handle_exception)

 FUNC(entry_DE)
         ENDBR64
         pushq $0
-        movl  $X86_EXC_DE, 4(%rsp)
+        /* no need to update exception type, already 0 */
+        BUILD_BUG_ON(X86_EXC_DE)
         jmp   handle_exception
 END(entry_DE)

 FUNC(entry_MF)
         ENDBR64
         pushq $0
-        movl  $X86_EXC_MF, 4(%rsp)
+        movb  $X86_EXC_MF, 4(%rsp)
         jmp   handle_exception
 END(entry_MF)

 FUNC(entry_XM)
         ENDBR64
         pushq $0
-        movl  $X86_EXC_XM, 4(%rsp)
+        movb  $X86_EXC_XM, 4(%rsp)
         jmp   handle_exception
 END(entry_XM)

 FUNC(entry_NM)
         ENDBR64
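The movl-to-movb transformation above relies on the slot having been zeroed by the preceding `pushq $0`, and (for TRAP_syscall) on the constant having no bits set below bit 8, which is what the BUILD_BUG_ON enforces. Here is a little-endian C sketch of why the narrower store is equivalent; this is illustrative only (the function names are invented, and it is not the actual entry-path code):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Old sequence: pushq $0 zeroes the slot, then movl stores the vector. */
static uint32_t vector_slot_movl(uint32_t vec)
{
    uint32_t slot = 0;      /* from "pushq $0" */

    slot = vec;             /* movl $vec, 4(%rsp) */
    return slot;
}

/* New sequence: only the low byte is written; the rest stays zero. */
static uint32_t vector_slot_movb(uint8_t vec)
{
    uint32_t slot = 0;      /* from "pushq $0" */

    memcpy(&slot, &vec, 1); /* movb $vec, 4(%rsp), little-endian */
    return slot;
}
```

Since every exception vector fits in one byte, the two produce identical slot values, and entry_DE can even skip the store entirely because its vector is 0.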
Re: [PATCH v9 7/7] xen/asm-generic: fold struct devarch into struct dev
On 16.02.2024 13:39, Oleksii Kurochko wrote: > The current patch is a follow-up to the patch titled: > xen/asm-generic: introduce generic device.h > Also, a prerequisite for this patch is, without which a compilation > error will occur: > xen/arm: switch Arm to use asm-generic/device.h > > The 'struct dev_archdata' is exclusively used within 'struct device', > so it could be merged into 'struct device.' > > After the merger, it is necessary to update the 'dev_archdata()' > macros and the comments above 'struct arm_smmu_xen_device' in > drivers/passthrough/arm/smmu.c. > Additionally, it is required to update instances of > "dev->archdata->iommu" to "dev->iommu". > > Suggested-by: Julien Grall > Signed-off-by: Oleksii Kurochko > --- > This patch can be merged with patches 4 and 5 of this patch series. > --- > Changes in V9: > - newly introduced patch. > --- > xen/drivers/passthrough/arm/smmu.c | 12 ++-- > xen/include/asm-generic/device.h | 8 +--- > 2 files changed, 7 insertions(+), 13 deletions(-) > > diff --git a/xen/drivers/passthrough/arm/smmu.c > b/xen/drivers/passthrough/arm/smmu.c > index 32e2ff279b..4a272c8779 100644 > --- a/xen/drivers/passthrough/arm/smmu.c > +++ b/xen/drivers/passthrough/arm/smmu.c > @@ -227,9 +227,9 @@ struct arm_smmu_xen_domain { > }; > > /* > - * Xen: Information about each device stored in dev->archdata.iommu > + * Xen: Information about each device stored in dev->iommu > * > - * Initially dev->archdata.iommu only stores the iommu_domain (runtime > + * Initially dev->iommu only stores the iommu_domain (runtime > * configuration of the SMMU) but, on Xen, we also have to store the > * iommu_group (list of streamIDs associated to the device). > * > @@ -242,7 +242,7 @@ struct arm_smmu_xen_device { > struct iommu_group *group; > }; > > -#define dev_archdata(dev) ((struct arm_smmu_xen_device *)dev->archdata.iommu) > +#define dev_archdata(dev) ((struct arm_smmu_xen_device *)dev->iommu) I find in particular the naming here odd. 
But I'll let Julien judge whether this really is along the lines of what he suggested. Jan
Re: [PATCH v4 12/30] xen/riscv: introduce cmpxchg.h
On 15.02.2024 14:41, Oleksii wrote: >>> + : "=r" (ret), "+A" (*ptr) \ >>> + : "r" (new) \ >>> + : "memory" ); \ >>> +}) >>> + >>> +#define emulate_xchg_1_2(ptr, new, ret, release_barrier, >>> acquire_barrier) \ >>> +({ \ >>> + uint32_t *ptr_32b_aligned = (uint32_t *)ALIGN_DOWN((unsigned >>> long)ptr, 4); \ >> >> You now appear to assume that this macro is only used with inputs not >> crossing word boundaries. That's okay as long as suitably guaranteed >> at the use sites, but imo wants saying in a comment. >> >>> + uint8_t mask_l = ((unsigned long)(ptr) & (0x8 - sizeof(*ptr))) >>> * BITS_PER_BYTE; \ >> >> Why 0x8 (i.e. spanning 64 bits), not 4 (matching the uint32_t use >> above)? > The idea to read 8 bytes was to deal with crossing word boundary. So if > our address is 0x3 and we have to xchg() 2 bytes, what will cross 4 > byte boundary. Instead we align add 0x3, so it will become 0x0 and then > just always work with 8 bytes. Then what if my 2-byte access crosses a dword boundary? A cache line one? A page one? >>> + unsigned long new_ = (unsigned long)(new) << mask_l; \ >>> + unsigned long ret_; \ >>> + unsigned long rc; \ >> >> Similarly, why unsigned long here? > sizeof(unsigned long) is 8 bytes and it was chosen as we are working > with lc/sc.d which are working with 8 bytes. > >> >> I also wonder about the mix of underscore suffixed (or not) variable >> names here. > If the question about ret_, then the same as before size of ret > argument of the macros will be 1 or 2, but {lc/sc}.d expected to work > with 8 bytes. Then what's the uint32_t * about? >>> + release_barrier \ >>> + "0: lr.d %0, %2\n" \ >> >> Even here it's an 8-byte access. Even if - didn't check - the insn >> was >> okay to use with just a 4-byte aligned pointer, wouldn't it make >> sense >> then to 8-byte align it, and be consistent throughout this macro wrt >> the base unit acted upon? 
>> Alternatively, why not use lr.w here, thus reducing possible collisions
>> between multiple CPUs accessing the same cache line?
> According to the docs:
> LR and SC operate on naturally-aligned 64-bit (RV64 only) or 32-bit
> words in memory. Misaligned addresses will generate misaligned address
> exceptions.
>
> My intention was to deal with 4-byte crossing boundary. So if ptr is
> 4-byte aligned, then by reading 8 bytes we shouldn't care about
> boundary crossing, if I am not missing something.

If a ptr is 4-byte aligned, there's no point reading more than 4 bytes.

>>> +    "      and  %1, %0, %z4\n" \
>>> +    "      or   %1, %1, %z3\n" \
>>> +    "      sc.d %1, %1, %2\n" \
>>> +    "      bnez %1, 0b\n" \
>>> +    acquire_barrier \
>>> +    : "=&r" (ret_), "=&r" (rc), "+A" (*ptr_32b_aligned) \
>>> +    : "rJ" (new_), "rJ" (~mask) \
>>
>> I think that as soon as there are more than 2 or maybe 3 operands,
>> legibility is vastly improved by using named asm() operands.
> Just to clarify you mean that it would be better to use instead of %0
> use names?

Yes. Just like you have it in one of the other patches that I looked at
later.

>>> +    : "memory"); \
>>
>> Nit: Missing blank before closing parenthesis.
>>
>>> +    \
>>> +    ret = (__typeof__(*(ptr)))((ret_ & mask) >> mask_l); \
>>> +})
>>
>> Why does "ret" need to be a macro argument? If you had only the
>> expression here, not the assignment, ...
>>
>>> +#define __xchg_generic(ptr, new, size, sfx, release_barrier, acquire_barrier) \
>>> +({ \
>>> +    __typeof__(ptr) ptr__ = (ptr); \
>>
>> Is this local variable really needed? Can't you use "ptr" directly
>> in the three macro invocations?
>>
>>> +    __typeof__(*(ptr)) new__ = (new); \
>>> +    __typeof__(*(ptr)) ret__; \
>>> +    switch (size) \
>>> +    { \
>>> +    case 1: \
>>> +    case 2: \
>>> +        emulate_xchg_1_2(ptr__, new__, ret__, release_barrier, acquire_barrier); \
>>
>> ...
this would become >> >> ret__ = emulate_xchg_1_2(ptr__, new__, release_barrier, >> acquire_barrier); \ >> >> But, unlike assumed above, there's no enforcement here that a 2-byte >> quantity won't cross a word, double-word, cache line, or even page >> boundary. That might be okay if then the code would simply crash >> (like >> the AMO insns emitted further down would), but aiui silent >> misbehavior >> would result. > As I mentioned above with 4-byte alignment and then reading and working > with 8-byte then crossing a word or double-word boundary shouldn't be > an issue. > > I am not sure that I know how to check that we are crossing cache line > boundary. > > Regarding page boundary, if the next page is mapped then all should > work fine, otherwise it will be an exception. Are you sure lr.d / sc.d are happy to access across such a boundary, when both pages are mapped? To me it seems pretty clear that for atomic accesses you want to demand natural alignment, i.e. 2-byte alignment for 2-byte accesses. This way you can be sure no potentially problematic
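The shift/mask arithmetic under debate can be shown in plain C, divorced from the lr/sc retry loop. This is a non-atomic illustration only, assuming little-endian byte order and a 32-bit containing word (i.e. the lr.w variant Jan suggests); the function name is invented:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Non-atomic sketch of the arithmetic behind emulate_xchg_1_2: a 1- or
 * 2-byte exchange is emulated as a read-modify-write of the naturally
 * aligned 32-bit word containing the value.  The real implementation
 * wraps the same computation in an lr/sc retry loop; this version only
 * demonstrates the shift/mask math.  Note that a *naturally aligned*
 * 1- or 2-byte field can never straddle its aligned 32-bit word, which
 * is why demanding natural alignment sidesteps the word/cache-line/page
 * boundary questions raised in the review.
 */
static uint32_t emulate_xchg_small(volatile uint32_t *aligned,
                                   unsigned int byte_off,
                                   uint32_t new, unsigned int size)
{
    unsigned int shift = byte_off * 8;                    /* little-endian */
    uint32_t mask = ((size == 1) ? 0xffu : 0xffffu) << shift;
    uint32_t old = *aligned;

    *aligned = (old & ~mask) | ((new << shift) & mask);   /* and/or steps */
    return (old & mask) >> shift;                         /* previous value */
}
```

The `and`/`or` pair in the quoted assembly corresponds line-for-line to the masked merge above, with `~mask` and the pre-shifted new value as the inputs.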
Re: [PATCH] ns16550: add Asix AX99100 serial card
On Mon, Feb 19, 2024 at 09:57:49AM +0100, Jan Beulich wrote: > On 18.02.2024 02:34, Marek Marczykowski-Górecki wrote: > > @@ -1170,6 +1177,11 @@ static const struct ns16550_config __initconst > > uart_config[] = > > .dev_id = 0x7adc, > > .param = param_intel_lpss > > }, > > +{ > > +.vendor_id = PCI_VENDOR_ID_ASIX, > > +.dev_id = 9100, > > As per Linux this is 0x9100. Right... but then, maybe the patch isn't needed at all, as it does work for me. Maybe what's needed instead is some other patch already in staging. Initial attempt that did not work was with 4.17.something. I guess setting the fifo size isn't that important. > > +.param = param_asix_ax99100 > > +}, > > }; > > > > static int __init > > diff --git a/xen/include/xen/pci_ids.h b/xen/include/xen/pci_ids.h > > index e798477a7e23..2a19f4ab7872 100644 > > --- a/xen/include/xen/pci_ids.h > > +++ b/xen/include/xen/pci_ids.h > > @@ -11,3 +11,5 @@ > > #define PCI_VENDOR_ID_BROADCOM 0x14e4 > > > > #define PCI_VENDOR_ID_INTEL 0x8086 > > + > > +#define PCI_VENDOR_ID_ASIX 0x125b > > Please insert such that numeric sorting is retained. > > Jan -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab signature.asc Description: PGP signature
Re: Stats on Xen tarball downloads
On 19.02.2024 11:31, Roger Pau Monné wrote: > On Mon, Feb 19, 2024 at 06:01:54PM +0800, George Dunlap wrote: >> One of the questions we had with respect to changing our release >> practice (for instance, making the process more light-weight so that >> we could do a point release after every XSA) was, "How many people are >> actually using the tarballs?" > > What would this more lightweight process involve from a downstream > PoV? IOW: in what would the contents of the tarball change compared > to the current releases? From all prior discussion my conclusion was "no tarball at all". Jan
Re: Stats on Xen tarball downloads
On Mon, Feb 19, 2024 at 06:01:54PM +0800, George Dunlap wrote: > Hey all, > > One of the questions we had with respect to changing our release > practice (for instance, making the process more light-weight so that > we could do a point release after every XSA) was, "How many people are > actually using the tarballs?" What would this more lightweight process involve from a downstream PoV? IOW: in what would the contents of the tarball change compared to the current releases? > I finally got access (again) to > downloads.xenproject.org, and took a look at the logs. It appears > that we only keep about 2 weeks of logs there. > > Short answer: It's pretty clear from looking at the logs that there > are large numbers of automated build systems building various versions > of Xen from tarballs. It *looks* like there are over 300 people a > week downloading 4.18.0 specifically from various web browsers. As someone who packages Xen for FreeBSD, I've recently switched the build to use the git sources directly, as otherwise keeping up with XSAs tends to be a pain, especially when XSAs happen to depend on the context of some of the backports that happened between the point release and the XSA disclosure. Overall, as a consumer of Xen it would be helpful if we could make a release for each (batch of) XSAs, as that would possibly make me switch to building from the release tarballs instead of git. I don't think it would be much of a disruption if such a change to generate a more lightweight tarball is done starting from a major release (ie: 4.19) and minor releases of previous versions (4.18.x) are kept using the non-lightweight process. Thanks, Roger.
[xen-unstable test] 184699: tolerable FAIL
flight 184699 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184699/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 184693
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 184693
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 184695
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 184695
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 184695
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 184695
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 184695
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 184695
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 184695
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 184695
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 184695
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 184695
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass

version targeted for testing:
 xen 0441c3acc7e9e72e984ce49d32e61827894ae4a3
baseline version:
 xen 0441c3acc7e9e72e984ce49d32e61827894ae4a3

Last test of basis 184699 2024-02-19 01:52:12 Z 0 days
Testing same since (not found) 0 attempts

jobs:
 build-amd64-xsm pass
 build-arm64-xsm pass
Re: [PATCH v3] Reduce assembly code size of exception entry points
On 16.02.2024 11:50, Frediano Ziglio wrote: > --- a/xen/arch/x86/x86_64/entry.S > +++ b/xen/arch/x86/x86_64/entry.S > @@ -22,6 +22,14 @@ > #endif > .endm > > +.macro BUILD_BUG_ON condstr cond:vararg > +.if \cond > +.error "Condition \condstr not satisfied" Maybe .error "Condition \"\condstr\" not satisfied" ? > @@ -187,7 +195,8 @@ FUNC_LOCAL(restore_all_guest) > SPEC_CTRL_EXIT_TO_PV /* Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: > cd */ > > RESTORE_ALL > -testw $TRAP_syscall,4(%rsp) > +BUILD_BUG_ON(TRAP_syscall & 0xff) > +testb $TRAP_syscall >> 8,4+1(%rsp) > jz iret_exit_to_guest Nit: Blank after comma please (and again elsewhere). Preferably with both adjustments (which I'd be okay making while committing, so long as you agree specifically on the former) Reviewed-by: Jan Beulich That said, especially with this not being an entry point path, and neither this nor ... > @@ -254,7 +263,8 @@ FUNC(lstar_enter) > pushq $FLAT_KERNEL_CS64 > pushq %rcx > pushq $0 > -movl $TRAP_syscall, 4(%rsp) > +BUILD_BUG_ON(TRAP_syscall & 0xff) > +movb $TRAP_syscall >> 8, 4+1(%rsp) > SAVE_ALL > > SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd > */ > @@ -292,7 +302,8 @@ FUNC(cstar_enter) > pushq $FLAT_USER_CS32 > pushq %rcx > pushq $0 > -movl $TRAP_syscall, 4(%rsp) > +BUILD_BUG_ON(TRAP_syscall & 0xff) > +movb $TRAP_syscall >> 8, 4+1(%rsp) > SAVE_ALL > > SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd > */ > @@ -334,7 +345,8 @@ LABEL(sysenter_eflags_saved, 0) > pushq $3 /* ring 3 null cs */ > pushq $0 /* null rip */ > pushq $0 > -movl $TRAP_syscall, 4(%rsp) > +BUILD_BUG_ON(TRAP_syscall & 0xff) > +movb $TRAP_syscall >> 8, 4+1(%rsp) > SAVE_ALL ... any of these being exception entry point paths (and hence none of these changes being related to the subject), it would likely have been a good idea to split this into two patches. Jan
Re: [PATCH v3] x86/vmx: add support for virtualize SPEC_CTRL
On 15.02.2024 17:46, Roger Pau Monne wrote: > The feature is defined in the tertiary exec control, and is available starting > from Sapphire Rapids and Alder Lake CPUs. > > When enabled, two extra VMCS fields are used: SPEC_CTRL mask and shadow. Bits > set in mask are not allowed to be toggled by the guest (either set or clear) > and the value in the shadow field is the value the guest expects to be in the > SPEC_CTRL register. > > By using it the hypervisor can force the value of SPEC_CTRL bits behind the > guest back without having to trap all accesses to SPEC_CTRL, note that no bits > are forced into the guest as part of this patch. It also allows getting rid > of > SPEC_CTRL in the guest MSR load list, since the value in the shadow field will > be loaded by the hardware on vmentry. > > Signed-off-by: Roger Pau Monné Reviewed-by: Jan Beulich
Re: [PATCH] ns16550: add Asix AX99100 serial card
On 18.02.2024 02:34, Marek Marczykowski-Górecki wrote: > @@ -1170,6 +1177,11 @@ static const struct ns16550_config __initconst > uart_config[] = > .dev_id = 0x7adc, > .param = param_intel_lpss > }, > +{ > +.vendor_id = PCI_VENDOR_ID_ASIX, > +.dev_id = 9100, As per Linux this is 0x9100. > +.param = param_asix_ax99100 > +}, > }; > > static int __init > diff --git a/xen/include/xen/pci_ids.h b/xen/include/xen/pci_ids.h > index e798477a7e23..2a19f4ab7872 100644 > --- a/xen/include/xen/pci_ids.h > +++ b/xen/include/xen/pci_ids.h > @@ -11,3 +11,5 @@ > #define PCI_VENDOR_ID_BROADCOM 0x14e4 > > #define PCI_VENDOR_ID_INTEL 0x8086 > + > +#define PCI_VENDOR_ID_ASIX 0x125b Please insert such that numeric sorting is retained. Jan
Re: [PATCH 5/5] mm: add the __must_check attribute to {gfn,mfn}_add()
On 19.02.2024 06:07, George Dunlap wrote: > On Wed, Feb 14, 2024 at 7:42 PM Roger Pau Monne wrote: >> >> It's not obvious from the function itself whether the incremented value will >> be >> stored in the parameter, or returned to the caller. That has led to bugs >> in >> the past as callers assume the incremented value is stored in the parameter. >> >> Add the __must_check attribute to the function to easily spot callers that >> don't consume the returned value, which signals an error in the caller logic. >> >> No functional change intended. > > Just thinking out loud here... I wonder if "mfn_plus()" would be less > likely to be misunderstood. Catching this during compile is def > better than catching it w/ Coverity or Eclair, but even better to help > people not make the mistake in the first place. To me while mfn_plus() would indeed be somewhat less likely to cause this kind of confusion, I don't think that naming would entirely eliminate the risk. I was actually considering adding mfn_inc() for the common case of incrementing by 1, yet I can't see an alternative name avoiding the issue there. Jan
Re: [PATCH v13.1 01/14] vpci: use per-domain PCI lock to protect vpci structure
On Fri, Feb 16, 2024 at 09:41:19AM -0500, Stewart Hildebrand wrote: > On 2/16/24 06:44, Roger Pau Monné wrote: > > On Thu, Feb 15, 2024 at 03:30:00PM -0500, Stewart Hildebrand wrote: > >> From: Oleksandr Andrushchenko > >> > >> Use the per-domain PCI read/write lock to protect the presence of the > >> pci device vpci field. This lock can be used (and in a few cases is used > >> right away) so that vpci removal can be performed while holding the lock > >> in write mode. Previously such removal could race with vpci_read for > >> example. > >> > >> When taking both d->pci_lock and pdev->vpci->lock, they should be > >> taken in this exact order: d->pci_lock then pdev->vpci->lock to avoid > >> possible deadlock situations. > >> > >> 1. Per-domain's pci_lock is used to protect pdev->vpci structure > >> from being removed. > >> > >> 2. Writing the command register and ROM BAR register may trigger > >> modify_bars to run, which in turn may access multiple pdevs while > >> checking for the existing BAR's overlap. The overlapping check, if > >> done under the read lock, requires vpci->lock to be acquired on both > >> devices being compared, which may produce a deadlock. It is not > >> possible to upgrade read lock to write lock in such a case. So, in > >> order to prevent the deadlock, use d->pci_lock in write mode instead. > >> > >> All other code, which doesn't lead to pdev->vpci destruction and does > >> not access multiple pdevs at the same time, can still use a > >> combination of the read lock and pdev->vpci->lock. > >> > >> 3. Drop const qualifier where the new rwlock is used and this is > >> appropriate. > >> > >> 4. Do not call process_pending_softirqs with any locks held. For that > >> unlock prior the call and re-acquire the locks after. 
After > >> re-acquiring the lock there is no need to check if pdev->vpci exists: > >> - in apply_map because of the context it is called (no race condition > >>possible) > >> - for MSI/MSI-X debug code because it is called at the end of > >>pdev->vpci access and no further access to pdev->vpci is made > >> > >> 5. Use d->pci_lock around for_each_pdev and pci_get_pdev() > >> while accessing pdevs in vpci code. > >> > >> 6. Switch vPCI functions to use per-domain pci_lock for ensuring pdevs > >> do not go away. The vPCI functions call several MSI-related functions > >> which already have existing non-vPCI callers. Change those MSI-related > >> functions to allow using either pcidevs_lock() or d->pci_lock for > >> ensuring pdevs do not go away. Holding d->pci_lock in read mode is > >> sufficient. Note that this pdev protection mechanism does not protect > >> other state or critical sections. These MSI-related functions already > >> have other race condition and state protection mechanisms (e.g. > >> d->event_lock and msixtbl RCU), so we deduce that the use of the global > >> pcidevs_lock() is to ensure that pdevs do not go away. > >> > >> 7. Introduce wrapper construct, pdev_list_is_read_locked(), for checking > >> that pdevs do not go away. The purpose of this wrapper is to aid > >> readability and document the intent of the pdev protection mechanism. > >> > >> 8. When possible, the existing non-vPCI callers of these MSI-related > >> functions haven't been switched to use the newly introduced per-domain > >> pci_lock, and will continue to use the global pcidevs_lock(). This is > >> done to reduce the risk of the new locking scheme introducing > >> regressions. Those users will be adjusted in due time. One exception > >> is where the pcidevs_lock() in allocate_and_map_msi_pirq() is moved to > >> the caller, physdev_map_pirq(): this instance is switched to > >> read_lock(&d->pci_lock) right away.
> >> > >> Suggested-by: Roger Pau Monné > >> Suggested-by: Jan Beulich > >> Signed-off-by: Oleksandr Andrushchenko > >> Signed-off-by: Volodymyr Babchuk > >> Signed-off-by: Stewart Hildebrand > > > > A couple of questions and the pdev_list_is_read_locked() needs a small > > adjustment. > > > >> @@ -895,6 +891,14 @@ int vpci_msix_arch_print(const struct vpci_msix *msix) > >> { > >> unsigned int i; > >> > >> +/* > >> + * Assert that d->pdev_list doesn't change. > >> pdev_list_is_read_locked() is > >> + * not suitable here because we may > >> read_unlock(>domain->pci_lock) > >> + * before returning. > > > > I'm confused by this comment, as I don't see why it matters that the > > lock might be lock before returning. We need to ensure the lock is > > taken at the time of the assert, and hence pdev_list_is_read_locked() > > can be used. > > pdev_list_is_read_locked() currently would allow either pcidevs_lock() > or d->pci_lock. If vpci_msix_arch_print() is entered with only > pcidevs_lock() held, we may end up unlocking d->pci_lock when it is > not locked, which would be wrong. > > Perhaps the comment could be clarified as: > > /* > * Assert that d->pdev_list doesn't change. > ASSERT_PDEV_LIST_IS_READ_LOCKED > * is not suitable
Re: [PATCH v2] x86: amend 'n' debug-key output with SMI count
On 16.02.2024 10:11, Roger Pau Monné wrote: > On Wed, Feb 14, 2024 at 11:15:51AM +0100, Jan Beulich wrote: >> --- a/xen/arch/x86/cpu/common.c >> +++ b/xen/arch/x86/cpu/common.c >> @@ -407,9 +407,15 @@ void __init early_cpu_init(bool verbose) >> paddr_bits -= (ebx >> 6) & 0x3f; >> } >> >> -if (!(c->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON))) >> +if (!(c->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON))) { >> +uint64_t smi_count; >> + >> park_offline_cpus = opt_mce; >> >> +if (!verbose && !rdmsr_safe(MSR_SMI_COUNT, smi_count)) >> +setup_force_cpu_cap(X86_FEATURE_SMI_COUNT); > > Why make it dependent on !verbose? The call with !verbose is tied to > part of the ucode loading being half-functional (for example > MCU_CONTROL_DIS_MCU_LOAD not being set) but I don't see that as a > signal that SMI count shouldn't be used. > > does it need to be part of the early cpu initialization instead of > being in the (later) Intel specific init code part of the > identify_cpu()? Yes, the condition was inverted. It could likely also be dropped altogether; not sure which one's better: On one hand avoiding multiple setup_force_cpu_cap() seems desirable (albeit not strictly necessary), while otoh the code would be simpler without. >> --- a/xen/arch/x86/nmi.c >> +++ b/xen/arch/x86/nmi.c >> @@ -585,15 +585,34 @@ static void cf_check do_nmi_trigger(unsi >> self_nmi(); >> } >> >> +static DEFINE_PER_CPU(unsigned int, smi_count); >> + >> +static void cf_check read_smi_count(void *unused) >> +{ >> +unsigned int dummy; >> + >> +rdmsr(MSR_SMI_COUNT, this_cpu(smi_count), dummy); >> +} >> + >> static void cf_check do_nmi_stats(unsigned char key) >> { >> const struct vcpu *v; >> unsigned int cpu; >> bool pend, mask; >> >> -printk("CPU\tNMI\n"); >> +printk("CPU\tNMI%s\n", boot_cpu_has(X86_FEATURE_SMI_COUNT) ? 
"\tSMI" : >> ""); >> + >> +if ( boot_cpu_has(X86_FEATURE_SMI_COUNT) ) >> +on_each_cpu(read_smi_count, NULL, 1); >> + >> for_each_online_cpu ( cpu ) >> -printk("%3u\t%3u\n", cpu, per_cpu(nmi_count, cpu)); >> +{ >> +printk("%3u\t%3u", cpu, per_cpu(nmi_count, cpu)); >> +if ( boot_cpu_has(X86_FEATURE_SMI_COUNT) ) >> +printk("\t%3u\n", per_cpu(smi_count, cpu)); >> +else >> +printk("\n"); >> +} >> >> if ( !hardware_domain || !(v = domain_vcpu(hardware_domain, 0)) ) >> return; > > Could you also amend the debug-key help text to mention SMI? Hmm, I had considered that and decided against. I'm uncertain, nevertheless, so could be talked into amending that help text. Just that I can't make it "NMI and SMI statistics" as whether SMI data is available is conditional. Yet "NMI (and maybe SMI) statistics" looks a little clumsy to me ... Jan
Re: [PATCH] build/xen: fail to rebuild if Kconfig fails
On 16.02.2024 11:51, Roger Pau Monné wrote: > On Fri, Feb 16, 2024 at 11:04:46AM +0100, Jan Beulich wrote: >> On 15.02.2024 18:23, Roger Pau Monné wrote: >>> On Thu, Feb 15, 2024 at 05:22:00PM +0100, Jan Beulich wrote: On 15.02.2024 17:08, Roger Pau Monné wrote: > On Thu, Feb 15, 2024 at 02:02:41PM +0100, Jan Beulich wrote: >> --- a/xen/Rules.mk >> +++ b/xen/Rules.mk >> @@ -15,7 +15,11 @@ srcdir := $(srctree)/$(src) >> PHONY := __build >> __build: >> >> --include $(objtree)/include/config/auto.conf >> +ifneq ($(obj),tools) >> +ifneq ($(obj),tools/kconfig) >> +include $(objtree)/include/config/auto.conf >> +endif >> +endif > > Trying to understand this, I assume it's to avoid an infinite > dependency loop that generating include/config/auto.conf requires some > tools that are built using xen/Rules.mk? The file has dependencies only in xen/Makefile. This is about the file simply not being there when initially building. Perhaps the patch description helps that I've written in the meantime: "Because of using "-include", failure to (re)build auto.conf (with auto.conf.cmd produced as a secondary target) won't stop make from continuing the build. Arrange for it being possible to drop the - from Rules.mk, requiring that the include be skipped for tools-only targets. >>> >>> Wouldn't it be more reliable if we skipped the include for any paths >>> in $(obj) that start with 'tools', rather than hardcoding 'tools' and >>> 'tools/kconfig'? >> >> I was first meaning to do so, but the expression would end up more >> complex than I'd like (for it needing to be an exact match of "tools" >> and a prefix match of "tools/"). Thinking of it, >> >> ifneq ($(obj),tools) >> ifneq ($(patsubst tools/%,,$(obj)),) >> >> might do (and not be as complex as I first thought, when intending to >> put all in a single "if"). > > Would something like the rune below work?
> > ifneq ($(word 1, $(subst /, ,$(obj))),tools) > > That should allow to have a single condition, and should match both > 'tools' and 'tools/*' Hmm, yes, that works. $(subst ...) is something I usually try to avoid, in favor of $(patsubst ...). Jan
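For reference, the agreed rune in context might look like this (a sketch of the resulting Rules.mk fragment, variable names as in the thread):

```make
# First path component of $(obj); matches both 'tools' and 'tools/kconfig',
# but not e.g. 'arch/x86'.
ifneq ($(word 1,$(subst /, ,$(obj))),tools)
include $(objtree)/include/config/auto.conf
endif
```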
Re: [XEN PATCH v2 2/3] x86/uaccess: replace __{get,put}_user_bad() with STATIC_ASSERT_UNREACHABLE()
On 15/02/24 11:32, Jan Beulich wrote: The important difference is: Here we're told that there was a use of __put_user_bad, which is easy to grep for, and thus see how the supplied function / file / line(?) relate to the ultimate problem. I'm afraid I'm meanwhile confused enough by the various replies containing results of experimentation that I can't really tell anymore what case is best. Hence I can only restate my expectation for an eventual v3: Diagnosing what the issue is, no matter whether the new macro is used in another macro or in an inline function, should not become meaningfully more difficult. In how far this is the case wants clarifying in the description of the change. I think the best thing at the moment is to deviate __{get,put}_user_bad() for Rule 16.3. I'll let maintainers further explore the possibility of having a compile-time assertion based on the assembler error. -- Federico Serafini, M.Sc. Software Engineer, BUGSENG (http://bugseng.com)
Re: [PATCH v4 26/30] xen/riscv: add minimal stuff to mm.h to build full Xen
On 16.02.2024 12:03, Oleksii wrote: >> >>> + } free; >>> + } u; >>> + >>> + union { >>> + /* Page is in use, but not as a shadow. */ >> >> I'm also pretty sure I asked before what shadow this comment alludes >> to. > I missed your request about 'shadow' before. > > The comment arrived from Arm. > > I tried to find out the answer by investigating how 'inuse' is used, > and, unfortunately, I couldn't find out what 'shadow' alludes to. That's from x86'es shadow paging, where a page can serve as a "shadow" of a guest page table page. Jan
Re: [PATCH v4 25/30] xen/riscv: add minimal stuff to processor.h to build full Xen
On 16.02.2024 12:16, Oleksii wrote: > On Thu, 2024-02-15 at 17:43 +0100, Jan Beulich wrote: >> On 15.02.2024 17:38, Oleksii wrote: >>> On Tue, 2024-02-13 at 14:33 +0100, Jan Beulich wrote: On 05.02.2024 16:32, Oleksii Kurochko wrote: > + depends on LLD_VERSION >= 15 || LD_VERSION >= 23600 What's the linker dependency here? Depending on the answer I might further ask why "TOOLCHAIN" when elsewhere we use CC_HAS_ or HAS_CC_ or HAS_AS_. >>> I missed introducing the {L}LD_VERSION configs. They should be the output of >>> the command: >>> riscv64-linux-gnu-ld --version >> >> Doesn't answer my question though where the linker version matters >> here. > Then I misinterpreted your initial question. > Could you please provide further clarification or rephrase it for > better understanding? > > Probably, your question was about why the linker dependency is needed here: > it is not sufficient to check if a toolchain supports a particular > extension without checking if the linker supports that extension > too. > For example, Clang 15 supports Zihintpause but GNU binutils > 2.35.2 does not, leading to build errors like so: > > riscv64-linux-gnu-ld: -march=rv64i_zihintpause2p0: Invalid or > unknown z ISA extension: 'zihintpause' Hmm, that's certainly "interesting" behavior of the RISC-V linker. Yet isn't the linker capability expected to be tied to that of gas? I would find it far more natural if a gas dependency existed here. If such a connection cannot be taken for granted, I'm pretty sure you'd need to probe both then anyway. Jan