Re: [XEN PATCH] xen: cache clearing and invalidation helpers refactoring
On 19.02.2024 16:14, Nicola Vetrini wrote:
> The cache clearing and invalidation helpers in x86 and Arm didn't
> comply with MISRA C Rule 17.7: "The value returned by a function
> having non-void return type shall be used". On Arm they
> were always returning 0, while some in x86 returned -EOPNOTSUPP
> and in common/grant_table the return value is saved.
>
> As a consequence, a common helper arch_grant_cache_flush that returns
> an integer is introduced, so that each architecture can choose whether to
> return an error value on certain conditions, and the helpers have either
> been changed to return void (on Arm) or deleted entirely (on x86).
>
> Signed-off-by: Julien Grall
> Signed-off-by: Nicola Vetrini
> ---
> The original refactor idea came from Julien Grall in [1]; I edited that
> proposal to fix build errors.
>
> I did introduce a cast to void for the call to flush_area_local on x86,
> because even before this patch the return value of that function wasn't
> checked in all but one use in x86/smp.c, and in this context the helper
> (perhaps incidentally) ignored the return value of flush_area_local.

I object to such casting to void, at least until there's an overriding
decision that for Misra purposes such casts may be needed.

> --- a/xen/arch/arm/include/asm/page.h
> +++ b/xen/arch/arm/include/asm/page.h
> @@ -123,6 +123,7 @@
>
>  #ifndef __ASSEMBLY__
>
> +#include

This is a no-go, imo (also on x86): Adding this include here effectively
means that nearly every CU will have a dependency on that header, no
matter that most are entirely agnostic of grants. Each arch has a
grant_table.h - is there any reason the new, grant-specific helper can't
be put there?
> @@ -182,21 +183,21 @@ void flush_area_mask(const cpumask_t *mask, const void *va,
>  }
>
>  static inline void flush_page_to_ram(unsigned long mfn, bool sync_icache) {}
> -static inline int invalidate_dcache_va_range(const void *p,
> -                                             unsigned long size)
> -{ return -EOPNOTSUPP; }
> -static inline int clean_and_invalidate_dcache_va_range(const void *p,
> -                                                       unsigned long size)
> +
> +static inline int arch_grant_cache_flush(unsigned int op, const void *p,
> +                                         unsigned long size)
>  {
> -    unsigned int order = get_order_from_bytes(size);
> +    unsigned int order;
> +
> +    if ( !(op & GNTTAB_CACHE_CLEAN) )
> +        return -EOPNOTSUPP;
> +
> +    order = get_order_from_bytes(size);
>      /* sub-page granularity support needs to be added if necessary */
> -    flush_area_local(p, FLUSH_CACHE|FLUSH_ORDER(order));
> +    (void) flush_area_local(p, FLUSH_CACHE|FLUSH_ORDER(order));

As to my objection to the addition of a cast, did you actually check what
this function returns, before saying "incidentally" in the respective
remark? Already the return type being "unsigned int" is indicative of the
return value not being suitable here for handing on. In addition there
shouldn't be a blank after a cast. Instead, if already you were to touch
this line, it would have been nice if you took the opportunity and added
the missing blanks around the binary operator involved.

Jan
Re: [XEN PATCH v2 2/3] x86/uaccess: replace __{get,put}_user_bad() with STATIC_ASSERT_UNREACHABLE()
On 19.02.2024 21:43, Stefano Stabellini wrote:
> On Mon, 19 Feb 2024, Federico Serafini wrote:
>> On 15/02/24 11:32, Jan Beulich wrote:
>>> The important difference is: Here we're told that there was a use of
>>> __put_user_bad, which is easy to grep for, and thus see how the
>>> supplied function / file / line(?) relate to the ultimate problem.
>>>
>>> I'm afraid I'm meanwhile confused enough by the various replies
>>> containing results of experimentation that I can't really tell
>>> anymore what case is best. Hence I can only restate my expectation
>>> for an eventual v3: Diagnosing what the issue is, no matter whether
>>> the new macro is used in another macro or in an inline function,
>>> should not become meaningfully more difficult. In how far this is
>>> the case wants clarifying in the description of the change.
>>
>> I think the best thing at the moment is to deviate
>> __{get,put}_user_bad() for Rule 16.3.
>> I'll let maintainers further explore the possibility of having a
>> compile-time assertion based on the assembler error.
>
> OK. I hope Jan is OK to deviate by in-code comment.

Hmm, the follow-on suggestion was to add break statements? Followed by
me asking whether adding noreturn to the decls wouldn't also help. (Then
again I was under the impression that there was more than just the
"missing" break statements which Misra thought was an issue here.)

Jan
Re: [PATCH 1/2] hw/xen: detect when running inside stubdomain
On 19/2/24 19:16, Marek Marczykowski-Górecki wrote:
> Introduce a global xen_is_stubdomain variable, set when qemu is running
> inside a stubdomain instead of dom0. This will be relevant for
> subsequent patches, as a few things like accessing PCI config space
> need to be done differently.
>
> Signed-off-by: Marek Marczykowski-Górecki
> ---
>  hw/xen/xen-legacy-backend.c | 15 +++
>  include/hw/xen/xen.h        |  1 +
>  system/globals.c            |  1 +
>  3 files changed, 17 insertions(+)
>
> diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
> index 37ecc91fc3..ecb89ecfc1 100644
> --- a/include/hw/xen/xen.h
> +++ b/include/hw/xen/xen.h
> @@ -36,6 +36,7 @@ enum xen_mode {
>  extern uint32_t xen_domid;
>  extern enum xen_mode xen_mode;
>  extern bool xen_domid_restrict;
> +extern bool xen_is_stubdomain;
>
>  int xen_pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num);
>  int xen_set_pci_link_route(uint8_t link, uint8_t irq);
>
> diff --git a/system/globals.c b/system/globals.c
> index b6d4e72530..ac27d88bd4 100644
> --- a/system/globals.c
> +++ b/system/globals.c
> @@ -62,6 +62,7 @@ bool qemu_uuid_set;
>  uint32_t xen_domid;
>  enum xen_mode xen_mode = XEN_DISABLED;
>  bool xen_domid_restrict;
> +bool xen_is_stubdomain;

Note for myself, Paolo and Claudio, IIUC these fields belong to
TYPE_XEN_ACCEL in accel/xen/xen-all.c.
Maybe resulting in smth like:

-- >8 --
diff --git a/accel/xen/xen-all.c b/accel/xen/xen-all.c
index 5ff0cb8bd9..fc25d8c912 100644
--- a/accel/xen/xen-all.c
+++ b/accel/xen/xen-all.c
@@ -24,11 +24,31 @@
 #include "migration/global_state.h"
 #include "hw/boards.h"

-bool xen_allowed;
+struct XenAccelState
+{
+    AccelState parent_obj;
+
+    bool xen_allowed;
+
+    enum xen_mode xen_mode;
+
+    uint32_t xen_domid;
+    bool xen_domid_restrict;

 xc_interface *xen_xc;
 xenforeignmemory_handle *xen_fmem;
 xendevicemodel_handle *xen_dmod;
+};
+
+struct XenAccelOpsClass
+{
+    AccelOpsClass parent_class;
+
+    struct evtchn_backend_ops *xen_evtchn_ops;
+    struct gnttab_backend_ops *xen_gnttab_ops;
+    struct foreignmem_backend_ops *xen_foreignmem_ops;
+    struct xenstore_backend_ops *xen_xenstore_ops;
+};

 static void xenstore_record_dm_state(const char *state)
 {
@@ -114,6 +134,13 @@ static int xen_init(MachineState *ms)
     return 0;
 }

+static void xen_accel_init(Object *obj)
+{
+    XenAccelState *s = XEN_ACCEL(obj);
+
+    s->xen_mode = XEN_DISABLED;
+}
+
 static void xen_accel_class_init(ObjectClass *oc, void *data)
 {
     AccelClass *ac = ACCEL_CLASS(oc);
@@ -142,6 +169,8 @@ static void xen_accel_class_init(ObjectClass *oc, void *data)
 static const TypeInfo xen_accel_type = {
     .name = TYPE_XEN_ACCEL,
     .parent = TYPE_ACCEL,
+    .instance_size = sizeof(XenAccelState),
+    .instance_init = xen_accel_init,
     .class_init = xen_accel_class_init,
 };

@@ -157,6 +186,7 @@ static const TypeInfo xen_accel_ops_type = {
     .parent = TYPE_ACCEL_OPS,
     .class_init = xen_accel_ops_class_init,
+    .class_size = sizeof(XenAccelOpsClass),
     .abstract = true,
 };
---
[xen-unstable test] 184705: tolerable FAIL - PUSHED
flight 184705 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184705/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt                16 saverestore-support-check  fail like 184699
 test-amd64-amd64-xl-qemut-win7-amd64    19 guest-stop                 fail like 184699
 test-amd64-i386-xl-qemuu-win7-amd64     19 guest-stop                 fail like 184699
 test-amd64-amd64-xl-qemuu-ws16-amd64    19 guest-stop                 fail like 184699
 test-amd64-i386-xl-qemut-ws16-amd64     19 guest-stop                 fail like 184699
 test-amd64-i386-xl-qemut-win7-amd64     19 guest-stop                 fail like 184699
 test-amd64-amd64-xl-qemut-ws16-amd64    19 guest-stop                 fail like 184699
 test-armhf-armhf-libvirt-qcow2          15 saverestore-support-check  fail like 184699
 test-armhf-armhf-libvirt-raw            15 saverestore-support-check  fail like 184699
 test-amd64-amd64-qemuu-nested-amd       20 debian-hvm-install/l1/l2   fail like 184699
 test-amd64-i386-xl-qemuu-ws16-amd64     19 guest-stop                 fail like 184699
 test-amd64-amd64-xl-qemuu-win7-amd64    19 guest-stop                 fail like 184699
 test-amd64-i386-libvirt-xsm             15 migrate-support-check      fail never pass
 test-amd64-i386-libvirt                 15 migrate-support-check      fail never pass
 test-amd64-i386-xl-pvshim               14 guest-start                fail never pass
 test-amd64-amd64-libvirt                15 migrate-support-check      fail never pass
 test-amd64-amd64-libvirt-xsm            15 migrate-support-check      fail never pass
 test-arm64-arm64-xl                     15 migrate-support-check      fail never pass
 test-arm64-arm64-xl                     16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-credit2             15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-credit2             16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-credit1             15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-credit1             16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-thunderx            15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-thunderx            16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-xsm                 15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-xsm                 16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-libvirt                15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-rtds                15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-rtds                16 saverestore-support-check  fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm            15 migrate-support-check      fail never pass
 test-arm64-arm64-libvirt-xsm            16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl                     15 migrate-support-check      fail never pass
 test-armhf-armhf-xl                     16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit2             15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit2             16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-arndale             15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-arndale             16 saverestore-support-check  fail never pass
 test-amd64-i386-libvirt-raw             14 migrate-support-check      fail never pass
 test-amd64-amd64-libvirt-vhd            14 migrate-support-check      fail never pass
 test-arm64-arm64-xl-vhd                 14 migrate-support-check      fail never pass
 test-arm64-arm64-xl-vhd                 15 saverestore-support-check  fail never pass
 test-arm64-arm64-libvirt-raw            14 migrate-support-check      fail never pass
 test-arm64-arm64-libvirt-raw            15 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-vhd                 14 migrate-support-check      fail never pass
 test-armhf-armhf-xl-vhd                 15 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-multivcpu           15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-multivcpu           16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit1             15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit1             16 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt-qcow2          14 migrate-support-check      fail never pass
 test-armhf-armhf-libvirt-raw            14 migrate-support-check      fail never pass

version targeted for testing:
 xen                  c144b9e32427ba37e0e0443a0d3fa53e9fb878b6
baseline version:
 xen                  0441c3acc7e9e72e984ce49d32e61827894ae4a3

Last test of basis   184699  2024-02-19 01:52:12 Z    1 days
Testing same since   184703  2024-02-19 14:38:55 Z    0 days    2 attempts

People who touched revisions under test:
  Andrew Cooper
  Anthony PERARD
  Frediano Ziglio
Re: [PATCH] ns16550: add Asix AX99100 serial card
On Mon, Feb 19, 2024 at 12:13:18PM +0100, Marek Marczykowski-Górecki wrote:
> On Mon, Feb 19, 2024 at 09:57:49AM +0100, Jan Beulich wrote:
> > On 18.02.2024 02:34, Marek Marczykowski-Górecki wrote:
> > > @@ -1170,6 +1177,11 @@ static const struct ns16550_config __initconst uart_config[] =
> > >          .dev_id = 0x7adc,
> > >          .param = param_intel_lpss
> > >      },
> > > +    {
> > > +        .vendor_id = PCI_VENDOR_ID_ASIX,
> > > +        .dev_id = 9100,
> >
> > As per Linux this is 0x9100.
>
> Right... but then, maybe the patch isn't needed at all, as it does work
> for me. Maybe what's needed instead is some other patch already in
> staging. Initial attempt that did not work was with 4.17.something.
> I guess setting the fifo size isn't that important.

Indeed, the patch is not needed after all, plain "master" from today
works.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Re: [PATCH v6 02/15] xen/arm: add initial support for LLC coloring on arm64
Hi,

On 14/02/2024 13:52, Carlo Nonato wrote:
> On Wed, Feb 14, 2024 at 11:14 AM Michal Orzel wrote:
>> diff --git a/xen/arch/arm/include/asm/processor.h b/xen/arch/arm/include/asm/processor.h
>> index 8e02410465..336933ee62 100644
>> --- a/xen/arch/arm/include/asm/processor.h
>> +++ b/xen/arch/arm/include/asm/processor.h
>> @@ -18,6 +18,22 @@
>>  #define CTR_IDC_SHIFT 28
>>  #define CTR_DIC_SHIFT 29
>>
>> +/* CCSIDR Current Cache Size ID Register */
>> +#define CCSIDR_LINESIZE_MASK _AC(0x7, ULL)
>>
>> Why ULL and not UL? ccsidr is of register_t type
>
> Julien, while reviewing an earlier version:
>
>   Please use ULL here otherwise someone using MASK << SHIFT will have
>   the expected result.
>
> https://patchew.org/Xen/20220826125111.152261-1-carlo.non...@minervasys.tech/20220826125111.152261-2-carlo.non...@minervasys.tech/#08956082-c194-8bae-cb25-44e4e3227...@xen.org

Michal is right. This should be UL. Not sure why I suggested ULL back
then. Sorry.

Cheers,

-- 
Julien Grall
Re: [PATCH 2/2] xen: fix stubdom PCI addr
On Mon, Feb 19, 2024 at 07:16:06PM +0100, Marek Marczykowski-Górecki wrote:
> From: Frédéric Pierret (fepitre)

This shouldn't be here, it's my patch.

> When running in a stubdomain, the config space access via sysfs needs to
> use BDF as seen inside stubdomain (connected via xen-pcifront), which is
> different from the real BDF. For other purposes (hypercall parameters
> etc), the real BDF needs to be used.
> Get the in-stubdomain BDF by looking up relevant PV PCI xenstore
> entries.
>
> Signed-off-by: Marek Marczykowski-Górecki
> ---
>  hw/xen/xen-host-pci-device.c | 77 +++-
>  hw/xen/xen-host-pci-device.h |  6 +++
>  2 files changed, 82 insertions(+), 1 deletion(-)
>
> diff --git a/hw/xen/xen-host-pci-device.c b/hw/xen/xen-host-pci-device.c
> index 8c6e9a1716..3f8a6f84a8 100644
> --- a/hw/xen/xen-host-pci-device.c
> +++ b/hw/xen/xen-host-pci-device.c
> @@ -9,6 +9,7 @@
>  #include "qemu/osdep.h"
>  #include "qapi/error.h"
>  #include "qemu/cutils.h"
> +#include "hw/xen/xen-legacy-backend.h"
>  #include "xen-host-pci-device.h"
>
>  #define XEN_HOST_PCI_MAX_EXT_CAP \
> @@ -33,13 +34,76 @@
>  #define IORESOURCE_PREFETCH 0x1000  /* No side effects */
>  #define IORESOURCE_MEM_64   0x0010
>
> +static void xen_host_pci_fill_local_addr(XenHostPCIDevice *d, Error **errp)
> +{
> +    unsigned int num_devs, len, i;
> +    unsigned int domain, bus, dev, func;
> +    char *be_path = NULL;
> +    char path[80];
> +    char *msg = NULL;
> +
> +    be_path = qemu_xen_xs_read(xenstore, 0, "device/pci/0/backend", &len);
> +    if (!be_path)
> +        goto err_out;
> +    snprintf(path, sizeof(path), "%s/num_devs", be_path);
> +    msg = qemu_xen_xs_read(xenstore, 0, path, &len);
> +    if (!msg)
> +        goto err_out;
> +
> +    if (sscanf(msg, "%u", &num_devs) != 1) {
> +        error_setg(errp, "Failed to parse %s (%s)\n", msg, path);
> +        goto err_out;
> +    }
> +    free(msg);
> +
> +    for (i = 0; i < num_devs; i++) {
> +        snprintf(path, sizeof(path), "%s/dev-%u", be_path, i);
> +        msg = qemu_xen_xs_read(xenstore, 0, path, &len);
> +        if (!msg) {
> +            error_setg(errp, "Failed to read %s\n", path);
> +            goto err_out;
> +        }
> +        if (sscanf(msg, "%x:%x:%x.%x", &domain, &bus, &dev, &func) != 4) {
> +            error_setg(errp, "Failed to parse %s (%s)\n", msg, path);
> +            goto err_out;
> +        }
> +        free(msg);
> +        if (domain != d->domain ||
> +            bus != d->bus ||
> +            dev != d->dev ||
> +            func != d->func)
> +            continue;
> +        snprintf(path, sizeof(path), "%s/vdev-%u", be_path, i);
> +        msg = qemu_xen_xs_read(xenstore, 0, path, &len);
> +        if (!msg) {
> +            error_setg(errp, "Failed to read %s\n", path);
> +            goto out;
> +        }
> +        if (sscanf(msg, "%x:%x:%x.%x", &domain, &bus, &dev, &func) != 4) {
> +            error_setg(errp, "Failed to parse %s (%s)\n", msg, path);
> +            goto err_out;
> +        }
> +        free(msg);
> +        d->local_domain = domain;
> +        d->local_bus = bus;
> +        d->local_dev = dev;
> +        d->local_func = func;
> +        goto out;
> +    }
> +
> +err_out:
> +    free(msg);
> +out:
> +    free(be_path);
> +}
> +
>  static void xen_host_pci_sysfs_path(const XenHostPCIDevice *d,
>                                      const char *name, char *buf, ssize_t size)
>  {
>      int rc;
>
>      rc = snprintf(buf, size, "/sys/bus/pci/devices/%04x:%02x:%02x.%d/%s",
> -                  d->domain, d->bus, d->dev, d->func, name);
> +                  d->local_domain, d->local_bus, d->local_dev,
> +                  d->local_func, name);
>      assert(rc >= 0 && rc < size);
>  }
>
> @@ -342,6 +406,17 @@ void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
>      d->dev = dev;
>      d->func = func;
>
> +    if (xen_is_stubdomain) {
> +        xen_host_pci_fill_local_addr(d, errp);
> +        if (*errp)
> +            goto error;
> +    } else {
> +        d->local_domain = d->domain;
> +        d->local_bus = d->bus;
> +        d->local_dev = d->dev;
> +        d->local_func = d->func;
> +    }
> +
>      xen_host_pci_config_open(d, errp);
>      if (*errp) {
>          goto error;
> diff --git a/hw/xen/xen-host-pci-device.h b/hw/xen/xen-host-pci-device.h
> index 4d8d34ecb0..270dcb27f7 100644
> --- a/hw/xen/xen-host-pci-device.h
> +++ b/hw/xen/xen-host-pci-device.h
> @@ -23,6 +23,12 @@ typedef struct XenHostPCIDevice {
>      uint8_t dev;
>      uint8_t func;
>
> +    /* different from the above in case of stubdomain */
> +    uint16_t local_domain;
> +    uint8_t local_bus;
> +    uint8_t local_dev;
> +    uint8_t local_func;
> +
>      uint16_t vendor_id;
>      uint16_t device_id;
>      uint32_t class_code;
> --
> 2.43.0

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Re: [PATCH 2/2] almost fully ignore zero-size flush requests
Hi Jan,

Title: I would add 'gnttab:' to clarify which subsystem you are
modifying.

On 05/02/2024 11:03, Jan Beulich wrote:
> Along the line with observations in the context of XSA-448, besides
> "op" no field is relevant when the range to be flushed is empty, much
> like e.g. the pointers passed to memcpy() are irrelevant (and would
> never be "validated") when the passed length is zero.
>
> Split the existing condition validating "op", "offset", and "length",
> leaving only the "op" part ahead of the check for length being zero
> (or no flushing to be performed).

I am probably missing something here. I understand the theory behind
reducing the number of checks when len == 0. But an OS cannot rely on
it (at least):
 1) older hypervisor would still return an error if the check doesn't
    pass
 2) it does feel odd to allow "invalid" offset when len == 0

So to me, it is better to keep those checks early. That said, I agree
this is a matter of opinion, so I will not Nack it but also I will not
Ack it.

> In the course of splitting also simplify the moved part of the
> condition from 3 to 2 conditionals, potentially (depending on the
> architecture) requiring one less (conditional) branch.
>
> Signed-off-by: Jan Beulich
>
> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -3528,15 +3528,16 @@ static int _cache_flush(const gnttab_cac
>      void *v;
>      int ret;
>
> -    if ( (cflush->offset >= PAGE_SIZE) ||
> -         (cflush->length > PAGE_SIZE) ||
> -         (cflush->offset + cflush->length > PAGE_SIZE) ||
> -         (cflush->op & ~(GNTTAB_CACHE_INVAL | GNTTAB_CACHE_CLEAN)) )
> +    if ( cflush->op & ~(GNTTAB_CACHE_INVAL | GNTTAB_CACHE_CLEAN) )
>          return -EINVAL;
>
>      if ( cflush->length == 0 || cflush->op == 0 )
>          return !*cur_ref ? 0 : -EILSEQ;
>
> +    if ( (cflush->offset | cflush->length) > PAGE_SIZE ||

This is confusing. I understand you are trying to force the compiler to
optimize. But is it really worth it? After all, the rest of the
operation will outweigh this check (cache flushes are quite expensive).
We probably should take a more generic decision (and encode it in our
policy) because you seem to like this pattern and I dislike it :). Not
sure what the others think.

Cheers,

-- 
Julien Grall
[xen-unstable test] 184703: regressions - FAIL
flight 184703 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184703/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-dom0pvh-xl-amd         14 guest-start          fail REGR. vs. 184699
 build-arm64-pvops                        6 kernel-build         fail REGR. vs. 184699

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-examine                 1 build-check(1)             blocked  n/a
 test-arm64-arm64-libvirt-raw             1 build-check(1)             blocked  n/a
 test-arm64-arm64-libvirt-xsm             1 build-check(1)             blocked  n/a
 test-arm64-arm64-xl                      1 build-check(1)             blocked  n/a
 test-arm64-arm64-xl-credit1              1 build-check(1)             blocked  n/a
 test-arm64-arm64-xl-credit2              1 build-check(1)             blocked  n/a
 test-arm64-arm64-xl-thunderx             1 build-check(1)             blocked  n/a
 test-arm64-arm64-xl-vhd                  1 build-check(1)             blocked  n/a
 test-arm64-arm64-xl-xsm                  1 build-check(1)             blocked  n/a
 test-armhf-armhf-libvirt                16 saverestore-support-check  fail like 184699
 test-amd64-amd64-xl-qemut-win7-amd64    19 guest-stop                 fail like 184699
 test-amd64-i386-xl-qemuu-win7-amd64     19 guest-stop                 fail like 184699
 test-amd64-amd64-xl-qemuu-ws16-amd64    19 guest-stop                 fail like 184699
 test-amd64-i386-xl-qemut-ws16-amd64     19 guest-stop                 fail like 184699
 test-amd64-i386-xl-qemut-win7-amd64     19 guest-stop                 fail like 184699
 test-amd64-amd64-xl-qemut-ws16-amd64    19 guest-stop                 fail like 184699
 test-armhf-armhf-libvirt-qcow2          15 saverestore-support-check  fail like 184699
 test-armhf-armhf-libvirt-raw            15 saverestore-support-check  fail like 184699
 test-amd64-amd64-qemuu-nested-amd       20 debian-hvm-install/l1/l2   fail like 184699
 test-amd64-i386-xl-qemuu-ws16-amd64     19 guest-stop                 fail like 184699
 test-amd64-amd64-xl-qemuu-win7-amd64    19 guest-stop                 fail like 184699
 test-amd64-i386-libvirt-xsm             15 migrate-support-check      fail never pass
 test-amd64-i386-libvirt                 15 migrate-support-check      fail never pass
 test-amd64-i386-xl-pvshim               14 guest-start                fail never pass
 test-amd64-amd64-libvirt                15 migrate-support-check      fail never pass
 test-amd64-amd64-libvirt-xsm            15 migrate-support-check      fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-libvirt                15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-rtds                15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-rtds                16 saverestore-support-check  fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl                     15 migrate-support-check      fail never pass
 test-armhf-armhf-xl                     16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit2             15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit2             16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-arndale             15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-arndale             16 saverestore-support-check  fail never pass
 test-amd64-i386-libvirt-raw             14 migrate-support-check      fail never pass
 test-amd64-amd64-libvirt-vhd            14 migrate-support-check      fail never pass
 test-armhf-armhf-xl-vhd                 14 migrate-support-check      fail never pass
 test-armhf-armhf-xl-vhd                 15 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-multivcpu           15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-multivcpu           16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit1             15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit1             16 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt-qcow2          14 migrate-support-check      fail never pass
 test-armhf-armhf-libvirt-raw            14 migrate-support-check      fail never pass

version targeted for testing:
 xen                  c144b9e32427ba37e0e0443a0d3fa53e9fb878b6
baseline version:
 xen                  0441c3acc7e9e72e984ce49d32e61827894ae4a3

Last test of basis   184699  2024-02-19 01:52:12 Z    0 days
Testing same since   184703  2024-02-19 14:38:55 Z    0 days    1 attempts

People who touched revisions under test:
  Andrew Cooper
  Anthony PERARD
  Frediano Ziglio
  Julien Grall
  Marek Marczykowski-Górecki
  Oleksandr Tyshchenko
  Roger Pau Monné

jobs:
 build-amd64-xsm                pass
 build-arm64-xsm                pass
 build-i386-xsm                 pass
 build-amd64-xtf                pass
Re: [PATCH 1/2] gnttab: fully ignore zero-size copy requests
Hi Jan,

On 05/02/2024 11:03, Jan Beulich wrote:
> Along the line with observations in the context of XSA-448, no field
> in struct gnttab_copy_ptr is relevant when no data is to be copied,
> much like e.g. the pointers passed to memcpy() are irrelevant (and
> would never be "validated") when the passed length is zero.
>
> Signed-off-by: Jan Beulich

Reviewed-by: Julien Grall

Cheers,

-- 
Julien Grall
Re: [XEN PATCH v2 2/3] x86/uaccess: replace __{get,put}_user_bad() with STATIC_ASSERT_UNREACHABLE()
On Mon, 19 Feb 2024, Federico Serafini wrote:
> On 15/02/24 11:32, Jan Beulich wrote:
> > The important difference is: Here we're told that there was a use of
> > __put_user_bad, which is easy to grep for, and thus see how the
> > supplied function / file / line(?) relate to the ultimate problem.
> >
> > I'm afraid I'm meanwhile confused enough by the various replies
> > containing results of experimentation that I can't really tell
> > anymore what case is best. Hence I can only restate my expectation
> > for an eventual v3: Diagnosing what the issue is, no matter whether
> > the new macro is used in another macro or in an inline function,
> > should not become meaningfully more difficult. In how far this is
> > the case wants clarifying in the description of the change.
>
> I think the best thing at the moment is to deviate
> __{get,put}_user_bad() for Rule 16.3.
> I'll let maintainers further explore the possibility of having a
> compile-time assertion based on the assembler error.

OK. I hope Jan is OK to deviate by in-code comment.
Re: [PATCH v9 4/7] xen/asm-generic: introduce generic device.h
Hi Oleksii,

On 16/02/2024 12:39, Oleksii Kurochko wrote:
> Arm, PPC and RISC-V introduce the same things in asm/device.h, so
> generic device.h was introduced.
>
> Arm's device.h was taken as a base with the following changes:
>  - #ifdef ACPI related things.
>  - Rename #ifdef guards.
>  - Add SPDX tag.
>  - #ifdef CONFIG_HAS_DEVICE_TREE related things.
>  - #ifdef-ing iommu related things with CONFIG_HAS_PASSTHROUGH.
>
> Signed-off-by: Oleksii Kurochko
> Acked-by: Jan Beulich

Acked-by: Julien Grall

Cheers,

-- 
Julien Grall
Re: [PATCH v8 5/7] xen/asm-generic: introduce generic device.h
Hi Oleksii,

On 15/02/2024 16:54, Oleksii wrote:
On 14/02/2024 09:32, Oleksii wrote:
On Tue, 2024-02-13 at 18:09 +, Julien Grall wrote:

+#ifdef CONFIG_HAS_PASSTHROUGH
+    struct iommu_fwspec *iommu_fwspec; /* per-device IOMMU instance data */
+#endif
+};
+
+typedef struct device device_t;
+
+#ifdef CONFIG_HAS_DEVICE_TREE
+
+#include
+
+#define dev_is_dt(dev) ((dev)->type == DEV_DT)
+
+/**
+ * device_init - Initialize a device
+ * @dev: device to initialize
+ * @class: class of the device (serial, network...)
+ * @data: specific data for initializing the device
+ *
+ * Return 0 on success.
+ */
+int device_init(struct dt_device_node *dev, enum device_class class,
+                const void *data);
+
+/**
+ * device_get_type - Get the type of the device
+ * @dev: device to match
+ *
+ * Return the device type on success or DEVICE_ANY on failure
+ */
+enum device_class device_get_class(const struct dt_device_node *dev);
+
+#define DT_DEVICE_START(name_, namestr_, class_)          \
+static const struct device_desc __dev_desc_##name_ __used \
+__section(".dev.info") = {                                \
+    .name = namestr_,                                     \
+    .class = class_,
+
+#define DT_DEVICE_END \
+};
+
+#else /* !CONFIG_HAS_DEVICE_TREE */
+#define dev_is_dt(dev) ((void)(dev), false)
+#endif /* CONFIG_HAS_DEVICE_TREE */
+
+#define dev_is_pci(dev) ((dev)->type == DEV_PCI)
+
+struct device_desc {
+    /* Device name */
+    const char *name;
+    /* Device class */
+    enum device_class class;
+
+#ifdef CONFIG_HAS_DEVICE_TREE
+
+    /* List of devices supported by this driver */
+    const struct dt_device_match *dt_match;
+    /*
+     * Device initialization.
+     *
+     * -EAGAIN is used to indicate that device probing is deferred.
+     */
+    int (*init)(struct dt_device_node *dev, const void *data);
+
+#endif
+};

I am not sure I fully understand why "device_desc" is not protected by
CONFIG_HAS_DEVICE_TREE. The structure doesn't mean much when the config
is disabled. Can you clarify?

I thought that one day struct device_desc and acpi_device_desc will be
"merged", and so decided just to #ifdef only DEVICE_TREE specific
fields.

It might be possible to merge the two if we were using a union for the
ACPI/DT specific part. However the majority of the parsing code needs
to differ. So I am not convinced there would be any value to merge the
two structures.

In this case, let's have two separate structures.

This is not the current situation, and I don't have a specific example.
It appears that all architectures will use Device Tree or ACPI.
However, does it make sense to keep 'struct device_desc' more generic
to accommodate non-DT or non-ACPI cases?

I am not entirely sure what else to say. As I wrote before yes it could
be made generic. But right now I don't see any value. If you have any
idea how to share the structure, then feel free to make a proposal and
I will review it.

Cheers,

-- 
Julien Grall
Re: [PATCH v9 7/7] xen/asm-generic: fold struct devarch into struct dev
Hi,

On 16/02/2024 12:39, Oleksii Kurochko wrote:
> The current patch is a follow-up to the patch titled:
>
>   xen/asm-generic: introduce generic device.h
>
> Also, a prerequisite for this patch is, without which a compilation
> error will occur:
>
>   xen/arm: switch Arm to use asm-generic/device.h
>
> The 'struct dev_archdata' is exclusively used within 'struct device',
> so it could be merged into 'struct device.' After the merger, it is
> necessary to update the 'dev_archdata()' macros and the comments above
> 'struct arm_smmu_xen_device' in drivers/passthrough/arm/smmu.c.
> Additionally, it is required to update instances of
> "dev->archdata->iommu" to "dev->iommu".
>
> Suggested-by: Julien Grall
> Signed-off-by: Oleksii Kurochko
> ---
> This patch can be merged with patches 4 and 5 of this patch series.

I am a bit puzzled with this comment. If this is the case, then why was
it done in a separate patch? I know I suggested to create the separate
patch but this was only in the case you decided to handle it after this
series is merged. So this should have been merged when sending. Maybe I
should have been clearer.

Anyway, regardless of that:

Reviewed-by: Julien Grall

Cheers,

-- 
Julien Grall
Re: [PATCH v9 7/7] xen/asm-generic: fold struct devarch into struct dev
Hi Jan,

On 19/02/2024 11:26, Jan Beulich wrote:
> On 16.02.2024 13:39, Oleksii Kurochko wrote:
>> The current patch is a follow-up to the patch titled:
>>
>>   xen/asm-generic: introduce generic device.h
>>
>> Also, a prerequisite for this patch is, without which a compilation
>> error will occur:
>>
>>   xen/arm: switch Arm to use asm-generic/device.h
>>
>> The 'struct dev_archdata' is exclusively used within 'struct device',
>> so it could be merged into 'struct device.' After the merger, it is
>> necessary to update the 'dev_archdata()' macros and the comments
>> above 'struct arm_smmu_xen_device' in
>> drivers/passthrough/arm/smmu.c. Additionally, it is required to
>> update instances of "dev->archdata->iommu" to "dev->iommu".
>>
>> Suggested-by: Julien Grall
>> Signed-off-by: Oleksii Kurochko
>> ---
>> This patch can be merged with patches 4 and 5 of this patch series.
>> ---
>> Changes in V9:
>>  - newly introduced patch.
>> ---
>>  xen/drivers/passthrough/arm/smmu.c | 12 ++--
>>  xen/include/asm-generic/device.h   |  8 +---
>>  2 files changed, 7 insertions(+), 13 deletions(-)
>>
>> diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c
>> index 32e2ff279b..4a272c8779 100644
>> --- a/xen/drivers/passthrough/arm/smmu.c
>> +++ b/xen/drivers/passthrough/arm/smmu.c
>> @@ -227,9 +227,9 @@ struct arm_smmu_xen_domain {
>>  };
>>
>>  /*
>> - * Xen: Information about each device stored in dev->archdata.iommu
>> + * Xen: Information about each device stored in dev->iommu
>>   *
>> - * Initially dev->archdata.iommu only stores the iommu_domain (runtime
>> + * Initially dev->iommu only stores the iommu_domain (runtime
>>   * configuration of the SMMU) but, on Xen, we also have to store the
>>   * iommu_group (list of streamIDs associated to the device).
>>   *
>> @@ -242,7 +242,7 @@ struct arm_smmu_xen_device {
>>      struct iommu_group *group;
>>  };
>>
>> -#define dev_archdata(dev) ((struct arm_smmu_xen_device *)dev->archdata.iommu)
>> +#define dev_archdata(dev) ((struct arm_smmu_xen_device *)dev->iommu)
>
> I find in particular the naming here odd. But I'll let Julien judge
> whether this really is along the lines of what he suggested.

It is. The naming is not great, but the SMMU code is intended to stay
close to Linux. So we want to do the minimum amount of change (at least
until we decide we should diverge).

Cheers,

-- 
Julien Grall
Re: [XEN v4 2/3] xen/arm: arm64: Add emulation of Debug Data Transfer Registers
Hi Michal, On 19/02/2024 15:43, Michal Orzel wrote: On 19/02/2024 15:45, Ayan Kumar Halder wrote: On 06/02/2024 19:05, Julien Grall wrote: Hi Ayan, Hi Julien/Michal, On 31/01/2024 12:10, Ayan Kumar Halder wrote: From: Michal Orzel Currently, if user enables HVC_DCC config option in Linux, it invokes access to debug data transfer registers (i.e. DBGDTRTX_EL0 on arm64, DBGDTRTXINT on arm32). As these registers are not emulated, Xen injects an undefined exception to the guest and Linux crashes. To prevent this crash, introduce a partial emulation of DBGDTR[TR]X_EL0 (these registers share the same encoding) as RAZ/WI and MDCCSR_EL0 as TXfull. Refer ARM DDI 0487J.a ID042523, D19.3.8, DBGDTRTX_EL0 "If TXfull is set to 1, set DTRRX and DTRTX to UNKNOWN". Thus, any OS is expected to read MDCCSR_EL0 and check for TXfull before using DBGDTRTX_EL0. Linux does it via hvc_dcc_init() ---> hvc_dcc_check(), and returns -ENODEV in case TXfull bit is still set after writing a test character. This way we prevent the guest from making use of HVC DCC as a console. Signed-off-by: Michal Orzel Signed-off-by: Ayan Kumar Halder --- Changes from v1 :- 1. DBGDTR_EL0 does not emulate RXfull. This is to avoid giving the OS any indication that the RX buffer is full and is waiting to be read. 2. In Arm32, DBGOSLSR is emulated. Also DBGDTRTXINT is emulated at EL0 only. 3. Fixed the commit message and inline code comments. v2 :- 1. Split the patch into two (separate patches for arm64 and arm32). 2. Removed the "fail" label. 3. Fixed the commit message. v3 :- 1. "HSR_SYSREG_MDCCSR_EL0" emulation differs based on whether partial_emulation_enabled is true or not. 2. If partial_emulation_enabled is false, then access to HSR_SYSREG_DBGDTR_EL0, HSR_SYSREG_DBGDTRTX_EL0 would lead to undefined exception. 
xen/arch/arm/arm64/vsysreg.c | 28 xen/arch/arm/include/asm/arm64/hsr.h | 3 +++ 2 files changed, 27 insertions(+), 4 deletions(-) diff --git a/xen/arch/arm/arm64/vsysreg.c b/xen/arch/arm/arm64/vsysreg.c index b5d54c569b..94f0a6c384 100644 --- a/xen/arch/arm/arm64/vsysreg.c +++ b/xen/arch/arm/arm64/vsysreg.c @@ -159,9 +159,6 @@ void do_sysreg(struct cpu_user_regs *regs, * * Unhandled: * MDCCINT_EL1 - * DBGDTR_EL0 - * DBGDTRRX_EL0 - * DBGDTRTX_EL0 * OSDTRRX_EL1 * OSDTRTX_EL1 * OSECCR_EL1 @@ -173,10 +170,32 @@ void do_sysreg(struct cpu_user_regs *regs, return handle_raz_wi(regs, regidx, hsr.sysreg.read, hsr, 1); case HSR_SYSREG_MDCCSR_EL0: /* + * Xen doesn't expose a real (or emulated) Debug Communications Channel + * (DCC) to a domain. Yet the Arm ARM implies this is not an optional + * feature. So some domains may start to probe it. For instance, the + * HVC_DCC driver in Linux (since f35dc083 and at least up to v6.7), + * will try to write some characters and check if the transmit buffer + * has emptied. + * + * By setting TX status bit (only if partial emulation is enabled) to + * indicate the transmit buffer is full, we would hint the OS that the + * DCC is probably not working. + * + * Bit 29: TX full + * * Accessible at EL0 only if MDSCR_EL1.TDCC is set to 0. We emulate that * register as RAZ/WI above. So RO at both EL0 and EL1. The sentence "we emulate that register as ..." seems to be stale? I can see that you tried to handle Julien's remark here. But I disagree. This statement is not stale. It explains that because MDSCR_EL1 is RAZ/WI, MDCCSR_EL0 is RO at both EL0 and EL1. This patch does not change this behavior. Indeed. I misread the comment. So what I wrote can be ignored here. */ - return handle_ro_raz(regs, regidx, hsr.sysreg.read, hsr, 0); + return handle_ro_read_val(regs, regidx, hsr.sysreg.read, hsr, 0, + partial_emulation ? 
(1U << 29) : 0); + + case HSR_SYSREG_DBGDTR_EL0: + /* DBGDTR[TR]X_EL0 share the same encoding */ + case HSR_SYSREG_DBGDTRTX_EL0: + if ( !partial_emulation ) + goto fail; + return handle_raz_wi(regs, regidx, hsr.sysreg.read, hsr, 0); AFAICT, all the emulation helpers have an explanation why we are using them. But here this is not the case. Can you add one? This and.. + HSR_SYSREG_DBG_CASES(DBGBVR): HSR_SYSREG_DBG_CASES(DBGBCR): HSR_SYSREG_DBG_CASES(DBGWVR): @@ -394,6 +413,7 @@ void do_sysreg(struct cpu_user_regs *regs, * And all other unknown registers. */ default: + fail: AFAICT, this would violate MISRA 15.3 [1]. We don't seem to have accepted the rule yet (?), but I don't see why we would not, given that I feel this is similar to what Rule 16.2 is trying to prevent, and we accepted that one. In which case, I would move all the code within default outside. And then
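For illustration, the restructuring suggested here (moving the body of `default:` out of the switch so that no `goto fail` jumps into a case label) could look like the following standalone sketch. The encoding value and the helpers are simplified stand-ins, not the real Xen code:

```c
#include <stdbool.h>

/* Stand-ins for the real Xen helpers and register encodings. */
static int inject_undef_exception(void) { return -1; }
static int handle_raz_wi(void)          { return  0; }

#define REG_DBGDTR_EL0 1u  /* hypothetical stand-in encoding */

static int emulate_sysreg(unsigned int regidx, bool partial_emulation)
{
    switch ( regidx )
    {
    case REG_DBGDTR_EL0:
        if ( !partial_emulation )
            break;             /* leave the switch; common path below */
        return handle_raz_wi();
    default:
        break;                 /* all other unknown registers */
    }

    /* Formerly the body of `default:`; now shared without any goto. */
    return inject_undef_exception();
}
```

This keeps a single exit path for "unhandled register" while satisfying the MISRA restriction on jumping into a nested block.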
[PATCH 2/2] xen: fix stubdom PCI addr
From: Frédéric Pierret (fepitre) When running in a stubdomain, the config space access via sysfs needs to use BDF as seen inside stubdomain (connected via xen-pcifront), which is different from the real BDF. For other purposes (hypercall parameters etc), the real BDF needs to be used. Get the in-stubdomain BDF by looking up relevant PV PCI xenstore entries. Signed-off-by: Marek Marczykowski-Górecki --- hw/xen/xen-host-pci-device.c | 77 +++- hw/xen/xen-host-pci-device.h | 6 +++ 2 files changed, 82 insertions(+), 1 deletion(-) diff --git a/hw/xen/xen-host-pci-device.c b/hw/xen/xen-host-pci-device.c index 8c6e9a1716..3f8a6f84a8 100644 --- a/hw/xen/xen-host-pci-device.c +++ b/hw/xen/xen-host-pci-device.c @@ -9,6 +9,7 @@ #include "qemu/osdep.h" #include "qapi/error.h" #include "qemu/cutils.h" +#include "hw/xen/xen-legacy-backend.h" #include "xen-host-pci-device.h" #define XEN_HOST_PCI_MAX_EXT_CAP \ @@ -33,13 +34,76 @@ #define IORESOURCE_PREFETCH 0x1000 /* No side effects */ #define IORESOURCE_MEM_64 0x0010 +static void xen_host_pci_fill_local_addr(XenHostPCIDevice *d, Error **errp) +{ +unsigned int num_devs, len, i; +unsigned int domain, bus, dev, func; +char *be_path = NULL; +char path[80]; +char *msg = NULL; + +be_path = qemu_xen_xs_read(xenstore, 0, "device/pci/0/backend", &len); +if (!be_path) +goto err_out; +snprintf(path, sizeof(path), "%s/num_devs", be_path); +msg = qemu_xen_xs_read(xenstore, 0, path, &len); +if (!msg) +goto err_out; + +if (sscanf(msg, "%u", &num_devs) != 1) { +error_setg(errp, "Failed to parse %s (%s)\n", msg, path); +goto err_out; +} +free(msg); + +for (i = 0; i < num_devs; i++) { +snprintf(path, sizeof(path), "%s/dev-%u", be_path, i); +msg = qemu_xen_xs_read(xenstore, 0, path, &len); +if (!msg) { +error_setg(errp, "Failed to read %s\n", path); +goto err_out; +} +if (sscanf(msg, "%x:%x:%x.%x", &domain, &bus, &dev, &func) != 4) { +error_setg(errp, "Failed to parse %s (%s)\n", msg, path); +goto err_out; +} +free(msg); +if (domain != d->domain || +bus != d->bus || +dev != d->dev || +func != d->func) +continue; +snprintf(path, sizeof(path), "%s/vdev-%u", be_path, i); +msg = qemu_xen_xs_read(xenstore, 0, path, &len); +if (!msg) { +error_setg(errp, "Failed to read %s\n", path); +goto out; +} +if (sscanf(msg, "%x:%x:%x.%x", &domain, &bus, &dev, &func) != 4) { +error_setg(errp, "Failed to parse %s (%s)\n", msg, path); +goto err_out; +} +free(msg); +d->local_domain = domain; +d->local_bus = bus; +d->local_dev = dev; +d->local_func = func; +goto out; +} + +err_out: +free(msg); +out: +free(be_path); +} + static void xen_host_pci_sysfs_path(const XenHostPCIDevice *d, const char *name, char *buf, ssize_t size) { int rc; rc = snprintf(buf, size, "/sys/bus/pci/devices/%04x:%02x:%02x.%d/%s", - d->domain, d->bus, d->dev, d->func, name); + d->local_domain, d->local_bus, d->local_dev, d->local_func, name); assert(rc >= 0 && rc < size); } @@ -342,6 +406,17 @@ void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain, d->dev = dev; d->func = func; +if (xen_is_stubdomain) { +xen_host_pci_fill_local_addr(d, errp); +if (*errp) +goto error; +} else { +d->local_domain = d->domain; +d->local_bus = d->bus; +d->local_dev = d->dev; +d->local_func = d->func; +} + xen_host_pci_config_open(d, errp); if (*errp) { goto error; diff --git a/hw/xen/xen-host-pci-device.h b/hw/xen/xen-host-pci-device.h index 4d8d34ecb0..270dcb27f7 100644 --- a/hw/xen/xen-host-pci-device.h +++ b/hw/xen/xen-host-pci-device.h @@ -23,6 +23,12 @@ typedef struct XenHostPCIDevice { uint8_t dev; uint8_t func; +/* different from the above in case of stubdomain */ +uint16_t local_domain; +uint8_t local_bus; +uint8_t local_dev; +uint8_t local_func; + uint16_t vendor_id; uint16_t device_id; uint32_t class_code; -- 2.43.0
[PATCH 1/2] hw/xen: detect when running inside stubdomain
Introduce a global xen_is_stubdomain variable, set when qemu is running inside a stubdomain instead of dom0. This will be relevant for subsequent patches, as a few things like accessing PCI config space need to be done differently. Signed-off-by: Marek Marczykowski-Górecki --- hw/xen/xen-legacy-backend.c | 15 +++ include/hw/xen/xen.h | 1 + system/globals.c | 1 + 3 files changed, 17 insertions(+) diff --git a/hw/xen/xen-legacy-backend.c b/hw/xen/xen-legacy-backend.c index 124dd5f3d6..4a09ea2561 100644 --- a/hw/xen/xen-legacy-backend.c +++ b/hw/xen/xen-legacy-backend.c @@ -603,6 +603,19 @@ static void xen_set_dynamic_sysbus(void) machine_class_allow_dynamic_sysbus_dev(mc, TYPE_XENSYSDEV); } +static bool xen_check_stubdomain(void) +{ +char *dm_path = g_strdup_printf("/local/domain/%d/image", xen_domid); +uint32_t dm_domid; +bool is_stubdom = false; + +if (!xenstore_read_int(dm_path, "device-model-domid", &dm_domid)) +is_stubdom = dm_domid != 0; + +g_free(dm_path); +return is_stubdom; +} + void xen_be_init(void) { xenstore = qemu_xen_xs_open(); @@ -616,6 +629,8 @@ void xen_be_init(void) exit(1); } +xen_is_stubdomain = xen_check_stubdomain(); + xen_sysdev = qdev_new(TYPE_XENSYSDEV); sysbus_realize_and_unref(SYS_BUS_DEVICE(xen_sysdev), &error_fatal); xen_sysbus = qbus_new(TYPE_XENSYSBUS, xen_sysdev, "xen-sysbus"); diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h index 37ecc91fc3..ecb89ecfc1 100644 --- a/include/hw/xen/xen.h +++ b/include/hw/xen/xen.h @@ -36,6 +36,7 @@ enum xen_mode { extern uint32_t xen_domid; extern enum xen_mode xen_mode; extern bool xen_domid_restrict; +extern bool xen_is_stubdomain; int xen_pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num); int xen_set_pci_link_route(uint8_t link, uint8_t irq); diff --git a/system/globals.c b/system/globals.c index b6d4e72530..ac27d88bd4 100644 --- a/system/globals.c +++ b/system/globals.c @@ -62,6 +62,7 @@ bool qemu_uuid_set; uint32_t xen_domid; enum xen_mode xen_mode = XEN_DISABLED; bool xen_domid_restrict; +bool xen_is_stubdomain; struct evtchn_backend_ops *xen_evtchn_ops; struct gnttab_backend_ops *xen_gnttab_ops; struct foreignmem_backend_ops *xen_foreignmem_ops; -- 2.43.0
Re: [PATCH v2] Constify some parameters
For XSM changes, Ack-by: Daniel P. Smith V/r, Daniel P. Smith Apertus Solutions, LLC On 2/14/24 04:47, Frediano Ziglio wrote: Make clear they are not changed in the functions. Signed-off-by: Frediano Ziglio Reviewed-by: Jan Beulich -- v2: - fixed typo in commit message. --- xen/arch/x86/pv/callback.c | 4 ++-- xen/common/sched/compat.c | 2 +- xen/common/sched/core.c| 2 +- xen/xsm/flask/flask_op.c | 8 4 files changed, 8 insertions(+), 8 deletions(-) diff --git a/xen/arch/x86/pv/callback.c b/xen/arch/x86/pv/callback.c index 17829304fe..caec4fb16f 100644 --- a/xen/arch/x86/pv/callback.c +++ b/xen/arch/x86/pv/callback.c @@ -48,7 +48,7 @@ static void unregister_guest_nmi_callback(void) memset(t, 0, sizeof(*t)); } -static long register_guest_callback(struct callback_register *reg) +static long register_guest_callback(const struct callback_register *reg) { long ret = 0; struct vcpu *curr = current; @@ -102,7 +102,7 @@ static long register_guest_callback(struct callback_register *reg) return ret; } -static long unregister_guest_callback(struct callback_unregister *unreg) +static long unregister_guest_callback(const struct callback_unregister *unreg) { long ret; diff --git a/xen/common/sched/compat.c b/xen/common/sched/compat.c index dd97593630..a02204ec9a 100644 --- a/xen/common/sched/compat.c +++ b/xen/common/sched/compat.c @@ -26,7 +26,7 @@ CHECK_sched_shutdown; CHECK_sched_remote_shutdown; #undef xen_sched_remote_shutdown -static int compat_poll(struct compat_sched_poll *compat) +static int compat_poll(const struct compat_sched_poll *compat) { struct sched_poll native; diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c index d177c675c8..c5db373972 100644 --- a/xen/common/sched/core.c +++ b/xen/common/sched/core.c @@ -1431,7 +1431,7 @@ static void vcpu_block_enable_events(void) vcpu_block(); } -static long do_poll(struct sched_poll *sched_poll) +static long do_poll(const struct sched_poll *sched_poll) { struct vcpu *v = current; struct domain *d = v->domain; 
diff --git a/xen/xsm/flask/flask_op.c b/xen/xsm/flask/flask_op.c index b866e8d05f..ea7dd10dc8 100644 --- a/xen/xsm/flask/flask_op.c +++ b/xen/xsm/flask/flask_op.c @@ -469,7 +469,7 @@ static int flask_security_load(struct xen_flask_load *load) return ret; } -static int flask_devicetree_label(struct xen_flask_devicetree_label *arg) +static int flask_devicetree_label(const struct xen_flask_devicetree_label *arg) { int rv; char *buf; @@ -492,7 +492,7 @@ static int flask_devicetree_label(struct xen_flask_devicetree_label *arg) #ifndef COMPAT -static int flask_ocontext_del(struct xen_flask_ocontext *arg) +static int flask_ocontext_del(const struct xen_flask_ocontext *arg) { int rv; @@ -506,7 +506,7 @@ static int flask_ocontext_del(struct xen_flask_ocontext *arg) return security_ocontext_del(arg->ocon, arg->low, arg->high); } -static int flask_ocontext_add(struct xen_flask_ocontext *arg) +static int flask_ocontext_add(const struct xen_flask_ocontext *arg) { int rv; @@ -550,7 +550,7 @@ static int flask_get_peer_sid(struct xen_flask_peersid *arg) return rv; } -static int flask_relabel_domain(struct xen_flask_relabel *arg) +static int flask_relabel_domain(const struct xen_flask_relabel *arg) { int rc; struct domain *d;
Re: [PATCH 6/6] svm/nestedvm: Introduce nested capabilities bit
On 06.02.2024 02:20, George Dunlap wrote: > --- /dev/null > +++ b/docs/designs/nested-svm-cpu-features.md > @@ -0,0 +1,110 @@ > +# Nested SVM (AMD) CPUID requirements > + > +The first step in making nested SVM production-ready is to make sure > +that all features are implemented and well-tested. To make this > +tractable, we will initially be limiting the "supported" range of > +nested virt to a specific subset of host and guest features. This > +document describes the criteria for deciding on features, and the > +rationale behind each feature. > + > +For AMD, all virtualization-related features can be found in CPUID > +leaf 800A:edx > + > +# Criteria > + > +- Processor support: At a minimum we want to support processors from > + the last 5 years. All things being equal, older processors are > + better. Nit: Perhaps missing "covering"? Generally I hope newer processors are "better". > Bits 0:7 were available in the very earliest processors; > + and even through bit 15 we should be pretty good support-wise. > + > +- Faithfulness to hardware: We need the behavior of the "virtual cpu" > + from the L1 hypervisor's perspective to be as close as possible to > + the original hardware. In particular, the behavior of the hardware > + on error paths 1) is not easy to understand or test, 2) can be the > + source of surprising vulnerabilities. (See XSA-7 for an example of a > + case where subtle error-handling differences can open up a privilege > + escalation.) We should avoid emulating any bit of the hardware with > + complex error paths if we can at all help it. > + > +- Cost of implementation: We want to minimize the cost of > + implementation (where this includes bringing an existing sub-par > + implementation up to speed). All things being equal, we'll favor a > + configuration which does not require any new implementation. > + > +- Performance: All things being equal, we'd prefer to choose a set of > + L0 / L1 CPUID bits that are faster than slower. 
> + > + > +# Bits > + > +- 0 `NP` *Nested Paging*: Required both for L0 and L1. > + > + Based primarily on faithfulness and performance, as well as > + potential cost of implementation. Available on earliest hardware, > + so no compatibility issues. > + > +- 1 `LbrVirt` *LBR / debugging virtualization*: Required for L0 and L1. > + > + For L0 this is required for performance: There's no way to tell the > + guests not to use the LBR-related registers; and if the guest does, > + then you have to save and restore all LBR-related registers on > + context switch, which is prohibitive. "prohibitive" is too strong imo; maybe "undesirable"? > --- a/xen/arch/x86/hvm/svm/nestedhvm.h > +++ b/xen/arch/x86/hvm/svm/nestedhvm.h > @@ -35,6 +35,7 @@ enum nestedhvm_vmexits > nestedsvm_check_intercepts(struct vcpu *v, struct cpu_user_regs *regs, > uint64_t exitcode); > void svm_nested_features_on_efer_update(struct vcpu *v); > +void __init start_nested_svm(struct hvm_function_table *svm_function_table); No section placement attributes on declarations, please. > --- a/xen/arch/x86/hvm/svm/nestedsvm.c > +++ b/xen/arch/x86/hvm/svm/nestedsvm.c > @@ -1666,3 +1666,17 @@ void svm_nested_features_on_efer_update(struct vcpu *v) > } > } > } > + > +void __init start_nested_svm(struct hvm_function_table *svm_function_table) > +{ > +/* > + * Required host functionality to support nested virt. See > + * docs/designs/nested-svm-cpu-features.md for rationale. > + */ > +svm_function_table->caps.nested_virt = > +cpu_has_svm_nrips && > +cpu_has_svm_lbrv && > +cpu_has_svm_nrips && nrips twice? Was the earlier one meant to be npt? 
> --- a/xen/arch/x86/hvm/vmx/vmx.c > +++ b/xen/arch/x86/hvm/vmx/vmx.c > @@ -3021,6 +3021,9 @@ const struct hvm_function_table * __init start_vmx(void) > if ( cpu_has_vmx_tsc_scaling ) > vmx_function_table.tsc_scaling.ratio_frac_bits = 48; > > +/* TODO: Require hardware support before enabling nested virt */ > +vmx_function_table.caps.nested_virt = vmx_function_table.caps.hap; This won't have the intended effect if hap_supported() ends up clearing the bit (used as input here) due to a command line option override. I wonder if instead this wants doing e.g. in a new hook to be called from nestedhvm_setup(). On the SVM side the hook function would then be the start_nested_svm() that you already introduce, with a caps.hap check added. Since you leave the other nested-related if() in place in arch_sanitise_domain_config(), all ought to be well, but I think that other if() really wants replacing by the one you presently add. Jan
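The ordering problem described here can be illustrated with a minimal model. All names and shapes below are assumptions for illustration, not the real Xen interfaces: the point is simply that the per-vendor hook runs only after any command-line HAP override has been folded into `caps.hap`, so nested virt can never be reported on a configuration where HAP ended up disabled.

```c
#include <stdbool.h>

struct hvm_function_table {
    struct { bool hap; bool nested_virt; } caps;
    void (*start_nested)(struct hvm_function_table *t);
};

/* SVM-side hook; the hardware feature checks (nrips, lbrv, ...) are
 * elided, leaving only the caps.hap input that matters for ordering. */
static void start_nested_svm(struct hvm_function_table *t)
{
    t->caps.nested_virt = t->caps.hap;
}

/* Model of nestedhvm_setup(): the override is applied first, then the
 * vendor hook computes nested_virt from the final HAP state. */
static void nestedhvm_setup(struct hvm_function_table *t, bool hap_cmdline)
{
    t->caps.hap = t->caps.hap && hap_cmdline;
    if ( t->start_nested )
        t->start_nested(t);
}

static bool nested_virt_after_setup(bool hw_hap, bool hap_cmdline)
{
    struct hvm_function_table t = {
        .caps.hap = hw_hap,
        .start_nested = start_nested_svm,
    };

    nestedhvm_setup(&t, hap_cmdline);
    return t.caps.nested_virt;
}
```

With this ordering, a `hap=0` command-line override correctly suppresses nested virt even on HAP-capable hardware, which is exactly what computing `nested_virt` at `start_vmx()`/`start_svm()` time could not guarantee.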
Re: [PATCH 3/6] xen/hvm: Move other hvm_function_table booleans into the caps bitfield
On 06.02.2024 02:20, George Dunlap wrote: > --- a/xen/arch/x86/include/asm/hvm/hvm.h > +++ b/xen/arch/x86/include/asm/hvm/hvm.h > @@ -86,20 +86,19 @@ struct hvm_vcpu_nonreg_state { > struct hvm_function_table { > const char *name; > > -/* Support Hardware-Assisted Paging? */ > -bool hap_supported; > - > -/* Necessary hardware support for alternate p2m's? */ > -bool altp2m_supported; > -bool singlestep_supported; > - > -/* Hardware virtual interrupt delivery enable? */ > -bool virtual_intr_delivery_enabled; > - > struct { > /* Indicate HAP capabilities. */ > -bool hap_superpage_1gb:1, > -hap_superpage_2mb:1; > +bool hap:1, > + hap_superpage_1gb:1, > + hap_superpage_2mb:1, > + > +/* Altp2m capabilities */ > +altp2m:1, > +singlestep:1, > + > +/* Hardware virtual interrupt delivery enable? */ > +virtual_intr_delivery; > + > } caps; Nit (spotted only while looking at patch 6): You're adding a stray blank line at the end of the structure. Further I expect virtual_intr_delivery would also want to be just a single bit? Jan
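For illustration, a standalone sketch of the flags block with both remarks addressed (this is a hypothetical reconstruction, not the actual Xen header): every flag is a single bit, including `virtual_intr_delivery`, and the stray trailing blank line is gone.

```c
#include <stdbool.h>

struct caps {
    bool hap:1,
         hap_superpage_1gb:1,
         hap_superpage_2mb:1,
         altp2m:1,
         singlestep:1,
         virtual_intr_delivery:1;  /* now one bit, like the others */
};

/* Quick sanity check: the six flags pack into a single byte (on GCC
 * and Clang) and each behaves as an independent boolean. */
static bool caps_roundtrip(void)
{
    struct caps c = { 0 };

    c.virtual_intr_delivery = true;
    return c.virtual_intr_delivery && !c.hap && sizeof(c) == 1;
}
```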
Re: [PATCH 5/6] nestedsvm: Remove bogus debug message from nestedsvm_check_intercepts
On 06.02.2024 02:20, George Dunlap wrote: > Changeset ef3e8db8068 ("x86/hvm: Corrections and improvements to > unhandled vmexit logging") introduced a printk to the default path of > the switch statement in nestedsvm_check_intercepts(), complaining of > an unknown exit reason. > > Unfortunately, the "core" switch statement which is meant to handle > all vmexit reasons is in nsvm_vmcb_guest_intercepts_exitcode(); the > switch statement in nestedsvm_check_intercepts() is only meant to > superimpose on top of that some special-casing for how to handle the > interaction between L1 and L0 vmexits. > > Remove the printk, and add a comment to prevent future confusion. > > Signed-off-by: George Dunlap Reviewed-by: Jan Beulich I wonder if a Fixes: tag is warranted here. Jan
Re: [XEN v4 2/3] xen/arm: arm64: Add emulation of Debug Data Transfer Registers
Hi Ayan, On 19/02/2024 15:45, Ayan Kumar Halder wrote: > > On 06/02/2024 19:05, Julien Grall wrote: >> Hi Ayan, > Hi Julien/Michal, >> >> On 31/01/2024 12:10, Ayan Kumar Halder wrote: >>> From: Michal Orzel >>> >>> Currently, if user enables HVC_DCC config option in Linux, it invokes >>> access >>> to debug data transfer registers (i.e. DBGDTRTX_EL0 on arm64, >>> DBGDTRTXINT on >>> arm32). As these registers are not emulated, Xen injects an undefined >>> exception to the guest and Linux crashes. >>> >>> To prevent this crash, introduce a partial emulation of DBGDTR[TR]X_EL0 >>> (these registers share the same encoding) as RAZ/WI and MDCCSR_EL0 as >>> TXfull. >>> >>> Refer ARM DDI 0487J.a ID042523, D19.3.8, DBGDTRTX_EL0 >>> "If TXfull is set to 1, set DTRRX and DTRTX to UNKNOWN". >>> >>> Thus, any OS is expected to read MDCCSR_EL0 and check for TXfull before >>> using DBGDTRTX_EL0. Linux does it via hvc_dcc_init() ---> >>> hvc_dcc_check(), >>> and returns -ENODEV in case TXfull bit is still set after writing a test >>> character. This way we prevent the guest from making use of HVC DCC as a >>> console. >>> >>> Signed-off-by: Michal Orzel >>> Signed-off-by: Ayan Kumar Halder >>> --- >>> Changes from >>> >>> v1 :- 1. DBGDTR_EL0 does not emulate RXfull. This is to avoid giving >>> the OS any >>> indication that the RX buffer is full and is waiting to be read. >>> >>> 2. In Arm32, DBGOSLSR is emulated. Also DBGDTRTXINT is emulated at >>> EL0 only. >>> >>> 3. Fixed the commit message and inline code comments. >>> >>> v2 :- 1. Split the patch into two (separate patches for arm64 and >>> arm32). >>> 2. Removed the "fail" label. >>> 3. Fixed the commit message. >>> >>> v3 :- 1. "HSR_SYSREG_MDCCSR_EL0" emulation differs based on whether >>> partial_emulation_enabled is true or not. >>> >>> 2. If partial_emulation_enabled is false, then access to >>> HSR_SYSREG_DBGDTR_EL0, >>> HSR_SYSREG_DBGDTRTX_EL0 would lead to undefined exception. 
>>> >>> xen/arch/arm/arm64/vsysreg.c | 28 >>> xen/arch/arm/include/asm/arm64/hsr.h | 3 +++ >>> 2 files changed, 27 insertions(+), 4 deletions(-) >>> >>> diff --git a/xen/arch/arm/arm64/vsysreg.c b/xen/arch/arm/arm64/vsysreg.c >>> index b5d54c569b..94f0a6c384 100644 >>> --- a/xen/arch/arm/arm64/vsysreg.c >>> +++ b/xen/arch/arm/arm64/vsysreg.c >>> @@ -159,9 +159,6 @@ void do_sysreg(struct cpu_user_regs *regs, >>> * >>> * Unhandled: >>> * MDCCINT_EL1 >>> - * DBGDTR_EL0 >>> - * DBGDTRRX_EL0 >>> - * DBGDTRTX_EL0 >>> * OSDTRRX_EL1 >>> * OSDTRTX_EL1 >>> * OSECCR_EL1 >>> @@ -173,10 +170,32 @@ void do_sysreg(struct cpu_user_regs *regs, >>> return handle_raz_wi(regs, regidx, hsr.sysreg.read, hsr, 1); >>> case HSR_SYSREG_MDCCSR_EL0: >>> /* >>> + * Xen doesn't expose a real (or emulated) Debug >>> Communications Channel >>> + * (DCC) to a domain. Yet the Arm ARM implies this is not an >>> optional >>> + * feature. So some domains may start to probe it. For >>> instance, the >>> + * HVC_DCC driver in Linux (since f35dc083 and at least >>> up to v6.7), >>> + * will try to write some characters and check if the >>> transmit buffer >>> + * has emptied. >>> + * >>> + * By setting TX status bit (only if partial emulation is >>> enabled) to >>> + * indicate the transmit buffer is full, we would hint the >>> OS that the >>> + * DCC is probably not working. >>> + * >>> + * Bit 29: TX full >>> + * >>> * Accessible at EL0 only if MDSCR_EL1.TDCC is set to 0. We >>> emulate that >>> * register as RAZ/WI above. So RO at both EL0 and EL1. >> >> The sentence "we emulate that register as ..." seems to be stale? I can see that you tried to handle Julien's remark here. But I disagree. This statement is not stale. It explains that because MDSCR_EL1 is RAZ/WI, MDCCSR_EL0 is RO at both EL0 and EL1. This patch does not change this behavior. 
>> >>> */ >>> - return handle_ro_raz(regs, regidx, hsr.sysreg.read, hsr, 0); >>> + return handle_ro_read_val(regs, regidx, hsr.sysreg.read, >>> hsr, 0, >>> + partial_emulation ? (1U << 29) : 0); >>> + >>> + case HSR_SYSREG_DBGDTR_EL0: >>> + /* DBGDTR[TR]X_EL0 share the same encoding */ >>> + case HSR_SYSREG_DBGDTRTX_EL0: >>> + if ( !partial_emulation ) >>> + goto fail; >>> + return handle_raz_wi(regs, regidx, hsr.sysreg.read, hsr, 0); >> >> AFAICT, all the emulation helpers have an explanation why we are using >> them. But here this is not the case. Can you add one? > This and.. >> >>> + >>> HSR_SYSREG_DBG_CASES(DBGBVR): >>> HSR_SYSREG_DBG_CASES(DBGBCR): >>> HSR_SYSREG_DBG_CASES(DBGWVR): >>> @@ -394,6 +413,7 @@ void do_sysreg(struct cpu_user_regs *regs, >>> * And all other
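Aside: the Linux probing sequence this emulation targets (write a test character, then check whether TXfull ever clears) can be modelled in a few lines of plain C. This is an illustration of the logic only, with the register access abstracted into a parameter; it is not Linux's actual hvc_dcc code.

```c
/* Bit 29 of MDCCSR_EL0, as in the quoted comment. */
#define MDCCSR_TXFULL (1U << 29)
#define DCC_ENODEV    19

static int hvc_dcc_probe(unsigned int mdccsr_after_write)
{
    /* A real driver writes a test character first and waits briefly;
     * here the resulting MDCCSR_EL0 value is simply passed in. */
    return (mdccsr_after_write & MDCCSR_TXFULL) ? -DCC_ENODEV : 0;
}
```

Because Xen's partial emulation keeps TXfull permanently set, such a probe concludes the DCC console is absent, which is exactly the hint the patch intends to give.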
Re: Stats on Xen tarball downloads
On Mon, Feb 19, 2024 at 06:01:54PM +0800, George Dunlap wrote: > > Looking at the *non*-4.18 downloads, nearly all of them have user > agents that make it clear they're part of automated build systems: > user agents like curl and wget, but also "Go-http-client", "libfetch", I reject this claim. `curl` or `wget` could be part of an interactive operation. Telling a browser to copy a URL into the paste buffer, then using `wget`/`curl` is entirely possible. I may be the outlier, but I routinely do this. I don't know whether Gentoo's `emerge` uses `wget`/`curl`, but that could be semi-interactive. > It's not really clear to me why we'd be getting 300-ish people > downloading the Xen 4.18.0 tarball, 2/3 of which are on Windows. But > then I'm also not sure why someone would *fake* hundreds of downloads > a week from unique IP addresses; and in particular, if you were going > to fake hundreds of downloads a week, I'm not sure why you'd only fake > the most recent release. Remember the browser wars? At one point many sites were looking for IE/Windows and sending back error messages without those. Getting the tarball on Windows doesn't seem too likely, faking the browser was pretty common for a while. -- (\___(\___(\__ --=> 8-) EHM <=-- __/)___/)___/) \BS (| ehem+sig...@m5p.com PGP 87145445 |) / \_CS\ | _ -O #include O- _ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Re: [PATCH v4 12/30] xen/riscv: introduce cmpxchg.h
On 19.02.2024 16:20, Oleksii wrote: > On Mon, 2024-02-19 at 15:12 +0100, Jan Beulich wrote: >> On 19.02.2024 15:00, Oleksii wrote: >>> On Sun, 2024-02-18 at 19:00 +, Julien Grall wrote: On 05/02/2024 15:32, Oleksii Kurochko wrote: > --- /dev/null > +++ b/xen/arch/riscv/include/asm/cmpxchg.h > @@ -0,0 +1,237 @@ > +/* SPDX-License-Identifier: GPL-2.0-only */ > +/* Copyright (C) 2014 Regents of the University of California > */ > + > +#ifndef _ASM_RISCV_CMPXCHG_H > +#define _ASM_RISCV_CMPXCHG_H > + > +#include > +#include > + > +#include > +#include > +#include > + > +#define ALIGN_DOWN(addr, size) ((addr) & (~((size) - 1))) > + > +#define __amoswap_generic(ptr, new, ret, sfx, release_barrier, > acquire_barrier) \ > +({ \ > + asm volatile( \ > + release_barrier \ > + " amoswap" sfx " %0, %2, %1\n" \ > + acquire_barrier \ > + : "=r" (ret), "+A" (*ptr) \ > + : "r" (new) \ > + : "memory" ); \ > +}) > + > +#define emulate_xchg_1_2(ptr, new, ret, release_barrier, > acquire_barrier) \ > +({ \ > + uint32_t *ptr_32b_aligned = (uint32_t > *)ALIGN_DOWN((unsigned > long)ptr, 4); \ > + uint8_t mask_l = ((unsigned long)(ptr) & (0x8 - > sizeof(*ptr))) > * BITS_PER_BYTE; \ > + uint8_t mask_size = sizeof(*ptr) * BITS_PER_BYTE; \ > + uint8_t mask_h = mask_l + mask_size - 1; \ > + unsigned long mask = GENMASK(mask_h, mask_l); \ > + unsigned long new_ = (unsigned long)(new) << mask_l; \ > + unsigned long ret_; \ > + unsigned long rc; \ > + \ > + asm volatile( \ > + release_barrier \ > + "0: lr.d %0, %2\n" \ I was going to ask why this is lr.d rather than lr.w. But I see Jan already asked. I agree with him that it should probably be a lr.w and ... > + " and %1, %0, %z4\n" \ > + " or %1, %1, %z3\n" \ > + " sc.d %1, %1, %2\n" \ ... respectively sc.w. The same applies for cmpxchg. 
>>> >>> I agree that it would be better, and my initial attempt was to >>> handle >>> 4-byte or 8-byte boundary crossing during 2-byte access: >>> >>> 0 1 2 3 4 5 6 7 8 >>> X X X 1 1 X X X X >>> >>> In this case, if I align address 3 to address 0 and then read 4 >>> bytes >>> instead of 8 bytes, I will not process the byte at address 4. This >>> was >>> the reason why I started to read 8 bytes. >>> >>> I also acknowledge that there could be an issue in the following >>> case: >>> >>> X 4094 4095 4096 >>> X 1 1 X >>> In this situation, when I read 8 bytes, there could be an issue >>> where >>> the new page (which starts at 4096) will not be mapped. It seems >>> correct in this case to check that variable is within one page and >>> read >>> 4 bytes instead of 8. >>> >>> One more thing I am uncertain about is if we change everything to >>> read >>> 4 bytes with 4-byte alignment, what should be done with the first >>> case? >>> Should we panic? (I am not sure if this is an option.) >> >> Counter question (raised elsewhere already): What if a 4-byte access >> crosses a word / cache line / page boundary? Ideally exactly the >> same would happen for a 2-byte access crossing a respective boundary. >> (Which you can achieve relatively easily by masking off only address >> bit 1, keeping address bit 0 unaltered.) > But if we align down on a 4-byte boundary and then access 4 bytes, we > can't cross a boundary. I agree that the algorithm is not correct, as > it can ignore some values in certain situations. For example: > 0 1 2 3 4 5 6 7 8 > X X X 1 1 X X X X > In this case, the value at address 4 won't be updated. 
> > I agree that introducing a new macro to check if a variable crosses a > boundary is necessary or as an option we can check that addr is 2-byte > aligned: > > #define CHECK_BOUNDARY_CROSSING(start, end, boundary_size) > ASSERT((start / boundary_size) != (end / boundary_size)) > > Then, it is necessary to check: > > CHECK_BOUNDARY_CROSSING(start, end, 4) > CHECK_BOUNDARY_CROSSING(start, end, PAGE_SIZE) > > But why do we need to check the cache line boundary? In the case of the > cache, the question will only be about performance, but it should still > work, shouldn't it? You don't need to check for any of these boundaries. You can simply leverage what the hardware does for misaligned accesses. See the various other replies I've sent - I thought things should have become pretty much crystal clear by now: For 1-byte accesses you access the containing word, by clearing the low two bits. For 2-byte accesses you also access the containing word, by clearing only bit 1 (which then naturally leaves no bit that needs clearing for the projected [but not necessary] case of handling a 4-byte access). If the resulting 4-byte access then is still misaligned, it'll fault
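The masking rule described above can be captured in a small self-contained helper (names are illustrative, not Xen's): 1-byte accesses clear address bits 0 and 1 to reach the containing word, 2-byte accesses clear only bit 1, so a genuinely misaligned halfword keeps bit 0 set and still faults the way hardware would.

```c
#include <stdint.h>

static uintptr_t containing_word(uintptr_t addr, unsigned int size)
{
    switch ( size )
    {
    case 1:
        return addr & ~(uintptr_t)3;  /* clear bits 0 and 1 */
    case 2:
        return addr & ~(uintptr_t)2;  /* clear bit 1 only; bit 0 is
                                       * kept, so a misaligned halfword
                                       * still faults like hardware */
    default:
        return addr;                  /* 4-byte: used as-is */
    }
}
```

No explicit page or cache-line boundary checks are needed: any residual misalignment in the resulting word access is left to the hardware to fault on.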
Re: [PATCH 2/6] svm: Improve type of cpu_has_svm_feature
On 06.02.2024 02:20, George Dunlap wrote: > --- a/xen/arch/x86/include/asm/hvm/svm/svm.h > +++ b/xen/arch/x86/include/asm/hvm/svm/svm.h > @@ -38,7 +38,10 @@ extern u32 svm_feature_flags; > #define SVM_FEATURE_SSS 19 /* NPT Supervisor Shadow Stacks */ > #define SVM_FEATURE_SPEC_CTRL 20 /* MSR_SPEC_CTRL virtualisation */ > > -#define cpu_has_svm_feature(f) (svm_feature_flags & (1u << (f))) > +static inline bool cpu_has_svm_feature(unsigned int feat) > +{ > +return svm_feature_flags & (1u << (feat)); > +} > #define cpu_has_svm_npt cpu_has_svm_feature(SVM_FEATURE_NPT) > #define cpu_has_svm_lbrv cpu_has_svm_feature(SVM_FEATURE_LBRV) > #define cpu_has_svm_svml cpu_has_svm_feature(SVM_FEATURE_SVML) Having seen patch 4 now, I have to raise the question here as well: Why do we need a separate variable (svm_feature_flags) when we could use the host policy (provided it isn't abused; see comments on patch 4)? Jan
Re: [PATCH 4/6] nestedsvm: Disable TscRateMSR
On 06.02.2024 02:20, George Dunlap wrote: > For now, just disable the functionality entirely until we can > implement it properly: > > - Don't set TSCRATEMSR in the host CPUID policy This goes too far: This way you would (in principle) also affect guests with nesting disabled. According to the earlier parts of the description there's also no issue with it in that case. What you want to make sure is that in the HVM policy the bit isn't set. While presently resolving to cpu_has_svm_feature(), I think cpu_has_tsc_ratio really ought to resolve to the host policy field. Of course then requiring the host policy to reflect reality rather than having what is "always emulated". IOW ... > --- a/xen/arch/x86/cpu-policy.c > +++ b/xen/arch/x86/cpu-policy.c > @@ -407,8 +407,7 @@ static void __init calculate_host_policy(void) > (1u << SVM_FEATURE_PAUSEFILTER) | > (1u << SVM_FEATURE_DECODEASSISTS)); > /* Enable features which are always emulated. */ > -p->extd.raw[0xa].d |= ((1u << SVM_FEATURE_VMCBCLEAN) | > - (1u << SVM_FEATURE_TSCRATEMSR)); > +p->extd.raw[0xa].d |= (1u << SVM_FEATURE_VMCBCLEAN); ... this likely wants replacing altogether by not overriding what we found in hardware, which would apparently mean moving the two bit masks to the earlier "clamping" expression. But then of course Andrew may know of reasons why all of this is done in calculate_host_policy() in the first place, rather than in HVM policy calculation. Jan
Re: [PATCH v4 12/30] xen/riscv: introduce cmpxchg.h
On Mon, 2024-02-19 at 15:12 +0100, Jan Beulich wrote: > On 19.02.2024 15:00, Oleksii wrote: > > On Sun, 2024-02-18 at 19:00 +, Julien Grall wrote: > > > On 05/02/2024 15:32, Oleksii Kurochko wrote: > > > > --- /dev/null > > > > +++ b/xen/arch/riscv/include/asm/cmpxchg.h > > > > @@ -0,0 +1,237 @@ > > > > +/* SPDX-License-Identifier: GPL-2.0-only */ > > > > +/* Copyright (C) 2014 Regents of the University of California > > > > */ > > > > + > > > > +#ifndef _ASM_RISCV_CMPXCHG_H > > > > +#define _ASM_RISCV_CMPXCHG_H > > > > + > > > > +#include > > > > +#include > > > > + > > > > +#include > > > > +#include > > > > +#include > > > > + > > > > +#define ALIGN_DOWN(addr, size) ((addr) & (~((size) - 1))) > > > > + > > > > +#define __amoswap_generic(ptr, new, ret, sfx, release_barrier, > > > > acquire_barrier) \ > > > > +({ \ > > > > + asm volatile( \ > > > > + release_barrier \ > > > > + " amoswap" sfx " %0, %2, %1\n" \ > > > > + acquire_barrier \ > > > > + : "=r" (ret), "+A" (*ptr) \ > > > > + : "r" (new) \ > > > > + : "memory" ); \ > > > > +}) > > > > + > > > > +#define emulate_xchg_1_2(ptr, new, ret, release_barrier, > > > > acquire_barrier) \ > > > > +({ \ > > > > + uint32_t *ptr_32b_aligned = (uint32_t > > > > *)ALIGN_DOWN((unsigned > > > > long)ptr, 4); \ > > > > + uint8_t mask_l = ((unsigned long)(ptr) & (0x8 - > > > > sizeof(*ptr))) > > > > * BITS_PER_BYTE; \ > > > > + uint8_t mask_size = sizeof(*ptr) * BITS_PER_BYTE; \ > > > > + uint8_t mask_h = mask_l + mask_size - 1; \ > > > > + unsigned long mask = GENMASK(mask_h, mask_l); \ > > > > + unsigned long new_ = (unsigned long)(new) << mask_l; \ > > > > + unsigned long ret_; \ > > > > + unsigned long rc; \ > > > > + \ > > > > + asm volatile( \ > > > > + release_barrier \ > > > > + "0: lr.d %0, %2\n" \ > > > > > > I was going to ask why this is lr.d rather than lr.w. But I see > > > Jan > > > already asked. I agree with him that it should probably be a lr.w > > > and > > > ... 
> > > > > > > + " and %1, %0, %z4\n" \ > > > > + " or %1, %1, %z3\n" \ > > > > + " sc.d %1, %1, %2\n" \ > > > > > > ... respectively sc.w. The same applies for cmpxchg. > > > > I agree that it would be better, and my initial attempt was to > > handle > > 4-byte or 8-byte boundary crossing during 2-byte access: > > > > 0 1 2 3 4 5 6 7 8 > > X X X 1 1 X X X X > > > > In this case, if I align address 3 to address 0 and then read 4 > > bytes > > instead of 8 bytes, I will not process the byte at address 4. This > > was > > the reason why I started to read 8 bytes. > > > > I also acknowledge that there could be an issue in the following > > case: > > > > X 4094 4095 4096 > > X 1 1 X > > In this situation, when I read 8 bytes, there could be an issue > > where > > the new page (which starts at 4096) will not be mapped. It seems > > correct in this case to check that variable is within one page and > > read > > 4 bytes instead of 8. > > > > One more thing I am uncertain about is if we change everything to > > read > > 4 bytes with 4-byte alignment, what should be done with the first > > case? > > Should we panic? (I am not sure if this is an option.) > > Counter question (raised elsewhere already): What if a 4-byte access > crosses a word / cache line / page boundary? Ideally exactly the > same would happen for a 2-byte access crossing a respective boundary. > (Which you can achieve relatively easily by masking off only address > bit 1, keeping address bit 0 unaltered.) But if we align down on a 4-byte boundary and then access 4 bytes, we can't cross a boundary. I agree that the algorithm is not correct, as it can ignore some values in certain situations. For example: 0 1 2 3 4 5 6 7 8 X X X 1 1 X X X X In this case, the value at address 4 won't be updated. 
I agree that introducing a new macro to check if a variable crosses a boundary is necessary or as an option we can check that addr is 2-byte aligned: #define CHECK_BOUNDARY_CROSSING(start, end, boundary_size) ASSERT((start / boundary_size) != (end / boundary_size)) Then, it is necessary to check: CHECK_BOUNDARY_CROSSING(start, end, 4) CHECK_BOUNDARY_CROSSING(start, end, PAGE_SIZE) But why do we need to check the cache line boundary? In the case of the cache, the question will only be about performance, but it should still work, shouldn't it? ~ Oleksii
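The word-at-a-time emulation debated in this thread can be modelled in portable C, with a 4-byte compare-and-swap standing in for the lr.w/sc.w loop. This is a sketch under stated assumptions: the names are invented (it is not the emulate_xchg_1_2 macro itself), a little-endian byte layout is assumed (as on RISC-V), and the GCC/Clang __atomic builtins model the reservation; the type-punned word access is for illustration only.

```c
#include <stdint.h>

/*
 * Model of a 2-byte xchg emulated via a 4-byte CAS on the containing
 * word.  The pointer is aligned down to a 4-byte boundary, the target
 * halfword is isolated with a shifted mask, and the CAS retries until
 * no other writer has touched the word (the lr/sc analogue).
 */
static uint16_t xchg_u16_emulated(volatile uint16_t *p, uint16_t newv)
{
    uintptr_t addr = (uintptr_t)p;
    volatile uint32_t *word = (volatile uint32_t *)(addr & ~(uintptr_t)0x3);
    unsigned int shift = (addr & 0x2) * 8;   /* little-endian layout */
    uint32_t mask = 0xffffu << shift;
    uint32_t old = *word, val;

    do {
        val = (old & ~mask) | ((uint32_t)newv << shift);
    } while ( !__atomic_compare_exchange_n(word, &old, val, 1,
                                           __ATOMIC_ACQ_REL,
                                           __ATOMIC_RELAXED) );

    return (uint16_t)((old & mask) >> shift);
}
```

Note that, as in the thread, nothing here enforces that `p` does not cross the aligned word, so a misbehaving caller would silently update the wrong bytes; the real hardware scheme avoids this by letting the masked access fault.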
[XEN PATCH] xen: cache clearing and invalidation helpers refactoring
The cache clearing and invalidation helpers in x86 and Arm didn't comply with MISRA C Rule 17.7: "The value returned by a function having non-void return type shall be used". On Arm they were always returning 0, while some in x86 returned -EOPNOTSUPP and in common/grant_table the return value is saved. As a consequence, a common helper arch_grant_cache_flush that returns an integer is introduced, so that each architecture can choose whether to return an error value on certain conditions, and the helpers have either been changed to return void (on Arm) or deleted entirely (on x86). Signed-off-by: Julien Grall Signed-off-by: Nicola Vetrini --- The original refactor idea came from Julien Grall in [1]; I edited that proposal to fix build errors. I did introduce a cast to void for the call to flush_area_local on x86, because even before this patch the return value of that function wasn't checked in all but one use in x86/smp.c, and in this context the helper (perhaps incidentally) ignored the return value of flush_area_local. [1] https://lore.kernel.org/xen-devel/09589e8f-77b6-47f7-b5bd-cf485e4b6...@xen.org/ --- xen/arch/arm/include/asm/page.h | 33 ++--- xen/arch/x86/include/asm/flushtlb.h | 23 ++-- xen/common/grant_table.c| 9 +--- 3 files changed, 34 insertions(+), 31 deletions(-) diff --git a/xen/arch/arm/include/asm/page.h b/xen/arch/arm/include/asm/page.h index 69f817d1e68a..e90c9de3616e 100644 --- a/xen/arch/arm/include/asm/page.h +++ b/xen/arch/arm/include/asm/page.h @@ -123,6 +123,7 @@ #ifndef __ASSEMBLY__ +#include #include #include #include @@ -159,13 +160,13 @@ static inline size_t read_dcache_line_bytes(void) * if 'range' is large enough we might want to use model-specific * full-cache flushes. 
*/ -static inline int invalidate_dcache_va_range(const void *p, unsigned long size) +static inline void invalidate_dcache_va_range(const void *p, unsigned long size) { size_t cacheline_mask = dcache_line_bytes - 1; unsigned long idx = 0; if ( !size ) -return 0; +return; /* Passing a region that wraps around is illegal */ ASSERT(((uintptr_t)p + size - 1) >= (uintptr_t)p); @@ -188,17 +189,15 @@ static inline int invalidate_dcache_va_range(const void *p, unsigned long size) asm volatile (__clean_and_invalidate_dcache_one(0) : : "r" (p + idx)); dsb(sy); /* So we know the flushes happen before continuing */ - -return 0; } -static inline int clean_dcache_va_range(const void *p, unsigned long size) +static inline void clean_dcache_va_range(const void *p, unsigned long size) { size_t cacheline_mask = dcache_line_bytes - 1; unsigned long idx = 0; if ( !size ) -return 0; +return; /* Passing a region that wraps around is illegal */ ASSERT(((uintptr_t)p + size - 1) >= (uintptr_t)p); @@ -211,18 +210,16 @@ static inline int clean_dcache_va_range(const void *p, unsigned long size) idx += dcache_line_bytes, size -= dcache_line_bytes ) asm volatile (__clean_dcache_one(0) : : "r" (p + idx)); dsb(sy); /* So we know the flushes happen before continuing */ -/* ARM callers assume that dcache_* functions cannot fail. 
*/ -return 0; } -static inline int clean_and_invalidate_dcache_va_range +static inline void clean_and_invalidate_dcache_va_range (const void *p, unsigned long size) { size_t cacheline_mask = dcache_line_bytes - 1; unsigned long idx = 0; if ( !size ) -return 0; +return; /* Passing a region that wraps around is illegal */ ASSERT(((uintptr_t)p + size - 1) >= (uintptr_t)p); @@ -235,8 +232,6 @@ static inline int clean_and_invalidate_dcache_va_range idx += dcache_line_bytes, size -= dcache_line_bytes ) asm volatile (__clean_and_invalidate_dcache_one(0) : : "r" (p + idx)); dsb(sy); /* So we know the flushes happen before continuing */ -/* ARM callers assume that dcache_* functions cannot fail. */ -return 0; } /* Macros for flushing a single small item. The predicate is always @@ -266,6 +261,20 @@ static inline int clean_and_invalidate_dcache_va_range : : "r" (_p), "m" (*_p)); \ } while (0) +static inline int arch_grant_cache_flush(unsigned int op, const void *p, + unsigned long size) +{ +if ( (op & GNTTAB_CACHE_INVAL) && (op & GNTTAB_CACHE_CLEAN) ) +clean_and_invalidate_dcache_va_range(p, size); +else if ( op & GNTTAB_CACHE_INVAL ) +invalidate_dcache_va_range(p, size); +else if ( op & GNTTAB_CACHE_CLEAN ) +clean_dcache_va_range(p, size); + +/* ARM callers assume that dcache_* functions cannot fail. */ +return 0; +} + /* * Write a pagetable entry. * diff --git a/xen/arch/x86/include/asm/flushtlb.h
Re: [XEN PATCH] automation/eclair: add deviation for MISRA C:2012 Rule 16.3
On 19.02.2024 15:59, Federico Serafini wrote: > On 19/02/24 14:43, Jan Beulich wrote: >> On 19.02.2024 14:24, Federico Serafini wrote: >>> Update ECLAIR configuration to consider safe switch clauses ending >>> with __{get,put}_user_bad(). >>> >>> Update docs/misra/deviations.rst accordingly. >>> >>> Signed-off-by: Federico Serafini >> >> As mentioned I'm not happy with this, not the least because of it being >> unclear why these two would be deviated, when there's no sign of a >> similar deviation for, say, __bad_atomic_size(). Imo this approach >> doesn't scale, and that's already leaving aside that the purpose of >> identically named (pseudo-)helpers could differ between architectures, >> thus putting under question ... >> >>> --- a/automation/eclair_analysis/ECLAIR/deviations.ecl >>> +++ b/automation/eclair_analysis/ECLAIR/deviations.ecl >>> @@ -368,6 +368,10 @@ safe." >>> -config=MC3R1.R16.3,reports+={safe, >>> "any_area(end_loc(any_exp(text(/BUG\\(\\);/"} >>> -doc_end >>> >>> +-doc_begin="Switch clauses ending with constructs \"__get_user_bad()\" and >>> \"__put_user_bad()\" are safe: they denote an unreachable program point." >>> +-config=MC3R1.R16.3,reports+={safe, >>> "any_area(end_loc(any_exp(text(/__(put|get)_user_bad\\(\\);/"} >>> +-doc_end >> >> ... the global scope of such a deviation. While it may not be a good idea, >> even within an arch such (pseudo-)helpers could be used for multiple >> distinct purposes. > > Would you agree with adding the missing break statement after > the uses of __{put,get}_user_bad() (as it is already happening for > uses of __bad_atomic_size())? I probably wouldn't mind that (despite being a little pointless). Perhaps declaring them as noreturn would also help? Jan
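The noreturn idea floated above could look roughly like the sketch below. A stand-in helper is used here because the real __get_user_bad() is a deliberately undefined symbol that triggers a link-time error; this only illustrates the control-flow property that makes the trailing break unnecessary.

```c
#include <stdlib.h>

/* Hypothetical stand-in for __get_user_bad(): once declared noreturn,
 * a switch clause ending in it visibly cannot fall through, so no
 * break is required to satisfy MISRA C:2012 Rule 16.3. */
static void __attribute__((__noreturn__)) get_user_bad_stub(void)
{
    abort();
}

static unsigned int get_user_size(unsigned int size)
{
    switch ( size )
    {
    case 1: return 1;
    case 2: return 2;
    case 4: return 4;
    case 8: return 8;
    default:
        get_user_bad_stub(); /* control never returns: no break needed */
    }
}
```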
Re: [PATCH v4 12/30] xen/riscv: introduce cmpxchg.h
On 19.02.2024 15:29, Oleksii wrote: > On Mon, 2024-02-19 at 12:22 +0100, Jan Beulich wrote: >> On 15.02.2024 14:41, Oleksii wrote: >>> As I mentioned above with 4-byte alignment and then reading and >>> working >>> with 8-byte then crossing a word or double-word boundary shouldn't >>> be >>> an issue. >>> >>> I am not sure that I know how to check that we are crossing cache >>> line >>> boundary. >>> >>> Regarding page boundary, if the next page is mapped then all should >>> work fine, otherwise it will be an exception. >> >> Are you sure lr.d / sc.d are happy to access across such a boundary, >> when both pages are mapped? > If they are mapped, my expectation that lr.d and sc.d should be happy. How does this expectation of yours fit with the A extension doc having this: "For LR and SC, the A extension requires that the address held in rs1 be naturally aligned to the size of the operand (i.e., eight-byte aligned for 64-bit words and four-byte aligned for 32-bit words). If the address is not naturally aligned, an address-misaligned exception or an access-fault exception will be generated." It doesn't even say "may"; it says "will". >> To me it seems pretty clear that for atomic accesses you want to >> demand natural alignment, i.e. 2-byte alignment for 2-byte accesses. >> This way you can be sure no potentially problematic boundaries will >> be crossed. > It makes sense, but I am not sure that I can guarantee that a user of > macros will always have 2-byte alignment (except during a panic) in the > future. > > Even now, I am uncertain that everyone will be willing to add > __alignment(...) to struct vcpu->is_urgent > (xen/include/xen/sched.h:218) and other possible cases to accommodate > RISC-V requirements. ->is_urgent is bool, i.e. 1 byte and hence okay at any address. For all normal variables and fields the compiler will guarantee suitable (natural) alignment. What you prohibit by requiring aligned items is use of fields of e.g. packed structures. 
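The point about field alignment can be checked directly. A minimal sketch (the struct names are invented; `__attribute__((__packed__))` is the GCC/Clang spelling behind Xen's packed-attribute wrappers):

```c
#include <stdint.h>
#include <stddef.h>

/* An ordinary struct gets natural alignment for its fields, so a
 * 2-byte member can never straddle a problematic boundary; only a
 * packed struct can place it at an odd offset. */
struct normal {
    uint8_t  flag;
    uint16_t counter;   /* padded to offset 2 by the compiler */
};

struct packed_s {
    uint8_t  flag;
    uint16_t counter;   /* forced to offset 1: misaligned */
} __attribute__((__packed__));
```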
>>> As 1- and 2-byte cases are emulated I decided that is not to >>> provide >>> sfx argument for emulation macros as it will not have to much >>> affect on >>> emulated types and just consume more performance on acquire and >>> release >>> version of sc/ld instructions. >> >> Question is whether the common case (4- and 8-byte accesses) >> shouldn't >> be valued higher, with 1- and 2-byte emulation being there just to >> allow things to not break altogether. > If I understand you correctly, it would make sense to add the 'sfx' > argument for the 1/2-byte access case, ensuring that all options are > available for 1/2-byte access case as well. That's one of the possibilities. As said, I'm not overly worried about the emulated cases. For the initial implementation I'd recommend going with what is easiest there, yielding the best possible result for the 4- and 8-byte cases. If later it turns out repeated acquire/release accesses are a problem in the emulation loop, things can be changed to explicit barriers, without touching the 4- and 8-byte cases. Jan
Re: [XEN PATCH] automation/eclair: add deviation for MISRA C:2012 Rule 16.3
On 19/02/24 14:43, Jan Beulich wrote: On 19.02.2024 14:24, Federico Serafini wrote: Update ECLAIR configuration to consider safe switch clauses ending with __{get,put}_user_bad(). Update docs/misra/deviations.rst accordingly. Signed-off-by: Federico Serafini As mentioned I'm not happy with this, not the least because of it being unclear why these two would be deviated, when there's no sign of a similar deviation for, say, __bad_atomic_size(). Imo this approach doesn't scale, and that's already leaving aside that the purpose of identically named (pseudo-)helpers could differ between architectures, thus putting under question ... --- a/automation/eclair_analysis/ECLAIR/deviations.ecl +++ b/automation/eclair_analysis/ECLAIR/deviations.ecl @@ -368,6 +368,10 @@ safe." -config=MC3R1.R16.3,reports+={safe, "any_area(end_loc(any_exp(text(/BUG\\(\\);/"} -doc_end +-doc_begin="Switch clauses ending with constructs \"__get_user_bad()\" and \"__put_user_bad()\" are safe: they denote an unreachable program point." +-config=MC3R1.R16.3,reports+={safe, "any_area(end_loc(any_exp(text(/__(put|get)_user_bad\\(\\);/"} +-doc_end ... the global scope of such a deviation. While it may not be a good idea, even within an arch such (pseudo-)helpers could be used for multiple distinct purposes. Would you agree with adding the missing break statement after the uses of __{put,get}_user_bad() (as it is already happening for uses of __bad_atomic_size())? -- Federico Serafini, M.Sc. Software Engineer, BUGSENG (http://bugseng.com)
Re: [XEN v4 2/3] xen/arm: arm64: Add emulation of Debug Data Transfer Registers
On 06/02/2024 19:05, Julien Grall wrote: Hi Ayan, Hi Julien/Michal, On 31/01/2024 12:10, Ayan Kumar Halder wrote: From: Michal Orzel Currently, if user enables HVC_DCC config option in Linux, it invokes access to debug data transfer registers (i.e. DBGDTRTX_EL0 on arm64, DBGDTRTXINT on arm32). As these registers are not emulated, Xen injects an undefined exception to the guest and Linux crashes. To prevent this crash, introduce a partial emulation of DBGDTR[TR]X_EL0 (these registers share the same encoding) as RAZ/WI and MDCCSR_EL0 as TXfull. Refer ARM DDI 0487J.a ID042523, D19.3.8, DBGDTRTX_EL0 "If TXfull is set to 1, set DTRRX and DTRTX to UNKNOWN". Thus, any OS is expected to read MDCCSR_EL0 and check for TXfull before using DBGDTRTX_EL0. Linux does it via hvc_dcc_init() ---> hvc_dcc_check(), and returns -ENODEV in case TXfull bit is still set after writing a test character. This way we prevent the guest from making use of HVC DCC as a console. Signed-off-by: Michal Orzel Signed-off-by: Ayan Kumar Halder --- Changes from v1 :- 1. DBGDTR_EL0 does not emulate RXfull. This is to avoid giving the OS any indication that the RX buffer is full and is waiting to be read. 2. In Arm32, DBGOSLSR is emulated. Also DBGDTRTXINT is emulated at EL0 only. 3. Fixed the commit message and inline code comments. v2 :- 1. Split the patch into two (separate patches for arm64 and arm32). 2. Removed the "fail" label. 3. Fixed the commit message. v3 :- 1. "HSR_SYSREG_MDCCSR_EL0" emulation differs based on whether partial_emulation_enabled is true or not. 2. If partial_emulation_enabled is false, then access to HSR_SYSREG_DBGDTR_EL0, HSR_SYSREG_DBGDTRTX_EL0 would lead to undefined exception. 
xen/arch/arm/arm64/vsysreg.c | 28 xen/arch/arm/include/asm/arm64/hsr.h | 3 +++ 2 files changed, 27 insertions(+), 4 deletions(-) diff --git a/xen/arch/arm/arm64/vsysreg.c b/xen/arch/arm/arm64/vsysreg.c index b5d54c569b..94f0a6c384 100644 --- a/xen/arch/arm/arm64/vsysreg.c +++ b/xen/arch/arm/arm64/vsysreg.c @@ -159,9 +159,6 @@ void do_sysreg(struct cpu_user_regs *regs, * * Unhandled: * MDCCINT_EL1 - * DBGDTR_EL0 - * DBGDTRRX_EL0 - * DBGDTRTX_EL0 * OSDTRRX_EL1 * OSDTRTX_EL1 * OSECCR_EL1 @@ -173,10 +170,32 @@ void do_sysreg(struct cpu_user_regs *regs, return handle_raz_wi(regs, regidx, hsr.sysreg.read, hsr, 1); case HSR_SYSREG_MDCCSR_EL0: /* + * Xen doesn't expose a real (or emulated) Debug Communications Channel + * (DCC) to a domain. Yet the Arm ARM implies this is not an optional + * feature. So some domains may start to probe it. For instance, the + * HVC_DCC driver in Linux (since f35dc083 and at least up to v6.7), + * will try to write some characters and check if the transmit buffer + * has emptied. + * + * By setting TX status bit (only if partial emulation is enabled) to + * indicate the transmit buffer is full, we would hint the OS that the + * DCC is probably not working. + * + * Bit 29: TX full + * * Accessible at EL0 only if MDSCR_EL1.TDCC is set to 0. We emulate that * register as RAZ/WI above. So RO at both EL0 and EL1. The sentence "we emulate that register as ..." seems to be stale? */ - return handle_ro_raz(regs, regidx, hsr.sysreg.read, hsr, 0); + return handle_ro_read_val(regs, regidx, hsr.sysreg.read, hsr, 0, + partial_emulation ? (1U << 29) : 0); + + case HSR_SYSREG_DBGDTR_EL0: + /* DBGDTR[TR]X_EL0 share the same encoding */ + case HSR_SYSREG_DBGDTRTX_EL0: + if ( !partial_emulation ) + goto fail; + return handle_raz_wi(regs, regidx, hsr.sysreg.read, hsr, 0); AFAICT, all the emulation helpers have an explanation why we are using them. But here this is not the case. Can you add one? This and.. 
+ HSR_SYSREG_DBG_CASES(DBGBVR): HSR_SYSREG_DBG_CASES(DBGBCR): HSR_SYSREG_DBG_CASES(DBGWVR): @@ -394,6 +413,7 @@ void do_sysreg(struct cpu_user_regs *regs, * And all other unknown registers. */ default: + fail: AFAICT, this would violate MISRA 15.3 [1]. We don't seem to have accepted the rule yet (?), but I don't see why we would not, given I feel this is similar to what Rule 16.2 is trying to prevent and we accepted it. In that case, I moved all the code within default outside, and then call "goto fail" from the default label. I am not sure if I have interpreted this correctly. Is it ok if you can take a look at the attached patch and let me know if the explanation and the code change look sane? - Ayan { const struct hsr_sysreg sysreg = hsr.sysreg; diff --git a/xen/arch/arm/include/asm/arm64/hsr.h
Re: [PATCH v4 17/30] xen/riscv: introduce regs.h
Hi Julien, On Sun, 2024-02-18 at 18:22 +, Julien Grall wrote: > Hi, > > On 05/02/2024 15:32, Oleksii Kurochko wrote: > > Signed-off-by: Oleksii Kurochko > > Acked-by: Jan Beulich > > -- > > Changes in V4: > > - add Acked-by: Jan Beulich > > - s/BUG()/BUG_ON("unimplemented") > > --- > > Changes in V3: > > - update the commit message > > - add Acked-by: Jan Beulich > > - remove "include " and use a forward declaration > > instead. > > --- > > Changes in V2: > > - change xen/lib.h to xen/bug.h > > - remove unnecessary empty line > > --- > > xen/arch/riscv/include/asm/regs.h | 29 > > + > > 1 file changed, 29 insertions(+) > > create mode 100644 xen/arch/riscv/include/asm/regs.h > > > > diff --git a/xen/arch/riscv/include/asm/regs.h > > b/xen/arch/riscv/include/asm/regs.h > > new file mode 100644 > > index 00..c70ea2aa0c > > --- /dev/null > > +++ b/xen/arch/riscv/include/asm/regs.h > > @@ -0,0 +1,29 @@ > > +/* SPDX-License-Identifier: GPL-2.0-only */ > > +#ifndef __ARM_RISCV_REGS_H__ > > +#define __ARM_RISCV_REGS_H__ > > + > > +#ifndef __ASSEMBLY__ > > + > > +#include > > + > > +#define hyp_mode(r) (0) > > I don't understand where here you return 0 (which should really be > false) but ... > > > + > > +struct cpu_user_regs; > > + > > +static inline bool guest_mode(const struct cpu_user_regs *r) > > +{ > > + BUG_ON("unimplemented"); > > +} > > ... here you return BUG_ON(). But I couldn't find any user of both > guest_mode() and hyp_mode(). So isn't it a bit premature to > introduce > the helpers? I agree regarding hyp_mode(), it can be dropped, but guest_mode() is used by common/keyhandler.c:142. ~ Oleksii
Re: [PATCH 3/6] xen/hvm: Move other hvm_function_table booleans into the caps bitfield
On 06.02.2024 02:20, George Dunlap wrote: > @@ -144,7 +144,7 @@ static bool __init hap_supported(struct > hvm_function_table *fns) > > if ( !opt_hap_enabled ) > { > -fns->hap_supported = 0; > +fns->caps.hap = 0; As you touch such, would you mind switching to true/false instead of 1/0 at this occasion? > @@ -3000,7 +3000,7 @@ const struct hvm_function_table * __init start_vmx(void) > vmx_function_table.update_eoi_exit_bitmap = > vmx_update_eoi_exit_bitmap; > vmx_function_table.process_isr = vmx_process_isr; > vmx_function_table.handle_eoi = vmx_handle_eoi; > -vmx_function_table.virtual_intr_delivery_enabled = true; > +vmx_function_table.caps.virtual_intr_delivery = true; I'm unsure about this one - it had "enabled" in its name for a good reason. Then again its (somewhat more involved) derivation from other capability bits (and a command line option) isn't fundamentally different from that of, say, hap_supported. Hence I guess with the other item taken care of Reviewed-by: Jan Beulich Jan
Re: [PATCH v4 14/30] xen/riscv: introduce atomic.h
Hi Julien, On Sun, 2024-02-18 at 19:22 +, Julien Grall wrote: > Hi, > > On 05/02/2024 15:32, Oleksii Kurochko wrote: > > From: Bobby Eshleman > > > > Additionally, this patch introduces macros in fence.h, > > which are utilized in atomic.h. > > > > atomic##prefix##_*xchg_*(atomic##prefix##_t *v, c_t n) > > were updated to use __*xchg_generic(). > > > > Signed-off-by: Oleksii Kurochko > > The author is Bobby, but I don't see a Signed-off-by. Did you forget > it? I forgot to add it, as I thought it would be enough to change the commit author. > > > --- > > Changes in V4: > > - do changes related to the updates of [PATCH v3 13/34] > > xen/riscv: introduce cmpxchg.h > > - drop casts in read_atomic_size(), write_atomic(), add_sized() > > - tabs -> spaces > > - drop #ifdef CONFIG_SMP ... #endif in fence.h as it is simpler > > to handle NR_CPUS=1 > > the same as NR_CPUS>1 with accepting less than ideal > > performance. > > --- > > Changes in V3: > > - update the commit message > > - add SPDX for fence.h > > - code style fixes > > - Remove /* TODO: ... */ for add_sized macros. It looks correct > > to me. > > - re-order the patch > > - merge to this patch fence.h > > --- > > Changes in V2: > > - Change an author of commit. I got this header from Bobby's old > > repo. > > --- > > xen/arch/riscv/include/asm/atomic.h | 395 > > > > xen/arch/riscv/include/asm/fence.h | 8 + > > 2 files changed, 403 insertions(+) > > create mode 100644 xen/arch/riscv/include/asm/atomic.h > > create mode 100644 xen/arch/riscv/include/asm/fence.h > > > > diff --git a/xen/arch/riscv/include/asm/atomic.h > > b/xen/arch/riscv/include/asm/atomic.h > > new file mode 100644 > > index 00..267d3c0803 > > --- /dev/null > > +++ b/xen/arch/riscv/include/asm/atomic.h > > @@ -0,0 +1,395 @@ > > +/* SPDX-License-Identifier: GPL-2.0-only */ > > +/* > > + * Taken and modified from Linux. > > Which version of Linux? Can you also spell out what are the big > changes? > This would be helpful if we need to re-sync.
Sure, I'll add the changes here. > > > + * > > + * atomic##prefix##_*xchg_*(atomic##prefix##_t *v, c_t n) were > > updated to use > > + * __*xchg_generic() > > + * > > + * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved. > > + * Copyright (C) 2012 Regents of the University of California > > + * Copyright (C) 2017 SiFive > > + * Copyright (C) 2021 Vates SAS > > + */ > > + > > +#ifndef _ASM_RISCV_ATOMIC_H > > +#define _ASM_RISCV_ATOMIC_H > > + > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > +void __bad_atomic_size(void); > > + > > +static always_inline void read_atomic_size(const volatile void *p, > > + void *res, > > + unsigned int size) > > +{ > > + switch ( size ) > > + { > > + case 1: *(uint8_t *)res = readb(p); break; > > + case 2: *(uint16_t *)res = readw(p); break; > > + case 4: *(uint32_t *)res = readl(p); break; > > + case 8: *(uint32_t *)res = readq(p); break; > > + default: __bad_atomic_size(); break; > > + } > > +} > > + > > +#define read_atomic(p) ({ \ > > + union { typeof(*p) val; char c[0]; } x_; \ > > + read_atomic_size(p, x_.c, sizeof(*p)); \ > > + x_.val; \ > > +}) > > + > > +#define write_atomic(p, x) \ > > +({ \ > > + typeof(*p) x__ = (x); \ > > + switch ( sizeof(*p) ) \ > > + { \ > > + case 1: writeb((uint8_t)x__, p); break; \ > > + case 2: writew((uint16_t)x__, p); break; \ > > + case 4: writel((uint32_t)x__, p); break; \ > > + case 8: writeq((uint64_t)x__, p); break; \ > > + default: __bad_atomic_size(); break; \ > > + } \ > > + x__; \ > > +}) > > + > > +#define add_sized(p, x) \ > > +({ \ > > + typeof(*(p)) x__ = (x); \ > > + switch ( sizeof(*(p)) ) \ > > + { \ > > + case 1: writeb(read_atomic(p) + x__, p); break; \ > > + case 2: writew(read_atomic(p) + x__, p); break; \ > > + case 4: writel(read_atomic(p) + x__, p); break; \ > > + default: __bad_atomic_size(); break; \ > > + } \ > > +}) > > + > > +/* > > + * __unqual_scalar_typeof(x) - Declare an unqualified scalar > > type, leaving > > + *
Re: [PATCH v4 13/30] xen/riscv: introduce io.h
On Sun, 2024-02-18 at 19:07 +, Julien Grall wrote: > > > +/* > > + * Unordered I/O memory access primitives. These are even more > > relaxed than > > + * the relaxed versions, as they don't even order accesses between > > successive > > + * operations to the I/O regions. > > + */ > > +#define readb_cpu(c) ({ uint8_t __r = __raw_readb(c); > > __r; }) > > +#define readw_cpu(c) ({ uint16_t __r = > > le16_to_cpu((__force __le16)__raw_readw(c)); __r; }) > > +#define readl_cpu(c) ({ uint32_t __r = > > le32_to_cpu((__force __le32)__raw_readl(c)); __r; }) > > + > > +#define writeb_cpu(v,c)((void)__raw_writeb(v,c)) > > +#define > > writew_cpu(v,c) ((void)__raw_writew((__force > > uint16_t)cpu_to_le16(v),c)) > > +#define > > writel_cpu(v,c) ((void)__raw_writel((__force > > uint32_t)cpu_to_le32(v),c)) > > NIT: __raw_write*() are already returning void. So I am not sure I > understand the point of the cast. IIUC, this is coming from Linux; do > you intend to keep the code as-is (including style)? If not, then I > would consider dropping the cast on the three lines above and ... Changes have already been made in this header, so it makes sense to remove these casts. Thanks. ~ Oleksii
[xen-unstable-smoke test] 184701: tolerable all pass - PUSHED
flight 184701 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/184701/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 15 migrate-support-check fail never pass test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass test-armhf-armhf-xl 15 migrate-support-check fail never pass test-armhf-armhf-xl 16 saverestore-support-check fail never pass version targeted for testing: xen c144b9e32427ba37e0e0443a0d3fa53e9fb878b6 baseline version: xen 0441c3acc7e9e72e984ce49d32e61827894ae4a3 Last test of basis 184685 2024-02-16 18:03:46 Z 2 days Testing same since 184701 2024-02-19 12:03:49 Z 0 days 1 attempts People who touched revisions under test: Andrew Cooper Anthony PERARD Frediano Ziglio Julien Grall Marek Marczykowski-Górecki Oleksandr Tyshchenko Roger Pau Monné jobs: build-arm64-xsm pass build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-arm64-arm64-xl-xsm pass test-amd64-amd64-xl-qemuu-debianhvm-amd64 pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git 0441c3acc7..c144b9e324 c144b9e32427ba37e0e0443a0d3fa53e9fb878b6 -> smoke
Re: [PATCH v4 12/30] xen/riscv: introduce cmpxchg.h
On Mon, 2024-02-19 at 12:22 +0100, Jan Beulich wrote: > On 15.02.2024 14:41, Oleksii wrote: > > > > + : "=r" (ret), "+A" (*ptr) \ > > > > + : "r" (new) \ > > > > + : "memory" ); \ > > > > +}) > > > > + > > > > +#define emulate_xchg_1_2(ptr, new, ret, release_barrier, > > > > acquire_barrier) \ > > > > +({ \ > > > > + uint32_t *ptr_32b_aligned = (uint32_t > > > > *)ALIGN_DOWN((unsigned > > > > long)ptr, 4); \ > > > > > > You now appear to assume that this macro is only used with inputs > > > not > > > crossing word boundaries. That's okay as long as suitably > > > guaranteed > > > at the use sites, but imo wants saying in a comment. > > > > > > > + uint8_t mask_l = ((unsigned long)(ptr) & (0x8 - > > > > sizeof(*ptr))) > > > > * BITS_PER_BYTE; \ > > > > > > Why 0x8 (i.e. spanning 64 bits), not 4 (matching the uint32_t use > > > above)? > > The idea to read 8 bytes was to deal with crossing word boundary. > > So if > > our address is 0x3 and we have to xchg() 2 bytes, what will cross 4 > > byte boundary. Instead we align add 0x3, so it will become 0x0 and > > then > > just always work with 8 bytes. > > Then what if my 2-byte access crosses a dword boundary? A cache line > one? A page one? Everything looks okay to me, except in the case of a page boundary. In the scenario of a dword boundary: 0 1 2 3 4 5 6 7 8 9 ... X X X X X X X 1 1 X Assuming a variable starts at address 7, 4-byte alignment will be enforced, and 8 bytes will be processed starting from address 4. Concerning a cache line, it should still work, with potential performance issues arising only if a part of the variable is cached while another part is not. Regarding page crossing, I acknowledge that it could be problematic if the variable is entirely located at the end of a page, as there is no guarantee that the next page exists. In this case, it would be preferable to consistently read 4 bytes with 4-byte alignment: X 4094 4095 4096? X 11? 
However, if the variable spans two pages, proper page mapping should be ensured. It appears sensible to reconsider the macros and implement 4-byte alignment and 4-byte access, but then this is not clear how better to deal with first case ( dword boundary ). Panic ? or use the macros twice for address 4, and address 8? > > > > > + unsigned long new_ = (unsigned long)(new) << mask_l; \ > > > > + unsigned long ret_; \ > > > > + unsigned long rc; \ > > > > > > Similarly, why unsigned long here? > > sizeof(unsigned long) is 8 bytes and it was chosen as we are > > working > > with lc/sc.d which are working with 8 bytes. > > > > > > > > I also wonder about the mix of underscore suffixed (or not) > > > variable > > > names here. > > If the question about ret_, then the same as before size of ret > > argument of the macros will be 1 or 2, but {lc/sc}.d expected to > > work > > with 8 bytes. > > Then what's the uint32_t * about? Agree, then it should be also unsigned long. > > > > > > + __typeof__(*(ptr)) new__ = (new); \ > > > > + __typeof__(*(ptr)) ret__; \ > > > > + switch (size) \ > > > > + { \ > > > > + case 1: \ > > > > + case 2: \ > > > > + emulate_xchg_1_2(ptr__, new__, ret__, release_barrier, > > > > acquire_barrier); \ > > > > > > ... this would become > > > > > > ret__ = emulate_xchg_1_2(ptr__, new__, release_barrier, > > > acquire_barrier); \ > > > > > > But, unlike assumed above, there's no enforcement here that a 2- > > > byte > > > quantity won't cross a word, double-word, cache line, or even > > > page > > > boundary. That might be okay if then the code would simply crash > > > (like > > > the AMO insns emitted further down would), but aiui silent > > > misbehavior > > > would result. > > As I mentioned above with 4-byte alignment and then reading and > > working > > with 8-byte then crossing a word or double-word boundary shouldn't > > be > > an issue. > > > > I am not sure that I know how to check that we are crossing cache > > line > > boundary. 
> > > > Regarding page boundary, if the next page is mapped then all should > > work fine, otherwise it will be an exception. > > Are you sure lr.d / sc.d are happy to access across such a boundary, > when both pages are mapped? If they are mapped, my expectation that lr.d and sc.d should be happy. > > To me it seems pretty clear that for atomic accesses you want to > demand natural alignment, i.e. 2-byte alignment for 2-byte accesses. > This way you can be sure no potentially problematic boundaries will > be crossed. It makes sense, but I am not sure that I can guarantee that a user of macros will always have 2-byte alignment (except during a panic) in the future. Even now, I am uncertain that everyone will be willing to add __alignment(...) to struct vcpu->is_urgent (xen/include/xen/sched.h:218) and other possible cases to accommodate RISC-V requirements. > > > > > + break; \ > > > > + case 4: \ > > > > + __amoswap_generic(ptr__,
Re: [PATCH v4 12/30] xen/riscv: introduce cmpxchg.h
Hi Oleksii, On 19/02/2024 14:00, Oleksii wrote: On Sun, 2024-02-18 at 19:00 +, Julien Grall wrote: On 05/02/2024 15:32, Oleksii Kurochko wrote: The header was taken from Linux kernl 6.4.0-rc1. Addionally, were updated: * add emulation of {cmp}xchg for 1/2 byte types This explaination is a little bit light. IIUC, you are implementing them using 32-bit atomic access. Is that correct? If so, please spell it out. Sure, I'll update commit message. Also, I wonder whether it would be better to try to get rid of the 1/2 bytes access. Do you know where they are used? Right now, the issue is with test_and_clear_bool() which is used in common/sched/core.c:840 [https://gitlab.com/xen-project/xen/-/blob/staging/xen/common/sched/core.c?ref_type=heads#L840 ] I don't remember details, but in xen-devel chat someone told me that grant table requires 1/2 bytes access. Ok :/. This would be part of the ABI then and therefore can't be easily changed. * replace tabs with spaces Does this mean you are not planning to backport any Linux fixes? If it will be any fixes for sure I'll back port them, but it looks like this code is stable enough and not to many fixes will be done there, so it shouldn't be hard to backport them and switch to spaces. Fair enough. + " and %1, %0, %z4\n" \ + " or %1, %1, %z3\n" \ + " sc.d %1, %1, %2\n" \ ... respectively sc.w. The same applies for cmpxchg. I agree that it would be better, and my initial attempt was to handle 4-byte or 8-byte boundary crossing during 2-byte access: 0 1 2 3 4 5 6 7 8 X X X 1 1 X X X X In this case, if I align address 3 to address 0 and then read 4 bytes instead of 8 bytes, I will not process the byte at address 4. This was the reason why I started to read 8 bytes. At least on Arm, the architecture doesn't support atomic operations if the access is not aligned to its size (this will send a data abort). On some architecture, this is supported but potentially very slow. So all the common code should already use properly aligned address. 
Therefore, I don't really see the reason to add support for unaligned access. Cheers, -- Julien Grall
[linux-linus test] 184700: tolerable FAIL - PUSHED
flight 184700 linux-linus real [real]
flight 184702 linux-linus real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/184700/
http://logs.test-lab.xenproject.org/osstest/logs/184702/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-freebsd11-amd64 19 guest-localmigrate/x10 fail pass in 184702-retest

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64        19 guest-stop                 fail like 184697
 test-amd64-amd64-xl-qemuu-win7-amd64        19 guest-stop                 fail like 184697
 test-amd64-amd64-xl-qemuu-ws16-amd64        19 guest-stop                 fail like 184697
 test-armhf-armhf-libvirt-raw                15 saverestore-support-check  fail like 184697
 test-armhf-armhf-libvirt                    16 saverestore-support-check  fail like 184697
 test-amd64-amd64-xl-qemut-ws16-amd64        19 guest-stop                 fail like 184697
 test-armhf-armhf-libvirt-qcow2              15 saverestore-support-check  fail like 184697
 test-amd64-amd64-qemuu-nested-amd           20 debian-hvm-install/l1/l2   fail like 184697
 test-amd64-amd64-libvirt-xsm                15 migrate-support-check      fail never pass
 test-amd64-amd64-libvirt                    15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-thunderx                15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-thunderx                16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-xsm                     15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-xsm                     16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-credit2                 15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-credit2                 16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-credit1                 15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-credit1                 16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl                         15 migrate-support-check      fail never pass
 test-arm64-arm64-xl                         16 saverestore-support-check  fail never pass
 test-arm64-arm64-libvirt-xsm                15 migrate-support-check      fail never pass
 test-arm64-arm64-libvirt-xsm                16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2                 15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit2                 16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt-qcow2              14 migrate-support-check      fail never pass
 test-armhf-armhf-xl-arndale                 15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-arndale                 16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt-raw                14 migrate-support-check      fail never pass
 test-arm64-arm64-libvirt-raw                14 migrate-support-check      fail never pass
 test-arm64-arm64-libvirt-raw                15 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-vhd                     14 migrate-support-check      fail never pass
 test-arm64-arm64-xl-vhd                     15 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt-raw                14 migrate-support-check      fail never pass
 test-armhf-armhf-xl-vhd                     14 migrate-support-check      fail never pass
 test-armhf-armhf-xl-vhd                     15 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-multivcpu               15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-multivcpu               16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl                         15 migrate-support-check      fail never pass
 test-armhf-armhf-xl                         16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-rtds                    15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-rtds                    16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit1                 15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit1                 16 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt                    15 migrate-support-check      fail never pass
 test-armhf-armhf-libvirt-qcow2              14 migrate-support-check      fail never pass

version targeted for testing:
 linux                b401b621758e46812da61fa58a67c3fd8d91de0d
baseline version:
 linux                6c160f16be5df1f66f6afe186c961ad446d7f94b

Last test of basis   184697  2024-02-18 18:43:56 Z    0 days
Testing same since   184700  2024-02-19 03:44:55 Z    0 days    1 attempts

People who touched revisions under test:
  Linus Torvalds

jobs:
 build-amd64-xsm                                              pass
 build-arm64-xsm                                              pass
 build-i386-xsm                                               pass
 build-amd64                                                  pass
 build-arm64                                                  pass
 build-armhf                                                  pass
 build-i386
Re: [PATCH v13.2 01/14] vpci: use per-domain PCI lock to protect vpci structure
On 2/19/24 08:12, Jan Beulich wrote:
> On 19.02.2024 13:47, Stewart Hildebrand wrote:
>> On 2/19/24 07:10, Jan Beulich wrote:
>>> On 19.02.2024 12:47, Stewart Hildebrand wrote:
>>>> @@ -895,6 +891,15 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>>>>  {
>>>>      unsigned int i;
>>>>
>>>> +    /*
>>>> +     * Assert that d->pdev_list doesn't change. ASSERT_PDEV_LIST_IS_READ_LOCKED
>>>> +     * is not suitable here because it may allow either pcidevs_lock() or
>>>> +     * d->pci_lock to be held, but here we rely on d->pci_lock being held, not
>>>> +     * pcidevs_lock().
>>>> +     */
>>>> +    ASSERT(rw_is_locked(&msix->pdev->domain->pci_lock));
>>>> +    ASSERT(spin_is_locked(&msix->pdev->vpci->lock));
>>>
>>> There's no "d" in sight here, so it's a little odd that "d" is being talked
>>> about. But I guess people can infer what's meant without too much trouble.
>>
>> I can s/d->pci_lock/msix->pdev->domain->pci_lock/ for the next rev.
>
> Or simply drop the d-s? That would be better for readability's sake,
> I think.

OK

>>>> @@ -313,17 +316,36 @@ void vpci_dump_msi(void)
>>>>                  {
>>>>                      /*
>>>>                       * On error vpci_msix_arch_print will always return without
>>>> -                     * holding the lock.
>>>> +                     * holding the locks.
>>>>                       */
>>>>                      printk("unable to print all MSI-X entries: %d\n", rc);
>>>> -                    process_pending_softirqs();
>>>> -                    continue;
>>>> +                    goto pdev_done;
>>>>                  }
>>>>              }
>>>>
>>>> +            /*
>>>> +             * Unlock locks to process pending softirqs. This is
>>>> +             * potentially unsafe, as d->pdev_list can be changed in
>>>> +             * meantime.
>>>> +             */
>>>>              spin_unlock(&pdev->vpci->lock);
>>>> +            read_unlock(&d->pci_lock);
>>>> + pdev_done:
>>>>              process_pending_softirqs();
>>>> +            if ( !read_trylock(&d->pci_lock) )
>>>> +            {
>>>> +                printk("unable to access other devices for the domain\n");
>>>> +                goto domain_done;
>>>> +            }
>>>>          }
>>>> +        read_unlock(&d->pci_lock);
>>>> + domain_done:
>>>> +        /*
>>>> +         * We need this label at the end of the loop, but some
>>>> +         * compilers might not be happy about label at the end of the
>>>> +         * compound statement so we adding an empty statement here.
>>>> +         */
>>>> +        ;
>>>
>>> As to "some compilers": Are there any which accept a label not followed
>>> by a statement? Depending on the answer, this comment may be viewed as
>>> superfluous. Or else I'd ask about wording: Besides a grammar issue I
>>> also don't view it as appropriate that a comment talks about "adding"
>>> something when its adjacent code that is meant. That something is there
>>> when the comment is there, hence respective wording should imo be used.
>>
>> It seems like hit or miss whether gcc would accept it or not (prior
>> discussion at [1]). I agree the comment is rather lengthy for what it's
>> trying to convey. I'd be happy to either remove the comment or reduce
>> it to:
>>
>> domain_done:
>>     ; /* Empty statement to make some compilers happy */
>>
>> [1] https://lore.kernel.org/xen-devel/98b8c131-b0b9-f46c-5f46-c2136f2e3...@amd.com/
>
> This earlier discussion only proves that there is at least one compiler
> objecting. There's no proof there that any compiler exists which, as a
> language extension, actually permits such syntax. Yet if the comment
> was purely about normal language syntax, then imo it should be zapped
> altogether, not just be shrunk.

I'll zap it
Re: [PATCH v4 12/30] xen/riscv: introduce cmpxchg.h
On 19.02.2024 15:00, Oleksii wrote: > On Sun, 2024-02-18 at 19:00 +, Julien Grall wrote: >> On 05/02/2024 15:32, Oleksii Kurochko wrote: >>> --- /dev/null >>> +++ b/xen/arch/riscv/include/asm/cmpxchg.h >>> @@ -0,0 +1,237 @@ >>> +/* SPDX-License-Identifier: GPL-2.0-only */ >>> +/* Copyright (C) 2014 Regents of the University of California */ >>> + >>> +#ifndef _ASM_RISCV_CMPXCHG_H >>> +#define _ASM_RISCV_CMPXCHG_H >>> + >>> +#include >>> +#include >>> + >>> +#include >>> +#include >>> +#include >>> + >>> +#define ALIGN_DOWN(addr, size) ((addr) & (~((size) - 1))) >>> + >>> +#define __amoswap_generic(ptr, new, ret, sfx, release_barrier, >>> acquire_barrier) \ >>> +({ \ >>> + asm volatile( \ >>> + release_barrier \ >>> + " amoswap" sfx " %0, %2, %1\n" \ >>> + acquire_barrier \ >>> + : "=r" (ret), "+A" (*ptr) \ >>> + : "r" (new) \ >>> + : "memory" ); \ >>> +}) >>> + >>> +#define emulate_xchg_1_2(ptr, new, ret, release_barrier, >>> acquire_barrier) \ >>> +({ \ >>> + uint32_t *ptr_32b_aligned = (uint32_t *)ALIGN_DOWN((unsigned >>> long)ptr, 4); \ >>> + uint8_t mask_l = ((unsigned long)(ptr) & (0x8 - sizeof(*ptr))) >>> * BITS_PER_BYTE; \ >>> + uint8_t mask_size = sizeof(*ptr) * BITS_PER_BYTE; \ >>> + uint8_t mask_h = mask_l + mask_size - 1; \ >>> + unsigned long mask = GENMASK(mask_h, mask_l); \ >>> + unsigned long new_ = (unsigned long)(new) << mask_l; \ >>> + unsigned long ret_; \ >>> + unsigned long rc; \ >>> + \ >>> + asm volatile( \ >>> + release_barrier \ >>> + "0: lr.d %0, %2\n" \ >> >> I was going to ask why this is lr.d rather than lr.w. But I see Jan >> already asked. I agree with him that it should probably be a lr.w and >> ... >> >>> + " and %1, %0, %z4\n" \ >>> + " or %1, %1, %z3\n" \ >>> + " sc.d %1, %1, %2\n" \ >> >> ... respectively sc.w. The same applies for cmpxchg. 
> I agree that it would be better, and my initial attempt was to handle
> 4-byte or 8-byte boundary crossing during 2-byte access:
>
>   0 1 2 3 4 5 6 7 8
>   X X X 1 1 X X X X
>
> In this case, if I align address 3 to address 0 and then read 4 bytes
> instead of 8 bytes, I will not process the byte at address 4. This was
> the reason why I started to read 8 bytes.
>
> I also acknowledge that there could be an issue in the following case:
>
>   X 4094 4095 4096
>   X  1    1    X
>
> In this situation, when I read 8 bytes, there could be an issue where
> the new page (which starts at 4096) will not be mapped. It seems
> correct in this case to check that variable is within one page and read
> 4 bytes instead of 8.
>
> One more thing I am uncertain about is if we change everything to read
> 4 bytes with 4-byte alignment, what should be done with the first case?
> Should we panic? (I am not sure if this is an option.)

Counter question (raised elsewhere already): What if a 4-byte access
crosses a word / cache line / page boundary? Ideally exactly the same
would happen for a 2-byte access crossing a respective boundary. (Which
you can achieve relatively easily by masking off only address bit 1,
keeping address bit 0 unaltered.)

> Should we perform the operation twice for addresses 0x0 and 0x4?

That wouldn't be atomic then.

Jan
Re: [PATCH 2/6] svm: Improve type of cpu_has_svm_feature
On 06.02.2024 02:20, George Dunlap wrote: > The "effective type" of the cpu_has_svm_feature macro is effectively > an unsigned log with one bit set (or not); at least one place someone > felt compelled to do a !! to make sure that they got a boolean out of > it. > > Ideally the whole of this would be folded into the cpufeature.h > infrastructure. But for now, duplicate the more type-safe static > inlines in that file, and remove the !!. > > Signed-off-by: George Dunlap Acked-by: Jan Beulich albeit preferably with ... > --- a/xen/arch/x86/include/asm/hvm/svm/svm.h > +++ b/xen/arch/x86/include/asm/hvm/svm/svm.h > @@ -38,7 +38,10 @@ extern u32 svm_feature_flags; > #define SVM_FEATURE_SSS 19 /* NPT Supervisor Shadow Stacks */ > #define SVM_FEATURE_SPEC_CTRL 20 /* MSR_SPEC_CTRL virtualisation */ > > -#define cpu_has_svm_feature(f) (svm_feature_flags & (1u << (f))) > +static inline bool cpu_has_svm_feature(unsigned int feat) > +{ > +return svm_feature_flags & (1u << (feat)); ... the inner pair of parentheses dropped. Jan
Re: [PATCH v4 12/30] xen/riscv: introduce cmpxchg.h
On Sun, 2024-02-18 at 19:00 +, Julien Grall wrote: > > > On 05/02/2024 15:32, Oleksii Kurochko wrote: > > The header was taken from Linux kernl 6.4.0-rc1. > > > > Addionally, were updated: > > * add emulation of {cmp}xchg for 1/2 byte types > > This explaination is a little bit light. IIUC, you are implementing > them > using 32-bit atomic access. Is that correct? If so, please spell it > out. Sure, I'll update commit message. > > Also, I wonder whether it would be better to try to get rid of the > 1/2 > bytes access. Do you know where they are used? Right now, the issue is with test_and_clear_bool() which is used in common/sched/core.c:840 [https://gitlab.com/xen-project/xen/-/blob/staging/xen/common/sched/core.c?ref_type=heads#L840 ] I don't remember details, but in xen-devel chat someone told me that grant table requires 1/2 bytes access. > > > * replace tabs with spaces > Does this mean you are not planning to backport any Linux fixes? If it will be any fixes for sure I'll back port them, but it looks like this code is stable enough and not to many fixes will be done there, so it shouldn't be hard to backport them and switch to spaces. > > > * replace __* varialbed with *__ > > s/varialbed/variable/ > > > * introduce generic version of xchg_* and cmpxchg_*. > > > > Signed-off-by: Oleksii Kurochko > > --- > > Changes in V4: > > - Code style fixes. > > - enforce in __xchg_*() has the same type for new and *ptr, also > > "\n" > > was removed at the end of asm instruction. > > - dependency from > > https://lore.kernel.org/xen-devel/cover.1706259490.git.federico.seraf...@bugseng.com/ > > - switch from ASSERT_UNREACHABLE to STATIC_ASSERT_UNREACHABLE(). > > - drop xchg32(ptr, x) and xchg64(ptr, x) as they aren't used. > > - drop cmpxcg{32,64}_{local} as they aren't used. > > - introduce generic version of xchg_* and cmpxchg_*. > > - update the commit message. > > --- > > Changes in V3: > > - update the commit message > > - add emulation of {cmp}xchg_... 
for 1 and 2 bytes types > > --- > > Changes in V2: > > - update the comment at the top of the header. > > - change xen/lib.h to xen/bug.h. > > - sort inclusion of headers properly. > > --- > > xen/arch/riscv/include/asm/cmpxchg.h | 237 > > +++ > > 1 file changed, 237 insertions(+) > > create mode 100644 xen/arch/riscv/include/asm/cmpxchg.h > > > > diff --git a/xen/arch/riscv/include/asm/cmpxchg.h > > b/xen/arch/riscv/include/asm/cmpxchg.h > > new file mode 100644 > > index 00..b751a50cbf > > --- /dev/null > > +++ b/xen/arch/riscv/include/asm/cmpxchg.h > > @@ -0,0 +1,237 @@ > > +/* SPDX-License-Identifier: GPL-2.0-only */ > > +/* Copyright (C) 2014 Regents of the University of California */ > > + > > +#ifndef _ASM_RISCV_CMPXCHG_H > > +#define _ASM_RISCV_CMPXCHG_H > > + > > +#include > > +#include > > + > > +#include > > +#include > > +#include > > + > > +#define ALIGN_DOWN(addr, size) ((addr) & (~((size) - 1))) > > + > > +#define __amoswap_generic(ptr, new, ret, sfx, release_barrier, > > acquire_barrier) \ > > +({ \ > > + asm volatile( \ > > + release_barrier \ > > + " amoswap" sfx " %0, %2, %1\n" \ > > + acquire_barrier \ > > + : "=r" (ret), "+A" (*ptr) \ > > + : "r" (new) \ > > + : "memory" ); \ > > +}) > > + > > +#define emulate_xchg_1_2(ptr, new, ret, release_barrier, > > acquire_barrier) \ > > +({ \ > > + uint32_t *ptr_32b_aligned = (uint32_t *)ALIGN_DOWN((unsigned > > long)ptr, 4); \ > > + uint8_t mask_l = ((unsigned long)(ptr) & (0x8 - sizeof(*ptr))) > > * BITS_PER_BYTE; \ > > + uint8_t mask_size = sizeof(*ptr) * BITS_PER_BYTE; \ > > + uint8_t mask_h = mask_l + mask_size - 1; \ > > + unsigned long mask = GENMASK(mask_h, mask_l); \ > > + unsigned long new_ = (unsigned long)(new) << mask_l; \ > > + unsigned long ret_; \ > > + unsigned long rc; \ > > + \ > > + asm volatile( \ > > + release_barrier \ > > + "0: lr.d %0, %2\n" \ > > I was going to ask why this is lr.d rather than lr.w. But I see Jan > already asked. 
I agree with him that it should probably be a lr.w and > ... > > > + " and %1, %0, %z4\n" \ > > + " or %1, %1, %z3\n" \ > > + " sc.d %1, %1, %2\n" \ > > ... respectively sc.w. The same applies for cmpxchg. I agree that it would be better, and my initial attempt was to handle 4-byte or 8-byte boundary crossing during 2-byte access: 0 1 2 3 4 5 6 7 8 X X X 1 1 X X X X In this case, if I align address 3 to address 0 and then read 4 bytes instead of 8 bytes, I will not process the byte at address 4. This was the reason why I started to read 8 bytes. I also acknowledge that there could be an issue in the following case: X 4094 4095 4096 X1 1X In this situation, when I read 8 bytes, there could be an issue where the new page (which starts at 4096) will not be mapped. It seems correct in this case to check that variable is
[PATCH] tools/9pfsd: add missing va_end() in fill_data()
In xen-9pfsd fill_data() va_end() needs to be called before returning.

Coverity Id CID 1592145
Fixes: bcec59cf7ff4 ("tools/xen-9pfsd: add 9pfs version request support")
Signed-off-by: Juergen Gross
---
 tools/9pfsd/io.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/tools/9pfsd/io.c b/tools/9pfsd/io.c
index ebc4102713..adb887c7d9 100644
--- a/tools/9pfsd/io.c
+++ b/tools/9pfsd/io.c
@@ -445,7 +445,7 @@ static int fill_data(struct ring *ring, const char *fmt, ...)
             if ( !*f || array_sz )
                 fmt_err(fmt);
             if ( !chk_data(ring, data, sizeof(uint16_t)) )
-                return pars;
+                goto out;
             array_sz = get_unaligned((uint16_t *)data);
             data += sizeof(uint16_t);
             *(unsigned int *)par = array_sz;
@@ -455,10 +455,10 @@ static int fill_data(struct ring *ring, const char *fmt, ...)

         case 'b':
             if ( !chk_data(ring, data, sizeof(uint8_t)) )
-                return pars;
+                goto out;
             if ( !fill_data_elem(&par, array, &array_sz, sizeof(uint8_t), data) )
-                return pars;
+                goto out;
             data += sizeof(uint8_t);
             break;
@@ -466,48 +466,48 @@ static int fill_data(struct ring *ring, const char *fmt, ...)
             if ( array_sz )
                 fmt_err(fmt);
             if ( !chk_data(ring, data, sizeof(uint32_t)) )
-                return pars;
+                goto out;
             len = get_unaligned((uint32_t *)data);
             data += sizeof(uint32_t);
             *(unsigned int *)par = len;
             par = va_arg(ap, void *);
             if ( !chk_data(ring, data, len) )
-                return pars;
+                goto out;
             memcpy(par, data, len);
             data += len;
             break;

         case 'L':
             if ( !chk_data(ring, data, sizeof(uint64_t)) )
-                return pars;
+                goto out;
             if ( !fill_data_elem(&par, array, &array_sz, sizeof(uint64_t), data) )
-                return pars;
+                goto out;
             data += sizeof(uint64_t);
             break;

         case 'S':
             if ( !chk_data(ring, data, sizeof(uint16_t)) )
-                return pars;
+                goto out;
             len = get_unaligned((uint16_t *)data);
             data += sizeof(uint16_t);
             if ( !chk_data(ring, data, len) )
-                return pars;
+                goto out;
             str_off = add_string(ring, data, len);
             if ( str_off == ~0 )
-                return pars;
+                goto out;
             if ( !fill_data_elem(&par, array, &array_sz, sizeof(unsigned int), &str_off) )
-                return pars;
+                goto out;
             data += len;
             break;

         case 'U':
             if ( !chk_data(ring, data, sizeof(uint32_t)) )
-                return pars;
+                goto out;
             if ( !fill_data_elem(&par, array, &array_sz, sizeof(uint32_t), data) )
-                return pars;
+                goto out;
             data += sizeof(uint32_t);
             break;
@@ -520,6 +520,9 @@ static int fill_data(struct ring *ring, const char *fmt, ...)
         pars++;
     }

+ out:
+    va_end(ap);
+
     return pars;
 }
-- 
2.35.3
Re: [XEN PATCH] automation/eclair: add deviation for MISRA C:2012 Rule 16.3
On 19.02.2024 14:24, Federico Serafini wrote: > Update ECLAIR configuration to consider safe switch clauses ending > with __{get,put}_user_bad(). > > Update docs/misra/deviations.rst accordingly. > > Signed-off-by: Federico Serafini As mentioned I'm not happy with this, not the least because of it being unclear why these two would be deviated, when there's no sign of a similar deviation for, say, __bad_atomic_size(). Imo this approach doesn't scale, and that's already leaving aside that the purpose of identically named (pseudo-)helpers could differ between architectures, thus putting under question ... > --- a/automation/eclair_analysis/ECLAIR/deviations.ecl > +++ b/automation/eclair_analysis/ECLAIR/deviations.ecl > @@ -368,6 +368,10 @@ safe." > -config=MC3R1.R16.3,reports+={safe, > "any_area(end_loc(any_exp(text(/BUG\\(\\);/"} > -doc_end > > +-doc_begin="Switch clauses ending with constructs \"__get_user_bad()\" and > \"__put_user_bad()\" are safe: they denote an unreachable program point." > +-config=MC3R1.R16.3,reports+={safe, > "any_area(end_loc(any_exp(text(/__(put|get)_user_bad\\(\\);/"} > +-doc_end ... the global scope of such a deviation. While it may not be a good idea, even within an arch such (pseudo-)helpers could be used for multiple distinct purposes. Jan
Re: [PATCH 1/6] xen/hvm: Convert hap_capabilities into a bitfield
On 06.02.2024 02:20, George Dunlap wrote: > hvm_function_table is an internal structure; rather than manually > |-ing and &-ing bits, just make it a boolean bitfield and let the > compiler do all the work. This makes everything easier to read, and > presumably allows the compiler more flexibility in producing efficient > code. > > No functional change intended. > > Signed-off-by: George Dunlap Reviewed-by: Jan Beulich > --- > Questions: > > * Should hap_superpage_2m really be set unconditionally, or should we > condition it on cpu_has_svm_npt? That's HAP capabilities; there's not going to be any use of HAP when there's no NPT (on an AMD system). IOW - all is fine as is, imo. > * Do we really need to "!!cpu_has_svm_npt"? If so, wouldn't it be > better to put the "!!" in the #define, rather than requiring the > user to know that it's needed? Considering that hap_supported is bool now, the !! can simply be dropped. We've been doing so as code was touched anyway, not in a concerted effort. > --- a/xen/arch/x86/hvm/vmx/vmcs.c > +++ b/xen/arch/x86/hvm/vmx/vmcs.c > @@ -113,8 +113,8 @@ static int cf_check parse_ept_param_runtime(const char *s) > int val; > > if ( !cpu_has_vmx_ept || !hvm_funcs.hap_supported || > - !(hvm_funcs.hap_capabilities & > - (HVM_HAP_SUPERPAGE_2MB | HVM_HAP_SUPERPAGE_1GB)) ) > + !(hvm_funcs.caps.hap_superpage_2mb || > + hvm_funcs.caps.hap_superpage_1gb) ) > { > printk("VMX: EPT not available, or not in use - ignoring\n"); Just to mention it: The conditional and the log message don't really fit together. (I was first wondering what the 2mb/1gb checks had to do here at all, but that's immediately clear when seeing that the only sub-option here is "exec-sp".) > @@ -104,8 +96,11 @@ struct hvm_function_table { > /* Hardware virtual interrupt delivery enable? */ > bool virtual_intr_delivery_enabled; > > -/* Indicate HAP capabilities. */ > -unsigned int hap_capabilities; > +struct { > +/* Indicate HAP capabilities. 
*/ > +bool hap_superpage_1gb:1, > +hap_superpage_2mb:1; Nit: Would be nice imo if the two identifiers aligned vertically with one another. Jan
[XEN PATCH] automation/eclair: add deviation for MISRA C:2012 Rule 16.3
Update ECLAIR configuration to consider safe switch clauses ending
with __{get,put}_user_bad().

Update docs/misra/deviations.rst accordingly.

Signed-off-by: Federico Serafini
---
 automation/eclair_analysis/ECLAIR/deviations.ecl | 4 ++++
 docs/misra/deviations.rst                        | 6 ++++++
 2 files changed, 10 insertions(+)

diff --git a/automation/eclair_analysis/ECLAIR/deviations.ecl b/automation/eclair_analysis/ECLAIR/deviations.ecl
index fd32ff8a9c..831b5d4c67 100644
--- a/automation/eclair_analysis/ECLAIR/deviations.ecl
+++ b/automation/eclair_analysis/ECLAIR/deviations.ecl
@@ -368,6 +368,10 @@ safe."
 -config=MC3R1.R16.3,reports+={safe, "any_area(end_loc(any_exp(text(/BUG\\(\\);/))))"}
 -doc_end

+-doc_begin="Switch clauses ending with constructs \"__get_user_bad()\" and \"__put_user_bad()\" are safe: they denote an unreachable program point."
+-config=MC3R1.R16.3,reports+={safe, "any_area(end_loc(any_exp(text(/__(put|get)_user_bad\\(\\);/))))"}
+-doc_end
+
 -doc_begin="Switch clauses not ending with the break statement are safe if an
 explicit comment indicating the fallthrough intention is present."
 -config=MC3R1.R16.3,reports+={safe, "any_area(end_loc(any_exp(text(^(?s).*/\\* [fF]all ?through.? \\*/.*$,0..1))))"}
 -doc_end

diff --git a/docs/misra/deviations.rst b/docs/misra/deviations.rst
index 123c78e20a..58f4fac18e 100644
--- a/docs/misra/deviations.rst
+++ b/docs/misra/deviations.rst
@@ -307,6 +307,12 @@ Deviations related to MISRA C:2012 Rules:
      - Switch clauses ending with failure method \"BUG()\" are safe.
      - Tagged as `safe` for ECLAIR.

+   * - R16.3
+     - Switch clauses ending with constructs
+       \"__get_user_bad()\" and \"__put_user_bad()\" are safe:
+       they denote an unreachable program point.
+     - Tagged as `safe` for ECLAIR.
+
    * - R16.3
      - Existing switch clauses not ending with the break statement are
        safe if an explicit comment indicating the fallthrough intention
        is present.
-- 
2.34.1
Re: [PATCH v13.2 01/14] vpci: use per-domain PCI lock to protect vpci structure
On 19.02.2024 13:47, Stewart Hildebrand wrote:
> On 2/19/24 07:10, Jan Beulich wrote:
>> On 19.02.2024 12:47, Stewart Hildebrand wrote:
>>> @@ -895,6 +891,15 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>>>  {
>>>      unsigned int i;
>>>
>>> +    /*
>>> +     * Assert that d->pdev_list doesn't change. ASSERT_PDEV_LIST_IS_READ_LOCKED
>>> +     * is not suitable here because it may allow either pcidevs_lock() or
>>> +     * d->pci_lock to be held, but here we rely on d->pci_lock being held, not
>>> +     * pcidevs_lock().
>>> +     */
>>> +    ASSERT(rw_is_locked(&msix->pdev->domain->pci_lock));
>>> +    ASSERT(spin_is_locked(&msix->pdev->vpci->lock));
>>
>> There's no "d" in sight here, so it's a little odd that "d" is being talked
>> about. But I guess people can infer what's meant without too much trouble.
>
> I can s/d->pci_lock/msix->pdev->domain->pci_lock/ for the next rev.

Or simply drop the d-s? That would be better for readability's sake,
I think.

>>> @@ -313,17 +316,36 @@ void vpci_dump_msi(void)
>>>                  {
>>>                      /*
>>>                       * On error vpci_msix_arch_print will always return without
>>> -                     * holding the lock.
>>> +                     * holding the locks.
>>>                       */
>>>                      printk("unable to print all MSI-X entries: %d\n", rc);
>>> -                    process_pending_softirqs();
>>> -                    continue;
>>> +                    goto pdev_done;
>>>                  }
>>>              }
>>>
>>> +            /*
>>> +             * Unlock locks to process pending softirqs. This is
>>> +             * potentially unsafe, as d->pdev_list can be changed in
>>> +             * meantime.
>>> +             */
>>>              spin_unlock(&pdev->vpci->lock);
>>> +            read_unlock(&d->pci_lock);
>>> + pdev_done:
>>>              process_pending_softirqs();
>>> +            if ( !read_trylock(&d->pci_lock) )
>>> +            {
>>> +                printk("unable to access other devices for the domain\n");
>>> +                goto domain_done;
>>> +            }
>>>          }
>>> +        read_unlock(&d->pci_lock);
>>> + domain_done:
>>> +        /*
>>> +         * We need this label at the end of the loop, but some
>>> +         * compilers might not be happy about label at the end of the
>>> +         * compound statement so we adding an empty statement here.
>>> +         */
>>> +        ;
>>
>> As to "some compilers": Are there any which accept a label not followed
>> by a statement? Depending on the answer, this comment may be viewed as
>> superfluous. Or else I'd ask about wording: Besides a grammar issue I
>> also don't view it as appropriate that a comment talks about "adding"
>> something when its adjacent code that is meant. That something is there
>> when the comment is there, hence respective wording should imo be used.
>
> It seems like hit or miss whether gcc would accept it or not (prior
> discussion at [1]). I agree the comment is rather lengthy for what it's
> trying to convey. I'd be happy to either remove the comment or reduce
> it to:
>
> domain_done:
>     ; /* Empty statement to make some compilers happy */
>
> [1] https://lore.kernel.org/xen-devel/98b8c131-b0b9-f46c-5f46-c2136f2e3...@amd.com/

This earlier discussion only proves that there is at least one compiler
objecting. There's no proof there that any compiler exists which, as a
language extension, actually permits such syntax. Yet if the comment
was purely about normal language syntax, then imo it should be zapped
altogether, not just be shrunk.

Jan
Re: [RESEND RFC] kernel/ksysfs.c: restrict /sys/kernel/notes to root access
On 2024/2/18 17:04, Kees Cook wrote:

On Sun, Feb 18, 2024 at 08:47:03AM +0100, Greg KH wrote:

On Sun, Feb 18, 2024 at 03:35:01PM +0800, Guixiong Wei wrote:
From: Guixiong Wei

Restrict non-privileged user access to /sys/kernel/notes to avoid
security attack. The non-privileged users have read access to notes.
The notes expose the load address of startup_xen. This address could be
used to bypass KASLR.

How can it be used to bypass it? KASLR is, for local users, pretty much
not an issue, as that's not what it protects from, only remote ones.

For example, the startup_xen is built at 0x82465180 and commit_creds is
built at 0x810ad570, which could be read from /boot/System.map. And the
loaded address of startup_xen is 0xbc265180, which is read from
/sys/kernel/notes. So the loaded address of commit_creds is
0xbc265180 - (0x82465180 - 0x810ad570) = 0xbaead570.

I've cc'd the hardening list on this, I'm sure the developers there have
opinions about this.

Oh eww, why is Xen spewing addresses into the notes section? (This must
be how it finds its entry point? But that would be before relocations
happen...)

But yes, I can confirm that relocations are done against the .notes
section at boot, so the addresses exposed in .notes are an immediate
KASLR offset exposure. In /sys/kernel/notes (are there any tools to
read this? I wrote my own...)

type: 1 name: Xen desc: 0xb4a711c0 0x

which matches a privileged read of /proc/kallsyms:

b4a711c0 T startup_xen

(and the hypercall_page too)

These are all coming from arch/x86/xen/xen-head.S:

	ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS,      .asciz "linux")
	ELFNOTE(Xen, XEN_ELFNOTE_GUEST_VERSION, .asciz "2.6")
	ELFNOTE(Xen, XEN_ELFNOTE_XEN_VERSION,   .asciz "xen-3.0")
#ifdef CONFIG_XEN_PV
	ELFNOTE(Xen, XEN_ELFNOTE_VIRT_BASE,     _ASM_PTR __START_KERNEL_map)
	/* Map the p2m table to a 512GB-aligned user address. */
	ELFNOTE(Xen, XEN_ELFNOTE_INIT_P2M,      .quad (PUD_SIZE * PTRS_PER_PUD))
	ELFNOTE(Xen, XEN_ELFNOTE_ENTRY,         _ASM_PTR startup_xen)
	...

Introduced in commit 5ead97c84fa7 ("xen: Core Xen implementation")
Exposed in commit da1a679cde9b ("Add /sys/kernel/notes")

Amazingly these both went in on the same release (v2.6.23, 2007). This
has been exposed for longer than KASLR has been upstream. :P

Signed-off-by: Guixiong Wei
---
 kernel/ksysfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
index b1292a57c2a5..09bc0730239b 100644
--- a/kernel/ksysfs.c
+++ b/kernel/ksysfs.c
@@ -199,7 +199,7 @@ static ssize_t notes_read(struct file *filp, struct kobject *kobj,
 static struct bin_attribute notes_attr __ro_after_init = {
 	.attr = {
 		.name = "notes",
-		.mode = S_IRUGO,
+		.mode = S_IRUSR,
 	},
 	.read = &notes_read,
 };

Yes please. Reviewed-by: Kees Cook

I wonder if we should also remove relocations that are aimed at the
.notes section for good measure? If that had already been true, this
would have just given the same info as System.map.

That's a good idea, but it depends on whether the user space tools can
accept the removed relocation addresses.

No objection from me, but what userspace tool requires access to this
file today? Will it break if permissions are changed on it?

From the exposed content, it seems that the main users are Xen-related
tools. I've added the Xen list, developers should be able to provide
some information.

And what about the module notes files? If you change one, shouldn't you
change all?

From what I currently know, the module note files do not expose any
kernel symbol address, so there is no need for modification.

Luckily all of _those_ contain what I'd expect: the Linux and GNU
build-id notes, which are harmless. But if we're going to suddenly have
things appearing in here, let's make those root-only too.

Yes, but I am also not sure whether the user space tools using this
file can accept this permission modification.
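Kees's aside above ("are there any tools to read this? I wrote my own...") is easy to reproduce: /sys/kernel/notes is a plain concatenation of native-endian ELF note records, each a 12-byte namesz/descsz/type header followed by the 4-byte-padded name and descriptor. The sketch below is a hypothetical minimal parser for such a buffer, not Kees's actual tool; the function and callback names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout of one record in /sys/kernel/notes (native-endian ELF note). */
struct elf_note_hdr {
    uint32_t namesz;
    uint32_t descsz;
    uint32_t type;
};

#define NOTE_ALIGN(x) (((x) + 3u) & ~3u)

/*
 * Walk a buffer of concatenated ELF notes, calling cb (if non-NULL) for
 * each record.  Returns the number of notes parsed, or -1 if a record
 * header claims more data than the buffer holds.
 */
static int parse_elf_notes(const uint8_t *buf, size_t len,
                           void (*cb)(const char *name, uint32_t type,
                                      const uint8_t *desc, uint32_t descsz))
{
    size_t off = 0;
    int count = 0;

    while (off + sizeof(struct elf_note_hdr) <= len) {
        struct elf_note_hdr hdr;

        memcpy(&hdr, buf + off, sizeof(hdr));
        off += sizeof(hdr);

        size_t name_len = NOTE_ALIGN(hdr.namesz);
        size_t desc_len = NOTE_ALIGN(hdr.descsz);

        if (name_len + desc_len > len - off)
            return -1;   /* truncated/malformed record */

        if (cb)
            cb((const char *)(buf + off), hdr.type,
               buf + off + name_len, hdr.descsz);

        off += name_len + desc_len;
        count++;
    }
    return count;
}
```

Feeding it the bytes of the file would yield each record's name, type, and descriptor; on a Xen PV kernel that includes the `Xen` notes (and raw addresses) discussed above.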
Re: [PATCH v13.2 01/14] vpci: use per-domain PCI lock to protect vpci structure
On 2/19/24 07:10, Jan Beulich wrote:
> On 19.02.2024 12:47, Stewart Hildebrand wrote:
>> @@ -895,6 +891,15 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>>  {
>>      unsigned int i;
>>
>> +    /*
>> +     * Assert that d->pdev_list doesn't change. ASSERT_PDEV_LIST_IS_READ_LOCKED
>> +     * is not suitable here because it may allow either pcidevs_lock() or
>> +     * d->pci_lock to be held, but here we rely on d->pci_lock being held, not
>> +     * pcidevs_lock().
>> +     */
>> +    ASSERT(rw_is_locked(&msix->pdev->domain->pci_lock));
>> +    ASSERT(spin_is_locked(&msix->pdev->vpci->lock));
>
> There's no "d" in sight here, so it's a little odd that "d" is being talked
> about. But I guess people can infer what's meant without too much trouble.

I can s/d->pci_lock/msix->pdev->domain->pci_lock/ for the next rev.

>> @@ -313,17 +316,36 @@ void vpci_dump_msi(void)
>>              {
>>                  /*
>>                   * On error vpci_msix_arch_print will always return without
>> -                 * holding the lock.
>> +                 * holding the locks.
>>                   */
>>                  printk("unable to print all MSI-X entries: %d\n", rc);
>> -                process_pending_softirqs();
>> -                continue;
>> +                goto pdev_done;
>>              }
>>          }
>>
>> +        /*
>> +         * Unlock locks to process pending softirqs. This is
>> +         * potentially unsafe, as d->pdev_list can be changed in
>> +         * meantime.
>> +         */
>>          spin_unlock(&pdev->vpci->lock);
>> +        read_unlock(&d->pci_lock);
>> + pdev_done:
>>          process_pending_softirqs();
>> +        if ( !read_trylock(&d->pci_lock) )
>> +        {
>> +            printk("unable to access other devices for the domain\n");
>> +            goto domain_done;
>> +        }
>>      }
>> +    read_unlock(&d->pci_lock);
>> + domain_done:
>> +    /*
>> +     * We need this label at the end of the loop, but some
>> +     * compilers might not be happy about label at the end of the
>> +     * compound statement so we adding an empty statement here.
>> +     */
>> +    ;
>
> As to "some compilers": Are there any which accept a label not followed
> by a statement? Depending on the answer, this comment may be viewed as
> superfluous. Or else I'd ask about wording: Besides a grammar issue I
> also don't view it as appropriate that a comment talks about "adding"
> something when its adjacent code that is meant. That something is there
> when the comment is there, hence respective wording should imo be used.

It seems like hit or miss whether gcc would accept it or not (prior
discussion at [1]). I agree the comment is rather lengthy for what it's
trying to convey. I'd be happy to either remove the comment or reduce
it to:

domain_done:
    ; /* Empty statement to make some compilers happy */

[1] https://lore.kernel.org/xen-devel/98b8c131-b0b9-f46c-5f46-c2136f2e3...@amd.com/

>> --- a/xen/include/xen/pci.h
>> +++ b/xen/include/xen/pci.h
>> @@ -171,6 +171,19 @@ void pcidevs_lock(void);
>>  void pcidevs_unlock(void);
>>  bool __must_check pcidevs_locked(void);
>>
>> +#ifndef NDEBUG
>> +/*
>> + * Check to ensure there will be no changes to the entries in d->pdev_list (but
>> + * not the contents of each entry).
>> + * This check is not suitable for protecting other state or critical regions.
>> + */
>> +#define ASSERT_PDEV_LIST_IS_READ_LOCKED(d) \
>> +    /* NB: d may be evaluated multiple times, or not at all */ \
>> +    ASSERT(pcidevs_locked() || ((d) && rw_is_locked(&(d)->pci_lock)))
>
> Is there actually any case where d can be NULL here?

Yes, when called from ns16550 driver, if the driver failed to make the
device RO.

>> +#else
>> +#define ASSERT_PDEV_LIST_IS_READ_LOCKED(d) ({ (void)(d); })
>
> Evaluating d here isn't very useful when the assertion expression doesn't
> guarantee single evaluation. Plus even if it needed evaluating, there would
> be no need to use a compiler extension here:
>
> #define ASSERT_PDEV_LIST_IS_READ_LOCKED(d) ((void)(d))

OK, I can make this change.
Re: [PATCH v4 07/30] xen/asm-generic: introdure nospec.h
On 19.02.2024 12:59, Oleksii wrote: > Hi Julien, > > On Sun, 2024-02-18 at 18:30 +, Julien Grall wrote: >> Hi Oleksii, >> >> Title: Typo s/introdure/introduce/ >> >> On 05/02/2024 15:32, Oleksii Kurochko wrote: >>> The header is similar between Arm, PPC, and RISC-V, >>> so it has been moved to asm-generic. >> >> I am not 100% convinced that moving this header to asm-generic is a >> good >> idea. At least for Arm, those helpers ought to be non-empty, what >> about >> RISC-V? > For Arm, they are not taking any action, are they? There are no > specific fences or other mechanisms inside > evaluate_nospec()/block_speculation() to address speculation. The question isn't the status quo, but how things should be looking like if everything was in place that's (in principle) needed. > For RISC-V, it can be implemented in a similar manner, at least for > now. Since these functions are only used in the grant tables code ( for > Arm and so for RISC-V ), which is not supported by RISC-V. Same here - the question is whether long term, when gnttab is also supported, RISC-V would get away without doing anything. Still ... >> If the answer is they should be non-empty. Then I would consider to >> keep >> the duplication to make clear that each architecture should take >> their >> own decision in term of security. >> >> The alternative, is to have a generic implementation that is safe by >> default (if that's even possible). > I am not certain that we can have a generic implementation, as each > architecture may have specific speculation issues. ... it's theoretically possible that there'd be an arch with no speculation issues, maybe simply because of not speculating. Jan
Re: [PATCH v13.2 01/14] vpci: use per-domain PCI lock to protect vpci structure
On 19.02.2024 12:47, Stewart Hildebrand wrote:
> @@ -895,6 +891,15 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>  {
>      unsigned int i;
>
> +    /*
> +     * Assert that d->pdev_list doesn't change. ASSERT_PDEV_LIST_IS_READ_LOCKED
> +     * is not suitable here because it may allow either pcidevs_lock() or
> +     * d->pci_lock to be held, but here we rely on d->pci_lock being held, not
> +     * pcidevs_lock().
> +     */
> +    ASSERT(rw_is_locked(&msix->pdev->domain->pci_lock));
> +    ASSERT(spin_is_locked(&msix->pdev->vpci->lock));

There's no "d" in sight here, so it's a little odd that "d" is being
talked about. But I guess people can infer what's meant without too
much trouble.

> @@ -313,17 +316,36 @@ void vpci_dump_msi(void)
>              {
>                  /*
>                   * On error vpci_msix_arch_print will always return without
> -                 * holding the lock.
> +                 * holding the locks.
>                   */
>                  printk("unable to print all MSI-X entries: %d\n", rc);
> -                process_pending_softirqs();
> -                continue;
> +                goto pdev_done;
>              }
>          }
>
> +        /*
> +         * Unlock locks to process pending softirqs. This is
> +         * potentially unsafe, as d->pdev_list can be changed in
> +         * meantime.
> +         */
>          spin_unlock(&pdev->vpci->lock);
> +        read_unlock(&d->pci_lock);
> + pdev_done:
>          process_pending_softirqs();
> +        if ( !read_trylock(&d->pci_lock) )
> +        {
> +            printk("unable to access other devices for the domain\n");
> +            goto domain_done;
> +        }
>      }
> +    read_unlock(&d->pci_lock);
> + domain_done:
> +    /*
> +     * We need this label at the end of the loop, but some
> +     * compilers might not be happy about label at the end of the
> +     * compound statement so we adding an empty statement here.
> +     */
> +    ;

As to "some compilers": Are there any which accept a label not followed
by a statement? Depending on the answer, this comment may be viewed as
superfluous. Or else I'd ask about wording: Besides a grammar issue I
also don't view it as appropriate that a comment talks about "adding"
something when its adjacent code that is meant. That something is there
when the comment is there, hence respective wording should imo be used.

> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -171,6 +171,19 @@ void pcidevs_lock(void);
>  void pcidevs_unlock(void);
>  bool __must_check pcidevs_locked(void);
>
> +#ifndef NDEBUG
> +/*
> + * Check to ensure there will be no changes to the entries in d->pdev_list (but
> + * not the contents of each entry).
> + * This check is not suitable for protecting other state or critical regions.
> + */
> +#define ASSERT_PDEV_LIST_IS_READ_LOCKED(d) \
> +    /* NB: d may be evaluated multiple times, or not at all */ \
> +    ASSERT(pcidevs_locked() || ((d) && rw_is_locked(&(d)->pci_lock)))

Is there actually any case where d can be NULL here?

> +#else
> +#define ASSERT_PDEV_LIST_IS_READ_LOCKED(d) ({ (void)(d); })

Evaluating d here isn't very useful when the assertion expression doesn't
guarantee single evaluation. Plus even if it needed evaluating, there
would be no need to use a compiler extension here:

#define ASSERT_PDEV_LIST_IS_READ_LOCKED(d) ((void)(d))

Jan
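For reference, the language rule at issue: in ISO C (before C23) a label must be attached to a statement, so a label immediately before a closing brace needs an empty statement after it. A contrived stand-alone sketch of the same pattern (not the Xen code; names are invented):

```c
#include <assert.h>

/*
 * Return 1 if stop occurs in vals[0..n-1].  The "next" label sits at
 * the end of the loop body, mirroring the pdev_done/domain_done labels
 * in vpci_dump_msi(): without the bare ";" the label would directly
 * precede "}", which ISO C (prior to C23) rejects.
 */
static int find_stop(const int *vals, int n, int stop)
{
    int found = 0;

    for ( int i = 0; i < n; i++ )
    {
        if ( vals[i] != stop )
            goto next;
        found = 1;
        break;
    next:
        ;   /* empty statement: a label cannot end a compound statement */
    }

    return found;
}
```

C23 relaxes this (labels may appear before declarations and at the end of a block), which is why newer compilers are "hit or miss" about accepting the label without the semicolon.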
Re: [PATCH v4 07/30] xen/asm-generic: introdure nospec.h
Hi Julien, On Sun, 2024-02-18 at 18:30 +, Julien Grall wrote: > Hi Oleksii, > > Title: Typo s/introdure/introduce/ > > On 05/02/2024 15:32, Oleksii Kurochko wrote: > > The header is similar between Arm, PPC, and RISC-V, > > so it has been moved to asm-generic. > > I am not 100% convinced that moving this header to asm-generic is a > good > idea. At least for Arm, those helpers ought to be non-empty, what > about > RISC-V? For Arm, they are not taking any action, are they? There are no specific fences or other mechanisms inside evaluate_nospec()/block_speculation() to address speculation. For RISC-V, it can be implemented in a similar manner, at least for now. Since these functions are only used in the grant tables code ( for Arm and so for RISC-V ), which is not supported by RISC-V. > > If the answer is they should be non-empty. Then I would consider to > keep > the duplication to make clear that each architecture should take > their > own decision in term of security. > > The alternative, is to have a generic implementation that is safe by > default (if that's even possible). I am not certain that we can have a generic implementation, as each architecture may have specific speculation issues. ~ Oleksii
Re: QEMU features useful for Xen development?
On Thu, 31 Aug 2023 at 11:32, Ayan Kumar Halder wrote: > On 31/08/2023 11:03, Peter Maydell wrote: > > On Thu, 31 Aug 2023 at 10:53, Alex Bennée wrote: > >> Peter Maydell writes: > >>> On Thu, 31 Aug 2023 at 01:57, Stefano Stabellini > >>> wrote: > As Xen is gaining R52 and R82 support, it would be great to be able to > use QEMU for development and testing there as well, but I don't think > QEMU can emulate EL2 properly for the Cortex-R architecture. We would > need EL2 support in the GIC/timer for R52/R82 as well. > >>> (What sort of board model would Xen want to use it with?) > >> We already model a bunch of the mps2/mps3 images so I'm assuming adding > >> the mps3-an536 would be a fairly simple step to do (mps2tz.c is mostly > >> tweaking config values). The question is would it be a useful target for > >> Xen? > Yes, it will be helpful if Qemu can model this board. We have a > downstream port of Xen on R52 (upstreaming is in progress). > > So, we can test the Qemu model with Xen. Hi, all. I just wanted to provide an update on this. We've now completed the mps3-an536 board model, and you can find it if you check out the head-of-git QEMU. The new board will be in the 9.0 QEMU release, but if you have a chance to give it a spin now we'll be able to fix any bugs or problems with it before the release. The documentation for the board is here: https://www.qemu.org/docs/master/system/arm/mps2.html and it lists the limitations/missing features. (If any of those are important let me know and we can look at scheduling the work to fill them in.) I'd also like to draw your attention to the note about the UART ordering on the AN536; unfortunately the hardware setup is a bit awkward here, so if you have an "I don't see any output" problem make sure your guest is sending to the same UART you're asking QEMU to show you the output from :-) thanks -- PMM
[PATCH v13.2 01/14] vpci: use per-domain PCI lock to protect vpci structure
From: Oleksandr Andrushchenko Use the per-domain PCI read/write lock to protect the presence of the pci device vpci field. This lock can be used (and in a few cases is used right away) so that vpci removal can be performed while holding the lock in write mode. Previously such removal could race with vpci_read for example. When taking both d->pci_lock and pdev->vpci->lock, they should be taken in this exact order: d->pci_lock then pdev->vpci->lock to avoid possible deadlock situations. 1. Per-domain's pci_lock is used to protect pdev->vpci structure from being removed. 2. Writing the command register and ROM BAR register may trigger modify_bars to run, which in turn may access multiple pdevs while checking for the existing BAR's overlap. The overlapping check, if done under the read lock, requires vpci->lock to be acquired on both devices being compared, which may produce a deadlock. It is not possible to upgrade read lock to write lock in such a case. So, in order to prevent the deadlock, use d->pci_lock in write mode instead. All other code, which doesn't lead to pdev->vpci destruction and does not access multiple pdevs at the same time, can still use a combination of the read lock and pdev->vpci->lock. 3. Drop const qualifier where the new rwlock is used and this is appropriate. 4. Do not call process_pending_softirqs with any locks held. For that unlock prior the call and re-acquire the locks after. After re-acquiring the lock there is no need to check if pdev->vpci exists: - in apply_map because of the context it is called (no race condition possible) - for MSI/MSI-X debug code because it is called at the end of pdev->vpci access and no further access to pdev->vpci is made 5. Use d->pci_lock around for_each_pdev and pci_get_pdev() while accessing pdevs in vpci code. 6. Switch vPCI functions to use per-domain pci_lock for ensuring pdevs do not go away. The vPCI functions call several MSI-related functions which already have existing non-vPCI callers. 
Change those MSI-related functions to allow using either pcidevs_lock()
or d->pci_lock for ensuring pdevs do not go away. Holding d->pci_lock in
read mode is sufficient. Note that this pdev protection mechanism does
not protect other state or critical sections. These MSI-related
functions already have other race condition and state protection
mechanisms (e.g. d->event_lock and msixtbl RCU), so we deduce that the
use of the global pcidevs_lock() is to ensure that pdevs do not go away.

7. Introduce wrapper construct, pdev_list_is_read_locked(), for checking
that pdevs do not go away. The purpose of this wrapper is to aid
readability and document the intent of the pdev protection mechanism.

8. When possible, the existing non-vPCI callers of these MSI-related
functions haven't been switched to use the newly introduced per-domain
pci_lock, and will continue to use the global pcidevs_lock(). This is
done to reduce the risk of the new locking scheme introducing
regressions. Those users will be adjusted in due time. One exception is
where the pcidevs_lock() in allocate_and_map_msi_pirq() is moved to the
caller, physdev_map_pirq(): this instance is switched to
read_lock(&d->pci_lock) right away.
Suggested-by: Roger Pau Monné
Suggested-by: Jan Beulich
Signed-off-by: Oleksandr Andrushchenko
Signed-off-by: Volodymyr Babchuk
Signed-off-by: Stewart Hildebrand
---
Changes in v13.2:
- clarify comment in vpci_msix_arch_print()
- add parenthesis around arg in macro definition
- make the helper macro an ASSERT itself

Changes in v13.1:
- move pdev_list_is_read_locked() to pci.h
- use pdev_list_is_read_locked() in more places
- use d directly in pdev_list_is_read_locked()
- wrap pdev_list_is_read_locked() helper in #ifndef NDEBUG, and add
  declaration in the #else case with no implementation
- replace pcidevs_lock() with read_lock(&d->pci_lock) in physdev.c

Changes in v13:
- hold off adding Roger's R-b tag even though it was provided on v12.2
- use a wrapper construct to ease readability of odd-looking ASSERTs
- new placement of ASSERT in __pci_enable_msix(), __pci_enable_msi(),
  and pci_enable_msi(). Rearrange/add pdev NULL check.
- expand commit description with details about using either
  pcidevs_lock() or d->pci_lock

Changes in v12.2:
- drop Roger's R-b
- drop both locks on error paths in vpci_msix_arch_print()
- add another ASSERT in vpci_msix_arch_print(), to enforce the
  expectation both locks are held before calling vpci_msix_arch_print()
- move pdev_done label in vpci_dump_msi()
- update comments in vpci_dump_msi() to say locks (plural)

Changes in v12.1:
- use read_trylock() in vpci_msix_arch_print()
- fixup in-code comments (revert double space, use DomXEN) in
  vpci_{read,write}()
- minor updates in commit message
- add Roger's R-b

Changes in v12:
- s/pci_rwlock/pci_lock/ in commit message
- expand comment about scope of pci_lock in sched.h
- in vpci_{read,write}, if hwdom is trying to access a device assigned
[PATCH v4] Reduce assembly code size of entry points
On many entries we push 8-bytes zero and exception constants are small
so we can just write a single byte saving 3 bytes for instruction.
With ENDBR64 this reduces the size of many entry points from 32 to 16
bytes (due to alignment).

The push and the mov are overlapping stores either way. Swapping between
movl and movb will make no difference at all on performance.

Similar code is already used in autogen_stubs.

Signed-off-by: Frediano Ziglio
Reviewed-by: Jan Beulich
---
v4:
- minor space adjustments;
- quoted possible error message;
- update commit subject removing "exception";
- added "Reviewed-by", I hope I got everything asked for.
v3:
- add other missing entries;
- reduce code when TRAP_syscall is used;
- improved commit message.
v2:
- added missing entry points;
- add mention to autogen_stubs code, as suggested.
---
 xen/arch/x86/x86_64/entry.S | 66 ++++++++++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 40 insertions(+), 26 deletions(-)

diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index ecdd6e5b47..a7bd8f0ca5 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -22,6 +22,14 @@
 #endif
 .endm

+.macro BUILD_BUG_ON condstr cond:vararg
+        .if \cond
+        .error "Condition \"\condstr\" not satisfied"
+        .endif
+.endm
+/* preprocessor macro to make error message more user friendly */
+#define BUILD_BUG_ON(cond) BUILD_BUG_ON #cond cond
+
 #ifdef CONFIG_PV
 /* %rbx: struct vcpu */
 FUNC_LOCAL(switch_to_kernel)
@@ -187,7 +195,8 @@ FUNC_LOCAL(restore_all_guest)
         SPEC_CTRL_EXIT_TO_PV    /* Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: cd */

         RESTORE_ALL
-        testw $TRAP_syscall,4(%rsp)
+        BUILD_BUG_ON(TRAP_syscall & 0xff)
+        testb $TRAP_syscall >> 8, 4+1(%rsp)
         jz    iret_exit_to_guest

         movq  24(%rsp),%r11           # RFLAGS
@@ -254,7 +263,8 @@ FUNC(lstar_enter)
         pushq $FLAT_KERNEL_CS64
         pushq %rcx
         pushq $0
-        movl  $TRAP_syscall, 4(%rsp)
+        BUILD_BUG_ON(TRAP_syscall & 0xff)
+        movb  $TRAP_syscall >> 8, 4+1(%rsp)
         SAVE_ALL

         SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
@@ -292,7 +302,8 @@ FUNC(cstar_enter)
         pushq $FLAT_USER_CS32
         pushq %rcx
         pushq $0
-        movl  $TRAP_syscall, 4(%rsp)
+        BUILD_BUG_ON(TRAP_syscall & 0xff)
+        movb  $TRAP_syscall >> 8, 4+1(%rsp)
         SAVE_ALL

         SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
@@ -334,7 +345,8 @@ LABEL(sysenter_eflags_saved, 0)
         pushq $3 /* ring 3 null cs */
         pushq $0 /* null rip */
         pushq $0
-        movl  $TRAP_syscall, 4(%rsp)
+        BUILD_BUG_ON(TRAP_syscall & 0xff)
+        movb  $TRAP_syscall >> 8, 4+1(%rsp)
         SAVE_ALL

         SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
@@ -389,7 +401,7 @@ FUNC(entry_int80)
         ENDBR64
         ALTERNATIVE "", clac, X86_FEATURE_XEN_SMAP
         pushq $0
-        movl  $0x80, 4(%rsp)
+        movb  $0x80, 4(%rsp)
         SAVE_ALL

         SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
@@ -561,7 +573,8 @@ __UNLIKELY_END(create_bounce_frame_bad_sp)
         /* Rewrite our stack frame and return to guest-OS mode. */
         /* IA32 Ref. Vol. 3: TF, VM, RF and NT flags are cleared on trap. */
         /* Also clear AC: alignment checks shouldn't trigger in kernel mode. */
-        orl   $TRAP_syscall,UREGS_entry_vector+8(%rsp)
+        BUILD_BUG_ON(TRAP_syscall & 0xff)
+        orb   $TRAP_syscall >> 8, UREGS_entry_vector+8+1(%rsp)
         andl  $~(X86_EFLAGS_AC|X86_EFLAGS_VM|X86_EFLAGS_RF|\
                  X86_EFLAGS_NT|X86_EFLAGS_TF),UREGS_eflags+8(%rsp)
         movq  $FLAT_KERNEL_SS,UREGS_ss+8(%rsp)
@@ -653,7 +666,7 @@ END(ret_from_intr)
 .section .init.text, "ax", @progbits
 FUNC(early_page_fault)
         ENDBR64
-        movl  $X86_EXC_PF, 4(%rsp)
+        movb  $X86_EXC_PF, 4(%rsp)
         SAVE_ALL
         movq  %rsp, %rdi
         call  do_early_page_fault
@@ -722,7 +735,7 @@ END(common_interrupt)

 FUNC(entry_PF)
         ENDBR64
-        movl  $X86_EXC_PF, 4(%rsp)
+        movb  $X86_EXC_PF, 4(%rsp)
 END(entry_PF)
 /* No special register assumptions. */
 FUNC(handle_exception, 0)
@@ -898,105 +911,106 @@ END(handle_exception)

 FUNC(entry_DE)
         ENDBR64
         pushq $0
-        movl  $X86_EXC_DE, 4(%rsp)
+        /* no need to update exception type, already 0 */
+        BUILD_BUG_ON(X86_EXC_DE)
         jmp   handle_exception
 END(entry_DE)

 FUNC(entry_MF)
         ENDBR64
         pushq $0
-        movl  $X86_EXC_MF, 4(%rsp)
+        movb  $X86_EXC_MF, 4(%rsp)
         jmp   handle_exception
 END(entry_MF)

 FUNC(entry_XM)
         ENDBR64
         pushq $0
-        movl  $X86_EXC_XM, 4(%rsp)
+        movb  $X86_EXC_XM, 4(%rsp)
         jmp   handle_exception
 END(entry_XM)

 FUNC(entry_NM)
         ENDBR64
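The movl-to-movb transformation above relies on the slot having been zeroed by the preceding `pushq $0`, and (for TRAP_syscall) on the constant having no bits set below bit 8, which is what the BUILD_BUG_ON enforces. Here is a little-endian C sketch of why the narrower store is equivalent; this is illustrative only (the function names are invented, and it is not the actual entry-path code):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Old sequence: pushq $0 zeroes the slot, then movl stores the vector. */
static uint32_t vector_slot_movl(uint32_t vec)
{
    uint32_t slot = 0;      /* from "pushq $0" */

    slot = vec;             /* movl $vec, 4(%rsp) */
    return slot;
}

/* New sequence: only the low byte is written; the rest stays zero. */
static uint32_t vector_slot_movb(uint8_t vec)
{
    uint32_t slot = 0;      /* from "pushq $0" */

    memcpy(&slot, &vec, 1); /* movb $vec, 4(%rsp), little-endian */
    return slot;
}
```

Since every exception vector fits in one byte, the two produce identical slot values, and entry_DE can even skip the store entirely because its vector is 0.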
Re: [PATCH v9 7/7] xen/asm-generic: fold struct devarch into struct dev
On 16.02.2024 13:39, Oleksii Kurochko wrote: > The current patch is a follow-up to the patch titled: > xen/asm-generic: introduce generic device.h > Also, a prerequisite for this patch is, without which a compilation > error will occur: > xen/arm: switch Arm to use asm-generic/device.h > > The 'struct dev_archdata' is exclusively used within 'struct device', > so it could be merged into 'struct device.' > > After the merger, it is necessary to update the 'dev_archdata()' > macros and the comments above 'struct arm_smmu_xen_device' in > drivers/passthrough/arm/smmu.c. > Additionally, it is required to update instances of > "dev->archdata->iommu" to "dev->iommu". > > Suggested-by: Julien Grall > Signed-off-by: Oleksii Kurochko > --- > This patch can be merged with patches 4 and 5 of this patch series. > --- > Changes in V9: > - newly introduced patch. > --- > xen/drivers/passthrough/arm/smmu.c | 12 ++-- > xen/include/asm-generic/device.h | 8 +--- > 2 files changed, 7 insertions(+), 13 deletions(-) > > diff --git a/xen/drivers/passthrough/arm/smmu.c > b/xen/drivers/passthrough/arm/smmu.c > index 32e2ff279b..4a272c8779 100644 > --- a/xen/drivers/passthrough/arm/smmu.c > +++ b/xen/drivers/passthrough/arm/smmu.c > @@ -227,9 +227,9 @@ struct arm_smmu_xen_domain { > }; > > /* > - * Xen: Information about each device stored in dev->archdata.iommu > + * Xen: Information about each device stored in dev->iommu > * > - * Initially dev->archdata.iommu only stores the iommu_domain (runtime > + * Initially dev->iommu only stores the iommu_domain (runtime > * configuration of the SMMU) but, on Xen, we also have to store the > * iommu_group (list of streamIDs associated to the device). > * > @@ -242,7 +242,7 @@ struct arm_smmu_xen_device { > struct iommu_group *group; > }; > > -#define dev_archdata(dev) ((struct arm_smmu_xen_device *)dev->archdata.iommu) > +#define dev_archdata(dev) ((struct arm_smmu_xen_device *)dev->iommu) I find in particular the naming here odd. 
But I'll let Julien judge whether this really is along the lines of what he suggested. Jan
Re: [PATCH v4 12/30] xen/riscv: introduce cmpxchg.h
On 15.02.2024 14:41, Oleksii wrote: >>> + : "=r" (ret), "+A" (*ptr) \ >>> + : "r" (new) \ >>> + : "memory" ); \ >>> +}) >>> + >>> +#define emulate_xchg_1_2(ptr, new, ret, release_barrier, >>> acquire_barrier) \ >>> +({ \ >>> + uint32_t *ptr_32b_aligned = (uint32_t *)ALIGN_DOWN((unsigned >>> long)ptr, 4); \ >> >> You now appear to assume that this macro is only used with inputs not >> crossing word boundaries. That's okay as long as suitably guaranteed >> at the use sites, but imo wants saying in a comment. >> >>> + uint8_t mask_l = ((unsigned long)(ptr) & (0x8 - sizeof(*ptr))) >>> * BITS_PER_BYTE; \ >> >> Why 0x8 (i.e. spanning 64 bits), not 4 (matching the uint32_t use >> above)? > The idea to read 8 bytes was to deal with crossing word boundary. So if > our address is 0x3 and we have to xchg() 2 bytes, what will cross 4 > byte boundary. Instead we align add 0x3, so it will become 0x0 and then > just always work with 8 bytes. Then what if my 2-byte access crosses a dword boundary? A cache line one? A page one? >>> + unsigned long new_ = (unsigned long)(new) << mask_l; \ >>> + unsigned long ret_; \ >>> + unsigned long rc; \ >> >> Similarly, why unsigned long here? > sizeof(unsigned long) is 8 bytes and it was chosen as we are working > with lc/sc.d which are working with 8 bytes. > >> >> I also wonder about the mix of underscore suffixed (or not) variable >> names here. > If the question about ret_, then the same as before size of ret > argument of the macros will be 1 or 2, but {lc/sc}.d expected to work > with 8 bytes. Then what's the uint32_t * about? >>> + release_barrier \ >>> + "0: lr.d %0, %2\n" \ >> >> Even here it's an 8-byte access. Even if - didn't check - the insn >> was >> okay to use with just a 4-byte aligned pointer, wouldn't it make >> sense >> then to 8-byte align it, and be consistent throughout this macro wrt >> the base unit acted upon? 
>> Alternatively, why not use lr.w here, thus reducing possible collisions
>> between multiple CPUs accessing the same cache line?
> According to the docs:
> LR and SC operate on naturally-aligned 64-bit (RV64 only) or 32-bit
> words in memory. Misaligned addresses will generate misaligned address
> exceptions.
>
> My intention was to deal with 4-byte crossing boundary. So if ptr is
> 4-byte aligned, then by reading 8 bytes we shouldn't care about
> boundary crossing, if I am not missing something.

If a ptr is 4-byte aligned, there's no point reading more than 4 bytes.

>>> +    "      and  %1, %0, %z4\n" \
>>> +    "      or   %1, %1, %z3\n" \
>>> +    "      sc.d %1, %1, %2\n" \
>>> +    "      bnez %1, 0b\n" \
>>> +    acquire_barrier \
>>> +    : "=&r" (ret_), "=&r" (rc), "+A" (*ptr_32b_aligned) \
>>> +    : "rJ" (new_), "rJ" (~mask) \
>>
>> I think that as soon as there are more than 2 or maybe 3 operands,
>> legibility is vastly improved by using named asm() operands.
> Just to clarify you mean that it would be better to use instead of %0
> use names?

Yes. Just like you have it in one of the other patches that I looked at
later.

>>> +    : "memory"); \
>>
>> Nit: Missing blank before closing parenthesis.
>>
>>> +    \
>>> +    ret = (__typeof__(*(ptr)))((ret_ & mask) >> mask_l); \
>>> +})
>>
>> Why does "ret" need to be a macro argument? If you had only the
>> expression here, not the assignment, ...
>>
>>> +#define __xchg_generic(ptr, new, size, sfx, release_barrier, acquire_barrier) \
>>> +({ \
>>> +    __typeof__(ptr) ptr__ = (ptr); \
>>
>> Is this local variable really needed? Can't you use "ptr" directly
>> in the three macro invocations?
>>
>>> +    __typeof__(*(ptr)) new__ = (new); \
>>> +    __typeof__(*(ptr)) ret__; \
>>> +    switch (size) \
>>> +    { \
>>> +    case 1: \
>>> +    case 2: \
>>> +        emulate_xchg_1_2(ptr__, new__, ret__, release_barrier, acquire_barrier); \
>>
>> ...
this would become >> >> ret__ = emulate_xchg_1_2(ptr__, new__, release_barrier, >> acquire_barrier); \ >> >> But, unlike assumed above, there's no enforcement here that a 2-byte >> quantity won't cross a word, double-word, cache line, or even page >> boundary. That might be okay if then the code would simply crash >> (like >> the AMO insns emitted further down would), but aiui silent >> misbehavior >> would result. > As I mentioned above with 4-byte alignment and then reading and working > with 8-byte then crossing a word or double-word boundary shouldn't be > an issue. > > I am not sure that I know how to check that we are crossing cache line > boundary. > > Regarding page boundary, if the next page is mapped then all should > work fine, otherwise it will be an exception. Are you sure lr.d / sc.d are happy to access across such a boundary, when both pages are mapped? To me it seems pretty clear that for atomic accesses you want to demand natural alignment, i.e. 2-byte alignment for 2-byte accesses. This way you can be sure no potentially problematic
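The shift/mask arithmetic under debate can be shown in plain C, divorced from the lr/sc retry loop. This is a non-atomic illustration only, assuming little-endian byte order and a 32-bit containing word (i.e. the lr.w variant Jan suggests); the function name is invented:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Non-atomic sketch of the arithmetic behind emulate_xchg_1_2: a 1- or
 * 2-byte exchange is emulated as a read-modify-write of the naturally
 * aligned 32-bit word containing the value.  The real implementation
 * wraps the same computation in an lr/sc retry loop; this version only
 * demonstrates the shift/mask math.  Note that a *naturally aligned*
 * 1- or 2-byte field can never straddle its aligned 32-bit word, which
 * is why demanding natural alignment sidesteps the word/cache-line/page
 * boundary questions raised in the review.
 */
static uint32_t emulate_xchg_small(volatile uint32_t *aligned,
                                   unsigned int byte_off,
                                   uint32_t new, unsigned int size)
{
    unsigned int shift = byte_off * 8;                    /* little-endian */
    uint32_t mask = ((size == 1) ? 0xffu : 0xffffu) << shift;
    uint32_t old = *aligned;

    *aligned = (old & ~mask) | ((new << shift) & mask);   /* and/or steps */
    return (old & mask) >> shift;                         /* previous value */
}
```

The `and`/`or` pair in the quoted assembly corresponds line-for-line to the masked merge above, with `~mask` and the pre-shifted new value as the inputs.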
Re: [PATCH] ns16550: add Asix AX99100 serial card
On Mon, Feb 19, 2024 at 09:57:49AM +0100, Jan Beulich wrote: > On 18.02.2024 02:34, Marek Marczykowski-Górecki wrote: > > @@ -1170,6 +1177,11 @@ static const struct ns16550_config __initconst > > uart_config[] = > > .dev_id = 0x7adc, > > .param = param_intel_lpss > > }, > > +{ > > +.vendor_id = PCI_VENDOR_ID_ASIX, > > +.dev_id = 9100, > > As per Linux this is 0x9100. Right... but then, maybe the patch isn't needed at all, as it does work for me. Maybe what's needed instead is some other patch already in staging. Initial attempt that did not work was with 4.17.something. I guess setting the fifo size isn't that important. > > +.param = param_asix_ax99100 > > +}, > > }; > > > > static int __init > > diff --git a/xen/include/xen/pci_ids.h b/xen/include/xen/pci_ids.h > > index e798477a7e23..2a19f4ab7872 100644 > > --- a/xen/include/xen/pci_ids.h > > +++ b/xen/include/xen/pci_ids.h > > @@ -11,3 +11,5 @@ > > #define PCI_VENDOR_ID_BROADCOM 0x14e4 > > > > #define PCI_VENDOR_ID_INTEL 0x8086 > > + > > +#define PCI_VENDOR_ID_ASIX 0x125b > > Please insert such that numeric sorting is retained. > > Jan -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab signature.asc Description: PGP signature
Re: Stats on Xen tarball downloads
On 19.02.2024 11:31, Roger Pau Monné wrote: > On Mon, Feb 19, 2024 at 06:01:54PM +0800, George Dunlap wrote: >> One of the questions we had with respect to changing our release >> practice (for instance, making the process more light-weight so that >> we could do a point release after every XSA) was, "How many people are >> actually using the tarballs?" > > What would this more lightweight process involve from a downstream > PoV? IOW: in what would the contents of the tarball change compared > to the current releases? From all prior discussion my conclusion was "no tarball at all". Jan
Re: Stats on Xen tarball downloads
On Mon, Feb 19, 2024 at 06:01:54PM +0800, George Dunlap wrote: > Hey all, > > One of the questions we had with respect to changing our release > practice (for instance, making the process more light-weight so that > we could do a point release after every XSA) was, "How many people are > actually using the tarballs?" What would this more lightweight process involve from a downstream PoV? IOW: in what would the contents of the tarball change compared to the current releases? > I finally got access (again) to > downloads.xenproject.org, and took a look at the logs. It appears > that we only keep about 2 weeks of logs there. > > Short answer: It's pretty clear from looking at the logs that there > are large numbers of automated build systems building various versions > of Xen from tarballs. It *looks* like there are over 300 people a > week downloading 4.18.0 specifically from various web browsers. As someone who packages Xen for FreeBSD, I've recently switched the build to use the git sources directly, as otherwise keeping up with XSAs tends to be a pain, especially when XSAs happen to depend on the context of some of the backports that happened between the point release and the XSA disclosure. Overall, as a consumer of Xen it would be helpful if we could make a release for each (batch of) XSAs, as that would possibly make me switch to building from the release tarballs instead of git. I don't think it would be much of a disruption if such a change to generate a more lightweight tarball is done starting from a major release (ie: 4.19) and minor releases of previous versions (4.18.x) are kept using the non-lightweight process. Thanks, Roger.
[xen-unstable test] 184699: tolerable FAIL
flight 184699 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184699/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 184693
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 184693
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 184695
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 184695
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 184695
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 184695
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 184695
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 184695
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 184695
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 184695
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 184695
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 184695
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass

version targeted for testing:
 xen 0441c3acc7e9e72e984ce49d32e61827894ae4a3
baseline version:
 xen 0441c3acc7e9e72e984ce49d32e61827894ae4a3

Last test of basis 184699 2024-02-19 01:52:12 Z 0 days
Testing same since (not found) 0 attempts

jobs:
 build-amd64-xsm pass
 build-arm64-xsm pass
Re: [PATCH v3] Reduce assembly code size of exception entry points
On 16.02.2024 11:50, Frediano Ziglio wrote: > --- a/xen/arch/x86/x86_64/entry.S > +++ b/xen/arch/x86/x86_64/entry.S > @@ -22,6 +22,14 @@ > #endif > .endm > > +.macro BUILD_BUG_ON condstr cond:vararg > +.if \cond > +.error "Condition \condstr not satisfied" Maybe .error "Condition \"\condstr\" not satisfied" ? > @@ -187,7 +195,8 @@ FUNC_LOCAL(restore_all_guest) > SPEC_CTRL_EXIT_TO_PV /* Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: > cd */ > > RESTORE_ALL > -testw $TRAP_syscall,4(%rsp) > +BUILD_BUG_ON(TRAP_syscall & 0xff) > +testb $TRAP_syscall >> 8,4+1(%rsp) > jz iret_exit_to_guest Nit: Blank after comma please (and again elsewhere). Preferably with both adjustments (which I'd be okay making while committing, so long as you agree specifically on the former) Reviewed-by: Jan Beulich That said, especially with this not being an entry point path, and neither this nor ... > @@ -254,7 +263,8 @@ FUNC(lstar_enter) > pushq $FLAT_KERNEL_CS64 > pushq %rcx > pushq $0 > -movl $TRAP_syscall, 4(%rsp) > +BUILD_BUG_ON(TRAP_syscall & 0xff) > +movb $TRAP_syscall >> 8, 4+1(%rsp) > SAVE_ALL > > SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd > */ > @@ -292,7 +302,8 @@ FUNC(cstar_enter) > pushq $FLAT_USER_CS32 > pushq %rcx > pushq $0 > -movl $TRAP_syscall, 4(%rsp) > +BUILD_BUG_ON(TRAP_syscall & 0xff) > +movb $TRAP_syscall >> 8, 4+1(%rsp) > SAVE_ALL > > SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd > */ > @@ -334,7 +345,8 @@ LABEL(sysenter_eflags_saved, 0) > pushq $3 /* ring 3 null cs */ > pushq $0 /* null rip */ > pushq $0 > -movl $TRAP_syscall, 4(%rsp) > +BUILD_BUG_ON(TRAP_syscall & 0xff) > +movb $TRAP_syscall >> 8, 4+1(%rsp) > SAVE_ALL ... any of these being exception entry point paths (and hence none of these changes being related to the subject), it would likely have been a good idea to split this into two patches. Jan
Re: [PATCH v3] x86/vmx: add support for virtualize SPEC_CTRL
On 15.02.2024 17:46, Roger Pau Monne wrote: > The feature is defined in the tertiary exec control, and is available starting > from Sapphire Rapids and Alder Lake CPUs. > > When enabled, two extra VMCS fields are used: SPEC_CTRL mask and shadow. Bits > set in mask are not allowed to be toggled by the guest (either set or clear) > and the value in the shadow field is the value the guest expects to be in the > SPEC_CTRL register. > > By using it the hypervisor can force the value of SPEC_CTRL bits behind the > guest back without having to trap all accesses to SPEC_CTRL, note that no bits > are forced into the guest as part of this patch. It also allows getting rid > of > SPEC_CTRL in the guest MSR load list, since the value in the shadow field will > be loaded by the hardware on vmentry. > > Signed-off-by: Roger Pau Monné Reviewed-by: Jan Beulich
Re: [PATCH] ns16550: add Asix AX99100 serial card
On 18.02.2024 02:34, Marek Marczykowski-Górecki wrote: > @@ -1170,6 +1177,11 @@ static const struct ns16550_config __initconst > uart_config[] = > .dev_id = 0x7adc, > .param = param_intel_lpss > }, > +{ > +.vendor_id = PCI_VENDOR_ID_ASIX, > +.dev_id = 9100, As per Linux this is 0x9100. > +.param = param_asix_ax99100 > +}, > }; > > static int __init > diff --git a/xen/include/xen/pci_ids.h b/xen/include/xen/pci_ids.h > index e798477a7e23..2a19f4ab7872 100644 > --- a/xen/include/xen/pci_ids.h > +++ b/xen/include/xen/pci_ids.h > @@ -11,3 +11,5 @@ > #define PCI_VENDOR_ID_BROADCOM 0x14e4 > > #define PCI_VENDOR_ID_INTEL 0x8086 > + > +#define PCI_VENDOR_ID_ASIX 0x125b Please insert such that numeric sorting is retained. Jan
Re: [PATCH 5/5] mm: add the __must_check attribute to {gfn,mfn}_add()
On 19.02.2024 06:07, George Dunlap wrote: > On Wed, Feb 14, 2024 at 7:42 PM Roger Pau Monne wrote: >> >> It's not obvious from the function itself whether the incremented value will >> be >> stored in the parameter, or returned to the caller. That has led to bugs >> in >> the past as callers assume the incremented value is stored in the parameter. >> >> Add the __must_check attribute to the function to easily spot callers that >> don't consume the returned value, which signals an error in the caller logic. >> >> No functional change intended. > > Just thinking out loud here... I wonder if "mfn_plus()" would be less > likely to be misunderstood. Catching this during compile is def > better than catching it w/ Coverity or Eclair, but even better to help > people not make the mistake in the first place. To me while mfn_plus() would indeed be somewhat less likely to cause this kind of confusion, I don't think that naming would entirely eliminate the risk. I was actually considering adding mfn_inc() for the common case of incrementing by 1, yet I can't see an alternative name avoiding the issue there. Jan
Re: [PATCH v13.1 01/14] vpci: use per-domain PCI lock to protect vpci structure
On Fri, Feb 16, 2024 at 09:41:19AM -0500, Stewart Hildebrand wrote: > On 2/16/24 06:44, Roger Pau Monné wrote: > > On Thu, Feb 15, 2024 at 03:30:00PM -0500, Stewart Hildebrand wrote: > >> From: Oleksandr Andrushchenko > >> > >> Use the per-domain PCI read/write lock to protect the presence of the > >> pci device vpci field. This lock can be used (and in a few cases is used > >> right away) so that vpci removal can be performed while holding the lock > >> in write mode. Previously such removal could race with vpci_read for > >> example. > >> > >> When taking both d->pci_lock and pdev->vpci->lock, they should be > >> taken in this exact order: d->pci_lock then pdev->vpci->lock to avoid > >> possible deadlock situations. > >> > >> 1. Per-domain's pci_lock is used to protect pdev->vpci structure > >> from being removed. > >> > >> 2. Writing the command register and ROM BAR register may trigger > >> modify_bars to run, which in turn may access multiple pdevs while > >> checking for the existing BAR's overlap. The overlapping check, if > >> done under the read lock, requires vpci->lock to be acquired on both > >> devices being compared, which may produce a deadlock. It is not > >> possible to upgrade read lock to write lock in such a case. So, in > >> order to prevent the deadlock, use d->pci_lock in write mode instead. > >> > >> All other code, which doesn't lead to pdev->vpci destruction and does > >> not access multiple pdevs at the same time, can still use a > >> combination of the read lock and pdev->vpci->lock. > >> > >> 3. Drop const qualifier where the new rwlock is used and this is > >> appropriate. > >> > >> 4. Do not call process_pending_softirqs with any locks held. For that > >> unlock prior the call and re-acquire the locks after. 
After > >> re-acquiring the lock there is no need to check if pdev->vpci exists: > >> - in apply_map because of the context it is called (no race condition > >>possible) > >> - for MSI/MSI-X debug code because it is called at the end of > >>pdev->vpci access and no further access to pdev->vpci is made > >> > >> 5. Use d->pci_lock around for_each_pdev and pci_get_pdev() > >> while accessing pdevs in vpci code. > >> > >> 6. Switch vPCI functions to use per-domain pci_lock for ensuring pdevs > >> do not go away. The vPCI functions call several MSI-related functions > >> which already have existing non-vPCI callers. Change those MSI-related > >> functions to allow using either pcidevs_lock() or d->pci_lock for > >> ensuring pdevs do not go away. Holding d->pci_lock in read mode is > >> sufficient. Note that this pdev protection mechanism does not protect > >> other state or critical sections. These MSI-related functions already > >> have other race condition and state protection mechanisms (e.g. > >> d->event_lock and msixtbl RCU), so we deduce that the use of the global > >> pcidevs_lock() is to ensure that pdevs do not go away. > >> > >> 7. Introduce wrapper construct, pdev_list_is_read_locked(), for checking > >> that pdevs do not go away. The purpose of this wrapper is to aid > >> readability and document the intent of the pdev protection mechanism. > >> > >> 8. When possible, the existing non-vPCI callers of these MSI-related > >> functions haven't been switched to use the newly introduced per-domain > >> pci_lock, and will continue to use the global pcidevs_lock(). This is > >> done to reduce the risk of the new locking scheme introducing > >> regressions. Those users will be adjusted in due time. One exception > >> is where the pcidevs_lock() in allocate_and_map_msi_pirq() is moved to > >> the caller, physdev_map_pirq(): this instance is switched to > >> read_lock(&d->pci_lock) right away.
> >> > >> Suggested-by: Roger Pau Monné > >> Suggested-by: Jan Beulich > >> Signed-off-by: Oleksandr Andrushchenko > >> Signed-off-by: Volodymyr Babchuk > >> Signed-off-by: Stewart Hildebrand > > > > A couple of questions and the pdev_list_is_read_locked() needs a small > > adjustment. > > > >> @@ -895,6 +891,14 @@ int vpci_msix_arch_print(const struct vpci_msix *msix) > >> { > >> unsigned int i; > >> > >> +/* > >> + * Assert that d->pdev_list doesn't change. > >> pdev_list_is_read_locked() is > >> + * not suitable here because we may > >> read_unlock(>domain->pci_lock) > >> + * before returning. > > > > I'm confused by this comment, as I don't see why it matters that the > > lock might be lock before returning. We need to ensure the lock is > > taken at the time of the assert, and hence pdev_list_is_read_locked() > > can be used. > > pdev_list_is_read_locked() currently would allow either pcidevs_lock() > or d->pci_lock. If vpci_msix_arch_print() is entered with only > pcidevs_lock() held, we may end up unlocking d->pci_lock when it is > not locked, which would be wrong. > > Perhaps the comment could be clarified as: > > /* > * Assert that d->pdev_list doesn't change. > ASSERT_PDEV_LIST_IS_READ_LOCKED > * is not suitable
Re: [PATCH v2] x86: amend 'n' debug-key output with SMI count
On 16.02.2024 10:11, Roger Pau Monné wrote: > On Wed, Feb 14, 2024 at 11:15:51AM +0100, Jan Beulich wrote: >> --- a/xen/arch/x86/cpu/common.c >> +++ b/xen/arch/x86/cpu/common.c >> @@ -407,9 +407,15 @@ void __init early_cpu_init(bool verbose) >> paddr_bits -= (ebx >> 6) & 0x3f; >> } >> >> -if (!(c->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON))) >> +if (!(c->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON))) { >> +uint64_t smi_count; >> + >> park_offline_cpus = opt_mce; >> >> +if (!verbose && !rdmsr_safe(MSR_SMI_COUNT, smi_count)) >> +setup_force_cpu_cap(X86_FEATURE_SMI_COUNT); > > Why make it dependent on !verbose? The call with !verbose is tied to > part of the ucode loading being half-functional (for example > MCU_CONTROL_DIS_MCU_LOAD not being set) but I don't see that as a > signal that SMI count shouldn't be used. > > does it need to be part of the early cpu initialization instead of > being in the (later) Intel specific init code part of the > identify_cpu()? Yes, the condition was inverted. It could likely also be dropped altogether; not sure which one's better: On one hand avoiding multiple setup_force_cpu_cap() seems desirable (albeit not strictly necessary), while otoh the code would be simpler without. >> --- a/xen/arch/x86/nmi.c >> +++ b/xen/arch/x86/nmi.c >> @@ -585,15 +585,34 @@ static void cf_check do_nmi_trigger(unsi >> self_nmi(); >> } >> >> +static DEFINE_PER_CPU(unsigned int, smi_count); >> + >> +static void cf_check read_smi_count(void *unused) >> +{ >> +unsigned int dummy; >> + >> +rdmsr(MSR_SMI_COUNT, this_cpu(smi_count), dummy); >> +} >> + >> static void cf_check do_nmi_stats(unsigned char key) >> { >> const struct vcpu *v; >> unsigned int cpu; >> bool pend, mask; >> >> -printk("CPU\tNMI\n"); >> +printk("CPU\tNMI%s\n", boot_cpu_has(X86_FEATURE_SMI_COUNT) ? 
"\tSMI" : >> ""); >> + >> +if ( boot_cpu_has(X86_FEATURE_SMI_COUNT) ) >> +on_each_cpu(read_smi_count, NULL, 1); >> + >> for_each_online_cpu ( cpu ) >> -printk("%3u\t%3u\n", cpu, per_cpu(nmi_count, cpu)); >> +{ >> +printk("%3u\t%3u", cpu, per_cpu(nmi_count, cpu)); >> +if ( boot_cpu_has(X86_FEATURE_SMI_COUNT) ) >> +printk("\t%3u\n", per_cpu(smi_count, cpu)); >> +else >> +printk("\n"); >> +} >> >> if ( !hardware_domain || !(v = domain_vcpu(hardware_domain, 0)) ) >> return; > > Could you also amend the debug-key help text to mention SMI? Hmm, I had considered that and decided against. I'm uncertain, nevertheless, so could be talked into amending that help text. Just that I can't make it "NMI and SMI statistics" as whether SMI data is available is conditional. Yet "NMI (and maybe SMI) statistics" looks a little clumsy to me ... Jan
Re: [PATCH] build/xen: fail to rebuild if Kconfig fails
On 16.02.2024 11:51, Roger Pau Monné wrote: > On Fri, Feb 16, 2024 at 11:04:46AM +0100, Jan Beulich wrote: >> On 15.02.2024 18:23, Roger Pau Monné wrote: >>> On Thu, Feb 15, 2024 at 05:22:00PM +0100, Jan Beulich wrote: On 15.02.2024 17:08, Roger Pau Monné wrote: > On Thu, Feb 15, 2024 at 02:02:41PM +0100, Jan Beulich wrote: >> --- a/xen/Rules.mk >> +++ b/xen/Rules.mk >> @@ -15,7 +15,11 @@ srcdir := $(srctree)/$(src) >> PHONY := __build >> __build: >> >> --include $(objtree)/include/config/auto.conf >> +ifneq ($(obj),tools) >> +ifneq ($(obj),tools/kconfig) >> +include $(objtree)/include/config/auto.conf >> +endif >> +endif > > Trying to understand this, I assume it's to avoid an infinite > dependency loop that generating include/config/auto.conf requires some > tools that are built using xen/Rules.mk? The file has dependencies only in xen/Makefile. This is about the file simply not being there when initially building. Perhaps the patch description helps that I've written in the meantime: "Because of using "-include", failure to (re)build auto.conf (with auto.conf.cmd produced as a secondary target) won't stop make from continuing the build. Arrange for it being possible to drop the - from Rules.mk, requiring that the include be skipped for tools-only targets. >>> >>> Wouldn't it be more reliable if we skipped the include for any paths >>> in $(obj) that start with 'tools', rather than hardcoding 'tools' and >>> 'tools/kconfig'? >> >> I was first meaning to do so, but the expression would end up more >> complex than I'd like (for it needing to be an exact match of "tools" >> and a prefix match of "tools/"). Thinking of it, >> >> ifneq ($(obj),tools) >> ifneq ($(patsubst tools/%,,$(obj)),) >> >> might do (and not be as complex as I first thought, when intending to >> put all in a single "if"). > > Would something like the rune below work?
> > ifneq ($(word 1, $(subst /, ,$(obj))),tools) > > That should allow to have a single condition, and should match both > 'tools' and 'tools/*' Hmm, yes, that works. $(subst ...) is something I usually try to avoid, in favor of $(patsubst ...). Jan
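For reference, the agreed rune in context might look like this (a sketch of the resulting Rules.mk fragment, variable names as in the thread):

```make
# First path component of $(obj); matches both 'tools' and 'tools/kconfig',
# but not e.g. 'arch/x86'.
ifneq ($(word 1,$(subst /, ,$(obj))),tools)
include $(objtree)/include/config/auto.conf
endif
```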
Re: [XEN PATCH v2 2/3] x86/uaccess: replace __{get,put}_user_bad() with STATIC_ASSERT_UNREACHABLE()
On 15/02/24 11:32, Jan Beulich wrote: The important difference is: Here we're told that there was a use of __put_user_bad, which is easy to grep for, and thus see how the supplied function / file / line(?) relate to the ultimate problem. I'm afraid I'm meanwhile confused enough by the various replies containing results of experimentation that I can't really tell anymore what case is best. Hence I can only restate my expectation for an eventual v3: Diagnosing what the issue is, no matter whether the new macro is used in another macro or in an inline function, should not become meaningfully more difficult. In how far this is the case wants clarifying in the description of the change. I think the best thing at the moment is to deviate __{get,put}_user_bad() for Rule 16.3. I'll let maintainers further explore the possibility of having a compile-time assertion based on the assembler error. -- Federico Serafini, M.Sc. Software Engineer, BUGSENG (http://bugseng.com)
Re: [PATCH v4 26/30] xen/riscv: add minimal stuff to mm.h to build full Xen
On 16.02.2024 12:03, Oleksii wrote: >> >>> + } free; >>> + } u; >>> + >>> + union { >>> + /* Page is in use, but not as a shadow. */ >> >> I'm also pretty sure I asked before what shadow this comment alludes >> to. > I missed your request about 'shadow' before. > > The comment arrived from Arm. > > I tried to find out the answer by investigating how 'inuse' is used, > and, unfortunately, I couldn't find out what 'shadow' alludes to. That's from x86'es shadow paging, where a page can serve as a "shadow" of a guest page table page. Jan
Re: [PATCH v4 25/30] xen/riscv: add minimal stuff to processor.h to build full Xen
On 16.02.2024 12:16, Oleksii wrote: > On Thu, 2024-02-15 at 17:43 +0100, Jan Beulich wrote: >> On 15.02.2024 17:38, Oleksii wrote: >>> On Tue, 2024-02-13 at 14:33 +0100, Jan Beulich wrote: On 05.02.2024 16:32, Oleksii Kurochko wrote: > + depends on LLD_VERSION >= 15 || LD_VERSION >= 23600 What's the linker dependency here? Depending on the answer I might further ask why "TOOLCHAIN" when elsewhere we use CC_HAS_ or HAS_CC_ or HAS_AS_. >>> I missed introducing the {L}LD_VERSION configs. They should be the output of >>> the command: >>> riscv64-linux-gnu-ld --version >> >> Doesn't answer my question though where the linker version matters >> here. > Then I misinterpreted your initial question. > Could you please provide further clarification or rephrase it for > better understanding? > > Probably, your question was about why the linker dependency is needed here: > it is not sufficient to check if a toolchain supports a particular > extension without checking if the linker supports that extension > too. > For example, Clang 15 supports Zihintpause but GNU binutils > 2.35.2 does not, leading to build errors like so: > > riscv64-linux-gnu-ld: -march=rv64i_zihintpause2p0: Invalid or > unknown z ISA extension: 'zihintpause' Hmm, that's certainly "interesting" behavior of the RISC-V linker. Yet isn't the linker capability expected to be tied to that of gas? I would find it far more natural if a gas dependency existed here. If such a connection cannot be taken for granted, I'm pretty sure you'd need to probe both then anyway. Jan