[xen-4.14-testing test] 168013: tolerable FAIL - PUSHED
flight 168013 xen-4.14-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168013/

flight 168019 xen-4.14-testing real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/168019/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail pass in 168019-retest
 test-arm64-arm64-libvirt-raw 12 debian-di-install fail pass in 168019-retest

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail in 168019 never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail in 168019 never pass
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 167964
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 167964
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 167964
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 167964
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 167964
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 167964
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 167964
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 167964
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 167964
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 167964
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 167964
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 167964
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-seattle 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14
[linux-linus test] 168012: tolerable FAIL - PUSHED
flight 168012 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168012/

flight 168020 linux-linus real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/168020/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-credit1 23 guest-start.2 fail pass in 168020-retest

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 168004
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 168004
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168004
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 168004
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 168004
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 168004
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 168004
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 168004
 test-arm64-arm64-xl-seattle 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass

version targeted for testing:
 linux dcb85f85fa6f142aae1fe86f399d4503d49f2b60
baseline version:
 linux 1f2cfdd349b7647f438c1e552dc1b983da86d830

Last test of basis 168004 2022-02-04 00:41:22 Z 1 days
Testing same since 168012 2022-02-04 16:57:25 Z 0 days 1 attempts

People who touched revisions under test:
 Alex Elder
 Alexander Aring
 Alexandre Belloni
 Alexei Starovoitov
 Andrii Nakryiko
 Arınç ÜNAL
 Camel Guo
 Christian Brauner
Re: x86: insn-eval.c's use of native_store_gdt()
On Fri, Feb 04, 2022 at 03:13:52PM +0100, Jan Beulich wrote:
> On 04.02.2022 15:08, Thomas Gleixner wrote:
> > On Fri, Feb 04 2022 at 10:23, Jan Beulich wrote:
> >> On 30.11.2021 12:09, Jan Beulich wrote:
> >>> I think it is b968e84b509d ("x86/iopl: Fake iopl(3) CLI/STI usage")
> >>> which uncovered an issue with get_desc() trying to access the GDT, as
> >>> introduced by 670f928ba09b ("x86/insn-eval: Add utility function to
> >>> get segment descriptor"). When running in a PV domain under Xen, the
> >>> (hypervisor's) GDT isn't accessible; with UMIP enabled by Xen even
> >>> SGDT wouldn't work, as the kernel runs in ring 3.
> >>>
> >>> There's currently no hypercall to retrieve a descriptor from the GDT.
> >>> It is instead assumed that kernels know where their present GDT
> >>> lives. Can the native_store_gdt() be replaced there, please?
> >>>
> >>> For context (I don't think it should matter much here) I'm observing
> >>> this with the kernel put underneath a rather old distro, where
> >>> hwclock triggers this path.
> >>
> >> I'd like to note that the issue still exists in 5.16.
> >
> > I'd like to note, that I've seen no patches to that effect.
>
> I could have worded it that way as well, yes.

Hi Jan,

I am sorry I missed your email. I'll look at the issue you describe and
get back to you.

Thanks and BR,
Ricardo
Re: [PATCH v3 1/2] dt-bindings: arm: xen: document Xen iommu device
On Wed, Jan 26, 2022 at 10:56:39AM -0800, Stefano Stabellini wrote:
> On Wed, 26 Jan 2022, Robin Murphy wrote:
> > On 2022-01-26 15:09, Sergiy Kibrik wrote:
> > > Hi Robin,
> > >
> > > > This could break Linux guests, since depending on the deferred probe
> > > > timeout setting it could lead to drivers never probing because the
> > > > "IOMMU" never becomes available.
> > >
> > > I've noticed no deferred probe timeouts when booting with this patch.
> > > Could you please explain more on how this would break guests?
> >
> > Right now I think it would actually require command-line intervention,
> > e.g. "fw_devlink=on" or "deferred_probe_timeout=3600" (with modules
> > enabled for the latter to take full effect), but I'm wary of the
> > potential for future config options to control those behaviours by
> > default.

fw_devlink=on is now the default (for at least a couple of cycles).

> If deferred_probe_timeout=3600 was specified, we would just need an
> IOMMU driver in Linux for the "xen,iommu-el2-v1" node to solve the
> problem, right? I guess I am trying to say that it wouldn't be a device
> tree interface problem but rather a Linux implementation discussion.

You would have to add that IOMMU driver to old, existing kernels if you
want compatibility with a new DT. Otherwise, that kernel would stop
booting with a new DT.

Rob
[xen-unstable test] 168008: tolerable FAIL
flight 168008 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168008/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-credit1 22 guest-start/debian.repeat fail pass in 168001

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 168001
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 168001
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 168001
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168001
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 168001
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 168001
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 168001
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 168001
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 168001
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 168001
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 168001
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 168001
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-arm64-arm64-xl-seattle 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass

version
Re: Xen data from meta-virtualization layer
Hi Julien,

On 2022-02-05 00:29, Julien Grall wrote:
[..]
>> But not a very user friendly one, though. I guess the first UART
>> is disabled/removed by Xen? I haven't looked at how it is handled.
>>
>> Can't we search for other uarts with the same interrupt and disable
>> these, too? Maybe conditionally by the SoC compatible?
>
> The problem sounds quite similar to the one we had on sunxi. Although
> the UART was on the same page rather than sharing interrupts.
>
> Xen has a per-platform hook to prevent a device from being assigned to
> dom0. For an example, you could look at:
> https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/arm/platforms/sunxi.c

Nice. At least, this is working now ;) I'll send a patch in the next few
days.

I guess it's safe to assume that we can always remove both UARTs on the
LS1028A (probably on most layerscapes). The most common use case is to
use the first UART for Xen. You could run Xen without console (?), then
you'd miss the possibility to map the DUART. Or there could be a new
driver for the LPUART on the LS1028A. In this case, the DUART wouldn't be
used by Xen either. But I think we should start simple and just remove
the DUART altogether via that hook.

-michael
Re: Xen data from meta-virtualization layer
Hi Michael,

On Fri, 4 Feb 2022 at 22:42, Michael Walle wrote:
> On 2022-02-04 22:11, Stefano Stabellini wrote:
> > On Fri, 4 Feb 2022, Michael Walle wrote:
> >> > In regards to the reserved-memory regions, maybe we are not seeing them
> >> > because Leo posted the host device tree, not the one passed at runtime
> >> > from u-boot to Linux?
> >> >
> >> > If so, Leo, could you please boot Linux on native (no Xen) and get the
> >> > device tree from there at runtime using dtc -I fs -O dts
> >> > /proc/device-tree ?
> >> >
> >> > However, the name of the reserved-memory region created by u-boot seems
> >> > to be "lpi_rd_table". I cannot find any mentions of lpi_rd_table in the
> >> > Linux kernel tree either.
> >> >
> >> > Zhiqiang, Leo is trying to boot Xen on sAL28. Linux booting on Xen
> >> > throws errors in regards to GIC/ITS initialization. On other hardware
> >> > Xen can use and virtualize GICv3 and ITS just fine. Could you please
> >> > explain what is different about sAL28 and how Xen/Linux is expected to
> >> > use the lpi_rd_table reserved-memory region?
> >>
> >> I actually stumbled across this thread after trying out Xen myself. I'm
> >> using latest vanilla u-boot (with pending PSCI patches), vanilla kernel
> >> and vanilla Xen.
> >>
> >> So far I've discovered, that xen complains that it cannot route IRQ64 to
> >> dom0. That is because on the LS1028A there is a dual UART (two 16550
> >> with one shared interrupt) and xen takes the first UART and then tries
> >> to map the interrupt of the second UART to linux. For now, I don't know
> >> how this is solved correctly. As a quick hack, I removed the second uart
> >> node from the device tree.
> >
> > This is an interesting problem. Removing the second UART is a good
> > workaround for now as there is no obvious solution I think.
>
> But not a very user friendly one, though. I guess the first UART
> is disabled/removed by Xen? I haven't looked at how it is handled.
>
> Can't we search for other uarts with the same interrupt and disable
> these, too? Maybe conditionally by the SoC compatible?

The problem sounds quite similar to the one we had on sunxi. Although
the UART was on the same page rather than sharing interrupts.

Xen has a per-platform hook to prevent a device from being assigned to
dom0. For an example, you could look at:
https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/arm/platforms/sunxi.c

> >> This is the first developer experience with Xen, so please bear with me
> >> :) It seems that Xen doesn't add the master to the IOMMU. To me it seems
> >> that only devices with a 'iommus' dt property are added. But in case of
> >> PCI devices the parent only has a iommu-map property.
> >>
> >> And it makes me wonder why Leo has an almost working setup. Maybe I'm
> >> missing some patches though.
> >
> > Xen 4.16 is able to parse StreamID in the "iommus" property and also
> > "mmu-masters" property. But it is not able to parse the "iommu-map"
> > property yet. So if 0x42a is described in device tree using "iommu-map"
> > then the error makes sense.
> >
> > A simple solution is to replace iommu-map with iommus in device tree.
>
> I'm not sure this is so easy, because they are dynamically assigned
> by the bootloader. Sure for now I could do that I guess, but iommu=0
> works as well ;)
>
> I now got Xen and Linux booting and see the same problems with the
> GIC ITS, that is that the enetc interrupts aren't delivered to the
> dom0 linux. I've also applied the patch in this thread and I'm
> seeing the same as Leo. Full boot log is here [1].
>
> I noticed the following.
> [0.168544] pci :00:00.0: Failed to add - passthrough or
> MSI/MSI-X might fail!

This message is harmless. This is printed because Xen on Arm doesn't
implement the hypercall to add a PCI device. On Arm, we don't need it yet
(it might be necessary for PCI passthrough) and MSI/MSI-X are
handled/registered the same way as on bare-metal (for your case through
the ITS).

> Not sure if it should work nonetheless.

I looked through the log and couldn't spot anything obvious. However,
skimming through Linux, I noticed there are some changes in the ITS for
freescale (now NXP) such as:

drivers/irqchip/irq-gic-v3-its-fsl-mc-msi.c

Is it something that might be used on the board you are using? If the
answer is yes, then my suggestion would be to look how this is meant to
interact with the ITS. It might be possible that we are missing some
pieces in Xen to properly support it.

> > It is possible that someone in CC to this email might already have a
> > patch to introduce parsing of iommu-map in Xen.

Pass. I don't have any and couldn't find any submission on the ML.

> I guess they've used the old mmu-masters property.
>
> Btw. I don't know if it matters, but the SMARC-sAL28 normally doesn't
> use TF-A and runs without it. Nonetheless, I've booted the board with
> the bl31 from NXP and it doesn't help either. There is
Re: Xen data from meta-virtualization layer
Hi Stefano,

On 2022-02-04 22:11, Stefano Stabellini wrote:
> On Fri, 4 Feb 2022, Michael Walle wrote:
> > > In regards to the reserved-memory regions, maybe we are not seeing them
> > > because Leo posted the host device tree, not the one passed at runtime
> > > from u-boot to Linux?
> > >
> > > If so, Leo, could you please boot Linux on native (no Xen) and get the
> > > device tree from there at runtime using dtc -I fs -O dts
> > > /proc/device-tree ?
> > >
> > > However, the name of the reserved-memory region created by u-boot seems
> > > to be "lpi_rd_table". I cannot find any mentions of lpi_rd_table in the
> > > Linux kernel tree either.
> > >
> > > Zhiqiang, Leo is trying to boot Xen on sAL28. Linux booting on Xen
> > > throws errors in regards to GIC/ITS initialization. On other hardware
> > > Xen can use and virtualize GICv3 and ITS just fine. Could you please
> > > explain what is different about sAL28 and how Xen/Linux is expected to
> > > use the lpi_rd_table reserved-memory region?
> >
> > I actually stumbled across this thread after trying out Xen myself. I'm
> > using latest vanilla u-boot (with pending PSCI patches), vanilla kernel
> > and vanilla Xen.
> >
> > So far I've discovered, that xen complains that it cannot route IRQ64 to
> > dom0. That is because on the LS1028A there is a dual UART (two 16550
> > with one shared interrupt) and xen takes the first UART and then tries
> > to map the interrupt of the second UART to linux. For now, I don't know
> > how this is solved correctly. As a quick hack, I removed the second uart
> > node from the device tree.
>
> This is an interesting problem. Removing the second UART is a good
> workaround for now as there is no obvious solution I think.

But not a very user friendly one, though. I guess the first UART
is disabled/removed by Xen? I haven't looked at how it is handled.

Can't we search for other uarts with the same interrupt and disable
these, too? Maybe conditionally by the SoC compatible?

> > But what is more severe is that the iommu isn't set up correctly. I'm
> > getting the following faults:
> >
> > (XEN) smmu: /soc/iommu@500: Unexpected global fault, this could be serious
> > (XEN) smmu: /soc/iommu@500: GFSR 0x8002, GFSYNR0 0x,
> > GFSYNR1 0x042a, GFSYNR2 0x
> >
> > If I decode it correctly, the streamid should be 0x2a which would be one
> > of the PCI devices on the internal root complex. Probably the network
> > card.
>
> Yes there is DMA transaction with an "unknown" StreamID. I think the
> StreamID is 0x42a. It means that there is a DMA master on the board with
> StreamID 0x42a that is either:
> - not described in device tree
> - described in device tree with a different StreamID
> - the right StreamID is described in device tree, but it is not picked
>   up by Xen

See below.

> > This is the first developer experience with Xen, so please bear with me
> > :) It seems that Xen doesn't add the master to the IOMMU. To me it seems
> > that only devices with a 'iommus' dt property are added. But in case of
> > PCI devices the parent only has a iommu-map property.
> >
> > And it makes me wonder why Leo has an almost working setup. Maybe I'm
> > missing some patches though.
>
> Xen 4.16 is able to parse StreamID in the "iommus" property and also
> "mmu-masters" property. But it is not able to parse the "iommu-map"
> property yet. So if 0x42a is described in device tree using "iommu-map"
> then the error makes sense.
>
> A simple solution is to replace iommu-map with iommus in device tree.

I'm not sure this is so easy, because they are dynamically assigned
by the bootloader. Sure for now I could do that I guess, but iommu=0
works as well ;)

I now got Xen and Linux booting and see the same problems with the
GIC ITS, that is that the enetc interrupts aren't delivered to the
dom0 linux. I've also applied the patch in this thread and I'm
seeing the same as Leo. Full boot log is here [1].

I noticed the following.

[0.168544] pci :00:00.0: Failed to add - passthrough or
MSI/MSI-X might fail!

Not sure if it should work nonetheless.

> It is possible that someone in CC to this email might already have a
> patch to introduce parsing of iommu-map in Xen.

I guess they've used the old mmu-masters property.

Btw. I don't know if it matters, but the SMARC-sAL28 normally doesn't
use TF-A and runs without it. Nonetheless, I've booted the board with
the bl31 from NXP and it doesn't help either. There is still a
difference between the NXP bootflow which uses bl1/bl2/bl31/u-boot and
this board which uses u-boot-spl/u-boot or u-boot-spl/bl31/u-boot.

I just found GIC setup code in the bl31. I'll also have a look there.

-michael

[1] https://pastebin.com/raw/XMjE3BvG
Re: [PATCH 04/16] x86/P2M: move map_domain_gfn() (again)
On Mon, Jul 5, 2021 at 5:07 PM Jan Beulich wrote:
> The main user is the guest walking code, so move it back there; commit
> 9a6787cc3809 ("x86/mm: build map_domain_gfn() just once") would perhaps
> better have kept it there in the first place. This way it'll only get
> built when it's actually needed (and still only once).
>
> This also eliminates one more CONFIG_HVM conditional from p2m.c.
>
> Signed-off-by: Jan Beulich

Reviewed-by: George Dunlap
Re: [PATCH 03/16] x86/P2M: drop a few CONFIG_HVM
On Mon, Jul 5, 2021 at 5:06 PM Jan Beulich wrote:
> This is to make it easier to see which parts of p2m.c still aren't HVM-
> specific: In one case the conditionals sat in an already guarded region,
> while in the other case P2M_AUDIT implies HVM.

I think it would be much easier to understand what's going on if it was
more like this:

---
x86/p2m: P2M_AUDIT implies CONFIG_HVM

Remove one #endif / #ifdef CONFIG_HVM pair to make all the audit code
CONFIG_HVM only. This is to make it easier to see which parts of p2m.c
still aren't HVM-specific.

While here, remove a redundant set of nested #ifdef CONFIG_HVM.
---

Reviewed-by: George Dunlap
Re: [PATCH 02/16] x86/P2M: introduce p2m_{add,remove}_page()
On Mon, Jul 5, 2021 at 5:06 PM Jan Beulich wrote:
> p2m_add_page() is simply a rename from guest_physmap_add_entry().
> p2m_remove_page() then is its counterpart, despite rendering
> guest_physmap_remove_page(). This way callers can use suitable pairs of
> functions (previously violated by hvm/grant_table.c).

Obviously this needs some clarification.

While we're here, I find this a bit confusing; I tend to use the present
tense for the way the code is before the patch, and the imperative for
what the patch does; so I'd say:

    Rename guest_physmap_add_entry() to p2m_add_page; make
    guest_physmap_remove_page a wrapper with p2m_remove_page.
    That way callers can use suitable pairs...

Other than that looks good.

-George
Re: [PATCH 01/16] x86/P2M: rename p2m_remove_page()
On Mon, Jul 5, 2021 at 5:05 PM Jan Beulich wrote:
> This is in preparation to re-using the original name.
>
> Signed-off-by: Jan Beulich

Hey Jan,

This series overall looks good; thanks for taking this on. Functionally
this patch looks good; just one question...

> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -788,8 +788,8 @@ void p2m_final_teardown(struct domain *d
>  #ifdef CONFIG_HVM
>
>  static int __must_check
> -p2m_remove_page(struct p2m_domain *p2m, gfn_t gfn, mfn_t mfn,
> -                unsigned int page_order)
> +p2m_remove_entry(struct p2m_domain *p2m, gfn_t gfn, mfn_t mfn,
> +                 unsigned int page_order)

One question that's naturally raised for both this and the following
patch is: what is the new naming "scheme" for these renamed functions,
and how do they relate to the old scheme?

Overall it seems like the intention is that "guest_physmap_..." can be
called on a domain which may be PV or HVM, while "p2m_..." should only
be called on HVM domains. There's also "..._entry" vs "..._page". Does
the p2m_remove_page / p2m_remove_entry distinction have a meaning, and
is it the same meaning as guest_physmap_add_page /
guest_physmap_add_entry? Or is it similar to p2m_init_nestedp2m /
p2m_nestedp2m_init -- we need both functions and don't want to make the
names longer?

-George
Re: Xen data from meta-virtualization layer
On Fri, 4 Feb 2022, Michael Walle wrote:
> > In regards to the reserved-memory regions, maybe we are not seeing them
> > because Leo posted the host device tree, not the one passed at runtime
> > from u-boot to Linux?
> >
> > If so, Leo, could you please boot Linux on native (no Xen) and get the
> > device tree from there at runtime using dtc -I fs -O dts
> > /proc/device-tree ?
> >
> > However, the name of the reserved-memory region created by u-boot seems
> > to be "lpi_rd_table". I cannot find any mentions of lpi_rd_table in the
> > Linux kernel tree either.
> >
> > Zhiqiang, Leo is trying to boot Xen on sAL28. Linux booting on Xen
> > throws errors in regards to GIC/ITS initialization. On other hardware
> > Xen can use and virtualize GICv3 and ITS just fine. Could you please
> > explain what is different about sAL28 and how Xen/Linux is expected to
> > use the lpi_rd_table reserved-memory region?
>
> I actually stumbled across this thread after trying out Xen myself. I'm
> using latest vanilla u-boot (with pending PSCI patches), vanilla kernel
> and vanilla Xen.
>
> So far I've discovered, that xen complains that it cannot route IRQ64 to
> dom0. That is because on the LS1028A there is a dual UART (two 16550
> with one shared interrupt) and xen takes the first UART and then tries
> to map the interrupt of the second UART to linux. For now, I don't know
> how this is solved correctly. As a quick hack, I removed the second uart
> node from the device tree.

This is an interesting problem. Removing the second UART is a good
workaround for now as there is no obvious solution I think.

> But what is more severe is that the iommu isn't set up correctly. I'm
> getting the following faults:
>
> (XEN) smmu: /soc/iommu@500: Unexpected global fault, this could be serious
> (XEN) smmu: /soc/iommu@500: GFSR 0x8002, GFSYNR0 0x,
> GFSYNR1 0x042a, GFSYNR2 0x
>
> If I decode it correctly, the streamid should be 0x2a which would be one
> of the PCI devices on the internal root complex. Probably the network
> card.

Yes there is DMA transaction with an "unknown" StreamID. I think the
StreamID is 0x42a. It means that there is a DMA master on the board with
StreamID 0x42a that is either:
- not described in device tree
- described in device tree with a different StreamID
- the right StreamID is described in device tree, but it is not picked
  up by Xen

> This is the first developer experience with Xen, so please bear with me
> :) It seems that Xen doesn't add the master to the IOMMU. To me it seems
> that only devices with a 'iommus' dt property are added. But in case of
> PCI devices the parent only has a iommu-map property.
>
> And it makes me wonder why Leo has an almost working setup. Maybe I'm
> missing some patches though.

Xen 4.16 is able to parse StreamID in the "iommus" property and also
"mmu-masters" property. But it is not able to parse the "iommu-map"
property yet. So if 0x42a is described in device tree using "iommu-map"
then the error makes sense.

A simple solution is to replace iommu-map with iommus in device tree.

It is possible that someone in CC to this email might already have a
patch to introduce parsing of iommu-map in Xen.
[PATCH] xen/smp: Speed up on_selected_cpus()
cpumask_weight() is a horribly expensive way to find if no bits are set, made worse by the fact that the calculation is performed with the global call_lock held. Switch to using cpumask_empty() instead, which will short circuit as soon as it finds any set bit in the cpumask. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné CC: Wei Liu CC: Juergen Gross CC: Stefano Stabellini CC: Julien Grall CC: Volodymyr Babchuk CC: Bertrand Marquis I have not done performance testing, but I would be surprised if this cannot be measured on a busy or large box. --- xen/common/smp.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/xen/common/smp.c b/xen/common/smp.c index 781bcf2c246c..a011f541f1ea 100644 --- a/xen/common/smp.c +++ b/xen/common/smp.c @@ -50,8 +50,6 @@ void on_selected_cpus( void *info, int wait) { -unsigned int nr_cpus; - ASSERT(local_irq_is_enabled()); ASSERT(cpumask_subset(selected, &cpu_online_map)); @@ -59,8 +57,7 @@ void on_selected_cpus( cpumask_copy(&call_data.selected, selected); -nr_cpus = cpumask_weight(&call_data.selected); -if ( nr_cpus == 0 ) +if ( cpumask_empty(&call_data.selected) ) goto out; call_data.func = func; -- 2.11.0
Re: [PATCH v2] docs: document patch rules
Hi, On 03/02/2022 12:54, Juergen Gross wrote: +## The commit message + +The commit message is free text describing *why* the patch is done and +*how* the goal of the patch is achieved. A good commit message will describe +the current situation, the desired goal, and the way this goal is being +achieved. Parts of that can be omitted in obvious cases. + +In case additional changes are done in the patch (like e.g. cleanups), those +should be mentioned. + +When referencing other patches (e.g. `similar to patch xy ...`) those +patches should be referenced via their commit id (at least 12 digits) +and the patch subject, if the very same patch isn't referenced by the +`Fixes:` tag, too: + +Similar to commit 67d01cdb5518 ("x86: infrastructure to allow converting +certain indirect calls to direct ones") add ... + +The following ``git config`` settings can be used to add a pretty format for +outputting the above style in the ``git log`` or ``git show`` commands: + +[core] +abbrev = 12 +[pretty] +fixes = Fixes: %h (\"%s\") + +Lines in the commit message should not exceed 75 characters, except when I was under the impression that commit messages should be wrapped to 72 characters. This is because tools like "git log" would indent the commit message by 8 characters. +copying error output directly into the commit message. + +## Tags + +Tags are entries in the form + +Tag: something + +In general tags are added in chronological order. So a `Reviewed-by:` tag +should be added **after** the `Signed-off-by:` tag, as the review happened +after the patch was written. + +Do not split a tag across multiple lines, tags are exempt from the +"wrap at 75 columns" rule in order to simplify parsing scripts. This would need to be adjusted depending on the answer above. + +### Origin: + +Xen has inherited some source files from other open source projects.
In case +a patch modifying such an inherited file is taken from that project (maybe in +modified form), the `Origin:` tag specifies the source of the patch: + +Origin: NIT: Like you did for the Fixes tag, can you make clear that the commit id should be the first 12 characters? So the line... + +E.g.: + +Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git f093b08c47b3 ... doesn't get horribly long. + +All tags **above** the `Origin:` tag are from the original patch (which +should all be kept), while tags **after** `Origin:` are related to the +normal Xen patch process as described here. Cheers, -- Julien Grall
[xen-unstable-smoke test] 168011: tolerable all pass - PUSHED
flight 168011 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/168011/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass version targeted for testing: xen 820cc393434097f3b7976acdccbf1d96071d6d23 baseline version: xen 75cc460a1b8cfd8e5d2c4302234ee194defe4872 Last test of basis 167999 2022-02-03 13:01:51 Z1 days Testing same since 168011 2022-02-04 16:01:44 Z0 days1 attempts People who touched revisions under test: Andrew Cooper Jan Beulich jobs: build-arm64-xsm pass build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-arm64-arm64-xl-xsm pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git 75cc460a1b..820cc39343 820cc393434097f3b7976acdccbf1d96071d6d23 -> smoke
[libvirt test] 168006: regressions - FAIL
flight 168006 libvirt real [real] http://logs.test-lab.xenproject.org/osstest/logs/168006/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-armhf-libvirt 6 libvirt-buildfail REGR. vs. 151777 build-amd64-libvirt 6 libvirt-buildfail REGR. vs. 151777 build-arm64-libvirt 6 libvirt-buildfail REGR. vs. 151777 build-i386-libvirt6 libvirt-buildfail REGR. vs. 151777 Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-pair 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-vhd 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-xsm 1 build-check(1) blocked n/a test-amd64-i386-libvirt 1 build-check(1) blocked n/a test-amd64-i386-libvirt-pair 1 build-check(1) blocked n/a test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-i386-libvirt-raw 1 build-check(1) blocked n/a test-amd64-i386-libvirt-xsm 1 build-check(1) blocked n/a test-arm64-arm64-libvirt 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-qcow2 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-raw 1 build-check(1) blocked n/a test-armhf-armhf-libvirt-raw 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-xsm 1 build-check(1) blocked n/a test-armhf-armhf-libvirt 1 build-check(1) blocked n/a test-armhf-armhf-libvirt-qcow2 1 build-check(1) blocked n/a version targeted for testing: libvirt bf36dcb2a6272a1ea7eb770878df2cbd455a6679 baseline version: libvirt 2c846fa6bcc11929c9fb857a22430fb9945654ad Last test of basis 151777 2020-07-10 04:19:19 Z 574 days Failing since151818 2020-07-11 04:18:52 Z 573 days 555 attempts Testing same since 168006 2022-02-04 04:18:55 Z0 days1 attempts People who touched revisions under test: Adolfo Jayme Barrientos Aleksandr Alekseev Aleksei Zakharov Andika Triwidada Andrea Bolognani Ani Sinha Balázs Meskó Barrett Schonefeld 
Bastian Germann Bastien Orivel BiaoXiang Ye Bihong Yu Binfeng Wu Bjoern Walk Boris Fiuczynski Brad Laue Brian Turek Bruno Haible Chris Mayo Christian Borntraeger Christian Ehrhardt Christian Kirbach Christian Schoenebeck Christophe Fergeau Cole Robinson Collin Walling Cornelia Huck Cédric Bosdonnat Côme Borsoi Daniel Henrique Barboza Daniel Letai Daniel P. Berrange Daniel P. Berrangé Didik Supriadi dinglimin Divya Garg Dmitrii Shcherbakov Dmytro Linkin Eiichi Tsukata Emilio Herrera Eric Farman Erik Skultety Fabian Affolter Fabian Freyer Fabiano Fidêncio Fangge Jin Farhan Ali Fedora Weblate Translation Franck Ridel Gavi Teitz gongwei Guoyi Tu Göran Uddeborg Halil Pasic Han Han Hao Wang Hela Basa Helmut Grohne Hiroki Narukawa Hyman Huang(黄勇) Ian Wienand Ioanna Alifieraki Ivan Teterevkov Jakob Meng Jamie Strandboge Jamie Strandboge Jan Kuparinen jason lee Jean-Baptiste Holcroft Jia Zhou Jianan Gao Jim Fehlig Jin Yan Jinsheng Zhang Jiri Denemark Joachim Falk John Ferlan Jonathan Watt Jonathon Jongsma Julio Faracco Justin Gatzen Ján Tomko Kashyap Chamarthy Kevin Locke Koichi Murase Kristina Hanicova Laine Stump Laszlo Ersek Lee Yarwood Lei Yang Liao Pingfang Lin Ma Lin Ma Lin Ma Liu Yiding Lubomir Rintel Luke Yue Luyao Zhong Marc Hartmayer Marc-André Lureau Marek Marczykowski-Górecki Markus Schade Martin Kletzander Masayoshi Mizuma Matej Cepl Matt Coleman Matt Coleman Mauro Matteo Cascella Meina Li Michal Privoznik Michał Smyk Milo Casagrande Moshe Levi Muha Aliss Nathan Neal Gompa Nick Chevsky Nick Shyrokovskiy Nickys Music Group Nico Pache Nicolas Lécureuil Nicolas Lécureuil Nikolay Shirokovskiy Olaf Hering Olesya Gerasimenko Or Ozeri Orion Poplawski Pany Patrick Magauran Paulo de Rezende Pinatti Pavel Hrdina Peng Liang Peter Krempa Pino Toscano Pino Toscano Piotr Drąg Prathamesh Chavan Praveen K Paladugu Richard W.M. 
Jones Ricky Tigg Robin Lee Rohit Kumar Roman Bogorodskiy Roman Bolshakov Ryan Gahagan Ryan Schmidt Sam Hartman Scott Shambarger Sebastian Mitterle SeongHyun Jo Shalini Chellathurai Saroja Shaojun Yang
[PATCH] x86/hvm: Fix boot on systems where HVM isn't available
c/s 27a63cdac388 ("x86/HVM: convert remaining hvm_funcs hook invocations to alt-call") went too far with dropping NULL function pointer checks. smp_callin() calls hvm_cpu_up() unconditionally. When the platform doesn't support HVM, hvm_enable() exits without filling in hvm_funcs, after which the altcall pass nukes the (now unconditional) indirect call, causing: (XEN) [ Xen-4.17.0-10.18-d x86_64 debug=y Not tainted ] (XEN) CPU:1 (XEN) RIP:e008:[] start_secondary+0x393/0x3b7 (XEN) RFLAGS: 00010086 CONTEXT: hypervisor ... (XEN) Xen code around (start_secondary+0x393/0x3b7): (XEN) ff ff 8b 05 1b 84 17 00 <0f> 0b 0f ff ff 90 89 c3 85 c0 0f 84 db fe ff ff ... (XEN) Xen call trace: (XEN)[] R start_secondary+0x393/0x3b7 (XEN)[] F __high_start+0x42/0x60 To make matters worse, __stop_this_cpu() calls hvm_cpu_down() unconditionally too, so what happens next is: (XEN) [ Xen-4.17.0-10.18-d x86_64 debug=y Not tainted ] (XEN) CPU:0 (XEN) RIP:e008:[] __stop_this_cpu+0x12/0x3c (XEN) RFLAGS: 00010046 CONTEXT: hypervisor ... (XEN) Xen code around (__stop_this_cpu+0x12/0x3c): (XEN) 48 89 e5 e8 8a 1d fd ff <0f> 0b 0f ff ff 90 0f 06 db e3 48 89 e0 48 0d ff ... (XEN) Xen call trace: (XEN)[] R __stop_this_cpu+0x12/0x3c (XEN)[] F smp_send_stop+0xdd/0xf8 (XEN)[] F machine_restart+0xa2/0x298 (XEN)[] F arch/x86/shutdown.c#__machine_restart+0xb/0x11 (XEN)[] F smp_call_function_interrupt+0xbf/0xea (XEN)[] F call_function_interrupt+0x35/0x37 (XEN)[] F do_IRQ+0xa3/0x6b5 (XEN)[] F common_interrupt+0x10a/0x120 (XEN)[] F __udelay+0x3a/0x51 (XEN)[] F __cpu_up+0x48f/0x734 (XEN)[] F cpu_up+0x7d/0xde (XEN)[] F __start_xen+0x200b/0x2618 (XEN)[] F __high_start+0x4f/0x60 which recurses until hitting a stack overflow. The #DF handler, which resets its stack on each invocation, loops indefinitely. Reinstate the NULL function pointer checks for hvm_cpu_{up,down}().
Fixes: 27a63cdac388 ("x86/HVM: convert remaining hvm_funcs hook invocations to alt-call") Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné CC: Wei Liu RFC. Not tested yet on the impacted hardware. It's a Xeon PHI with another weird thing in need of debugging. First boot is fine, while second boot (loading microcode this time) has a problem with vmx. I wonder if we want to modify the callers to check for HVM being enabled, rather than leaving the NULL pointer checks in a position where they're liable to be reaped again. Also, the #UD handler really should identify 0f 0b 0f ff ff as the clobbered-altcall sequence, and provide a message to that effect. --- xen/arch/x86/include/asm/hvm/hvm.h | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/xen/arch/x86/include/asm/hvm/hvm.h b/xen/arch/x86/include/asm/hvm/hvm.h index a441cbc22159..2dd08567f730 100644 --- a/xen/arch/x86/include/asm/hvm/hvm.h +++ b/xen/arch/x86/include/asm/hvm/hvm.h @@ -553,12 +553,16 @@ static inline void hvm_invlpg(struct vcpu *v, unsigned long linear) static inline int hvm_cpu_up(void) { -return alternative_call(hvm_funcs.cpu_up); +if ( hvm_funcs.cpu_up ) +return alternative_call(hvm_funcs.cpu_up); + +return 0; } static inline void hvm_cpu_down(void) { -alternative_vcall(hvm_funcs.cpu_down); +if ( hvm_funcs.cpu_down ) +alternative_vcall(hvm_funcs.cpu_down); } static inline unsigned int hvm_get_insn_bytes(struct vcpu *v, uint8_t *buf) -- 2.11.0
[linux-linus test] 168004: tolerable FAIL - PUSHED
flight 168004 linux-linus real [real] flight 168010 linux-linus real-retest [real] http://logs.test-lab.xenproject.org/osstest/logs/168004/ http://logs.test-lab.xenproject.org/osstest/logs/168010/ Failures :-/ but no regressions. Tests which are failing intermittently (not blocking): test-amd64-amd64-freebsd12-amd64 19 guest-localmigrate/x10 fail pass in 168010-retest Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 167988 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 167988 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 167988 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 167988 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 167988 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 167988 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 167988 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 167988 test-arm64-arm64-xl-seattle 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 15 
migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail never pass test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass version targeted for testing: linux1f2cfdd349b7647f438c1e552dc1b983da86d830 
baseline version: linux 27bb0b18c208ecd4c0deda6aad28616d73e4133d Last test of basis 167988 2022-02-02 18:11:17 Z 1 days Failing since 167993 2022-02-03 04:34:02 Z 1 days 2 attempts Testing same since 168004 2022-02-04 00:41:22 Z 0 days 1 attempts People who touched revisions under test: "Eric W. Biederman" Chuck Lever Dai Ngo Dan Carpenter
Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci
On Fri, Feb 04, 2022 at 02:43:07PM +, Oleksandr Andrushchenko wrote: > > > On 04.02.22 15:06, Roger Pau Monné wrote: > > On Fri, Feb 04, 2022 at 12:53:20PM +, Oleksandr Andrushchenko wrote: > >> > >> On 04.02.22 14:47, Jan Beulich wrote: > >>> On 04.02.2022 13:37, Oleksandr Andrushchenko wrote: > On 04.02.22 13:37, Jan Beulich wrote: > > On 04.02.2022 12:13, Roger Pau Monné wrote: > >> On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote: > >>> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote: > On 04.02.22 11:15, Jan Beulich wrote: > > On 04.02.2022 09:58, Oleksandr Andrushchenko wrote: > >> On 04.02.22 09:52, Jan Beulich wrote: > >>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote: > @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev > *pdev, uint16_t cmd, bool rom_only) > continue; > } > > +spin_lock(>vpci_lock); > +if ( !tmp->vpci ) > +{ > +spin_unlock(>vpci_lock); > +continue; > +} > for ( i = 0; i < > ARRAY_SIZE(tmp->vpci->header.bars); i++ ) > { > const struct vpci_bar *bar = > >vpci->header.bars[i]; > @@ -303,12 +310,14 @@ static int modify_bars(const struct > pci_dev *pdev, uint16_t cmd, bool rom_only) > rc = rangeset_remove_range(mem, start, end); > if ( rc ) > { > +spin_unlock(>vpci_lock); > printk(XENLOG_G_WARNING "Failed to remove > [%lx, %lx]: %d\n", > start, end, rc); > rangeset_destroy(mem); > return rc; > } > } > +spin_unlock(>vpci_lock); > } > >>> At the first glance this simply looks like another unjustified > >>> (in the > >>> description) change, as you're not converting anything here but > >>> you > >>> actually add locking (and I realize this was there before, so I'm > >>> sorry > >>> for not pointing this out earlier). > >> Well, I thought that the description already has "...the lock can > >> be > >> used (and in a few cases is used right away) to check whether vpci > >> is present" and this is enough for such uses as here. 
> >>> But then I wonder whether you > >>> actually tested this, since I can't help getting the impression > >>> that > >>> you're introducing a live-lock: The function is called from > >>> cmd_write() > >>> and rom_write(), which in turn are called out of vpci_write(). > >>> Yet that > >>> function already holds the lock, and the lock is not (currently) > >>> recursive. (For the 3rd caller of the function - init_bars() - > >>> otoh > >>> the locking looks to be entirely unnecessary.) > >> Well, you are correct: if tmp != pdev then it is correct to acquire > >> the lock. But if tmp == pdev and rom_only == true > >> then we'll deadlock. > >> > >> It seems we need to have the locking conditional, e.g. only lock > >> if tmp != pdev > > Which will address the live-lock, but introduce ABBA deadlock > > potential > > between the two locks. > I am not sure I can suggest a better solution here > @Roger, @Jan, could you please help here? > >>> Well, first of all I'd like to mention that while it may have been > >>> okay to > >>> not hold pcidevs_lock here for Dom0, it surely needs acquiring when > >>> dealing > >>> with DomU-s' lists of PCI devices. The requirement really applies to > >>> the > >>> other use of for_each_pdev() as well (in vpci_dump_msi()), except that > >>> there it probably wants to be a try-lock. > >>> > >>> Next I'd like to point out that here we have the still pending issue > >>> of > >>> how to deal with hidden devices, which Dom0 can access. See my RFC > >>> patch > >>> "vPCI: account for hidden devices in modify_bars()". Whatever the > >>> solution > >>> here, I think it wants to at least account for the extra need there. > >> Yes, sorry, I should take care of that. > >> > >>> Now it is quite clear that pcidevs_lock isn't going to help with > >>> avoiding > >>> the deadlock, as it's imo not an option at all to acquire that lock > >>>
Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci
On 04.02.22 15:06, Roger Pau Monné wrote: > On Fri, Feb 04, 2022 at 12:53:20PM +, Oleksandr Andrushchenko wrote: >> >> On 04.02.22 14:47, Jan Beulich wrote: >>> On 04.02.2022 13:37, Oleksandr Andrushchenko wrote: On 04.02.22 13:37, Jan Beulich wrote: > On 04.02.2022 12:13, Roger Pau Monné wrote: >> On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote: >>> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote: On 04.02.22 11:15, Jan Beulich wrote: > On 04.02.2022 09:58, Oleksandr Andrushchenko wrote: >> On 04.02.22 09:52, Jan Beulich wrote: >>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote: @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only) continue; } +spin_lock(>vpci_lock); +if ( !tmp->vpci ) +{ +spin_unlock(>vpci_lock); +continue; +} for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ ) { const struct vpci_bar *bar = >vpci->header.bars[i]; @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only) rc = rangeset_remove_range(mem, start, end); if ( rc ) { +spin_unlock(>vpci_lock); printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n", start, end, rc); rangeset_destroy(mem); return rc; } } +spin_unlock(>vpci_lock); } >>> At the first glance this simply looks like another unjustified (in >>> the >>> description) change, as you're not converting anything here but you >>> actually add locking (and I realize this was there before, so I'm >>> sorry >>> for not pointing this out earlier). >> Well, I thought that the description already has "...the lock can be >> used (and in a few cases is used right away) to check whether vpci >> is present" and this is enough for such uses as here. >>> But then I wonder whether you >>> actually tested this, since I can't help getting the impression that >>> you're introducing a live-lock: The function is called from >>> cmd_write() >>> and rom_write(), which in turn are called out of vpci_write(). 
Yet >>> that >>> function already holds the lock, and the lock is not (currently) >>> recursive. (For the 3rd caller of the function - init_bars() - otoh >>> the locking looks to be entirely unnecessary.) >> Well, you are correct: if tmp != pdev then it is correct to acquire >> the lock. But if tmp == pdev and rom_only == true >> then we'll deadlock. >> >> It seems we need to have the locking conditional, e.g. only lock >> if tmp != pdev > Which will address the live-lock, but introduce ABBA deadlock > potential > between the two locks. I am not sure I can suggest a better solution here @Roger, @Jan, could you please help here? >>> Well, first of all I'd like to mention that while it may have been okay >>> to >>> not hold pcidevs_lock here for Dom0, it surely needs acquiring when >>> dealing >>> with DomU-s' lists of PCI devices. The requirement really applies to the >>> other use of for_each_pdev() as well (in vpci_dump_msi()), except that >>> there it probably wants to be a try-lock. >>> >>> Next I'd like to point out that here we have the still pending issue of >>> how to deal with hidden devices, which Dom0 can access. See my RFC patch >>> "vPCI: account for hidden devices in modify_bars()". Whatever the >>> solution >>> here, I think it wants to at least account for the extra need there. >> Yes, sorry, I should take care of that. >> >>> Now it is quite clear that pcidevs_lock isn't going to help with >>> avoiding >>> the deadlock, as it's imo not an option at all to acquire that lock >>> everywhere else you access ->vpci (or else the vpci lock itself would be >>> pointless). But a per-domain auxiliary r/w lock may help: Other paths >>> would acquire it in read mode, and here you'd acquire it in write mode >>> (in >>> the former case around the vpci lock, while in the latter case there may >>> then not be any need to
Re: [PATCH v6 10/13] vpci/header: reset the command register when adding devices
On 04.02.22 16:30, Jan Beulich wrote: > On 04.02.2022 07:34, Oleksandr Andrushchenko wrote: >> Reset the command register when assigning a PCI device to a guest: >> according to the PCI spec the PCI_COMMAND register is typically all 0's >> after reset. > It's not entirely clear to me whether setting the hardware register to > zero is okay. What wants to be zero is the value the guest observes > initially. "the PCI spec says the PCI_COMMAND register is typically all 0's after reset." Why wouldn't it be ok? What is the exact concern here? >> --- a/xen/drivers/vpci/header.c >> +++ b/xen/drivers/vpci/header.c >> @@ -454,8 +454,7 @@ static void cmd_write(const struct pci_dev *pdev, >> unsigned int reg, >> pci_conf_write16(pdev->sbdf, reg, cmd); >> } >> >> -static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg, >> -uint32_t cmd, void *data) >> +static uint32_t emulate_cmd_reg(const struct pci_dev *pdev, uint32_t cmd) > The command register is a 16-bit one, so parameter and return type should > either be plain unsigned int (preferred, see ./CODING_STYLE) or uint16_t > imo. Good catch, thank you > Jan > Thank you, Oleksandr
Re: [PATCH v6 10/13] vpci/header: reset the command register when adding devices
On 04.02.2022 07:34, Oleksandr Andrushchenko wrote: > Reset the command register when assigning a PCI device to a guest: > according to the PCI spec the PCI_COMMAND register is typically all 0's > after reset. It's not entirely clear to me whether setting the hardware register to zero is okay. What wants to be zero is the value the guest observes initially. > --- a/xen/drivers/vpci/header.c > +++ b/xen/drivers/vpci/header.c > @@ -454,8 +454,7 @@ static void cmd_write(const struct pci_dev *pdev, > unsigned int reg, > pci_conf_write16(pdev->sbdf, reg, cmd); > } > > -static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg, > -uint32_t cmd, void *data) > +static uint32_t emulate_cmd_reg(const struct pci_dev *pdev, uint32_t cmd) The command register is a 16-bit one, so parameter and return type should either be plain unsigned int (preferred, see ./CODING_STYLE) or uint16_t imo. Jan
Re: [PATCH v6 09/13] vpci/header: emulate PCI_COMMAND register for guests
On 04.02.2022 07:34, Oleksandr Andrushchenko wrote: > --- a/xen/drivers/vpci/header.c > +++ b/xen/drivers/vpci/header.c > @@ -454,6 +454,22 @@ static void cmd_write(const struct pci_dev *pdev, > unsigned int reg, > pci_conf_write16(pdev->sbdf, reg, cmd); > } > > +static void guest_cmd_write(const struct pci_dev *pdev, unsigned int reg, > +uint32_t cmd, void *data) > +{ > +/* TODO: Add proper emulation for all bits of the command register. */ > + > +#ifdef CONFIG_HAS_PCI_MSI > +if ( pdev->vpci->msi->enabled || pdev->vpci->msix->enabled ) > +{ > +/* Guest wants to enable INTx. It can't be enabled if MSI/MSI-X > enabled. */ > +cmd |= PCI_COMMAND_INTX_DISABLE; > +} > +#endif > + > +cmd_write(pdev, reg, cmd, data); > +} It's not really clear to me whether the TODO warrants this being a separate function. Personally I'd find it preferable if the logic was folded into cmd_write(). With this and ... > --- a/xen/drivers/vpci/msi.c > +++ b/xen/drivers/vpci/msi.c > @@ -70,6 +70,10 @@ static void control_write(const struct pci_dev *pdev, > unsigned int reg, > > if ( vpci_msi_arch_enable(msi, pdev, vectors) ) > return; > + > +/* Make sure guest doesn't enable INTx while enabling MSI. */ > +if ( !is_hardware_domain(pdev->domain) ) > +pci_intx(pdev, false); > } > else > vpci_msi_arch_disable(msi, pdev); > --- a/xen/drivers/vpci/msix.c > +++ b/xen/drivers/vpci/msix.c > @@ -92,6 +92,10 @@ static void control_write(const struct pci_dev *pdev, > unsigned int reg, > for ( i = 0; i < msix->max_entries; i++ ) > if ( !msix->entries[i].masked && msix->entries[i].updated ) > update_entry(>entries[i], pdev, i); > + > +/* Make sure guest doesn't enable INTx while enabling MSI-X. */ > +if ( !is_hardware_domain(pdev->domain) ) > +pci_intx(pdev, false); > } > else if ( !new_enabled && msix->enabled ) > { ... this done (as requested) behind the back of the guest, what's the idea wrt the guest reading the command register? 
That continues to be wired to vpci_hw_read16() (and hence accesses the underlying hardware value irrespective of what patch 4 did). Jan
Re: [PATCH v6 04/13] vpci: restrict unhandled read/write operations for guests
On 04.02.22 16:11, Jan Beulich wrote: > On 04.02.2022 07:34, Oleksandr Andrushchenko wrote: >> A guest can read and write those registers which are not emulated and >> have no respective vPCI handlers, so it can access the HW directly. > I don't think this describes the present situation. Or did I miss where > devices can actually be exposed to guests already, despite much of the > support logic still missing? No, they are not exposed yet and you know that. I will update the commit message > >> In order to prevent a guest from reads and writes from/to the unhandled >> registers make sure only hardware domain can access HW directly and restrict >> guests from doing so. > Tangential question: Going over the titles of the remaining patches I > notice patch 6 is going to deal with BAR accesses. But (going just > from the titles) I can't spot anywhere that vendor and device IDs > would be exposed to guests. Yet that's the first thing guests will need > in order to actually recognize devices. As said before, allowing guests > access to such r/o fields is quite likely going to be fine. Agree, I was thinking about adding such a patch to allow IDs, but finally decided not to add more to this series. Again, the whole thing is not working yet and for the development this patch can/needs to be reverted. So, whether we implement IDs or not, this doesn't change anything in this respect > >> --- a/xen/drivers/vpci/vpci.c >> +++ b/xen/drivers/vpci/vpci.c >> @@ -215,11 +215,15 @@ int vpci_remove_register(struct vpci *vpci, unsigned >> int offset, >> } >> >> /* Wrappers for performing reads/writes to the underlying hardware. */ >> -static uint32_t vpci_read_hw(pci_sbdf_t sbdf, unsigned int reg, >> +static uint32_t vpci_read_hw(bool is_hwdom, pci_sbdf_t sbdf, unsigned int >> reg, >>unsigned int size) > Was the passing around of a boolean the consensus which was reached? Was this patch committed yet?
> Personally I'd find it more natural > if the two functions checked > current->domain themselves. This is also possible, but I would like to hear Roger's view on this as well. I am fine either way. > > Jan > Thank you, Oleksandr
Re: [PATCH] tools/guest: Fix comment regarding CPUID compatibility
On Fri, Feb 04, 2022 at 02:10:03PM +, Andrew Cooper wrote: > On 04/02/2022 13:46, Jan Beulich wrote: > > On 04.02.2022 14:34, Andrew Cooper wrote: > >> On 04/02/2022 13:09, Jan Beulich wrote: > >>> On 04.02.2022 13:12, Andrew Cooper wrote: > On 04/02/2022 08:31, Jan Beulich wrote: > > On 03.02.2022 19:10, Andrew Cooper wrote: > >> It was Xen 4.14 where CPUID data was added to the migration stream, > >> and 4.13 > >> that we need to worry about with regards to compatibility. Xen 4.12 > >> isn't > >> relevant. > >> > >> Expand and correct the commentary. > >> > >> Fixes: 111c8c33a8a1 ("x86/cpuid: do not expand max leaves on restore") > > But doesn't this commit amend 685e922d6f30 ("tools/libxc: Rework > > xc_cpuid_apply_policy() to use {get,set}_cpu_policy()"), which is > > where DEF_MAX_* disappeared? > No. All that happened in that change was that we switched to using > > cpuid.h:89:#define CPUID_GUEST_NR_EXTD_AMD > > instead, which remained the same size until Xen 4.15 when e9b4fe26364 > bumped it. > >>> Oh, right. I did try to look for a replacement, but managed to miss > >>> this. But then, as much as 4.12 isn't relevant, isn't it the case > >>> that the fact that CPUID data was added to the stream in 4.14 isn't > >>> relevant here either, and it's instead the bumping in 4.15 which is? > >> The fact that the bump happened is relevant, by virtue of the fact that there > >> was logic added to cope. The fact it was in 4.15 is not relevant - this > >> isn't a list of every ABI-relevant change. > >> > >> CPUID data being added to the stream is critically important, because > >> that's the point after which we never enter this compatibility path. > > If the bump happened before CPUID data was added to the stream, logic to > > cope with migrating-in guests would have been required too, wouldn't it. > > Yes, it would have been. > > It wasn't an accident that none of the max leaves changed while doing > the Xen CPUID work.
> We're unfortunately a long way behind on Intel CPUID leaves, but all(?)
> of the new leaves need more complicated migration safety logic than the
> toolstack currently knows how to do.
>
>> But anyway, just to be done with this:
>>
>> Acked-by: Jan Beulich
>
> Thanks.

Will rebase my CPUID series on top of this, but I will wait for further comments before sending a new version.

Roger.
Re: x86: insn-eval.c's use of native_store_gdt()
On 04.02.2022 15:08, Thomas Gleixner wrote: > On Fri, Feb 04 2022 at 10:23, Jan Beulich wrote: >> On 30.11.2021 12:09, Jan Beulich wrote: >>> I think it is b968e84b509d ("x86/iopl: Fake iopl(3) CLI/STI usage") >>> which uncovered an issue with get_desc() trying to access the GDT, as >>> introduced by 670f928ba09b ("x86/insn-eval: Add utility function to >>> get segment descriptor"). When running in a PV domain under Xen, the >>> (hypervisor's) GDT isn't accessible; with UMIP enabled by Xen even >>> SGDT wouldn't work, as the kernel runs in ring 3. >>> >>> There's currently no hypercall to retrieve a descriptor from the GDT. >>> It is instead assumed that kernels know where their present GDT >>> lives. Can the native_store_gdt() be replaced there, please? >>> >>> For context (I don't think it should matter much here) I'm observing >>> this with the kernel put underneath a rather old distro, where >>> hwclock triggers this path. >> >> I'd like to note that the issue still exists in 5.16. > > I'd like to note, that I've seen no patches to that effect. I could have worded it that way as well, yes. Jan
Re: [PATCH v6 04/13] vpci: restrict unhandled read/write operations for guests
On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> A guest can read and write those registers which are not emulated and
> have no respective vPCI handlers, so it can access the HW directly.

I don't think this describes the present situation. Or did I miss where devices can actually be exposed to guests already, despite much of the support logic still missing?

> In order to prevent a guest from reads and writes from/to the unhandled
> registers make sure only hardware domain can access HW directly and restrict
> guests from doing so.

Tangential question: Going over the titles of the remaining patches I notice patch 6 is going to deal with BAR accesses. But (going just from the titles) I can't spot anywhere that vendor and device IDs would be exposed to guests. Yet that's the first thing guests will need in order to actually recognize devices. As said before, allowing guests access to such r/o fields is quite likely going to be fine.

> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -215,11 +215,15 @@ int vpci_remove_register(struct vpci *vpci, unsigned int offset,
>  }
>
>  /* Wrappers for performing reads/writes to the underlying hardware. */
> -static uint32_t vpci_read_hw(pci_sbdf_t sbdf, unsigned int reg,
> +static uint32_t vpci_read_hw(bool is_hwdom, pci_sbdf_t sbdf, unsigned int reg,
>                              unsigned int size)

Was the passing around of a boolean the consensus which was reached? Personally I'd find it more natural if the two functions checked current->domain themselves.

Jan
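For illustration, the alternative Jan suggests (having the wrappers check the calling domain themselves rather than threading an is_hwdom flag through) could look like the stand-alone sketch below. All names and types here are simplified stand-ins for the Xen ones (current->domain, pci_sbdf_t, the vPCI handlers), not the real API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for Xen's struct domain and "current". */
struct domain { bool is_hardware_domain; };

static struct domain hwdom = { .is_hardware_domain = true };
static struct domain domu  = { .is_hardware_domain = false };
static struct domain *current_domain = &hwdom; /* stand-in for current->domain */

static uint32_t hw_config_space[64];           /* fake config space, index = reg/4 */

/*
 * Unhandled reads: only the hardware domain may touch real hardware.
 * Guests get all-ones back, the same value a physical config access
 * to a non-existent register would return.
 */
static uint32_t vpci_read_hw(unsigned int reg, unsigned int size)
{
    if ( !current_domain->is_hardware_domain )
        return (uint32_t)((1ULL << (size * 8)) - 1);
    return hw_config_space[reg / 4];
}
```

The design point is simply that the privilege check lives in one place, next to the hardware access, instead of being decided by every caller.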
[seabios test] 168003: tolerable FAIL - PUSHED
flight 168003 seabios real [real]
flight 168009 seabios real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/168003/
http://logs.test-lab.xenproject.org/osstest/logs/168009/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 12 debian-hvm-install fail pass in 168009-retest

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 167920
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 167920
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 167920
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 167920
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 167920
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass

version targeted for testing:
 seabios 829b0f1a7cda1bccdf44a379fb3a96e519a7e8cd
baseline version:
 seabios dc776a2d9ca9e1b857e880ff682668871369b4c3

Last test of basis 167920 2022-01-27 16:42:48 Z 7 days
Testing same since 168003 2022-02-03 23:10:24 Z 0 days 1 attempts

People who touched revisions under test:
 Florian Larysch

jobs:
 build-amd64-xsm pass
 build-i386-xsm pass
 build-amd64 pass
 build-i386 pass
 build-amd64-libvirt pass
 build-i386-libvirt pass
 build-amd64-pvops pass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm pass
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm pass
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm fail
 test-amd64-amd64-qemuu-nested-amd fail
 test-amd64-i386-qemuu-rhel6hvm-amd pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-amd64-qemuu-freebsd11-amd64 pass
 test-amd64-amd64-qemuu-freebsd12-amd64 pass
 test-amd64-amd64-xl-qemuu-win7-amd64 fail
 test-amd64-i386-xl-qemuu-win7-amd64 fail
 test-amd64-amd64-xl-qemuu-ws16-amd64 fail
 test-amd64-i386-xl-qemuu-ws16-amd64 fail
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict pass
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict pass
 test-amd64-amd64-qemuu-nested-intel pass
 test-amd64-i386-qemuu-rhel6hvm-intel pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision :
 To xenbits.xen.org:/home/xen/git/osstest/seabios.git
 dc776a2..829b0f1 829b0f1a7cda1bccdf44a379fb3a96e519a7e8cd -> xen-tested-master
Re: [PATCH] tools/guest: Fix comment regarding CPUID compatibility
On 04/02/2022 13:46, Jan Beulich wrote: > On 04.02.2022 14:34, Andrew Cooper wrote: >> On 04/02/2022 13:09, Jan Beulich wrote: >>> On 04.02.2022 13:12, Andrew Cooper wrote: On 04/02/2022 08:31, Jan Beulich wrote: > On 03.02.2022 19:10, Andrew Cooper wrote: >> It was Xen 4.14 where CPUID data was added to the migration stream, and >> 4.13 >> that we need to worry about with regards to compatibility. Xen 4.12 >> isn't >> relevant. >> >> Expand and correct the commentary. >> >> Fixes: 111c8c33a8a1 ("x86/cpuid: do not expand max leaves on restore") > But doesn't this commit amend 685e922d6f30 ("tools/libxc: Rework > xc_cpuid_apply_policy() to use {get,set}_cpu_policy()"), which is > where DEF_MAX_* disappeared? No. All that happened in that change was that we switched to using cpuid.h:89:#define CPUID_GUEST_NR_EXTD_AMD instead, which remained the same size until Xen 4.15 when e9b4fe26364 bumped it. >>> Oh, right. I did try to look for a replacement, but managed to miss >>> this. But then, as much as 4.12 isn't relevant, isn't it the case >>> that the fact that CPUID data was added to the stream in 4.14 isn't >>> relevant here either, and it's instead the bumping in 4.15 which is? >> The fact that the bump happened is relevant, by virtue of the fact there >> logic added to cope. The fact it was in 4.15 is not relevant - this >> isn't a list of every ABI-relevant change. >> >> CPUID data being added to the stream is critically important, because >> that's the point after which we never enter this compatibility path. > If the bump happened before CPUID data was added to the stream, logic to > cope with migrating-in guests would have been required too, wouldn't it. Yes, it would have been. It wasn't an accident that none of the max leaves changed while doing the Xen CPUID work. We're unfortunately a long way behind on Intel CPUID leaves, but all(?) of the new leaves need more complicated migration safely logic than the toolstack currently knows how to do. 
> But anyway, just to be done with this:
>
> Acked-by: Jan Beulich

Thanks.

~Andrew
Re: x86: insn-eval.c's use of native_store_gdt()
On Fri, Feb 04 2022 at 10:23, Jan Beulich wrote: > On 30.11.2021 12:09, Jan Beulich wrote: >> I think it is b968e84b509d ("x86/iopl: Fake iopl(3) CLI/STI usage") >> which uncovered an issue with get_desc() trying to access the GDT, as >> introduced by 670f928ba09b ("x86/insn-eval: Add utility function to >> get segment descriptor"). When running in a PV domain under Xen, the >> (hypervisor's) GDT isn't accessible; with UMIP enabled by Xen even >> SGDT wouldn't work, as the kernel runs in ring 3. >> >> There's currently no hypercall to retrieve a descriptor from the GDT. >> It is instead assumed that kernels know where their present GDT >> lives. Can the native_store_gdt() be replaced there, please? >> >> For context (I don't think it should matter much here) I'm observing >> this with the kernel put underneath a rather old distro, where >> hwclock triggers this path. > > I'd like to note that the issue still exists in 5.16. I'd like to note, that I've seen no patches to that effect. Thanks, tglx
Re: Xen data from meta-virtualization layer
Hi all, > + Zhiqiang Hou > > On Tue, 24 Nov 2020, Leo Krueger wrote: > > > >>> On Tue, 17 Nov 2020, Leo Krueger wrote: > > > Hi, > > > > > > I enabled CONFIG_HAS_ITS (what a stupid mistake by me to not set it > > > before...) but then had to add the following node to my device tree > > > > > > gic_lpi_base: syscon@0x8000 { > > > compatible = "gic-lpi-base"; > > > >> > > > >> I couldn't find this compatible defined/used in Linux 5.10-rc4. @Leo, > > > >> could you clarify which flavor/version of Linux you are using? > > > > > > > > It is Linux 4.19 from Yocto (Warror release). XEN 4.13.2. > > > > > > Do you have a link to the Linux tree? Is there any additional patches on > > > top of > > > vanilla? > > > > Linux tree is found here: > > https://github.com/kontron/linux-smarc-sal28/commits/master-LSDK-19.09 > > (up to the latest commit in that branch) FWIW, I'm the developer of the support for this board both in the mentioned branch as well as upstream. If you don't need graphics you shouldn't really be using the branch above, but the latest vanilla kernel release. > [...] > > > > Looking at the DT changes in [0], it looks like the node is not a child > > > of gic@. > > > So I think Xen will map the region to Dom0. > > > > > > There are two things that I can notice: > > >1) This region is RAM, but I can't find any reserve node. Is there any > > > specific > > > code in Linux to reserve it? > > >2) The implementation in U-boot seems to suggest that the firmware will > > > configure the LPIs and then enable it. If that's the case, then Xen needs > > > to > > > re-use the table in the DT rather than allocating a new one. > > That Linux tree has no mentions of gic-lpi-base. That means that > gic-lpi-base is only used in u-boot, not in Linux. In particular the > most relevant commit is af288cb291da3abef6be0875527729296f7de7a0. That node was horrible hack from NXP for u-boot and was removed in a84cea06bb8fff69810a890ac0e4b47ea5726512. 
> In regards to the reserved-memory regions, maybe we are not seeing them
> because Leo posted the host device tree, not the one passed at runtime
> from u-boot to Linux?
>
> If so, Leo, could you please boot Linux on native (no Xen) and get the
> device tree from there at runtime using dtc -I fs -O dts /proc/device-tree ?
>
> However, the name of the reserved-memory region created by u-boot seems
> to be "lpi_rd_table". I cannot find any mentions of lpi_rd_table in the
> Linux kernel tree either.
>
> Zhiqiang, Leo is trying to boot Xen on sAL28. Linux booting on Xen
> throws errors in regards to GIC/ITS initialization. On other hardware
> Xen can use and virtualize GICv3 and ITS just fine. Could you please
> explain what is different about sAL28 and how Xen/Linux is expected to
> use the lpi_rd_table reserved-memory region?

I actually stumbled across this thread after trying out Xen myself. I'm using the latest vanilla u-boot (with pending PSCI patches), a vanilla kernel and vanilla Xen.

So far I've discovered that Xen complains that it cannot route IRQ64 to dom0. That is because on the LS1028A there is a dual UART (two 16550s with one shared interrupt): Xen takes the first UART and then tries to map the interrupt of the second UART to Linux. For now, I don't know how this is solved correctly. As a quick hack, I removed the second UART node from the device tree.

But what is more severe is that the IOMMU isn't set up correctly. I'm getting the following faults:

(XEN) smmu: /soc/iommu@500: Unexpected global fault, this could be serious
(XEN) smmu: /soc/iommu@500: GFSR 0x8002, GFSYNR0 0x, GFSYNR1 0x042a, GFSYNR2 0x

If I decode it correctly, the stream ID should be 0x2a, which would be one of the PCI devices on the internal root complex, probably the network card. This is my first developer experience with Xen, so please bear with me :)

It seems that Xen doesn't add the master to the IOMMU. To me it seems that only devices with an 'iommus' DT property are added.
But in the case of PCI devices the parent only has an iommu-map property. And it makes me wonder why Leo has an almost working setup. Maybe I'm missing some patches, though.

-michael
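As background to Michael's last observation: the generic "iommu-map" device-tree binding is what maps a PCI requester ID (RID) onto an SMMU stream ID, and that lookup is what would be needed for PCI masters behind an IOMMU. A rough stand-alone sketch of the translation follows; the example map entry is made up for illustration and is not taken from the LS1028A device tree:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Per the generic "iommu-map" binding, each entry maps a window of PCI
 * requester IDs onto a window of stream IDs:
 *   streamid = iommu_base + (rid - rid_base)   for rid in [rid_base, rid_base+length)
 */
struct iommu_map_entry {
    uint32_t rid_base;    /* first RID covered by this entry */
    uint32_t iommu_base;  /* stream ID that rid_base maps to */
    uint32_t length;      /* number of consecutive RIDs covered */
};

/* Illustrative map: RIDs 0x000-0x0ff -> stream IDs 0x20-0x11f. */
static const struct iommu_map_entry demo_map[] = {
    { 0x000, 0x20, 0x100 },
};

static bool rid_to_streamid(const struct iommu_map_entry *map, unsigned int n,
                            uint32_t rid, uint32_t *sid)
{
    for ( unsigned int i = 0; i < n; i++ )
        if ( rid >= map[i].rid_base && rid < map[i].rid_base + map[i].length )
        {
            *sid = map[i].iommu_base + (rid - map[i].rid_base);
            return true;
        }
    return false; /* RID not covered: the device has no usable stream ID */
}
```

With the (made-up) entry above, RID 0x0a would translate to stream ID 0x2a, i.e. the kind of value seen in the GFSYNR1 fault decode earlier in the thread.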
[PATCH] x86/Intel: don't log bogus frequency range on Core/Core2 processors
Models 0F and 17 don't have PLATFORM_INFO documented. While it exists on at least model 0F, the information there doesn't match the scheme used on newer models (I'm observing a range of 700 ... 600 MHz reported on a Xeon E5345). Sadly the Enhanced Intel Core instance of the table entry is not self-consistent: The numeric description of the low 3 bits doesn't match the subsequent more textual description in some of the cases; I'm using the former here.

Include the older Core model 0E as well as the two other Core2 models, none of which have respective MSR tables in the SDM.

Fixes: f6b6517cd5db ("x86: retrieve and log CPU frequency information")
Signed-off-by: Jan Beulich
---
While the SDM table for the two models lists FSB_FREQ, I'm afraid its information is of little use here: If anything it could serve as a reference for the frequency determined by calibrate_APIC_clock().
---
RFC: I may want to rebase over Roger's addition of intel-family.h, but first of all I wanted to see whether going this route is deemed acceptable at all.

--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -435,6 +435,26 @@ static void intel_log_freq(const struct
     if ( c->x86 == 6 )
         switch ( c->x86_model )
         {
+            static const unsigned short core_factors[] =
+                { 26667, 1, 2, 16667, 3, 1, 4 };
+
+        case 0x0e: /* Core */
+        case 0x0f: case 0x16: case 0x17: case 0x1d: /* Core2 */
+            /*
+             * PLATFORM_INFO, while not documented for these, appears to
+             * exist in at least some cases, but what it holds doesn't
+             * match the scheme used by newer CPUs. At a guess, the min
+             * and max fields look to be reversed, while the scaling
+             * factor is encoded in FSB_FREQ.
+             */
+            if ( min_ratio > max_ratio )
+                SWAP(min_ratio, max_ratio);
+            if ( rdmsr_safe(MSR_FSB_FREQ, msrval) ||
+                 (msrval &= 7) >= ARRAY_SIZE(core_factors) )
+                return;
+            factor = core_factors[msrval];
+            break;
+
         case 0x1a: case 0x1e: case 0x1f: case 0x2e: /* Nehalem */
         case 0x25: case 0x2c: case 0x2f: /* Westmere */
             factor = 1;
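As a stand-alone illustration of the decode logic in the hunk above, the sketch below models the min/max ratio swap and the FSB_FREQ table lookup. Note that several entries of core_factors[] appear mangled in the archived copy, so the table here uses placeholder values in the style of newer models' scaling constants; they are not the patch's real numbers:

```c
#include <assert.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define SWAP(a, b) do { unsigned int t_ = (a); (a) = (b); (b) = t_; } while (0)

/* Placeholder scaling factors indexed by FSB_FREQ[2:0]; illustrative only. */
static const unsigned short core_factors[] =
    { 26667, 13333, 20000, 16667, 33333, 10000, 40000 };

/*
 * Models the patch's flow: on these CPUs the min/max ratio fields of
 * PLATFORM_INFO look reversed, so normalise them first, then map the low
 * three bits of FSB_FREQ to a scaling factor (-1 for reserved encodings).
 */
static int core_bus_factor(unsigned int fsb_freq,
                           unsigned int *min_ratio, unsigned int *max_ratio)
{
    if ( *min_ratio > *max_ratio )
        SWAP(*min_ratio, *max_ratio);
    if ( (fsb_freq &= 7) >= ARRAY_SIZE(core_factors) )
        return -1;                      /* reserved encoding: give up */
    return core_factors[fsb_freq];
}
```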
Re: [PATCH v2] docs: document patch rules
On 03.02.2022 13:54, Juergen Gross wrote: > Add a document to describe the rules for sending a proper patch. > > As it contains all the information already being present in > docs/process/tags.pandoc remove that file. > > The "Reviewed-by:" and "Acked-by:" tags are expanded to allow an > optional restriction of the tag. > > A new tag "Origin:" is added to tag patches taken from another project. > > Signed-off-by: Juergen Gross Acked-by: Jan Beulich
Re: [PATCH] tools/guest: Fix comment regarding CPUID compatibility
On 04.02.2022 14:34, Andrew Cooper wrote: > On 04/02/2022 13:09, Jan Beulich wrote: >> On 04.02.2022 13:12, Andrew Cooper wrote: >>> On 04/02/2022 08:31, Jan Beulich wrote: On 03.02.2022 19:10, Andrew Cooper wrote: > It was Xen 4.14 where CPUID data was added to the migration stream, and > 4.13 > that we need to worry about with regards to compatibility. Xen 4.12 isn't > relevant. > > Expand and correct the commentary. > > Fixes: 111c8c33a8a1 ("x86/cpuid: do not expand max leaves on restore") But doesn't this commit amend 685e922d6f30 ("tools/libxc: Rework xc_cpuid_apply_policy() to use {get,set}_cpu_policy()"), which is where DEF_MAX_* disappeared? >>> No. All that happened in that change was that we switched to using >>> >>> cpuid.h:89:#define CPUID_GUEST_NR_EXTD_AMD >>> >>> instead, which remained the same size until Xen 4.15 when e9b4fe26364 >>> bumped it. >> Oh, right. I did try to look for a replacement, but managed to miss >> this. But then, as much as 4.12 isn't relevant, isn't it the case >> that the fact that CPUID data was added to the stream in 4.14 isn't >> relevant here either, and it's instead the bumping in 4.15 which is? > > The fact that the bump happened is relevant, by virtue of the fact there > logic added to cope. The fact it was in 4.15 is not relevant - this > isn't a list of every ABI-relevant change. > > CPUID data being added to the stream is critically important, because > that's the point after which we never enter this compatibility path. If the bump happened before CPUID data was added to the stream, logic to cope with migrating-in guests would have been required too, wouldn't it. But anyway, just to be done with this: Acked-by: Jan Beulich Jan
Re: [PATCH] tools/guest: Fix comment regarding CPUID compatibility
On 04/02/2022 13:09, Jan Beulich wrote:
> On 04.02.2022 13:12, Andrew Cooper wrote:
>> On 04/02/2022 08:31, Jan Beulich wrote:
>>> On 03.02.2022 19:10, Andrew Cooper wrote:
>>>> It was Xen 4.14 where CPUID data was added to the migration stream, and 4.13
>>>> that we need to worry about with regards to compatibility. Xen 4.12 isn't
>>>> relevant.
>>>>
>>>> Expand and correct the commentary.
>>>>
>>>> Fixes: 111c8c33a8a1 ("x86/cpuid: do not expand max leaves on restore")
>>> But doesn't this commit amend 685e922d6f30 ("tools/libxc: Rework
>>> xc_cpuid_apply_policy() to use {get,set}_cpu_policy()"), which is
>>> where DEF_MAX_* disappeared?
>> No. All that happened in that change was that we switched to using
>>
>> cpuid.h:89:#define CPUID_GUEST_NR_EXTD_AMD
>>
>> instead, which remained the same size until Xen 4.15 when e9b4fe26364
>> bumped it.
> Oh, right. I did try to look for a replacement, but managed to miss
> this. But then, as much as 4.12 isn't relevant, isn't it the case
> that the fact that CPUID data was added to the stream in 4.14 isn't
> relevant here either, and it's instead the bumping in 4.15 which is?

The fact that the bump happened is relevant, by virtue of the fact there was logic added to cope. The fact it was in 4.15 is not relevant - this isn't a list of every ABI-relevant change.

CPUID data being added to the stream is critically important, because that's the point after which we never enter this compatibility path.

~Andrew
Re: [PATCH] tools/guest: Fix comment regarding CPUID compatibility
On 04.02.2022 13:12, Andrew Cooper wrote: > On 04/02/2022 08:31, Jan Beulich wrote: >> On 03.02.2022 19:10, Andrew Cooper wrote: >>> It was Xen 4.14 where CPUID data was added to the migration stream, and 4.13 >>> that we need to worry about with regards to compatibility. Xen 4.12 isn't >>> relevant. >>> >>> Expand and correct the commentary. >>> >>> Fixes: 111c8c33a8a1 ("x86/cpuid: do not expand max leaves on restore") >> But doesn't this commit amend 685e922d6f30 ("tools/libxc: Rework >> xc_cpuid_apply_policy() to use {get,set}_cpu_policy()"), which is >> where DEF_MAX_* disappeared? > > No. All that happened in that change was that we switched to using > > cpuid.h:89:#define CPUID_GUEST_NR_EXTD_AMD > > instead, which remained the same size until Xen 4.15 when e9b4fe26364 > bumped it. Oh, right. I did try to look for a replacement, but managed to miss this. But then, as much as 4.12 isn't relevant, isn't it the case that the fact that CPUID data was added to the stream in 4.14 isn't relevant here either, and it's instead the bumping in 4.15 which is? IOW while I've been fine with the comment adjustment anyway, there would still want to be an adjustment to the description. >> While looking at this, wasn't Roger's change incomplete, in that >> for Intel the extended leaf upper bound was 0x8008 in 4.12? > > CPUID_GUEST_NR_EXTD_INTEL is still 8, so this is all fine. Again, somehow I did manage to miss the replacement defines. Jan
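For readers following the thread: the compatibility path under discussion can be modelled roughly as below. A stream from Xen 4.14 onwards carries the guest's CPUID policy; older streams do not, so the receiving toolstack has to rebuild one, and must not hand the guest wider max leaves than it could have seen before the bump. The constants and structure here are illustrative only, not the real CPUID_GUEST_NR_* values or libxc code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define COMPAT_EXTD_MAX  0x80000008u   /* illustrative pre-bump upper bound */

struct policy { uint32_t extd_max_leaf; };

/*
 * 4.14+ streams ship the policy, so nothing needs recomputing.  Pre-4.14
 * streams force a rebuild from the host policy, clamped so that max
 * leaves are not expanded on restore (the point of 111c8c33a8a1).
 */
static void restore_policy(struct policy *p, bool stream_has_cpuid,
                           uint32_t host_extd_max)
{
    if ( stream_has_cpuid )
        return;                        /* policy came with the stream */
    p->extd_max_leaf = host_extd_max > COMPAT_EXTD_MAX ? COMPAT_EXTD_MAX
                                                       : host_extd_max;
}
```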
Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci
On Fri, Feb 04, 2022 at 12:53:20PM +, Oleksandr Andrushchenko wrote: > > > On 04.02.22 14:47, Jan Beulich wrote: > > On 04.02.2022 13:37, Oleksandr Andrushchenko wrote: > >> > >> On 04.02.22 13:37, Jan Beulich wrote: > >>> On 04.02.2022 12:13, Roger Pau Monné wrote: > On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote: > > On 04.02.2022 11:12, Oleksandr Andrushchenko wrote: > >> On 04.02.22 11:15, Jan Beulich wrote: > >>> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote: > On 04.02.22 09:52, Jan Beulich wrote: > > On 04.02.2022 07:34, Oleksandr Andrushchenko wrote: > >> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev > >> *pdev, uint16_t cmd, bool rom_only) > >> continue; > >> } > >> > >> +spin_lock(>vpci_lock); > >> +if ( !tmp->vpci ) > >> +{ > >> +spin_unlock(>vpci_lock); > >> +continue; > >> +} > >> for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); > >> i++ ) > >> { > >> const struct vpci_bar *bar = > >> >vpci->header.bars[i]; > >> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev > >> *pdev, uint16_t cmd, bool rom_only) > >> rc = rangeset_remove_range(mem, start, end); > >> if ( rc ) > >> { > >> +spin_unlock(>vpci_lock); > >> printk(XENLOG_G_WARNING "Failed to remove > >> [%lx, %lx]: %d\n", > >> start, end, rc); > >> rangeset_destroy(mem); > >> return rc; > >> } > >> } > >> +spin_unlock(>vpci_lock); > >> } > > At the first glance this simply looks like another unjustified (in > > the > > description) change, as you're not converting anything here but you > > actually add locking (and I realize this was there before, so I'm > > sorry > > for not pointing this out earlier). > Well, I thought that the description already has "...the lock can be > used (and in a few cases is used right away) to check whether vpci > is present" and this is enough for such uses as here. 
> > But then I wonder whether you > > actually tested this, since I can't help getting the impression that > > you're introducing a live-lock: The function is called from > > cmd_write() > > and rom_write(), which in turn are called out of vpci_write(). Yet > > that > > function already holds the lock, and the lock is not (currently) > > recursive. (For the 3rd caller of the function - init_bars() - otoh > > the locking looks to be entirely unnecessary.) > Well, you are correct: if tmp != pdev then it is correct to acquire > the lock. But if tmp == pdev and rom_only == true > then we'll deadlock. > > It seems we need to have the locking conditional, e.g. only lock > if tmp != pdev > >>> Which will address the live-lock, but introduce ABBA deadlock > >>> potential > >>> between the two locks. > >> I am not sure I can suggest a better solution here > >> @Roger, @Jan, could you please help here? > > Well, first of all I'd like to mention that while it may have been okay > > to > > not hold pcidevs_lock here for Dom0, it surely needs acquiring when > > dealing > > with DomU-s' lists of PCI devices. The requirement really applies to the > > other use of for_each_pdev() as well (in vpci_dump_msi()), except that > > there it probably wants to be a try-lock. > > > > Next I'd like to point out that here we have the still pending issue of > > how to deal with hidden devices, which Dom0 can access. See my RFC patch > > "vPCI: account for hidden devices in modify_bars()". Whatever the > > solution > > here, I think it wants to at least account for the extra need there. > Yes, sorry, I should take care of that. > > > Now it is quite clear that pcidevs_lock isn't going to help with > > avoiding > > the deadlock, as it's imo not an option at all to acquire that lock > > everywhere else you access ->vpci (or else the vpci lock itself would be > > pointless). 
But a per-domain auxiliary r/w lock may help: Other paths > > would acquire it in read mode, and here you'd acquire it in write mode > > (in > > the former case around the vpci lock, while in the latter case there may > > then not be any need to acquire the individual vpci locks at all). > > FTAOD: >
Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci
On 04.02.2022 13:53, Oleksandr Andrushchenko wrote: > > > On 04.02.22 14:47, Jan Beulich wrote: >> On 04.02.2022 13:37, Oleksandr Andrushchenko wrote: >>> >>> On 04.02.22 13:37, Jan Beulich wrote: On 04.02.2022 12:13, Roger Pau Monné wrote: > On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote: >> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote: >>> On 04.02.22 11:15, Jan Beulich wrote: On 04.02.2022 09:58, Oleksandr Andrushchenko wrote: > On 04.02.22 09:52, Jan Beulich wrote: >> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote: >>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev >>> *pdev, uint16_t cmd, bool rom_only) >>> continue; >>> } >>> >>> +spin_lock(>vpci_lock); >>> +if ( !tmp->vpci ) >>> +{ >>> +spin_unlock(>vpci_lock); >>> +continue; >>> +} >>> for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); >>> i++ ) >>> { >>> const struct vpci_bar *bar = >>> >vpci->header.bars[i]; >>> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev >>> *pdev, uint16_t cmd, bool rom_only) >>> rc = rangeset_remove_range(mem, start, end); >>> if ( rc ) >>> { >>> +spin_unlock(>vpci_lock); >>> printk(XENLOG_G_WARNING "Failed to remove >>> [%lx, %lx]: %d\n", >>> start, end, rc); >>> rangeset_destroy(mem); >>> return rc; >>> } >>> } >>> +spin_unlock(>vpci_lock); >>> } >> At the first glance this simply looks like another unjustified (in >> the >> description) change, as you're not converting anything here but you >> actually add locking (and I realize this was there before, so I'm >> sorry >> for not pointing this out earlier). > Well, I thought that the description already has "...the lock can be > used (and in a few cases is used right away) to check whether vpci > is present" and this is enough for such uses as here. 
>> But then I wonder whether you >> actually tested this, since I can't help getting the impression that >> you're introducing a live-lock: The function is called from >> cmd_write() >> and rom_write(), which in turn are called out of vpci_write(). Yet >> that >> function already holds the lock, and the lock is not (currently) >> recursive. (For the 3rd caller of the function - init_bars() - otoh >> the locking looks to be entirely unnecessary.) > Well, you are correct: if tmp != pdev then it is correct to acquire > the lock. But if tmp == pdev and rom_only == true > then we'll deadlock. > > It seems we need to have the locking conditional, e.g. only lock > if tmp != pdev Which will address the live-lock, but introduce ABBA deadlock potential between the two locks. >>> I am not sure I can suggest a better solution here >>> @Roger, @Jan, could you please help here? >> Well, first of all I'd like to mention that while it may have been okay >> to >> not hold pcidevs_lock here for Dom0, it surely needs acquiring when >> dealing >> with DomU-s' lists of PCI devices. The requirement really applies to the >> other use of for_each_pdev() as well (in vpci_dump_msi()), except that >> there it probably wants to be a try-lock. >> >> Next I'd like to point out that here we have the still pending issue of >> how to deal with hidden devices, which Dom0 can access. See my RFC patch >> "vPCI: account for hidden devices in modify_bars()". Whatever the >> solution >> here, I think it wants to at least account for the extra need there. > Yes, sorry, I should take care of that. > >> Now it is quite clear that pcidevs_lock isn't going to help with avoiding >> the deadlock, as it's imo not an option at all to acquire that lock >> everywhere else you access ->vpci (or else the vpci lock itself would be >> pointless). 
But a per-domain auxiliary r/w lock may help: Other paths >> would acquire it in read mode, and here you'd acquire it in write mode >> (in >> the former case around the vpci lock, while in the latter case there may >> then not be any need to acquire the individual vpci locks at all). FTAOD: >> I haven't fully thought through all implications (and hence whether this >> is >> viable in the first place); I expect you
Re: [PATCH] xen: Modify domain_crash() to take a print string
On 04.02.2022 12:56, Andrew Cooper wrote: > On 03/02/2022 15:06, Jan Beulich wrote: >> On 03.02.2022 15:41, Andrew Cooper wrote: >>> On 03/02/2022 14:19, Jan Beulich wrote: On 03.02.2022 15:11, Andrew Cooper wrote: > On 03/02/2022 13:48, Julien Grall wrote: >> On 03/02/2022 13:38, Andrew Cooper wrote: >>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h >>> index 37f78cc4c4c9..38b390d20371 100644 >>> --- a/xen/include/xen/sched.h >>> +++ b/xen/include/xen/sched.h >>> @@ -736,10 +736,15 @@ void vcpu_end_shutdown_deferral(struct vcpu *v); >>> * from any processor. >>> */ >>> void __domain_crash(struct domain *d); >>> -#define domain_crash(d) do >>> { \ >>> - printk("domain_crash called from %s:%d\n", __FILE__, >>> __LINE__); \ >>> - >>> __domain_crash(d); \ >>> -} while (0) >>> +#define domain_crash(d, ...) \ >>> + do { \ >>> + if ( count_args(__VA_ARGS__) == 0 ) \ >>> + printk("domain_crash called from %s:%d\n", \ >>> + __FILE__, __LINE__); \ >> I find a bit odd that here you are using a normal printk > That's unmodified from before. Only reformatted. > >> but... >> >> >>> + else \ >>> + printk(XENLOG_G_ERR __VA_ARGS__); \ >> here it is XENLOG_G_ERR. In fact, isn't it ratelimited? If so, >> wouldn't it be better to only use XENLOG_ERR so they can always be >> seen? (A domain shouldn't be able to abuse it). > Perhaps. I suppose it is more important information than pretty much > anything else about the guest. Indeed, but then - is this really an error in all cases? >>> Yes. It is always a fatal event for the VM. >> Which may or may not be Xen's fault. If the guest put itself in a bad >> state, I don't see why we shouldn't consider such just a warning. > > Log level is the severity of the action, not who's potentially to blame > for causing the situation. > > Furthermore, almost all callers who do emit appropriate diagnostics > before domain_crash() already use ERR. 
Well, yes, this looks to indeed be the case (albeit frequently with gdprintk(), in which case I'm inclined to say the log level isn't very relevant in the first place). On this basis I'm willing to give in, despite continuing to not being fully convinced. Jan
Re: [PATCH] xen: Modify domain_crash() to take a print string
On 03.02.2022 14:38, Andrew Cooper wrote:
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -1693,11 +1693,8 @@ static void load_segments(struct vcpu *n)
>              put_guest(uregs->fs, esp - 5) |
>              put_guest(uregs->es, esp - 6) |
>              put_guest(uregs->ds, esp - 7) )
> -        {
> -            gprintk(XENLOG_ERR,
> -                    "error while creating compat failsafe callback frame\n");
> -            domain_crash(n->domain);
> -        }
> +            domain_crash(n->domain,
> +                         "Error creating compat failsafe callback frame\n");
>
>          if ( n->arch.pv.vgc_flags & VGCF_failsafe_disables_events )
>              vcpu_info(n, evtchn_upcall_mask) = 1;
> @@ -1732,11 +1729,8 @@ static void load_segments(struct vcpu *n)
>              put_guest(uregs->ds, rsp - 9) |
>              put_guest(regs->r11, rsp - 10) |
>              put_guest(regs->rcx, rsp - 11) )
> -        {
> -            gprintk(XENLOG_ERR,
> -                    "error while creating failsafe callback frame\n");
> -            domain_crash(n->domain);
> -        }
> +            domain_crash(n->domain,
> +                         "Error creating failsafe callback frame\n");

I assume it wasn't really intended to hide potentially relevant information (the subject vCPU) by this change, which - by way of gprintk() - did get logged before (since we already have n == current at this point)?

Jan
Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci
On 04.02.22 14:47, Jan Beulich wrote:
> On 04.02.2022 13:37, Oleksandr Andrushchenko wrote:
>> On 04.02.22 13:37, Jan Beulich wrote:
>>> On 04.02.2022 12:13, Roger Pau Monné wrote:
>>>> On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
>>>>> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
>>>>>> On 04.02.22 11:15, Jan Beulich wrote:
>>>>>>> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
>>>>>>>> On 04.02.22 09:52, Jan Beulich wrote:
>>>>>>>>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>>>>>>>>>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>>>>>>>>>                  continue;
>>>>>>>>>>              }
>>>>>>>>>>
>>>>>>>>>> +        spin_lock(&tmp->vpci_lock);
>>>>>>>>>> +        if ( !tmp->vpci )
>>>>>>>>>> +        {
>>>>>>>>>> +            spin_unlock(&tmp->vpci_lock);
>>>>>>>>>> +            continue;
>>>>>>>>>> +        }
>>>>>>>>>>          for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>>>>>>>>>>          {
>>>>>>>>>>              const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
>>>>>>>>>> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>>>>>>>>>                  rc = rangeset_remove_range(mem, start, end);
>>>>>>>>>>                  if ( rc )
>>>>>>>>>>                  {
>>>>>>>>>> +                spin_unlock(&tmp->vpci_lock);
>>>>>>>>>>                      printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
>>>>>>>>>>                             start, end, rc);
>>>>>>>>>>                      rangeset_destroy(mem);
>>>>>>>>>>                      return rc;
>>>>>>>>>>                  }
>>>>>>>>>>              }
>>>>>>>>>> +        spin_unlock(&tmp->vpci_lock);
>>>>>>>>>>          }
>>>>>>>>> At the first glance this simply looks like another unjustified (in the
>>>>>>>>> description) change, as you're not converting anything here but you
>>>>>>>>> actually add locking (and I realize this was there before, so I'm
>>>>>>>>> sorry for not pointing this out earlier).
>>>>>>>> Well, I thought that the description already has "...the lock can be
>>>>>>>> used (and in a few cases is used right away) to check whether vpci
>>>>>>>> is present" and this is enough for such uses as here.
>>>>>>>>> But then I wonder whether you
>>>>>>>>> actually tested this, since I can't help getting the impression that
>>>>>>>>> you're introducing a live-lock: The function is called from
>>>>>>>>> cmd_write() and rom_write(), which in turn are called out of
>>>>>>>>> vpci_write(). Yet that function already holds the lock, and the lock
>>>>>>>>> is not (currently) recursive. (For the 3rd caller of the function -
>>>>>>>>> init_bars() - otoh the locking looks to be entirely unnecessary.)
>>>>>>>> Well, you are correct: if tmp != pdev then it is correct to acquire
>>>>>>>> the lock. But if tmp == pdev and rom_only == true then we'll
>>>>>>>> deadlock.
>>>>>>>>
>>>>>>>> It seems we need to have the locking conditional, e.g. only lock
>>>>>>>> if tmp != pdev
>>>>>>> Which will address the live-lock, but introduce ABBA deadlock
>>>>>>> potential between the two locks.
>>>>>> I am not sure I can suggest a better solution here
>>>>>> @Roger, @Jan, could you please help here?
>>>>> Well, first of all I'd like to mention that while it may have been okay
>>>>> to not hold pcidevs_lock here for Dom0, it surely needs acquiring when
>>>>> dealing with DomU-s' lists of PCI devices. The requirement really
>>>>> applies to the other use of for_each_pdev() as well (in
>>>>> vpci_dump_msi()), except that there it probably wants to be a try-lock.
>>>>>
>>>>> Next I'd like to point out that here we have the still pending issue of
>>>>> how to deal with hidden devices, which Dom0 can access. See my RFC patch
>>>>> "vPCI: account for hidden devices in modify_bars()". Whatever the
>>>>> solution here, I think it wants to at least account for the extra need
>>>>> there.
>>>> Yes, sorry, I should take care of that.
>>>>
>>>>> Now it is quite clear that pcidevs_lock isn't going to help with
>>>>> avoiding the deadlock, as it's imo not an option at all to acquire that
>>>>> lock everywhere else you access ->vpci (or else the vpci lock itself
>>>>> would be pointless). But a per-domain auxiliary r/w lock may help: Other
>>>>> paths would acquire it in read mode, and here you'd acquire it in write
>>>>> mode (in the former case around the vpci lock, while in the latter case
>>>>> there may then not be any need to acquire the individual vpci locks at
>>>>> all). FTAOD: I haven't fully thought through all implications (and hence
>>>>> whether this is viable in the first place); I expect you will,
>>>>> documenting what you've found in the resulting patch description. Of
>>>>> course the double lock acquire/release would then likely want hiding in
>>>>> helper functions.
>>>> I've been also thinking about this, and whether it's really worth to
>>>> have a per-device lock rather than a per-domain one that protects all
>>>> vpci regions of the devices assigned to the domain.
>>>>
>>>> The OS is likely to serialize accesses to the PCI config space anyway,
>>>> and the only place I could see a benefit of having per-device locks is
>>>> in the handling of MSI-X tables, as the handling of the mask bit is
>>>> likely very performance sensitive, so adding a per-domain lock there
>>>> could
Re: [PATCH] tools/guest: Fix comment regarding CPUID compatibility
On 04/02/2022 08:31, Jan Beulich wrote:
> On 03.02.2022 19:10, Andrew Cooper wrote:
>> It was Xen 4.14 where CPUID data was added to the migration stream, and
>> 4.13 that we need to worry about with regards to compatibility. Xen 4.12
>> isn't relevant.
>>
>> Expand and correct the commentary.
>>
>> Fixes: 111c8c33a8a1 ("x86/cpuid: do not expand max leaves on restore")
> But doesn't this commit amend 685e922d6f30 ("tools/libxc: Rework
> xc_cpuid_apply_policy() to use {get,set}_cpu_policy()"), which is
> where DEF_MAX_* disappeared?

No. All that happened in that change was that we switched to using
cpuid.h:89:#define CPUID_GUEST_NR_EXTD_AMD instead, which remained the
same size until Xen 4.15 when e9b4fe26364 bumped it.

> While looking at this, wasn't Roger's change incomplete, in that
> for Intel the extended leaf upper bound was 0x80000008 in 4.12?

CPUID_GUEST_NR_EXTD_INTEL is still 8, so this is all fine. Even if
hardware has more, we'll still limit to 8 on Intel.

That said, to this date Intel haven't bumped the limit, so we're
approaching 2 decades without change. Should this change, then yes,
we'd have to adjust the compat path.

~Andrew
Re: [PATCH] xen: Modify domain_crash() to take a print string
On 03/02/2022 15:06, Jan Beulich wrote:
> On 03.02.2022 15:41, Andrew Cooper wrote:
>> On 03/02/2022 14:19, Jan Beulich wrote:
>>> On 03.02.2022 15:11, Andrew Cooper wrote:
>>>> On 03/02/2022 13:48, Julien Grall wrote:
>>>>> On 03/02/2022 13:38, Andrew Cooper wrote:
>>>>>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
>>>>>> index 37f78cc4c4c9..38b390d20371 100644
>>>>>> --- a/xen/include/xen/sched.h
>>>>>> +++ b/xen/include/xen/sched.h
>>>>>> @@ -736,10 +736,15 @@ void vcpu_end_shutdown_deferral(struct vcpu *v);
>>>>>>   * from any processor.
>>>>>>   */
>>>>>>  void __domain_crash(struct domain *d);
>>>>>> -#define domain_crash(d) do {                                        \
>>>>>> -    printk("domain_crash called from %s:%d\n", __FILE__, __LINE__); \
>>>>>> -    __domain_crash(d);                                              \
>>>>>> -} while (0)
>>>>>> +#define domain_crash(d, ...)                                        \
>>>>>> +    do {                                                            \
>>>>>> +        if ( count_args(__VA_ARGS__) == 0 )                         \
>>>>>> +            printk("domain_crash called from %s:%d\n",              \
>>>>>> +                   __FILE__, __LINE__);                             \
>>>>> I find a bit odd that here you are using a normal printk
>>>> That's unmodified from before. Only reformatted.
>>>>
>>>>> but...
>>>>>
>>>>>> +        else                                                        \
>>>>>> +            printk(XENLOG_G_ERR __VA_ARGS__);                       \
>>>>> here it is XENLOG_G_ERR. In fact, isn't it ratelimited? If so,
>>>>> wouldn't it be better to only use XENLOG_ERR so they can always be
>>>>> seen? (A domain shouldn't be able to abuse it.)
>>>> Perhaps. I suppose it is more important information than pretty much
>>>> anything else about the guest.
>>> Indeed, but then - is this really an error in all cases?
>> Yes. It is always a fatal event for the VM.
> Which may or may not be Xen's fault. If the guest put itself in a bad
> state, I don't see why we shouldn't consider such just a warning.

Log level is the severity of the action, not who's potentially to blame
for causing the situation.

Furthermore, almost all callers who do emit appropriate diagnostics
before domain_crash() already use ERR.

And, as already pointed out, it doesn't matter. The line is going out on
the console however you want to try and bikeshed the internals.

> IOW I continue to think a log level, if so wanted, should be supplied
> by the user of the macro.

That undermines the whole purpose of preventing callers from being able
to omit diagnostics.

~Andrew
Re: [PATCH v6 01/13] xen/pci: arm: add stub for is_memory_hole
On 04.02.22 13:00, Julien Grall wrote:
> On 04/02/2022 10:35, Oleksandr Andrushchenko wrote:
>> On 04.02.22 11:57, Julien Grall wrote:
>>> Hi,
>>>
>>> On 04/02/2022 09:47, Oleksandr Andrushchenko wrote:
>>>>>> Could you please help me with the exact message you would like to see?
>>>>> Here a summary of the discussion (+ some of my follow-up thoughts):
>>>>>
>>>>> is_memory_hole() was recently introduced on x86 (see commit 75cc460a1b8c
>>>>> "xen/pci: detect when BARs are not suitably positioned") to check
>>>>> whether the BARs are positioned outside of a valid memory range. This
>>>>> was introduced to work around quirky firmware.
>>>>>
>>>>> In theory, this could also happen on Arm. In practice, this may not
>>>>> happen, but it sounds better to sanity check that the BAR contains a
>>>>> "valid" I/O range.
>>>>>
>>>>> On x86, this is implemented by checking the region is not described in
>>>>> the e820. IIUC, on Arm, the BARs have to be positioned in pre-defined
>>>>> ranges. So I think it would be possible to implement is_memory_hole()
>>>>> by going through the list of hostbridges and checking the ranges.
>>>>>
>>>>> But first, I'd like to confirm my understanding with Rahul, and others.
>>>>>
>>>>> If we were going to go this route, I would also rename the function to
>>>>> better match what it is doing (i.e. it checks the BAR is correctly
>>>>> placed). As a potential optimization/hardening for Arm, we could pass
>>>>> the hostbridge so we don't have to walk all of them.
>>>> It seems this needs to live in the commit message then? So it is easy
>>>> to find, as everything after "---" is going to be dropped on commit.
>>> I expect the function to be fully implemented before this will be merged.
>>>
>>> So if it is fully implemented, then a fair chunk of what I wrote would
>>> not be necessary to carry in the commit message.
>> Well, we started from that we want *something* with a TODO and now
>> you request it to be fully implemented before it is merged.
> I don't think I ever suggested this patch would be merged as-is. Sorry if
> this may have crossed like this.

Np

> Instead, my intent by asking you to send a TODO patch is to start a
> discussion on how this function could be implemented for Arm.
>
> You sent a TODO but you didn't provide any summary of what the issue is,
> what we want to achieve... Hence my request to add a bit more detail so
> the other reviewers can provide their opinion more easily.

Ok, so we can discuss it here, but I won't have this patch in v7.

> Cheers,

Thank you,
Oleksandr
Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci
On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
> > On 04.02.22 11:15, Jan Beulich wrote:
> >> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
> >>> On 04.02.22 09:52, Jan Beulich wrote:
> >>>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> >>>>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
> >>>>>              continue;
> >>>>>          }
> >>>>>
> >>>>> +        spin_lock(&tmp->vpci_lock);
> >>>>> +        if ( !tmp->vpci )
> >>>>> +        {
> >>>>> +            spin_unlock(&tmp->vpci_lock);
> >>>>> +            continue;
> >>>>> +        }
> >>>>>          for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
> >>>>>          {
> >>>>>              const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
> >>>>> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
> >>>>>                  rc = rangeset_remove_range(mem, start, end);
> >>>>>                  if ( rc )
> >>>>>                  {
> >>>>> +                    spin_unlock(&tmp->vpci_lock);
> >>>>>                      printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
> >>>>>                             start, end, rc);
> >>>>>                      rangeset_destroy(mem);
> >>>>>                      return rc;
> >>>>>                  }
> >>>>>          }
> >>>>> +        spin_unlock(&tmp->vpci_lock);
> >>>>>      }
> >>>> At the first glance this simply looks like another unjustified (in the
> >>>> description) change, as you're not converting anything here but you
> >>>> actually add locking (and I realize this was there before, so I'm sorry
> >>>> for not pointing this out earlier).
> >>> Well, I thought that the description already has "...the lock can be
> >>> used (and in a few cases is used right away) to check whether vpci
> >>> is present" and this is enough for such uses as here.
> >>>> But then I wonder whether you
> >>>> actually tested this, since I can't help getting the impression that
> >>>> you're introducing a live-lock: The function is called from cmd_write()
> >>>> and rom_write(), which in turn are called out of vpci_write(). Yet that
> >>>> function already holds the lock, and the lock is not (currently)
> >>>> recursive. (For the 3rd caller of the function - init_bars() - otoh
> >>>> the locking looks to be entirely unnecessary.)
> >>> Well, you are correct: if tmp != pdev then it is correct to acquire
> >>> the lock. But if tmp == pdev and rom_only == true
> >>> then we'll deadlock.
> >>>
> >>> It seems we need to have the locking conditional, e.g. only lock
> >>> if tmp != pdev
> >> Which will address the live-lock, but introduce ABBA deadlock potential
> >> between the two locks.
> > I am not sure I can suggest a better solution here
> > @Roger, @Jan, could you please help here?
>
> Well, first of all I'd like to mention that while it may have been okay to
> not hold pcidevs_lock here for Dom0, it surely needs acquiring when dealing
> with DomU-s' lists of PCI devices. The requirement really applies to the
> other use of for_each_pdev() as well (in vpci_dump_msi()), except that
> there it probably wants to be a try-lock.
>
> Next I'd like to point out that here we have the still pending issue of
> how to deal with hidden devices, which Dom0 can access. See my RFC patch
> "vPCI: account for hidden devices in modify_bars()". Whatever the solution
> here, I think it wants to at least account for the extra need there.

Yes, sorry, I should take care of that.

> Now it is quite clear that pcidevs_lock isn't going to help with avoiding
> the deadlock, as it's imo not an option at all to acquire that lock
> everywhere else you access ->vpci (or else the vpci lock itself would be
> pointless). But a per-domain auxiliary r/w lock may help: Other paths
> would acquire it in read mode, and here you'd acquire it in write mode (in
> the former case around the vpci lock, while in the latter case there may
> then not be any need to acquire the individual vpci locks at all). FTAOD:
> I haven't fully thought through all implications (and hence whether this is
> viable in the first place); I expect you will, documenting what you've
> found in the resulting patch description. Of course the double lock
> acquire/release would then likely want hiding in helper functions.
I've also been thinking about this, and whether it's really worth having a per-device lock rather than a per-domain one that protects all vpci regions of the devices assigned to the domain. The OS is likely to serialize accesses to the PCI config space anyway, and the only place I could see a benefit of having per-device locks is in the handling of MSI-X tables, as the handling of the mask bit is likely very performance sensitive, so adding a per-domain lock there could be a bottleneck. We could alternatively do a per-domain rwlock for vpci and special case the
[xen-unstable test] 168001: tolerable FAIL - PUSHED
flight 168001 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168001/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 167994
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 167994
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 167994
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 167994
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 167994
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 167994
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 167994
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 167994
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 167994
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 167994
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 167994
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 167994
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-arm64-arm64-xl-seattle 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass

version targeted for testing:
 xen 75cc460a1b8cfd8e5d2c4302234ee194defe4872
baseline version:
 xen
Re: [PATCH v6 01/13] xen/pci: arm: add stub for is_memory_hole
On 04/02/2022 10:35, Oleksandr Andrushchenko wrote:
> On 04.02.22 11:57, Julien Grall wrote:
>> Hi,
>>
>> On 04/02/2022 09:47, Oleksandr Andrushchenko wrote:
>>>>> Could you please help me with the exact message you would like to see?
>>>
>>> Here is a summary of the discussion (+ some follow-up thoughts of mine):
>>>
>>> is_memory_hole() was recently introduced on x86 (see commit 75cc460a1b8c
>>> "xen/pci: detect when BARs are not suitably positioned") to check whether
>>> the BARs are positioned outside of a valid memory range. This was
>>> introduced to work around quirky firmware.
>>>
>>> In theory, this could also happen on Arm. In practice, this may not
>>> happen, but it sounds better to sanity check that the BAR contains a
>>> "valid" I/O range.
>>>
>>> On x86, this is implemented by checking that the region is not described
>>> in the e820. IIUC, on Arm, the BARs have to be positioned in pre-defined
>>> ranges. So I think it would be possible to implement is_memory_hole() by
>>> going through the list of hostbridges and checking the ranges.
>>>
>>> But first, I'd like to confirm my understanding with Rahul, and others.
>>>
>>> If we were going to go this route, I would also rename the function to
>>> better match what it is doing (i.e. it checks the BAR is correctly
>>> placed). As a potential optimization/hardening for Arm, we could pass
>>> the hostbridge so we don't have to walk all of them.
>> It seems this needs to live in the commit message then? So, it is easy to
>> find, as everything after "---" is going to be dropped on commit.
> I expect the function to be fully implemented before this will be merged.
>
> So if it is fully implemented, then a fair chunk of what I wrote would not
> be necessary to carry in the commit message.
> Well, we started from the point that we want *something* with a TODO, and
> now you request it to be fully implemented before it is merged.

I don't think I ever suggested this patch would be merged as-is. Sorry if
this may have come across like this.

Instead, my intent in asking you to send a TODO patch was to start a
discussion on how this function could be implemented for Arm.

You sent a TODO, but you didn't provide any summary of what the issue is
and what we want to achieve... Hence my request to add a bit more detail
so the other reviewers can provide their opinion more easily.

Cheers,

-- 
Julien Grall
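Julien's proposed Arm implementation — walk the host bridges' MMIO windows and check that the BAR falls entirely inside one of them — could look roughly like the standalone sketch below. The window table, values, and function name are invented for illustration and are not Xen's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative only: a flat table stands in for Xen's list of host
 * bridges and the MMIO windows (the "ranges" property) each one forwards.
 */
struct bridge_window {
    uint64_t start, end;            /* inclusive MMIO window */
};

static const struct bridge_window windows[] = {
    { 0x10000000,   0x3fffffff },   /* hypothetical 32-bit window */
    { 0x8000000000, 0x80ffffffff }, /* hypothetical 64-bit prefetchable window */
};

/*
 * Arm counterpart of what x86's is_memory_hole() checks: a BAR is
 * acceptably placed only if it sits entirely inside one host bridge
 * window.  A BAR straddling a window boundary, or outside all windows,
 * is rejected.
 */
static bool bar_positioned_in_bridge_windows(uint64_t start, uint64_t end)
{
    for ( size_t i = 0; i < sizeof(windows) / sizeof(windows[0]); i++ )
        if ( start >= windows[i].start && end <= windows[i].end )
            return true;
    return false;
}
```

As Julien suggests, passing the device's own host bridge instead of walking all of them would turn the loop into a single containment check.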
Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci
On Fri, Feb 04, 2022 at 10:12:46AM +0000, Oleksandr Andrushchenko wrote:
> Hi, Jan!
>
> On 04.02.22 11:15, Jan Beulich wrote:
> > On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
> >> On 04.02.22 09:52, Jan Beulich wrote:
> >>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> >>>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
> >>>>              continue;
> >>>>          }
> >>>>
> >>>> +        spin_lock(&tmp->vpci_lock);
> >>>> +        if ( !tmp->vpci )
> >>>> +        {
> >>>> +            spin_unlock(&tmp->vpci_lock);
> >>>> +            continue;
> >>>> +        }
> >>>>          for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
> >>>>          {
> >>>>              const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
> >>>> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
> >>>>                  rc = rangeset_remove_range(mem, start, end);
> >>>>                  if ( rc )
> >>>>                  {
> >>>> +                    spin_unlock(&tmp->vpci_lock);
> >>>>                      printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
> >>>>                             start, end, rc);
> >>>>                      rangeset_destroy(mem);
> >>>>                      return rc;
> >>>>                  }
> >>>>          }
> >>>> +        spin_unlock(&tmp->vpci_lock);
> >>>>      }
> >>> At the first glance this simply looks like another unjustified (in the
> >>> description) change, as you're not converting anything here but you
> >>> actually add locking (and I realize this was there before, so I'm sorry
> >>> for not pointing this out earlier).
> >> Well, I thought that the description already has "...the lock can be
> >> used (and in a few cases is used right away) to check whether vpci
> >> is present" and this is enough for such uses as here.
> >>> But then I wonder whether you
> >>> actually tested this, since I can't help getting the impression that
> >>> you're introducing a live-lock: The function is called from cmd_write()
> >>> and rom_write(), which in turn are called out of vpci_write(). Yet that
> >>> function already holds the lock, and the lock is not (currently)
> >>> recursive. (For the 3rd caller of the function - init_bars() - otoh
> >>> the locking looks to be entirely unnecessary.)
> >> Well, you are correct: if tmp != pdev then it is correct to acquire
> >> the lock. But if tmp == pdev and rom_only == true
> >> then we'll deadlock.
> >>
> >> It seems we need to have the locking conditional, e.g. only lock
> >> if tmp != pdev
> > Which will address the live-lock, but introduce ABBA deadlock potential
> > between the two locks.
> I am not sure I can suggest a better solution here
> @Roger, @Jan, could you please help here?

I think we could set the locking order based on the memory address of the
locks, ie:

if ( &tmp->vpci_lock < &pdev->vpci_lock )
{
    spin_unlock(&pdev->vpci_lock);
    spin_lock(&tmp->vpci_lock);
    spin_lock(&pdev->vpci_lock);
    if ( !pdev->vpci || &pdev->vpci->header != header )
        /* ERROR: vpci removed or recreated. */
}
else
    spin_lock(&tmp->vpci_lock);

That however creates a window where the address of the BARs on the current
device (pdev) could be changed, so the result of the mapping might be
skewed. I think the guest would only have itself to blame for that, as
changing the position of the BARs while toggling memory decoding is not
something sensible to do.

> >>>> @@ -222,10 +239,10 @@ static int msix_read(struct vcpu *v, unsigned long addr, unsigned int len,
> >>>>              break;
> >>>>          }
> >>>>
> >>>> +        msix_put(msix);
> >>>>          return X86EMUL_OKAY;
> >>>>      }
> >>>>
> >>>> -    spin_lock(&msix->pdev->vpci->lock);
> >>>>      entry = get_entry(msix, addr);
> >>>>      offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
> >>> You're increasing the locked region quite a bit here. If this is really
> >>> needed, it wants explaining. And if this is deemed acceptable as a
> >>> "side effect", it wants justifying or at least stating imo. Same for
> >>> msix_write() then, obviously.
> >> Yes, I do increase the locking region here, but the msix variable needs
> >> to be protected all the time, so it seems to be obvious that it remains
> >> under the lock
> > What does the msix variable have to do with the vPCI lock? If you see
> > a need to grow the locked region here, then surely this is independent
> > of your conversion of the lock, and hence wants to be a prereq fix
> > (which may in fact want/need backporting).
> First of all, the implementation of msix_get is wrong and needs to be:
>
> /*
>  * Note: if vpci_msix found, then this function returns with
>  * pdev->vpci_lock held. Use msix_put to unlock.
>  */
> static struct vpci_msix *msix_get(const struct domain *d, unsigned long addr)
> {
>     struct vpci_msix *msix;
>
>     list_for_each_entry ( msix, &d->arch.hvm.msix_tables, next )
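Roger's address-based lock ordering can be sketched generically. The example below is illustrative, not Xen code: pthread mutexes stand in for the vpci spinlocks, and the helpers are invented names. The idea is that any pair of locks is always taken in one global order (here, by address), so two CPUs locking the same pair from opposite directions cannot ABBA-deadlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Take two locks in a globally consistent (address) order. */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if ( a == b )
        pthread_mutex_lock(a);          /* tmp == pdev: lock just once */
    else if ( (uintptr_t)a < (uintptr_t)b )
    {
        pthread_mutex_lock(a);
        pthread_mutex_lock(b);
    }
    else
    {
        pthread_mutex_lock(b);
        pthread_mutex_lock(a);
    }
}

static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    if ( a != b )
        pthread_mutex_unlock(b);
}
```

Note the caveat Roger raises: in modify_bars() one lock is already held, so enforcing the order may require dropping and re-taking it, and the protected state (whether vpci still exists, the header address) must be re-validated after reacquisition.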
Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci
On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
> On 04.02.22 11:15, Jan Beulich wrote:
>> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
>>> On 04.02.22 09:52, Jan Beulich wrote:
>>>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>>>>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>>>>              continue;
>>>>>          }
>>>>>
>>>>> +        spin_lock(&tmp->vpci_lock);
>>>>> +        if ( !tmp->vpci )
>>>>> +        {
>>>>> +            spin_unlock(&tmp->vpci_lock);
>>>>> +            continue;
>>>>> +        }
>>>>>          for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>>>>>          {
>>>>>              const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
>>>>> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>>>>                  rc = rangeset_remove_range(mem, start, end);
>>>>>                  if ( rc )
>>>>>                  {
>>>>> +                    spin_unlock(&tmp->vpci_lock);
>>>>>                      printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
>>>>>                             start, end, rc);
>>>>>                      rangeset_destroy(mem);
>>>>>                      return rc;
>>>>>                  }
>>>>>          }
>>>>> +        spin_unlock(&tmp->vpci_lock);
>>>>>      }
>>>> At the first glance this simply looks like another unjustified (in the
>>>> description) change, as you're not converting anything here but you
>>>> actually add locking (and I realize this was there before, so I'm sorry
>>>> for not pointing this out earlier).
>>> Well, I thought that the description already has "...the lock can be
>>> used (and in a few cases is used right away) to check whether vpci
>>> is present" and this is enough for such uses as here.
>>>> But then I wonder whether you
>>>> actually tested this, since I can't help getting the impression that
>>>> you're introducing a live-lock: The function is called from cmd_write()
>>>> and rom_write(), which in turn are called out of vpci_write(). Yet that
>>>> function already holds the lock, and the lock is not (currently)
>>>> recursive. (For the 3rd caller of the function - init_bars() - otoh
>>>> the locking looks to be entirely unnecessary.)
>>> Well, you are correct: if tmp != pdev then it is correct to acquire
>>> the lock. But if tmp == pdev and rom_only == true
>>> then we'll deadlock.
>>>
>>> It seems we need to have the locking conditional, e.g. only lock
>>> if tmp != pdev
>> Which will address the live-lock, but introduce ABBA deadlock potential
>> between the two locks.
> I am not sure I can suggest a better solution here
> @Roger, @Jan, could you please help here?

Well, first of all I'd like to mention that while it may have been okay to
not hold pcidevs_lock here for Dom0, it surely needs acquiring when dealing
with DomU-s' lists of PCI devices. The requirement really applies to the
other use of for_each_pdev() as well (in vpci_dump_msi()), except that
there it probably wants to be a try-lock.

Next I'd like to point out that here we have the still pending issue of
how to deal with hidden devices, which Dom0 can access. See my RFC patch
"vPCI: account for hidden devices in modify_bars()". Whatever the solution
here, I think it wants to at least account for the extra need there.

Now it is quite clear that pcidevs_lock isn't going to help with avoiding
the deadlock, as it's imo not an option at all to acquire that lock
everywhere else you access ->vpci (or else the vpci lock itself would be
pointless). But a per-domain auxiliary r/w lock may help: Other paths
would acquire it in read mode, and here you'd acquire it in write mode (in
the former case around the vpci lock, while in the latter case there may
then not be any need to acquire the individual vpci locks at all). FTAOD:
I haven't fully thought through all implications (and hence whether this is
viable in the first place); I expect you will, documenting what you've
found in the resulting patch description. Of course the double lock
acquire/release would then likely want hiding in helper functions.

Jan
Re: [PATCH v6 01/13] xen/pci: arm: add stub for is_memory_hole
On 04.02.22 11:57, Julien Grall wrote:
> Hi,
>
> On 04/02/2022 09:47, Oleksandr Andrushchenko wrote:
>>>> Could you please help me with the exact message you would like to see?
>>>
>>> Here is a summary of the discussion (+ some follow-up thoughts of mine):
>>>
>>> is_memory_hole() was recently introduced on x86 (see commit 75cc460a1b8c
>>> "xen/pci: detect when BARs are not suitably positioned") to check whether
>>> the BARs are positioned outside of a valid memory range. This was
>>> introduced to work around quirky firmware.
>>>
>>> In theory, this could also happen on Arm. In practice, this may not
>>> happen, but it sounds better to sanity check that the BAR contains a
>>> "valid" I/O range.
>>>
>>> On x86, this is implemented by checking that the region is not described
>>> in the e820. IIUC, on Arm, the BARs have to be positioned in pre-defined
>>> ranges. So I think it would be possible to implement is_memory_hole() by
>>> going through the list of hostbridges and checking the ranges.
>>>
>>> But first, I'd like to confirm my understanding with Rahul, and others.
>>>
>>> If we were going to go this route, I would also rename the function to
>>> better match what it is doing (i.e. it checks the BAR is correctly
>>> placed). As a potential optimization/hardening for Arm, we could pass
>>> the hostbridge so we don't have to walk all of them.
>> It seems this needs to live in the commit message then? So, it is easy to
>> find, as everything after "---" is going to be dropped on commit.
> I expect the function to be fully implemented before this will be merged.
>
> So if it is fully implemented, then a fair chunk of what I wrote would not
> be necessary to carry in the commit message.

Well, we started from the point that we want *something* with a TODO, and
now you request it to be fully implemented before it is merged.
What do I miss here?

>
> Cheers,
>

Thank you,
Oleksandr
Re: [PATCH 2/3] x86/vmsi: add support for extended destination ID in address field
On 04.02.2022 10:54, Roger Pau Monné wrote:
> On Fri, Feb 04, 2022 at 10:30:54AM +0100, Jan Beulich wrote:
>> On 04.02.2022 10:23, Roger Pau Monné wrote:
>>> On Mon, Jan 24, 2022 at 02:47:58PM +0100, Jan Beulich wrote:
>>>> On 20.01.2022 16:23, Roger Pau Monne wrote:
>>>>> --- a/xen/arch/x86/include/asm/msi.h
>>>>> +++ b/xen/arch/x86/include/asm/msi.h
>>>>> @@ -54,6 +54,7 @@
>>>>>  #define MSI_ADDR_DEST_ID_SHIFT 12
>>>>>  #define MSI_ADDR_DEST_ID_MASK  0x00ff000
>>>>>  #define MSI_ADDR_DEST_ID(dest) (((dest) << MSI_ADDR_DEST_ID_SHIFT) & MSI_ADDR_DEST_ID_MASK)
>>>>> +#define MSI_ADDR_EXT_DEST_ID_MASK 0xfe0
>>>> Especially the immediately preceding macro now becomes kind of stale.
>>>
>>> Hm, I'm not so sure about that. We could expand the macro to place the
>>> high bits in dest at bits 11:4 of the resulting address. However that
>>> macro (MSI_ADDR_DEST_ID) is only used by Xen to compose its own
>>> messages, so until we add support for the hypervisor itself to use the
>>> extended destination ID mode there's not much point in modifying the
>>> macro IMO.
>>
>> Well, this is all unhelpful considering the different form of extended
>> ID in Intel's doc. At least by way of a comment things need clarifying
>> and potential pitfalls need pointing out imo.
>
> Sure, will add some comments there.
>
>>>>> --- a/xen/include/public/domctl.h
>>>>> +++ b/xen/include/public/domctl.h
>>>>> @@ -588,6 +588,7 @@ struct xen_domctl_bind_pt_irq {
>>>>>  #define XEN_DOMCTL_VMSI_X86_DELIV_MASK 0x007000
>>>>>  #define XEN_DOMCTL_VMSI_X86_TRIG_MASK  0x008000
>>>>>  #define XEN_DOMCTL_VMSI_X86_UNMASKED   0x01
>>>>> +#define XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK 0xfe
>>>> I'm not convinced it is a good idea to limit the overall destination
>>>> ID width to 15 bits here - at the interface level we could as well
>>>> permit more bits right away; the implementation would reject too high
>>>> a value, of course. Not only with this I further wonder whether the
>>>> field shouldn't be unsplit while extending it. You won't get away
>>>> without bumping XEN_DOMCTL_INTERFACE_VERSION anyway (unless it was
>>>> bumped already for 4.17) since afaics the unused bits of this field
>>>> previously weren't checked for being zero. We could easily have 8
>>>> bits vector, 16 bits flags, and 32 bits destination ID in the struct.
>>>> And there would then still be 8 unused bits (which from now on we
>>>> ought to check for being zero).
>>>
>>> So I've made gflags a 64bit field, used the high 32bits for the
>>> destination ID, and the low ones for flags. I've left gvec as a
>>> separate field in the struct, as I don't want to require a
>>> modification to QEMU, just a rebuild against the updated headers will
>>> be enough.
>>
>> Hmm, wait - if qemu uses this without going through a suitable
>> abstraction in at least libxc, then we cannot _ever_ change this
>> interface: If a rebuild was required, old qemu binaries would
>> stop working with newer Xen. If such a dependency indeed exists,
>> we'll need a prominent warning comment in the public header.
>
> Hm, it's bad. The xc_domain_update_msi_irq interface uses a gflags
> parameter that's the gflags parameter of xen_domctl_bind_pt_irq. Which
> is even worse because it's not using the mask definitions from
> domctl.h, but rather a copy of them named XEN_PT_GFLAGS_* that are
> hardcoded in xen_pt_msi.c in QEMU code.
>
> So we can likely expand the layout of gflags, but moving fields is not
> an option. I think my original proposal of adding a
> XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK mask is the less bad option until
> we add a new stable interface for device passthrough for QEMU.

Given the observations - yeah, not much of a choice left.

Jan
Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci
Hi, Jan!

On 04.02.22 11:15, Jan Beulich wrote:
> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
>> On 04.02.22 09:52, Jan Beulich wrote:
>>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>>>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>>>              continue;
>>>>          }
>>>>
>>>> +        spin_lock(&tmp->vpci_lock);
>>>> +        if ( !tmp->vpci )
>>>> +        {
>>>> +            spin_unlock(&tmp->vpci_lock);
>>>> +            continue;
>>>> +        }
>>>>          for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>>>>          {
>>>>              const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
>>>> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>>>                  rc = rangeset_remove_range(mem, start, end);
>>>>                  if ( rc )
>>>>                  {
>>>> +                    spin_unlock(&tmp->vpci_lock);
>>>>                      printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
>>>>                             start, end, rc);
>>>>                      rangeset_destroy(mem);
>>>>                      return rc;
>>>>                  }
>>>>          }
>>>> +        spin_unlock(&tmp->vpci_lock);
>>>>      }
>>> At the first glance this simply looks like another unjustified (in the
>>> description) change, as you're not converting anything here but you
>>> actually add locking (and I realize this was there before, so I'm sorry
>>> for not pointing this out earlier).
>> Well, I thought that the description already has "...the lock can be
>> used (and in a few cases is used right away) to check whether vpci
>> is present" and this is enough for such uses as here.
>>> But then I wonder whether you
>>> actually tested this, since I can't help getting the impression that
>>> you're introducing a live-lock: The function is called from cmd_write()
>>> and rom_write(), which in turn are called out of vpci_write(). Yet that
>>> function already holds the lock, and the lock is not (currently)
>>> recursive. (For the 3rd caller of the function - init_bars() - otoh
>>> the locking looks to be entirely unnecessary.)
>> Well, you are correct: if tmp != pdev then it is correct to acquire
>> the lock. But if tmp == pdev and rom_only == true
>> then we'll deadlock.
>>
>> It seems we need to have the locking conditional, e.g. only lock
>> if tmp != pdev
> Which will address the live-lock, but introduce ABBA deadlock potential
> between the two locks.

I am not sure I can suggest a better solution here
@Roger, @Jan, could you please help here?

>>>> @@ -222,10 +239,10 @@ static int msix_read(struct vcpu *v, unsigned long addr, unsigned int len,
>>>>              break;
>>>>          }
>>>>
>>>> +        msix_put(msix);
>>>>          return X86EMUL_OKAY;
>>>>      }
>>>>
>>>> -    spin_lock(&msix->pdev->vpci->lock);
>>>>      entry = get_entry(msix, addr);
>>>>      offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
>>> You're increasing the locked region quite a bit here. If this is really
>>> needed, it wants explaining. And if this is deemed acceptable as a
>>> "side effect", it wants justifying or at least stating imo. Same for
>>> msix_write() then, obviously.
>> Yes, I do increase the locking region here, but the msix variable needs
>> to be protected all the time, so it seems to be obvious that it remains
>> under the lock
> What does the msix variable have to do with the vPCI lock? If you see
> a need to grow the locked region here, then surely this is independent
> of your conversion of the lock, and hence wants to be a prereq fix
> (which may in fact want/need backporting).

First of all, the implementation of msix_get is wrong and needs to be:

/*
 * Note: if vpci_msix found, then this function returns with
 * pdev->vpci_lock held. Use msix_put to unlock.
 */
static struct vpci_msix *msix_get(const struct domain *d, unsigned long addr)
{
    struct vpci_msix *msix;

    list_for_each_entry ( msix, &d->arch.hvm.msix_tables, next )
    {
        const struct vpci_bar *bars;
        unsigned int i;

        spin_lock(&msix->pdev->vpci_lock);
        if ( !msix->pdev->vpci )
        {
            spin_unlock(&msix->pdev->vpci_lock);
            continue;
        }

        bars = msix->pdev->vpci->header.bars;
        for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
            if ( bars[msix->tables[i] & PCI_MSIX_BIRMASK].enabled &&
                 VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, i) )
                return msix;

        spin_unlock(&msix->pdev->vpci_lock);
    }

    return NULL;
}

Then, both msix_{read|write} can dereference msix->pdev->vpci early; this
is why Roger suggested we move to msix_{get|put} here.
And yes, we grow the locked region here, and yes, this might want a prereq
fix. Or it could just be fixed while at it.

>>>> @@ -327,7 +334,12 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int
Re: [PATCH v6 01/13] xen/pci: arm: add stub for is_memory_hole
Hi,

On 04/02/2022 09:47, Oleksandr Andrushchenko wrote:
>>>> Could you please help me with the exact message you would like to see?
>>
>> Here is a summary of the discussion (+ some follow-up thoughts of mine):
>>
>> is_memory_hole() was recently introduced on x86 (see commit 75cc460a1b8c
>> "xen/pci: detect when BARs are not suitably positioned") to check whether
>> the BARs are positioned outside of a valid memory range. This was
>> introduced to work around quirky firmware.
>>
>> In theory, this could also happen on Arm. In practice, this may not
>> happen, but it sounds better to sanity check that the BAR contains a
>> "valid" I/O range.
>>
>> On x86, this is implemented by checking that the region is not described
>> in the e820. IIUC, on Arm, the BARs have to be positioned in pre-defined
>> ranges. So I think it would be possible to implement is_memory_hole() by
>> going through the list of hostbridges and checking the ranges.
>>
>> But first, I'd like to confirm my understanding with Rahul, and others.
>>
>> If we were going to go this route, I would also rename the function to
>> better match what it is doing (i.e. it checks the BAR is correctly
>> placed). As a potential optimization/hardening for Arm, we could pass
>> the hostbridge so we don't have to walk all of them.
> It seems this needs to live in the commit message then? So, it is easy to
> find, as everything after "---" is going to be dropped on commit.

I expect the function to be fully implemented before this will be merged.

So if it is fully implemented, then a fair chunk of what I wrote would not
be necessary to carry in the commit message.

Cheers,

-- 
Julien Grall
Re: [PATCH 2/3] x86/vmsi: add support for extended destination ID in address field
On Fri, Feb 04, 2022 at 10:30:54AM +0100, Jan Beulich wrote: > On 04.02.2022 10:23, Roger Pau Monné wrote: > > On Mon, Jan 24, 2022 at 02:47:58PM +0100, Jan Beulich wrote: > >> On 20.01.2022 16:23, Roger Pau Monne wrote: > >>> --- a/xen/arch/x86/include/asm/msi.h > >>> +++ b/xen/arch/x86/include/asm/msi.h > >>> @@ -54,6 +54,7 @@ > >>> #define MSI_ADDR_DEST_ID_SHIFT 12 > >>> #define MSI_ADDR_DEST_ID_MASK 0x00ff000 > >>> #define MSI_ADDR_DEST_ID(dest) (((dest) << > >>> MSI_ADDR_DEST_ID_SHIFT) & MSI_ADDR_DEST_ID_MASK) > >>> +#define MSI_ADDR_EXT_DEST_ID_MASK 0xfe0 > >> > >> Especially the immediately preceding macro now becomes kind of stale. > > > > Hm, I'm not so sure about that. We could expand the macro to place the > > high bits in dest at bits 11:4 of the resulting address. However that > > macro (MSI_ADDR_DEST_ID) is only used by Xen to compose its own > > messages, so until we add support for the hypervisor itself to use the > > extended destination ID mode there's not much point in modifying the > > macro IMO. > > Well, this is all unhelpful considering the different form of extended > ID in Intel's doc. At least by way of a comment things need clarifying > and potential pitfalls need pointing out imo. Sure, will add some comments there. > >>> --- a/xen/include/public/domctl.h > >>> +++ b/xen/include/public/domctl.h > >>> @@ -588,6 +588,7 @@ struct xen_domctl_bind_pt_irq { > >>> #define XEN_DOMCTL_VMSI_X86_DELIV_MASK 0x007000 > >>> #define XEN_DOMCTL_VMSI_X86_TRIG_MASK0x008000 > >>> #define XEN_DOMCTL_VMSI_X86_UNMASKED 0x01 > >>> +#define XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK 0xfe > >> > >> I'm not convinced it is a good idea to limit the overall destination > >> ID width to 15 bits here - at the interface level we could as well > >> permit more bits right away; the implementation would reject too high > >> a value, of course. Not only with this I further wonder whether the > >> field shouldn't be unsplit while extending it. 
You won't get away > >> without bumping XEN_DOMCTL_INTERFACE_VERSION anyway (unless it was > >> bumped already for 4.17) since afaics the unused bits of this field > >> previously weren't checked for being zero. We could easily have 8 > >> bits vector, 16 bits flags, and 32 bits destination ID in the struct. > >> And there would then still be 8 unused bits (which from now on we > >> ought to check for being zero). > > > > So I've made gflags a 64bit field, used the high 32bits for the > > destination ID, and the low ones for flags. I've left gvec as a > > separate field in the struct, as I don't want to require a > > modification to QEMU, just a rebuild against the updated headers will > > be enough. > > Hmm, wait - if qemu uses this without going through a suitable > abstraction in at least libxc, then we cannot _ever_ change this > interface: If a rebuild was required, old qemu binaries would > stop working with newer Xen. If such a dependency indeed exists, > we'll need a prominent warning comment in the public header. Hm, it's bad. The xc_domain_update_msi_irq interface takes a gflags parameter that is passed straight through as the gflags field of xen_domctl_bind_pt_irq. Which is even worse because it's not using the mask definitions from domctl.h, but rather a copy of them named XEN_PT_GFLAGS_* that is hardcoded in xen_pt_msi.c in QEMU code. So we can likely expand the layout of gflags, but moving fields is not an option. I think my original proposal of adding a XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK mask is the least bad option until we add a new stable interface for device passthrough for QEMU. Thanks, Roger.
Re: [PATCH v6 01/13] xen/pci: arm: add stub for is_memory_hole
On 04.02.22 11:41, Julien Grall wrote: > On 04/02/2022 09:01, Oleksandr Andrushchenko wrote: >> On 04.02.22 10:51, Julien Grall wrote: >>> Hi, >>> >>> On 04/02/2022 06:34, Oleksandr Andrushchenko wrote: From: Oleksandr Andrushchenko Add a stub for is_memory_hole which is required for PCI passthrough on Arm. Signed-off-by: Oleksandr Andrushchenko --- Cc: Julien Grall Cc: Stefano Stabellini --- New in v6 --- xen/arch/arm/mm.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index b1eae767c27c..c32e34a182a2 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -1640,6 +1640,12 @@ unsigned long get_upper_mfn_bound(void) return max_page - 1; } +bool is_memory_hole(mfn_t start, mfn_t end) +{ + /* TODO: this needs to be properly implemented. */ >>> >>> I was hoping to see a summary of the discussion from IRC somewhere in the >>> patch (maybe after ---). This would help to bring up to speed the others >>> that were not on IRC. >> I am not quite sure what needs to be put here as the summary > > At least some details on why this is a TODO. Is it because you are unsure of > the implementation? Is it because you wanted to send early?... > > IOW, what are you expecting from the reviewers? Well, I just need to allow PCI passthrough to be built on Arm at the moment. Clearly, without this stub I can't do so. This is the only intention now. Of course, while PCI passthrough on Arm is still not really enabled those who want trying it will need reverting the offending patch otherwise. I am fine both ways > >> Could you please help me with the exact message you would like to see? > > Here a summary of the discussion (+ some my follow-up thoughts): > > is_memory_hole() was recently introduced on x86 (see commit 75cc460a1b8c > "xen/pci: detect when BARs are not suitably positioned") to check whether the > BAR are positioned outside of a valid memory range. This was introduced to > work-around quirky firmware. 
> > In theory, this could also happen on Arm. In practice, this may not happen > but it sounds better to sanity check that the BAR contains a "valid" I/O range. > > On x86, this is implemented by checking the region is not described in the > e820. IIUC, on Arm, the BARs have to be positioned in pre-defined ranges. So > I think it would be possible to implement is_memory_hole() by going through > the list of hostbridges and checking the ranges. > > But first, I'd like to confirm my understanding with Rahul, and others. > > If we were going to go this route, I would also rename the function to > better match what it is doing (i.e. it checks the BAR is correctly placed). > As a potential optimization/hardening for Arm, we could pass the hostbridge > so we don't have to walk all of them. It seems this needs to live in the commit message then, so it is easy to find, as everything after "---" is going to be dropped on commit > > Cheers, > Thank you, Oleksandr
Re: [PATCH v6 01/13] xen/pci: arm: add stub for is_memory_hole
On 04/02/2022 09:01, Oleksandr Andrushchenko wrote: On 04.02.22 10:51, Julien Grall wrote: Hi, On 04/02/2022 06:34, Oleksandr Andrushchenko wrote: From: Oleksandr Andrushchenko Add a stub for is_memory_hole which is required for PCI passthrough on Arm. Signed-off-by: Oleksandr Andrushchenko --- Cc: Julien Grall Cc: Stefano Stabellini --- New in v6 --- xen/arch/arm/mm.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index b1eae767c27c..c32e34a182a2 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -1640,6 +1640,12 @@ unsigned long get_upper_mfn_bound(void) return max_page - 1; } +bool is_memory_hole(mfn_t start, mfn_t end) +{ + /* TODO: this needs to be properly implemented. */ I was hoping to see a summary of the discussion from IRC somewhere in the patch (maybe after ---). This would help to bring up to speed the others that were not on IRC. I am not quite sure what needs to be put here as the summary At least some details on why this is a TODO. Is it because you are unsure of the implementation? Is it because you wanted to send early?... IOW, what are you expecting from the reviewers? Could you please help me with the exact message you would like to see? Here a summary of the discussion (+ some my follow-up thoughts): is_memory_hole() was recently introduced on x86 (see commit 75cc460a1b8c "xen/pci: detect when BARs are not suitably positioned") to check whether the BAR are positioned outside of a valid memory range. This was introduced to work-around quirky firmware. In theory, this could also happen on Arm. In practice, this may not happen but it sounds better to sanity check that the BAR contains "valid" I/O range. On x86, this is implemented by checking the region is not described is in the e820. IIUC, on Arm, the BARs have to be positioned in pre-defined ranges. So I think it would be possible to implement is_memory_hole() by going through the list of hostbridges and check the ranges. 
But first, I'd like to confirm my understanding with Rahul, and others. If we were going to go this route, I would also rename the function to better match what it is doing (i.e. it checks the BAR is correctly placed). As a potential optimization/hardening for Arm, we could pass the hostbridge so we don't have to walk all of them. Cheers, -- Julien Grall
Re: [PATCH v3 2/2] x86/mm: tidy XENMEM_{get,set}_pod_target handling
On 04.02.2022 10:28, Roger Pau Monné wrote: > On Wed, Feb 02, 2022 at 04:29:37PM +0100, Jan Beulich wrote: >> On 02.02.2022 16:14, Roger Pau Monné wrote: >>> On Tue, Jan 04, 2022 at 10:41:53AM +0100, Jan Beulich wrote: Do away with the "pod_target_out_unlock" label. In particular by folding if()-s, the logic can be expressed with less code (and no goto-s) this way. Limit scope of "p2m", constifying it at the same time. >>> >>> Is this stale? I cannot find any reference to a p2m variable in the >>> chunks below. >> >> Indeed it is, leftover from rebasing over the introduction of >> p2m_pod_get_mem_target() in what is now patch 1. Dropped. > > I'm happy with this change with the commit adjusted: > > Acked-by: Roger Pau Monné Thanks. > Not sure if you can commit this now regardless of patch 1? I think it can be moved ahead; there looks to be only a minor contextual dependency. Jan
Re: [PATCH 2/3] x86/vmsi: add support for extended destination ID in address field
On 04.02.2022 10:23, Roger Pau Monné wrote: > On Mon, Jan 24, 2022 at 02:47:58PM +0100, Jan Beulich wrote: >> On 20.01.2022 16:23, Roger Pau Monne wrote: >>> --- a/xen/arch/x86/include/asm/msi.h >>> +++ b/xen/arch/x86/include/asm/msi.h >>> @@ -54,6 +54,7 @@ >>> #define MSI_ADDR_DEST_ID_SHIFT 12 >>> #define MSI_ADDR_DEST_ID_MASK 0x00ff000 >>> #define MSI_ADDR_DEST_ID(dest)(((dest) << >>> MSI_ADDR_DEST_ID_SHIFT) & MSI_ADDR_DEST_ID_MASK) >>> +#define MSI_ADDR_EXT_DEST_ID_MASK 0xfe0 >> >> Especially the immediately preceding macro now becomes kind of stale. > > Hm, I'm not so sure about that. We could expand the macro to place the > high bits in dest at bits 11:4 of the resulting address. However that > macro (MSI_ADDR_DEST_ID) is only used by Xen to compose its own > messages, so until we add support for the hypervisor itself to use the > extended destination ID mode there's not much point in modifying the > macro IMO. Well, this is all unhelpful considering the different form of extended ID in Intel's doc. At least by way of a comment things need clarifying and potential pitfalls need pointing out imo. >>> --- a/xen/include/public/domctl.h >>> +++ b/xen/include/public/domctl.h >>> @@ -588,6 +588,7 @@ struct xen_domctl_bind_pt_irq { >>> #define XEN_DOMCTL_VMSI_X86_DELIV_MASK 0x007000 >>> #define XEN_DOMCTL_VMSI_X86_TRIG_MASK0x008000 >>> #define XEN_DOMCTL_VMSI_X86_UNMASKED 0x01 >>> +#define XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK 0xfe >> >> I'm not convinced it is a good idea to limit the overall destination >> ID width to 15 bits here - at the interface level we could as well >> permit more bits right away; the implementation would reject too high >> a value, of course. Not only with this I further wonder whether the >> field shouldn't be unsplit while extending it. You won't get away >> without bumping XEN_DOMCTL_INTERFACE_VERSION anyway (unless it was >> bumped already for 4.17) since afaics the unused bits of this field >> previously weren't checked for being zero. 
We could easily have 8 >> bits vector, 16 bits flags, and 32 bits destination ID in the struct. >> And there would then still be 8 unused bits (which from now on we >> ought to check for being zero). > > So I've made gflags a 64bit field, used the high 32bits for the > destination ID, and the low ones for flags. I've left gvec as a > separate field in the struct, as I don't want to require a > modification to QEMU, just a rebuild against the updated headers will > be enough. Hmm, wait - if qemu uses this without going through a suitable abstraction in at least libxc, then we cannot _ever_ change this interface: If a rebuild was required, old qemu binaries would stop working with newer Xen. If such a dependency indeed exists, we'll need a prominent warning comment in the public header. Jan > I've been wondering about this interface though > (xen_domctl_bind_pt_irq), and it would seem better to just pass the > raw MSI address and data fields from the guest and let Xen do the > decoding. This however is not trivial to do as we would need to keep > the previous interface anyway as it's used by QEMU. Maybe we could > have some kind of union between a pair of address and data fields and > a gflags one that would match the native layout, but as said not > trivial (and would require using anonymous unions which I'm not sure > are accepted even for domctls in the public headers). > > Thanks, Roger. >
Re: [PATCH v3 2/2] x86/mm: tidy XENMEM_{get,set}_pod_target handling
On Wed, Feb 02, 2022 at 04:29:37PM +0100, Jan Beulich wrote: > On 02.02.2022 16:14, Roger Pau Monné wrote: > > On Tue, Jan 04, 2022 at 10:41:53AM +0100, Jan Beulich wrote: > >> Do away with the "pod_target_out_unlock" label. In particular by folding > >> if()-s, the logic can be expressed with less code (and no goto-s) this > >> way. > >> > >> Limit scope of "p2m", constifying it at the same time. > > > > Is this stale? I cannot find any reference to a p2m variable in the > > chunks below. > > Indeed it is, leftover from rebasing over the introduction of > p2m_pod_get_mem_target() in what is now patch 1. Dropped. I'm happy with this change with the commit adjusted: Acked-by: Roger Pau Monné Not sure if you can commit this now regardless of patch 1? Thanks, Roger.
Re: [PATCH] x86/Xen: streamline (and fix) PV CPU enumeration
On 01.02.22 11:57, Jan Beulich wrote: This started out with me noticing that "dom0_max_vcpus=" with a value larger than the number of physical CPUs reported through ACPI tables would not bring up the "excess" vCPU-s. Addressing this is the primary purpose of the change; CPU maps handling is being tidied only as far as is necessary for the change here (with the effect of also avoiding the setting up of too much per-CPU infrastructure, i.e. for CPUs which can never come online). Noticing that xen_fill_possible_map() is called way too early, whereas xen_filter_cpu_maps() is called too late (after per-CPU areas were already set up), and further observing that each of the functions serves only one of Dom0 or DomU, it looked like it was better to simplify this. Use the .get_smp_config hook instead, uniformly for Dom0 and DomU. xen_fill_possible_map() can be dropped altogether, while xen_filter_cpu_maps() is re-purposed but not otherwise changed. Signed-off-by: Jan Beulich Pushed to xen/tip.git for-linus-5.17a Juergen
Re: x86: insn-eval.c's use of native_store_gdt()
On 30.11.2021 12:09, Jan Beulich wrote: > I think it is b968e84b509d ("x86/iopl: Fake iopl(3) CLI/STI usage") > which uncovered an issue with get_desc() trying to access the GDT, as > introduced by 670f928ba09b ("x86/insn-eval: Add utility function to > get segment descriptor"). When running in a PV domain under Xen, the > (hypervisor's) GDT isn't accessible; with UMIP enabled by Xen even > SGDT wouldn't work, as the kernel runs in ring 3. > > There's currently no hypercall to retrieve a descriptor from the GDT. > It is instead assumed that kernels know where their present GDT > lives. Can the native_store_gdt() be replaced there, please? > > For context (I don't think it should matter much here) I'm observing > this with the kernel put underneath a rather old distro, where > hwclock triggers this path. I'd like to note that the issue still exists in 5.16. Jan
Re: [PATCH v2] Improve docs for IOCTL_GNTDEV_MAP_GRANT_REF
On 31.01.22 18:23, Demi Marie Obenour wrote: The current implementation of gntdev guarantees that the first call to IOCTL_GNTDEV_MAP_GRANT_REF will set @index to 0. This is required to use gntdev for Wayland, which is a future desire of Qubes OS. Additionally, requesting zero grants results in an error, but this was not documented either. Document both of these. Signed-off-by: Demi Marie Obenour Pushed to xen/tip.git for-linus-5.17a Juergen
Re: [PATCH v2] xen: update missing ioctl magic numbers documentation
On 31.01.22 17:19, Randy Dunlap wrote: Add missing ioctl "magic numbers" for various Xen interfaces (xenbus_dev.h, gntalloc.h, gntdev.h, and privcmd.h). Signed-off-by: Randy Dunlap Pushed to xen/tip.git for-linus-5.17a Juergen
Re: [PATCH] xen: xenbus_dev.h: delete incorrect file name
On 30.01.22 20:17, Randy Dunlap wrote: It is better/preferred not to include file names in source files because (a) they are not needed and (b) they can be incorrect, so just delete this incorrect file name. Signed-off-by: Randy Dunlap Pushed to xen/tip.git for-linus-5.17a Juergen
Re: [PATCH 2/3] x86/vmsi: add support for extended destination ID in address field
On Mon, Jan 24, 2022 at 02:47:58PM +0100, Jan Beulich wrote: > On 20.01.2022 16:23, Roger Pau Monne wrote: > > --- a/xen/arch/x86/include/asm/msi.h > > +++ b/xen/arch/x86/include/asm/msi.h > > @@ -54,6 +54,7 @@ > > #define MSI_ADDR_DEST_ID_SHIFT 12 > > #define MSI_ADDR_DEST_ID_MASK 0x00ff000 > > #define MSI_ADDR_DEST_ID(dest)(((dest) << > > MSI_ADDR_DEST_ID_SHIFT) & MSI_ADDR_DEST_ID_MASK) > > +#define MSI_ADDR_EXT_DEST_ID_MASK 0xfe0 > > Especially the immediately preceding macro now becomes kind of stale. Hm, I'm not so sure about that. We could expand the macro to place the high bits in dest at bits 11:4 of the resulting address. However that macro (MSI_ADDR_DEST_ID) is only used by Xen to compose its own messages, so until we add support for the hypervisor itself to use the extended destination ID mode there's not much point in modifying the macro IMO. > > > --- a/xen/drivers/passthrough/x86/hvm.c > > +++ b/xen/drivers/passthrough/x86/hvm.c > > @@ -269,7 +269,7 @@ int pt_irq_create_bind( > > { > > case PT_IRQ_TYPE_MSI: > > { > > -uint8_t dest, delivery_mode; > > +unsigned int dest, delivery_mode; > > bool dest_mode; > > If you touch delivery_mode's type, wouldn't that better become bool? > > > --- a/xen/include/public/domctl.h > > +++ b/xen/include/public/domctl.h > > @@ -588,6 +588,7 @@ struct xen_domctl_bind_pt_irq { > > #define XEN_DOMCTL_VMSI_X86_DELIV_MASK 0x007000 > > #define XEN_DOMCTL_VMSI_X86_TRIG_MASK0x008000 > > #define XEN_DOMCTL_VMSI_X86_UNMASKED 0x01 > > +#define XEN_DOMCTL_VMSI_X86_EXT_DEST_ID_MASK 0xfe > > I'm not convinced it is a good idea to limit the overall destination > ID width to 15 bits here - at the interface level we could as well > permit more bits right away; the implementation would reject too high > a value, of course. Not only with this I further wonder whether the > field shouldn't be unsplit while extending it. 
You won't get away > without bumping XEN_DOMCTL_INTERFACE_VERSION anyway (unless it was > bumped already for 4.17) since afaics the unused bits of this field > previously weren't checked for being zero. We could easily have 8 > bits vector, 16 bits flags, and 32 bits destination ID in the struct. > And there would then still be 8 unused bits (which from now on we > ought to check for being zero). So I've made gflags a 64bit field, used the high 32bits for the destination ID, and the low ones for flags. I've left gvec as a separate field in the struct, as I don't want to require a modification to QEMU, just a rebuild against the updated headers will be enough. I've been wondering about this interface though (xen_domctl_bind_pt_irq), and it would seem better to just pass the raw MSI address and data fields from the guest and let Xen do the decoding. This however is not trivial to do as we would need to keep the previous interface anyway as it's used by QEMU. Maybe we could have some kind of union between a pair of address and data fields and a gflags one that would match the native layout, but as said not trivial (and would require using anonymous unions which I'm not sure are accepted even for domctls in the public headers). Thanks, Roger.
Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci
On 04.02.2022 09:58, Oleksandr Andrushchenko wrote: > On 04.02.22 09:52, Jan Beulich wrote: >> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote: >>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, >>> uint16_t cmd, bool rom_only) >>> continue; >>> } >>> >>> +spin_lock(&tmp->vpci_lock); >>> +if ( !tmp->vpci ) >>> +{ >>> +spin_unlock(&tmp->vpci_lock); >>> +continue; >>> +} >>> for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ ) >>> { >>> const struct vpci_bar *bar = &tmp->vpci->header.bars[i]; >>> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, >>> uint16_t cmd, bool rom_only) >>> rc = rangeset_remove_range(mem, start, end); >>> if ( rc ) >>> { >>> +spin_unlock(&tmp->vpci_lock); >>> printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: >>> %d\n", >>> start, end, rc); >>> rangeset_destroy(mem); >>> return rc; >>> } >>> } >>> +spin_unlock(&tmp->vpci_lock); >>> } >> At the first glance this simply looks like another unjustified (in the >> description) change, as you're not converting anything here but you >> actually add locking (and I realize this was there before, so I'm sorry >> for not pointing this out earlier). > Well, I thought that the description already has "...the lock can be > used (and in a few cases is used right away) to check whether vpci > is present" and this is enough for such uses as here. >> But then I wonder whether you >> actually tested this, since I can't help getting the impression that >> you're introducing a live-lock: The function is called from cmd_write() >> and rom_write(), which in turn are called out of vpci_write(). Yet that >> function already holds the lock, and the lock is not (currently) >> recursive. (For the 3rd caller of the function - init_bars() - otoh >> the locking looks to be entirely unnecessary.) > Well, you are correct: if tmp != pdev then it is correct to acquire > the lock. But if tmp == pdev and rom_only == true > then we'll deadlock.
> > It seems we need to have the locking conditional, e.g. only lock > if tmp != pdev Which will address the live-lock, but introduce ABBA deadlock potential between the two locks. >>> @@ -222,10 +239,10 @@ static int msix_read(struct vcpu *v, unsigned long >>> addr, unsigned int len, >>> break; >>> } >>> >>> +msix_put(msix); >>> return X86EMUL_OKAY; >>> } >>> >>> -spin_lock(&msix->pdev->vpci->lock); >>> entry = get_entry(msix, addr); >>> offset = addr & (PCI_MSIX_ENTRY_SIZE - 1); >> You're increasing the locked region quite a bit here. If this is really >> needed, it wants explaining. And if this is deemed acceptable as a >> "side effect", it wants justifying or at least stating imo. Same for >> msix_write() then, obviously. > Yes, I do increase the locking region here, but the msix variable needs > to be protected all the time, so it seems to be obvious that it remains > under the lock What does the msix variable have to do with the vPCI lock? If you see a need to grow the locked region here, then surely this is independent of your conversion of the lock, and hence wants to be a prereq fix (which may in fact want/need backporting). >>> @@ -327,7 +334,12 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, >>> unsigned int size) >>> if ( !pdev ) >>> return vpci_read_hw(sbdf, reg, size); >>> >>> -spin_lock(&pdev->vpci->lock); >>> +spin_lock(&pdev->vpci_lock); >>> +if ( !pdev->vpci ) >>> +{ >>> +spin_unlock(&pdev->vpci_lock); >>> +return vpci_read_hw(sbdf, reg, size); >>> +} >> Didn't you say you would add justification of this part of the change >> (and its vpci_write() counterpart) to the description? > Again, I am referring to the commit message as described above No, sorry - that part applies only to what's inside the parentheses of if(). But on the intermediate version (post-v5 in a 4-patch series) I did say: "In this case as well as in its write counterpart it becomes even more important to justify (in the description) the new behavior.
It is not obvious at all that the absence of a struct vpci should be taken as an indication that the underlying device needs accessing instead. This also cannot be inferred from the "!pdev" case visible in context. In that case we have no record of a device at this SBDF, and hence the fallback pretty clearly is a "just in case" one. Yet if we know of a device, the absence of a struct vpci may mean various possible things." If it wasn't obvious: The comment was on the use of vpci_read_hw() on this path, not redundant with the earlier one regarding the added "is vpci non-NULL" in a few places. Jan
[qemu-mainline test] 168000: tolerable FAIL - PUSHED
flight 168000 qemu-mainline real [real]
flight 168007 qemu-mainline real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/168000/
http://logs.test-lab.xenproject.org/osstest/logs/168007/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 12 debian-hvm-install fail pass in 168007-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 167991
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 167991
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 167991
 test-armhf-armhf-xl-rtds 18 guest-start/debian.repeat fail like 167991
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 167991
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 167991
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 167991
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 167991
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 167991
 test-arm64-arm64-xl-seattle 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-seattle 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass

version targeted for testing:
 qemuu 8f3e5ce773c62bb5c4a847f3a9a5c98bbb3b359f
baseline version:
 qemuu
Re: [PATCH v6 01/13] xen/pci: arm: add stub for is_memory_hole
Hi, Julien! On 04.02.22 10:51, Julien Grall wrote: > Hi, > > On 04/02/2022 06:34, Oleksandr Andrushchenko wrote: >> From: Oleksandr Andrushchenko >> >> Add a stub for is_memory_hole which is required for PCI passthrough >> on Arm. >> >> Signed-off-by: Oleksandr Andrushchenko >> >> --- >> Cc: Julien Grall >> Cc: Stefano Stabellini >> --- >> New in v6 >> --- >> xen/arch/arm/mm.c | 6 ++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c >> index b1eae767c27c..c32e34a182a2 100644 >> --- a/xen/arch/arm/mm.c >> +++ b/xen/arch/arm/mm.c >> @@ -1640,6 +1640,12 @@ unsigned long get_upper_mfn_bound(void) >> return max_page - 1; >> } >> +bool is_memory_hole(mfn_t start, mfn_t end) >> +{ >> + /* TODO: this needs to be properly implemented. */ > > I was hoping to see a summary of the discussion from IRC somewhere in the > patch (maybe after ---). This would help to bring up to speed the others that > were not on IRC. I am not quite sure what needs to be put here as the summary Could you please help me with the exact message you would like to see? > >> + return true; >> +} >> + >> /* >> * Local variables: >> * mode: C > Thank you, Oleksandr
Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci
Hi, Jan! On 04.02.22 09:52, Jan Beulich wrote: > On 04.02.2022 07:34, Oleksandr Andrushchenko wrote: >> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, >> uint16_t cmd, bool rom_only) >> continue; >> } >> >> +spin_lock(&tmp->vpci_lock); >> +if ( !tmp->vpci ) >> +{ >> +spin_unlock(&tmp->vpci_lock); >> +continue; >> +} >> for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ ) >> { >> const struct vpci_bar *bar = &tmp->vpci->header.bars[i]; >> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, >> uint16_t cmd, bool rom_only) >> rc = rangeset_remove_range(mem, start, end); >> if ( rc ) >> { >> +spin_unlock(&tmp->vpci_lock); >> printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: >> %d\n", >> start, end, rc); >> rangeset_destroy(mem); >> return rc; >> } >> } >> +spin_unlock(&tmp->vpci_lock); >> } > At the first glance this simply looks like another unjustified (in the > description) change, as you're not converting anything here but you > actually add locking (and I realize this was there before, so I'm sorry > for not pointing this out earlier). Well, I thought that the description already has "...the lock can be used (and in a few cases is used right away) to check whether vpci is present" and this is enough for such uses as here. > But then I wonder whether you > actually tested this, since I can't help getting the impression that > you're introducing a live-lock: The function is called from cmd_write() > and rom_write(), which in turn are called out of vpci_write(). Yet that > function already holds the lock, and the lock is not (currently) > recursive. (For the 3rd caller of the function - init_bars() - otoh > the locking looks to be entirely unnecessary.) Well, you are correct: if tmp != pdev then it is correct to acquire the lock. But if tmp == pdev and rom_only == true then we'll deadlock. It seems we need to have the locking conditional, e.g.
only lock if tmp != pdev > > Then again this was present already even in Roger's original patch, so > I guess I must be missing something ... > >> --- a/xen/drivers/vpci/msix.c >> +++ b/xen/drivers/vpci/msix.c >> @@ -138,7 +138,7 @@ static void control_write(const struct pci_dev *pdev, >> unsigned int reg, >> pci_conf_write16(pdev->sbdf, reg, val); >> } >> >> -static struct vpci_msix *msix_find(const struct domain *d, unsigned long >> addr) >> +static struct vpci_msix *msix_get(const struct domain *d, unsigned long >> addr) >> { >> struct vpci_msix *msix; >> >> @@ -150,15 +150,29 @@ static struct vpci_msix *msix_find(const struct domain >> *d, unsigned long addr) >> for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ ) >> if ( bars[msix->tables[i] & PCI_MSIX_BIRMASK].enabled && >> VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, i) ) >> +{ >> +spin_lock(&msix->pdev->vpci_lock); >> return msix; >> +} > I think deliberately returning with a lock held requires a respective > comment ahead of the function. Ok, will add a comment > >> } >> >> return NULL; >> } >> >> +static void msix_put(struct vpci_msix *msix) >> +{ >> +if ( !msix ) >> +return; >> + >> +spin_unlock(&msix->pdev->vpci_lock); >> +} > Maybe shorter > > if ( msix ) > spin_unlock(&msix->pdev->vpci_lock); Looks good > > ? Yet there's only one case where you may pass NULL in here, so > maybe it's better anyway to move the conditional ... > >> static int msix_accept(struct vcpu *v, unsigned long addr) >> { >> -return !!msix_find(v->domain, addr); >> +struct vpci_msix *msix = msix_get(v->domain, addr); >> + >> +msix_put(msix); >> +return !!msix; >> } > ... here? Yes, I can have that check here, but what if there is yet another caller of the same? I am not sure whether it is better to have the check in msix_get or at the caller site. At the moment (with a single place with NULL possible) I can move the check. @Roger?
> >> @@ -186,7 +200,7 @@ static int msix_read(struct vcpu *v, unsigned long addr, >> unsigned int len, >>unsigned long *data) >> { >> const struct domain *d = v->domain; >> -struct vpci_msix *msix = msix_find(d, addr); >> +struct vpci_msix *msix = msix_get(d, addr); >> const struct vpci_msix_entry *entry; >> unsigned int offset; >> >> @@ -196,7 +210,10 @@ static int msix_read(struct vcpu *v, unsigned long >> addr, unsigned int len, >> return X86EMUL_RETRY; >> >> if ( !access_allowed(msix->pdev, addr, len) ) >> +{ >> +msix_put(msix); >> return X86EMUL_OKAY; >> +} >> >> if (
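[Editorial note: the msix_get()/msix_put() convention discussed above — an accessor that returns with the device's lock held, paired with a put helper that tolerates NULL — can be illustrated outside Xen. The sketch below is a hypothetical stand-in using pthread mutexes, not the actual vPCI code; `struct pci_dev`, `struct vpci_msix`, and the `in_range` flag are simplified assumptions.]

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the Xen structures under discussion. */
struct pci_dev {
    pthread_mutex_t vpci_lock;
};

struct vpci_msix {
    struct pci_dev *pdev;
    int in_range;   /* stands in for the address-range match */
};

/*
 * Returns the MSI-X struct with its device's vpci_lock held, or NULL.
 * As Jan notes, deliberately returning with a lock held deserves this
 * comment: a successful return transfers lock ownership to the caller,
 * who must balance it with msix_put().
 */
static struct vpci_msix *msix_get(struct vpci_msix *msix)
{
    if ( msix && msix->in_range )
    {
        pthread_mutex_lock(&msix->pdev->vpci_lock);
        return msix;
    }

    return NULL;
}

/* Jan's shorter form: a NULL argument (lookup miss) is a no-op. */
static void msix_put(struct vpci_msix *msix)
{
    if ( msix )
        pthread_mutex_unlock(&msix->pdev->vpci_lock);
}
```

A caller such as msix_accept() then pairs the two unconditionally, which is why msix_put() has to accept NULL (or the check moves to the caller, as discussed).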
Re: [PATCH v6 01/13] xen/pci: arm: add stub for is_memory_hole
Hi,

On 04/02/2022 06:34, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko
>
> Add a stub for is_memory_hole which is required for PCI passthrough
> on Arm.
>
> Signed-off-by: Oleksandr Andrushchenko
> ---
> Cc: Julien Grall
> Cc: Stefano Stabellini
> ---
> New in v6
> ---
>  xen/arch/arm/mm.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index b1eae767c27c..c32e34a182a2 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -1640,6 +1640,12 @@ unsigned long get_upper_mfn_bound(void)
>      return max_page - 1;
>  }
>
> +bool is_memory_hole(mfn_t start, mfn_t end)
> +{
> +    /* TODO: this needs to be properly implemented. */

I was hoping to see a summary of the discussion from IRC somewhere in
the patch (maybe after ---). This would help to bring up to speed the
others that were not on IRC.

> +    return true;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C

--
Julien Grall
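[Editorial note: the TODO in the stub asks for a real check of whether an MFN range falls outside RAM. The sketch below only illustrates the kind of overlap test that implies, against a hypothetical bank table (on Arm the real data would come from the boot-time memory map); it is not the actual Xen implementation, and `mfn_t`/`struct membank` here are simplified stand-ins.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t mfn_t;   /* simplified stand-in for Xen's typed MFN */

/* Hypothetical RAM-bank table; ranges are [start, end) in frames. */
struct membank { mfn_t start, end; };

static const struct membank banks[] = {
    { 0x40000, 0x80000 },
    { 0x90000, 0xa0000 },
};

/*
 * A range is a "memory hole" if it overlaps none of the RAM banks.
 * The stub in the patch unconditionally returns true; a proper
 * implementation would perform a check along these lines.
 */
static bool is_memory_hole(mfn_t start, mfn_t end)
{
    for ( size_t i = 0; i < sizeof(banks) / sizeof(banks[0]); i++ )
        if ( start < banks[i].end && end > banks[i].start )
            return false;

    return true;
}
```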
Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci
On 04.02.2022 09:13, Oleksandr Andrushchenko wrote:
> On 04.02.22 09:52, Jan Beulich wrote:
>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>>
>> At the first glance this simply looks like another unjustified (in the
>> description) change, as you're not converting anything here but you
>> actually add locking (and I realize this was there before, so I'm sorry
>> for not pointing this out earlier). But then I wonder whether you
>> actually tested this
> This is already stated in the cover letter that I have tested two x86
> configurations and tested that on Arm...

Okay, I'm sorry then. But could you then please point out where I'm
wrong with my analysis?

Jan
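[Editorial note: the deadlock under analysis — modify_bars() re-acquiring a non-recursive lock its caller vpci_write() already holds — and the conditional-locking workaround proposed earlier ("only lock if tmp != pdev") can be reduced to a small stand-alone illustration. The sketch below uses pthread mutexes as stand-ins for Xen spinlocks; `visit_device` is a hypothetical name, not a Xen function.]

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in: each device carries its own non-recursive lock. */
struct pci_dev {
    pthread_mutex_t vpci_lock;
};

/*
 * The caller (think vpci_write()) already holds pdev->vpci_lock, so a
 * loop body visiting every device may only take the lock of *other*
 * devices; taking its own again would deadlock, since the lock is not
 * recursive.
 */
static void visit_device(struct pci_dev *pdev, struct pci_dev *tmp)
{
    if ( tmp != pdev )
        pthread_mutex_lock(&tmp->vpci_lock);

    /* ... walk tmp's BARs here ... */

    if ( tmp != pdev )
        pthread_mutex_unlock(&tmp->vpci_lock);
}
```

Without the `tmp != pdev` guard, the `tmp == pdev` iteration (the rom_only case in the thread) blocks forever on the lock the caller already owns.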
Re: [PATCH] tools/guest: Fix comment regarding CPUID compatibility
On 03.02.2022 19:10, Andrew Cooper wrote:
> It was Xen 4.14 where CPUID data was added to the migration stream, and 4.13
> that we need to worry about with regards to compatibility. Xen 4.12 isn't
> relevant.
>
> Expand and correct the commentary.
>
> Fixes: 111c8c33a8a1 ("x86/cpuid: do not expand max leaves on restore")

But doesn't this commit amend 685e922d6f30 ("tools/libxc: Rework
xc_cpuid_apply_policy() to use {get,set}_cpu_policy()"), which is where
DEF_MAX_* disappeared? That was a 4.13 change, and iirc that was also why
the commit above moved the 4.13/4.14 boundary related comment from outside
of the if() to inside it.

While looking at this, wasn't Roger's change incomplete, in that for Intel
the extended leaf upper bound was 0x80000008 in 4.12?

Jan
Re: [PATCH v6 12/13] xen/arm: translate virtual PCI bus topology for guests
Hi, Jan!

On 04.02.22 09:56, Jan Beulich wrote:
> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -168,6 +168,35 @@ static void vpci_remove_virtual_device(struct domain *d,
>>      pdev->vpci->guest_sbdf.sbdf = ~0;
>>  }
>>
>> +/*
>> + * Find the physical device which is mapped to the virtual device
>> + * and translate virtual SBDF to the physical one.
>> + */
>> +bool vpci_translate_virtual_device(const struct domain *d, pci_sbdf_t *sbdf)
>> +{
>> +    struct pci_dev *pdev;
>> +
>> +    ASSERT(!is_hardware_domain(d));
> In addition to this, don't you also need to assert that pcidevs_lock is
> held (or if it isn't, you'd need to acquire it) for ...
>
>> +    for_each_pdev ( d, pdev )
> ... this to be race-free?
Yes, you are right and this needs pcidevs_lock(); Will add
>
> Jan
>
Thank you,
Oleksandr
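[Editorial note: the translation being reviewed — scan the domain's device list for a matching guest SBDF and rewrite it in place with the physical one — can be sketched in miniature. The code below is a hypothetical illustration, not the vPCI implementation: `sbdf_t` is a flat stand-in for Xen's `pci_sbdf_t`, the array replaces the per-domain list, and the real walk must additionally run under pcidevs_lock, as the review points out.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat segment/bus/device/function identifier. */
typedef uint32_t sbdf_t;

struct pci_dev {
    sbdf_t phys_sbdf;    /* host view */
    sbdf_t guest_sbdf;   /* virtual view exposed to the guest */
};

/*
 * Find the physical device mapped to the virtual one and translate the
 * virtual SBDF to the physical SBDF in place. Returns false when the
 * guest addressed a device it was never assigned.
 */
static bool translate_virtual_device(const struct pci_dev *devs, size_t n,
                                     sbdf_t *sbdf)
{
    for ( size_t i = 0; i < n; i++ )
        if ( devs[i].guest_sbdf == *sbdf )
        {
            *sbdf = devs[i].phys_sbdf;
            return true;
        }

    return false;
}
```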
Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci
Hi, Jan!

On 04.02.22 09:52, Jan Beulich wrote:
> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>
> At the first glance this simply looks like another unjustified (in the
> description) change, as you're not converting anything here but you
> actually add locking (and I realize this was there before, so I'm sorry
> for not pointing this out earlier). But then I wonder whether you
> actually tested this
This is already stated in the cover letter that I have tested two x86
configurations and tested that on Arm...

Would you like to see the relevant logs?

Thank you,
Oleksandr