[ovmf test] 185941: all pass - PUSHED
flight 185941 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/185941/

Perfect :-)
All tests in this flight passed as required

version targeted for testing:
 ovmf                 2727231b0a6fb4c043479d132df4d36cf9f751c2
baseline version:
 ovmf                 987bea6525d70cd01649472c93d19f89d41d83a2

Last test of basis   185937  2024-05-07 07:43:05 Z    0 days
Testing same since   185941  2024-05-08 01:57:46 Z    0 days    1 attempts

People who touched revisions under test:
  Gerd Hoffmann
  Jiaxin Wu
  Jiewen Yao
  Ray Ni

jobs:
 build-amd64-xsm                                              pass
 build-i386-xsm                                               pass
 build-amd64                                                  pass
 build-i386                                                   pass
 build-amd64-libvirt                                          pass
 build-i386-libvirt                                           pass
 build-amd64-pvops                                            pass
 build-i386-pvops                                             pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64                         pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   987bea6525..2727231b0a  2727231b0a6fb4c043479d132df4d36cf9f751c2 -> xen-tested-master
[xen-unstable test] 185940: tolerable FAIL - PUSHED
flight 185940 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/185940/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64         19 guest-stop                 fail like 185936
 test-amd64-amd64-xl-qemuu-ws16-amd64         19 guest-stop                 fail like 185936
 test-amd64-amd64-xl-qemuu-win7-amd64         19 guest-stop                 fail like 185936
 test-armhf-armhf-libvirt                     16 saverestore-support-check  fail like 185936
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install         fail like 185936
 test-amd64-amd64-xl-qemut-ws16-amd64         19 guest-stop                 fail like 185936
 test-amd64-amd64-qemuu-nested-amd            20 debian-hvm-install/l1/l2   fail like 185936
 test-amd64-amd64-libvirt                     15 migrate-support-check      fail never pass
 test-amd64-amd64-libvirt-xsm                 15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-thunderx                 15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-xsm                      15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-thunderx                 16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-xsm                      16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-credit1                  15 migrate-support-check      fail never pass
 test-arm64-arm64-xl                          15 migrate-support-check      fail never pass
 test-arm64-arm64-xl                          16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-credit1                  16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm                 15 migrate-support-check      fail never pass
 test-arm64-arm64-libvirt-xsm                 16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-credit2                  15 migrate-support-check      fail never pass
 test-arm64-arm64-xl-credit2                  16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-arndale                  15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-arndale                  16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt-qcow2               14 migrate-support-check      fail never pass
 test-amd64-amd64-libvirt-raw                 14 migrate-support-check      fail never pass
 test-armhf-armhf-xl-rtds                     15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-rtds                     16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl                          15 migrate-support-check      fail never pass
 test-armhf-armhf-xl                          16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt-vhd                 14 migrate-support-check      fail never pass
 test-arm64-arm64-libvirt-raw                 14 migrate-support-check      fail never pass
 test-arm64-arm64-libvirt-raw                 15 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-vhd                      14 migrate-support-check      fail never pass
 test-arm64-arm64-xl-vhd                      15 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-qcow2                    14 migrate-support-check      fail never pass
 test-armhf-armhf-xl-qcow2                    15 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt                     15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-raw                      14 migrate-support-check      fail never pass
 test-armhf-armhf-xl-raw                      15 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-multivcpu                15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-multivcpu                16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit2                  15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit2                  16 saverestore-support-check  fail never pass
 test-armhf-armhf-xl-credit1                  15 migrate-support-check      fail never pass
 test-armhf-armhf-xl-credit1                  16 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt-vhd                 14 migrate-support-check      fail never pass
 test-armhf-armhf-libvirt-vhd                 15 saverestore-support-check  fail never pass

version targeted for testing:
 xen                  a2330b51df267e20e66bbba6c5bf08f0570ed58b
baseline version:
 xen                  ebab808eb1bb8f24c7d0dd41b956e48cb1824b81

Last test of basis   185936  2024-05-07 05:53:32 Z    0 days
Testing same since   185940  2024-05-07 20:08:36 Z    0 days    1 attempts

People who touched revisions under test:
  Andrew Cooper
  Roger Pau Monné

jobs:
 build-amd64-xsm                                              pass
 build-arm64-xsm                                              pass
 build-i386-xsm                                               pass
 build-amd64-xtf                                              pass
 build-amd64                                                  pass
 build-arm64                                                  pass
 build-armhf                                                  pass
Re: [PATCH 05/15] tools/libs/light: Increase nr_spi to 160
Hi Julien,

On 5/7/2024 10:35 PM, Julien Grall wrote:

Hi,

On 06/05/2024 06:17, Henry Wang wrote:

On 5/1/2024 9:58 PM, Anthony PERARD wrote:

On Wed, Apr 24, 2024 at 11:34:39AM +0800, Henry Wang wrote:

Increase the number of SPIs to 160, i.e. gic_number_lines() for Xilinx ZynqMP - 32. This was done to allocate and assign IRQs to a running domain.

Signed-off-by: Vikram Garhwal
Signed-off-by: Stefano Stabellini
Signed-off-by: Henry Wang
---
 tools/libs/light/libxl_arm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index dd5c9f4917..50dbd0f2a9 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -181,7 +181,8 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
     LOG(DEBUG, "Configure the domain");

-    config->arch.nr_spis = nr_spis;
+    /* gic_number_lines() is 192 for Xilinx ZynqMP. min nr_spis = 192 - 32. */
+    config->arch.nr_spis = MAX(nr_spis, 160);

Is there a way that Xen or libxl could find out what the minimum number of SPIs needs to be?

I am afraid currently there is none.

Are we going to have to increase that minimum number every time a new platform comes along? It doesn't appear that libxl is using that `nr_spis` value and it is probably just given to Xen. So my guess is that Xen could simply take care of the minimum value; gic_number_lines() seems to be a Xen function.

Xen will take care of the value of nr_spis for dom0 in create_dom0():

    dom0_cfg.arch.nr_spis = min(gic_number_lines(), (unsigned int) 992) - 32;

and also for dom0less domUs in create_domUs(). However, it looks like Xen will not take care of the minimum value for libxl guests: the value from config->arch.nr_spis in the guest config file will be passed directly to the domain_vgic_init() function from arch_domain_create(). I agree with you that we shouldn't just bump the number every time we have a new platform.
Therefore, would it be a good idea to move the logic in this patch to arch_sanitise_domain_config()?

Xen domains are supposed to be platform agnostic, and therefore the number of SPIs should not be based on the HW. Furthermore, with your proposal we would end up allocating data structures for N SPIs when a domain may never need any SPIs (such as when passthrough is not in use). This is more likely for domains created by the toolstack than from Xen directly.

Agreed on both comments.

Instead, we should introduce a new xl configuration option to let the user decide the number of SPIs. I would suggest the name "nr_spis" to match the DT bindings.

Sure, I will introduce a new xl config option for this to replace this patch. Thank you for the suggestion.

Kind regards,
Henry

Cheers,
Re: [PATCH v6 8/8] xen: allow up to 16383 cpus
On Tue, 7 May 2024, Julien Grall wrote:
> Hi Stefano,
>
> On 03/05/2024 20:07, Stefano Stabellini wrote:
> > On Fri, 3 May 2024, Julien Grall wrote:
> > [...]
> > > So are you saying that from Xen's point of view, you are expecting no
> > > difference between 256 and 512? And therefore you would be happy to
> > > backport patches if someone finds differences (or even security
> > > issues) when using > 256 pCPUs?
> >
> > It is difficult to be sure about anything that is not regularly tested.
> > I am pretty sure someone in the community got Xen running on an Ampere,
> > so like you said 192 is a good number. However, that is not regularly
> > tested, so we don't have any regression checks in gitlab-ci or OSSTest
> > for it.
> >
> > One approach would be to only support things regularly tested either by
> > OSSTest, Gitlab-ci, or also Xen community members. I am not sure what
> > would be the highest number with this way of thinking but likely no
> > more than 192, probably less. I don't know the CPU core count of the
> > biggest ARM machine in OSSTest.
>
> This would be rochester* (Cavium Thunder-X). They have 96 pCPUs which,
> IIRC, are split across two numa nodes.
>
> > Another approach is to support a "sensible" number: not something tested
> > but something we believe should work. No regular testing. (In safety,
> > they only believe in things that are actually tested, so this would not
> > be OK. But this is security, not safety, just FYI.) With this approach,
> > we could round up the number to a limit we think won't break. If 192
> > works, 256/512 should work? I don't know, but I couldn't think of
> > something that would break going from 192 to 256.
>
> It depends what you mean by work/break. Strictly speaking, Xen should run
> (i.e. not crash). However, it is unclear how well, as if you increase the
> number of physical CPUs, you will increase contention and may find some
> bottleneck.
> I haven't done any performance testing with that many CPUs and I haven't
> seen any so far with Xen. But I have some areas of concern.
>
>  * Xenstored: At least the C version is single-threaded. Technically the
>    limit here is not based on the number of pCPUs, but as you increase
>    it, you indirectly increase the number of domains that can run. I
>    doubt it will behave well if you have 4096 domains running (I am
>    thinking about the x86 limit...).
>
>  * Locking
>    * How Xen uses the locks: I don't think we have many places where we
>      have global locks (one is the memory subsystem). If a lock is
>      already taken, the others will spin. It is unclear if we could see
>      high contention.
>    * How Xen implements the locks: At the moment, we are using LL/SC. My
>      take from XSA-295 is that there is a lack of fairness with them. I
>      am not sure what would happen if they get contended (as we support
>      more pCPUs). It is also probably time to finally implement LSE
>      atomics.
>
>  * TLB flush: The TLB flushes are broadcast. There are some suggestions
>    on the Linux ML [1] that they don't perform well on some processors.
>    The discussion seems to have gone nowhere in Linux. But I think it is
>    probably worth taking into account when we decide to update the limit
>    we (security) support.
>
> > It depends on how strict we want to be on testing requirements.
>
> From the above, I am rather worried about claiming that Xen supports up
> to 256 (and TBH even 192) pCPUs without any proper testing. This could
> end up backfiring, as we may need to do (in a rush) and backport some
> rather large work (unless we decide to remove support after the fact).

I agree with everything you said, and I would also add that it is not just
about backports: if we "support" something, it is supposed to mean that we
strongly believe it is working. I think we should only make that claim if
we regularly test that configuration/feature.
> I think I would prefer if we have a low number until someone can do some
> testing (including with potentially malicious guests). If we want a
> power of two, I would go with 128 because this is closer to the HW we
> have in testing. If in the future someone can show some data on other
> platforms (e.g. Ampere), then we can raise the limit.

I am OK with that. I wonder if we could use QEMU to add a test for this.

> > I am not sure what approach was taken by x86 so far.
>
> It is unclear to me. I don't see how we can claim to support up to 4096
> CPUs. But that's for the x86 folks to decide.

Until not long ago, many things were "supported" in many Open Source
projects (including Linux, QEMU, etc.) without any automated tests at all.
Maybe it is time to revisit this practice.
Re: [PATCH 02/15] xen/arm/gic: Enable interrupt assignment to running VM
Hi Henry,

On 06/05/2024 09:32, Henry Wang wrote:

On 5/1/2024 4:13 AM, Julien Grall wrote:

Hi Henry,

On 30/04/2024 04:50, Henry Wang wrote:

On 4/25/2024 10:28 PM, Julien Grall wrote:

Thanks for your feedback. After checking the b8577547236f commit message I think I now understand your point. Do you have any suggestion about how I can properly add the support to route/remove the IRQ to running domains? Thanks.

I spent some time going through the GIC/vGIC code and had some discussions with Stefano and Stewart during the last couple of days; let me see if I can describe the use case properly now to continue the discussion:

We have some use cases that require assigning devices to domains after domain boot time. For example, suppose there is an FPGA on the board which can simulate a device, and the bitstream for the FPGA is provided and programmed after domain boot. So we need a way to assign the device to the running domain. This series tries to implement this use case using device tree overlays - users can first add the overlay to the Xen dtb, assign the device in the overlay to a domain with the xl command, then apply the overlay to Linux.

Thanks for the description! This helps to understand your goal :).

Thank you very much for spending your time on discussing this and providing these valuable comments!

I haven't really looked at that code in quite a while. I think we need to make sure that the virtual and physical IRQ state matches at the time we do the routing. I am undecided on whether we want to simply prevent the action from happening or try to reset the state. There is also the question of what to do if the guest is enabling the vIRQ before it is routed.

Sorry for bothering, would you mind elaborating a bit more about the two cases that you mentioned above?
Commit b8577547236f ("xen/arm: Restrict when a physical IRQ can be routed/removed from/to a domain") only said there will be undesirable effects, so I am not sure if I understand the concerns raised above and the consequences of these two use cases.

I will try to explain them below after I answer the rest.

I am probably wrong, but I think when we add the overlay, we are probably fine as the interrupt is not being used before.

What if the DT overlay is unloaded and then reloaded? Wouldn't the same interrupt be re-used? As a more generic case, this could also be a new bitstream for the FPGA. But even if the interrupt is brand new every time for the DT overlay, you are effectively relaxing the check for every user (such as XEN_DOMCTL_bind_pt_irq). So the interrupt re-use case needs to be taken into account.

I agree. IIUC, with your explanation here and below, could we simplify the problem to how to properly handle the removal of the IRQ from a running guest, if we always properly remove and clean up the information when removing the IRQ from the guest? In this way, the IRQ can always be viewed as a brand new one when we add it back.

If we can make sure the virtual IRQ and physical IRQ are cleaned up, then yes.

Then the only corner case that we need to take care of would be...

Can you clarify whether you say the "only corner case" because you looked at the code? Or is it just because I mentioned only one?

Also, since we only load the device driver after the IRQ is routed to the guest,

This is what a well-behaved guest will do. However, we need to think about what will happen if a guest misbehaves. I am not concerned about a guest only impacting itself; I am more concerned about the case where the rest of the system is impacted.

I am not sure the guest can enable the vIRQ before it is routed.

Xen allows the guest to enable a vIRQ even if there is no pIRQ assigned.
Thankfully, it looks like vgic_connect_hw_irq(), in both the current and the new vGIC, will return an error if we are trying to route a pIRQ to an already enabled vIRQ. But we need to investigate all the possible scenarios to make sure that any inconsistencies between the physical state and virtual state (including the LRs) will not result in a bigger problem.

The one that comes to my mind is: the physical interrupt is de-assigned from the guest before it was EOIed. In this case, the interrupt will still be in the LR with the HW bit set. This would allow the guest to EOI the interrupt even if it is routed to someone else. It is unclear what the impact on the other guest would be.

...same as this case, i.e.

    test_bit(_IRQ_INPROGRESS, &desc->status) ||
    !test_bit(_IRQ_DISABLED, &desc->status)

when we try to remove the IRQ from a running domain.

We already call ->shutdown() which will disable the IRQ. So don't we only need to take care of _IRQ_INPROGRESS?

[...]

we have 3 possible states which can be read from the LR for this case: active, pending, pending and active.

- I don't think we can do anything about the active state, so we should return -EBUSY and reject the whole operation of removing the IRQ from running
[xen-unstable-smoke test] 185939: tolerable all pass - PUSHED
flight 185939 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/185939/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt             15 migrate-support-check        fail never pass
 test-arm64-arm64-xl-xsm              15 migrate-support-check        fail never pass
 test-arm64-arm64-xl-xsm              16 saverestore-support-check    fail never pass
 test-armhf-armhf-xl                  15 migrate-support-check        fail never pass
 test-armhf-armhf-xl                  16 saverestore-support-check    fail never pass

version targeted for testing:
 xen                  a2330b51df267e20e66bbba6c5bf08f0570ed58b
baseline version:
 xen                  ebab808eb1bb8f24c7d0dd41b956e48cb1824b81

Last test of basis   185926  2024-05-06 13:02:07 Z    1 days
Testing same since   185939  2024-05-07 17:03:58 Z    0 days    1 attempts

People who touched revisions under test:
  Andrew Cooper
  Roger Pau Monné

jobs:
 build-arm64-xsm                                              pass
 build-amd64                                                  pass
 build-armhf                                                  pass
 build-amd64-libvirt                                          pass
 test-armhf-armhf-xl                                          pass
 test-arm64-arm64-xl-xsm                                      pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64                    pass
 test-amd64-amd64-libvirt                                     pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   ebab808eb1..a2330b51df  a2330b51df267e20e66bbba6c5bf08f0570ed58b -> smoke
Re: [PATCH v3 6/9] xen/arm64: bpi: Add missing code symbol annotations
On 07/05/2024 17:55, Edgar E. Iglesias wrote:

On Tue, May 7, 2024 at 11:57 AM Julien Grall wrote:

Hi Julien,

Hi Edgar,

The reason I chose FUNC for the start of the symbol is because these symbols contain executable code (not only a table of pointers to code somewhere else) and the ELF spec says that STT_FUNC means the symbol contains functions or other executable code (not only callable functions IIUC): "STT_FUNC The symbol is associated with a function or other executable code." https://refspecs.linuxbase.org/elf/elf.pdf (Symbol Table 1-20).

Thanks for the pointer. I originally did intend to suggest the change, but then I saw the use of LABEL in x86 (such as svm_stgi_label). There are a few other examples with LABEL_LOCAL. AFAICT, this is also executable code, with the only difference being that it is not meant to be called by someone else. Furthermore, LABEL is using DO_CODE_ALIGN(...) for the alignment, which implies that it is intended to be used by executable code. So I thought the only difference was whether the label was intended to be used as a function.

I think using LABEL instead of GLOBAL for the _end labels of these code sequences makes sense. I'm happy to change the _start labels to LABEL too if you guys feel that's better.

I have to admit I am a little confused by the difference between LABEL vs FUNC. I think I will need some guidance from Jan (he introduced linkage.h).

Cheers,

-- Julien Grall
Re: [PATCH v4 15/17] xen: mapcache: Remove assumption of RAMBlock with 0 offset
On Thu, May 2, 2024 at 10:02 PM Stefano Stabellini wrote: > > On Thu, 2 May 2024, Edgar E. Iglesias wrote: > > On Thu, May 2, 2024 at 8:53 PM Stefano Stabellini > > wrote: > > > > > > +Xenia > > > > > > On Thu, 2 May 2024, Edgar E. Iglesias wrote: > > > > On Wed, May 1, 2024 at 11:24 PM Stefano Stabellini > > > > wrote: > > > > > > > > > > On Tue, 30 Apr 2024, Edgar E. Iglesias wrote: > > > > > > From: "Edgar E. Iglesias" > > > > > > > > > > > > The current mapcache assumes that all memory is mapped > > > > > > in a single RAM MR (the first one with offset 0). Remove > > > > > > this assumption and propagate the offset to the mapcache > > > > > > so it can do reverse mappings (from hostptr -> ram_addr). > > > > > > > > > > > > This is in preparation for adding grant mappings. > > > > > > > > > > > > Signed-off-by: Edgar E. Iglesias > > > > > > > > > > > > > > > Looking at xen_remap_bucket, it is only using address_index (without > > > > > adding ram_offset) to map foreign memory. From xen_remap_bucket, I > > > > > would > > > > > understand that address_index already includes the ram_offset. > > > > > > > > > > Meaning that if we want to map foreign mapping at address 0x5000, then > > > > > address_index would be 0x5000, even if ram_offset is 0x1000. > > > > > > > > > > But then looking xen_ram_addr_from_mapcache_single ram_offset is added > > > > > to paddr_index to calculate the physical address. So in that case we > > > > > would want address_index to be 0x4000 and ram_offset to be 0x1000. But > > > > > xen_remap_bucket would have to sum address_index and ram_offset to map > > > > > foreign memory. > > > > > > > > > > So I am a bit confused, did I get it wrong? One more comment below. > > > > > > > > > > > > > Thanks Stefano, > > > > > > > > I think the confusion is that this ram_addr_offset is not related to > > > > guest address-space. > > > > It's a QEMU internal thing and it shouldn't be included in the address > > > > used to map foreign memory. 
> > > > The mapcache can treat this ram_addr offset like a cookie that we
> > > > keep around to be able to do reverse mappings from host pointers
> > > > into ram_addr space (xen_ram_addr_from_mapcache).
> > > >
> > > > The current mapcache implementation works because we've really only
> > > > been using foreign mappings on RAMBlocks with offset 0. We're also
> > > > creating RAMs such that the offset into the RAM is also the guest
> > > > physical address; for x86 this is natural since RAM starts at zero
> > > > (for lowmem) but for ARM we're creating larger than needed RAMs
> > > > (GUEST_RAM0_BASE + ram-size) to make this assumption true. Anyway,
> > > > in this series I'm not addressing this second assumption.
> > >
> > > Let's see if I understand correctly.
> > >
> > > The ram_addr space is an internal QEMU address space which is
> > > different from the guest physical address space and thus cannot and
> > > should not be used to do foreign mappings (foreign mapping hypercalls
> > > take a guest physical or a real physical address to map). Is that
> > > correct?
> > >
> > > If so, then I understand.
> >
> > Yes, that matches my understanding.
> >
> > There's a second call in physmem.c to xen_map_cache using
> > block->offset as an address. I was considering removing that second
> > call since I can't see how it can work (except perhaps in some
> > specific use-case by luck?). Anyway, for now I've left it unmodified.
>
> Yes, that code was written with the assumption that block->offset is an
> offset in the guest physical address space and could be used as a guest
> physical address. Actually, you might have spotted a real bug.
>
> The intent was for smaller regions (not the big RAM region, things like
> a ROM region for instance) we could map them in full.
> So here we were trying to map the whole thing from start to finish using
> block->offset as start.

> > > > > > ---
> > > > > >  hw/xen/xen-mapcache.c         | 25 ++---
> > > > > >  include/sysemu/xen-mapcache.h |  2 ++
> > > > > >  system/physmem.c              |  8
> > > > > >  3 files changed, 24 insertions(+), 11 deletions(-)
> > > > > >
> > > > > > diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
> > > > > > index 09b5f36d9c..1b32d0c003 100644
> > > > > > --- a/hw/xen/xen-mapcache.c
> > > > > > +++ b/hw/xen/xen-mapcache.c
> > > > > > @@ -43,6 +43,9 @@ typedef struct MapCacheEntry {
> > > > > >  #define XEN_MAPCACHE_ENTRY_DUMMY (1 << 0)
> > > > > >      uint8_t flags;
> > > > > >      hwaddr size;
> > > > > > +
> > > > > > +    /* Keep ram_addr offset for reverse mappings (hostptr -> ram_addr). */
> > > > > > +    ram_addr_t ram_offset;
> > > > > >      struct MapCacheEntry *next;
> > > > > >  } MapCacheEntry;
> > > > > >
> > > > > > @@ -165,7 +168,8 @@ static void xen_remap_bucket(MapCache *mc,
Xen Security Advisory 457 v1 - Linux/xen-netback: Memory leak due to missing cleanup function
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

            Xen Security Advisory XSA-457

  Linux/xen-netback: Memory leak due to missing cleanup function

ISSUE DESCRIPTION
=================

In netback, xennet_alloc_one_rx_buffer() failed to call the appropriate
clean-up function, resulting in a memory leak.

IMPACT
======

A malicious guest userspace process can exhaust memory resources within
the guest kernel, potentially leading to a system crash (Denial of
Service).

It is not known whether it can be triggered remotely.

VULNERABLE SYSTEMS
==================

Systems with guests running Linux 5.9 and later with Xen PV network
devices are affected.

MITIGATION
==========

For HVM guests, using emulated network devices will avoid this issue.

RESOLUTION
==========

The following patch in Linux resolves the issue:

https://git.kernel.org/torvalds/c/037965402a010898d34f4e35327d22c0a95cd51f

A copy of which is attached.

xsa457.patch           Linux 5.9

$ sha256sum xsa457*
9d6ae3da27f1ff92f9f45c800822beecda603d6dea6726207cee6c768416114c  xsa457.patch
$

NOTE ON THE LACK OF EMBARGO
===========================

The issue was reported initially on a public bug tracker and fixed in
public before it was realized that there was a security aspect.
-----BEGIN PGP SIGNATURE-----

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAmY6YN8MHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZq4kH/0BcaF/4dKqxQ/hYMMoLxcE1kzHn2kAdFPcvxcuu
Csk1yLugbvxHgwgp0lI9JjiqzSMt68pN8B9mWbcMBBvA7jGGsJ6Vjp25kQnUToLe
FPiAhW/TY+1YXOnhsfn9dHHk1Tv0W5D69QuUuj6zGUvRMdV+WPyA/mGPWnBrJgT+
5s6tKFxls1JiLdFxuJKqi8Ok8HrX1zE9unSWEUri8SNE2k3h5i29X2v+S8yBv2y0
XBnzr16kL9KKim0sNSErB1QU5BThnDBCFk+7FKAAYGAv5H6N3VLv66DLARCYfPhP
iXJU3/+yvAjwZjp5oYtbqHXzdd/m0b/IrF/0ZMLBaoDs0s4=
=vfs6
-----END PGP SIGNATURE-----

xsa457.patch
Description: Binary data
Xen Security Advisory 456 v3 (CVE-2024-2201) - x86: Native Branch History Injection
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

     Xen Security Advisory CVE-2024-2201 / XSA-456
                      version 3

        x86: Native Branch History Injection

UPDATES IN VERSION 3
====================

Issues were found with the original code changes.  See the bottom of the
Resolution section for how to obtain those.

ISSUE DESCRIPTION
=================

In August 2022, researchers at VU Amsterdam disclosed Spectre-BHB.

Spectre-BHB was discussed in XSA-398.  At the time, the susceptibility of
Xen to Spectre-BHB was uncertain so no specific action was taken in
XSA-398.  However, various changes were made thereafter in upstream Xen
as a consequence; more on these later.

VU Amsterdam have subsequently adjusted the attack to be pulled off
entirely from userspace, without the aid of a managed runtime in the
victim context.

For more details, see:
  https://vusec.net/projects/native-bhi
  https://vusec.net/projects/bhi-spectre-bhb
  https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html
  https://xenbits.xen.org/xsa/advisory-398.html

IMPACT
======

An attacker might be able to infer the contents of arbitrary host memory,
including memory assigned to other guests.

VULNERABLE SYSTEMS
==================

Systems running all versions of Xen are affected.

Only Intel x86 CPUs are potentially affected.  CPUs from other
manufacturers are not known to be affected.

A wide range of Intel CPUs employ Branch History prediction techniques.
However for older CPUs existing Spectre-v2 mitigations (XSA-254) are
believed to be sufficient to mitigate Native-BHI.  Therefore, the rest of
the discussion will be limited in scope to the CPUs for which a change in
behaviour is expected.  These are believed to be all CPUs with eIBRS
(Enhanced IBRS, a.k.a. IBRS_ALL or IBRS_ATT).  eIBRS signifies a hardware
adjustment (mode-tagged indirect predictions) designed to combat
Spectre-v2, available in CPUs from 2019 onwards.
To determine if a system has eIBRS, run `xen-cpuid -v` in dom0, looking
for the string "eibrs" in the Dynamic Raw block of information.  e.g.

  # xen-cpuid -v
  ...
  Dynamic sets:
    Raw
      ...
      [16] MSR_ARCH_CAPS.lo     ... eibrs ...
      ...
  ...

Be aware that the Static sets are compile time information, so they will
include the string "eibrs" irrespective of hardware support.  If there is
no row for "[16] MSR_ARCH_CAPS.lo" then the fixes for XSA-435 are
missing.

MITIGATION
==========

There are no mitigations.

CREDITS
=======

This issue was discovered by VU Amsterdam.

RESOLUTION
==========

In Xen 4.17, in response to the original Spectre-BHB, CET-IBT support was
added to Xen to use on capable hardware.  It also came with work to
remove unnecessary function pointers, and to de-virtualise function
pointers at boot, as both a performance and hardening improvement.  This
work has been steadily continuing since, and every removed/de-virtualised
function pointer reduces the options available to an adversary trying to
mount a Native-BHI attack.  All of this work has been backported to 4.17
and later for this advisory.

Beginning with the Intel Alder Lake (Client) and Sapphire Rapids (Server)
CPUs, a hardware control called BHI_DIS_S is available, which restricts
history-based predictions.  This control requires updated microcode on
some CPUs.  Look for "bhi-ctrl" in `xen-cpuid -v`, similar to eibrs
above.  Xen has been updated to use this control when available, and to
virtualise it for guests to use.

For CPUs without BHI_DIS_S, BHB clearing sequences need to be used.  Out
of an abundance of caution, all sequences in the Intel whitepaper have
been implemented, although Xen will only use the "short" sequence by
default.  The others are available to opt in to.

The work to mitigate Native-BHI is extensive, and the backports are
more extensive still.  Therefore, we have decided to produce new releases
on all stable trees.
Please find fixes in the respective branches under the following release
tags:

  RELEASE-4.18.2
  RELEASE-4.17.4
  RELEASE-4.16.6
  RELEASE-4.15.6

Other release activities (tarballs, announcements, etc) will happen in
due course.

Issues were subsequently found in those.  To address them, newer commits
from the stable branches need updating to, in particular:

  stable-4.15  05653eb44314cb90f2e3e7b2d405e86b5657
  stable-4.16  d0e8f8ffbb19b5df5f767328baeb54c069b08e6a
  stable-4.17  effcf70f020ff12d34c80e2abde0ecb00ce92bda
  stable-4.18  f0ff1d9cb96041a84a24857a6464628240deed4f

For 4.15, since we're closing the branch, RELEASE-4.15.7 was tagged in
addition; other release activities - as per above - will follow.

DEPLOYMENT DURING EMBARGO
=========================

Deployment of the patches and/or mitigations described above (or others
which are substantially similar) is permitted during the embargo, even on
public-facing
Re: [PATCH v3 6/9] xen/arm64: bpi: Add missing code symbol annotations
On Tue, May 7, 2024 at 11:57 AM Julien Grall wrote: > > Hi, > > On 06/05/2024 13:54, Edgar E. Iglesias wrote: > > On Sat, May 4, 2024 at 2:14 AM Stefano Stabellini > > wrote: > >> > >> On Wed, 1 May 2024, Edgar E. Iglesias wrote: > >>> From: "Edgar E. Iglesias" > >>> > >>> Use the generic xen/linkage.h macros to annotate code symbols > >>> and add missing annotations. > >>> > >>> Signed-off-by: Edgar E. Iglesias > >>> --- > >>> xen/arch/arm/arm64/bpi.S | 20 > >>> 1 file changed, 12 insertions(+), 8 deletions(-) > >>> > >>> diff --git a/xen/arch/arm/arm64/bpi.S b/xen/arch/arm/arm64/bpi.S > >>> index 4e63825220..b16e4d1e29 100644 > >>> --- a/xen/arch/arm/arm64/bpi.S > >>> +++ b/xen/arch/arm/arm64/bpi.S > >>> @@ -52,14 +52,15 @@ > >>>* micro-architectures in a system. > >>>*/ > >>> .align 11 > >>> -ENTRY(__bp_harden_hyp_vecs_start) > >>> +FUNC(__bp_harden_hyp_vecs_start) > >>> .rept 4 > >>> vectors hyp_traps_vector > >>> .endr > >>> -ENTRY(__bp_harden_hyp_vecs_end) > >>> +GLOBAL(__bp_harden_hyp_vecs_end) > >>> +END(__bp_harden_hyp_vecs_start) > >> > >> Shouldn't GLOBAL be changed to FUNC as well? > >> > > > > I was a bit unsure but went for GLOBAL since the _end labels point to > > addresses after and outside of the code sequence. > > But I don't have a strong opinion and am happy to change them to FUNC > > if you feel that's better. > > I don't think it should be FUNC as this is not meant to be called > directly. I am also under the impression, we were planning to get rid of > GLOBAL() as well. > > Furthermore, __bp_harden_hyp_vec_start is not a function per say. It is > a pointer to the vector table. > > From the brief look, the same remarks would apply to the rest of bpi.S. > So I think we want to switch all the ENTRY() to LABEL(). 
Hi Julien,

The reason I chose FUNC for the start of each symbol is that these symbols contain executable code (not only a table of pointers to code somewhere else), and the ELF spec says that STT_FUNC marks symbols associated with functions or other executable code (not only callable functions, IIUC):

  "STT_FUNC  The symbol is associated with a function or other executable code."

https://refspecs.linuxbase.org/elf/elf.pdf (Symbol Table, 1-20).

I think using LABEL instead of GLOBAL for the _end labels of these code sequences makes sense. I'm happy to change the _start labels to LABEL too if you guys feel that's better.

Cheers,
Edgar
Re: [RFC PATCH v3 3/5] KVM: x86: Add notifications for Heki policy configuration and violation
On Tue, May 07, 2024, Mickaël Salaün wrote: > > Actually, potential bad/crazy idea. Why does the _host_ need to define > > policy? > > Linux already knows what assets it wants to (un)protect and when. What's > > missing > > is a way for the guest kernel to effectively deprivilege and re-authenticate > > itself as needed. We've been tossing around the idea of paired VMs+vCPUs to > > support VTLs and SEV's VMPLs, what if we usurped/piggybacked those ideas, > > with a > > bit of pKVM mixed in? > > > > Borrowing VTL terminology, where VTL0 is the least privileged, userspace > > launches > > the VM at VTL0. At some point, the guest triggers the deprivileging > > sequence and > > userspace creates VTL1. Userpace also provides a way for VTL0 restrict > > access to > > its memory, e.g. to effectively make the page tables for the kernel's > > direct map > > writable only from VTL1, to make kernel text RO (or XO), etc. And VTL0 > > could then > > also completely remove its access to code that changes CR0/CR4. > > > > It would obviously require a _lot_ more upfront work, e.g. to isolate the > > kernel > > text that modifies CR0/CR4 so that it can be removed from VTL0, but that > > should > > be doable with annotations, e.g. tag relevant functions with __magic or > > whatever, > > throw them in a dedicated section, and then free/protect the section(s) at > > the > > appropriate time. > > > > KVM would likely need to provide the ability to switch VTLs (or whatever > > they get > > called), and host userspace would need to provide a decent amount of the > > backend > > mechanisms and "core" policies, e.g. to manage VTL0 memory, teardown (turn > > off?) > > VTL1 on kexec(), etc. But everything else could live in the guest kernel > > itself. > > E.g. to have CR pinning play nice with kexec(), toss the relevant kexec() > > code into > > VTL1. That way VTL1 can verify the kexec() target and tear itself down > > before > > jumping into the new kernel. 
>> This is very off the cuff and hand-wavy, e.g. I don't have much of an
>> idea what it would take to harden kernel text patching, but keeping the
>> policy in the guest seems like it'd make everything more tractable than
>> trying to define an ABI between Linux and a VMM that is rich and
>> flexible enough to support all the fancy things Linux does (and will do
>> in the future).
>
> Yes, we agree that the guest needs to manage its own policy. That's why
> we implemented Heki for KVM this way, but without VTLs, because KVM
> doesn't support them.
>
> To sum up, is the VTL approach the only one that would be acceptable for
> KVM?

Heh, that's not a question you want to be asking. You're effectively asking me to make an authoritative, "final" decision on a topic with which I am only passingly familiar. But since you asked it... :-)

Probably? I see a lot of advantages to a VTL/VSM-like approach:

 1. Provides Linux-as-a-guest the flexibility it needs to meaningfully
    advance its security, with the least amount of policy built into the
    guest/host ABI.

 2. Largely decouples guest policy from the host, i.e. should allow the
    guest to evolve/update its policy without needing to coordinate
    changes with the host.

 3. The KVM implementation can be generic enough to be reusable for other
    features.

 4. Other groups are already working on VTL-like support in KVM, e.g. for
    VSM itself, and potentially for VMPL/SVSM support.

IMO, #2 is a *huge* selling point. Not having to coordinate changes across multiple code bases and/or organizations and/or maintainers is a big win for velocity, long-term maintenance, and probably the very viability of HEKI. Providing the guest with the tools to define and implement its own policy means end users don't have to wait for some third party, e.g. CSPs, to deploy the accompanying host-side changes, because there are no host-side changes.
And encapsulating everything in the guest drastically reduces the friction with changes in the kernel that interact with hardening, both from a technical and a social perspective. I.e. giving the kernel (near) complete control over its destiny minimizes the number of moving parts, and will be far, far easier to sell to maintainers. I would expect maintainers to react much more favorably to being handed tools to harden the kernel, as opposed to being presented a set of APIs that can be used to make the kernel compliant with _someone else's_ vision of what kernel hardening should look like. E.g. imagine a new feature comes along that requires overriding CR0/CR4 pinning in a way that doesn't fit into existing policy. If the VMM is involved in defining/enforcing the CR pinning policy, then supporting said new feature would require new guest/host ABI and an updated host VMM in order to make the new feature compatible with HEKI. Inevitably, even if everything goes smoothly from an upstreaming perspective, that will result in guests that have to choose
Re: [PATCH v2 (resend) 12/27] x86/mapcache: Initialise the mapcache for the idle domain
On 20/02/2024 10:51, Jan Beulich wrote:
> On 16.01.2024 20:25, Elias El Yandouzi wrote:
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -750,9 +750,16 @@ int arch_domain_create(struct domain *d,
>>      spin_lock_init(&d->arch.e820_lock);
>>
>> +    if ( (rc = mapcache_domain_init(d)) != 0 )
>> +    {
>> +        free_perdomain_mappings(d);
>> +        return rc;
>> +    }
>> +
>>      /* Minimal initialisation for the idle domain. */
>>      if ( unlikely(is_idle_domain(d)) )
>>      {
>> +        struct page_info *pg = d->arch.perdomain_l3_pg;
>>          static const struct arch_csw idle_csw = {
>>              .from = paravirt_ctxt_switch_from,
>>              .to = paravirt_ctxt_switch_to,
>> @@ -763,6 +770,9 @@ int arch_domain_create(struct domain *d,
>>          d->arch.cpu_policy = ZERO_BLOCK_PTR; /* Catch stray misuses. */
>>
>> +        idle_pg_table[l4_table_offset(PERDOMAIN_VIRT_START)] =
>> +            l4e_from_page(pg, __PAGE_HYPERVISOR_RW);
>> +
>>          return 0;
>>      }
>
> Why not add another call to mapcache_domain_init() right here, allowing a
> more specific panic() to be invoked in case of failure (compared to the
> BUG_ON() upon failure of creation of the idle domain as a whole)? Then the
> other mapcache_domain_init() call doesn't need moving a 2nd time in close
> succession.

Sorry, but I don't get your point. Why call `mapcache_domain_init()` another time? What panic() are you referring to?

Elias
Re: [PATCH v2 (resend) 11/27] x86: Lift mapcache variable to the arch level
>> This only lifts the mapcache variable up. Whether we populate the
>> mapcache for a domain is unchanged in this patch.
>
> Is it? I wonder because of ...

I agree, the commit message doesn't completely reflect the changes below.

>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -843,6 +843,8 @@ int arch_domain_create(struct domain *d,
>>      psr_domain_init(d);
>>
>> +    mapcache_domain_init(d);
>> +
>>      if ( is_hvm_domain(d) )
>>      {
>>          if ( (rc = hvm_domain_initialise(d, config)) != 0 )
>> @@ -850,8 +852,6 @@ int arch_domain_create(struct domain *d,
>>      }
>>      else if ( is_pv_domain(d) )
>>      {
>> -        mapcache_domain_init(d);
>> -
>>          if ( (rc = pv_domain_initialise(d)) != 0 )
>>              goto fail;
>>      }
>
> ... this and ...
>
>> --- a/xen/arch/x86/domain_page.c
>> +++ b/xen/arch/x86/domain_page.c
>> @@ -82,11 +82,11 @@ void *map_domain_page(mfn_t mfn)
>>  #endif
>>
>>      v = mapcache_current_vcpu();
>> -    if ( !v || !is_pv_vcpu(v) )
>> +    if ( !v )
>>          return mfn_to_virt(mfn_x(mfn));
>
> ... this and yet more changes indicating otherwise. Yet if which domains
> have a mapcache set up is already changed here, I wonder whether the idle
> domain shouldn't be taken care of here as well.

Do you suggest to fold in here the following patch, where the mapcache gets initialized for idle domains?

Elias
Re: [PATCH v2 (resend) 09/27] x86/pv: Rewrite how building PV dom0 handles domheap mappings
On 20/02/2024 10:28, Jan Beulich wrote:
> On 16.01.2024 20:25, Elias El Yandouzi wrote:
>> --- a/xen/arch/x86/pv/dom0_build.c
>> +++ b/xen/arch/x86/pv/dom0_build.c
>> @@ -382,6 +382,10 @@ int __init dom0_construct_pv(struct domain *d,
>>      l3_pgentry_t *l3tab = NULL, *l3start = NULL;
>>      l2_pgentry_t *l2tab = NULL, *l2start = NULL;
>>      l1_pgentry_t *l1tab = NULL, *l1start = NULL;
>> +    mfn_t l4start_mfn = INVALID_MFN;
>> +    mfn_t l3start_mfn = INVALID_MFN;
>> +    mfn_t l2start_mfn = INVALID_MFN;
>> +    mfn_t l1start_mfn = INVALID_MFN;
>
> The reason initializers are needed here is, aiui, the overly large scope
> of these variables. For example ...

Correct. Is it just an observation, or do you want me to do anything?

>> @@ -708,22 +712,32 @@ int __init dom0_construct_pv(struct domain *d,
>>          v->arch.pv.event_callback_cs = FLAT_COMPAT_KERNEL_CS;
>>      }
>>
>> +#define UNMAP_MAP_AND_ADVANCE(mfn_var, virt_var, maddr) \
>> +do {                                                    \
>> +    unmap_domain_page(virt_var);                        \
>> +    mfn_var = maddr_to_mfn(maddr);                      \
>> +    maddr += PAGE_SIZE;                                 \
>> +    virt_var = map_domain_page(mfn_var);                \
>> +} while ( false )
>> +
>>      if ( !compat )
>>      {
>>          maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l4_page_table;
>> -        l4start = l4tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
>> +        UNMAP_MAP_AND_ADVANCE(l4start_mfn, l4start, mpt_alloc);
>> +        l4tab = l4start;
>>          clear_page(l4tab);
>> -        init_xen_l4_slots(l4tab, _mfn(virt_to_mfn(l4start)),
>> -                          d, INVALID_MFN, true);
>> -        v->arch.guest_table = pagetable_from_paddr(__pa(l4start));
>> +        init_xen_l4_slots(l4tab, l4start_mfn, d, INVALID_MFN, true);
>> +        v->arch.guest_table = pagetable_from_mfn(l4start_mfn);
>
> ... looks to be required only here, while ...
>
>>      }
>>      else
>>      {
>>          /* Monitor table already created by switch_compat(). */
>> -        l4start = l4tab = __va(pagetable_get_paddr(v->arch.guest_table));
>> +        l4start_mfn = pagetable_get_mfn(v->arch.guest_table);
>> +        l4start = l4tab = map_domain_page(l4start_mfn);
>
> ... in principle the use of the variable could be avoided here. Below from
> here there's no further use of it.
>> @@ -781,30 +797,34 @@ int __init dom0_construct_pv(struct domain *d,
>>
>>      if ( compat )
>>      {
>> -        l2_pgentry_t *l2t;
>> -
>>          /* Ensure the first four L3 entries are all populated. */
>>          for ( i = 0, l3tab = l3start; i < 4; ++i, ++l3tab )
>>          {
>>              if ( !l3e_get_intpte(*l3tab) )
>>              {
>>                  maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l2_page_table;
>> -                l2tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
>> -                clear_page(l2tab);
>> -                *l3tab = l3e_from_paddr(__pa(l2tab), L3_PROT);
>> +                UNMAP_MAP_AND_ADVANCE(l2start_mfn, l2start, mpt_alloc);
>> +                clear_page(l2start);
>> +                *l3tab = l3e_from_mfn(l2start_mfn, L3_PROT);
>>              }
>
> The updating of l2start is only conditional here, yet ...
>
>>              if ( i == 3 )
>>                  l3e_get_page(*l3tab)->u.inuse.type_info |= PGT_pae_xen_l2;
>>          }
>>
>> -        l2t = map_l2t_from_l3e(l3start[3]);
>> -        init_xen_pae_l2_slots(l2t, d);
>> -        unmap_domain_page(l2t);
>> +        init_xen_pae_l2_slots(l2start, d);
>
> ... here you assume it points at the page referenced by the 3rd L3 entry.

Hmm, I missed it when sending the revision, and indeed it doesn't look correct.

> Question is why the original code is being replaced here in the first
> place: it was already suitably mapping the page in question.

The code was already suitably mapping the pages in question. This patch doesn't aim to make any functional change, just to rework how the domheap pages are used. The goal of the series is to remove the mappings from the directmap, which means those pages need to be mapped and unmapped when required. This is all this patch does; see the `UNMAP_MAP_AND_ADVANCE()` macro.

Elias
[xen-unstable test] 185936: tolerable FAIL
flight 185936 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/185936/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 185929 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 185929 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 185929 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 185929 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail like 185929 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 185929 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 185929 test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail 
never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-qcow214 migrate-support-checkfail never pass test-armhf-armhf-xl-qcow215 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-armhf-armhf-xl-raw 14 migrate-support-checkfail never pass test-armhf-armhf-xl-raw 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-libvirt-vhd 15 saverestore-support-checkfail never pass version targeted for testing: xen ebab808eb1bb8f24c7d0dd41b956e48cb1824b81 baseline version: xen ebab808eb1bb8f24c7d0dd41b956e48cb1824b81 Last test of basis 185936 2024-05-07 05:53:32 Z0 days Testing same since (not found) 0 attempts jobs: build-amd64-xsm pass build-arm64-xsm pass build-i386-xsm pass build-amd64-xtf pass build-amd64 pass build-arm64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass
Re: [PATCH v2] x86/cpu-policy: Fix migration from Ice Lake to Cascade Lake
On 07/05/2024 3:45 pm, Roger Pau Monné wrote: > On Tue, May 07, 2024 at 03:31:19PM +0100, Andrew Cooper wrote: >> On 07/05/2024 3:24 pm, Roger Pau Monné wrote: >>> On Tue, May 07, 2024 at 02:45:40PM +0100, Andrew Cooper wrote: Ever since Xen 4.14, there has been a latent bug with migration. While some toolstacks can level the features properly, they don't shink feat.max_subleaf when all features have been dropped. This is because we *still* have not completed the toolstack side work for full CPU Policy objects. As a consequence, even when properly feature levelled, VMs can't migrate "backwards" across hardware which reduces feat.max_subleaf. One such example is Ice Lake (max_subleaf=2 for INTEL_PSFD) to Cascade Lake (max_subleaf=0). Extend the max policies feat.max_subleaf to the hightest number Xen knows about, but leave the default policies matching the host. This will allow VMs with a higher feat.max_subleaf than strictly necessary to migrate in. Eventually we'll manage to teach the toolstack how to avoid creating such VMs in the first place, but there's still more work to do there. Signed-off-by: Andrew Cooper >>> Acked-by: Roger Pau Monné >> Thanks. >> >>> Even if we have just found one glitch with PSFD and Ice Lake vs >>> Cascade Lack, wouldn't it be safer to always extend the max policies >>> max leafs and subleafs to match the known array sizes? >> This is the final max leaf (containing feature information) to gain >> custom handling, I think? > Couldn't the same happen with extended leaves? Some of the extended > leaves contain features, and hence for policy leveling toolstack might > decide to zero them, yet extd.max_leaf won't be adjusted. Hmm. Right now, extd max leaf is also the one with the bit that we unconditionally advertise, and it's inherited all the way from the host policy. So yes, in principle, but anything that bumps this limit is going to have other implications too, and I'd prefer not to second-guess them at this point. 
I hope we can get the toolstack side fixes before this becomes a real problem... ~Andrew
Re: [PATCH v2] x86/cpu-policy: Fix migration from Ice Lake to Cascade Lake
On Tue, May 07, 2024 at 03:31:19PM +0100, Andrew Cooper wrote: > On 07/05/2024 3:24 pm, Roger Pau Monné wrote: > > On Tue, May 07, 2024 at 02:45:40PM +0100, Andrew Cooper wrote: > >> Ever since Xen 4.14, there has been a latent bug with migration. > >> > >> While some toolstacks can level the features properly, they don't shink > >> feat.max_subleaf when all features have been dropped. This is because > >> we *still* have not completed the toolstack side work for full CPU Policy > >> objects. > >> > >> As a consequence, even when properly feature levelled, VMs can't migrate > >> "backwards" across hardware which reduces feat.max_subleaf. One such > >> example > >> is Ice Lake (max_subleaf=2 for INTEL_PSFD) to Cascade Lake (max_subleaf=0). > >> > >> Extend the max policies feat.max_subleaf to the hightest number Xen knows > >> about, but leave the default policies matching the host. This will allow > >> VMs > >> with a higher feat.max_subleaf than strictly necessary to migrate in. > >> > >> Eventually we'll manage to teach the toolstack how to avoid creating such > >> VMs > >> in the first place, but there's still more work to do there. > >> > >> Signed-off-by: Andrew Cooper > > Acked-by: Roger Pau Monné > > Thanks. > > > > > Even if we have just found one glitch with PSFD and Ice Lake vs > > Cascade Lack, wouldn't it be safer to always extend the max policies > > max leafs and subleafs to match the known array sizes? > > This is the final max leaf (containing feature information) to gain > custom handling, I think? Couldn't the same happen with extended leaves? Some of the extended leaves contain features, and hence for policy leveling toolstack might decide to zero them, yet extd.max_leaf won't be adjusted. Thanks, Roger.
Re: [PATCH 05/15] tools/libs/light: Increase nr_spi to 160
Hi,

On 06/05/2024 06:17, Henry Wang wrote:
> On 5/1/2024 9:58 PM, Anthony PERARD wrote:
>> On Wed, Apr 24, 2024 at 11:34:39AM +0800, Henry Wang wrote:
>>> Increase the number of SPIs to 160, i.e. gic_number_lines() for Xilinx
>>> ZynqMP - 32. This was done to allocate and assign IRQs to a running
>>> domain.
>>>
>>> Signed-off-by: Vikram Garhwal
>>> Signed-off-by: Stefano Stabellini
>>> Signed-off-by: Henry Wang
>>> ---
>>>  tools/libs/light/libxl_arm.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
>>> index dd5c9f4917..50dbd0f2a9 100644
>>> --- a/tools/libs/light/libxl_arm.c
>>> +++ b/tools/libs/light/libxl_arm.c
>>> @@ -181,7 +181,8 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
>>>      LOG(DEBUG, "Configure the domain");
>>>
>>> -    config->arch.nr_spis = nr_spis;
>>> +    /* gic_number_lines() is 192 for Xilinx ZynqMP. min nr_spis = 192 - 32. */
>>> +    config->arch.nr_spis = MAX(nr_spis, 160);
>>
>> Is there a way that Xen or libxl could find out what the minimum number
>> of SPIs needs to be?
>
> I am afraid currently there is none.
>
>> Are we going to have to increase that minimum number every time a new
>> platform comes along? It doesn't appear that libxl is using that
>> `nr_spis` value and it is probably just given to Xen. So my guess is
>> that Xen could simply take care of the minimum value;
>> gic_number_lines() seems to be a Xen function.
>
> Xen will take care of the value of nr_spis for dom0 in create_dom0():
>
>     dom0_cfg.arch.nr_spis = min(gic_number_lines(), (unsigned int) 992) - 32;
>
> and also for dom0less domUs in create_domUs(). However, it looks like Xen
> will not take care of the minimum value for libxl guests: the value from
> config->arch.nr_spis in the guest config file will be directly passed to
> the domain_vgic_init() function from arch_domain_create(). I agree with
> you that we shouldn't just bump the number every time we have a new
> platform. Therefore, would it be a good idea to move the logic in this
> patch to arch_sanitise_domain_config()?
Xen domains are supposed to be platform agnostic, and therefore the number of SPIs should not be based on the HW. Furthermore, with your proposal we would end up allocating data structures for N SPIs when a domain may never need any SPIs (such as when passthrough is not in use). This is more likely for domains created by the toolstack than from Xen directly.

Instead, we should introduce a new xl configuration option to let the user decide the number of SPIs. I would suggest naming it "nr_spis" to match the DT bindings.

Cheers,

--
Julien Grall
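If such an option were introduced, a guest configuration might look like the sketch below. Note that "nr_spis" is only a proposed name at this point in the thread and does not exist in xl today:

```
# Hypothetical xl.cfg fragment -- "nr_spis" is a proposed option, not yet
# implemented.  Request 160 SPIs for this guest (gic_number_lines() - 32
# on Xilinx ZynqMP).
name = "passthrough-guest"
nr_spis = 160
```

This would keep the per-platform knowledge in the guest configuration rather than hard-coding a minimum in libxl.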
Re: [PATCH v2] x86/cpu-policy: Fix migration from Ice Lake to Cascade Lake
On 07/05/2024 3:24 pm, Roger Pau Monné wrote: > On Tue, May 07, 2024 at 02:45:40PM +0100, Andrew Cooper wrote: >> Ever since Xen 4.14, there has been a latent bug with migration. >> >> While some toolstacks can level the features properly, they don't shink >> feat.max_subleaf when all features have been dropped. This is because >> we *still* have not completed the toolstack side work for full CPU Policy >> objects. >> >> As a consequence, even when properly feature levelled, VMs can't migrate >> "backwards" across hardware which reduces feat.max_subleaf. One such example >> is Ice Lake (max_subleaf=2 for INTEL_PSFD) to Cascade Lake (max_subleaf=0). >> >> Extend the max policies feat.max_subleaf to the hightest number Xen knows >> about, but leave the default policies matching the host. This will allow VMs >> with a higher feat.max_subleaf than strictly necessary to migrate in. >> >> Eventually we'll manage to teach the toolstack how to avoid creating such VMs >> in the first place, but there's still more work to do there. >> >> Signed-off-by: Andrew Cooper > Acked-by: Roger Pau Monné Thanks. > > Even if we have just found one glitch with PSFD and Ice Lake vs > Cascade Lack, wouldn't it be safer to always extend the max policies > max leafs and subleafs to match the known array sizes? This is the final max leaf (containing feature information) to gain custom handling, I think? ~Andrew
Re: [PATCH] tools/xl: Open xldevd.log with O_CLOEXEC
On Tue, May 07, 2024 at 01:32:00PM +0200, Marek Marczykowski-Górecki wrote: > On Tue, May 07, 2024 at 12:08:06PM +0100, Andrew Cooper wrote: > > `xl devd` has been observed leaking /var/log/xldevd.log into children. > > > > Link: https://github.com/QubesOS/qubes-issues/issues/8292 > > Reported-by: Demi Marie Obenour > > Signed-off-by: Andrew Cooper > > --- > > CC: Anthony PERARD > > CC: Juergen Gross > > CC: Demi Marie Obenour > > CC: Marek Marczykowski-Górecki > > > > Also entirely speculative based on the QubesOS ticket. > > --- > > tools/xl/xl_utils.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/tools/xl/xl_utils.c b/tools/xl/xl_utils.c > > index 17489d182954..060186db3a59 100644 > > --- a/tools/xl/xl_utils.c > > +++ b/tools/xl/xl_utils.c > > @@ -270,7 +270,7 @@ int do_daemonize(const char *name, const char *pidfile) > > exit(-1); > > } > > > > -CHK_SYSCALL(logfile = open(fullname, O_WRONLY|O_CREAT|O_APPEND, 0644)); > > +CHK_SYSCALL(logfile = open(fullname, O_WRONLY | O_CREAT | O_APPEND | > > O_CLOEXEC, 0644)); > > This one might be not enough, as the FD gets dup2()-ed to stdout/stderr > just outside of the context here, and then inherited by various hotplug > script. Just adding O_CLOEXEC here means the hotplug scripts will run > with stdout/stderr closed. The scripts shipped with Xen do redirect > stderr to a log quite early, but a) it doesn't do it for stdout, and b) > custom hotplug scripts are a valid use case. > Without that, I see at least few potential issues: > - some log messages may be lost (minor, but annoying) > - something might simply fail on writing to a closed FD, breaking the > hotplug script > - FD 1 will be used as first free FD for any open() or similar call - if > a tool later tries writing something to stdout, it will gets written > to that FD - worse of all three Wait, the above is wrong, dup does not copy the O_CLOEXEC flag over to the new FD. So, maybe your patch is correct after all. 
> What should be the behavior of hotplug scripts logging? Should they
> always take care of their own logging? If so, the hotplug calling part
> should redirect stdout/stderr to /dev/null IMO. But if `xl` should
> provide some default logging for them (like the xldevd.log here?), then
> O_CLOEXEC should be set only after duplicating logfile over stdout/err.
>
>>     free(fullname);
>>     assert(logfile >= 3);
>>
>>
>> base-commit: ebab808eb1bb8f24c7d0dd41b956e48cb1824b81
>> prerequisite-patch-id: 212e50457e9b6bdfd06a97da545a5aa7155bb919
>
> Which one is this? I don't see it in staging, nor in any of your
> branches on xenbits. Lore finds "tools/libxs: Open /dev/xen/xenbus fds
> as O_CLOEXEC", which I guess is correct, but I have no idea how it
> correlates, as this hash doesn't appear anywhere in the message, nor
> its headers...
>
> --
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Re: [PATCH] tools/xl: Open xldevd.log with O_CLOEXEC
On 07/05/2024 3:23 pm, Marek Marczykowski-Górecki wrote: > On Tue, May 07, 2024 at 03:15:48PM +0100, Andrew Cooper wrote: >> On 07/05/2024 12:32 pm, Marek Marczykowski-Górecki wrote: >>> On Tue, May 07, 2024 at 12:08:06PM +0100, Andrew Cooper wrote: `xl devd` has been observed leaking /var/log/xldevd.log into children. Link: https://github.com/QubesOS/qubes-issues/issues/8292 Reported-by: Demi Marie Obenour Signed-off-by: Andrew Cooper --- CC: Anthony PERARD CC: Juergen Gross CC: Demi Marie Obenour CC: Marek Marczykowski-Górecki Also entirely speculative based on the QubesOS ticket. --- tools/xl/xl_utils.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/xl/xl_utils.c b/tools/xl/xl_utils.c index 17489d182954..060186db3a59 100644 --- a/tools/xl/xl_utils.c +++ b/tools/xl/xl_utils.c @@ -270,7 +270,7 @@ int do_daemonize(const char *name, const char *pidfile) exit(-1); } -CHK_SYSCALL(logfile = open(fullname, O_WRONLY|O_CREAT|O_APPEND, 0644)); +CHK_SYSCALL(logfile = open(fullname, O_WRONLY | O_CREAT | O_APPEND | O_CLOEXEC, 0644)); >>> This one might be not enough, as the FD gets dup2()-ed to stdout/stderr >>> just outside of the context here, and then inherited by various hotplug >>> script. Just adding O_CLOEXEC here means the hotplug scripts will run >>> with stdout/stderr closed. >> Lovely :( Yes - this won't work. I guess what we want instead is: >> >> diff --git a/tools/xl/xl_utils.c b/tools/xl/xl_utils.c >> index 060186db3a59..a0ce7dd7fa21 100644 >> --- a/tools/xl/xl_utils.c >> +++ b/tools/xl/xl_utils.c >> @@ -282,6 +282,7 @@ int do_daemonize(const char *name, const char *pidfile) >> dup2(logfile, 2); >> >> close(nullfd); >> + close(logfile); >> >> CHK_SYSCALL(daemon(0, 1)); >> >> which at least means there's not a random extra fd attached to the logfile. > But logfile is a global variable, and it looks to be used in dolog()... Urgh, fine. Lets go back to your suggestion of setting CLOEXEC after dup()ing. ~Andrew
Re: [PATCH v2] x86/cpu-policy: Fix migration from Ice Lake to Cascade Lake
On Tue, May 07, 2024 at 02:45:40PM +0100, Andrew Cooper wrote:
> Ever since Xen 4.14, there has been a latent bug with migration.
>
> While some toolstacks can level the features properly, they don't shrink
> feat.max_subleaf when all features have been dropped. This is because
> we *still* have not completed the toolstack side work for full CPU Policy
> objects.
>
> As a consequence, even when properly feature levelled, VMs can't migrate
> "backwards" across hardware which reduces feat.max_subleaf. One such example
> is Ice Lake (max_subleaf=2 for INTEL_PSFD) to Cascade Lake (max_subleaf=0).
>
> Extend the max policies' feat.max_subleaf to the highest number Xen knows
> about, but leave the default policies matching the host. This will allow VMs
> with a higher feat.max_subleaf than strictly necessary to migrate in.
>
> Eventually we'll manage to teach the toolstack how to avoid creating such VMs
> in the first place, but there's still more work to do there.
>
> Signed-off-by: Andrew Cooper

Acked-by: Roger Pau Monné

Even if we have just found one glitch with PSFD and Ice Lake vs Cascade Lake, wouldn't it be safer to always extend the max policies' max leafs and subleafs to match the known array sizes?

Thanks, Roger.
Re: [PATCH] tools/xl: Open xldevd.log with O_CLOEXEC
On Tue, May 07, 2024 at 03:15:48PM +0100, Andrew Cooper wrote: > On 07/05/2024 12:32 pm, Marek Marczykowski-Górecki wrote: > > On Tue, May 07, 2024 at 12:08:06PM +0100, Andrew Cooper wrote: > >> `xl devd` has been observed leaking /var/log/xldevd.log into children. > >> > >> Link: https://github.com/QubesOS/qubes-issues/issues/8292 > >> Reported-by: Demi Marie Obenour > >> Signed-off-by: Andrew Cooper > >> --- > >> CC: Anthony PERARD > >> CC: Juergen Gross > >> CC: Demi Marie Obenour > >> CC: Marek Marczykowski-Górecki > >> > >> Also entirely speculative based on the QubesOS ticket. > >> --- > >> tools/xl/xl_utils.c | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> diff --git a/tools/xl/xl_utils.c b/tools/xl/xl_utils.c > >> index 17489d182954..060186db3a59 100644 > >> --- a/tools/xl/xl_utils.c > >> +++ b/tools/xl/xl_utils.c > >> @@ -270,7 +270,7 @@ int do_daemonize(const char *name, const char *pidfile) > >> exit(-1); > >> } > >> > >> -CHK_SYSCALL(logfile = open(fullname, O_WRONLY|O_CREAT|O_APPEND, > >> 0644)); > >> +CHK_SYSCALL(logfile = open(fullname, O_WRONLY | O_CREAT | O_APPEND | > >> O_CLOEXEC, 0644)); > > This one might be not enough, as the FD gets dup2()-ed to stdout/stderr > > just outside of the context here, and then inherited by various hotplug > > script. Just adding O_CLOEXEC here means the hotplug scripts will run > > with stdout/stderr closed. > > Lovely :( Yes - this won't work. I guess what we want instead is: > > diff --git a/tools/xl/xl_utils.c b/tools/xl/xl_utils.c > index 060186db3a59..a0ce7dd7fa21 100644 > --- a/tools/xl/xl_utils.c > +++ b/tools/xl/xl_utils.c > @@ -282,6 +282,7 @@ int do_daemonize(const char *name, const char *pidfile) > dup2(logfile, 2); > > close(nullfd); > + close(logfile); > > CHK_SYSCALL(daemon(0, 1)); > > which at least means there's not a random extra fd attached to the logfile. But logfile is a global variable, and it looks to be used in dolog()... 
-- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab signature.asc Description: PGP signature
Re: [PATCH 2/7] xen/arm: Wrap shared memory mapping code in one function
Hi Michal, int __init process_shm(struct domain *d, struct kernel_info *kinfo, const struct dt_device_node *node) { @@ -249,32 +290,10 @@ int __init process_shm(struct domain *d, struct kernel_info *kinfo, if ( dt_property_read_string(shm_node, "role", &role_str) == 0 ) owner_dom_io = false; >>> Looking at owner_dom_io, why don't you move parsing role and setting >>> owner_dom_io accordingly to handle_shared_mem_bank()? >> >> I think I wanted to keep all dt_* functions on the same level inside >> process_shm, otherwise yes, I could >> pass down shm_node and do the reading of role_str in handle_shared_mem_bank, >> or I could derive >> owner_dom_io from role_str being passed or not, something like: >> >> role_str = NULL; >> dt_property_read_string(shm_node, "role", &role_str) >> >> [inside handle_shared_mem_bank]: >> If ( role_str ) >>owner_dom_io = false; >> >> And pass only role_str to handle_shared_mem_bank. >> >> Is this comment to reduce the number of parameters passed? I guess it’s not >> for where we call > In this series as well as the previous one you limit the number of arguments > passed to quite a few functions. > So naturally I would expect it to be done here as well. owner_dom_io is used > only by handle_shared_mem_bank, so it makes more sense to move parsing to this > function so that it is self-contained. Ok I will, just to be on the same page here, you mean having dt_property_read_string inside handle_shared_mem_bank? Or the above example would work for you as well? That one would have role_str passed instead of shm_node. Cheers, Luca
Re: [PATCH] tools/xl: Open xldevd.log with O_CLOEXEC
On 07/05/2024 12:32 pm, Marek Marczykowski-Górecki wrote: > On Tue, May 07, 2024 at 12:08:06PM +0100, Andrew Cooper wrote: >> `xl devd` has been observed leaking /var/log/xldevd.log into children. >> >> Link: https://github.com/QubesOS/qubes-issues/issues/8292 >> Reported-by: Demi Marie Obenour >> Signed-off-by: Andrew Cooper >> --- >> CC: Anthony PERARD >> CC: Juergen Gross >> CC: Demi Marie Obenour >> CC: Marek Marczykowski-Górecki >> >> Also entirely speculative based on the QubesOS ticket. >> --- >> tools/xl/xl_utils.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/tools/xl/xl_utils.c b/tools/xl/xl_utils.c >> index 17489d182954..060186db3a59 100644 >> --- a/tools/xl/xl_utils.c >> +++ b/tools/xl/xl_utils.c >> @@ -270,7 +270,7 @@ int do_daemonize(const char *name, const char *pidfile) >> exit(-1); >> } >> >> -CHK_SYSCALL(logfile = open(fullname, O_WRONLY|O_CREAT|O_APPEND, 0644)); >> +CHK_SYSCALL(logfile = open(fullname, O_WRONLY | O_CREAT | O_APPEND | >> O_CLOEXEC, 0644)); > This one might be not enough, as the FD gets dup2()-ed to stdout/stderr > just outside of the context here, and then inherited by various hotplug > script. Just adding O_CLOEXEC here means the hotplug scripts will run > with stdout/stderr closed. Lovely :( Yes - this won't work. I guess what we want instead is: diff --git a/tools/xl/xl_utils.c b/tools/xl/xl_utils.c index 060186db3a59..a0ce7dd7fa21 100644 --- a/tools/xl/xl_utils.c +++ b/tools/xl/xl_utils.c @@ -282,6 +282,7 @@ int do_daemonize(const char *name, const char *pidfile) dup2(logfile, 2); close(nullfd); + close(logfile); CHK_SYSCALL(daemon(0, 1)); which at least means there's not a random extra fd attached to the logfile. >> free(fullname); >> assert(logfile >= 3); >> >> >> base-commit: ebab808eb1bb8f24c7d0dd41b956e48cb1824b81 >> prerequisite-patch-id: 212e50457e9b6bdfd06a97da545a5aa7155bb919 > Which one is this? I don't see it in staging, nor in any of your > branches on xenbits. 
Lore finds "tools/libxs: Open /dev/xen/xenbus fds > as O_CLOEXEC" which I guess is correct, but I have no idea how it > correlates it, as this hash doesn't appear anywhere in the message, nor > its headers... It's the libxs patch, but rebased over yesterday's push to staging. auto-base doesn't work quite so well in cases like this. ~Andrew
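For readers following the thread, the fd-handling pattern under discussion can be sketched in isolation. This is a hypothetical, simplified model, not the actual xl code: the `redirect_to_log()` helper and its target-fd parameters are invented for illustration. It shows why closing the original descriptor after `dup2()` still leaves the log reachable through the duplicates (`dup2()` clears `FD_CLOEXEC` on the new descriptor), which is the basis of Andrew's second proposal. As Marek points out, the real xl keeps `logfile` in a global used by `dolog()`, so simply closing it there needs more care.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical sketch of the daemonize-time fd handling discussed above
 * (NOT the actual tools/xl/xl_utils.c code). */
static int redirect_to_log(const char *path, int fd_out, int fd_err)
{
    int logfile = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);

    if ( logfile < 0 )
        return -1;

    /* dup2() does not carry FD_CLOEXEC over to the duplicates, so
     * children exec'd later still get usable stdout/stderr fds that
     * point at the log file. */
    dup2(logfile, fd_out);
    dup2(logfile, fd_err);

    /* Drop the original descriptor so no third fd to the log leaks
     * into children.  (In real xl, logfile is a global still used by
     * dolog(), so this line alone would not be a complete fix.) */
    close(logfile);

    return 0;
}
```

With fds 10/11 standing in for stdout/stderr, the duplicates remain writable after the original is closed, which is exactly the property the hotplug scripts rely on.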
Re: [PATCH 1/7] xen/arm: Lookup bootinfo shm bank during the mapping
Hi Michal, >> @@ -440,6 +431,26 @@ int __init process_shm_node(const void *fdt, int node, uint32_t address_cells, device_tree_get_reg(, address_cells, address_cells, , ); size = dt_next_cell(size_cells, ); +if ( !IS_ALIGNED(paddr, PAGE_SIZE) ) +{ +printk("fdt: physical address 0x%"PRIpaddr" is not suitably aligned.\n", + paddr); +return -EINVAL; +} + +if ( !IS_ALIGNED(gaddr, PAGE_SIZE) ) +{ +printk("fdt: guest address 0x%"PRIpaddr" is not suitably aligned.\n", + gaddr); +return -EINVAL; +} + +if ( !IS_ALIGNED(size, PAGE_SIZE) ) >>> What sense does it make to check for size being aligned before checking for >>> size being 0? It would pass this check. >> >> Yes, but in the end we are doing that to print a different error message, so >> it would pass >> for 0 and it’s totally fine, but in the end it will fail afterwards. I don’t >> see functional disruptions >> having this one before the other, what is the concern here? > It does not cause the functional disruption. It is more about code > readability and writing cleaner code. > It makes more sense to first check for size being 0 rather than whether it's > page aligned, since the latter can > pass if former is true and thus not making much sense. Ok then I will switch them and check it being different from 0 before the alignment check. Cheers, Luca
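The check ordering agreed above can be condensed into a small sketch (a hypothetical helper, not the actual `process_shm_node()` code): reject `size == 0` before the alignment tests, since zero is trivially page-aligned and would otherwise pass them only to fail later.

```c
#include <assert.h>

#define PAGE_SIZE 4096UL
/* Standard power-of-two alignment test, as used in Xen. */
#define IS_ALIGNED(val, align) (((val) & ((align) - 1)) == 0)

/* Hypothetical condensed validation, in the order discussed above. */
static int validate_shm_region(unsigned long paddr, unsigned long gaddr,
                               unsigned long size)
{
    if ( size == 0 )
        return -1;          /* empty bank: reject before alignment checks */

    if ( !IS_ALIGNED(paddr, PAGE_SIZE) ||
         !IS_ALIGNED(gaddr, PAGE_SIZE) ||
         !IS_ALIGNED(size, PAGE_SIZE) )
        return -2;          /* misaligned address or size */

    return 0;
}
```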
Re: [PATCH 2/7] xen/arm: Wrap shared memory mapping code in one function
On 07/05/2024 15:57, Luca Fancellu wrote: > > > Hi Michal, > >>> >>> +static int __init handle_shared_mem_bank(struct domain *d, paddr_t gbase, >>> + bool owner_dom_io, >>> + const char *role_str, >>> + const struct membank *shm_bank) >>> +{ >>> +paddr_t pbase, psize; >>> +int ret; >>> + >>> +BUG_ON(!shm_bank); >> not needed >> >>> + >>> +pbase = shm_bank->start; >>> +psize = shm_bank->size; >> please add empty line here > > Will do >>> >>> int __init process_shm(struct domain *d, struct kernel_info *kinfo, >>>const struct dt_device_node *node) >>> { >>> @@ -249,32 +290,10 @@ int __init process_shm(struct domain *d, struct >>> kernel_info *kinfo, >>> if ( dt_property_read_string(shm_node, "role", &role_str) == 0 ) >>> owner_dom_io = false; >> Looking at owner_dom_io, why don't you move parsing role and setting >> owner_dom_io accordingly to handle_shared_mem_bank()? > > I think I wanted to keep all dt_* functions on the same level inside > process_shm, otherwise yes, I could > pass down shm_node and do the reading of role_str in handle_shared_mem_bank, > or I could derive > owner_dom_io from role_str being passed or not, something like: > > role_str = NULL; > dt_property_read_string(shm_node, "role", &role_str) > > [inside handle_shared_mem_bank]: > If ( role_str ) > owner_dom_io = false; > > And pass only role_str to handle_shared_mem_bank. > > Is this comment to reduce the number of parameters passed? I guess it’s not > for where we call In this series as well as the previous one you limit the number of arguments passed to quite a few functions. So naturally I would expect it to be done here as well. owner_dom_io is used only by handle_shared_mem_bank, so it makes more sense to move parsing to this function so that it is self-contained. ~Michal
Re: [PATCH 1/7] xen/arm: Lookup bootinfo shm bank during the mapping
On 07/05/2024 15:44, Luca Fancellu wrote: > > > Hi Michal, > > Thanks for your review. > >> On 6 May 2024, at 14:24, Michal Orzel wrote: >> >> Hi Luca, >> >> On 23/04/2024 10:25, Luca Fancellu wrote: >>> >>> >>> The current static shared memory code is using bootinfo banks when it >>> needs to find the number of borrower, so every time assign_shared_memory >> s/borrower/borrowers > > Will fix > >> >>> is called, the bank is searched in the bootinfo.shmem structure. >>> >>> There is nothing wrong with it, however the bank can be used also to >>> retrieve the start address and size and also to pass less argument to >>> assign_shared_memory. When retrieving the information from the bootinfo >>> bank, it's also possible to move the checks on alignment to >>> process_shm_node in the early stages. >> Is this change really required for what you want to achieve? At the moment >> the alignment checks >> are done before first use, which requires these values to be aligned. FDT >> processing part does not need it. > > That’s true, but it would separate better the parsing part, in the end what > is the point of failing later if, for example, > some value are passed but not aligned? > >> >>> >>> So create a new function find_shm() which takes a 'struct shared_meminfo' >> Can we name it find_shm_bank() or find_shm_bank_by_id()? >> I agree that it's better to use a unique ID rather than matching by >> address/size > > Yes either names are good for me, I would use find_shm_bank_by_id > >> >>> structure and the shared memory ID, to look for a bank with a matching ID, >>> take the physical host address and size from the bank, pass the bank to >>> assign_shared_memory() removing the now unnecessary arguments and finally >>> remove the acquire_nr_borrower_domain() function since now the information >>> can be extracted from the passed bank. 
>>> Move the "xen,shm-id" parsing early in process_shm to bail out quickly in >>> case of errors (unlikely), as said above, move the checks on alignment >>> to process_shm_node. >>> >>> Drawback of this change is that now the bootinfo are used also when the >>> bank doesn't need to be allocated, however it will be convinient later >>> to use it as an argument for assign_shared_memory when dealing with >>> the use case where the Host physical address is not supplied by the user. >>> >>> Signed-off-by: Luca Fancellu >>> --- >>> xen/arch/arm/static-shmem.c | 105 >>> 1 file changed, 58 insertions(+), 47 deletions(-) >>> >>> diff --git a/xen/arch/arm/static-shmem.c b/xen/arch/arm/static-shmem.c >>> index 09f474ec6050..f6cf74e58a83 100644 >>> --- a/xen/arch/arm/static-shmem.c >>> +++ b/xen/arch/arm/static-shmem.c >>> @@ -19,29 +19,24 @@ static void __init __maybe_unused build_assertions(void) >>> offsetof(struct shared_meminfo, bank))); >>> } >>> >>> -static int __init acquire_nr_borrower_domain(struct domain *d, >>> - paddr_t pbase, paddr_t psize, >>> - unsigned long *nr_borrowers) >>> +static const struct membank __init *find_shm(const struct membanks *shmem, >>> + const char *shm_id) >>> { >>> -const struct membanks *shmem = bootinfo_get_shmem(); >>> unsigned int bank; >>> >>> -/* Iterate reserved memory to find requested shm bank. */ >>> +BUG_ON(!shmem || !shm_id); >> Is it really necessary? For example, before calling find_shm(), strlen is >> used on shm_id > > So, I guess I did that to have more robust code, in case someone changes the > code in the > future and perhaps removes something we rely on. If you object to them I will > remove though, > here and the other related points below. 
> >> >>> + >>> for ( bank = 0 ; bank < shmem->nr_banks; bank++ ) >>> { >>> -paddr_t bank_start = shmem->bank[bank].start; >>> -paddr_t bank_size = shmem->bank[bank].size; >>> - >>> -if ( (pbase == bank_start) && (psize == bank_size) ) >>> +if ( strncmp(shm_id, shmem->bank[bank].shmem_extra->shm_id, >>> + MAX_SHM_ID_LENGTH) == 0 ) >> Why not strcmp? AFAICS it's been validated many times already >> >>> break; >>> } >>> >>> if ( bank == shmem->nr_banks ) >>> -return -ENOENT; >>> - >>> -*nr_borrowers = shmem->bank[bank].shmem_extra->nr_shm_borrowers; >>> +return NULL; >>> >>> -return 0; >>> +return >bank[bank]; >>> } >>> >>> /* >>> @@ -103,14 +98,20 @@ static mfn_t __init acquire_shared_memory_bank(struct >>> domain *d, >>> return smfn; >>> } >>> >>> -static int __init assign_shared_memory(struct domain *d, >>> - paddr_t pbase, paddr_t psize, >>> - paddr_t gbase) >>> +static int __init assign_shared_memory(struct domain *d, paddr_t gbase, >>> + const struct membank
Re: [PATCH 2/7] xen/arm: Wrap shared memory mapping code in one function
Hi Michal, >> >> +static int __init handle_shared_mem_bank(struct domain *d, paddr_t gbase, >> + bool owner_dom_io, >> + const char *role_str, >> + const struct membank *shm_bank) >> +{ >> +paddr_t pbase, psize; >> +int ret; >> + >> +BUG_ON(!shm_bank); > not needed > >> + >> +pbase = shm_bank->start; >> +psize = shm_bank->size; > please add empty line here Will do >> >> int __init process_shm(struct domain *d, struct kernel_info *kinfo, >>const struct dt_device_node *node) >> { >> @@ -249,32 +290,10 @@ int __init process_shm(struct domain *d, struct >> kernel_info *kinfo, >> if ( dt_property_read_string(shm_node, "role", &role_str) == 0 ) >> owner_dom_io = false; > Looking at owner_dom_io, why don't you move parsing role and setting > owner_dom_io accordingly to handle_shared_mem_bank()? I think I wanted to keep all dt_* functions on the same level inside process_shm, otherwise yes, I could pass down shm_node and do the reading of role_str in handle_shared_mem_bank, or I could derive owner_dom_io from role_str being passed or not, something like: role_str = NULL; dt_property_read_string(shm_node, "role", &role_str) [inside handle_shared_mem_bank]: If ( role_str ) owner_dom_io = false; And pass only role_str to handle_shared_mem_bank. Is this comment to reduce the number of parameters passed? I guess it’s not for where we call dt_property_read_string isn’t it? Cheers, Luca
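The derivation Luca floats above can be sketched as a hypothetical helper (invented for illustration, not part of the series): pass `role_str` down (left `NULL` when no "role" property was found) and let the bank-handling code derive `owner_dom_io` itself, instead of threading a separate bool through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper modelling the proposed refactor: a bank with no
 * explicit "role" property is owned by dom_io. */
static bool owner_is_dom_io(const char *role_str)
{
    return role_str == NULL;
}
```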
Re: [PATCH net] xen-netfront: Add missing skb_mark_for_recycle
Hello, Please could we request a CVE for "xen-netfront: Add missing skb_mark_for_recycle" which is 037965402a010898d34f4e35327d22c0a95cd51f in Linus' tree. This is a kernel memory leak trigger-able from unprivileged userspace. I can't see any evidence of this fix having been assigned a CVE thus far on the linux-cve-annouce mailing list. Thanks, ~Andrew On 25/04/2024 4:13 pm, Greg KH wrote: > On Thu, Apr 25, 2024 at 02:39:38PM +0100, George Dunlap wrote: >> Greg, >> >> We're issuing an XSA for this; can you issue a CVE? > To ask for a cve, please contact c...@kernel.org as per our > documentation. Please provide the git id of the commit you wish to have > the cve assigned to. > > thanks, > > greg k-h
[PATCH v2] x86/cpu-policy: Fix migration from Ice Lake to Cascade Lake
Ever since Xen 4.14, there has been a latent bug with migration. While some toolstacks can level the features properly, they don't shrink feat.max_subleaf when all features have been dropped. This is because we *still* have not completed the toolstack side work for full CPU Policy objects. As a consequence, even when properly feature levelled, VMs can't migrate "backwards" across hardware which reduces feat.max_subleaf. One such example is Ice Lake (max_subleaf=2 for INTEL_PSFD) to Cascade Lake (max_subleaf=0). Extend the max policies feat.max_subleaf to the highest number Xen knows about, but leave the default policies matching the host. This will allow VMs with a higher feat.max_subleaf than strictly necessary to migrate in. Eventually we'll manage to teach the toolstack how to avoid creating such VMs in the first place, but there's still more work to do there. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné v2: * Adjust max policies rather than the host policy. --- xen/arch/x86/cpu-policy.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c index 4b6d96276399..f7e2910c01b5 100644 --- a/xen/arch/x86/cpu-policy.c +++ b/xen/arch/x86/cpu-policy.c @@ -590,6 +590,13 @@ static void __init calculate_pv_max_policy(void) unsigned int i; *p = host_cpu_policy; + +/* + * Some VMs may have a larger-than-necessary feat max_leaf. Allow them to + * migrate in. + */ +p->feat.max_subleaf = ARRAY_SIZE(p->feat.raw) - 1; + x86_cpu_policy_to_featureset(p, fs); for ( i = 0; i < ARRAY_SIZE(fs); ++i ) @@ -630,6 +637,10 @@ static void __init calculate_pv_def_policy(void) unsigned int i; *p = pv_max_cpu_policy; + +/* Default to the same max_subleaf as the host. 
*/ +p->feat.max_subleaf = host_cpu_policy.feat.max_subleaf; + x86_cpu_policy_to_featureset(p, fs); for ( i = 0; i < ARRAY_SIZE(fs); ++i ) @@ -666,6 +677,13 @@ static void __init calculate_hvm_max_policy(void) const uint32_t *mask; *p = host_cpu_policy; + +/* + * Some VMs may have a larger-than-necessary feat max_leaf. Allow them to + * migrate in. + */ +p->feat.max_subleaf = ARRAY_SIZE(p->feat.raw) - 1; + x86_cpu_policy_to_featureset(p, fs); mask = hvm_hap_supported() ? @@ -783,6 +801,10 @@ static void __init calculate_hvm_def_policy(void) const uint32_t *mask; *p = hvm_max_cpu_policy; + +/* Default to the same max_subleaf as the host. */ +p->feat.max_subleaf = host_cpu_policy.feat.max_subleaf; + x86_cpu_policy_to_featureset(p, fs); mask = hvm_hap_supported() ? base-commit: ebab808eb1bb8f24c7d0dd41b956e48cb1824b81 -- 2.30.2
Re: [PATCH 1/7] xen/arm: Lookup bootinfo shm bank during the mapping
Hi Michal, Thanks for your review. > On 6 May 2024, at 14:24, Michal Orzel wrote: > > Hi Luca, > > On 23/04/2024 10:25, Luca Fancellu wrote: >> >> >> The current static shared memory code is using bootinfo banks when it >> needs to find the number of borrower, so every time assign_shared_memory > s/borrower/borrowers Will fix > >> is called, the bank is searched in the bootinfo.shmem structure. >> >> There is nothing wrong with it, however the bank can be used also to >> retrieve the start address and size and also to pass less argument to >> assign_shared_memory. When retrieving the information from the bootinfo >> bank, it's also possible to move the checks on alignment to >> process_shm_node in the early stages. > Is this change really required for what you want to achieve? At the moment > the alignment checks > are done before first use, which requires these values to be aligned. FDT > processing part does not need it. That’s true, but it would separate better the parsing part, in the end what is the point of failing later if, for example, some value are passed but not aligned? > >> >> So create a new function find_shm() which takes a 'struct shared_meminfo' > Can we name it find_shm_bank() or find_shm_bank_by_id()? > I agree that it's better to use a unique ID rather than matching by > address/size Yes either names are good for me, I would use find_shm_bank_by_id > >> structure and the shared memory ID, to look for a bank with a matching ID, >> take the physical host address and size from the bank, pass the bank to >> assign_shared_memory() removing the now unnecessary arguments and finally >> remove the acquire_nr_borrower_domain() function since now the information >> can be extracted from the passed bank. >> Move the "xen,shm-id" parsing early in process_shm to bail out quickly in >> case of errors (unlikely), as said above, move the checks on alignment >> to process_shm_node. 
>> >> Drawback of this change is that now the bootinfo are used also when the >> bank doesn't need to be allocated, however it will be convinient later >> to use it as an argument for assign_shared_memory when dealing with >> the use case where the Host physical address is not supplied by the user. >> >> Signed-off-by: Luca Fancellu >> --- >> xen/arch/arm/static-shmem.c | 105 >> 1 file changed, 58 insertions(+), 47 deletions(-) >> >> diff --git a/xen/arch/arm/static-shmem.c b/xen/arch/arm/static-shmem.c >> index 09f474ec6050..f6cf74e58a83 100644 >> --- a/xen/arch/arm/static-shmem.c >> +++ b/xen/arch/arm/static-shmem.c >> @@ -19,29 +19,24 @@ static void __init __maybe_unused build_assertions(void) >> offsetof(struct shared_meminfo, bank))); >> } >> >> -static int __init acquire_nr_borrower_domain(struct domain *d, >> - paddr_t pbase, paddr_t psize, >> - unsigned long *nr_borrowers) >> +static const struct membank __init *find_shm(const struct membanks *shmem, >> + const char *shm_id) >> { >> -const struct membanks *shmem = bootinfo_get_shmem(); >> unsigned int bank; >> >> -/* Iterate reserved memory to find requested shm bank. */ >> +BUG_ON(!shmem || !shm_id); > Is it really necessary? For example, before calling find_shm(), strlen is > used on shm_id So, I guess I did that to have more robust code, in case someone changes the code in the future and perhaps removes something we rely on. If you object to them I will remove though, here and the other related points below. > >> + >> for ( bank = 0 ; bank < shmem->nr_banks; bank++ ) >> { >> -paddr_t bank_start = shmem->bank[bank].start; >> -paddr_t bank_size = shmem->bank[bank].size; >> - >> -if ( (pbase == bank_start) && (psize == bank_size) ) >> +if ( strncmp(shm_id, shmem->bank[bank].shmem_extra->shm_id, >> + MAX_SHM_ID_LENGTH) == 0 ) > Why not strcmp? 
AFAICS it's been validated many times already > >> break; >> } >> >> if ( bank == shmem->nr_banks ) >> -return -ENOENT; >> - >> -*nr_borrowers = shmem->bank[bank].shmem_extra->nr_shm_borrowers; >> +return NULL; >> >> -return 0; >> +return >bank[bank]; >> } >> >> /* >> @@ -103,14 +98,20 @@ static mfn_t __init acquire_shared_memory_bank(struct >> domain *d, >> return smfn; >> } >> >> -static int __init assign_shared_memory(struct domain *d, >> - paddr_t pbase, paddr_t psize, >> - paddr_t gbase) >> +static int __init assign_shared_memory(struct domain *d, paddr_t gbase, >> + const struct membank *shm_bank) >> { >> mfn_t smfn; >> int ret = 0; >> unsigned long nr_pages, nr_borrowers, i; >> struct page_info *page; >> +paddr_t pbase, psize; >> + >> +
Re: [PATCH 3/7] xen/p2m: put reference for superpage
Hi Julien, > On 7 May 2024, at 14:20, Julien Grall wrote: > > Hi Luca, > > On 23/04/2024 09:25, Luca Fancellu wrote: >> From: Penny Zheng >> We are doing foreign memory mapping for static shared memory, and >> there is a great possibility that it could be super mapped. > > Is this because we are mapping more than one page at the time? Can you point > me to the code? So, to be honest this patch was originally in Penny’s serie, my knowledge of this side of the codebase is very limited and so I pushed this one basically untouched. From what I can see in the serie the mappings are made in handle_shared_mem_bank, and map_regions_p2mt is called for one page at the time (allocated through the function allocate_domheap_memory (new function introduced in the serie). So is that the case that this patch is not needed? > >> But today, p2m_put_l3_page could not handle superpages. > > This was done on purpose. Xen is not preemptible and therefore we need to be > cautious how much work is done within the p2m code. > > With the below proposal, for 1GB mapping, we may end up to call put_page() up > to 512 * 512 = 262144 times. put_page() can free memory. This could be a very > long operation. > > Have you benchmark how long it would take? I did not, since its purpose was unclear to me and was not commented in the last serie from Penny. Cheers, Luca
Re: [PATCH 3/7] xen/p2m: put reference for superpage
Hi Luca, On 23/04/2024 09:25, Luca Fancellu wrote: From: Penny Zheng We are doing foreign memory mapping for static shared memory, and there is a great possibility that it could be super mapped. Is this because we are mapping more than one page at the time? Can you point me to the code? But today, p2m_put_l3_page could not handle superpages. This was done on purpose. Xen is not preemptible and therefore we need to be cautious how much work is done within the p2m code. With the below proposal, for 1GB mapping, we may end up to call put_page() up to 512 * 512 = 262144 times. put_page() can free memory. This could be a very long operation. Have you benchmark how long it would take? Cheers, -- Julien Grall
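The 262144 figure quoted above follows directly from the usual Arm LPAE layout (4KiB granule, 512 eight-byte descriptors per table); the arithmetic is worth spelling out since it is the crux of the preemption concern:

```c
#include <assert.h>

/* 4KiB translation table / 8-byte descriptor = 512 entries per level. */
#define LPAE_ENTRIES 512UL

/* A fully populated 1GiB (level-1 block) region spans 512 level-2 slots,
 * each resolving to 512 level-3 entries of 4KiB.  Tearing it down at page
 * granularity means one put_page() per 4KiB page. */
static unsigned long put_page_calls_for_1gb_mapping(void)
{
    return LPAE_ENTRIES * LPAE_ENTRIES;
}
```

Since Xen is not preemptible, a quarter of a million `put_page()` calls, each of which may free memory, is the long-running operation Julien is asking to benchmark.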
Re: [PATCH v6 8/8] xen: allow up to 16383 cpus
Hi Jan, On 06/05/2024 07:42, Jan Beulich wrote: Of course those calculations depend on what people choose as actual values, but giving an upper bound being a power of 2 may at least serve as a hint to them. This is rather a weak hint. If you want to encourage users to chose a power-of-2 value, then let's spell it out in the description. Cheers, -- Julien Grall
Re: [PATCH v6 8/8] xen: allow up to 16383 cpus
Hi Stefano, On 03/05/2024 20:07, Stefano Stabellini wrote: On Fri, 3 May 2024, Julien Grall wrote: [...] So are you saying that from Xen point of view, you are expecting no difference between 256 and 512. And therefore you would be happy if to backport patches if someone find differences (or even security issues) when using > 256 pCPUs? It is difficult to be sure about anything that it is not regularly tested. I am pretty sure someone in the community got Xen running on an Ampere, so like you said 192 is a good number. However, that is not regularly tested, so we don't have any regression checks in gitlab-ci or OSSTest for it. One approach would be to only support things regularly tested either by OSSTest, Gitlab-ci, or also Xen community members. I am not sure what would be the highest number with this way of thinking but likely no more than 192, probably less. I don't know the CPU core count of the biggest ARM machine in OSSTest. This would be rochester* (Cavium Thunder-X). They have 96 pCPUs which, IIRC, are split across two numa nodes. Another approach is to support a "sensible" number: not something tested but something we believe it should work. No regular testing. (In safety, they only believe in things that are actually tested, so this would not be OK. But this is security, not safety, just FYI.) With this approach, we could round up the number to a limit we think it won't break. If 192 works, 256/512 should work? I don't know but couldn't think of something that would break going from 192 to 256. It depends what you mean by work/break. Strictly speaking, Xen should run (i.e. not crash). However, it is unclear how well as if you increase the number of physical CPUs, you will increase contention and may find some bottleneck. I haven't done any performance testing with that many CPUs and I haven't seen any so far with Xen. But I have some areas of concerns. * Xenstored: At least the C version is single-threaded. 
Technically the limit here is not based on the number of pCPUs, but as you increase it, you indirectly increase the number of domains that can run. I doubt it will behave well if you have 4096 domains running (I am thinking about the x86 limit...). * Locking * How Xen use the locks: I don't think we have many places where we have global locks (one is the memory subsystem). If a lock is already taken, the others will spin. It is unclear if we could high contending. * How Xen implements the locks: At the moment, we are using LL/SC. My take of XSA-295 is there is a lack of fairness with them. I am not sure what would happen if they get contented (as we support more pCPUs). It is also probably time to finally implement LSE atomics. * TLB flush: The TLB flush are broadcasted. There are some suggestions on the Linux ML [1] that they don't perform well on some processors. The discussion seems to have gone nowhere in Linux. But I think it is propably worth to take into account when we decide to update the limit we (security) support. It depends on how strict we want to be on testing requirements. From above, I am rather worry about claiming that Xen can supports up to 256 (and TBH even 192) without any proper testing. This could end up to backfire as we may need to do (in a rush) and backport some rather large work (unless we decide to remove support after the fact). I think I would prefer if we have a low number until someone can do some testing (including potentially malicious guest). If we want for a power-of-two, I would go with 128 because this is closer to the HW we have in testing. If in the future someone can show some data on other platforms (e.g. Ampere), then we can up the limit. > I am not > sure what approach was taken by x86 so far. It is unclear to me. I don't see how we can claim to support up to 4096 CPUs. But that's for the x86 folks to decide. 
Cheers, [1] https://lore.kernel.org/linux-arm-kernel/20190617143255.10462-1-indou.ta...@jp.fujitsu.com/ -- Julien Grall
[PATCH v7 5/6] [DO NOT APPLY] switch to qemu fork
This makes the tests use the patched QEMU, to actually test the new behavior. --- Config.mk | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Config.mk b/Config.mk index a962f095ca16..5e220a1284e4 100644 --- a/Config.mk +++ b/Config.mk @@ -220,8 +220,8 @@ endif OVMF_UPSTREAM_URL ?= https://xenbits.xen.org/git-http/ovmf.git OVMF_UPSTREAM_REVISION ?= ba91d0292e593df8528b66f99c1b0b14fadc8e16 -QEMU_UPSTREAM_URL ?= https://xenbits.xen.org/git-http/qemu-xen.git -QEMU_UPSTREAM_REVISION ?= master +QEMU_UPSTREAM_URL ?= https://github.com/marmarek/qemu +QEMU_UPSTREAM_REVISION ?= origin/msix MINIOS_UPSTREAM_URL ?= https://xenbits.xen.org/git-http/mini-os.git MINIOS_UPSTREAM_REVISION ?= b6a5b4d72b88e5c4faed01f5a44505de022860fc -- git-series 0.9.1
[PATCH v7 1/6] x86/msi: Extend per-domain/device warning mechanism
The arch_msix struct had a single "warned" field with a domid for which a warning was issued. An upcoming patch will need a similar mechanism for a few more warnings, so change it to save a bit field of issued warnings. Signed-off-by: Marek Marczykowski-Górecki Reviewed-by: Jan Beulich --- Changes in v6: - add MSIX_CHECK_WARN macro (Jan) - drop struct name from warned_kind union (Jan) New in v5 --- xen/arch/x86/include/asm/msi.h | 17 - xen/arch/x86/msi.c | 5 + 2 files changed, 17 insertions(+), 5 deletions(-) diff --git a/xen/arch/x86/include/asm/msi.h b/xen/arch/x86/include/asm/msi.h index 997ccb87be0c..bcfdfd35345d 100644 --- a/xen/arch/x86/include/asm/msi.h +++ b/xen/arch/x86/include/asm/msi.h @@ -208,6 +208,15 @@ struct msg_address { PCI_MSIX_ENTRY_SIZE + \ (~PCI_MSIX_BIRMASK & (PAGE_SIZE - 1))) +#define MSIX_CHECK_WARN(msix, domid, which) ({ \ +if ( (msix)->warned_domid != (domid) ) \ +{ \ +(msix)->warned_domid = (domid); \ +(msix)->warned_kind.all = 0; \ +} \ +(msix)->warned_kind.which ? false : ((msix)->warned_kind.which = true); \ +}) + struct arch_msix { unsigned int nr_entries, used_entries; struct { @@ -217,7 +226,13 @@ struct arch_msix { int table_idx[MAX_MSIX_TABLE_PAGES]; spinlock_t table_lock; bool host_maskall, guest_maskall; -domid_t warned; +domid_t warned_domid; +union { +uint8_t all; +struct { +bool maskall : 1; +}; +} warned_kind; }; void early_msi_init(void); diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c index e721aaf5c001..42c793426da3 100644 --- a/xen/arch/x86/msi.c +++ b/xen/arch/x86/msi.c @@ -364,13 +364,10 @@ static bool msi_set_mask_bit(struct irq_desc *desc, bool host, bool guest) domid_t domid = pdev->domain->domain_id; maskall = true; -if ( pdev->msix->warned != domid ) -{ -pdev->msix->warned = domid; +if ( MSIX_CHECK_WARN(pdev->msix, domid, maskall) ) printk(XENLOG_G_WARNING "cannot mask IRQ %d: masking MSI-X on Dom%d's %pp\n", desc->irq, domid, &pdev->sbdf); -} } pdev->msix->host_maskall = maskall; if ( maskall || pdev->msix->guest_maskall 
) -- git-series 0.9.1
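The warn-once macro from the hunk above can be exercised in isolation. The sketch below lifts the `MSIX_CHECK_WARN` logic out of the patch with minimal stub types (`domid_t`, a trimmed `struct arch_msix`) so it compiles outside Xen; like the original, it relies on GCC statement expressions. The intended semantics: the first check for a given (domain, warning-kind) pair reports "warn now", repeats are suppressed, and a change of domain resets all warning kinds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stub types standing in for the Xen definitions. */
typedef uint16_t domid_t;

struct arch_msix {
    domid_t warned_domid;
    union {
        uint8_t all;
        struct {
            bool maskall : 1;
        };
    } warned_kind;
};

/* As in the patch: evaluates to true exactly once per (domid, kind)
 * until a different domid resets the tracked kinds. */
#define MSIX_CHECK_WARN(msix, domid, which) ({                              \
    if ( (msix)->warned_domid != (domid) )                                  \
    {                                                                       \
        (msix)->warned_domid = (domid);                                     \
        (msix)->warned_kind.all = 0;                                        \
    }                                                                       \
    (msix)->warned_kind.which ? false : ((msix)->warned_kind.which = true); \
})
```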
[PATCH v7 3/6] automation: prevent QEMU access to /dev/mem in PCI passthrough tests
/dev/mem access doesn't work in dom0 in lockdown and in stubdomain. Simulate this environment by removing the /dev/mem device node. A full test for lockdown and stubdomain will come later, when all requirements are in place. Signed-off-by: Marek Marczykowski-Górecki Acked-by: Stefano Stabellini --- This can be applied only after QEMU change is committed. Otherwise the test will fail. --- automation/scripts/qubes-x86-64.sh | 2 ++ 1 file changed, 2 insertions(+) diff --git a/automation/scripts/qubes-x86-64.sh b/automation/scripts/qubes-x86-64.sh index d81ed7b931cf..7eabc1bd6ad4 100755 --- a/automation/scripts/qubes-x86-64.sh +++ b/automation/scripts/qubes-x86-64.sh @@ -163,6 +163,8 @@ ifconfig eth0 up ifconfig xenbr0 up ifconfig xenbr0 192.168.0.1 +# ensure QEMU wont have access /dev/mem +rm -f /dev/mem # get domU console content into test log tail -F /var/log/xen/console/guest-domU.log 2>/dev/null | sed -e \"s/^/(domU) /\" & xl create /etc/xen/domU.cfg -- git-series 0.9.1
[PATCH v7 6/6] [DO NOT APPLY] switch to alternative artifact repo
For testing, switch to my containers registry that includes containers rebuilt with changes in this series. --- automation/gitlab-ci/build.yaml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml index 49d6265ad5b4..0d7e311417d8 100644 --- a/automation/gitlab-ci/build.yaml +++ b/automation/gitlab-ci/build.yaml @@ -320,7 +320,7 @@ qemu-system-ppc64-8.1.0-ppc64-export: alpine-3.18-rootfs-export: extends: .test-jobs-artifact-common - image: registry.gitlab.com/xen-project/xen/tests-artifacts/alpine:3.18 + image: registry.gitlab.com/xen-project/people/marmarek/xen/tests-artifacts/alpine:3.18 script: - mkdir binaries && cp /initrd.tar.gz binaries/initrd.tar.gz artifacts: @@ -331,7 +331,7 @@ alpine-3.18-rootfs-export: kernel-6.1.19-export: extends: .test-jobs-artifact-common - image: registry.gitlab.com/xen-project/xen/tests-artifacts/kernel:6.1.19 + image: registry.gitlab.com/xen-project/people/marmarek/xen/tests-artifacts/kernel:6.1.19 script: - mkdir binaries && cp /bzImage binaries/bzImage artifacts: -- git-series 0.9.1
[PATCH v7 4/6] automation: switch to a wifi card on ADL system
Switch to a wifi card that has registers on an MSI-X page. This tests the "x86/hvm: Allow writes to registers on the same page as MSI-X table" feature. Switch it only for the HVM test, because MSI-X adjacent writes are not supported on PV. This also requires including drivers and firmware in the test system. Remove firmware unrelated to the test, to avoid increasing initrd size too much (all firmware takes over 100MB compressed). Finally, adjust the test script to handle not only eth0 as a test device but also wlan0, and connect it to the wifi network. Signed-off-by: Marek Marczykowski-Górecki Reviewed-by: Stefano Stabellini --- This needs two new gitlab variables: WIFI_HW2_SSID and WIFI_HW2_PSK. I'll provide them in private. This change requires rebuilding test containers. This can be applied only after the QEMU change is committed. Otherwise the test will fail. --- automation/gitlab-ci/test.yaml | 4 automation/scripts/qubes-x86-64.sh | 7 +++ automation/tests-artifacts/alpine/3.18.dockerfile | 7 +++ automation/tests-artifacts/kernel/6.1.19.dockerfile | 2 ++ 4 files changed, 20 insertions(+) diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml index ad249fa0a5d9..6803cae116b5 100644 --- a/automation/gitlab-ci/test.yaml +++ b/automation/gitlab-ci/test.yaml @@ -193,6 +193,10 @@ adl-pci-pv-x86-64-gcc-debug: adl-pci-hvm-x86-64-gcc-debug: extends: .adl-x86-64 + variables: +PCIDEV: "00:14.3" +WIFI_SSID: "$WIFI_HW2_SSID" +WIFI_PSK: "$WIFI_HW2_PSK" script: - ./automation/scripts/qubes-x86-64.sh pci-hvm 2>&1 | tee ${LOGFILE} needs: diff --git a/automation/scripts/qubes-x86-64.sh b/automation/scripts/qubes-x86-64.sh index 7eabc1bd6ad4..60498ef1e89a 100755 --- a/automation/scripts/qubes-x86-64.sh +++ b/automation/scripts/qubes-x86-64.sh @@ -94,6 +94,13 @@ on_reboot = "destroy" domU_check=" set -x -e interface=eth0 +if [ -e /sys/class/net/wlan0 ]; then +interface=wlan0 +set +x +wpa_passphrase "$WIFI_SSID" "$WIFI_PSK" > /etc/wpa_supplicant.conf +set -x +wpa_supplicant 
-B -iwlan0 -c /etc/wpa_supplicant.conf +fi ip link set \"\$interface\" up timeout 30s udhcpc -i \"\$interface\" pingip=\$(ip -o -4 r show default|cut -f 3 -d ' ') diff --git a/automation/tests-artifacts/alpine/3.18.dockerfile b/automation/tests-artifacts/alpine/3.18.dockerfile index 9cde6c9ad4da..c323e266c7da 100644 --- a/automation/tests-artifacts/alpine/3.18.dockerfile +++ b/automation/tests-artifacts/alpine/3.18.dockerfile @@ -34,6 +34,13 @@ RUN \ apk add udev && \ apk add pciutils && \ apk add libelf && \ + apk add wpa_supplicant && \ + # Select firmware for hardware tests + apk add linux-firmware-other && \ + mkdir /lib/firmware-preserve && \ + mv /lib/firmware/iwlwifi-so-a0-gf-a0* /lib/firmware-preserve/ && \ + rm -rf /lib/firmware && \ + mv /lib/firmware-preserve /lib/firmware && \ \ # Xen cd / && \ diff --git a/automation/tests-artifacts/kernel/6.1.19.dockerfile b/automation/tests-artifacts/kernel/6.1.19.dockerfile index 3a4096780d20..84ed5dff23ae 100644 --- a/automation/tests-artifacts/kernel/6.1.19.dockerfile +++ b/automation/tests-artifacts/kernel/6.1.19.dockerfile @@ -32,6 +32,8 @@ RUN curl -fsSLO https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-"$LINUX_VERSI make xen.config && \ scripts/config --enable BRIDGE && \ scripts/config --enable IGC && \ +scripts/config --enable IWLWIFI && \ +scripts/config --enable IWLMVM && \ cp .config .config.orig && \ cat .config.orig | grep XEN | grep =m |sed 's/=m/=y/g' >> .config && \ make -j$(nproc) bzImage && \ -- git-series 0.9.1
[PATCH v7 0/6] MSI-X support with qemu in stubdomain, and other related changes
This series includes changes to make MSI-X work with a Linux stubdomain, and especially with the Intel Wifi 6 AX210 card. This takes care of the remaining reasons for QEMU to access /dev/mem, and also of the Intel Wifi card violating the spec by putting some registers on the same page as the MSI-X table. Besides the stubdomain case (which I care about more), this is also necessary for PCI passthrough to work with lockdown enabled in dom0 (when QEMU runs in dom0). See individual patches for details. This series also includes tests for MSI-X using the new approach (by preventing QEMU access to /dev/mem). But for them to work, a QEMU change that makes use of the changes introduced here is needed. It can be seen at https://github.com/marmarek/qemu/commits/msix Here is the pipeline that used the QEMU fork above: https://gitlab.com/xen-project/people/marmarek/xen/-/pipelines/1280152273 (the build failures are due to issues with fetching or building newer QEMU discussed on Matrix) v7: - "x86/msi: passthrough all MSI-X vector ctrl writes to device model" is already applied Marek Marczykowski-Górecki (6): x86/msi: Extend per-domain/device warning mechanism x86/hvm: Allow access to registers on the same page as MSI-X table automation: prevent QEMU access to /dev/mem in PCI passthrough tests automation: switch to a wifi card on ADL system [DO NOT APPLY] switch to qemu fork [DO NOT APPLY] switch to alternative artifact repo Config.mk | 4 +- automation/gitlab-ci/build.yaml | 4 +- automation/gitlab-ci/test.yaml | 4 +- automation/scripts/qubes-x86-64.sh | 9 +- automation/tests-artifacts/alpine/3.18.dockerfile | 7 +- automation/tests-artifacts/kernel/6.1.19.dockerfile | 2 +- xen/arch/x86/hvm/vmsi.c | 205 - xen/arch/x86/include/asm/msi.h | 22 +- xen/arch/x86/msi.c | 47 ++- 9 files changed, 285 insertions(+), 19 deletions(-) base-commit: ebab808eb1bb8f24c7d0dd41b956e48cb1824b81 -- git-series 0.9.1
[PATCH v7 2/6] x86/hvm: Allow access to registers on the same page as MSI-X table
Some devices (notably the Intel Wifi 6 AX210 card) keep auxiliary registers on the same page as the MSI-X table. The device model (especially one in a stubdomain) cannot really handle those, as direct writes to that page are refused (the page is on the mmio_ro_ranges list). Instead, extend msixtbl_mmio_ops to handle such accesses too. Doing this requires correlating the read/write location with the guest MSI-X table address. Since QEMU doesn't map the MSI-X table to the guest, it requires msixtbl_entry->gtable, which is HVM-only. A similar feature for PV would need to be done separately. This will also be used to read the Pending Bit Array, if it lives on the same page, so that QEMU doesn't need /dev/mem access at all (especially helpful with lockdown enabled in dom0). If the PBA lives on another page, QEMU will map it to the guest directly. If the PBA lives on the same page, discard writes and log a message. Technically, writes outside of the PBA could be allowed, but at this moment the precise location of the PBA isn't saved, and also no known device abuses the spec in this way (at least yet). To access those registers, msixtbl_mmio_ops needs the relevant page mapped. MSI handling already has infrastructure for that, using the fixmap, so try to map the first/last page of the MSI-X table (if necessary) and save their fixmap indexes. Note that msix_get_fixmap() does reference counting and reuses an existing mapping, so just call it directly, even if the page was mapped before. Also, it uses a specific range of fixmap indexes which doesn't include 0, so use 0 as the default ("not mapped") value - which simplifies the code a bit. Based on the assumption that all MSI-X page accesses are handled by Xen, do not forward adjacent accesses to other hypothetical ioreq servers, even if the access wasn't handled for some reason (failure to map pages etc). Relevant places log a message about that already. 
Signed-off-by: Marek Marczykowski-Górecki --- Changes in v7: - simplify logic based on assumption that all access to MSI-X pages are handled by Xen (Roger) - move calling adjacent_handle() into adjacent_{read,write}() (Roger) - move range check into msixtbl_addr_to_desc() (Roger) - fix off-by-one when initializing adj_access_idx[ADJ_IDX_LAST] (Roger) - no longer distinguish between unhandled write due to PBA nearby and other reasons - add missing break after ASSERT_UNREACHABLE (Jan) Changes in v6: - use MSIX_CHECK_WARN macro - extend assert on fixmap_idx - add break in default label, after ASSERT_UNREACHABLE(), and move setting default there - style fixes Changes in v5: - style fixes - include GCC version in the commit message - warn only once (per domain, per device) about failed adjacent access Changes in v4: - drop same_page parameter of msixtbl_find_entry(), distinguish two cases in relevant callers - rename adj_access_table_idx to adj_access_idx - code style fixes - drop alignment check in adjacent_{read,write}() - all callers already have it earlier - delay mapping first/last MSI-X pages until preparing device for a passthrough v3: - merge handling into msixtbl_mmio_ops - extend commit message v2: - adjust commit message - pass struct domain to msixtbl_page_handler_get_hwaddr() - reduce local variables used only once - log a warning if write is forbidden if MSI-X and PBA lives on the same page - do not passthrough unaligned accesses - handle accesses both before and after MSI-X table --- xen/arch/x86/hvm/vmsi.c| 205 -- xen/arch/x86/include/asm/msi.h | 5 +- xen/arch/x86/msi.c | 42 +++- 3 files changed, 242 insertions(+), 10 deletions(-) diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c index 17983789..f7b7b4998b5e 100644 --- a/xen/arch/x86/hvm/vmsi.c +++ b/xen/arch/x86/hvm/vmsi.c @@ -180,6 +180,10 @@ static bool msixtbl_initialised(const struct domain *d) return d->arch.hvm.msixtbl_list.next; } +/* + * Lookup an msixtbl_entry on the same page as 
given addr. It's up to the + * caller to check if address is strictly part of the table - if relevant. + */ static struct msixtbl_entry *msixtbl_find_entry( struct vcpu *v, unsigned long addr) { @@ -187,8 +191,8 @@ static struct msixtbl_entry *msixtbl_find_entry( struct domain *d = v->domain; list_for_each_entry( entry, >arch.hvm.msixtbl_list, list ) -if ( addr >= entry->gtable && - addr < entry->gtable + entry->table_len ) +if ( PFN_DOWN(addr) >= PFN_DOWN(entry->gtable) && + PFN_DOWN(addr) <= PFN_DOWN(entry->gtable + entry->table_len - 1) ) return entry; return NULL; @@ -203,6 +207,10 @@ static struct msi_desc *msixtbl_addr_to_desc( if ( !entry || !entry->pdev ) return NULL; +if ( addr < entry->gtable || + addr >= entry->gtable + entry->table_len ) +return NULL; + nr_entry = (addr - entry->gtable) / PCI_MSIX_ENTRY_SIZE; list_for_each_entry( desc, >pdev->msi_list, list ) @@ -213,6 +221,152 @@ static struct
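The split the hunks above introduce - a page-granular match in msixtbl_find_entry() and a strict in-table check moved into msixtbl_addr_to_desc() - can be modeled with two small predicates. This is a hedged sketch with hypothetical names (struct table_range, same_page_as_table), not the real msixtbl_entry code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)

/* Hypothetical flattened view of an msixtbl_entry's guest table range. */
struct table_range {
    uint64_t gtable;     /* guest address of the MSI-X table */
    uint64_t table_len;  /* table length in bytes */
};

/* Page-granular match, as the reworked msixtbl_find_entry() does:
 * the address only has to share a page with the table. */
bool same_page_as_table(const struct table_range *t, uint64_t addr)
{
    return PFN_DOWN(addr) >= PFN_DOWN(t->gtable) &&
           PFN_DOWN(addr) <= PFN_DOWN(t->gtable + t->table_len - 1);
}

/* Strict in-table check, as moved into msixtbl_addr_to_desc(). */
bool strictly_in_table(const struct table_range *t, uint64_t addr)
{
    return addr >= t->gtable && addr < t->gtable + t->table_len;
}
```

An access that satisfies the first predicate but not the second is exactly the "adjacent" case the patch now emulates instead of bouncing to a device model.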
Re: [PATCH 3/7] xen/p2m: put reference for superpage
Hi Luca, On 23/04/2024 10:25, Luca Fancellu wrote: > > > From: Penny Zheng > > We are doing foreign memory mapping for static shared memory, and > there is a great possibility that it could be super mapped. > But today, p2m_put_l3_page could not handle superpages. > > This commits implements a new function p2m_put_superpage to handle superpages, > specifically for helping put extra references for foreign superpages. > > Signed-off-by: Penny Zheng > Signed-off-by: Luca Fancellu > --- > v1: > - patch from > https://patchwork.kernel.org/project/xen-devel/patch/20231206090623.1932275-9-penny.zh...@arm.com/ > --- > xen/arch/arm/mmu/p2m.c | 58 +++--- > 1 file changed, 43 insertions(+), 15 deletions(-) > > diff --git a/xen/arch/arm/mmu/p2m.c b/xen/arch/arm/mmu/p2m.c > index 41fcca011cf4..479a80fbd4cf 100644 > --- a/xen/arch/arm/mmu/p2m.c > +++ b/xen/arch/arm/mmu/p2m.c > @@ -753,17 +753,9 @@ static int p2m_mem_access_radix_set(struct p2m_domain > *p2m, gfn_t gfn, > return rc; > } > > -/* > - * Put any references on the single 4K page referenced by pte. > - * TODO: Handle superpages, for now we only take special references for leaf > - * pages (specifically foreign ones, which can't be super mapped today). > - */ > -static void p2m_put_l3_page(const lpae_t pte) > +/* Put any references on the single 4K page referenced by mfn. */ > +static void p2m_put_l3_page(mfn_t mfn, unsigned type) Shouldn't type be of p2m_type_t? > { > -mfn_t mfn = lpae_get_mfn(pte); > - > -ASSERT(p2m_is_valid(pte)); > - > /* > * TODO: Handle other p2m types > * > @@ -771,16 +763,53 @@ static void p2m_put_l3_page(const lpae_t pte) > * flush the TLBs if the page is reallocated before the end of > * this loop. > */ > -if ( p2m_is_foreign(pte.p2m.type) ) > +if ( p2m_is_foreign(type) ) > { > ASSERT(mfn_valid(mfn)); > put_page(mfn_to_page(mfn)); > } > /* Detect the xenheap page and mark the stored GFN as invalid. 
*/ > -else if ( p2m_is_ram(pte.p2m.type) && is_xen_heap_mfn(mfn) ) > +else if ( p2m_is_ram(type) && is_xen_heap_mfn(mfn) ) > page_set_xenheap_gfn(mfn_to_page(mfn), INVALID_GFN); > } > > +/* Put any references on the superpage referenced by mfn. */ > +static void p2m_put_superpage(mfn_t mfn, unsigned int next_level, unsigned > type) Shouldn't type be of p2m_type_t? > +{ > +unsigned int i; > +unsigned int level_order = XEN_PT_LEVEL_ORDER(next_level); > + > +for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ ) > +{ > +if ( next_level == 3 ) > +p2m_put_l3_page(mfn, type); > +else > +p2m_put_superpage(mfn, next_level + 1, type); > + > +mfn = mfn_add(mfn, 1 << level_order); > +} > +} > + > +/* Put any references on the page referenced by pte. */ > +static void p2m_put_page(const lpae_t pte, unsigned int level) > +{ > +mfn_t mfn = lpae_get_mfn(pte); > + > +ASSERT(p2m_is_valid(pte)); > + > +/* > + * We are either having a first level 1G superpage or a > + * second level 2M superpage. > + */ > +if ( p2m_is_superpage(pte, level) ) > +return p2m_put_superpage(mfn, level + 1, pte.p2m.type); > +else No need for this else > +{ > +ASSERT(level == 3); > +return p2m_put_l3_page(mfn, pte.p2m.type); > +} > +} > + > /* Free lpae sub-tree behind an entry */ > static void p2m_free_entry(struct p2m_domain *p2m, > lpae_t entry, unsigned int level) > @@ -809,9 +838,8 @@ static void p2m_free_entry(struct p2m_domain *p2m, > #endif > > p2m->stats.mappings[level]--; > -/* Nothing to do if the entry is a super-page. */ > -if ( level == 3 ) > -p2m_put_l3_page(entry); > +p2m_put_page(entry, level); > + > return; > } > > -- > 2.34.1 > ~Michal
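To see the shape of the recursion being reviewed above: a put at level N walks all entries at each level (XEN_PT_LPAE_ENTRIES of them) down to the 4K leaves. A hedged sketch that merely counts the leaf pages such a walk would visit - leaf_pages_put() is a hypothetical model, not the Xen function:

```c
#include <assert.h>

#define ENTRIES_PER_LEVEL 512  /* stand-in for XEN_PT_LPAE_ENTRIES */

/*
 * Count the 4K leaf pages a recursive put starting at `next_level`
 * would touch: each non-leaf level fans out into 512 entries, and
 * level 3 entries are the 4K pages handed to p2m_put_l3_page().
 */
unsigned long leaf_pages_put(unsigned int next_level)
{
    unsigned long count = 0;
    unsigned int i;

    for ( i = 0; i < ENTRIES_PER_LEVEL; i++ )
    {
        if ( next_level == 3 )
            count += 1;                      /* one 4K leaf page */
        else
            count += leaf_pages_put(next_level + 1);
    }
    return count;
}
```

So a level-2 2M superpage expands to 512 leaf puts and a level-1 1G superpage to 512 * 512 of them, which is why the per-leaf work (reference drops) must stay cheap.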
Re: [PATCH] x86/cpu-policy: Fix migration from Ice Lake to Cascade Lake
On 07/05/2024 12:50 pm, Roger Pau Monné wrote: > On Tue, May 07, 2024 at 12:29:57PM +0100, Andrew Cooper wrote: >> Ever since Xen 4.14, there has been a latent bug with migration. >> >> While some toolstacks can level the features properly, they don't shink >> feat.max_subleaf when all features have been dropped. This is because >> we *still* have not completed the toolstack side work for full CPU Policy >> objects. >> >> As a consequence, even when properly feature levelled, VMs can't migrate >> "backwards" across hardware which reduces feat.max_subleaf. One such example >> is Ice Lake (max_subleaf=2 for INTEL_PSFD) to Cascade Lake (max_subleaf=0). >> >> Extend the host policy's feat.max_subleaf to the hightest number Xen knows >> about, similarly to how we extend extd.max_leaf for LFENCE_DISPATCH. This >> will allow VMs with a higher feat.max_subleaf than strictly necessary to >> migrate in. > Seeing what we do for max_extd_leaf, shouldn't we switch to doing what > you propose for feat.max_subleaf to max_extd_leaf also? > > To allow migration between hosts that have 0x8021.eax and hosts > that don't have such extended leaf. > > cpu_has_lfence_dispatch kind of does that, but if lfence cannot be > made serializing then the max extended leaf is not expanded. And we > should also likely account for more feature leafs possibly appearing > after 0x8021? On second thoughts, this adjustment ought to be in the max policies only. It's slightly different to LFENCE_DISPATCH, in that we don't actually have any set bits in those leaves. I'll do a different patch. ~Andrew
Re: [PATCH] x86/cpu-policy: Fix migration from Ice Lake to Cascade Lake
On Tue, May 07, 2024 at 12:29:57PM +0100, Andrew Cooper wrote: > Ever since Xen 4.14, there has been a latent bug with migration. > > While some toolstacks can level the features properly, they don't shink > feat.max_subleaf when all features have been dropped. This is because > we *still* have not completed the toolstack side work for full CPU Policy > objects. > > As a consequence, even when properly feature levelled, VMs can't migrate > "backwards" across hardware which reduces feat.max_subleaf. One such example > is Ice Lake (max_subleaf=2 for INTEL_PSFD) to Cascade Lake (max_subleaf=0). > > Extend the host policy's feat.max_subleaf to the hightest number Xen knows > about, similarly to how we extend extd.max_leaf for LFENCE_DISPATCH. This > will allow VMs with a higher feat.max_subleaf than strictly necessary to > migrate in. Seeing what we do for max_extd_leaf, shouldn't we switch to doing what you propose for feat.max_subleaf to max_extd_leaf also? To allow migration between hosts that have 0x8021.eax and hosts that don't have such extended leaf. cpu_has_lfence_dispatch kind of does that, but if lfence cannot be made serializing then the max extended leaf is not expanded. And we should also likely account for more feature leafs possibly appearing after 0x8021? Thanks, Roger.
Re: [PATCH] tools/xl: Open xldevd.log with O_CLOEXEC
On Tue, May 07, 2024 at 12:08:06PM +0100, Andrew Cooper wrote: > `xl devd` has been observed leaking /var/log/xldevd.log into children. > > Link: https://github.com/QubesOS/qubes-issues/issues/8292 > Reported-by: Demi Marie Obenour > Signed-off-by: Andrew Cooper > --- > CC: Anthony PERARD > CC: Juergen Gross > CC: Demi Marie Obenour > CC: Marek Marczykowski-Górecki > > Also entirely speculative based on the QubesOS ticket. > --- > tools/xl/xl_utils.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/tools/xl/xl_utils.c b/tools/xl/xl_utils.c > index 17489d182954..060186db3a59 100644 > --- a/tools/xl/xl_utils.c > +++ b/tools/xl/xl_utils.c > @@ -270,7 +270,7 @@ int do_daemonize(const char *name, const char *pidfile) > exit(-1); > } > > -CHK_SYSCALL(logfile = open(fullname, O_WRONLY|O_CREAT|O_APPEND, 0644)); > +CHK_SYSCALL(logfile = open(fullname, O_WRONLY | O_CREAT | O_APPEND | > O_CLOEXEC, 0644)); This one might not be enough, as the FD gets dup2()-ed to stdout/stderr just outside of the context here, and then inherited by various hotplug scripts. Just adding O_CLOEXEC here means the hotplug scripts will run with stdout/stderr closed. The scripts shipped with Xen do redirect stderr to a log quite early, but a) it doesn't do it for stdout, and b) custom hotplug scripts are a valid use case. Without that, I see at least a few potential issues: - some log messages may be lost (minor, but annoying) - something might simply fail on writing to a closed FD, breaking the hotplug script - FD 1 will be used as the first free FD for any open() or similar call - if a tool later tries writing something to stdout, it will get written to that FD - worst of all three What should be the behavior of hotplug scripts logging? Should they always take care of their own logging? If so, the hotplug calling part should redirect stdout/stderr to /dev/null IMO. 
But if `xl` should provide some default logging for them (like, the xldevd.log here?), then the O_CLOEXEC should be set only after duplicating logfile over stdout/err. > free(fullname); > assert(logfile >= 3); > > > base-commit: ebab808eb1bb8f24c7d0dd41b956e48cb1824b81 > prerequisite-patch-id: 212e50457e9b6bdfd06a97da545a5aa7155bb919 Which one is this? I don't see it in staging, nor in any of your branches on xenbits. Lore finds "tools/libxs: Open /dev/xen/xenbus fds as O_CLOEXEC" which I guess is correct, but I have no idea how it correlates it, as this hash doesn't appear anywhere in the message, nor its headers... -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab signature.asc Description: PGP signature
[PATCH] x86/cpu-policy: Fix migration from Ice Lake to Cascade Lake
Ever since Xen 4.14, there has been a latent bug with migration. While some toolstacks can level the features properly, they don't shrink feat.max_subleaf when all features have been dropped. This is because we *still* have not completed the toolstack side work for full CPU Policy objects. As a consequence, even when properly feature levelled, VMs can't migrate "backwards" across hardware which reduces feat.max_subleaf. One such example is Ice Lake (max_subleaf=2 for INTEL_PSFD) to Cascade Lake (max_subleaf=0). Extend the host policy's feat.max_subleaf to the highest number Xen knows about, similarly to how we extend extd.max_leaf for LFENCE_DISPATCH. This will allow VMs with a higher feat.max_subleaf than strictly necessary to migrate in. Eventually we'll manage to teach the toolstack how to avoid creating such VMs in the first place, but there's still more work to do there. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné --- xen/arch/x86/cpu-policy.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c index 4b6d96276399..a216fc8b886f 100644 --- a/xen/arch/x86/cpu-policy.c +++ b/xen/arch/x86/cpu-policy.c @@ -373,8 +373,13 @@ static void __init calculate_host_policy(void) p->basic.max_leaf = min_t(uint32_t, p->basic.max_leaf, ARRAY_SIZE(p->basic.raw) - 1); -p->feat.max_subleaf = -min_t(uint32_t, p->feat.max_subleaf, ARRAY_SIZE(p->feat.raw) - 1); + +/* + * p->feat is "just" featureset information. We know about more than may + * be present in this hardware. Also, VMs may have a higher max_subleaf + * than strictly necessary, and we can accept those too. + */ +p->feat.max_subleaf = ARRAY_SIZE(p->feat.raw) - 1; max_extd_leaf = p->extd.max_leaf; base-commit: ebab808eb1bb8f24c7d0dd41b956e48cb1824b81 -- 2.30.2
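The behavioural change in the hunk above amounts to replacing a min() clamp with a fixed expansion to everything Xen knows about. A toy model of the two policies, assuming a hypothetical FEAT_RAW_ENTRIES in place of ARRAY_SIZE(p->feat.raw):

```c
#include <stdint.h>

#define FEAT_RAW_ENTRIES 2  /* hypothetical ARRAY_SIZE(p->feat.raw) */

/*
 * Before the patch: the host policy clamped max_subleaf down to what the
 * hardware reported, so a VM levelled on Ice Lake (max_subleaf=2) could
 * not migrate to Cascade Lake (max_subleaf=0) even with the relevant
 * feature bits cleared.
 */
uint32_t host_max_subleaf_old(uint32_t hw_max_subleaf)
{
    return hw_max_subleaf < FEAT_RAW_ENTRIES - 1
               ? hw_max_subleaf
               : FEAT_RAW_ENTRIES - 1;
}

/*
 * After the patch: always advertise every subleaf Xen knows about, so an
 * incoming VM with a larger-than-necessary max_subleaf is accepted.
 */
uint32_t host_max_subleaf_new(uint32_t hw_max_subleaf)
{
    (void)hw_max_subleaf;  /* hardware value no longer constrains this */
    return FEAT_RAW_ENTRIES - 1;
}
```

The later replies in the thread suggest restricting this expansion to the max policies, since the host default doesn't actually set any bits in the extra subleaves.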
Re: [XEN PATCH v4 5/5] xen/arm: ffa: support notification
Hi Julien, On Fri, May 3, 2024 at 4:25 PM Julien Grall wrote: > > Hi Jens, > > On 03/05/2024 14:54, Jens Wiklander wrote: > >>> +static int ffa_setup_irq_callback(struct notifier_block *nfb, > >>> + unsigned long action, void *hcpu) > >>> +{ > >>> +unsigned int cpu = (unsigned long)hcpu; > >>> +struct notif_irq_info irq_info = { }; > >>> + > >>> +switch ( action ) > >>> +{ > >>> +case CPU_ONLINE: > >> > >> Can't you execute the notifier in CPU_STARTING? This will be called on > >> the CPU directly, so you should be able to use request_irq(...). > > > > I tried that first but it failed with the ASSERT_ALLOC_CONTEXT() in > > _xmalloc(). > > > > I've also tested a three-step solution with CPU_UP_PREPARE, > > CPU_STARTING, and CPU_UP_CANCELED. > > My approach here is more direct, but it still suffers from a weakness > > in error handling even if it seems quite unlikely to run out of heap > > or for setup_irq() to fail at this stage. > > Ah I didn't notice that notify_cpu_starting() is called with IRQ > disabled. I assumed they would be enabled. > > Then I would consider to do: > > diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c > index 6efed876782e..db322672e508 100644 > --- a/xen/arch/arm/smpboot.c > +++ b/xen/arch/arm/smpboot.c > @@ -389,6 +389,7 @@ void asmlinkage start_secondary(void) >*/ > init_maintenance_interrupt(); > init_timer_interrupt(); > +init_tee_interrupt(); > > local_abort_enable(); > > And plumb through the TEE subsystem. I'll use that in the next version, it should remove a lot of complex code. Thanks, Jens
[PATCH] tools/xl: Open xldevd.log with O_CLOEXEC
`xl devd` has been observed leaking /var/log/xldevd.log into children. Link: https://github.com/QubesOS/qubes-issues/issues/8292 Reported-by: Demi Marie Obenour Signed-off-by: Andrew Cooper --- CC: Anthony PERARD CC: Juergen Gross CC: Demi Marie Obenour CC: Marek Marczykowski-Górecki Also entirely speculative based on the QubesOS ticket. --- tools/xl/xl_utils.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/xl/xl_utils.c b/tools/xl/xl_utils.c index 17489d182954..060186db3a59 100644 --- a/tools/xl/xl_utils.c +++ b/tools/xl/xl_utils.c @@ -270,7 +270,7 @@ int do_daemonize(const char *name, const char *pidfile) exit(-1); } -CHK_SYSCALL(logfile = open(fullname, O_WRONLY|O_CREAT|O_APPEND, 0644)); +CHK_SYSCALL(logfile = open(fullname, O_WRONLY | O_CREAT | O_APPEND | O_CLOEXEC, 0644)); free(fullname); assert(logfile >= 3); base-commit: ebab808eb1bb8f24c7d0dd41b956e48cb1824b81 prerequisite-patch-id: 212e50457e9b6bdfd06a97da545a5aa7155bb919 -- 2.30.2
[libvirt test] 185934: tolerable all pass - PUSHED
flight 185934 libvirt real [real] http://logs.test-lab.xenproject.org/osstest/logs/185934/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 185908 test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-qcow2 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-qcow2 15 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-libvirt-vhd 15 saverestore-support-checkfail never pass version targeted for testing: libvirt da95bcb6b2d9b04958e0f2603202801dd29debb8 baseline version: libvirt 3146305fd3a610573963fe4858cc12ec1c4cf5c7 Last test of basis 185908 2024-05-03 04:20:40 Z4 days Testing same since 185934 2024-05-07 04:20:24 Z0 days1 attempts People who touched revisions under test: Adam Julis Ján Tomko Kristina Hanicova Michal Privoznik Oleg Sviridov jobs: build-amd64-xsm pass build-arm64-xsm pass build-i386-xsm pass build-amd64 pass build-arm64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass 
build-arm64-libvirt pass build-armhf-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-arm64-pvopspass build-armhf-pvopspass build-i386-pvops pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm pass test-amd64-amd64-libvirt-xsm pass test-arm64-arm64-libvirt-xsm pass test-amd64-amd64-libvirt pass test-arm64-arm64-libvirt pass test-armhf-armhf-libvirt pass test-amd64-amd64-libvirt-pairpass test-amd64-amd64-libvirt-qcow2 pass test-arm64-arm64-libvirt-qcow2 pass test-amd64-amd64-libvirt-raw pass test-arm64-arm64-libvirt-raw pass test-amd64-amd64-libvirt-vhd pass test-armhf-armhf-libvirt-vhd pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/libvirt.git 3146305fd3..da95bcb6b2 da95bcb6b2d9b04958e0f2603202801dd29debb8 -> xen-tested-master
[linux-linus test] 185930: tolerable FAIL - PUSHED
flight 185930 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/185930/
flight 185938 linux-linus real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/185938/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-raw       8 xen-boot                  fail pass in 185938-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-raw      14 migrate-support-check     fail in 185938 never pass
 test-armhf-armhf-xl-raw      15 saverestore-support-check fail in 185938 never pass
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop        fail like 185928
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop        fail like 185928
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop        fail like 185928
 test-armhf-armhf-xl-credit2   8 xen-boot                  fail like 185928
 test-armhf-armhf-libvirt     16 saverestore-support-check fail like 185928
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop        fail like 185928
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 185928
 test-amd64-amd64-libvirt     15 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check     fail never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check     fail never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check     fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check     fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check     fail never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check fail never pass
 test-arm64-arm64-xl          15 migrate-support-check     fail never pass
 test-arm64-arm64-xl          16 saverestore-support-check fail never pass
 test-armhf-armhf-xl          15 migrate-support-check     fail never pass
 test-armhf-armhf-xl          16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check   fail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check     fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check     fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check     fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-arm64-arm64-xl-vhd      14 migrate-support-check     fail never pass
 test-arm64-arm64-xl-vhd      15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds     15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-rtds     16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt     15 migrate-support-check     fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check    fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-vhd 14 migrate-support-check     fail never pass
 test-armhf-armhf-libvirt-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-qcow2    14 migrate-support-check     fail never pass
 test-armhf-armhf-xl-qcow2    15 saverestore-support-check fail never pass

version targeted for testing:
 linux                dccb07f2914cdab2ac3a5b6c98406f765acab803
baseline version:
 linux                dd5a440a31fae6e459c0d627162825505361

Last test of basis   185928  2024-05-06 15:43:38 Z    0 days
Testing same since   185930  2024-05-06 23:44:00 Z    0 days    1 attempts

People who touched revisions under test:
  Andy Shevchenko
  Chengming Zhou
  Dan Carpenter
  David Rientjes
  David Sterba
  Geert Uytterhoeven
  Josef Bacik
  Linus Torvalds
  Maxime Ripard
  Nicolas Bouchinet
  Qu Wenruo
  Uwe Kleine-König
  Vlastimil Babka

jobs:
 build-amd64-xsm                  pass
 build-arm64-xsm                  pass
 build-i386-xsm                   pass
 build-amd64
Re: [PATCH v3 6/9] xen/arm64: bpi: Add missing code symbol annotations
Hi,

On 06/05/2024 13:54, Edgar E. Iglesias wrote:
> On Sat, May 4, 2024 at 2:14 AM Stefano Stabellini wrote:
> > On Wed, 1 May 2024, Edgar E. Iglesias wrote:
> > > From: "Edgar E. Iglesias"
> > >
> > > Use the generic xen/linkage.h macros to annotate code symbols and
> > > add missing annotations.
> > >
> > > Signed-off-by: Edgar E. Iglesias
> > > ---
> > >  xen/arch/arm/arm64/bpi.S | 20 ++++++++++++--------
> > >  1 file changed, 12 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/xen/arch/arm/arm64/bpi.S b/xen/arch/arm/arm64/bpi.S
> > > index 4e63825220..b16e4d1e29 100644
> > > --- a/xen/arch/arm/arm64/bpi.S
> > > +++ b/xen/arch/arm/arm64/bpi.S
> > > @@ -52,14 +52,15 @@
> > >   * micro-architectures in a system.
> > >   */
> > >  .align 11
> > > -ENTRY(__bp_harden_hyp_vecs_start)
> > > +FUNC(__bp_harden_hyp_vecs_start)
> > >  .rept 4
> > >      vectors hyp_traps_vector
> > >  .endr
> > > -ENTRY(__bp_harden_hyp_vecs_end)
> > > +GLOBAL(__bp_harden_hyp_vecs_end)
> > > +END(__bp_harden_hyp_vecs_start)
> >
> > Shouldn't GLOBAL be changed to FUNC as well?
>
> I was a bit unsure but went for GLOBAL since the _end labels point to
> addresses after and outside of the code sequence. But I don't have a
> strong opinion and am happy to change them to FUNC if you feel that's
> better.

I don't think it should be FUNC as this is not meant to be called
directly. I am also under the impression we were planning to get rid of
GLOBAL() as well.

Furthermore, __bp_harden_hyp_vec_start is not a function per se. It is a
pointer to the vector table. From a brief look, the same remarks would
apply to the rest of bpi.S. So I think we want to switch all the ENTRY()
to LABEL().

Cheers,

-- 
Julien Grall
[ovmf test] 185937: all pass - PUSHED
flight 185937 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/185937/

Perfect :-)
All tests in this flight passed as required

version targeted for testing:
 ovmf                 987bea6525d70cd01649472c93d19f89d41d83a2
baseline version:
 ovmf                 1c0d4ae2c0fd24164873947c2e262c499ecf13b5

Last test of basis   185935  2024-05-07 05:12:56 Z    0 days
Testing same since   185937  2024-05-07 07:43:05 Z    0 days    1 attempts

People who touched revisions under test:
  Abdul Lateef Attar
  Ray Ni

jobs:
 build-amd64-xsm                      pass
 build-i386-xsm                       pass
 build-amd64                          pass
 build-i386                           pass
 build-amd64-libvirt                  pass
 build-i386-libvirt                   pass
 build-amd64-pvops                    pass
 build-i386-pvops                     pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   1c0d4ae2c0..987bea6525  987bea6525d70cd01649472c93d19f89d41d83a2 -> xen-tested-master
Re: [RFC PATCH v3 3/5] KVM: x86: Add notifications for Heki policy configuration and violation
On Mon, May 06, 2024 at 06:34:53PM GMT, Sean Christopherson wrote:
> On Mon, May 06, 2024, Mickaël Salaün wrote:
> > On Fri, May 03, 2024 at 07:03:21AM GMT, Sean Christopherson wrote:
> > > > ---
> > > >
> > > > Changes since v1:
> > > > * New patch. Making user space aware of Heki properties was
> > > >   requested by Sean Christopherson.
> > >
> > > No, I suggested having userspace _control_ the pinning[*], not
> > > merely be notified of pinning.
> > >
> > > : IMO, manipulation of protections, both for memory (this patch)
> > > : and CPU state (control registers in the next patch) should come
> > > : from userspace. I have no objection to KVM providing plumbing if
> > > : necessary, but I think userspace needs to have full control over
> > > : the actual state.
> > > :
> > > : One of the things that caused Intel's control register pinning
> > > : series to stall out was how to handle edge cases like kexec() and
> > > : reboot. Deferring to userspace means the kernel doesn't need to
> > > : define policy, e.g. when to unprotect memory, and avoids
> > > : questions like "should userspace be able to overwrite pinned
> > > : control registers".
> > > :
> > > : And like the confidential VM use case, keeping userspace in the
> > > : loop is a big benefit, e.g. the guest can't circumvent
> > > : protections by coercing userspace into writing to protected
> > > : memory.
> > >
> > > I stand by that suggestion, because I don't see a sane way to
> > > handle things like kexec() and reboot without having a _much_ more
> > > sophisticated policy than would ever be acceptable in KVM.
> > >
> > > I think that can be done without KVM having any awareness of CR
> > > pinning whatsoever. E.g. userspace just needs the ability to
> > > intercept CR writes and inject #GPs. Off the cuff, I suspect the
> > > uAPI could look very similar to MSR filtering. E.g. I bet userspace
> > > could enforce MSR pinning without any new KVM uAPI at all.
> > >
> > > [*] https://lore.kernel.org/all/zfuyhpuhtmbyd...@google.com
> >
> > OK, I had concern about the control not directly coming from the
> > guest, especially in the case of pKVM and confidential computing, but
> > I get your point.
>
> Hardware-based CoCo is completely out of scope, because KVM has zero
> visibility into the guest (well, SNP technically allows trapping
> CR0/CR4, but KVM really shouldn't intercept CR0/CR4 for SNP guests).
>
> And more importantly, _KVM_ doesn't define any policies for CoCo VMs.
> KVM might help enforce policies that are defined by hardware/firmware,
> but KVM doesn't define any of its own.
>
> If pKVM on x86 comes along, then KVM will likely get in the business
> of defining policy, but until that happens, KVM needs to stay firmly
> out of the picture.
>
> > It should indeed be quite similar to the MSR filtering on the
> > userspace side, except that we need another interface for the guest
> > to request such change (i.e. self-protection).
> >
> > Would it be OK to keep this new KVM_HC_LOCK_CR_UPDATE hypercall but
> > forward the request to userspace with a VM exit instead? That would
> > also enable userspace to get the request and directly configure the
> > CR pinning with the same VM exit.
>
> No? Maybe? I strongly suspect that full support will need a richer
> set of APIs than a single hypercall. E.g. to handle kexec(),
> suspend+resume, emulated SMM, and so on and so forth. And that's just
> for CR pinning.
>
> And hypercalls are hampered by the fact that VMCALL/VMMCALL don't
> allow for delegation or restriction, i.e. there's no way for the guest
> to communicate to the hypervisor that a less privileged component is
> allowed to perform some action, nor is there a way for the guest to
> say some chunk of CPL0 code *isn't* allowed to request transition.
> Delegation and restriction all has to be done out-of-band.
>
> It'd probably be more annoying to set up initially, but I think a
> synthetic device with an MMIO-based interface would be more powerful
> and flexible in the long run. Then userspace can evolve without
> needing to wait for KVM to catch up.
>
> Actually, potential bad/crazy idea. Why does the _host_ need to define
> policy? Linux already knows what assets it wants to (un)protect and
> when. What's missing is a way for the guest kernel to effectively
> deprivilege and re-authenticate itself as needed. We've been tossing
> around the idea of paired VMs+vCPUs to support VTLs and SEV's VMPLs,
> what if we usurped/piggybacked those ideas, with a bit of pKVM mixed
> in?
>
> Borrowing VTL terminology, where VTL0 is the least privileged,
> userspace launches the VM at VTL0. At some point, the guest triggers
> the deprivileging sequence and userspace creates VTL1. Userspace also
> provides a way for VTL0 to restrict access to its memory, e.g. to
> effectively make the page tables for
[PATCH v2 0/2] Some fixes for the existing dynamic dtbo code
During the review process for v1 of the dynamic dtbo series, some issues
in the existing code were identified. Discussions of them can be found
in [1] (for the first patch) and [2] (for the second patch). Since the
main part of the remaining dynamic dtbo series requires more rework,
just send these fixes for now.

[1] https://lore.kernel.org/xen-devel/835099c8-6cf0-4f6d-899b-07388df89...@xen.org/
[2] https://lore.kernel.org/xen-devel/eaea1986-a27e-4d6c-932f-1d0a9918861f@perard/

Henry Wang (2):
  xen/common/dt-overlay: Fix missing lock when remove the device
  tools/xl: Correct the help information and exit code of the dt-overlay
    command

 tools/xl/xl_vmcontrol.c | 6 +++---
 xen/common/dt-overlay.c | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

-- 
2.34.1
[PATCH v2 2/2] tools/xl: Correct the help information and exit code of the dt-overlay command
Fix the name mismatch in the xl dt-overlay command: the command name
should be "dt-overlay" instead of "dt_overlay".

Fix the exit code of the dt-overlay command: use EXIT_FAILURE instead
of ERROR_FAIL.

Fixes: 61765a07e3d8 ("tools/xl: Add new xl command overlay for device tree overlay support")
Suggested-by: Anthony PERARD
Signed-off-by: Henry Wang
---
v2:
- New patch
---
 tools/xl/xl_vmcontrol.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index 98f6bd2e76..02575d5d36 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -1278,7 +1278,7 @@ int main_dt_overlay(int argc, char **argv)
     const int overlay_remove_op = 2;
 
     if (argc < 2) {
-        help("dt_overlay");
+        help("dt-overlay");
         return EXIT_FAILURE;
     }
@@ -1302,11 +1302,11 @@ int main_dt_overlay(int argc, char **argv)
             fprintf(stderr, "failed to read the overlay device tree file %s\n",
                     overlay_config_file);
             free(overlay_dtb);
-            return ERROR_FAIL;
+            return EXIT_FAILURE;
         }
     } else {
         fprintf(stderr, "overlay dtbo file not provided\n");
-        return ERROR_FAIL;
+        return EXIT_FAILURE;
     }
 
     rc = libxl_dt_overlay(ctx, overlay_dtb, overlay_dtb_size, op);
-- 
2.34.1
[PATCH v2 1/2] xen/common/dt-overlay: Fix missing lock when removing the device
If CONFIG_DEBUG=y, the assertion below will be triggered:

(XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at drivers/passthrough/device_tree.c:146
(XEN) [ Xen-4.19-unstable  arm64  debug=y  Not tainted ]
(XEN) CPU:    0
(XEN) PC:     0a257418 iommu_remove_dt_device+0x8c/0xd4
(XEN) LR:     0a2573a0
(XEN) SP:     8000fff7fb30
(XEN) CPSR:   0249 MODE:64-bit EL2h (Hypervisor, handler)
[...]
(XEN) Xen call trace:
(XEN)    [<0a257418>] iommu_remove_dt_device+0x8c/0xd4 (PC)
(XEN)    [<0a2573a0>] iommu_remove_dt_device+0x14/0xd4 (LR)
(XEN)    [<0a20797c>] dt-overlay.c#remove_node_resources+0x8c/0x90
(XEN)    [<0a207f14>] dt-overlay.c#remove_nodes+0x524/0x648
(XEN)    [<0a208460>] dt_overlay_sysctl+0x428/0xc68
(XEN)    [<0a2707f8>] arch_do_sysctl+0x1c/0x2c
(XEN)    [<0a230b40>] do_sysctl+0x96c/0x9ec
(XEN)    [<0a271e08>] traps.c#do_trap_hypercall+0x1e8/0x288
(XEN)    [<0a273490>] do_trap_guest_sync+0x448/0x63c
(XEN)    [<0a25c480>] entry.o#guest_sync_slowpath+0xa8/0xd8
(XEN)
(XEN)
(XEN) Panic on CPU 0:
(XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at drivers/passthrough/device_tree.c:146
(XEN)

This is because iommu_remove_dt_device() is called without taking the
dt_host_lock. dt_host_lock is meant to ensure that the DT node will not
disappear behind our back. So fix the issue by taking the lock as soon
as we get hold of overlay_node.

Fixes: 7e5c4a8b86f1 ("xen/arm: Implement device tree node removal functionalities")
Signed-off-by: Henry Wang
---
v2:
- Take the lock as soon as getting hold of overlay_node.
v1.1:
- Move the unlock position before the check of rc.
---
 xen/common/dt-overlay.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c
index 1b197381f6..25d15cbcb1 100644
--- a/xen/common/dt-overlay.c
+++ b/xen/common/dt-overlay.c
@@ -429,6 +429,8 @@ static int remove_nodes(const struct overlay_track *tracker)
     if ( overlay_node == NULL )
         return -EINVAL;
 
+    write_lock(&dt_host_lock);
+
     rc = remove_descendant_nodes_resources(overlay_node);
     if ( rc )
         return rc;
@@ -439,8 +441,6 @@ static int remove_nodes(const struct overlay_track *tracker)
 
     dt_dprintk("Removing node: %s\n", overlay_node->full_name);
 
-    write_lock(&dt_host_lock);
-
     rc = dt_overlay_remove_node(overlay_node);
     if ( rc )
     {
-- 
2.34.1
[ovmf test] 185935: all pass - PUSHED
flight 185935 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/185935/

Perfect :-)
All tests in this flight passed as required

version targeted for testing:
 ovmf                 1c0d4ae2c0fd24164873947c2e262c499ecf13b5
baseline version:
 ovmf                 c12bbc14900aa5c70eec8c0576757c2182db3d01

Last test of basis   185932  2024-05-07 02:44:06 Z    0 days
Testing same since   185935  2024-05-07 05:12:56 Z    0 days    1 attempts

People who touched revisions under test:
  Xianglei Cai

jobs:
 build-amd64-xsm                      pass
 build-i386-xsm                       pass
 build-amd64                          pass
 build-i386                           pass
 build-amd64-libvirt                  pass
 build-i386-libvirt                   pass
 build-amd64-pvops                    pass
 build-i386-pvops                     pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass

sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
 http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
 http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
 http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   c12bbc1490..1c0d4ae2c0  1c0d4ae2c0fd24164873947c2e262c499ecf13b5 -> xen-tested-master