Re: [PATCH] powerpc/kvm: Fix typo in the kvm functions
On 2024/09/20 05:08 PM, Kajol Jain wrote: > Fix typo in the following kvm function names from: > > kmvhv_counters_tracepoint_regfunc -> kvmhv_counters_tracepoint_regfunc > kmvhv_counters_tracepoint_unregfunc -> kvmhv_counters_tracepoint_unregfunc > Nice catch! Reviewed-by: Amit Machhiwal > Fixes: e1f288d2f9c6 ("KVM: PPC: Book3S HV nestedv2: Add support for reading > VPA counters for pseries guests") > Reported-by: Madhavan Srinivasan > Signed-off-by: Kajol Jain > --- > arch/powerpc/include/asm/kvm_book3s_64.h | 4 ++-- > arch/powerpc/kvm/book3s_hv.c | 4 ++-- > arch/powerpc/kvm/trace_hv.h | 2 +- > 3 files changed, 5 insertions(+), 5 deletions(-) > > diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h > b/arch/powerpc/include/asm/kvm_book3s_64.h > index 2ef9a5f4e5d1..11065313d4c1 100644 > --- a/arch/powerpc/include/asm/kvm_book3s_64.h > +++ b/arch/powerpc/include/asm/kvm_book3s_64.h > @@ -684,8 +684,8 @@ int kvmhv_nestedv2_set_ptbl_entry(unsigned long lpid, u64 > dw0, u64 dw1); > int kvmhv_nestedv2_parse_output(struct kvm_vcpu *vcpu); > int kvmhv_nestedv2_set_vpa(struct kvm_vcpu *vcpu, unsigned long vpa); > > -int kmvhv_counters_tracepoint_regfunc(void); > -void kmvhv_counters_tracepoint_unregfunc(void); > +int kvmhv_counters_tracepoint_regfunc(void); > +void kvmhv_counters_tracepoint_unregfunc(void); > int kvmhv_get_l2_counters_status(void); > void kvmhv_set_l2_counters_status(int cpu, bool status); > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index ba0492f9de65..c36d036d7155 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -4154,7 +4154,7 @@ void kvmhv_set_l2_counters_status(int cpu, bool status) > lppaca_of(cpu).l2_counters_enable = 0; > } > > -int kmvhv_counters_tracepoint_regfunc(void) > +int kvmhv_counters_tracepoint_regfunc(void) > { > int cpu; > > @@ -4164,7 +4164,7 @@ int kmvhv_counters_tracepoint_regfunc(void) > return 0; > } > > -void kmvhv_counters_tracepoint_unregfunc(void) > +void kvmhv_counters_tracepoint_unregfunc(void) > { > int cpu; > > diff --git a/arch/powerpc/kvm/trace_hv.h b/arch/powerpc/kvm/trace_hv.h > index 77ebc724e6cd..35fccaa575cc 100644 > --- a/arch/powerpc/kvm/trace_hv.h > +++ b/arch/powerpc/kvm/trace_hv.h > @@ -538,7 +538,7 @@ TRACE_EVENT_FN_COND(kvmppc_vcpu_stats, > TP_printk("VCPU %d: l1_to_l2_cs_time=%llu ns l2_to_l1_cs_time=%llu ns > l2_runtime=%llu ns", > __entry->vcpu_id, __entry->l1_to_l2_cs, > __entry->l2_to_l1_cs, __entry->l2_runtime), > - kmvhv_counters_tracepoint_regfunc, kmvhv_counters_tracepoint_unregfunc > + kvmhv_counters_tracepoint_regfunc, kvmhv_counters_tracepoint_unregfunc > ); > #endif > #endif /* _TRACE_KVM_HV_H */ > -- > 2.43.5 > >
Re: [PATCH v3] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest
Hi Michael, On 2024/08/17 08:59 AM, Michael Ellerman wrote: > Amit Machhiwal writes: > > On 2024/08/15 01:20 PM, Michael Ellerman wrote: > >> Bjorn Helgaas writes: > >> > On Sat, Aug 03, 2024 at 12:03:25AM +0530, Amit Machhiwal wrote: > >> >> With CONFIG_PCI_DYNAMIC_OF_NODES [1], a hot-plug and hot-unplug sequence > >> >> of a PCI device attached to a PCI-bridge causes following kernel Oops on > >> >> a pseries KVM guest: > >> > > >> > What is unique about pseries here? There's nothing specific to > >> > pseries in the patch, so I would expect this to be a generic problem > >> > on any arch. > >> > > >> >> RTAS: event: 2, Type: Hotplug Event (229), Severity: 1 > >> >> Kernel attempted to read user page (10ec0048) - exploit attempt? > >> >> (uid: 0) > >> >> BUG: Unable to handle kernel data access on read at 0x10ec0048 > >> > > >> > Weird address. I would expect NULL or something. Where did this > >> > non-NULL pointer come from? > >> > >> It originally comes from np->data, which is supposed to be an > >> of_changeset. > >> > >> The powerpc code also uses np->data for the struct pci_dn pointer, see > >> pci_add_device_node_info(). > >> > >> I wonder if that's why it's non-NULL? > > > > I'm also looking into the code to figure out where's that value coming > > from. I > > will update as soon as I get there. > > Thanks. > > >> Amit, do we have exact steps to reproduce this? I poked around a bit but > >> couldn't get it to trigger. > > > > Sure, below are the steps: > > > > 1. Set CONFIG_PCI_DYNAMIC_OF_NODES=y in the kernel config and compile > > (Fedora > >has it disabled in it's distro config, Ubuntu has it enabled but will > > have it > >disabled in the next update) > > > > 2. If you are using Fedora cloud images, make sure you've these packages > >installed: > > $ rpm -qa | grep -e 'ppc64-diag\|powerpc-utils' > > powerpc-utils-core-1.3.11-6.fc40.ppc64le > > powerpc-utils-1.3.11-6.fc40.ppc64le > > ppc64-diag-rtas-2.7.9-6.fc40.ppc64le > > ppc64-diag-2.7.9-6.fc40.ppc64le > > > > 3. Hotplug a pci device as follows: > > virsh attach-interface bridge --source virbr0 > > I don't use virsh :) No worries. Fortunately, we do have a way to do it with qemu monitor. > > Any idea how to do it with just qemu monitor commands? > 1. Boot the guest with the below included in the qemu cmdline: -netdev bridge,id=,br=virbr0,helper=/usr/libexec/qemu-bridge-helper 2. Once the guest boots, run the below command on qemu monitor to hot-plug a pci device: device_add rtl8139,netdev=,mac=52:54:00:88:31:28,id= dmesg = [ 116.968210] pci :00:01.0: [10ec:8139] type 00 class 0x02 conventional PCI endpoint [ 116.969260] pci :00:01.0: BAR 0 [io 0x1-0x100ff] [ 116.969904] pci :00:01.0: BAR 1 [mem 0x-0x00ff] [ 116.970745] pci :00:01.0: ROM [mem 0x-0x0003 pref] [ 116.971456] pci :00:01.0: No hypervisor support for SR-IOV on this device, IOV BARs disabled. [ 116.972583] pci :00:01.0: Adding to iommu group 0 [ 116.978466] pci :00:01.0: ROM [mem 0x20008008-0x2000800b pref]: assigned [ 116.979347] pci :00:01.0: BAR 0 [io 0x10400-0x104ff]: assigned [ 116.980063] pci :00:01.0: BAR 1 [mem 0x200080001000-0x2000800010ff]: assigned [ 117.017187] 8139cp: 8139cp: 10/100 PCI Ethernet driver v1.3 (Mar 22, 2004) [ 117.018577] 8139cp :00:01.0: enabling device ( -> 0003) [ 117.025414] 8139cp :00:01.0 eth1: RTL-8139C+ at 0xfbf09e59, 52:54:00:88:31:28, IRQ 26 [ 117.051028] 8139too: 8139too Fast Ethernet driver 0.9.28 [ 117.076577] 8139cp :00:01.0 eth1: link up, 100Mbps, full-duplex, lpa 0x05E1 3. Try hot-unplug of the device to recreate the kernel Oops. device_del Thanks, Amit > cheers
Re: [PATCH v3] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest
Hi Michael, On 2024/08/15 01:20 PM, Michael Ellerman wrote: > Bjorn Helgaas writes: > > On Sat, Aug 03, 2024 at 12:03:25AM +0530, Amit Machhiwal wrote: > >> With CONFIG_PCI_DYNAMIC_OF_NODES [1], a hot-plug and hot-unplug sequence > >> of a PCI device attached to a PCI-bridge causes following kernel Oops on > >> a pseries KVM guest: > > > > What is unique about pseries here? There's nothing specific to > > pseries in the patch, so I would expect this to be a generic problem > > on any arch. > > > >> RTAS: event: 2, Type: Hotplug Event (229), Severity: 1 > >> Kernel attempted to read user page (10ec0048) - exploit attempt? > >> (uid: 0) > >> BUG: Unable to handle kernel data access on read at 0x10ec0048 > > > > Weird address. I would expect NULL or something. Where did this > > non-NULL pointer come from? > > It originally comes from np->data, which is supposed to be an > of_changeset. > > The powerpc code also uses np->data for the struct pci_dn pointer, see > pci_add_device_node_info(). > > I wonder if that's why it's non-NULL? I'm also looking into the code to figure out where's that value coming from. I will update as soon as I get there. > > Amit, do we have exact steps to reproduce this? I poked around a bit but > couldn't get it to trigger. Sure, below are the steps: 1. Set CONFIG_PCI_DYNAMIC_OF_NODES=y in the kernel config and compile (Fedora has it disabled in it's distro config, Ubuntu has it enabled but will have it disabled in the next update) 2. If you are using Fedora cloud images, make sure you've these packages installed: $ rpm -qa | grep -e 'ppc64-diag\|powerpc-utils' powerpc-utils-core-1.3.11-6.fc40.ppc64le powerpc-utils-1.3.11-6.fc40.ppc64le ppc64-diag-rtas-2.7.9-6.fc40.ppc64le ppc64-diag-2.7.9-6.fc40.ppc64le 3. Hotplug a pci device as follows: virsh attach-interface bridge --source virbr0 4. Check if the pci device was added by running `ip a s` 5. Try hot-unplug of that device by supplying the MAC, which should trigger the Oops virsh detach-interface bridge Thanks, Amit > cheers
[PATCH v3] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest
With CONFIG_PCI_DYNAMIC_OF_NODES [1], a hot-plug and hot-unplug sequence of a PCI device attached to a PCI-bridge causes following kernel Oops on a pseries KVM guest: RTAS: event: 2, Type: Hotplug Event (229), Severity: 1 Kernel attempted to read user page (10ec0048) - exploit attempt? (uid: 0) BUG: Unable to handle kernel data access on read at 0x10ec0048 Faulting instruction address: 0xc12d8728 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries NIP [c12d8728] __of_changeset_entry_invert+0x10/0x1ac LR [c12da7f0] __of_changeset_revert_entries+0x98/0x180 Call Trace: [cbcc3970] [c12daa60] of_changeset_revert+0x58/0xd8 [cbcc39c0] [c0d0ed78] of_pci_remove_node+0x74/0xb0 [cbcc39f0] [c0cdcfe0] pci_stop_bus_device+0xf4/0x138 [cbcc3a30] [c0cdd140] pci_stop_and_remove_bus_device_locked+0x34/0x64 [cbcc3a60] [c0cf3780] remove_store+0xf0/0x108 [cbcc3ab0] [c0e89e04] dev_attr_store+0x34/0x78 [cbcc3ad0] [c07f8dd4] sysfs_kf_write+0x70/0xa4 [cbcc3af0] [c07f7248] kernfs_fop_write_iter+0x1d0/0x2e0 [cbcc3b40] [c06c9b08] vfs_write+0x27c/0x558 [cbcc3bf0] [c06ca168] ksys_write+0x90/0x170 [cbcc3c40] [c0033248] system_call_exception+0xf8/0x290 [cbcc3e50] [c000d05c] system_call_vectored_common+0x15c/0x2ec A git bisect pointed this regression to be introduced via [1] that added a mechanism to create device tree nodes for parent PCI bridges when a PCI device is hot-plugged. The Oops is caused when `pci_stop_dev()` tries to remove a non-existing device-tree node associated with the pci_dev that was earlier hot-plugged and was attached under a pci-bridge. The PCI dev header `dev->hdr_type` being 0, results a conditional check done with `pci_is_bridge()` into false. Consequently, a call to `of_pci_make_dev_node()` to create a device node is never made. When at a later point in time, in the device node removal path, a memcpy is attempted in `__of_changeset_entry_invert()`; since the device node was never created, results in an Oops due to kernel read access to a bad address. To fix this issue, the patch introduces a new flag OF_CREATE_WITH_CSET to keep track of device nodes created via `of_pci_make_dev_node()` and later attempt to destroy only such device nodes which have this flag set. [1] commit 407d1a51921e ("PCI: Create device tree node for bridge") Fixes: 407d1a51921e ("PCI: Create device tree node for bridge") Reported-by: Kowshik Jois B S Signed-off-by: Lizhi Hou Signed-off-by: Amit Machhiwal --- Changes since v2: * Drop v2 changes and introduce a different approach from Lizhi discussed over the v2 of this patch * V2: https://lore.kernel.org/all/20240715080726.2496198-1-amach...@linux.ibm.com/ Changes since v1: * Included Lizhi's suggested changes on V1 * Fixed below two warnings from Lizhi's changes and rearranged the cleanup part a bit in `of_pci_make_dev_node` drivers/pci/of.c:611:6: warning: no previous prototype for ‘of_pci_free_node’ [-Wmissing-prototypes] 611 | void of_pci_free_node(struct device_node *np) | ^~~~ drivers/pci/of.c: In function ‘of_pci_make_dev_node’: drivers/pci/of.c:696:1: warning: label ‘out_destroy_cset’ defined but not used [-Wunused-label] 696 | out_destroy_cset: | ^~~~ * V1: https://lore.kernel.org/all/20240703141634.2974589-1-amach...@linux.ibm.com/ drivers/pci/of.c | 3 ++- include/linux/of.h | 1 + 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/pci/of.c b/drivers/pci/of.c index dacea3fc5128..bc455370143e 100644 --- a/drivers/pci/of.c +++ b/drivers/pci/of.c @@ -653,7 +653,7 @@ void of_pci_remove_node(struct pci_dev *pdev) struct device_node *np; np = pci_device_to_OF_node(pdev); - if (!np || !of_node_check_flag(np, OF_DYNAMIC)) + if (!np || !of_node_check_flag(np, OF_CREATE_WITH_CSET)) return; pdev->dev.of_node = NULL; @@ -712,6 +712,7 @@ void of_pci_make_dev_node(struct pci_dev *pdev) if (ret) goto out_free_node; + of_node_set_flag(np, OF_CREATE_WITH_CSET); np->data = cset; pdev->dev.of_node = np; kfree(name); diff --git a/include/linux/of.h b/include/linux/of.h index 85b60ac9eec5..5faa5a1198c6 100644 --- a/include/linux/of.h +++ b/include/linux/of.h @@ -153,6 +153,7 @@ extern struct device_node *of_stdout; #define OF_POPULATED_BUS 4 /* platform bus created for children */ #define OF_OVERLAY 5 /* allocated for an overlay */ #define OF_OVERLAY_FREE_CSET 6 /* in overlay cset being freed */ +#define OF_CREATE_WITH_CSET7 /* Created
Re: [PATCH v2] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest
Hi Stefan, On 2024/07/29 03:27 PM, Stefan Bader wrote: > On 26.07.24 13:37, Rob Herring wrote: > > + Ubuntu kernel list, again > > > > On Thu, Jul 25, 2024 at 11:15:39PM +0530, Amit Machhiwal wrote: > > > Hi Lizhi, Rob, > > > > > > Sorry for responding late. I got busy with some other things. > > > > > > On 2024/07/23 02:08 PM, Lizhi Hou wrote: > > > > > > > > On 7/23/24 12:54, Rob Herring wrote: > > > > > On Tue, Jul 23, 2024 at 12:21 PM Lizhi Hou wrote: > > > > > > > > > > > > On 7/23/24 09:21, Rob Herring wrote: > > > > > > > On Mon, Jul 15, 2024 at 01:52:30PM -0700, Lizhi Hou wrote: > > > > > > > > On 7/15/24 11:55, Rob Herring wrote: > > > > > > > > > On Mon, Jul 15, 2024 at 2:08 AM Amit Machhiwal > > > > > > > > > wrote: > > > > > > > > > > With CONFIG_PCI_DYNAMIC_OF_NODES [1], a hot-plug and > > > > > > > > > > hot-unplug sequence > > > > > > > > > > of a PCI device attached to a PCI-bridge causes following > > > > > > > > > > kernel Oops on > > > > > > > > > > a pseries KVM guest: > > > > > > > > > > > > > > > > > > > > RTAS: event: 2, Type: Hotplug Event (229), Severity: 1 > > > > > > > > > > Kernel attempted to read user page (10ec0048) - > > > > > > > > > > exploit attempt? (uid: 0) > > > > > > > > > > BUG: Unable to handle kernel data access on read at > > > > > > > > > > 0x10ec0048 > > > > > > > > > > Faulting instruction address: 0xc12d8728 > > > > > > > > > > Oops: Kernel access of bad area, sig: 11 [#1] > > > > > > > > > > LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA > > > > > > > > > > pSeries > > > > > > > > > > > > > > > > > > > > NIP [c12d8728] > > > > > > > > > > __of_changeset_entry_invert+0x10/0x1ac > > > > > > > > > > LR [c12da7f0] > > > > > > > > > > __of_changeset_revert_entries+0x98/0x180 > > > > > > > > > > Call Trace: > > > > > > > > > > [cbcc3970] [c12daa60] > > > > > > > > > > of_changeset_revert+0x58/0xd8 > > > > > > > > > > [cbcc39c0] [c0d0ed78] > > > > > > > > > > of_pci_remove_node+0x74/0xb0 > > > > > > > > > > [cbcc39f0] [c0cdcfe0] > > > > > > > > > > pci_stop_bus_device+0xf4/0x138 > > > > > > > > > > [cbcc3a30] [c0cdd140] > > > > > > > > > > pci_stop_and_remove_bus_device_locked+0x34/0x64 > > > > > > > > > > [cbcc3a60] [c0cf3780] > > > > > > > > > > remove_store+0xf0/0x108 > > > > > > > > > > [cbcc3ab0] [c0e89e04] > > > > > > > > > > dev_attr_store+0x34/0x78 > > > > > > > > > > [cbcc3ad0] [c07f8dd4] > > > > > > > > > > sysfs_kf_write+0x70/0xa4 > > > > > > > > > > [cbcc3af0] [c07f7248] > > > > > > > > > > kernfs_fop_write_iter+0x1d0/0x2e0 > > > > > > > > > > [cbcc3b40] [c06c9b08] > > > > > > > > > > vfs_write+0x27c/0x558 > > > > > > > > > > [cbcc3bf0] [c06ca168] > > > > > > > > > > ksys_write+0x90/0x170 > > > > > > > > > > [cbcc3c40] [c0033248] > > > > > > > > > > system_call_exception+0xf8/0x290 > > > > > > > > > > [cbcc3e50] [c000d05c] > > > > > > > > > > system_call_vectored_common+0x15c/0x2ec > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > A
Re: [PATCH v2] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest
Hi Lizhi, On 2024/07/29 09:47 AM, Lizhi Hou wrote: > Hi Amit > > On 7/29/24 04:13, Amit Machhiwal wrote: > > Hi Lizhi, > > > > On 2024/07/26 11:45 AM, Lizhi Hou wrote: > > > On 7/26/24 10:52, Rob Herring wrote: > > > > On Thu, Jul 25, 2024 at 6:06 PM Lizhi Hou wrote: > > > > > Hi Amit, > > > > > > > > > > > > > > > I try to follow the option which add a OF flag. If Rob is ok with > > > > > this, > > > > > I would suggest to use it instead of V1 patch > > > > > > > > > > diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c > > > > > index dda6092e6d3a..a401ed0463d9 100644 > > > > > --- a/drivers/of/dynamic.c > > > > > +++ b/drivers/of/dynamic.c > > > > > @@ -382,6 +382,11 @@ void of_node_release(struct kobject *kobj) > > > > > __func__, node); > > > > >} > > > > > > > > > > + if (of_node_check_flag(node, OF_CREATED_WITH_CSET)) { > > > > > + of_changeset_revert(node->data); > > > > > + of_changeset_destroy(node->data); > > > > > + } > > > > What happens if multiple nodes are created in the changeset? > > > Ok. multiple nodes will not work. > > > > > + > > > > >if (node->child) > > > > >pr_err("ERROR: %s() unexpected children for > > > > > %pOF/%s\n", > > > > >__func__, node->parent, node->full_name); > > > > > @@ -507,6 +512,7 @@ struct device_node > > > > > *of_changeset_create_node(struct > > > > > of_changeset *ocs, > > > > >np = __of_node_dup(NULL, full_name); > > > > >if (!np) > > > > >return NULL; > > > > > + of_node_set_flag(np, OF_CREATED_WITH_CSET); > > > > This should be set where the data ptr is set. > > > Ok. It sounds the fix could be simplified to 3 lines change. > > Thanks for the patch. The hot-plug and hot-unplug of PCI device seem to work > > fine as expected. I see this patch would attempt to remove only the nodes > > which > > were created in `of_pci_make_dev_node()` with the help of the newly > > introduced > > flag, which looks good to me. > > > > Also, since a call to `of_pci_make_dev_node()` from `pci_bus_add_device()`, > > that > > creates devices nodes only for bridge devices, is conditional on > > `pci_is_bridge()`, it only makes sense to retain the logical symmetry and > > call > > `of_pci_remove_node()` conditionally on `pci_is_bridge()` as well in > > `pci_stop_dev()`. Hence, I would like to propose the below change along > > with the > > above patch: > > > > diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c > > index 910387e5bdbf..c6394bf562cd 100644 > > --- a/drivers/pci/remove.c > > +++ b/drivers/pci/remove.c > > @@ -23,7 +23,8 @@ static void pci_stop_dev(struct pci_dev *dev) > > device_release_driver(&dev->dev); > > pci_proc_detach_device(dev); > > pci_remove_sysfs_dev_files(dev); > > - of_pci_remove_node(dev); > > + if (pci_is_bridge(dev)) > > + of_pci_remove_node(dev); > > pci_dev_assign_added(dev, false); > > } > > > > Please let me know of your thoughts on this and based on that I can spin > > the v3 > > of this patch. > > As I mentioned, there are endpoints in pci quirks (pci/quirks.c) will also > create nodes by of_pci_make_dev_node(). So please remove above two lines. Sorry if I'm misinterpreting something here but as I mentioned, `of_pci_make_dev_node()` is called only for bridge devices with check performed via `pci_is_bridge()`, could you please elaborate more on why the same check can't be put while removing the node via `of_pci_remove_node()`? Thanks, Amit > > Thanks, > > Lizhi > > > > > In addition to this, can this patch be taken as part of 6.11 as a bug fix? > > > > Thanks, > > Amit > > > > > > > > diff --git a/drivers/pci/of.c b/drivers/pci/of.c > > > index 51e3dd0ea5ab..0b3ba1e1b18c 100644 > > > --- a/drivers/pci/of.c > > > +++ b/drivers/pci/of.c > > > @@ -613,7 +613,7 @@ void of_pci_remove_node(str
Re: [PATCH v2] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest
Hi Lizhi, On 2024/07/26 11:45 AM, Lizhi Hou wrote: > > On 7/26/24 10:52, Rob Herring wrote: > > On Thu, Jul 25, 2024 at 6:06 PM Lizhi Hou wrote: > > > Hi Amit, > > > > > > > > > I try to follow the option which add a OF flag. If Rob is ok with this, > > > I would suggest to use it instead of V1 patch > > > > > > diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c > > > index dda6092e6d3a..a401ed0463d9 100644 > > > --- a/drivers/of/dynamic.c > > > +++ b/drivers/of/dynamic.c > > > @@ -382,6 +382,11 @@ void of_node_release(struct kobject *kobj) > > > __func__, node); > > > } > > > > > > + if (of_node_check_flag(node, OF_CREATED_WITH_CSET)) { > > > + of_changeset_revert(node->data); > > > + of_changeset_destroy(node->data); > > > + } > > What happens if multiple nodes are created in the changeset? > Ok. multiple nodes will not work. > > > > > + > > > if (node->child) > > > pr_err("ERROR: %s() unexpected children for %pOF/%s\n", > > > __func__, node->parent, node->full_name); > > > @@ -507,6 +512,7 @@ struct device_node *of_changeset_create_node(struct > > > of_changeset *ocs, > > > np = __of_node_dup(NULL, full_name); > > > if (!np) > > > return NULL; > > > + of_node_set_flag(np, OF_CREATED_WITH_CSET); > > This should be set where the data ptr is set. > > Ok. It sounds the fix could be simplified to 3 lines change. Thanks for the patch. The hot-plug and hot-unplug of PCI device seem to work fine as expected. I see this patch would attempt to remove only the nodes which were created in `of_pci_make_dev_node()` with the help of the newly introduced flag, which looks good to me. Also, since a call to `of_pci_make_dev_node()` from `pci_bus_add_device()`, that creates devices nodes only for bridge devices, is conditional on `pci_is_bridge()`, it only makes sense to retain the logical symmetry and call `of_pci_remove_node()` conditionally on `pci_is_bridge()` as well in `pci_stop_dev()`. Hence, I would like to propose the below change along with the above patch: diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c index 910387e5bdbf..c6394bf562cd 100644 --- a/drivers/pci/remove.c +++ b/drivers/pci/remove.c @@ -23,7 +23,8 @@ static void pci_stop_dev(struct pci_dev *dev) device_release_driver(&dev->dev); pci_proc_detach_device(dev); pci_remove_sysfs_dev_files(dev); - of_pci_remove_node(dev); + if (pci_is_bridge(dev)) + of_pci_remove_node(dev); pci_dev_assign_added(dev, false); } Please let me know of your thoughts on this and based on that I can spin the v3 of this patch. In addition to this, can this patch be taken as part of 6.11 as a bug fix? Thanks, Amit > > > diff --git a/drivers/pci/of.c b/drivers/pci/of.c > index 51e3dd0ea5ab..0b3ba1e1b18c 100644 > --- a/drivers/pci/of.c > +++ b/drivers/pci/of.c > @@ -613,7 +613,7 @@ void of_pci_remove_node(struct pci_dev *pdev) > struct device_node *np; > > np = pci_device_to_OF_node(pdev); > - if (!np || !of_node_check_flag(np, OF_DYNAMIC)) > + if (!np || !of_node_check_flag(np, OF_CREATED_WITH_CSET)) > return; > pdev->dev.of_node = NULL; > > @@ -672,6 +672,7 @@ void of_pci_make_dev_node(struct pci_dev *pdev) > if (ret) > goto out_free_node; > > + of_node_set_flag(np, OF_CREATED_WITH_CSET); > np->data = cset; > pdev->dev.of_node = np; > kfree(name); > diff --git a/include/linux/of.h b/include/linux/of.h > index a0bedd038a05..a46317f6626e 100644 > --- a/include/linux/of.h > +++ b/include/linux/of.h > @@ -153,6 +153,7 @@ extern struct device_node *of_stdout; > #define OF_POPULATED_BUS 4 /* platform bus created for children */ > #define OF_OVERLAY 5 /* allocated for an overlay */ > #define OF_OVERLAY_FREE_CSET 6 /* in overlay cset being freed */ > +#define OF_CREATED_WITH_CSET 7 /* created by of_changeset_create_node */ > > #define OF_BAD_ADDR ((u64)-1) > > > Lizhi > > > > > Rob
Re: [PATCH v2] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest
Hi Bjorn, On 2024/07/25 03:55 PM, Bjorn Helgaas wrote: > On Thu, Jul 25, 2024 at 11:15:39PM +0530, Amit Machhiwal wrote: > > ... > > The crash in question is a critical issue that we would want to have > > a fix for soon. And while this is still being figured out, is it > > okay to go with the fix I proposed in the V1 of this patch? > > v6.10 has been released already, and it will be a couple months before > the v6.11 release. > > It looks like the regression is 407d1a51921e, which appeared in v6.6, > almost a year ago, so it's fairly old. > > What target are you thinking about for the V1 patch? I guess if we > add it as a v6.11 post-merge window fix, it might get backported to > stable kernels before v6.11? Yes, I think we can go ahead with taking V1 patch for v6.11 post-merge window to fix the current bug and ask Ubuntu to pick it while Lizhi's proposed patch goes under test and review. Thanks, Amit
Re: [PATCH v2] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest
Hi Lizhi, Rob, Sorry for responding late. I got busy with some other things. On 2024/07/23 02:08 PM, Lizhi Hou wrote: > > On 7/23/24 12:54, Rob Herring wrote: > > On Tue, Jul 23, 2024 at 12:21 PM Lizhi Hou wrote: > > > > > > On 7/23/24 09:21, Rob Herring wrote: > > > > On Mon, Jul 15, 2024 at 01:52:30PM -0700, Lizhi Hou wrote: > > > > > On 7/15/24 11:55, Rob Herring wrote: > > > > > > On Mon, Jul 15, 2024 at 2:08 AM Amit Machhiwal > > > > > > wrote: > > > > > > > With CONFIG_PCI_DYNAMIC_OF_NODES [1], a hot-plug and hot-unplug > > > > > > > sequence > > > > > > > of a PCI device attached to a PCI-bridge causes following kernel > > > > > > > Oops on > > > > > > > a pseries KVM guest: > > > > > > > > > > > > > > RTAS: event: 2, Type: Hotplug Event (229), Severity: 1 > > > > > > > Kernel attempted to read user page (10ec0048) - exploit > > > > > > > attempt? (uid: 0) > > > > > > > BUG: Unable to handle kernel data access on read at > > > > > > > 0x10ec0048 > > > > > > > Faulting instruction address: 0xc12d8728 > > > > > > > Oops: Kernel access of bad area, sig: 11 [#1] > > > > > > > LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries > > > > > > > > > > > > > > NIP [c12d8728] __of_changeset_entry_invert+0x10/0x1ac > > > > > > > LR [c12da7f0] __of_changeset_revert_entries+0x98/0x180 > > > > > > > Call Trace: > > > > > > > [cbcc3970] [c12daa60] > > > > > > > of_changeset_revert+0x58/0xd8 > > > > > > > [cbcc39c0] [c0d0ed78] > > > > > > > of_pci_remove_node+0x74/0xb0 > > > > > > > [cbcc39f0] [c0cdcfe0] > > > > > > > pci_stop_bus_device+0xf4/0x138 > > > > > > > [cbcc3a30] [c0cdd140] > > > > > > > pci_stop_and_remove_bus_device_locked+0x34/0x64 > > > > > > > [cbcc3a60] [c0cf3780] remove_store+0xf0/0x108 > > > > > > > [cbcc3ab0] [c0e89e04] dev_attr_store+0x34/0x78 > > > > > > > [cbcc3ad0] [c07f8dd4] sysfs_kf_write+0x70/0xa4 > > > > > > > [cbcc3af0] [c07f7248] > > > > > > > kernfs_fop_write_iter+0x1d0/0x2e0 > > > > > > > [cbcc3b40] [c06c9b08] vfs_write+0x27c/0x558 > > > > > > > [cbcc3bf0] [c06ca168] ksys_write+0x90/0x170 > > > > > > > [cbcc3c40] [c0033248] > > > > > > > system_call_exception+0xf8/0x290 > > > > > > > [cbcc3e50] [c000d05c] > > > > > > > system_call_vectored_common+0x15c/0x2ec > > > > > > > > > > > > > > > > > > > > > A git bisect pointed this regression to be introduced via [1] > > > > > > > that added > > > > > > > a mechanism to create device tree nodes for parent PCI bridges > > > > > > > when a > > > > > > > PCI device is hot-plugged. > > > > > > > > > > > > > > The Oops is caused when `pci_stop_dev()` tries to remove a > > > > > > > non-existing > > > > > > > device-tree node associated with the pci_dev that was earlier > > > > > > > hot-plugged and was attached under a pci-bridge. The PCI dev > > > > > > > header > > > > > > > `dev->hdr_type` being 0, results a conditional check done with > > > > > > > `pci_is_bridge()` into false. Consequently, a call to > > > > > > > `of_pci_make_dev_node()` to create a device node is never made. > > > > > > > When at > > > > > > > a later point in time, in the device node removal path, a memcpy > > > > > > > is > > > > > > > attempted in `__of_changeset_entry_invert()`; since the device > > > > > > > node was > > > > > > > never created, results in an Oops due to kernel read access to a > > > >
[PATCH v2] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest
With CONFIG_PCI_DYNAMIC_OF_NODES [1], a hot-plug and hot-unplug sequence of a PCI device attached to a PCI-bridge causes following kernel Oops on a pseries KVM guest: RTAS: event: 2, Type: Hotplug Event (229), Severity: 1 Kernel attempted to read user page (10ec0048) - exploit attempt? (uid: 0) BUG: Unable to handle kernel data access on read at 0x10ec0048 Faulting instruction address: 0xc12d8728 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries NIP [c12d8728] __of_changeset_entry_invert+0x10/0x1ac LR [c12da7f0] __of_changeset_revert_entries+0x98/0x180 Call Trace: [cbcc3970] [c12daa60] of_changeset_revert+0x58/0xd8 [cbcc39c0] [c0d0ed78] of_pci_remove_node+0x74/0xb0 [cbcc39f0] [c0cdcfe0] pci_stop_bus_device+0xf4/0x138 [cbcc3a30] [c0cdd140] pci_stop_and_remove_bus_device_locked+0x34/0x64 [cbcc3a60] [c0cf3780] remove_store+0xf0/0x108 [cbcc3ab0] [c0e89e04] dev_attr_store+0x34/0x78 [cbcc3ad0] [c07f8dd4] sysfs_kf_write+0x70/0xa4 [cbcc3af0] [c07f7248] kernfs_fop_write_iter+0x1d0/0x2e0 [cbcc3b40] [c06c9b08] vfs_write+0x27c/0x558 [cbcc3bf0] [c06ca168] ksys_write+0x90/0x170 [cbcc3c40] [c0033248] system_call_exception+0xf8/0x290 [cbcc3e50] [c000d05c] system_call_vectored_common+0x15c/0x2ec A git bisect pointed this regression to be introduced via [1] that added a mechanism to create device tree nodes for parent PCI bridges when a PCI device is hot-plugged. The Oops is caused when `pci_stop_dev()` tries to remove a non-existing device-tree node associated with the pci_dev that was earlier hot-plugged and was attached under a pci-bridge. The PCI dev header `dev->hdr_type` being 0, results a conditional check done with `pci_is_bridge()` into false. Consequently, a call to `of_pci_make_dev_node()` to create a device node is never made. When at a later point in time, in the device node removal path, a memcpy is attempted in `__of_changeset_entry_invert()`; since the device node was never created, results in an Oops due to kernel read access to a bad address. To fix this issue, the patch updates `of_changeset_create_node()` to allocate a new node only when the device node doesn't exist and init it in case it does already. Also, introduce `of_pci_free_node()` to be called to only revert and destroy the changeset device node that was created via a call to `of_changeset_create_node()`. [1] commit 407d1a51921e ("PCI: Create device tree node for bridge") Fixes: 407d1a51921e ("PCI: Create device tree node for bridge") Reported-by: Kowshik Jois B S Signed-off-by: Lizhi Hou Signed-off-by: Amit Machhiwal --- Changes since v1: * Included Lizhi's suggested changes on V1 * Fixed below two warnings from Lizhi's changes and rearranged the cleanup part a bit in `of_pci_make_dev_node` drivers/pci/of.c:611:6: warning: no previous prototype for ‘of_pci_free_node’ [-Wmissing-prototypes] 611 | void of_pci_free_node(struct device_node *np) | ^~~~ drivers/pci/of.c: In function ‘of_pci_make_dev_node’: drivers/pci/of.c:696:1: warning: label ‘out_destroy_cset’ defined but not used [-Wunused-label] 696 | out_destroy_cset: | ^~~~ * V1: https://lore.kernel.org/all/20240703141634.2974589-1-amach...@linux.ibm.com/ drivers/of/dynamic.c | 16 drivers/of/unittest.c | 2 +- drivers/pci/bus.c | 3 +-- drivers/pci/of.c | 39 ++- drivers/pci/pci.h | 2 ++ include/linux/of.h| 1 + 6 files changed, 43 insertions(+), 20 deletions(-) diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c index dda6092e6d3a..9bba5e82a384 100644 --- a/drivers/of/dynamic.c +++ b/drivers/of/dynamic.c @@ -492,21 +492,29 @@ struct device_node *__of_node_dup(const struct device_node *np, * a given changeset. * * @ocs: Pointer to changeset + * @np: Pointer to device node. If null, allocate a new node. If not, init an + * existing one. * @parent: Pointer to parent device node * @full_name: Node full name * * Return: Pointer to the created device node or NULL in case of an error. */ struct device_node *of_changeset_create_node(struct of_changeset *ocs, +struct device_node *np, struct device_node *parent, const char *full_name) { - struct device_node *np; int ret; - np = __of_node_dup(NULL, full_name); - if (!np) - return NULL; + if (!np) { + np = __of_node_dup(NULL, full_name); + if (!n
Re: [PATCH] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest
Hi Lizhi, On 2024/07/11 02:18 PM, Lizhi Hou wrote: > > On 7/11/24 11:48, Amit Machhiwal wrote: > > On 2024/07/10 09:48 PM, Lizhi Hou wrote: > > > On 7/5/24 12:20, Bjorn Helgaas wrote: > > > > [+cc Lukas, FYI] > > > > > > > > On Wed, Jul 03, 2024 at 07:46:34PM +0530, Amit Machhiwal wrote: > > > > > With CONFIG_PCI_DYNAMIC_OF_NODES [1], a hot-plug and hot-unplug > > > > > sequence > > > > > of a PCI device attached to a PCI-bridge causes following kernel Oops > > > > > on > > > > > a pseries KVM guest: > > > > > > > > > >RTAS: event: 2, Type: Hotplug Event (229), Severity: 1 > > > > >Kernel attempted to read user page (10ec0048) - exploit > > > > > attempt? (uid: 0) > > > > >BUG: Unable to handle kernel data access on read at 0x10ec0048 > > > > >Faulting instruction address: 0xc12d8728 > > > > >Oops: Kernel access of bad area, sig: 11 [#1] > > > > >LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries > > > > > > > > > >NIP [c12d8728] __of_changeset_entry_invert+0x10/0x1ac > > > > >LR [c12da7f0] __of_changeset_revert_entries+0x98/0x180 > > > > >Call Trace: > > > > >[cbcc3970] [c12daa60] of_changeset_revert+0x58/0xd8 > > > > >[cbcc39c0] [c0d0ed78] of_pci_remove_node+0x74/0xb0 > > > > >[cbcc39f0] [c0cdcfe0] > > > > > pci_stop_bus_device+0xf4/0x138 > > > > >[cbcc3a30] [c0cdd140] > > > > > pci_stop_and_remove_bus_device_locked+0x34/0x64 > > > > >[cbcc3a60] [c0cf3780] remove_store+0xf0/0x108 > > > > >[cbcc3ab0] [c0e89e04] dev_attr_store+0x34/0x78 > > > > >[cbcc3ad0] [c07f8dd4] sysfs_kf_write+0x70/0xa4 > > > > >[cbcc3af0] [c07f7248] > > > > > kernfs_fop_write_iter+0x1d0/0x2e0 > > > > >[cbcc3b40] [c06c9b08] vfs_write+0x27c/0x558 > > > > >[cbcc3bf0] [c06ca168] ksys_write+0x90/0x170 > > > > >[cbcc3c40] [c0033248] > > > > > system_call_exception+0xf8/0x290 > > > > >[cbcc3e50] [c000d05c] > > > > > system_call_vectored_common+0x15c/0x2ec > > > > > > > > > > > > > > > A git bisect pointed this regression to be introduced via [1] that > > > > > added > > > > > a mechanism to create device tree nodes for parent PCI bridges when a > > > > > PCI device is hot-plugged. > > > > > > > > > > The Oops is caused when `pci_stop_dev()` tries to remove a > > > > > non-existing > > > > > device-tree node associated with the pci_dev that was earlier > > > > > hot-plugged and was attached under a pci-bridge. The PCI dev header > > > > > `dev->hdr_type` being 0, results a conditional check done with > > > > > `pci_is_bridge()` into false. Consequently, a call to > > > > > `of_pci_make_dev_node()` to create a device node is never made. When > > > > > at > > > > > a later point in time, in the device node removal path, a memcpy is > > > > > attempted in `__of_changeset_entry_invert()`; since the device node > > > > > was > > > > > never created, results in an Oops due to kernel read access to a bad > > > > > address. > > > > > > > > > > To fix this issue the patch updates `pci_stop_dev()` to ensure that a > > > > > call to `of_pci_remove_node()` is only made for pci-bridge devices. > > > > > > > > > > [1] commit 407d1a51921e ("PCI: Create device tree node for bridge") > > > > > > > > > > Fixes: 407d1a51921e ("PCI: Create device tree node for bridge") > > > > > Reported-by: Kowshik Jois B S > > > > > Tested-by: Kowshik Jois B S > > > > > Signed-off-by: Amit Machhiwal > > > > Thanks for the patch and testing! Would like a reviewed-by from > > > > Lizhi. > > > of_pci_make_dev_node() will create of nodes for some endpoint devices > > > (not a > > > bridge) as well. And actually this is the main purpose. > > > > &
Re: [PATCH] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest
On 2024/07/10 09:48 PM, Lizhi Hou wrote: > > On 7/5/24 12:20, Bjorn Helgaas wrote: > > [+cc Lukas, FYI] > > > > On Wed, Jul 03, 2024 at 07:46:34PM +0530, Amit Machhiwal wrote: > > > With CONFIG_PCI_DYNAMIC_OF_NODES [1], a hot-plug and hot-unplug sequence > > > of a PCI device attached to a PCI-bridge causes following kernel Oops on > > > a pseries KVM guest: > > > > > > RTAS: event: 2, Type: Hotplug Event (229), Severity: 1 > > > Kernel attempted to read user page (10ec0048) - exploit attempt? > > > (uid: 0) > > > BUG: Unable to handle kernel data access on read at 0x10ec0048 > > > Faulting instruction address: 0xc12d8728 > > > Oops: Kernel access of bad area, sig: 11 [#1] > > > LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries > > > > > > NIP [c12d8728] __of_changeset_entry_invert+0x10/0x1ac > > > LR [c12da7f0] __of_changeset_revert_entries+0x98/0x180 > > > Call Trace: > > > [cbcc3970] [c12daa60] of_changeset_revert+0x58/0xd8 > > > [cbcc39c0] [c0d0ed78] of_pci_remove_node+0x74/0xb0 > > > [cbcc39f0] [c0cdcfe0] pci_stop_bus_device+0xf4/0x138 > > > [cbcc3a30] [c0cdd140] > > > pci_stop_and_remove_bus_device_locked+0x34/0x64 > > > [cbcc3a60] [c0cf3780] remove_store+0xf0/0x108 > > > [cbcc3ab0] [c0e89e04] dev_attr_store+0x34/0x78 > > > [cbcc3ad0] [c07f8dd4] sysfs_kf_write+0x70/0xa4 > > > [cbcc3af0] [c07f7248] kernfs_fop_write_iter+0x1d0/0x2e0 > > > [cbcc3b40] [c06c9b08] vfs_write+0x27c/0x558 > > > [cbcc3bf0] [c06ca168] ksys_write+0x90/0x170 > > > [cbcc3c40] [c0033248] system_call_exception+0xf8/0x290 > > > [cbcc3e50] [c000d05c] > > > system_call_vectored_common+0x15c/0x2ec > > > > > > > > > A git bisect pointed this regression to be introduced via [1] that added > > > a mechanism to create device tree nodes for parent PCI bridges when a > > > PCI device is hot-plugged. > > > > > > The Oops is caused when `pci_stop_dev()` tries to remove a non-existing > > > device-tree node associated with the pci_dev that was earlier > > > hot-plugged and was attached under a pci-bridge. The PCI dev header > > > `dev->hdr_type` being 0, results a conditional check done with > > > `pci_is_bridge()` into false. Consequently, a call to > > > `of_pci_make_dev_node()` to create a device node is never made. When at > > > a later point in time, in the device node removal path, a memcpy is > > > attempted in `__of_changeset_entry_invert()`; since the device node was > > > never created, results in an Oops due to kernel read access to a bad > > > address. > > > > > > To fix this issue the patch updates `pci_stop_dev()` to ensure that a > > > call to `of_pci_remove_node()` is only made for pci-bridge devices. > > > > > > [1] commit 407d1a51921e ("PCI: Create device tree node for bridge") > > > > > > Fixes: 407d1a51921e ("PCI: Create device tree node for bridge") > > > Reported-by: Kowshik Jois B S > > > Tested-by: Kowshik Jois B S > > > Signed-off-by: Amit Machhiwal > > Thanks for the patch and testing! Would like a reviewed-by from > > Lizhi. > > of_pci_make_dev_node() will create of nodes for some endpoint devices (not a > bridge) as well. And actually this is the main purpose. > > Maybe the patch as below would resolve the Oops? Thanks for the patch, Lizhi! I tried out this patch and don't see the issue with the same. The hot-plug and hot-unplug of PCI device seem to work fine as expected. ~ Amit > > diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c > index dda6092e6d3a..3c693b091ecf 100644 > --- a/drivers/of/dynamic.c > +++ b/drivers/of/dynamic.c > @@ -492,21 +492,29 @@ struct device_node *__of_node_dup(const struct > device_node *np, > * a given changeset. > * > * @ocs: Pointer to changeset > + * @np: Pointer to device node. If it is not null, init it directly instead > of > + * allocate a new node. > * @parent: Pointer to parent device node > * @full_name: Node full name > * > * Return: Pointer to the created device node or NULL in case of an error. > */ > struct device_node *of_changeset_create_node(struct of_changeset *ocs, > + stru
Re: [PATCH] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest
Hi Rob, On 2024/07/11 06:20 AM, Rob Herring wrote: > On Wed, Jul 3, 2024 at 8:17 AM Amit Machhiwal wrote: > > > > With CONFIG_PCI_DYNAMIC_OF_NODES [1], a hot-plug and hot-unplug sequence > > of a PCI device attached to a PCI-bridge causes following kernel Oops on > > a pseries KVM guest: > > Can I ask why you have this option on in the first place? Do you have > a use for it or it's just a case of distros turn on every kconfig > option. Yes, this option is turned on in Ubuntu's distro kernel config where the issue was originally reported, while Fedora is keeping this turned off. root@ubuntu:~# cat /boot/config-6.8.0-38-generic | grep PCI_DYN CONFIG_PCI_DYNAMIC_OF_NODES=y root@fedora:~# cat /boot/config-6.9.7-200.fc40.ppc64le | grep PCI_DYN # CONFIG_PCI_DYNAMIC_OF_NODES is not set Thanks, Amit > > Rob
[PATCH] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest
With CONFIG_PCI_DYNAMIC_OF_NODES [1], a hot-plug and hot-unplug sequence of a PCI device attached to a PCI-bridge causes following kernel Oops on a pseries KVM guest: RTAS: event: 2, Type: Hotplug Event (229), Severity: 1 Kernel attempted to read user page (10ec0048) - exploit attempt? (uid: 0) BUG: Unable to handle kernel data access on read at 0x10ec0048 Faulting instruction address: 0xc12d8728 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries NIP [c12d8728] __of_changeset_entry_invert+0x10/0x1ac LR [c12da7f0] __of_changeset_revert_entries+0x98/0x180 Call Trace: [cbcc3970] [c12daa60] of_changeset_revert+0x58/0xd8 [cbcc39c0] [c0d0ed78] of_pci_remove_node+0x74/0xb0 [cbcc39f0] [c0cdcfe0] pci_stop_bus_device+0xf4/0x138 [cbcc3a30] [c0cdd140] pci_stop_and_remove_bus_device_locked+0x34/0x64 [cbcc3a60] [c0cf3780] remove_store+0xf0/0x108 [cbcc3ab0] [c0e89e04] dev_attr_store+0x34/0x78 [cbcc3ad0] [c07f8dd4] sysfs_kf_write+0x70/0xa4 [cbcc3af0] [c07f7248] kernfs_fop_write_iter+0x1d0/0x2e0 [cbcc3b40] [c06c9b08] vfs_write+0x27c/0x558 [cbcc3bf0] [c06ca168] ksys_write+0x90/0x170 [cbcc3c40] [c0033248] system_call_exception+0xf8/0x290 [cbcc3e50] [c000d05c] system_call_vectored_common+0x15c/0x2ec A git bisect pointed this regression to be introduced via [1] that added a mechanism to create device tree nodes for parent PCI bridges when a PCI device is hot-plugged. The Oops is caused when `pci_stop_dev()` tries to remove a non-existing device-tree node associated with the pci_dev that was earlier hot-plugged and was attached under a pci-bridge. The PCI dev header `dev->hdr_type` being 0, results a conditional check done with `pci_is_bridge()` into false. Consequently, a call to `of_pci_make_dev_node()` to create a device node is never made. When at a later point in time, in the device node removal path, a memcpy is attempted in `__of_changeset_entry_invert()`; since the device node was never created, results in an Oops due to kernel read access to a bad address. To fix this issue the patch updates `pci_stop_dev()` to ensure that a call to `of_pci_remove_node()` is only made for pci-bridge devices. [1] commit 407d1a51921e ("PCI: Create device tree node for bridge") Fixes: 407d1a51921e ("PCI: Create device tree node for bridge") Reported-by: Kowshik Jois B S Tested-by: Kowshik Jois B S Signed-off-by: Amit Machhiwal --- drivers/pci/remove.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c index d749ea8250d6..4e51c64af416 100644 --- a/drivers/pci/remove.c +++ b/drivers/pci/remove.c @@ -22,7 +22,8 @@ static void pci_stop_dev(struct pci_dev *dev) device_release_driver(&dev->dev); pci_proc_detach_device(dev); pci_remove_sysfs_dev_files(dev); - of_pci_remove_node(dev); + if (pci_is_bridge(dev)) + of_pci_remove_node(dev); pci_dev_assign_added(dev, false); } base-commit: e9d22f7a6655941fc8b2b942ed354ec780936b3e -- 2.45.2
[PATCH v4] KVM: PPC: Book3S HV: Fix L2 guest reboot failure due to empty 'arch_compat'
Currently, rebooting a pseries nested qemu-kvm guest (L2) results in below error as L1 qemu sends PVR value 'arch_compat' == 0 via ppc_set_compat ioctl. This triggers a condition failure in kvmppc_set_arch_compat() resulting in an EINVAL. qemu-system-ppc64: Unable to set CPU compatibility mode in KVM: Invalid argument Also, a value of 0 for arch_compat generally refers the default compatibility of the host. But, arch_compat, being a Guest Wide Element in nested API v2, cannot be set to 0 in GSB as PowerVM (L0) expects a non-zero value. A value of 0 triggers a kernel trap during a reboot and consequently causes it to fail: [ 22.106360] reboot: Restarting system KVM: unknown exit, hardware reason ffea NIP 0100 LR fe44 CTR XER 20040092 CPU#0 MSR 1000 HID0 HF 6c00 iidx 3 didx 3 TB DECR 0 GPR00 c2a8c300 7fe0 GPR04 1002 82803033 GPR08 0a00 0004 2fff GPR12 c2e1 000105639200 0004 GPR16 00010563a090 GPR20 000105639e20 0001056399c8 7fffe54abab0 000105639288 GPR24 0001 0001 GPR28 c2b30840 CR [ - - - - - - - - ] RES 000@ SRR0 SRR1 PVR 00800200 VRSAVE SPRG0 SPRG1 SPRG2 SPRG3 SPRG4 SPRG5 SPRG6 SPRG7 HSRR0 HSRR1 CFAR LPCR 00020400 PTCR DAR DSISR kernel:trap=0xffea | pc=0x100 | msr=0x1000 This patch updates kvmppc_set_arch_compat() to use the host PVR value if 'compat_pvr' == 0 indicating that qemu doesn't want to enforce any specific PVR compat mode. The relevant part of the code might need a rework if PowerVM implements a support for `arch_compat == 0` in nestedv2 API. Fixes: 19d31c5f1157 ("KVM: PPC: Add support for nestedv2 guests") Reviewed-by: Aneesh Kumar K.V (IBM) Reviewed-by: Vaibhav Jain Signed-off-by: Amit Machhiwal --- Changes v3 -> v4: - Moved part of a code comment around implementation of `arch_compat == 0` in PowerVM to the patch description based on an off mailing list discussion Changes v2 -> v3: - Vaibhav: Use a 'break' instead of switch-case fallthrough - v2: https://lore.kernel.org/all/20240205132607.2776637-1-amach...@linux.ibm.com/ Changes v1 -> v2: - Added descriptive error log in the patch description when `arch_compat == 0` passed in GSB - Added a helper function for PCR to capabilities mapping - Added relevant comments around the changes being made - v1: https://lore.kernel.org/lkml/20240118095653.2588129-1-amach...@linux.ibm.com/ arch/powerpc/kvm/book3s_hv.c | 26 -- arch/powerpc/kvm/book3s_hv_nestedv2.c | 20 ++-- 2 files changed, 42 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 52427fc2a33f..0b921704da45 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -391,6 +391,24 @@ static void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr) /* Dummy value used in computing PCR value below */ #define PCR_ARCH_31(PCR_ARCH_300 << 1) +static inline unsigned long map_pcr_to_cap(unsigned long pcr) +{ + unsigned long cap = 0; + + switch (pcr) { + case PCR_ARCH_300: + cap = H_GUEST_CAP_POWER9; + break; + case PCR_ARCH_31: + cap = H_GUEST_CAP_POWER10; + break; + default: + break; + } + + return cap; +} + static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat) { unsigned long host_pcr_bit = 0, guest_pcr_bit = 0, cap = 0; @@ -424,11 +442,9 @@ static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat) break; case PVR_ARCH_300: guest_pcr_bit = PCR_ARCH_300; - cap = H_GUEST_CAP_POWER9; break; case PVR_ARCH_31: guest_pcr_bit = PCR_ARCH_31; - cap = H_GUEST_CAP_POWER10; break; default: return -EINVAL; @@ -440,6 +456,12 @@ static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat)
Re: Re: [PATCH v3] KVM: PPC: Book3S HV: Fix L2 guest reboot failure due to empty 'arch_compat'
Hi Michael, Thanks for looking into the patch and your comments. On 2024/02/06 09:09 PM, Michael Ellerman wrote: > Hi Amit, > > One comment below ... > > Amit Machhiwal writes: > > Currently, rebooting a pseries nested qemu-kvm guest (L2) results in > > below error as L1 qemu sends PVR value 'arch_compat' == 0 via > > ppc_set_compat ioctl. This triggers a condition failure in > > kvmppc_set_arch_compat() resulting in an EINVAL. > ... > > > > diff --git a/arch/powerpc/kvm/book3s_hv_nestedv2.c > > b/arch/powerpc/kvm/book3s_hv_nestedv2.c > > index 5378eb40b162..6042bdc70230 100644 > > --- a/arch/powerpc/kvm/book3s_hv_nestedv2.c > > +++ b/arch/powerpc/kvm/book3s_hv_nestedv2.c > > @@ -347,8 +348,26 @@ static int gs_msg_ops_vcpu_fill_info(struct > > kvmppc_gs_buff *gsb, > > break; > > } > > case KVMPPC_GSID_LOGICAL_PVR: > > - rc = kvmppc_gse_put_u32(gsb, iden, > > - vcpu->arch.vcore->arch_compat); > > + /* > > +* Though 'arch_compat == 0' would mean the default > > +* compatibility, arch_compat, being a Guest Wide > > +* Element, cannot be filled with a value of 0 in GSB > > +* as this would result into a kernel trap. > > +* Hence, when `arch_compat == 0`, arch_compat should > > +* default to L1's PVR. > > +* > > +* Rework this when PowerVM supports a value of 0 > > +* for arch_compat for KVM API v2. > > +*/ > > Is there an actual plan that PowerVM will support this in future? > > If so, how will a future kernel know that it's running on a version of > PowerVM that does support arch_compat == 0? > > Similarly how will we know when it's OK to drop support for this > workaround? I'm sending a v4 based on an off mailing list discussion. > > cheers ~Amit
[PATCH v3] KVM: PPC: Book3S HV: Fix L2 guest reboot failure due to empty 'arch_compat'
Currently, rebooting a pseries nested qemu-kvm guest (L2) results in below error as L1 qemu sends PVR value 'arch_compat' == 0 via ppc_set_compat ioctl. This triggers a condition failure in kvmppc_set_arch_compat() resulting in an EINVAL. qemu-system-ppc64: Unable to set CPU compatibility mode in KVM: Invalid argument Also, a value of 0 for arch_compat generally refers the default compatibility of the host. But, arch_compat, being a Guest Wide Element in nested API v2, cannot be set to 0 in GSB as PowerVM (L0) expects a non-zero value. A value of 0 triggers a kernel trap during a reboot and consequently causes it to fail: [ 22.106360] reboot: Restarting system KVM: unknown exit, hardware reason ffea NIP 0100 LR fe44 CTR XER 20040092 CPU#0 MSR 1000 HID0 HF 6c00 iidx 3 didx 3 TB DECR 0 GPR00 c2a8c300 7fe0 GPR04 1002 82803033 GPR08 0a00 0004 2fff GPR12 c2e1 000105639200 0004 GPR16 00010563a090 GPR20 000105639e20 0001056399c8 7fffe54abab0 000105639288 GPR24 0001 0001 GPR28 c2b30840 CR [ - - - - - - - - ] RES 000@ SRR0 SRR1 PVR 00800200 VRSAVE SPRG0 SPRG1 SPRG2 SPRG3 SPRG4 SPRG5 SPRG6 SPRG7 HSRR0 HSRR1 CFAR LPCR 00020400 PTCR DAR DSISR kernel:trap=0xffea | pc=0x100 | msr=0x1000 This patch updates kvmppc_set_arch_compat() to use the host PVR value if 'compat_pvr' == 0 indicating that qemu doesn't want to enforce any specific PVR compat mode. Fixes: 19d31c5f1157 ("KVM: PPC: Add support for nestedv2 guests") Reviewed-by: Aneesh Kumar K.V (IBM) Reviewed-by: Vaibhav Jain Signed-off-by: Amit Machhiwal --- Changes v2 -> v3: - Vaibhav: Use a 'break' instead of switch-case fallthrough - v2: https://lore.kernel.org/all/20240205132607.2776637-1-amach...@linux.ibm.com/ Changes v1 -> v2: - Added descriptive error log in the patch description when `arch_compat == 0` passed in GSB - Added a helper function for PCR to capabilities mapping - Added relevant comments around the changes being made - v1: https://lore.kernel.org/lkml/20240118095653.2588129-1-amach...@linux.ibm.com/ arch/powerpc/kvm/book3s_hv.c | 26 -- arch/powerpc/kvm/book3s_hv_nestedv2.c | 23 +-- 2 files changed, 45 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 52427fc2a33f..0b921704da45 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -391,6 +391,24 @@ static void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr) /* Dummy value used in computing PCR value below */ #define PCR_ARCH_31(PCR_ARCH_300 << 1) +static inline unsigned long map_pcr_to_cap(unsigned long pcr) +{ + unsigned long cap = 0; + + switch (pcr) { + case PCR_ARCH_300: + cap = H_GUEST_CAP_POWER9; + break; + case PCR_ARCH_31: + cap = H_GUEST_CAP_POWER10; + break; + default: + break; + } + + return cap; +} + static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat) { unsigned long host_pcr_bit = 0, guest_pcr_bit = 0, cap = 0; @@ -424,11 +442,9 @@ static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat) break; case PVR_ARCH_300: guest_pcr_bit = PCR_ARCH_300; - cap = H_GUEST_CAP_POWER9; break; case PVR_ARCH_31: guest_pcr_bit = PCR_ARCH_31; - cap = H_GUEST_CAP_POWER10; break; default: return -EINVAL; @@ -440,6 +456,12 @@ static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat) return -EINVAL; if (kvmhv_on_pseries() && kvmhv_is_nestedv2()) { + /* +* 'arch_compat == 0' would mean the guest should default to +* L1's compatibility. In this case, the guest woul
Re: Re: [PATCH v2] KVM: PPC: Book3S HV: Fix L2 guest reboot failure due to empty 'arch_compat'
Hi Vaibhav, Thanks for looking into the patch. On 2024/02/05 11:05 PM, Vaibhav Jain wrote: > Hi Amit, > > Thanks for the patch. Minor comment on the patch below: > > Amit Machhiwal writes: > > > > > > > +static inline unsigned long map_pcr_to_cap(unsigned long pcr) > > +{ > > + unsigned long cap = 0; > > + > > + switch (pcr) { > > + case PCR_ARCH_300: > > + cap = H_GUEST_CAP_POWER9; > > + break; > > + case PCR_ARCH_31: > > + cap = H_GUEST_CAP_POWER10; > Though CONFIG_CC_IMPLICIT_FALLTHROUGH and '-Wimplicit-fallthrough' > doesnt explicitly flag this usage, please consider using the > 'fallthrough;' keyword here. > > However you probably dont want this switch-case to fallthrough so please > use a 'break' instead. Sure, v3 on the way. > > > + default: > > + break; > > + } > > + > > + return cap; > > +} > > + > > > > > With the suggested change above > > Reviewed-by: Vaibhav Jain Thanks! > > -- > Cheers > ~ Vaibhav ~Amit
[PATCH v2] KVM: PPC: Book3S HV: Fix L2 guest reboot failure due to empty 'arch_compat'
Currently, rebooting a pseries nested qemu-kvm guest (L2) results in below error as L1 qemu sends PVR value 'arch_compat' == 0 via ppc_set_compat ioctl. This triggers a condition failure in kvmppc_set_arch_compat() resulting in an EINVAL. qemu-system-ppc64: Unable to set CPU compatibility mode in KVM: Invalid argument Also, a value of 0 for arch_compat generally refers the default compatibility of the host. But, arch_compat, being a Guest Wide Element in nested API v2, cannot be set to 0 in GSB as PowerVM (L0) expects a non-zero value. A value of 0 triggers a kernel trap during a reboot and consequently causes it to fail: [ 22.106360] reboot: Restarting system KVM: unknown exit, hardware reason ffea NIP 0100 LR fe44 CTR XER 20040092 CPU#0 MSR 1000 HID0 HF 6c00 iidx 3 didx 3 TB DECR 0 GPR00 c2a8c300 7fe0 GPR04 1002 82803033 GPR08 0a00 0004 2fff GPR12 c2e1 000105639200 0004 GPR16 00010563a090 GPR20 000105639e20 0001056399c8 7fffe54abab0 000105639288 GPR24 0001 0001 GPR28 c2b30840 CR [ - - - - - - - - ] RES 000@ SRR0 SRR1 PVR 00800200 VRSAVE SPRG0 SPRG1 SPRG2 SPRG3 SPRG4 SPRG5 SPRG6 SPRG7 HSRR0 HSRR1 CFAR LPCR 00020400 PTCR DAR DSISR kernel:trap=0xffea | pc=0x100 | msr=0x1000 This patch updates kvmppc_set_arch_compat() to use the host PVR value if 'compat_pvr' == 0 indicating that qemu doesn't want to enforce any specific PVR compat mode. Fixes: 19d31c5f1157 ("KVM: PPC: Add support for nestedv2 guests") Signed-off-by: Amit Machhiwal --- Changes v1 -> v2: - Added descriptive error log in the patch description when `arch_compat == 0` passed in GSB - Added a helper function for PCR to capabilities mapping - Added relevant comments around the changes being made v1: https://lore.kernel.org/lkml/20240118095653.2588129-1-amach...@linux.ibm.com/ arch/powerpc/kvm/book3s_hv.c | 25 +++-- arch/powerpc/kvm/book3s_hv_nestedv2.c | 23 +-- 2 files changed, 44 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 52427fc2a33f..270ab9cf9a54 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -391,6 +391,23 @@ static void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr) /* Dummy value used in computing PCR value below */ #define PCR_ARCH_31(PCR_ARCH_300 << 1) +static inline unsigned long map_pcr_to_cap(unsigned long pcr) +{ + unsigned long cap = 0; + + switch (pcr) { + case PCR_ARCH_300: + cap = H_GUEST_CAP_POWER9; + break; + case PCR_ARCH_31: + cap = H_GUEST_CAP_POWER10; + default: + break; + } + + return cap; +} + static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat) { unsigned long host_pcr_bit = 0, guest_pcr_bit = 0, cap = 0; @@ -424,11 +441,9 @@ static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat) break; case PVR_ARCH_300: guest_pcr_bit = PCR_ARCH_300; - cap = H_GUEST_CAP_POWER9; break; case PVR_ARCH_31: guest_pcr_bit = PCR_ARCH_31; - cap = H_GUEST_CAP_POWER10; break; default: return -EINVAL; @@ -440,6 +455,12 @@ static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat) return -EINVAL; if (kvmhv_on_pseries() && kvmhv_is_nestedv2()) { + /* +* 'arch_compat == 0' would mean the guest should default to +* L1's compatibility. In this case, the guest would pick +* host's PCR and evaluate the corresponding capabilities. +*/ + cap = map_pcr_to_cap(guest_pcr_bit); if (!(cap & nested_capabilities)) return -EINVAL; } diff --git a/arch/powerpc/kvm/book3s_hv_nestedv2
Re: Re: [PATCH] KVM: PPC: Book3S HV: Fix L2 guest reboot failure due to empty 'arch_compat'
Hi Aneesh, Thanks for looking into the patch. My comments are inline below. On 2024/01/24 01:06 PM, Aneesh Kumar K.V wrote: > Amit Machhiwal writes: > > > Currently, rebooting a pseries nested qemu-kvm guest (L2) results in > > below error as L1 qemu sends PVR value 'arch_compat' == 0 via > > ppc_set_compat ioctl. This triggers a condition failure in > > kvmppc_set_arch_compat() resulting in an EINVAL. > > > > qemu-system-ppc64: Unable to set CPU compatibility mode in KVM: Invalid > > > > This patch updates kvmppc_set_arch_compat() to use the host PVR value if > > 'compat_pvr' == 0 indicating that qemu doesn't want to enforce any > > specific PVR compat mode. > > > > Signed-off-by: Amit Machhiwal > > --- > > arch/powerpc/kvm/book3s_hv.c | 2 +- > > arch/powerpc/kvm/book3s_hv_nestedv2.c | 12 ++-- > > 2 files changed, 11 insertions(+), 3 deletions(-) > > > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > > index 1ed6ec140701..9573d7f4764a 100644 > > --- a/arch/powerpc/kvm/book3s_hv.c > > +++ b/arch/powerpc/kvm/book3s_hv.c > > @@ -439,7 +439,7 @@ static int kvmppc_set_arch_compat(struct kvm_vcpu > > *vcpu, u32 arch_compat) > > if (guest_pcr_bit > host_pcr_bit) > > return -EINVAL; > > > > - if (kvmhv_on_pseries() && kvmhv_is_nestedv2()) { > > + if (kvmhv_on_pseries() && kvmhv_is_nestedv2() && arch_compat) { > > if (!(cap & nested_capabilities)) > > return -EINVAL; > > } > > > > Instead of that arch_compat check, would it better to do > > if (kvmhv_on_pseries() && kvmhv_is_nestedv2()) { > if (cap && !(cap & nested_capabilities)) > return -EINVAL; > } > > ie, if a capability is requested, then check against nested_capbilites > to see if the capability exist. The above condition check will cause problems when we would try to boot a machine below Power 9. For example, if we passed the arch_compat == PVR_ARCH_207, cap will remain 0 resulting the above check into a false condition. Consequently, we would never return an -EINVAL in that case resulting the arch compatilbility request succeed when it doesn't support nested papr guest. > > > > diff --git a/arch/powerpc/kvm/book3s_hv_nestedv2.c > > b/arch/powerpc/kvm/book3s_hv_nestedv2.c > > index fd3c4f2d9480..069a1fcfd782 100644 > > --- a/arch/powerpc/kvm/book3s_hv_nestedv2.c > > +++ b/arch/powerpc/kvm/book3s_hv_nestedv2.c > > @@ -138,6 +138,7 @@ static int gs_msg_ops_vcpu_fill_info(struct > > kvmppc_gs_buff *gsb, > > vector128 v; > > int rc, i; > > u16 iden; > > + u32 arch_compat = 0; > > > > vcpu = gsm->data; > > > > @@ -347,8 +348,15 @@ static int gs_msg_ops_vcpu_fill_info(struct > > kvmppc_gs_buff *gsb, > > break; > > } > > case KVMPPC_GSID_LOGICAL_PVR: > > - rc = kvmppc_gse_put_u32(gsb, iden, > > - vcpu->arch.vcore->arch_compat); > > + if (!vcpu->arch.vcore->arch_compat) { > > + if (cpu_has_feature(CPU_FTR_ARCH_31)) > > + arch_compat = PVR_ARCH_31; > > + else if (cpu_has_feature(CPU_FTR_ARCH_300)) > > + arch_compat = PVR_ARCH_300; > > + } else { > > + arch_compat = vcpu->arch.vcore->arch_compat; > > + } > > + rc = kvmppc_gse_put_u32(gsb, iden, arch_compat); > > > > Won't a arch_compat = 0 work here?. ie, where you observing the -EINVAL from > the first hunk or does this hunk have an impact? > No, an arch_compat == 0 won't work in nested API v2. That's because the guest wide PVR cannot be 0, and if arch_compat == 0, then suppported host PVR value should be mentioned. If we were to skip this hunk (keeping the arch_compat == 0), a system reboot of L2 guest would fail and result into a kernel trap as below: [ 22.106360] reboot: Restarting system KVM: unknown exit, hardware reason ffea NIP 0100 LR fe44 CTR XER 20040092 CPU#0 MSR 1000 HID0 HF 6c00 iidx 3 didx 3 TB DECR 0 GPR00 c2a8c300 7fe0 GPR04 1002 8000
[PATCH] KVM: PPC: Book3S HV: Fix L2 guest reboot failure due to empty 'arch_compat'
Currently, rebooting a pseries nested qemu-kvm guest (L2) results in below error as L1 qemu sends PVR value 'arch_compat' == 0 via ppc_set_compat ioctl. This triggers a condition failure in kvmppc_set_arch_compat() resulting in an EINVAL. qemu-system-ppc64: Unable to set CPU compatibility mode in KVM: Invalid This patch updates kvmppc_set_arch_compat() to use the host PVR value if 'compat_pvr' == 0 indicating that qemu doesn't want to enforce any specific PVR compat mode. Signed-off-by: Amit Machhiwal --- arch/powerpc/kvm/book3s_hv.c | 2 +- arch/powerpc/kvm/book3s_hv_nestedv2.c | 12 ++-- 2 files changed, 11 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 1ed6ec140701..9573d7f4764a 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -439,7 +439,7 @@ static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat) if (guest_pcr_bit > host_pcr_bit) return -EINVAL; - if (kvmhv_on_pseries() && kvmhv_is_nestedv2()) { + if (kvmhv_on_pseries() && kvmhv_is_nestedv2() && arch_compat) { if (!(cap & nested_capabilities)) return -EINVAL; } diff --git a/arch/powerpc/kvm/book3s_hv_nestedv2.c b/arch/powerpc/kvm/book3s_hv_nestedv2.c index fd3c4f2d9480..069a1fcfd782 100644 --- a/arch/powerpc/kvm/book3s_hv_nestedv2.c +++ b/arch/powerpc/kvm/book3s_hv_nestedv2.c @@ -138,6 +138,7 @@ static int gs_msg_ops_vcpu_fill_info(struct kvmppc_gs_buff *gsb, vector128 v; int rc, i; u16 iden; + u32 arch_compat = 0; vcpu = gsm->data; @@ -347,8 +348,15 @@ static int gs_msg_ops_vcpu_fill_info(struct kvmppc_gs_buff *gsb, break; } case KVMPPC_GSID_LOGICAL_PVR: - rc = kvmppc_gse_put_u32(gsb, iden, - vcpu->arch.vcore->arch_compat); + if (!vcpu->arch.vcore->arch_compat) { + if (cpu_has_feature(CPU_FTR_ARCH_31)) + arch_compat = PVR_ARCH_31; + else if (cpu_has_feature(CPU_FTR_ARCH_300)) + arch_compat = PVR_ARCH_300; + } else { + arch_compat = vcpu->arch.vcore->arch_compat; + } + rc = kvmppc_gse_put_u32(gsb, iden, arch_compat); break; } -- 2.43.0