On Thu, 22 Sep 2022 12:40:01 -0400 Peter Xu <pet...@redhat.com> wrote:
> On Thu, Sep 22, 2022 at 03:46:17PM +0200, Igor Mammedov wrote: > > On Wed, 21 Sep 2022 12:12:27 -0400 > > Peter Xu <pet...@redhat.com> wrote: > > > > > It's true that when vcpus<=255 we don't require the length of 32bit APIC > > > IDs. However here since we already have EIM=ON it means the hypervisor > > > will declare the VM as x2apic supported (e.g. VT-d ECAP register will have > > > EIM bit 4 set), so the guest should assume the APIC IDs are 32bits width > > > even if vcpus<=255. In short, commit 77250171bdc breaks any simple > > > cmdline > > > that wants to boot a VM with >=9 but <=255 vcpus with: > > > > > > -device intel-iommu,intremap=on > > > > > > For anyone who does not want to enable x2apic, we can use eim=off in the > > > intel-iommu parameters to skip enabling KVM x2apic. > > > > > > This partly reverts commit 77250171bdc02aee106083fd2a068147befa1a38, while > > > keeping the valid bit on checking split irqchip, but revert the other > > > change. > > > > > > Cc: David Woodhouse <dw...@infradead.org> > > > Cc: Claudio Fontana <cfont...@suse.de> > > > Cc: Igor Mammedov <imamm...@redhat.com> > > > Signed-off-by: Peter Xu <pet...@redhat.com> > > > --- > > > hw/i386/intel_iommu.c | 5 +++++ > > > 1 file changed, 5 insertions(+) > > > > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c > > > index 05d53a1aa9..6524c2ee32 100644 > > > --- a/hw/i386/intel_iommu.c > > > +++ b/hw/i386/intel_iommu.c > > > @@ -3818,6 +3818,11 @@ static bool vtd_decide_config(IntelIOMMUState *s, > > > Error **errp) > > > error_setg(errp, "eim=on requires > > > accel=kvm,kernel-irqchip=split"); > > > return false; > > > } > > > + if (!kvm_enable_x2apic()) { > > > > above 'check' has side-effects > > if it's supposed to be a check it would be better to use > > kvm_has_x2apic_api() > > instead. > > A check is not what I wanted. > > As stated in the commit message, since for some reason EIM is enabled on > the VT-d device already, we need to enable x2apic for the whole guest > (including KVM) to match with EIM=on being declared even if vcpus<255. > > > > > Also 77250171bdc says: > > " > > The check on kvm_enable_x2apic() needs to happen *anyway* in order to > > allow CPUs above 254 even without an IOMMU, so allow that to happen > > elsewhere. > > " > > > > Looking for that elsewhere, it looks like commit dc89f32d92b was supposed > > to take care of removed chunk, but that is not reachable because of > 255 > > vCPUs" > > > > Likely 77250171bdc just exposed a bug in dc89f32d92b, where > > the later removed kvm_enable_x2apic() always called (with split irqchip) > > and made it called only when > 255 vCPUs. > > > > So migration wise it looks like all version with it and less than 255 cpus > > are broken. > > > > Wait earlier c1bb5418e3, introduced that > > kvm_irqchip_is_split() && kvm_enable_x2apic() > > 'condition', also without any compat machinery to keep old behavior. > > There's actually some attempt to be compatible, with: > > GlobalProperty pc_compat_5_1[] = { > { "ICH9-LPC", "x-smi-cpu-hotplug", "off" }, > + { TYPE_X86_CPU, "kvm-msi-ext-dest-id", "off" }, > }; > > But I don't think that's strictly correct, because afaict that only > controls whether guest enables it or not (I can only see Linux does it that > way; no idea whether that's detected to other OSes from the PV interfaces). > The KVM x2apic will still be enabled even on old machines I think, as you > mentioned. yep, that compat affects only kvm-msi-ext-dest-id, the kvm_enable_x2apic() was still called regardless of that. > > And before that kvm_enable_x2apic() was affecting only configuration > > with intel_iommu (fb506e701e9b). > > Right, afaict that's what we "officially" support. > > > > > I'm not sure if anything could be salvaged here migration wise > > This whole thing is indeed very unfortunate. For easier reference of > future, here are the list of commits that are relevant in time order: > > fb506e701e ("intel_iommu: reject broken EIM", 2016-10-17) > c1bb5418e3 ("target/i386: Support up to 32768 CPUs without IRQ remapping", > 2020-12-10) > 77250171bd ("intel_iommu: Fix irqchip / X2APIC configuration checks", > 2022-05-16) > dc89f32d92 ("target/i386: Fix sanity check on max APIC ID / X2APIC > enablement", 2022-05-16) > > So regarding compatibility I'm wondering whether we should loose it for > this case, depending on whether vendors (like RH, or QEMU community in > general) should support "allowing 32K vcpu without vIOMMU" as an "official" > feature, or treat it as "experimental only". If I recall correctly that's PV only feature that also requires special tailored guest. i.e. it's possible to deliver IPIs in such configuration but devices could interact only with a fraction of CPUs (irq-wise) or something else should take care of IRQ remapping. I don't think mainstream distributions care about this case much. > IMO it's more important to always make the officially supported bits > compatible and work as expected. Here IIUC the (only) official way to > support >255 vcpus should still be using vIOMMU with EIM enabled so far. > But I'm happy to be corrected.. > > If so, I would still suggest having such a patch because it should fix one > of the basic use cases with vIOMMU and currently upstream is broken on it. > This patch will definitely change the behavior again, but the old was > simply broken and we don't really have much choice out of it, IMHO. The thing I worry about is calling kvm_enable_x2apic() from multiple places. It would be better to fix x86_cpus_init() part of dc89f32d92 and have a check in IOMMU. But if that's impossible/too ugly, I won't object to this patch. > Thanks, > > > > > PS: > > I'd keep kvm_enable_x2apic() only in corrected x86_cpus_init() > > and use kvm_has_x2apic_api() elsewhere for checks and bailing out. > > > > > > > + error_setg(errp, "eim=on requires support on the KVM side" > > > + "(X2APIC_API, first shipped in v4.7)"); > > > + return false; > > > + } > > > > > > > > > } > > > > > > /* Currently only address widths supported are 39 and 48 bits */ > > >