Andrew Jones <drjo...@redhat.com> wrote:
> On Tue, Sep 04, 2018 at 09:16:58AM +0000, Jaggi, Manish wrote:
>> So which approach should be taken here, whats your take...
>>
[ Removing Anthony from CC. The address doesn't exist anymore. ]

> Inventing a base-AArch64 cpu model that can then be extended with optional
> features is a nice way to extend the migratability of a guest, however
> it's hard to do because of errata. Since errata workarounds are enabled
> per MIDR, then we'd need to invent our own MIDR and also some way to
> communicate which errata we want to enable, possibly through some paravirt
> mechanism or through some implementation defined system registers that
> KVM would need to reserve and define.
>
> That's not just a ton of work for the entire virt stack (not just KVM and
> QEMU, but also all the layers above), but it's possible that it won't be
> useful in the end anyway. There's risk that enabling just one erratum
> workaround would restrict the guest to hosts of the exact same type
> anyway. For each erratum that needs to be enabled, the probability of
> enabling an incompatible one goes up, so it may not be likely to do much
> better than '-cpu host' in the end. I'm afraid that until errata are
> primarily showing up in optional CPU features that can simply be disabled
> for the workaround, that we're stuck with '-cpu host'. I'd be happy to
> discuss it more though.

Then we are basically at the point where we can only migrate to the exact
same processor, no?

> In short, I'd go with the proposal above, for now, with possibly one
> change. libvirt folk (Andrea Bolognani and Pino Toscano) suggest that
> the guest invariant register updating on the destination host only be
> done if the user opts-in to it. This is because right now if a user
> tries to migrate to a host that is not 100% identical the migration
> will fail, which makes the "mistake" clear. If we silently change the
> behavior to allow it, then what could have been a mistake, because
> the hosts aren't actually "close enough", may go unnoticed.
> I'm not 100% sure we need another user opt-in flag to be set, though, as I
> think the '-cpu host' indicates the user expects the VCPU to look
> like the host CPU, and even after migration that expectation should be
> met. Simply, users that migrate '-cpu host' VMs need to know what they're
> doing.

I don't really know what to say here:

- on the one hand, not creating the proper cpu types is going to bite us,
  big time, later.
- on the other hand, it appears that cpu compatibility is not so "strong",
  "nice", or whatever you want to call it in ARM land.

Why am I so worried? Because we have spent lots (and I mean lots) of time
on x86_64 when we forgot to enable/disable/indicate that one cpu has a new
MSR/feature. The usual problem is that it only happens when a customer is
using that particular feature. In some cases, to reproduce the problem we
had to ping-pong a migration several hundred times between two different
cpus until we found _why_ it failed.

So, I am pretty sure that:

a- doing it right is a lot of work now.
b- doing it fast now is a lot of work now, and much more work later.

Later, Juan.
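
To make the policy discussed in this thread concrete, here is a rough Python
sketch. It is not code from QEMU, KVM, or libvirt. The MIDR_EL1 field layout
is architectural, but the function names, the `allow_invariant_update` flag,
and the returned strings are all hypothetical, and the MIDR values in the
usage note are only samples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Midr:
    """Decoded fields of an AArch64 MIDR_EL1 value."""
    implementer: int   # bits [31:24]
    variant: int       # bits [23:20]
    architecture: int  # bits [19:16]
    part_num: int      # bits [15:4]
    revision: int      # bits [3:0]


def decode_midr(value: int) -> Midr:
    """Split a 32-bit MIDR_EL1 value into its architectural fields."""
    return Midr(
        implementer=(value >> 24) & 0xFF,
        variant=(value >> 20) & 0xF,
        architecture=(value >> 16) & 0xF,
        part_num=(value >> 4) & 0xFFF,
        revision=value & 0xF,
    )


def migration_decision(src_midr: int, dst_midr: int,
                       allow_invariant_update: bool = False) -> str:
    """Hypothetical policy for migrating a '-cpu host' guest.

    - identical MIDR: migrate as-is
    - different MIDR and the user opted in: migrate, and let the
      destination update the guest-invariant registers
    - different MIDR and no opt-in: refuse, so the mismatch is visible
    """
    if src_midr == dst_midr:
        return "migrate"
    if allow_invariant_update:
        return "migrate-and-update-invariant-registers"
    return "refuse"
```

With the sample values 0x411FD072 and 0x411FD073, which differ only in the
revision field, `migration_decision` returns "refuse" by default and only
allows the migration when `allow_invariant_update=True` is passed. That is the
opt-in behaviour the libvirt folk suggested; whether a flag like this is needed
at all is exactly what is being debated above.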