On Wed, Jul 24, 2013 at 03:21:48PM +0200, Paolo Bonzini wrote:
> On 24/07/2013 15:15, Eduardo Habkost wrote:
> > On Tue, Jul 23, 2013 at 09:43:06PM +0200, Paolo Bonzini wrote:
> >> On 23/07/2013 19:41, Eduardo Habkost wrote:
> >>> On Tue, Jul 23, 2013 at 06:23:08PM +0200, Paolo Bonzini wrote:
> >>>> On 23/07/2013 17:40, Eduardo Habkost wrote:
> >>>>> On Tue, Jul 23, 2013 at 05:09:02PM +0200, Paolo Bonzini wrote:
> >>>>>> On 23/07/2013 16:13, Eduardo Habkost wrote:
> >>>>>>> On Tue, Jul 23, 2013 at 11:18:03AM +0200, Paolo Bonzini wrote:
> >>>>>>>> On 22/07/2013 21:25, Eduardo Habkost wrote:
> >>>>>>>>> Bug description: QEMU currently gets all bits from
> >>>>>>>>> GET_SUPPORTED_CPUID for CPUID leaf 0xA and passes them
> >>>>>>>>> directly to the guest. This makes the guest ABI depend on
> >>>>>>>>> host kernel and host CPU capabilities, and breaks live
> >>>>>>>>> migration if we migrate between hosts with different
> >>>>>>>>> capabilities (e.g. different number of PMU counters).
> >>>>>>>>>
> >>>>>>>>> This patch adds a "pmu-passthrough" property to X86CPU, and
> >>>>>>>>> sets it to true only on "-cpu host", or on pc-*-1.5 and older
> >>>>>>>>> machine-types.
> >>>>>>>>
> >>>>>>>> Can we just call the property "pmu"? It doesn't have to be
> >>>>>>>> passthrough.
> >>>>>>>
> >>>>>>> Yes, but the only options we have today are "no PMU" and
> >>>>>>> "passthrough PMU". I wouldn't like to make "pmu=on" enable the
> >>>>>>> passthrough behavior implicitly (I don't want things that break
> >>>>>>> live-migration to be enabled without making it explicit that it
> >>>>>>> is a host-dependent/passthrough mode).
> >>>>>>
> >>>>>> I think "passthrough PMU" should be considered a bug except of course
> >>>>>> with "-cpu host".
> >>>>>>
> >>>>>> If "-cpu Nehalem,pmu=on" goes from passthrough to Nehalem-compatible in
> >>>>>> a future QEMU release, that'll be a bugfix.
> >>>>>
> >>>>> Exactly. But then I don't understand your suggestion.
> >>>>> We still need a property to enable passthrough behavior on old
> >>>>> machine-types (not perfect, but a best-effort way to try to keep
> >>>>> compatibility),
> >>>>
> >>>> Do we?
> >>>>
> >>>> We only need "pmu=on"---which right now is buggy on old machine types
> >>>> because it will always passthrough.
> >>>
> >>> I am not sure I understand what you are arguing for.
> >>>
> >>> You agree that pmu=on needs to keep the buggy passthrough behavior on
> >>> pc-1.5 and older, right?
> >>
> >> I agree it needs to remain enabled on 1.5. But if, for example, 1.8
> >> makes pmu=on emulate a Nehalem-compatible PMU, I think it is fine if
> >> pc-1.5 moves from a host-compatible PMU to a Nehalem-compatible PMU.
> >
> > That's where I disagree. Today users are (luckily) able to migrate
> > safely between hosts with the same number of PMU counters. But if we
> > make, e.g., "qemu-1.6 -machine pc-1.5 -cpu Westmere" present a smaller
> > number of PMU counters than "qemu-1.5 -machine pc-1.5 -cpu Westmere" on
> > the same host, we will break an existing setup where everything was
> > working before, which is something we could have easily avoided.
>
> But at the same time we will fix live migration from a Sandy Bridge host
> to a Westmere. So it's a choice we have to make anyway.

True.

>
> > (Just to clarify what breaking this means in practice: changing the
> > number of PMU counters under the guest on live-migration means the guest
> > will crash when trying to use counters that suddenly went away, and it
> > may crash a very long time after it was migrated.)
>
> And at the same time we fix live migration of a Sandy Bridge to a Westmere.

Something that never worked in the first place. Breaking what is working
today, on the other hand, is a regression. If users are interested in a
fix for the new SandyBridge->Westmere use-case, we can always say "please
upgrade your VM to a newer machine-type".

>
> >> The reason is that pc-1.5 has never guaranteed any feature of the
> >> emulated PMU.
> >
> > Right, current behavior is buggy and we never guaranteed anything, but
> > IMO we shouldn't break on purpose something that is working today.
>
> Even if it is to fix something else?

I believe so, because machine-types allow us to have both: we can fix the
new use-cases in new machine-types while keeping existing working setups
without regressions on the older machine-types.

-- 
Eduardo
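
For readers following the thread, the default proposed in the patch
description ("pmu-passthrough" true only on "-cpu host", or on pc-*-1.5
and older machine-types) can be modelled roughly as below. This is only
an illustrative sketch in Python, not QEMU code: the function names and
the machine-name parsing are hypothetical, and it assumes machine-type
names end in "-<major>.<minor>".

```python
from typing import Tuple


def parse_machine_version(machine: str) -> Tuple[int, int]:
    """Hypothetical helper: extract (major, minor) from a machine-type
    name such as 'pc-1.5' or 'pc-i440fx-1.6' (assumed naming scheme)."""
    major, minor = machine.rsplit("-", 1)[1].split(".")
    return int(major), int(minor)


def pmu_passthrough_default(cpu_model: str, machine: str) -> bool:
    """Model of the proposed default, not the actual QEMU logic.

    Passthrough stays on for '-cpu host' (explicitly host-dependent)
    and for machine-types 1.5 and older (to avoid regressing setups
    that work today); newer machine-types default to off.
    """
    if cpu_model == "host":
        return True
    return parse_machine_version(machine) <= (1, 5)
```

With this model, "-machine pc-1.5 -cpu Westmere" keeps the old
host-dependent behavior, while the same CPU model on a newer
machine-type does not get passthrough unless it is requested explicitly.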