On Wed, Jun 14, 2017 at 09:40:37PM +0300, Roman Kagan wrote:
> On Wed, Jun 14, 2017 at 10:45:23AM -0300, Eduardo Habkost wrote:
> > On Wed, Jun 14, 2017 at 03:38:59PM +0200, Igor Mammedov wrote:
> > > On Wed, 14 Jun 2017 10:22:16 -0300
> > > Eduardo Habkost <ehabk...@redhat.com> wrote:
> > > 
> > > > On Wed, Jun 14, 2017 at 03:17:54PM +0200, Paolo Bonzini wrote:
> > > > > 
> > > > > 
> > > > > On 14/06/2017 15:11, Igor Mammedov wrote:  
> > > > > >> No, KVM really uses the VCPU _index_ for HV_X64_MSR_VP_INDEX:  
> > > > > >
> > > > > > and as you pointed out that works just by luck,
> > > > > > as soon as we there would be out of order created CPUs
> > > > > > returned value won't match cpu_index.
> > > > > > 
> > > > > > So instead of spreading this nonsense out to QEMU, is it possible
> > > > > > to fix kernel(kvm+guest) to use apic_id instead?  
> > > > > 
> > > > > Yes, definitely.  At this point let's add a new 
> > > > > "KVM_CAP_HYPERV_SYNIC2"
> > > > > capability and declare the old one broken, that will make things 
> > > > > easier.  
> > > > 
> > > > What do we need a capability for?
> > > > Can't we just call
> > > > KVM_SET_MSRS using arch_id (== apic_id) and warn the user if it
> > > > doesn't work?
> > > warn won't solve issue, and it's called rather late in vcpu creation
> > > chain without any error propagation so it's not possible (hard) to
> > > gracefully fail cpu hotplug.
> > 
> > The issue is unsolvable on old kernels (and I don't think we want
> > to prevent the VM from starting), hence the warning.  But the
> > ability to report errors gracefully on CPU hotplug is a very good
> > reason.  Good point.
> > 
> > > 
> > > with KVM_CAP_HYPERV_SYNIC2 we could easily turn on compat property for
> > > SYNC device (needs QOMification) that would turn on broken compat
> > > logic if capability is not present/old machine type.
> > 
> > I was going to propose enabling the compat logic if KVM_SET_MSRS
> > failed, but you have a good point about KVM_SET_MSRS being called
> > very late.
> 
> Thanks for this discussion, I didn't realize all the implications
> (including that hotplug issue too).
> 
> One more data point is that until now there was no use for vp_index in
> QEMU, so it didn't care how KVM managed it.  In KVM the only
> vp_index-aware path that the guests could trigger was exactly reading of
> HV_X64_MSR_VP_INDEX.
> 
> So let me try to sum up (to make sure I understand it right);
> 
> 1) we add KVM_CAP_HYPERV_SYNIC2 to KVM; when QEMU enables it KVM
>    switches to using vcpu_id as vp_index and stops zeroing synic pages

If we want to keep KVM code simpler, we could make QEMU
explicitly initialize vp_index using vcpu_id (== arch_id == apic_id)
if KVM_CAP_HYPERV_SYNIC2 is reported as supported.

> 
> 2) new QEMU refuses to start in non-compat mode when
>    KVM_CAP_HYPERV_SYNIC2 is not supported

It depends on which cases are considered "in non-compat mode".
Getting a VM not runnable just because the machine-type was
updated is not desirable, especially considering that on most
cases we will create the VCPUs on the same order and things would
keep working.  Probably the best we can do on this case is to
automatically enable compat mode, but print a warning saying
future QEMU versions might break if the kernel is not upgraded.

> 
> 3) old QEMU or new QEMU in compat mode enables KVM_CAP_HYPERV_SYNIC
>    making KVM keep using internal cpu index as vp_index and zeroing
>    synic pages

Maybe we need a warning on this case too, because QEMU won't
guarantee it will always create VCPUs in the same order in the
future.

> 
> 4) new QEMU in compat mode refuses vmbus or sint route creation

Also: new QEMU in compat mode refuses CPU hotplug.

> 
> Is this what is proposed?  My only problem here is that KVM will have to
> somehow guarantee stable numbering in the old synic mode.  How can this
> be ensured?  And should it be at all?

As far as I can see, KVM can only guarantee stable numbering in
the old mode if QEMU creates the VCPUs in the same order.  QEMU
can do a reasonable effort to not break that unnecessarily, but
users should be aware that old mode is deprecated and can break.

-- 
Eduardo

Reply via email to