On Wed, 14 Jun 2017 10:00:15 -0300
Eduardo Habkost <ehabk...@redhat.com> wrote:

> On Wed, Jun 14, 2017 at 02:25:07PM +0300, Roman Kagan wrote:
> > On Tue, Jun 13, 2017 at 03:57:52PM -0300, Eduardo Habkost wrote:  
> > > On Tue, Jun 06, 2017 at 09:19:30PM +0300, Roman Kagan wrote:  
> > > > Hyper-V identifies vcpus by the virtual processor (VP) index which is
> > > > normally queried by the guest via HV_X64_MSR_VP_INDEX msr.
> > > > 
> > > > It has to be owned by QEMU in order to preserve it across migration.
> > > > 
> > > > However, the current implementation in KVM doesn't allow to set this
> > > > msr, and KVM uses its own notion of VP index.  Fortunately, the way
> > > > vcpus are created in QEMU/KVM basically guarantees that the KVM value is
> > > > equal to QEMU cpu_index.  
> > > 
> > > This might not be true in the future.  cpu_index is not a good
> > > identifier for CPUs, and we would like to remove it in the
> > > future.
> > > 
> > > But it looks like we have no choice, see extra comments below:  
> >   
> > > > +static void hyperv_set_vp_index(CPUState *cs)
> > > > +{
> > > > +    struct {
> > > > +        struct kvm_msrs info;
> > > > +        struct kvm_msr_entry entries[1];
> > > > +    } msr_data;
> > > > +    int ret;
> > > > +
> > > > +    msr_data.info.nmsrs = 1;
> > > > +    msr_data.entries[0].index = HV_X64_MSR_VP_INDEX;
> > > > +
> > > > +    /*
> > > > +     * Some kernels don't support setting HV_X64_MSR_VP_INDEX.  
> > > > However,
> > > > +     * they use VP_INDEX equal to cpu_index due to the way vcpus are 
> > > > created.
> > > > +     * So ignore the errors from SET_MSRS, but verify that GET_MSRS 
> > > > returns the
> > > > +     * expected value.
> > > > +     */  
> > > 
> > > Oh.  This sounds like a problem.  As reference for others, this
> > > is the KVM code in kvm_hv_get_msr():
> > > 
> > >   case HV_X64_MSR_VP_INDEX: {
> > >           int r;
> > >           struct kvm_vcpu *v;
> > > 
> > >           kvm_for_each_vcpu(r, v, vcpu->kvm) {
> > >                   if (v == vcpu) {
> > >                           data = r;
> > >                           break;
> > >                   }
> > >           }
> > >           break;
> > >   }
> > > 
> > > The problem with that is that it will break as soon as we create
> > > VCPUs in a different order.  Unsolvable on hosts that don't allow
> > > HV_X64_MSR_VP_INDEX to be set, however.  
> > 
> > Right, thanks for putting together a detailed explanation.
> > 
> > This was a thinko back then, not to have HV_X64_MSR_VP_INDEX maintained
> > by QEMU.  I'm going to post a patch to KVM fixing that.
> > 
> > Meanwhile QEMU needs a way to maintain its notion of vp_index that is
> >   1) in sync with kernel's notion
> >   2) also with kernels that don't support setting the msr
> >   3) persistent across migrations
> > 
> > cpu_index looked like a perfect candidate.  
> 
> I would like to be able to stop using cpu_index as a persistent
> VCPU identifier in the future.  I would also like to be able to
> create VCPUs in any order without breaking guest ABI.  But it
> seems to be impossible to do that without breaking vp_index on
> older kernels.
> 
> So cpu_index looks like the only candidate we have.
> 
> >   
> > > > +    msr_data.entries[0].data = cs->cpu_index;
> > > > +    kvm_vcpu_ioctl(cs, KVM_SET_MSRS, &msr_data);
> > > > +    ret = kvm_vcpu_ioctl(cs, KVM_GET_MSRS, &msr_data);
> > > > +    assert(ret == 1);
> > > > +    assert(msr_data.entries[0].data == cs->cpu_index);  
> > > 
> > > If KVM already initializes the MSR to cpu_index and we will abort
> > > if it was not set to anything except cpu_index here, why exactly
> > > do we need the KVM_SET_MSRS call?  
> > 
> > The kernel made no obligations to keep initializing its vp_index
> > identically to QEMU's cpu_index.  So we'd better check and abort if that
> > got out of sync.  Once KVM gains the ability to learn vp_index from QEMU
> > we'll no longer depend on that as we'll do explicit synchronization.  
> 
> I think the kernel now has an obligation to keep initializing
> HV_X64_MSR_VP_INDEX in a compatible way, or it will break older
> QEMU versions that don't set the MSR.
> 
> But if you don't think we can rely on that, then the KVM_SET_MSRS
> call won't hurt.

if we can tell apart old index based vp_index kernel and new that supports
MSR setting (add cap check perhaps) then we should be able to
 - leave out old vp_index broken as it is now (for old kernels/old QEMU and/ 
new QEMU old machine types)
     i.e. do not set vp_index MSR in QEMU
 - in case of (new QEMU new machine type/new kernel) use APIC ID which would 
work
   with out of order CPU creation



Reply via email to