> On 2 Nov 2018, at 11:40, Paolo Bonzini <pbonz...@redhat.com> wrote:
> 
> On 02/11/2018 04:46, Liran Alon wrote:
>>> On Thu, Nov 1, 2018 at 09:45 AM, Jim Mattson <jmatt...@google.com> wrote:
>> 
>>>> On Thu, Nov 1, 2018 at 8:56 AM, Dr. David Alan Gilbert 
>>>> <dgilb...@redhat.com> wrote:
>> 
>>>> So if I have matching host kernels it should always work?
>>>> What happens if I upgrade the source kernel to increase its maximum
>>>> nested size, can I force it to keep things small for some VMs?
>> 
>>> Any change to the format of the nested state should be gated by a
>>> KVM_CAP set by userspace. (Unlike, say, how the
>>> KVM_VCPUEVENT_VALID_SMM flag was added to the saved VCPU events state
>>> in commit f077825a8758d.) KVM has traditionally been quite bad about
>>> maintaining backwards compatibility, but I hope the community is more
>>> cognizant of the issues now.
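>>>
>>> A rough sketch of the userspace side (KVM_CAP_NESTED_STATE_V2 and
>>> pool_all_hosts_support_v2 are made-up names, and plain ioctl()s from
>>> <sys/ioctl.h>/<linux/kvm.h> are used instead of QEMU's wrappers):
>>>
>>>     /* Opt in to the new layout only once every host in the pool has it. */
>>>     if (pool_all_hosts_support_v2 &&
>>>         ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_NESTED_STATE_V2) > 0) {
>>>         struct kvm_enable_cap cap = { .cap = KVM_CAP_NESTED_STATE_V2 };
>>>         if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
>>>             perror("KVM_ENABLE_CAP");
>>>     }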
>> 
>>> As a cloud provider, one would only enable the new capability from
>>> userspace once all hosts in the pool have a kernel that supports it.
>>> During the transition, the capability would not be enabled on the
>>> hosts with a new kernel, and these hosts would continue to provide
>>> nested state that could be consumed by hosts running the older kernel.
>> 
>> Hmm this makes sense.
>> 
>> This means, though, that the patch I have submitted here isn't good enough.
>> My patch currently assumes that when QEMU attempts to get nested state from
>> KVM, it should always set nested_state->size to the max size supported by
>> KVM, as returned from kvm_check_extension(s, KVM_CAP_NESTED_STATE)
>> (see kvm_get_nested_state() introduced in my patch).
>> This indeed won't allow migration from a host with a new KVM to a host with
>> an old KVM if the nested_state size was enlarged between these KVM versions,
>> which is obviously an issue.
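>>
>> In rough terms, the flow is (a simplified sketch of the idea, not the
>> literal patch code; cpu is the vCPU's CPUState and kvm_state is QEMU's
>> global KVMState):
>>
>>     int max_size = kvm_check_extension(kvm_state, KVM_CAP_NESTED_STATE);
>>     struct kvm_nested_state *state = g_malloc0(max_size);
>>
>>     state->size = max_size;   /* always ask KVM for everything it can save */
>>     if (kvm_vcpu_ioctl(cpu, KVM_GET_NESTED_STATE, state) < 0) {
>>         g_free(state);
>>         return -1;
>>     }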
> 
> Actually I think this is okay, because unless the "new" capability was
> enabled, KVM would always reduce nested_state->size to a value that is
> compatible with current kernels.
> 
>> But on second thought, I'm not sure that this is the right approach as well.
>> We don't really want the nested_state version used to be determined at
>> kvm_init().
>> * On the source QEMU, we actually want to determine it when preparing for
>> migration, based on the support given by our destination host. If it's an
>> old host, we would like to save an old-version nested_state, and if it's a
>> new host, we would like to save our newest supported nested_state.
> 
> No, that's wrong because it would lead to losing state.  If the source
> QEMU supports more state than the destination QEMU, and the current VM
> state needs to transmit it for migration to be _correct_, then migration
> to that destination QEMU must fail.
> 
> In particular, enabling the new KVM capability needs to be gated by a
> new machine type and/or -cpu flag, if migration compatibility is needed.
> (In particular, this is one reason why I haven't considered this series
> for 3.1.  Right now, migration of nested hypervisors is completely
> busted, but if we make it "almost" work, pre-3.1 machine types would not
> ever be able to add support for KVM_CAP_EXCEPTION_PAYLOAD.  Therefore,
> it's better for users if we wait for one release more, and add support
> for KVM_CAP_NESTED_STATE and KVM_CAP_EXCEPTION_PAYLOAD at the same time).
> 
> Personally, I would like to say that, starting from QEMU 3.2, enabling
> nested VMX requires a 4.20 kernel.  It's a bit bold, but I think it's a
> good way to keep some sanity.  Any opinions on that?
> 
> Paolo

If I understand you correctly, you wish the nested_state version used to be
tied to the machine-type used to launch the guest.
The reason I am not fond of this approach is that it means that once a VM is
launched with some machine-type, its nested_state will forever be saved with a
specific version, even if this VM has already migrated to a host with a newer
kernel which knows how to save more accurate state for the next migration.

The scheme I have described below should avoid this kind of issue while still
preserving the ability to migrate to older hosts.
Note that I believe it is the responsibility of the mgmt-layer to weigh the
risk of migrating from a new host to an old host in case the old host cannot
receive the full nested_state that the new host is capable of generating. I
don't think migration should fail in this case, as happens currently with the
patch I have submitted.

So in general I still agree with Jim's approach, but I'm not convinced we
should use a KVM_CAP for this.
(See details on the proposed scheme below.)

What are the disadvantages you see in using the proposed scheme below?
Why is using a KVM_CAP better?
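
To make the comparison concrete, here is a rough sketch of what step (2) of the
scheme quoted below could look like on the source side, assuming KVM were
extended to honour the passed-in size as an upper bound. dest_max_size would be
handed to QEMU by the mgmt-layer (e.g. as a migration parameter); the names are
illustrative, not taken from the patch:

    /* dest_max_size: destination host's KVM_CAP_NESTED_STATE value,
     * supplied by the mgmt-layer (illustrative name). */
    struct kvm_nested_state *state = g_malloc0(dest_max_size);

    state->size = dest_max_size;  /* "save at most this much", per step (2) */
    if (kvm_vcpu_ioctl(cpu, KVM_GET_NESTED_STATE, state) < 0) {
        /* The current state cannot be expressed in dest_max_size bytes;
         * the mgmt-layer decides whether that risk is acceptable. */
        g_free(state);
        return -1;
    }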

BTW, I agree with the rest of the group here that it's too aggressive to make
QEMU 3.2 require kernel 4.20 for using nVMX.
This will hurt common nVMX workloads that don't care about the ability to
migrate.

-Liran

> 
>> Therefore, I don't think that we want this versioning to be based on a
>> KVM_CAP at all.
>> It seems that we would want the process to behave as follows:
>> 1) The mgmt-layer at the dest queries the dest host's max supported
>>    nested_state size (which should be returned from
>>    kvm_check_extension(KVM_CAP_NESTED_STATE)).
>> 2) The mgmt-layer at the source initiates migration to the dest, requesting
>>    QEMU to send a nested_state matching the dest's max supported
>>    nested_state size.
>>    When saving nested state using the KVM_GET_NESTED_STATE IOCTL, QEMU will
>>    specify in nested_state->size the *requested* size to be saved, and KVM
>>    should be able to save only the information which matches the version
>>    that worked with that size.
>> 3) After some sanity checks on the received migration stream, the dest host
>>    uses the KVM_SET_NESTED_STATE IOCTL.
>>    This IOCTL should deduce which information it should deploy based on the
>>    given nested_state->size.
>> 
>> This also makes me wonder whether it wouldn't just be nicer to use
>> nested_state->flags to specify which information is actually present in
>> nested_state, instead of managing versioning with nested_state->size.
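>>
>> Purely as an illustration of the flags idea (the flag name and helper below
>> are made up), the destination side could then do something like:
>>
>>     #define KVM_STATE_NESTED_HAS_FOO  (1 << 2)   /* hypothetical flag */
>>
>>     if (state->flags & KVM_STATE_NESTED_HAS_FOO) {
>>         /* the optional "foo" blob is present in the payload, restore it */
>>         restore_foo_from_nested_state(state);    /* hypothetical helper */
>>     }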
>> 
>> What are your opinions on this?
>> 
>> -Liran
>> 
> 

