Hello Steve,
On Thu, Apr 01, 2021 at 06:40:06PM -0700, Steve Rutherford wrote:
> On Fri, Mar 19, 2021 at 11:00 AM Ashish Kalra <[email protected]> wrote:
> >
> > On Thu, Mar 11, 2021 at 12:48:07PM -0800, Steve Rutherford wrote:
> > > On Thu, Mar 11, 2021 at 10:15 AM Ashish Kalra <[email protected]>
> > > wrote:
> > > >
> > > > On Wed, Mar 03, 2021 at 06:54:41PM +0000, Will Deacon wrote:
> > > > > [+Marc]
> > > > >
> > > > > On Tue, Mar 02, 2021 at 02:55:43PM +0000, Ashish Kalra wrote:
> > > > > > On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > > > > > > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > > > > > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford
> > > > > > > > wrote:
> > > > > > > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra
> > > > > > > > > <[email protected]> wrote:
> > > > > > > > > Thanks for grabbing the data!
> > > > > > > > >
> > > > > > > > > I am fine with both paths. Sean has stated an explicit desire
> > > > > > > > > for
> > > > > > > > > hypercall exiting, so I think that would be the current
> > > > > > > > > consensus.
> > > > > > >
> > > > > > > Yep, though it'd be good to get Paolo's input, too.
> > > > > > >
> > > > > > > > > If we want to do hypercall exiting, this should be in a
> > > > > > > > > follow-up
> > > > > > > > > series where we implement something more generic, e.g. a
> > > > > > > > > hypercall
> > > > > > > > > exiting bitmap or hypercall exit list. If we are taking the
> > > > > > > > > hypercall
> > > > > > > > > exit route, we can drop the kvm side of the hypercall.
> > > > > > >
> > > > > > > I don't think this is a good candidate for arbitrary hypercall
> > > > > > > interception. Or
> > > > > > > rather, I think hypercall interception should be an orthogonal
> > > > > > > implementation.
> > > > > > >
> > > > > > > The guest, including guest firmware, needs to be aware that the
> > > > > > > hypercall is
> > > > > > > supported, and the ABI needs to be well-defined. Relying on
> > > > > > > userspace VMMs to
> > > > > > > implement a common ABI is an unnecessary risk.
> > > > > > >
> > > > > > > We could make KVM's default behavior be a nop, i.e. have KVM
> > > > > > > enforce the ABI but
> > > > > > > require further VMM intervention. But, I just don't see the
> > > > > > > point, it would
> > > > > > > save only a few lines of code. It would also limit what KVM
> > > > > > > could do in the
> > > > > > > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit
> > > > > > > to userspace,
> > > > > > > then mandatory interception would essentially make it impossible
> > > > > > > for KVM to do
> > > > > > > bookkeeping while still honoring the interception request.
> > > > > > >
> > > > > > > However, I do think it would make sense to have the userspace
> > > > > > > exit be a generic
> > > > > > > exit type. But hey, we already have the necessary ABI defined
> > > > > > > for that! It's
> > > > > > > just not used anywhere.
> > > > > > >
> > > > > > > /* KVM_EXIT_HYPERCALL */
> > > > > > > struct {
> > > > > > > __u64 nr;
> > > > > > > __u64 args[6];
> > > > > > > __u64 ret;
> > > > > > > __u32 longmode;
> > > > > > > __u32 pad;
> > > > > > > } hypercall;
> > > > > > >
> > > > > > >
> > > > > > > > > Userspace could also handle the MSR using MSR filters (would
> > > > > > > > > need to
> > > > > > > > > confirm that). Then userspace could also be in control of
> > > > > > > > > the cpuid bit.
> > > > > > >
> > > > > > > An MSR is not a great fit; it's x86 specific and limited to 64
> > > > > > > bits of data.
> > > > > > > The data limitation could be fudged by shoving data into
> > > > > > > non-standard GPRs, but
> > > > > > > that will result in truly heinous guest code, and extensibility
> > > > > > > issues.
> > > > > > >
> >
> > We may also need to pass-through the MSR to userspace, as it is a part of
> > this
> > complete host (userspace/kernel), OVMF and guest kernel negotiation of
> > the SEV live migration feature.
> >
> > Host (userspace/kernel) advertises it's support for SEV live migration
> > feature via the CPUID bits, which is queried by OVMF and which in turn
> > adds a new UEFI runtime variable to indicate support for SEV live
> > migration, which is later queried during guest kernel boot and
> > accordingly the guest does a wrmrsl() to custom MSR to complete SEV
> > live migration negotiation and enable it.
> >
> > Now, the GET_SHARED_REGION_LIST ioctl returns error, until this MSR write
> > enables SEV live migration, hence, preventing userspace to start live
> > migration before the feature support has been negotiated and enabled on
> > all the three components - host, guest OVMF and kernel.
> >
> > But, now with this ioctl not existing anymore, we will need to
> > pass-through the MSR to userspace too, for it to only initiate live
> > migration once the feature negotiation has been completed.
>
> I can't tell if you were waiting for feedback on this before posting
> the follow-up patch series.
Actually, i am going to post the follow-up patch series upstream early
next week.
I have already added support for MSR handling and exit to userspace, the
current implementation looks like this :
The custom MSR is hooked both in svm_get_msr() and svm_set_msr():
@@ -2800,6 +2800,17 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct
msr_data *msr_info)
case MSR_F10H_DECFG:
msr_info->data = svm->msr_decfg;
break;
+ case MSR_KVM_SEV_LIVE_MIGRATION:
+ if (!sev_guest(vcpu->kvm))
+ return 1;
+
+ if (!guest_cpuid_has(vcpu, KVM_FEATURE_SEV_LIVE_MIGRATION))
+ return 1;
+
+ /*
+ * Let userspace handle the MSR using MSR filters.
+ */
+ return KVM_MSR_RET_FILTERED;
So there is special check added for both sev_guest() and requisite CPUID
bit set in guest CPUID, if either fails this will signal #GP to guest.
Otherwise, it returns MSR_FILTER return code, which will allow userspace
to use msr intercepts to handle the reads and writes via userspace exits
using KVM_EXIT_X86_RDMSR/KVM_EXIT_X86_WRMSR.
Let me know if you have any feedback/comments on the above handling.
Thanks,
Ashish
>
> Here are a few options:
> 1) Add the MSR explicitly to the list of custom kvm MSRs, but don't
> have it hooked up anywhere. The expectation would be for the VMM to
> use msr intercepts to handle the reads and writes. If that seems
> weird, have svm_set_msr (or whatever) explicitly ignore it.
> 2) Add a getter and setter for the MSR. Only allow guests to use it if
> they are sev_guests with the requisite CPUID bit set.
>
> I think I prefer the former, and it should work fine from my
> understanding of the msr intercepts implementation. I'm also open to
> other ideas. You could also have the MSR write trigger a KVM_EXIT of
> the same type as the hypercall, but have it just say "the msr value
> changed to XYZ", but that design sounds awkward.
>
> Thanks,
> Steve