Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-04-02 Thread Ashish Kalra
Hello Steve,

On Thu, Apr 01, 2021 at 06:40:06PM -0700, Steve Rutherford wrote:
> On Fri, Mar 19, 2021 at 11:00 AM Ashish Kalra  wrote:
> >
> > On Thu, Mar 11, 2021 at 12:48:07PM -0800, Steve Rutherford wrote:
> > > On Thu, Mar 11, 2021 at 10:15 AM Ashish Kalra  
> > > wrote:
> > > >
> > > > On Wed, Mar 03, 2021 at 06:54:41PM +, Will Deacon wrote:
> > > > > [+Marc]
> > > > >
> > > > > On Tue, Mar 02, 2021 at 02:55:43PM +, Ashish Kalra wrote:
> > > > > > On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > > > > > > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > > > > > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford 
> > > > > > > > wrote:
> > > > > > > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra 
> > > > > > > > >  wrote:
> > > > > > > > > Thanks for grabbing the data!
> > > > > > > > >
> > > > > > > > > I am fine with both paths. Sean has stated an explicit desire 
> > > > > > > > > for
> > > > > > > > > hypercall exiting, so I think that would be the current 
> > > > > > > > > consensus.
> > > > > > >
> > > > > > > Yep, though it'd be good to get Paolo's input, too.
> > > > > > >
> > > > > > > > > If we want to do hypercall exiting, this should be in a 
> > > > > > > > > follow-up
> > > > > > > > > series where we implement something more generic, e.g. a 
> > > > > > > > > hypercall
> > > > > > > > > exiting bitmap or hypercall exit list. If we are taking the 
> > > > > > > > > hypercall
> > > > > > > > > exit route, we can drop the kvm side of the hypercall.
> > > > > > >
> > > > > > > I don't think this is a good candidate for arbitrary hypercall 
> > > > > > > interception.  Or
> > > > > > > rather, I think hypercall interception should be an orthogonal 
> > > > > > > implementation.
> > > > > > >
> > > > > > > The guest, including guest firmware, needs to be aware that the 
> > > > > > > hypercall is
> > > > > > > supported, and the ABI needs to be well-defined.  Relying on 
> > > > > > > userspace VMMs to
> > > > > > > implement a common ABI is an unnecessary risk.
> > > > > > >
> > > > > > > We could make KVM's default behavior be a nop, i.e. have KVM 
> > > > > > > enforce the ABI but
> > > > > > > require further VMM intervention.  But, I just don't see the 
> > > > > > > point, it would
> > > > > > > save only a few lines of code.  It would also limit what KVM 
> > > > > > > could do in the
> > > > > > > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit 
> > > > > > > to userspace,
> > > > > > > then mandatory interception would essentially make it impossible 
> > > > > > > for KVM to do
> > > > > > > bookkeeping while still honoring the interception request.
> > > > > > >
> > > > > > > However, I do think it would make sense to have the userspace 
> > > > > > > exit be a generic
> > > > > > > exit type.  But hey, we already have the necessary ABI defined 
> > > > > > > for that!  It's
> > > > > > > just not used anywhere.
> > > > > > >
> > > > > > >   /* KVM_EXIT_HYPERCALL */
> > > > > > >   struct {
> > > > > > >   __u64 nr;
> > > > > > >   __u64 args[6];
> > > > > > >   __u64 ret;
> > > > > > >   __u32 longmode;
> > > > > > >   __u32 pad;
> > > > > > >   } hypercall;
> > > > > > >
> > > > > > >
> > > > > > > > > Userspace could also handle the MSR using MSR filters (would 
> > > > > > > > > need to
> > > > > > > > > confirm that).  Then userspace could also be in control of 
> > > > > > > > > the cpuid bit.
> > > > > > >
> > > > > > > An MSR is not a great fit; it's x86 specific and limited to 64 
> > > > > > > bits of data.
> > > > > > > The data limitation could be fudged by shoving data into 
> > > > > > > non-standard GPRs, but
> > > > > > > that will result in truly heinous guest code, and extensibility 
> > > > > > > issues.
> > > > > > >
> >
> > We may also need to pass-through the MSR to userspace, as it is a part of 
> > this
> > complete host (userspace/kernel), OVMF and guest kernel negotiation of
> > the SEV live migration feature.
> >
> > Host (userspace/kernel) advertises it's support for SEV live migration
> > feature via the CPUID bits, which is queried by OVMF and which in turn
> > adds a new UEFI runtime variable to indicate support for SEV live
> > migration, which is later queried during guest kernel boot and
> > accordingly the guest does a wrmrsl() to custom MSR to complete SEV
> > live migration negotiation and enable it.
> >
> > Now, the GET_SHARED_REGION_LIST ioctl returns error, until this MSR write
> > enables SEV live migration, hence, preventing userspace to start live
> > migration before the feature support has been negotiated and enabled on
> > all the three components - host, guest OVMF and kernel.
> >
> > But, now with this ioctl not existing anymore, we will need to
> > pass-through the MSR to userspace too, for it to only initiate live
> > migration once the feature negotiation has been completed.
> 
> I can't tell if you were waiting for feedback on this 

Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-04-01 Thread Steve Rutherford
On Fri, Mar 19, 2021 at 11:00 AM Ashish Kalra  wrote:
>
> On Thu, Mar 11, 2021 at 12:48:07PM -0800, Steve Rutherford wrote:
> > On Thu, Mar 11, 2021 at 10:15 AM Ashish Kalra  wrote:
> > >
> > > On Wed, Mar 03, 2021 at 06:54:41PM +, Will Deacon wrote:
> > > > [+Marc]
> > > >
> > > > On Tue, Mar 02, 2021 at 02:55:43PM +, Ashish Kalra wrote:
> > > > > On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > > > > > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > > > > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > > > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra 
> > > > > > > >  wrote:
> > > > > > > > Thanks for grabbing the data!
> > > > > > > >
> > > > > > > > I am fine with both paths. Sean has stated an explicit desire 
> > > > > > > > for
> > > > > > > > hypercall exiting, so I think that would be the current 
> > > > > > > > consensus.
> > > > > >
> > > > > > Yep, though it'd be good to get Paolo's input, too.
> > > > > >
> > > > > > > > If we want to do hypercall exiting, this should be in a 
> > > > > > > > follow-up
> > > > > > > > series where we implement something more generic, e.g. a 
> > > > > > > > hypercall
> > > > > > > > exiting bitmap or hypercall exit list. If we are taking the 
> > > > > > > > hypercall
> > > > > > > > exit route, we can drop the kvm side of the hypercall.
> > > > > >
> > > > > > I don't think this is a good candidate for arbitrary hypercall 
> > > > > > interception.  Or
> > > > > > rather, I think hypercall interception should be an orthogonal 
> > > > > > implementation.
> > > > > >
> > > > > > The guest, including guest firmware, needs to be aware that the 
> > > > > > hypercall is
> > > > > > supported, and the ABI needs to be well-defined.  Relying on 
> > > > > > userspace VMMs to
> > > > > > implement a common ABI is an unnecessary risk.
> > > > > >
> > > > > > We could make KVM's default behavior be a nop, i.e. have KVM 
> > > > > > enforce the ABI but
> > > > > > require further VMM intervention.  But, I just don't see the point, 
> > > > > > it would
> > > > > > save only a few lines of code.  It would also limit what KVM could 
> > > > > > do in the
> > > > > > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to 
> > > > > > userspace,
> > > > > > then mandatory interception would essentially make it impossible 
> > > > > > for KVM to do
> > > > > > bookkeeping while still honoring the interception request.
> > > > > >
> > > > > > However, I do think it would make sense to have the userspace exit 
> > > > > > be a generic
> > > > > > exit type.  But hey, we already have the necessary ABI defined for 
> > > > > > that!  It's
> > > > > > just not used anywhere.
> > > > > >
> > > > > >   /* KVM_EXIT_HYPERCALL */
> > > > > >   struct {
> > > > > >   __u64 nr;
> > > > > >   __u64 args[6];
> > > > > >   __u64 ret;
> > > > > >   __u32 longmode;
> > > > > >   __u32 pad;
> > > > > >   } hypercall;
> > > > > >
> > > > > >
> > > > > > > > Userspace could also handle the MSR using MSR filters (would 
> > > > > > > > need to
> > > > > > > > confirm that).  Then userspace could also be in control of the 
> > > > > > > > cpuid bit.
> > > > > >
> > > > > > An MSR is not a great fit; it's x86 specific and limited to 64 bits 
> > > > > > of data.
> > > > > > The data limitation could be fudged by shoving data into 
> > > > > > non-standard GPRs, but
> > > > > > that will result in truly heinous guest code, and extensibility 
> > > > > > issues.
> > > > > >
>
> We may also need to pass-through the MSR to userspace, as it is a part of this
> complete host (userspace/kernel), OVMF and guest kernel negotiation of
> the SEV live migration feature.
>
> Host (userspace/kernel) advertises it's support for SEV live migration
> feature via the CPUID bits, which is queried by OVMF and which in turn
> adds a new UEFI runtime variable to indicate support for SEV live
> migration, which is later queried during guest kernel boot and
> accordingly the guest does a wrmrsl() to custom MSR to complete SEV
> live migration negotiation and enable it.
>
> Now, the GET_SHARED_REGION_LIST ioctl returns error, until this MSR write
> enables SEV live migration, hence, preventing userspace to start live
> migration before the feature support has been negotiated and enabled on
> all the three components - host, guest OVMF and kernel.
>
> But, now with this ioctl not existing anymore, we will need to
> pass-through the MSR to userspace too, for it to only initiate live
> migration once the feature negotiation has been completed.
Hey Ashish,

I can't tell if you were waiting for feedback on this before posting
the follow-up patch series.

Here are a few options:
1) Add the MSR explicitly to the list of custom kvm MSRs, but don't
have it hooked up anywhere. The expectation would be for the VMM to
use msr intercepts to handle the reads and writes. If that seems
weird, have svm_set_msr (or 

Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-03-19 Thread Ashish Kalra
On Thu, Mar 11, 2021 at 12:48:07PM -0800, Steve Rutherford wrote:
> On Thu, Mar 11, 2021 at 10:15 AM Ashish Kalra  wrote:
> >
> > On Wed, Mar 03, 2021 at 06:54:41PM +, Will Deacon wrote:
> > > [+Marc]
> > >
> > > On Tue, Mar 02, 2021 at 02:55:43PM +, Ashish Kalra wrote:
> > > > On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > > > > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > > > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra 
> > > > > > >  wrote:
> > > > > > > Thanks for grabbing the data!
> > > > > > >
> > > > > > > I am fine with both paths. Sean has stated an explicit desire for
> > > > > > > hypercall exiting, so I think that would be the current consensus.
> > > > >
> > > > > Yep, though it'd be good to get Paolo's input, too.
> > > > >
> > > > > > > If we want to do hypercall exiting, this should be in a follow-up
> > > > > > > series where we implement something more generic, e.g. a hypercall
> > > > > > > exiting bitmap or hypercall exit list. If we are taking the 
> > > > > > > hypercall
> > > > > > > exit route, we can drop the kvm side of the hypercall.
> > > > >
> > > > > I don't think this is a good candidate for arbitrary hypercall 
> > > > > interception.  Or
> > > > > rather, I think hypercall interception should be an orthogonal 
> > > > > implementation.
> > > > >
> > > > > The guest, including guest firmware, needs to be aware that the 
> > > > > hypercall is
> > > > > supported, and the ABI needs to be well-defined.  Relying on 
> > > > > userspace VMMs to
> > > > > implement a common ABI is an unnecessary risk.
> > > > >
> > > > > We could make KVM's default behavior be a nop, i.e. have KVM enforce 
> > > > > the ABI but
> > > > > require further VMM intervention.  But, I just don't see the point, 
> > > > > it would
> > > > > save only a few lines of code.  It would also limit what KVM could do 
> > > > > in the
> > > > > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to 
> > > > > userspace,
> > > > > then mandatory interception would essentially make it impossible for 
> > > > > KVM to do
> > > > > bookkeeping while still honoring the interception request.
> > > > >
> > > > > However, I do think it would make sense to have the userspace exit be 
> > > > > a generic
> > > > > exit type.  But hey, we already have the necessary ABI defined for 
> > > > > that!  It's
> > > > > just not used anywhere.
> > > > >
> > > > >   /* KVM_EXIT_HYPERCALL */
> > > > >   struct {
> > > > >   __u64 nr;
> > > > >   __u64 args[6];
> > > > >   __u64 ret;
> > > > >   __u32 longmode;
> > > > >   __u32 pad;
> > > > >   } hypercall;
> > > > >
> > > > >
> > > > > > > Userspace could also handle the MSR using MSR filters (would need 
> > > > > > > to
> > > > > > > confirm that).  Then userspace could also be in control of the 
> > > > > > > cpuid bit.
> > > > >
> > > > > An MSR is not a great fit; it's x86 specific and limited to 64 bits 
> > > > > of data.
> > > > > The data limitation could be fudged by shoving data into non-standard 
> > > > > GPRs, but
> > > > > that will result in truly heinous guest code, and extensibility 
> > > > > issues.
> > > > >

We may also need to pass-through the MSR to userspace, as it is a part of this
complete host (userspace/kernel), OVMF and guest kernel negotiation of
the SEV live migration feature. 

Host (userspace/kernel) advertises it's support for SEV live migration
feature via the CPUID bits, which is queried by OVMF and which in turn
adds a new UEFI runtime variable to indicate support for SEV live
migration, which is later queried during guest kernel boot and
accordingly the guest does a wrmrsl() to custom MSR to complete SEV
live migration negotiation and enable it. 

Now, the GET_SHARED_REGION_LIST ioctl returns error, until this MSR write
enables SEV live migration, hence, preventing userspace to start live
migration before the feature support has been negotiated and enabled on
all the three components - host, guest OVMF and kernel. 

But, now with this ioctl not existing anymore, we will need to
pass-through the MSR to userspace too, for it to only initiate live
migration once the feature negotiation has been completed.

Thanks,
Ashish

> > > > > The data limitation is a moot point, because the x86-only thing is a 
> > > > > deal
> > > > > breaker.  arm64's pKVM work has a near-identical use case for a guest 
> > > > > to share
> > > > > memory with a host.  I can't think of a clever way to avoid having to 
> > > > > support
> > > > > TDX's and SNP's hypervisor-agnostic variants, but we can at least not 
> > > > > have
> > > > > multiple KVM variants.
> > > >
> > > > Looking at arm64's pKVM work, i see that it is a recently introduced RFC
> > > > patch-set and probably relevant to arm64 nVHE hypervisor
> > > > mode/implementation, and potentially makes sense as it adds guest
> > > > 

Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-03-11 Thread Steve Rutherford
On Thu, Mar 11, 2021 at 10:15 AM Ashish Kalra  wrote:
>
> On Wed, Mar 03, 2021 at 06:54:41PM +, Will Deacon wrote:
> > [+Marc]
> >
> > On Tue, Mar 02, 2021 at 02:55:43PM +, Ashish Kalra wrote:
> > > On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > > > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra 
> > > > > >  wrote:
> > > > > > Thanks for grabbing the data!
> > > > > >
> > > > > > I am fine with both paths. Sean has stated an explicit desire for
> > > > > > hypercall exiting, so I think that would be the current consensus.
> > > >
> > > > Yep, though it'd be good to get Paolo's input, too.
> > > >
> > > > > > If we want to do hypercall exiting, this should be in a follow-up
> > > > > > series where we implement something more generic, e.g. a hypercall
> > > > > > exiting bitmap or hypercall exit list. If we are taking the 
> > > > > > hypercall
> > > > > > exit route, we can drop the kvm side of the hypercall.
> > > >
> > > > I don't think this is a good candidate for arbitrary hypercall 
> > > > interception.  Or
> > > > rather, I think hypercall interception should be an orthogonal 
> > > > implementation.
> > > >
> > > > The guest, including guest firmware, needs to be aware that the 
> > > > hypercall is
> > > > supported, and the ABI needs to be well-defined.  Relying on userspace 
> > > > VMMs to
> > > > implement a common ABI is an unnecessary risk.
> > > >
> > > > We could make KVM's default behavior be a nop, i.e. have KVM enforce 
> > > > the ABI but
> > > > require further VMM intervention.  But, I just don't see the point, it 
> > > > would
> > > > save only a few lines of code.  It would also limit what KVM could do 
> > > > in the
> > > > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to 
> > > > userspace,
> > > > then mandatory interception would essentially make it impossible for 
> > > > KVM to do
> > > > bookkeeping while still honoring the interception request.
> > > >
> > > > However, I do think it would make sense to have the userspace exit be a 
> > > > generic
> > > > exit type.  But hey, we already have the necessary ABI defined for 
> > > > that!  It's
> > > > just not used anywhere.
> > > >
> > > >   /* KVM_EXIT_HYPERCALL */
> > > >   struct {
> > > >   __u64 nr;
> > > >   __u64 args[6];
> > > >   __u64 ret;
> > > >   __u32 longmode;
> > > >   __u32 pad;
> > > >   } hypercall;
> > > >
> > > >
> > > > > > Userspace could also handle the MSR using MSR filters (would need to
> > > > > > confirm that).  Then userspace could also be in control of the 
> > > > > > cpuid bit.
> > > >
> > > > An MSR is not a great fit; it's x86 specific and limited to 64 bits of 
> > > > data.
> > > > The data limitation could be fudged by shoving data into non-standard 
> > > > GPRs, but
> > > > that will result in truly heinous guest code, and extensibility issues.
> > > >
> > > > The data limitation is a moot point, because the x86-only thing is a 
> > > > deal
> > > > breaker.  arm64's pKVM work has a near-identical use case for a guest 
> > > > to share
> > > > memory with a host.  I can't think of a clever way to avoid having to 
> > > > support
> > > > TDX's and SNP's hypervisor-agnostic variants, but we can at least not 
> > > > have
> > > > multiple KVM variants.
> > >
> > > Looking at arm64's pKVM work, i see that it is a recently introduced RFC
> > > patch-set and probably relevant to arm64 nVHE hypervisor
> > > mode/implementation, and potentially makes sense as it adds guest
> > > memory protection as both host and guest kernels are running on the same
> > > privilege level ?
> > >
> > > Though i do see that the pKVM stuff adds two hypercalls, specifically :
> > >
> > > pkvm_create_mappings() ( I assume this is for setting shared memory
> > > regions between host and guest) &
> > > pkvm_create_private_mappings().
> > >
> > > And the use-cases are quite similar to memory protection architectues
> > > use cases, for example, use with virtio devices, guest DMA I/O, etc.
> >
> > These hypercalls are both private to the host kernel communicating with
> > its hypervisor counterpart, so I don't think they're particularly
> > relevant here. As far as I can see, the more useful thing is to allow
> > the guest to communicate back to the host (and the VMM) that it has opened
> > up a memory window, perhaps for virtio rings or some other shared memory.
> >
> > We hacked this up as a prototype in the past:
> >
> > 

Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-03-11 Thread Ashish Kalra
On Wed, Mar 03, 2021 at 06:54:41PM +, Will Deacon wrote:
> [+Marc]
> 
> On Tue, Mar 02, 2021 at 02:55:43PM +, Ashish Kalra wrote:
> > On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra  
> > > > > wrote:
> > > > > Thanks for grabbing the data!
> > > > > 
> > > > > I am fine with both paths. Sean has stated an explicit desire for
> > > > > hypercall exiting, so I think that would be the current consensus.
> > > 
> > > Yep, though it'd be good to get Paolo's input, too.
> > > 
> > > > > If we want to do hypercall exiting, this should be in a follow-up
> > > > > series where we implement something more generic, e.g. a hypercall
> > > > > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > > > > exit route, we can drop the kvm side of the hypercall.
> > > 
> > > I don't think this is a good candidate for arbitrary hypercall 
> > > interception.  Or
> > > rather, I think hypercall interception should be an orthogonal 
> > > implementation.
> > > 
> > > The guest, including guest firmware, needs to be aware that the hypercall 
> > > is
> > > supported, and the ABI needs to be well-defined.  Relying on userspace 
> > > VMMs to
> > > implement a common ABI is an unnecessary risk.
> > > 
> > > We could make KVM's default behavior be a nop, i.e. have KVM enforce the 
> > > ABI but
> > > require further VMM intervention.  But, I just don't see the point, it 
> > > would
> > > save only a few lines of code.  It would also limit what KVM could do in 
> > > the
> > > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to 
> > > userspace,
> > > then mandatory interception would essentially make it impossible for KVM 
> > > to do
> > > bookkeeping while still honoring the interception request.
> > > 
> > > However, I do think it would make sense to have the userspace exit be a 
> > > generic
> > > exit type.  But hey, we already have the necessary ABI defined for that!  
> > > It's
> > > just not used anywhere.
> > > 
> > >   /* KVM_EXIT_HYPERCALL */
> > >   struct {
> > >   __u64 nr;
> > >   __u64 args[6];
> > >   __u64 ret;
> > >   __u32 longmode;
> > >   __u32 pad;
> > >   } hypercall;
> > > 
> > > 
> > > > > Userspace could also handle the MSR using MSR filters (would need to
> > > > > confirm that).  Then userspace could also be in control of the cpuid 
> > > > > bit.
> > > 
> > > An MSR is not a great fit; it's x86 specific and limited to 64 bits of 
> > > data.
> > > The data limitation could be fudged by shoving data into non-standard 
> > > GPRs, but
> > > that will result in truly heinous guest code, and extensibility issues.
> > > 
> > > The data limitation is a moot point, because the x86-only thing is a deal
> > > breaker.  arm64's pKVM work has a near-identical use case for a guest to 
> > > share
> > > memory with a host.  I can't think of a clever way to avoid having to 
> > > support
> > > TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> > > multiple KVM variants.
> > 
> > Looking at arm64's pKVM work, i see that it is a recently introduced RFC
> > patch-set and probably relevant to arm64 nVHE hypervisor
> > mode/implementation, and potentially makes sense as it adds guest
> > memory protection as both host and guest kernels are running on the same
> > privilege level ?
> > 
> > Though i do see that the pKVM stuff adds two hypercalls, specifically :
> > 
> > pkvm_create_mappings() ( I assume this is for setting shared memory
> > regions between host and guest) &
> > pkvm_create_private_mappings().
> > 
> > And the use-cases are quite similar to memory protection architectues
> > use cases, for example, use with virtio devices, guest DMA I/O, etc.
> 
> These hypercalls are both private to the host kernel communicating with
> its hypervisor counterpart, so I don't think they're particularly
> relevant here. As far as I can see, the more useful thing is to allow
> the guest to communicate back to the host (and the VMM) that it has opened
> up a memory window, perhaps for virtio rings or some other shared memory.
> 
> We hacked this up as a prototype in the past:
> 
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fandroid-kvm.googlesource.com%2Flinux%2F%2B%2Fd12a9e2c12a52cf7140d40cd9fa092dc8a85fac9%255E%2521%2Fdata=04%7C01%7Cashish.kalra%40amd.com%7C7ae6bbd9fa6442f9edcc08d8de75d14b%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637503944913839841%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=Juon5nJ7BB6moTWYssRXOWrDOrYfZLmA%2BLrz3s12Ook%3Dreserved=0
> 
> but that's all arm64-specific and if we're solving the same problem as
> you, then let's avoid arch-specific stuff if possible. The way in which
> the guest discovers the interface 

Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-03-09 Thread Steve Rutherford
On Tue, Mar 9, 2021 at 7:42 PM Kalra, Ashish  wrote:
>
>
>
> > On Mar 9, 2021, at 3:22 AM, Steve Rutherford  wrote:
> >
> > On Mon, Mar 8, 2021 at 1:11 PM Brijesh Singh  wrote:
> >>
> >>
> >>> On 3/8/21 1:51 PM, Sean Christopherson wrote:
> >>> On Mon, Mar 08, 2021, Ashish Kalra wrote:
>  On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > +Will and Quentin (arm64)
> >
> > Moving the non-KVM x86 folks to bcc, I don't they care about KVM 
> > details at this
> > point.
> >
> > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> >> On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> >>> On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra  
> >>> wrote:
> >>> Thanks for grabbing the data!
> >>>
> >>> I am fine with both paths. Sean has stated an explicit desire for
> >>> hypercall exiting, so I think that would be the current consensus.
> > Yep, though it'd be good to get Paolo's input, too.
> >
> >>> If we want to do hypercall exiting, this should be in a follow-up
> >>> series where we implement something more generic, e.g. a hypercall
> >>> exiting bitmap or hypercall exit list. If we are taking the hypercall
> >>> exit route, we can drop the kvm side of the hypercall.
> > I don't think this is a good candidate for arbitrary hypercall 
> > interception.  Or
> > rather, I think hypercall interception should be an orthogonal 
> > implementation.
> >
> > The guest, including guest firmware, needs to be aware that the 
> > hypercall is
> > supported, and the ABI needs to be well-defined.  Relying on userspace 
> > VMMs to
> > implement a common ABI is an unnecessary risk.
> >
> > We could make KVM's default behavior be a nop, i.e. have KVM enforce 
> > the ABI but
> > require further VMM intervention.  But, I just don't see the point, it 
> > would
> > save only a few lines of code.  It would also limit what KVM could do 
> > in the
> > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to 
> > userspace,
> > then mandatory interception would essentially make it impossible for 
> > KVM to do
> > bookkeeping while still honoring the interception request.
> >
> > However, I do think it would make sense to have the userspace exit be a 
> > generic
> > exit type.  But hey, we already have the necessary ABI defined for 
> > that!  It's
> > just not used anywhere.
> >
> >/* KVM_EXIT_HYPERCALL */
> >struct {
> >__u64 nr;
> >__u64 args[6];
> >__u64 ret;
> >__u32 longmode;
> >__u32 pad;
> >} hypercall;
> >
> >
> >>> Userspace could also handle the MSR using MSR filters (would need to
> >>> confirm that).  Then userspace could also be in control of the cpuid 
> >>> bit.
> > An MSR is not a great fit; it's x86 specific and limited to 64 bits of 
> > data.
> > The data limitation could be fudged by shoving data into non-standard 
> > GPRs, but
> > that will result in truly heinous guest code, and extensibility issues.
> >
> > The data limitation is a moot point, because the x86-only thing is a 
> > deal
> > breaker.  arm64's pKVM work has a near-identical use case for a guest 
> > to share
> > memory with a host.  I can't think of a clever way to avoid having to 
> > support
> > TDX's and SNP's hypervisor-agnostic variants, but we can at least not 
> > have
> > multiple KVM variants.
> >
>  Potentially, there is another reason for in-kernel hypercall handling
>  considering SEV-SNP. In case of SEV-SNP the RMP table tracks the state
>  of each guest page, for instance pages in hypervisor state, i.e., pages
>  with C=0 and pages in guest valid state with C=1.
> 
>  Now, there shouldn't be a need for page encryption status hypercalls on
>  SEV-SNP as KVM can track & reference guest page status directly using
>  the RMP table.
> >>> Relying on the RMP table itself would require locking the RMP table for an
> >>> extended duration, and walking the entire RMP to find shared pages would 
> >>> be
> >>> very inefficient.
> >>>
>  As KVM maintains the RMP table, therefore we will need SET/GET type of
>  interfaces to provide the guest page encryption status to userspace.
> >>> Hrm, somehow I temporarily forgot about SNP and TDX adding their own 
> >>> hypercalls
> >>> for converting between shared and private.  And in the case of TDX, the 
> >>> hypercall
> >>> can't be trusted, i.e. is just a hint, otherwise the guest could induce a 
> >>> #MC in
> >>> the host.
> >>>
> >>> But, the different guest behavior doesn't require KVM to maintain a 
> >>> list/tree,
> >>> e.g. adding a dedicated KVM_EXIT_* for notifying userspace of page 
> >>> encryption
> >>> status changes 

Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-03-09 Thread Kalra, Ashish


> On Mar 9, 2021, at 3:22 AM, Steve Rutherford  wrote:
> 
> On Mon, Mar 8, 2021 at 1:11 PM Brijesh Singh  wrote:
>> 
>> 
>>> On 3/8/21 1:51 PM, Sean Christopherson wrote:
>>> On Mon, Mar 08, 2021, Ashish Kalra wrote:
 On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> +Will and Quentin (arm64)
> 
> Moving the non-KVM x86 folks to bcc, I don't they care about KVM details 
> at this
> point.
> 
> On Fri, Feb 26, 2021, Ashish Kalra wrote:
>> On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
>>> On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra  
>>> wrote:
>>> Thanks for grabbing the data!
>>> 
>>> I am fine with both paths. Sean has stated an explicit desire for
>>> hypercall exiting, so I think that would be the current consensus.
> Yep, though it'd be good to get Paolo's input, too.
> 
>>> If we want to do hypercall exiting, this should be in a follow-up
>>> series where we implement something more generic, e.g. a hypercall
>>> exiting bitmap or hypercall exit list. If we are taking the hypercall
>>> exit route, we can drop the kvm side of the hypercall.
> I don't think this is a good candidate for arbitrary hypercall 
> interception.  Or
> rather, I think hypercall interception should be an orthogonal 
> implementation.
> 
> The guest, including guest firmware, needs to be aware that the hypercall 
> is
> supported, and the ABI needs to be well-defined.  Relying on userspace 
> VMMs to
> implement a common ABI is an unnecessary risk.
> 
> We could make KVM's default behavior be a nop, i.e. have KVM enforce the 
> ABI but
> require further VMM intervention.  But, I just don't see the point, it 
> would
> save only a few lines of code.  It would also limit what KVM could do in 
> the
> future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to 
> userspace,
> then mandatory interception would essentially make it impossible for KVM 
> to do
> bookkeeping while still honoring the interception request.
> 
> However, I do think it would make sense to have the userspace exit be a 
> generic
> exit type.  But hey, we already have the necessary ABI defined for that!  
> It's
> just not used anywhere.
> 
>/* KVM_EXIT_HYPERCALL */
>struct {
>__u64 nr;
>__u64 args[6];
>__u64 ret;
>__u32 longmode;
>__u32 pad;
>} hypercall;
> 
> 
>>> Userspace could also handle the MSR using MSR filters (would need to
>>> confirm that).  Then userspace could also be in control of the cpuid 
>>> bit.
> An MSR is not a great fit; it's x86 specific and limited to 64 bits of 
> data.
> The data limitation could be fudged by shoving data into non-standard 
> GPRs, but
> that will result in truly heinous guest code, and extensibility issues.
> 
> The data limitation is a moot point, because the x86-only thing is a deal
> breaker.  arm64's pKVM work has a near-identical use case for a guest to 
> share
> memory with a host.  I can't think of a clever way to avoid having to 
> support
> TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> multiple KVM variants.
> 
 Potentially, there is another reason for in-kernel hypercall handling
 considering SEV-SNP. In case of SEV-SNP the RMP table tracks the state
 of each guest page, for instance pages in hypervisor state, i.e., pages
 with C=0 and pages in guest valid state with C=1.
 
 Now, there shouldn't be a need for page encryption status hypercalls on
 SEV-SNP as KVM can track & reference guest page status directly using
 the RMP table.
>>> Relying on the RMP table itself would require locking the RMP table for an
>>> extended duration, and walking the entire RMP to find shared pages would be
>>> very inefficient.
>>> 
 As KVM maintains the RMP table, therefore we will need SET/GET type of
 interfaces to provide the guest page encryption status to userspace.
>>> Hrm, somehow I temporarily forgot about SNP and TDX adding their own 
>>> hypercalls
>>> for converting between shared and private.  And in the case of TDX, the 
>>> hypercall
>>> can't be trusted, i.e. is just a hint, otherwise the guest could induce a 
>>> #MC in
>>> the host.
>>> 
>>> But, the different guest behavior doesn't require KVM to maintain a 
>>> list/tree,
>>> e.g. adding a dedicated KVM_EXIT_* for notifying userspace of page 
>>> encryption
>>> status changes would also suffice.
>>> 
>>> Actually, that made me think of another argument against maintaining a list 
>>> in
>>> KVM: there's no way to notify userspace that a page's status has changed.
>>> Userspace would need to query KVM to do GET_LIST after every GET_DIRTY.
>>> Obviously not a 

Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-03-09 Thread Sean Christopherson
On Mon, Mar 08, 2021, Steve Rutherford wrote:
> On Mon, Mar 8, 2021 at 1:11 PM Brijesh Singh  wrote:
> > On 3/8/21 1:51 PM, Sean Christopherson wrote:
> > > If the guest does the hypercall after writing the page, then the guest is 
> > > hosed
> > > if it gets migrated while writing the page (scenario #1):
> > >
> > >   vCPU Userspace
> > >   zero_bytes[0:N]
> > > > > shared>
> > >  
> > >   zero_bytes[N+1:4095]
> > >   set_shared (dest)
> > >   kaboom!
> >
> >
> > Maybe I am missing something, this is not any different from a normal
> > operation inside a guest. Making a page shared/private in the page table
> > does not update the content of the page itself. In your above case, I
> > assume zero_bytes[N+1:4095] are written by the destination VM. The
> > memory region was private in the source VM page table, so, those writes
> > will be performed encrypted. The destination VM later changed the memory
> > to shared, but nobody wrote to the memory after it has been transitioned
> > to the  shared, so a reader of the memory should get ciphertext and
> > unless there was a write after the set_shared (dest).

Sorry, that wasn't clear, there's an implied page table update to make the page
shared before zero_bytes.

> > > If userspace does GET_DIRTY after GET_LIST, then the host would transfer 
> > > bad
> > > data by consuming a stale list (scenario #2):
> > >
> > >   vCPU   Userspace
> > >  get_list (from KVM or internally)
> > >   set_shared (src)
> > >   zero_page (src)
> > >  get_dirty
> > >  
> > >  
> > >   kaboom!
> >
> >
> > I don't remember how things are done in recent Ashish Qemu/KVM patches
> > but in previous series, the get_dirty() happens before the querying the
> > encrypted state. There was some logic in VMM to resync the encrypted
> > bitmap during the final migration stage and perform any additional data
> > transfer since last sync.

It's likely that Ashish's patches did the correct thing, I just wanted to point
out that both host and guest need to do everything in a very specific order.

> > > If both guest and host order things to avoid #1 and #2, the host can still
> > > migrate the wrong data (scenario #3):
> > >
> > >   vCPU   Userspace
> > >   set_private
> > >   zero_bytes[0:4096]
> > >  get_dirty
> > >   set_shared (src)
> > >  get_list
> > >  
> > >
> > >   set_private (dest)
> > >   kaboom!
> >
> >
> > Since there was no write to the memory after the set_shared (src), so
> > the content of the page should not have changed. After the set_private
> > (dest), the caller should be seeing the same content written by the
> > zero_bytes[0:4096]
>
> I think Sean was going for the situation where the VM has moved to the
> destination, which would have changed the VEK. That way the guest
> would be decrypting the old ciphertext with the new (wrong) key.

I think that's what I was saying?

I was pointing out that the userspace VMM would see the page as "shared" and so
read the guest memory directly instead of routing it through the PSP.

Anyways, my intent wasn't to point out a known issue anywhere.  I was working
through how GET_LIST would interact with GET_DIRTY_LOG to convince myself that
the various flows were bombproof.  I wanted to record those thoughts so that I
can remind myself of the requirements when I inevitably forget them in the 
future.

> > > Scenario #3 is unlikely, but plausible, e.g. if the guest bails from its
> > > conversion flow for whatever reason, after making the initial hypercall.  
> > > Maybe
> > > it goes without saying, but to address #3, the guest must consider 
> > > existing data
> > > as lost the instant it tells the host the page has been converted to a 
> > > different
> > > type.
> > >
> > >> For the above reason if we do in-kernel hypercall handling for page
> > >> encryption status (which we probably won't require for SEV-SNP &
> > >> correspondingly there will be no hypercall exiting),
> > > As above, that doesn't preclude KVM from exiting to userspace on 
> > > conversion.
> > >
> > >> then we can implement a standard GET/SET ioctl interface to get/set the 
> > >> guest
> > >> page encryption status for userspace, which will work across SEV, SEV-ES 
> > >> and
> > >> SEV-SNP.


Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-03-09 Thread Ashish Kalra
On Wed, Mar 03, 2021 at 06:54:41PM +, Will Deacon wrote:
> [+Marc]
> 
> On Tue, Mar 02, 2021 at 02:55:43PM +, Ashish Kalra wrote:
> > On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra  
> > > > > wrote:
> > > > > Thanks for grabbing the data!
> > > > > 
> > > > > I am fine with both paths. Sean has stated an explicit desire for
> > > > > hypercall exiting, so I think that would be the current consensus.
> > > 
> > > Yep, though it'd be good to get Paolo's input, too.
> > > 
> > > > > If we want to do hypercall exiting, this should be in a follow-up
> > > > > series where we implement something more generic, e.g. a hypercall
> > > > > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > > > > exit route, we can drop the kvm side of the hypercall.
> > > 
> > > I don't think this is a good candidate for arbitrary hypercall 
> > > interception.  Or
> > > rather, I think hypercall interception should be an orthogonal 
> > > implementation.
> > > 
> > > The guest, including guest firmware, needs to be aware that the hypercall 
> > > is
> > > supported, and the ABI needs to be well-defined.  Relying on userspace 
> > > VMMs to
> > > implement a common ABI is an unnecessary risk.
> > > 
> > > We could make KVM's default behavior be a nop, i.e. have KVM enforce the 
> > > ABI but
> > > require further VMM intervention.  But, I just don't see the point, it 
> > > would
> > > save only a few lines of code.  It would also limit what KVM could do in 
> > > the
> > > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to 
> > > userspace,
> > > then mandatory interception would essentially make it impossible for KVM 
> > > to do
> > > bookkeeping while still honoring the interception request.
> > > 
> > > However, I do think it would make sense to have the userspace exit be a 
> > > generic
> > > exit type.  But hey, we already have the necessary ABI defined for that!  
> > > It's
> > > just not used anywhere.
> > > 
> > >   /* KVM_EXIT_HYPERCALL */
> > >   struct {
> > >   __u64 nr;
> > >   __u64 args[6];
> > >   __u64 ret;
> > >   __u32 longmode;
> > >   __u32 pad;
> > >   } hypercall;
> > > 

Doc mentions to use KVM_EXIT_IO for x86 to implement "hypercall to userspace"
functionality. 

Any reasons for not using KVM_EXIT_HYPERCALL interface any more, even
though it is still there and not removed ?

> > > 
> > > > > Userspace could also handle the MSR using MSR filters (would need to
> > > > > confirm that).  Then userspace could also be in control of the cpuid 
> > > > > bit.
> > > 
> > > An MSR is not a great fit; it's x86 specific and limited to 64 bits of 
> > > data.
> > > The data limitation could be fudged by shoving data into non-standard 
> > > GPRs, but
> > > that will result in truly heinous guest code, and extensibility issues.
> > > 
> > > The data limitation is a moot point, because the x86-only thing is a deal
> > > breaker.  arm64's pKVM work has a near-identical use case for a guest to 
> > > share
> > > memory with a host.  I can't think of a clever way to avoid having to 
> > > support
> > > TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> > > multiple KVM variants.
> > 
> > Looking at arm64's pKVM work, i see that it is a recently introduced RFC
> > patch-set and probably relevant to arm64 nVHE hypervisor
> > mode/implementation, and potentially makes sense as it adds guest
> > memory protection as both host and guest kernels are running on the same
> > privilege level ?
> > 
> > Though i do see that the pKVM stuff adds two hypercalls, specifically :
> > 
> > pkvm_create_mappings() ( I assume this is for setting shared memory
> > regions between host and guest) &
> > pkvm_create_private_mappings().
> > 
> > And the use-cases are quite similar to memory protection architectues
> > use cases, for example, use with virtio devices, guest DMA I/O, etc.
> 
> These hypercalls are both private to the host kernel communicating with
> its hypervisor counterpart, so I don't think they're particularly
> relevant here. As far as I can see, the more useful thing is to allow
> the guest to communicate back to the host (and the VMM) that it has opened
> up a memory window, perhaps for virtio rings or some other shared memory.
> 
> We hacked this up as a prototype in the past:
> 
> 

Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-03-08 Thread Steve Rutherford
On Mon, Mar 8, 2021 at 1:11 PM Brijesh Singh  wrote:
>
>
> On 3/8/21 1:51 PM, Sean Christopherson wrote:
> > On Mon, Mar 08, 2021, Ashish Kalra wrote:
> >> On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> >>> +Will and Quentin (arm64)
> >>>
> >>> Moving the non-KVM x86 folks to bcc, I don't they care about KVM details 
> >>> at this
> >>> point.
> >>>
> >>> On Fri, Feb 26, 2021, Ashish Kalra wrote:
>  On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra  
> > wrote:
> > Thanks for grabbing the data!
> >
> > I am fine with both paths. Sean has stated an explicit desire for
> > hypercall exiting, so I think that would be the current consensus.
> >>> Yep, though it'd be good to get Paolo's input, too.
> >>>
> > If we want to do hypercall exiting, this should be in a follow-up
> > series where we implement something more generic, e.g. a hypercall
> > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > exit route, we can drop the kvm side of the hypercall.
> >>> I don't think this is a good candidate for arbitrary hypercall 
> >>> interception.  Or
> >>> rather, I think hypercall interception should be an orthogonal 
> >>> implementation.
> >>>
> >>> The guest, including guest firmware, needs to be aware that the hypercall 
> >>> is
> >>> supported, and the ABI needs to be well-defined.  Relying on userspace 
> >>> VMMs to
> >>> implement a common ABI is an unnecessary risk.
> >>>
> >>> We could make KVM's default behavior be a nop, i.e. have KVM enforce the 
> >>> ABI but
> >>> require further VMM intervention.  But, I just don't see the point, it 
> >>> would
> >>> save only a few lines of code.  It would also limit what KVM could do in 
> >>> the
> >>> future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to 
> >>> userspace,
> >>> then mandatory interception would essentially make it impossible for KVM 
> >>> to do
> >>> bookkeeping while still honoring the interception request.
> >>>
> >>> However, I do think it would make sense to have the userspace exit be a 
> >>> generic
> >>> exit type.  But hey, we already have the necessary ABI defined for that!  
> >>> It's
> >>> just not used anywhere.
> >>>
> >>> /* KVM_EXIT_HYPERCALL */
> >>> struct {
> >>> __u64 nr;
> >>> __u64 args[6];
> >>> __u64 ret;
> >>> __u32 longmode;
> >>> __u32 pad;
> >>> } hypercall;
> >>>
> >>>
> > Userspace could also handle the MSR using MSR filters (would need to
> > confirm that).  Then userspace could also be in control of the cpuid 
> > bit.
> >>> An MSR is not a great fit; it's x86 specific and limited to 64 bits of 
> >>> data.
> >>> The data limitation could be fudged by shoving data into non-standard 
> >>> GPRs, but
> >>> that will result in truly heinous guest code, and extensibility issues.
> >>>
> >>> The data limitation is a moot point, because the x86-only thing is a deal
> >>> breaker.  arm64's pKVM work has a near-identical use case for a guest to 
> >>> share
> >>> memory with a host.  I can't think of a clever way to avoid having to 
> >>> support
> >>> TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> >>> multiple KVM variants.
> >>>
> >> Potentially, there is another reason for in-kernel hypercall handling
> >> considering SEV-SNP. In case of SEV-SNP the RMP table tracks the state
> >> of each guest page, for instance pages in hypervisor state, i.e., pages
> >> with C=0 and pages in guest valid state with C=1.
> >>
> >> Now, there shouldn't be a need for page encryption status hypercalls on
> >> SEV-SNP as KVM can track & reference guest page status directly using
> >> the RMP table.
> > Relying on the RMP table itself would require locking the RMP table for an
> > extended duration, and walking the entire RMP to find shared pages would be
> > very inefficient.
> >
> >> As KVM maintains the RMP table, therefore we will need SET/GET type of
> >> interfaces to provide the guest page encryption status to userspace.
> > Hrm, somehow I temporarily forgot about SNP and TDX adding their own 
> > hypercalls
> > for converting between shared and private.  And in the case of TDX, the 
> > hypercall
> > can't be trusted, i.e. is just a hint, otherwise the guest could induce a 
> > #MC in
> > the host.
> >
> > But, the different guest behavior doesn't require KVM to maintain a 
> > list/tree,
> > e.g. adding a dedicated KVM_EXIT_* for notifying userspace of page 
> > encryption
> > status changes would also suffice.
> >
> > Actually, that made me think of another argument against maintaining a list 
> > in
> > KVM: there's no way to notify userspace that a page's status has changed.
> > Userspace would need to query KVM to do GET_LIST after every GET_DIRTY.
> > Obviously not a huge issue, but it does make migration slightly less 
> > efficient.
> >
> > On 

Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-03-08 Thread Steve Rutherford
On Mon, Mar 8, 2021 at 11:52 AM Sean Christopherson  wrote:
>
> On Mon, Mar 08, 2021, Ashish Kalra wrote:
> > On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > > +Will and Quentin (arm64)
> > >
> > > Moving the non-KVM x86 folks to bcc, I don't they care about KVM details 
> > > at this
> > > point.
> > >
> > > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra  
> > > > > wrote:
> > > > > Thanks for grabbing the data!
> > > > >
> > > > > I am fine with both paths. Sean has stated an explicit desire for
> > > > > hypercall exiting, so I think that would be the current consensus.
> > >
> > > Yep, though it'd be good to get Paolo's input, too.
> > >
> > > > > If we want to do hypercall exiting, this should be in a follow-up
> > > > > series where we implement something more generic, e.g. a hypercall
> > > > > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > > > > exit route, we can drop the kvm side of the hypercall.
> > >
> > > I don't think this is a good candidate for arbitrary hypercall 
> > > interception.  Or
> > > rather, I think hypercall interception should be an orthogonal 
> > > implementation.
> > >
> > > The guest, including guest firmware, needs to be aware that the hypercall 
> > > is
> > > supported, and the ABI needs to be well-defined.  Relying on userspace 
> > > VMMs to
> > > implement a common ABI is an unnecessary risk.
> > >
> > > We could make KVM's default behavior be a nop, i.e. have KVM enforce the 
> > > ABI but
> > > require further VMM intervention.  But, I just don't see the point, it 
> > > would
> > > save only a few lines of code.  It would also limit what KVM could do in 
> > > the
> > > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to 
> > > userspace,
> > > then mandatory interception would essentially make it impossible for KVM 
> > > to do
> > > bookkeeping while still honoring the interception request.
> > >
> > > However, I do think it would make sense to have the userspace exit be a 
> > > generic
> > > exit type.  But hey, we already have the necessary ABI defined for that!  
> > > It's
> > > just not used anywhere.
> > >
> > > /* KVM_EXIT_HYPERCALL */
> > > struct {
> > > __u64 nr;
> > > __u64 args[6];
> > > __u64 ret;
> > > __u32 longmode;
> > > __u32 pad;
> > > } hypercall;
> > >
> > >
> > > > > Userspace could also handle the MSR using MSR filters (would need to
> > > > > confirm that).  Then userspace could also be in control of the cpuid 
> > > > > bit.
> > >
> > > An MSR is not a great fit; it's x86 specific and limited to 64 bits of 
> > > data.
> > > The data limitation could be fudged by shoving data into non-standard 
> > > GPRs, but
> > > that will result in truly heinous guest code, and extensibility issues.
> > >
> > > The data limitation is a moot point, because the x86-only thing is a deal
> > > breaker.  arm64's pKVM work has a near-identical use case for a guest to 
> > > share
> > > memory with a host.  I can't think of a clever way to avoid having to 
> > > support
> > > TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> > > multiple KVM variants.
> > >
> >
> > Potentially, there is another reason for in-kernel hypercall handling
> > considering SEV-SNP. In case of SEV-SNP the RMP table tracks the state
> > of each guest page, for instance pages in hypervisor state, i.e., pages
> > with C=0 and pages in guest valid state with C=1.
> >
> > Now, there shouldn't be a need for page encryption status hypercalls on
> > SEV-SNP as KVM can track & reference guest page status directly using
> > the RMP table.
>
> Relying on the RMP table itself would require locking the RMP table for an
> extended duration, and walking the entire RMP to find shared pages would be
> very inefficient.
>
> > As KVM maintains the RMP table, therefore we will need SET/GET type of
> > interfaces to provide the guest page encryption status to userspace.
>
> Hrm, somehow I temporarily forgot about SNP and TDX adding their own 
> hypercalls
> for converting between shared and private.  And in the case of TDX, the 
> hypercall
> can't be trusted, i.e. is just a hint, otherwise the guest could induce a #MC 
> in
> the host.
>
> But, the different guest behavior doesn't require KVM to maintain a list/tree,
> e.g. adding a dedicated KVM_EXIT_* for notifying userspace of page encryption
> status changes would also suffice.
>
> Actually, that made me think of another argument against maintaining a list in
> KVM: there's no way to notify userspace that a page's status has changed.
> Userspace would need to query KVM to do GET_LIST after every GET_DIRTY.
> Obviously not a huge issue, but it does make migration slightly less 
> efficient.
>
> On a related topic, there are fatal race conditions that will 

Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-03-08 Thread Ashish Kalra
On Mon, Mar 08, 2021 at 03:11:41PM -0600, Brijesh Singh wrote:
> 
> On 3/8/21 1:51 PM, Sean Christopherson wrote:
> > On Mon, Mar 08, 2021, Ashish Kalra wrote:
> >> On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> >>> +Will and Quentin (arm64)
> >>>
> >>> Moving the non-KVM x86 folks to bcc, I don't they care about KVM details 
> >>> at this
> >>> point.
> >>>
> >>> On Fri, Feb 26, 2021, Ashish Kalra wrote:
>  On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra  
> > wrote:
> > Thanks for grabbing the data!
> >
> > I am fine with both paths. Sean has stated an explicit desire for
> > hypercall exiting, so I think that would be the current consensus.
> >>> Yep, though it'd be good to get Paolo's input, too.
> >>>
> > If we want to do hypercall exiting, this should be in a follow-up
> > series where we implement something more generic, e.g. a hypercall
> > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > exit route, we can drop the kvm side of the hypercall.
> >>> I don't think this is a good candidate for arbitrary hypercall 
> >>> interception.  Or
> >>> rather, I think hypercall interception should be an orthogonal 
> >>> implementation.
> >>>
> >>> The guest, including guest firmware, needs to be aware that the hypercall 
> >>> is
> >>> supported, and the ABI needs to be well-defined.  Relying on userspace 
> >>> VMMs to
> >>> implement a common ABI is an unnecessary risk.
> >>>
> >>> We could make KVM's default behavior be a nop, i.e. have KVM enforce the 
> >>> ABI but
> >>> require further VMM intervention.  But, I just don't see the point, it 
> >>> would
> >>> save only a few lines of code.  It would also limit what KVM could do in 
> >>> the
> >>> future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to 
> >>> userspace,
> >>> then mandatory interception would essentially make it impossible for KVM 
> >>> to do
> >>> bookkeeping while still honoring the interception request.
> >>>
> >>> However, I do think it would make sense to have the userspace exit be a 
> >>> generic
> >>> exit type.  But hey, we already have the necessary ABI defined for that!  
> >>> It's
> >>> just not used anywhere.
> >>>
> >>>   /* KVM_EXIT_HYPERCALL */
> >>>   struct {
> >>>   __u64 nr;
> >>>   __u64 args[6];
> >>>   __u64 ret;
> >>>   __u32 longmode;
> >>>   __u32 pad;
> >>>   } hypercall;
> >>>
> >>>
> > Userspace could also handle the MSR using MSR filters (would need to
> > confirm that).  Then userspace could also be in control of the cpuid 
> > bit.
> >>> An MSR is not a great fit; it's x86 specific and limited to 64 bits of 
> >>> data.
> >>> The data limitation could be fudged by shoving data into non-standard 
> >>> GPRs, but
> >>> that will result in truly heinous guest code, and extensibility issues.
> >>>
> >>> The data limitation is a moot point, because the x86-only thing is a deal
> >>> breaker.  arm64's pKVM work has a near-identical use case for a guest to 
> >>> share
> >>> memory with a host.  I can't think of a clever way to avoid having to 
> >>> support
> >>> TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> >>> multiple KVM variants.
> >>>
> >> Potentially, there is another reason for in-kernel hypercall handling
> >> considering SEV-SNP. In case of SEV-SNP the RMP table tracks the state
> >> of each guest page, for instance pages in hypervisor state, i.e., pages
> >> with C=0 and pages in guest valid state with C=1.
> >>
> >> Now, there shouldn't be a need for page encryption status hypercalls on 
> >> SEV-SNP as KVM can track & reference guest page status directly using 
> >> the RMP table.
> > Relying on the RMP table itself would require locking the RMP table for an
> > extended duration, and walking the entire RMP to find shared pages would be
> > very inefficient.
> >
> >> As KVM maintains the RMP table, therefore we will need SET/GET type of
> >> interfaces to provide the guest page encryption status to userspace.
> > Hrm, somehow I temporarily forgot about SNP and TDX adding their own 
> > hypercalls
> > for converting between shared and private.  And in the case of TDX, the 
> > hypercall
> > can't be trusted, i.e. is just a hint, otherwise the guest could induce a 
> > #MC in
> > the host.
> >
> > But, the different guest behavior doesn't require KVM to maintain a 
> > list/tree,
> > e.g. adding a dedicated KVM_EXIT_* for notifying userspace of page 
> > encryption
> > status changes would also suffice.  
> >
> > Actually, that made me think of another argument against maintaining a list 
> > in
> > KVM: there's no way to notify userspace that a page's status has changed.
> > Userspace would need to query KVM to do GET_LIST after every GET_DIRTY.
> > Obviously not a huge issue, but it does make migration slightly less 
> > efficient.
> >
> > On a 

Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-03-08 Thread Brijesh Singh


On 3/8/21 1:51 PM, Sean Christopherson wrote:
> On Mon, Mar 08, 2021, Ashish Kalra wrote:
>> On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
>>> +Will and Quentin (arm64)
>>>
>>> Moving the non-KVM x86 folks to bcc, I don't they care about KVM details at 
>>> this
>>> point.
>>>
>>> On Fri, Feb 26, 2021, Ashish Kalra wrote:
 On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra  
> wrote:
> Thanks for grabbing the data!
>
> I am fine with both paths. Sean has stated an explicit desire for
> hypercall exiting, so I think that would be the current consensus.
>>> Yep, though it'd be good to get Paolo's input, too.
>>>
> If we want to do hypercall exiting, this should be in a follow-up
> series where we implement something more generic, e.g. a hypercall
> exiting bitmap or hypercall exit list. If we are taking the hypercall
> exit route, we can drop the kvm side of the hypercall.
>>> I don't think this is a good candidate for arbitrary hypercall 
>>> interception.  Or
>>> rather, I think hypercall interception should be an orthogonal 
>>> implementation.
>>>
>>> The guest, including guest firmware, needs to be aware that the hypercall is
>>> supported, and the ABI needs to be well-defined.  Relying on userspace VMMs 
>>> to
>>> implement a common ABI is an unnecessary risk.
>>>
>>> We could make KVM's default behavior be a nop, i.e. have KVM enforce the 
>>> ABI but
>>> require further VMM intervention.  But, I just don't see the point, it would
>>> save only a few lines of code.  It would also limit what KVM could do in the
>>> future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to 
>>> userspace,
>>> then mandatory interception would essentially make it impossible for KVM to 
>>> do
>>> bookkeeping while still honoring the interception request.
>>>
>>> However, I do think it would make sense to have the userspace exit be a 
>>> generic
>>> exit type.  But hey, we already have the necessary ABI defined for that!  
>>> It's
>>> just not used anywhere.
>>>
>>> /* KVM_EXIT_HYPERCALL */
>>> struct {
>>> __u64 nr;
>>> __u64 args[6];
>>> __u64 ret;
>>> __u32 longmode;
>>> __u32 pad;
>>> } hypercall;
>>>
>>>
> Userspace could also handle the MSR using MSR filters (would need to
> confirm that).  Then userspace could also be in control of the cpuid bit.
>>> An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
>>> The data limitation could be fudged by shoving data into non-standard GPRs, 
>>> but
>>> that will result in truly heinous guest code, and extensibility issues.
>>>
>>> The data limitation is a moot point, because the x86-only thing is a deal
>>> breaker.  arm64's pKVM work has a near-identical use case for a guest to 
>>> share
>>> memory with a host.  I can't think of a clever way to avoid having to 
>>> support
>>> TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
>>> multiple KVM variants.
>>>
>> Potentially, there is another reason for in-kernel hypercall handling
>> considering SEV-SNP. In case of SEV-SNP the RMP table tracks the state
>> of each guest page, for instance pages in hypervisor state, i.e., pages
>> with C=0 and pages in guest valid state with C=1.
>>
>> Now, there shouldn't be a need for page encryption status hypercalls on 
>> SEV-SNP as KVM can track & reference guest page status directly using 
>> the RMP table.
> Relying on the RMP table itself would require locking the RMP table for an
> extended duration, and walking the entire RMP to find shared pages would be
> very inefficient.
>
>> As KVM maintains the RMP table, therefore we will need SET/GET type of
>> interfaces to provide the guest page encryption status to userspace.
> Hrm, somehow I temporarily forgot about SNP and TDX adding their own 
> hypercalls
> for converting between shared and private.  And in the case of TDX, the 
> hypercall
> can't be trusted, i.e. is just a hint, otherwise the guest could induce a #MC 
> in
> the host.
>
> But, the different guest behavior doesn't require KVM to maintain a list/tree,
> e.g. adding a dedicated KVM_EXIT_* for notifying userspace of page encryption
> status changes would also suffice.  
>
> Actually, that made me think of another argument against maintaining a list in
> KVM: there's no way to notify userspace that a page's status has changed.
> Userspace would need to query KVM to do GET_LIST after every GET_DIRTY.
> Obviously not a huge issue, but it does make migration slightly less 
> efficient.
>
> On a related topic, there are fatal race conditions that will require careful
> coordination between guest and host, and will effectively be wired into the 
> ABI.
> SNP and TDX don't suffer these issues because host awareness of status is 
> atomic
> with respect to the guest actually writing the page with the new 

Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-03-08 Thread Ashish Kalra
On Mon, Mar 08, 2021 at 11:51:57AM -0800, Sean Christopherson wrote:
> On Mon, Mar 08, 2021, Ashish Kalra wrote:
> > On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > > +Will and Quentin (arm64)
> > > 
> > > Moving the non-KVM x86 folks to bcc, I don't they care about KVM details 
> > > at this
> > > point.
> > > 
> > > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra  
> > > > > wrote:
> > > > > Thanks for grabbing the data!
> > > > > 
> > > > > I am fine with both paths. Sean has stated an explicit desire for
> > > > > hypercall exiting, so I think that would be the current consensus.
> > > 
> > > Yep, though it'd be good to get Paolo's input, too.
> > > 
> > > > > If we want to do hypercall exiting, this should be in a follow-up
> > > > > series where we implement something more generic, e.g. a hypercall
> > > > > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > > > > exit route, we can drop the kvm side of the hypercall.
> > > 
> > > I don't think this is a good candidate for arbitrary hypercall 
> > > interception.  Or
> > > rather, I think hypercall interception should be an orthogonal 
> > > implementation.
> > > 
> > > The guest, including guest firmware, needs to be aware that the hypercall 
> > > is
> > > supported, and the ABI needs to be well-defined.  Relying on userspace 
> > > VMMs to
> > > implement a common ABI is an unnecessary risk.
> > > 
> > > We could make KVM's default behavior be a nop, i.e. have KVM enforce the 
> > > ABI but
> > > require further VMM intervention.  But, I just don't see the point, it 
> > > would
> > > save only a few lines of code.  It would also limit what KVM could do in 
> > > the
> > > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to 
> > > userspace,
> > > then mandatory interception would essentially make it impossible for KVM 
> > > to do
> > > bookkeeping while still honoring the interception request.
> > > 
> > > However, I do think it would make sense to have the userspace exit be a 
> > > generic
> > > exit type.  But hey, we already have the necessary ABI defined for that!  
> > > It's
> > > just not used anywhere.
> > > 
> > >   /* KVM_EXIT_HYPERCALL */
> > >   struct {
> > >   __u64 nr;
> > >   __u64 args[6];
> > >   __u64 ret;
> > >   __u32 longmode;
> > >   __u32 pad;
> > >   } hypercall;
> > > 
> > > 
> > > > > Userspace could also handle the MSR using MSR filters (would need to
> > > > > confirm that).  Then userspace could also be in control of the cpuid 
> > > > > bit.
> > > 
> > > An MSR is not a great fit; it's x86 specific and limited to 64 bits of 
> > > data.
> > > The data limitation could be fudged by shoving data into non-standard 
> > > GPRs, but
> > > that will result in truly heinous guest code, and extensibility issues.
> > > 
> > > The data limitation is a moot point, because the x86-only thing is a deal
> > > breaker.  arm64's pKVM work has a near-identical use case for a guest to 
> > > share
> > > memory with a host.  I can't think of a clever way to avoid having to 
> > > support
> > > TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> > > multiple KVM variants.
> > > 
> > 
> > Potentially, there is another reason for in-kernel hypercall handling
> > considering SEV-SNP. In case of SEV-SNP the RMP table tracks the state
> > of each guest page, for instance pages in hypervisor state, i.e., pages
> > with C=0 and pages in guest valid state with C=1.
> > 
> > Now, there shouldn't be a need for page encryption status hypercalls on 
> > SEV-SNP as KVM can track & reference guest page status directly using 
> > the RMP table.
> 
> Relying on the RMP table itself would require locking the RMP table for an
> extended duration, and walking the entire RMP to find shared pages would be
> very inefficient.
> 
> > As KVM maintains the RMP table, therefore we will need SET/GET type of
> > interfaces to provide the guest page encryption status to userspace.
> 
> Hrm, somehow I temporarily forgot about SNP and TDX adding their own 
> hypercalls
> for converting between shared and private.  And in the case of TDX, the 
> hypercall
> can't be trusted, i.e. is just a hint, otherwise the guest could induce a #MC 
> in
> the host.

One question here, is this because if hypercall can cause direct
modifications to the shared EPT, it can induce #MC in the host ?

Thanks,
Ashish


Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-03-08 Thread Sean Christopherson
On Mon, Mar 08, 2021, Ashish Kalra wrote:
> On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > +Will and Quentin (arm64)
> > 
> > Moving the non-KVM x86 folks to bcc, I don't they care about KVM details at 
> > this
> > point.
> > 
> > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra  
> > > > wrote:
> > > > Thanks for grabbing the data!
> > > > 
> > > > I am fine with both paths. Sean has stated an explicit desire for
> > > > hypercall exiting, so I think that would be the current consensus.
> > 
> > Yep, though it'd be good to get Paolo's input, too.
> > 
> > > > If we want to do hypercall exiting, this should be in a follow-up
> > > > series where we implement something more generic, e.g. a hypercall
> > > > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > > > exit route, we can drop the kvm side of the hypercall.
> > 
> > I don't think this is a good candidate for arbitrary hypercall 
> > interception.  Or
> > rather, I think hypercall interception should be an orthogonal 
> > implementation.
> > 
> > The guest, including guest firmware, needs to be aware that the hypercall is
> > supported, and the ABI needs to be well-defined.  Relying on userspace VMMs 
> > to
> > implement a common ABI is an unnecessary risk.
> > 
> > We could make KVM's default behavior be a nop, i.e. have KVM enforce the 
> > ABI but
> > require further VMM intervention.  But, I just don't see the point, it would
> > save only a few lines of code.  It would also limit what KVM could do in the
> > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to 
> > userspace,
> > then mandatory interception would essentially make it impossible for KVM to 
> > do
> > bookkeeping while still honoring the interception request.
> > 
> > However, I do think it would make sense to have the userspace exit be a 
> > generic
> > exit type.  But hey, we already have the necessary ABI defined for that!  
> > It's
> > just not used anywhere.
> > 
> > /* KVM_EXIT_HYPERCALL */
> > struct {
> > __u64 nr;
> > __u64 args[6];
> > __u64 ret;
> > __u32 longmode;
> > __u32 pad;
> > } hypercall;
> > 
> > 
> > > > Userspace could also handle the MSR using MSR filters (would need to
> > > > confirm that).  Then userspace could also be in control of the cpuid 
> > > > bit.
> > 
> > An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
> > The data limitation could be fudged by shoving data into non-standard GPRs, 
> > but
> > that will result in truly heinous guest code, and extensibility issues.
> > 
> > The data limitation is a moot point, because the x86-only thing is a deal
> > breaker.  arm64's pKVM work has a near-identical use case for a guest to 
> > share
> > memory with a host.  I can't think of a clever way to avoid having to 
> > support
> > TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> > multiple KVM variants.
> > 
> 
> Potentially, there is another reason for in-kernel hypercall handling
> considering SEV-SNP. In case of SEV-SNP the RMP table tracks the state
> of each guest page, for instance pages in hypervisor state, i.e., pages
> with C=0 and pages in guest valid state with C=1.
> 
> Now, there shouldn't be a need for page encryption status hypercalls on 
> SEV-SNP as KVM can track & reference guest page status directly using 
> the RMP table.

Relying on the RMP table itself would require locking the RMP table for an
extended duration, and walking the entire RMP to find shared pages would be
very inefficient.

> As KVM maintains the RMP table, therefore we will need SET/GET type of
> interfaces to provide the guest page encryption status to userspace.

Hrm, somehow I temporarily forgot about SNP and TDX adding their own hypercalls
for converting between shared and private.  And in the case of TDX, the 
hypercall
can't be trusted, i.e. is just a hint, otherwise the guest could induce a #MC in
the host.

But, the different guest behavior doesn't require KVM to maintain a list/tree,
e.g. adding a dedicated KVM_EXIT_* for notifying userspace of page encryption
status changes would also suffice.  

Actually, that made me think of another argument against maintaining a list in
KVM: there's no way to notify userspace that a page's status has changed.
Userspace would need to query KVM to do GET_LIST after every GET_DIRTY.
Obviously not a huge issue, but it does make migration slightly less efficient.

On a related topic, there are fatal race conditions that will require careful
coordination between guest and host, and will effectively be wired into the ABI.
SNP and TDX don't suffer these issues because host awareness of status is atomic
with respect to the guest actually writing the page with the new encryption
status.

For SEV live migration...

If the 

Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-03-08 Thread Ashish Kalra
On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> +Will and Quentin (arm64)
> 
> Moving the non-KVM x86 folks to bcc, I don't they care about KVM details at 
> this
> point.
> 
> On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra  
> > > wrote:
> > > Thanks for grabbing the data!
> > > 
> > > I am fine with both paths. Sean has stated an explicit desire for
> > > hypercall exiting, so I think that would be the current consensus.
> 
> Yep, though it'd be good to get Paolo's input, too.
> 
> > > If we want to do hypercall exiting, this should be in a follow-up
> > > series where we implement something more generic, e.g. a hypercall
> > > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > > exit route, we can drop the kvm side of the hypercall.
> 
> I don't think this is a good candidate for arbitrary hypercall interception.  
> Or
> rather, I think hypercall interception should be an orthogonal implementation.
> 
> The guest, including guest firmware, needs to be aware that the hypercall is
> supported, and the ABI needs to be well-defined.  Relying on userspace VMMs to
> implement a common ABI is an unnecessary risk.
> 
> We could make KVM's default behavior be a nop, i.e. have KVM enforce the ABI 
> but
> require further VMM intervention.  But, I just don't see the point, it would
> save only a few lines of code.  It would also limit what KVM could do in the
> future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to userspace,
> then mandatory interception would essentially make it impossible for KVM to do
> bookkeeping while still honoring the interception request.
> 
> However, I do think it would make sense to have the userspace exit be a 
> generic
> exit type.  But hey, we already have the necessary ABI defined for that!  It's
> just not used anywhere.
> 
>   /* KVM_EXIT_HYPERCALL */
>   struct {
>   __u64 nr;
>   __u64 args[6];
>   __u64 ret;
>   __u32 longmode;
>   __u32 pad;
>   } hypercall;
> 
> 
> > > Userspace could also handle the MSR using MSR filters (would need to
> > > confirm that).  Then userspace could also be in control of the cpuid bit.
> 
> An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
> The data limitation could be fudged by shoving data into non-standard GPRs, 
> but
> that will result in truly heinous guest code, and extensibility issues.
> 
> The data limitation is a moot point, because the x86-only thing is a deal
> breaker.  arm64's pKVM work has a near-identical use case for a guest to share
> memory with a host.  I can't think of a clever way to avoid having to support
> TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> multiple KVM variants.
> 

Potentially, there is another reason for in-kernel hypercall handling
considering SEV-SNP. In case of SEV-SNP the RMP table tracks the state
of each guest page, for instance pages in hypervisor state, i.e., pages
with C=0 and pages in guest valid state with C=1.

Now, there shouldn't be a need for page encryption status hypercalls on 
SEV-SNP as KVM can track & reference guest page status directly using 
the RMP table.

As KVM maintains the RMP table, therefore we will need SET/GET type of
interfaces to provide the guest page encryption status to userspace.

For the above reason if we do in-kernel hypercall handling for page
encryption status (which we probably won't require for SEV-SNP &
correspondingly there will be no hypercall exiting), then we can
implement a standard GET/SET ioctl interface to get/set the guest page
encryption status for userspace, which will work across SEV, SEV-ES and
SEV-SNP.

Thanks,
Ashish



Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-03-03 Thread Ashish Kalra
On Wed, Mar 03, 2021 at 06:54:41PM +, Will Deacon wrote:
> [+Marc]
> 
> On Tue, Mar 02, 2021 at 02:55:43PM +, Ashish Kalra wrote:
> > On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra  
> > > > > wrote:
> > > > > Thanks for grabbing the data!
> > > > > 
> > > > > I am fine with both paths. Sean has stated an explicit desire for
> > > > > hypercall exiting, so I think that would be the current consensus.
> > > 
> > > Yep, though it'd be good to get Paolo's input, too.
> > > 
> > > > > If we want to do hypercall exiting, this should be in a follow-up
> > > > > series where we implement something more generic, e.g. a hypercall
> > > > > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > > > > exit route, we can drop the kvm side of the hypercall.
> > > 
> > > I don't think this is a good candidate for arbitrary hypercall 
> > > interception.  Or
> > > rather, I think hypercall interception should be an orthogonal 
> > > implementation.
> > > 
> > > The guest, including guest firmware, needs to be aware that the hypercall 
> > > is
> > > supported, and the ABI needs to be well-defined.  Relying on userspace 
> > > VMMs to
> > > implement a common ABI is an unnecessary risk.
> > > 
> > > We could make KVM's default behavior be a nop, i.e. have KVM enforce the 
> > > ABI but
> > > require further VMM intervention.  But, I just don't see the point, it 
> > > would
> > > save only a few lines of code.  It would also limit what KVM could do in 
> > > the
> > > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to 
> > > userspace,
> > > then mandatory interception would essentially make it impossible for KVM 
> > > to do
> > > bookkeeping while still honoring the interception request.
> > > 
> > > However, I do think it would make sense to have the userspace exit be a 
> > > generic
> > > exit type.  But hey, we already have the necessary ABI defined for that!  
> > > It's
> > > just not used anywhere.
> > > 
> > >   /* KVM_EXIT_HYPERCALL */
> > >   struct {
> > >   __u64 nr;
> > >   __u64 args[6];
> > >   __u64 ret;
> > >   __u32 longmode;
> > >   __u32 pad;
> > >   } hypercall;
> > > 
> > > 
> > > > > Userspace could also handle the MSR using MSR filters (would need to
> > > > > confirm that).  Then userspace could also be in control of the cpuid 
> > > > > bit.
> > > 
> > > An MSR is not a great fit; it's x86 specific and limited to 64 bits of 
> > > data.
> > > The data limitation could be fudged by shoving data into non-standard 
> > > GPRs, but
> > > that will result in truly heinous guest code, and extensibility issues.
> > > 
> > > The data limitation is a moot point, because the x86-only thing is a deal
> > > breaker.  arm64's pKVM work has a near-identical use case for a guest to 
> > > share
> > > memory with a host.  I can't think of a clever way to avoid having to 
> > > support
> > > TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> > > multiple KVM variants.
> > 
> > Looking at arm64's pKVM work, i see that it is a recently introduced RFC
> > patch-set and probably relevant to arm64 nVHE hypervisor
> > mode/implementation, and potentially makes sense as it adds guest
> > memory protection as both host and guest kernels are running on the same
> > privilege level ?
> > 
> > Though i do see that the pKVM stuff adds two hypercalls, specifically :
> > 
> > pkvm_create_mappings() ( I assume this is for setting shared memory
> > regions between host and guest) &
> > pkvm_create_private_mappings().
> > 
> > And the use-cases are quite similar to memory protection architectues
> > use cases, for example, use with virtio devices, guest DMA I/O, etc.
> 
> These hypercalls are both private to the host kernel communicating with
> its hypervisor counterpart, so I don't think they're particularly
> relevant here. 

Yes, i have the same thoughts here as this looked like a private hypercall
interface between host kernel and hypervisor, rather than between guest
and host/hypervisor.

> As far as I can see, the more useful thing is to allow
> the guest to communicate back to the host (and the VMM) that it has opened
> up a memory window, perhaps for virtio rings or some other shared memory.
> 

Yes, this is our main use case.

> We hacked this up as a prototype in the past:
> 
> 

Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-03-03 Thread Will Deacon
[+Marc]

On Tue, Mar 02, 2021 at 02:55:43PM +, Ashish Kalra wrote:
> On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra  
> > > > wrote:
> > > > Thanks for grabbing the data!
> > > > 
> > > > I am fine with both paths. Sean has stated an explicit desire for
> > > > hypercall exiting, so I think that would be the current consensus.
> > 
> > Yep, though it'd be good to get Paolo's input, too.
> > 
> > > > If we want to do hypercall exiting, this should be in a follow-up
> > > > series where we implement something more generic, e.g. a hypercall
> > > > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > > > exit route, we can drop the kvm side of the hypercall.
> > 
> > I don't think this is a good candidate for arbitrary hypercall 
> > interception.  Or
> > rather, I think hypercall interception should be an orthogonal 
> > implementation.
> > 
> > The guest, including guest firmware, needs to be aware that the hypercall is
> > supported, and the ABI needs to be well-defined.  Relying on userspace VMMs 
> > to
> > implement a common ABI is an unnecessary risk.
> > 
> > We could make KVM's default behavior be a nop, i.e. have KVM enforce the 
> > ABI but
> > require further VMM intervention.  But, I just don't see the point, it would
> > save only a few lines of code.  It would also limit what KVM could do in the
> > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to 
> > userspace,
> > then mandatory interception would essentially make it impossible for KVM to 
> > do
> > bookkeeping while still honoring the interception request.
> > 
> > However, I do think it would make sense to have the userspace exit be a 
> > generic
> > exit type.  But hey, we already have the necessary ABI defined for that!  
> > It's
> > just not used anywhere.
> > 
> > /* KVM_EXIT_HYPERCALL */
> > struct {
> > __u64 nr;
> > __u64 args[6];
> > __u64 ret;
> > __u32 longmode;
> > __u32 pad;
> > } hypercall;
> > 
> > 
> > > > Userspace could also handle the MSR using MSR filters (would need to
> > > > confirm that).  Then userspace could also be in control of the cpuid 
> > > > bit.
> > 
> > An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
> > The data limitation could be fudged by shoving data into non-standard GPRs, 
> > but
> > that will result in truly heinous guest code, and extensibility issues.
> > 
> > The data limitation is a moot point, because the x86-only thing is a deal
> > breaker.  arm64's pKVM work has a near-identical use case for a guest to 
> > share
> > memory with a host.  I can't think of a clever way to avoid having to 
> > support
> > TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> > multiple KVM variants.
> 
> Looking at arm64's pKVM work, i see that it is a recently introduced RFC
> patch-set and probably relevant to arm64 nVHE hypervisor
> mode/implementation, and potentially makes sense as it adds guest
> memory protection as both host and guest kernels are running on the same
> privilege level ?
> 
> Though i do see that the pKVM stuff adds two hypercalls, specifically :
> 
> pkvm_create_mappings() ( I assume this is for setting shared memory
> regions between host and guest) &
> pkvm_create_private_mappings().
> 
> And the use-cases are quite similar to memory protection architectues
> use cases, for example, use with virtio devices, guest DMA I/O, etc.

These hypercalls are both private to the host kernel communicating with
its hypervisor counterpart, so I don't think they're particularly
relevant here. As far as I can see, the more useful thing is to allow
the guest to communicate back to the host (and the VMM) that it has opened
up a memory window, perhaps for virtio rings or some other shared memory.

We hacked this up as a prototype in the past:

https://android-kvm.googlesource.com/linux/+/d12a9e2c12a52cf7140d40cd9fa092dc8a85fac9%5E%21/

but that's all arm64-specific and if we're solving the same problem as
you, then let's avoid arch-specific stuff if possible. The way in which
the guest discovers the interface will be arch-specific (we already have
a mechanism for that and some hypercalls are already allocated by specs
from Arm), but the interface back to the VMM and some (most?) of the host
handling could be shared.

> But, isn't this patch set still RFC, and though i agree that it adds
> an infrastructure for standardised communication between the host and
> it's guests for mutually controlled shared memory regions and
> surely adds some kind of portability between hypervisor
> implementations, but nothing is standardised still, right ?

No, and it seems that you're further ahead than us in terms of
implementation in this area. We're happy to 

Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-03-02 Thread Ashish Kalra
On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> +Will and Quentin (arm64)
> 
> Moving the non-KVM x86 folks to bcc, I don't they care about KVM details at 
> this
> point.
> 
> On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra  
> > > wrote:
> > > Thanks for grabbing the data!
> > > 
> > > I am fine with both paths. Sean has stated an explicit desire for
> > > hypercall exiting, so I think that would be the current consensus.
> 
> Yep, though it'd be good to get Paolo's input, too.
> 
> > > If we want to do hypercall exiting, this should be in a follow-up
> > > series where we implement something more generic, e.g. a hypercall
> > > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > > exit route, we can drop the kvm side of the hypercall.
> 
> I don't think this is a good candidate for arbitrary hypercall interception.  
> Or
> rather, I think hypercall interception should be an orthogonal implementation.
> 
> The guest, including guest firmware, needs to be aware that the hypercall is
> supported, and the ABI needs to be well-defined.  Relying on userspace VMMs to
> implement a common ABI is an unnecessary risk.
> 
> We could make KVM's default behavior be a nop, i.e. have KVM enforce the ABI 
> but
> require further VMM intervention.  But, I just don't see the point, it would
> save only a few lines of code.  It would also limit what KVM could do in the
> future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to userspace,
> then mandatory interception would essentially make it impossible for KVM to do
> bookkeeping while still honoring the interception request.
> 
> However, I do think it would make sense to have the userspace exit be a 
> generic
> exit type.  But hey, we already have the necessary ABI defined for that!  It's
> just not used anywhere.
> 
>   /* KVM_EXIT_HYPERCALL */
>   struct {
>   __u64 nr;
>   __u64 args[6];
>   __u64 ret;
>   __u32 longmode;
>   __u32 pad;
>   } hypercall;
> 
> 
> > > Userspace could also handle the MSR using MSR filters (would need to
> > > confirm that).  Then userspace could also be in control of the cpuid bit.
> 
> An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
> The data limitation could be fudged by shoving data into non-standard GPRs, 
> but
> that will result in truly heinous guest code, and extensibility issues.
> 
> The data limitation is a moot point, because the x86-only thing is a deal
> breaker.  arm64's pKVM work has a near-identical use case for a guest to share
> memory with a host.  I can't think of a clever way to avoid having to support
> TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> multiple KVM variants.

Looking at arm64's pKVM work, i see that it is a recently introduced RFC
patch-set and probably relevant to arm64 nVHE hypervisor
mode/implementation, and potentially makes sense as it adds guest
memory protection as both host and guest kernels are running on the same
privilege level ?

Though i do see that the pKVM stuff adds two hypercalls, specifically :

pkvm_create_mappings() ( I assume this is for setting shared memory
regions between host and guest) &
pkvm_create_private_mappings().

And the use-cases are quite similar to memory protection architectues
use cases, for example, use with virtio devices, guest DMA I/O, etc.

But, isn't this patch set still RFC, and though i agree that it adds
an infrastructure for standardised communication between the host and
it's guests for mutually controlled shared memory regions and
surely adds some kind of portability between hypervisor
implementations, but nothing is standardised still, right ?

Thanks,
Ashish


Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-03-02 Thread Ashish Kalra
On Tue, Mar 02, 2021 at 02:55:43PM +, Ashish Kalra wrote:
> On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > +Will and Quentin (arm64)
> > 
> > Moving the non-KVM x86 folks to bcc, I don't they care about KVM details at 
> > this
> > point.
> > 
> > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra  
> > > > wrote:
> > > > Thanks for grabbing the data!
> > > > 
> > > > I am fine with both paths. Sean has stated an explicit desire for
> > > > hypercall exiting, so I think that would be the current consensus.
> > 
> > Yep, though it'd be good to get Paolo's input, too.
> > 
> > > > If we want to do hypercall exiting, this should be in a follow-up
> > > > series where we implement something more generic, e.g. a hypercall
> > > > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > > > exit route, we can drop the kvm side of the hypercall.
> > 
> > I don't think this is a good candidate for arbitrary hypercall 
> > interception.  Or
> > rather, I think hypercall interception should be an orthogonal 
> > implementation.
> > 
> > The guest, including guest firmware, needs to be aware that the hypercall is
> > supported, and the ABI needs to be well-defined.  Relying on userspace VMMs 
> > to
> > implement a common ABI is an unnecessary risk.
> > 
> > We could make KVM's default behavior be a nop, i.e. have KVM enforce the 
> > ABI but
> > require further VMM intervention.  But, I just don't see the point, it would
> > save only a few lines of code.  It would also limit what KVM could do in the
> > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to 
> > userspace,
> > then mandatory interception would essentially make it impossible for KVM to 
> > do
> > bookkeeping while still honoring the interception request.
> > 
> > However, I do think it would make sense to have the userspace exit be a 
> > generic
> > exit type.  But hey, we already have the necessary ABI defined for that!  
> > It's
> > just not used anywhere.
> > 
> > /* KVM_EXIT_HYPERCALL */
> > struct {
> > __u64 nr;
> > __u64 args[6];
> > __u64 ret;
> > __u32 longmode;
> > __u32 pad;
> > } hypercall;
> > 
> > 
> > > > Userspace could also handle the MSR using MSR filters (would need to
> > > > confirm that).  Then userspace could also be in control of the cpuid 
> > > > bit.
> > 
> > An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
> > The data limitation could be fudged by shoving data into non-standard GPRs, 
> > but
> > that will result in truly heinous guest code, and extensibility issues.
> > 
> > The data limitation is a moot point, because the x86-only thing is a deal
> > breaker.  arm64's pKVM work has a near-identical use case for a guest to 
> > share
> > memory with a host.  I can't think of a clever way to avoid having to 
> > support
> > TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> > multiple KVM variants.
> 
> Looking at arm64's pKVM work, i see that it is a recently introduced RFC
> patch-set and probably relevant to arm64 nVHE hypervisor
> mode/implementation, and potentially makes sense as it adds guest
> memory protection as both host and guest kernels are running on the same
> privilege level ?
> 
> Though i do see that the pKVM stuff adds two hypercalls, specifically :
> 
> pkvm_create_mappings() ( I assume this is for setting shared memory
> regions between host and guest) &
> pkvm_create_private_mappings().
> 
> And the use-cases are quite similar to memory protection architectues
> use cases, for example, use with virtio devices, guest DMA I/O, etc.
> 
> But, isn't this patch set still RFC, and though i agree that it adds
> an infrastructure for standardised communication between the host and
> it's guests for mutually controlled shared memory regions and
> surely adds some kind of portability between hypervisor
> implementations, but nothing is standardised still, right ?
> 

And to add here, the hypercall implementation is in-HYP mode,
there is no infrastructure as part of this patch-set to do
hypercall exiting and handling it in user-space. 

Though arguably, we may able to add a hypercall exiting code path on the
amd64 implementation for the same hypercall interfaces ?

Alternatively, we implement this in-kernel and then add SET/GET ioctl
interfaces to export the shared pages/regions list to user-space.

Thanks,
Ashish


Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-02-26 Thread Sean Christopherson
+Will and Quentin (arm64)

Moving the non-KVM x86 folks to bcc, I don't they care about KVM details at this
point.

On Fri, Feb 26, 2021, Ashish Kalra wrote:
> On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra  wrote:
> > Thanks for grabbing the data!
> > 
> > I am fine with both paths. Sean has stated an explicit desire for
> > hypercall exiting, so I think that would be the current consensus.

Yep, though it'd be good to get Paolo's input, too.

> > If we want to do hypercall exiting, this should be in a follow-up
> > series where we implement something more generic, e.g. a hypercall
> > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > exit route, we can drop the kvm side of the hypercall.

I don't think this is a good candidate for arbitrary hypercall interception.  Or
rather, I think hypercall interception should be an orthogonal implementation.

The guest, including guest firmware, needs to be aware that the hypercall is
supported, and the ABI needs to be well-defined.  Relying on userspace VMMs to
implement a common ABI is an unnecessary risk.

We could make KVM's default behavior be a nop, i.e. have KVM enforce the ABI but
require further VMM intervention.  But, I just don't see the point, it would
save only a few lines of code.  It would also limit what KVM could do in the
future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to userspace,
then mandatory interception would essentially make it impossible for KVM to do
bookkeeping while still honoring the interception request.

However, I do think it would make sense to have the userspace exit be a generic
exit type.  But hey, we already have the necessary ABI defined for that!  It's
just not used anywhere.

/* KVM_EXIT_HYPERCALL */
struct {
__u64 nr;
__u64 args[6];
__u64 ret;
__u32 longmode;
__u32 pad;
} hypercall;


> > Userspace could also handle the MSR using MSR filters (would need to
> > confirm that).  Then userspace could also be in control of the cpuid bit.

An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
The data limitation could be fudged by shoving data into non-standard GPRs, but
that will result in truly heinous guest code, and extensibility issues.

The data limitation is a moot point, because the x86-only thing is a deal
breaker.  arm64's pKVM work has a near-identical use case for a guest to share
memory with a host.  I can't think of a clever way to avoid having to support
TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
multiple KVM variants.

> > Essentially, I think you could drop most of the host kernel work if
> > there were generic support for hypercall exiting. Then userspace would
> > be responsible for all of that. Thoughts on this?

> So if i understand it correctly, i will submitting v11 of this patch-set
> with in-kernel support for page encryption status hypercalls and shared
> pages list and the userspace control of SEV live migration feature
> support and fixes for MSR handling.

At this point, I'd say hold off on putting more effort into an implementation
until we have consensus.

> In subsequent follow-up patches we will add generic support for hypercall 
> exiting and then drop kvm side of hypercall and also add userspace
> support for MSR handling.
> 
> Thanks,
> Ashish


Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-02-26 Thread Ashish Kalra
Hello Steve,

On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra  wrote:
> >
> > On Wed, Feb 24, 2021 at 10:22:33AM -0800, Sean Christopherson wrote:
> > > On Wed, Feb 24, 2021, Ashish Kalra wrote:
> > > > # Samples: 19K of event 'kvm:kvm_hypercall'
> > > > # Event count (approx.): 19573
> > > > #
> > > > # Overhead  Command  Shared Object Symbol
> > > > #   ...    .
> > > > #
> > > >100.00%  qemu-system-x86  [kernel.vmlinux]  [k] kvm_emulate_hypercall
> > > >
> > > > Out of these 19573 hypercalls, # of page encryption status hcalls are 
> > > > 19479,
> > > > so almost all hypercalls here are page encryption status hypercalls.
> > >
> > > Oof.
> > >
> > > > The above data indicates that there will be ~2% more Heavyweight VMEXITs
> > > > during SEV guest boot if we do page encryption status hypercalls
> > > > pass-through to host userspace.
> > > >
> > > > But, then Brijesh pointed out to me and highlighted that currently
> > > > OVMF is doing lot of VMEXITs because they don't use the DMA pool to 
> > > > minimize the C-bit toggles,
> > > > in other words, OVMF bounce buffer does page state change on every DMA 
> > > > allocate and free.
> > > >
> > > > So here is the performance analysis after kernel and initrd have been
> > > > loaded into memory using grub and then starting perf just before 
> > > > booting the kernel.
> > > >
> > > > These are the performance #'s after kernel and initrd have been loaded 
> > > > into memory,
> > > > then perf is attached and kernel is booted :
> > > >
> > > > # Samples: 1M of event 'kvm:kvm_userspace_exit'
> > > > # Event count (approx.): 1081235
> > > > #
> > > > # Overhead  Trace output
> > > > #   
> > > > #
> > > > 99.77%  reason KVM_EXIT_IO (2)
> > > >  0.23%  reason KVM_EXIT_MMIO (6)
> > > >
> > > > # Samples: 1K of event 'kvm:kvm_hypercall'
> > > > # Event count (approx.): 1279
> > > > #
> > > >
> > > > So as the above data indicates, Linux is only making ~1K hypercalls,
> > > > compared to ~18K hypercalls made by OVMF in the above use case.
> > > >
> > > > Does the above adds a prerequisite that OVMF needs to be optimized if
> > > > and before hypercall pass-through can be done ?
> > >
> > > Disclaimer: my math could be totally wrong.
> > >
> > > I doubt it's a hard requirement.  Assuming a conversative roundtrip time 
> > > of 50k
> > > cycles, those 18K hypercalls will add well under a 1/2 a second of boot 
> > > time.
> > > If userspace can push the roundtrip time down to 10k cycles, the overhead 
> > > is
> > > more like 50 milliseconds.
> > >
> > > That being said, this does seem like a good OVMF cleanup, irrespective of 
> > > this
> > > new hypercall.  I assume it's not cheap to convert a page between 
> > > encrypted and
> > > decrypted.
> > >
> > > Thanks much for getting the numbers!
> >
> > Considering the above data and guest boot time latencies
> > (and potential issues with OVMF and optimizations required there),
> > do we have any consensus on whether we want to do page encryption
> > status hypercall passthrough or not ?
> >
> > Thanks,
> > Ashish
> 
> Thanks for grabbing the data!
> 
> I am fine with both paths. Sean has stated an explicit desire for
> hypercall exiting, so I think that would be the current consensus.
> 
> If we want to do hypercall exiting, this should be in a follow-up
> series where we implement something more generic, e.g. a hypercall
> exiting bitmap or hypercall exit list. If we are taking the hypercall
> exit route, we can drop the kvm side of the hypercall. Userspace could
> also handle the MSR using MSR filters (would need to confirm that).
> Then userspace could also be in control of the cpuid bit.
> 
> Essentially, I think you could drop most of the host kernel work if
> there were generic support for hypercall exiting. Then userspace would
> be responsible for all of that. Thoughts on this?
> 
So if i understand it correctly, i will submitting v11 of this patch-set
with in-kernel support for page encryption status hypercalls and shared
pages list and the userspace control of SEV live migration feature
support and fixes for MSR handling.

In subsequent follow-up patches we will add generic support for hypercall 
exiting and then drop kvm side of hypercall and also add userspace
support for MSR handling.

Thanks,
Ashish


Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-02-25 Thread Steve Rutherford
On Thu, Feb 25, 2021 at 2:59 PM Steve Rutherford  wrote:
>
> On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra  wrote:
> >
> > On Wed, Feb 24, 2021 at 10:22:33AM -0800, Sean Christopherson wrote:
> > > On Wed, Feb 24, 2021, Ashish Kalra wrote:
> > > > # Samples: 19K of event 'kvm:kvm_hypercall'
> > > > # Event count (approx.): 19573
> > > > #
> > > > # Overhead  Command  Shared Object Symbol
> > > > #   ...    .
> > > > #
> > > >100.00%  qemu-system-x86  [kernel.vmlinux]  [k] kvm_emulate_hypercall
> > > >
> > > > Out of these 19573 hypercalls, # of page encryption status hcalls are 
> > > > 19479,
> > > > so almost all hypercalls here are page encryption status hypercalls.
> > >
> > > Oof.
> > >
> > > > The above data indicates that there will be ~2% more Heavyweight VMEXITs
> > > > during SEV guest boot if we do page encryption status hypercalls
> > > > pass-through to host userspace.
> > > >
> > > > But, then Brijesh pointed out to me and highlighted that currently
> > > > OVMF is doing lot of VMEXITs because they don't use the DMA pool to 
> > > > minimize the C-bit toggles,
> > > > in other words, OVMF bounce buffer does page state change on every DMA 
> > > > allocate and free.
> > > >
> > > > So here is the performance analysis after kernel and initrd have been
> > > > loaded into memory using grub and then starting perf just before 
> > > > booting the kernel.
> > > >
> > > > These are the performance #'s after kernel and initrd have been loaded 
> > > > into memory,
> > > > then perf is attached and kernel is booted :
> > > >
> > > > # Samples: 1M of event 'kvm:kvm_userspace_exit'
> > > > # Event count (approx.): 1081235
> > > > #
> > > > # Overhead  Trace output
> > > > #   
> > > > #
> > > > 99.77%  reason KVM_EXIT_IO (2)
> > > >  0.23%  reason KVM_EXIT_MMIO (6)
> > > >
> > > > # Samples: 1K of event 'kvm:kvm_hypercall'
> > > > # Event count (approx.): 1279
> > > > #
> > > >
> > > > So as the above data indicates, Linux is only making ~1K hypercalls,
> > > > compared to ~18K hypercalls made by OVMF in the above use case.
> > > >
> > > > Does the above adds a prerequisite that OVMF needs to be optimized if
> > > > and before hypercall pass-through can be done ?
> > >
> > > Disclaimer: my math could be totally wrong.
> > >
> > > I doubt it's a hard requirement.  Assuming a conversative roundtrip time 
> > > of 50k
> > > cycles, those 18K hypercalls will add well under a 1/2 a second of boot 
> > > time.
> > > If userspace can push the roundtrip time down to 10k cycles, the overhead 
> > > is
> > > more like 50 milliseconds.
> > >
> > > That being said, this does seem like a good OVMF cleanup, irrespective of 
> > > this
> > > new hypercall.  I assume it's not cheap to convert a page between 
> > > encrypted and
> > > decrypted.
> > >
> > > Thanks much for getting the numbers!
> >
> > Considering the above data and guest boot time latencies
> > (and potential issues with OVMF and optimizations required there),
> > do we have any consensus on whether we want to do page encryption
> > status hypercall passthrough or not ?
> >
> > Thanks,
> > Ashish
>
> Thanks for grabbing the data!
>
> I am fine with both paths. Sean has stated an explicit desire for
> hypercall exiting, so I think that would be the current consensus.
>
> If we want to do hypercall exiting, this should be in a follow-up
> series where we implement something more generic, e.g. a hypercall
> exiting bitmap or hypercall exit list. If we are taking the hypercall
> exit route, we can drop the kvm side of the hypercall. Userspace could
> also handle the MSR using MSR filters (would need to confirm that).
> Then userspace could also be in control of the cpuid bit.
>
> Essentially, I think you could drop most of the host kernel work if
> there were generic support for hypercall exiting. Then userspace would
> be responsible for all of that. Thoughts on this?
>
> --Steve

This could even go a step further, and use an MSR write from within
the guest instead of a hypercall, which could be patched through to
userspace without host modification, if I understand the MSR filtering
correctly.
--Steve


Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-02-25 Thread Steve Rutherford
On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra  wrote:
>
> On Wed, Feb 24, 2021 at 10:22:33AM -0800, Sean Christopherson wrote:
> > On Wed, Feb 24, 2021, Ashish Kalra wrote:
> > > # Samples: 19K of event 'kvm:kvm_hypercall'
> > > # Event count (approx.): 19573
> > > #
> > > # Overhead  Command  Shared Object Symbol
> > > #   ...    .
> > > #
> > >100.00%  qemu-system-x86  [kernel.vmlinux]  [k] kvm_emulate_hypercall
> > >
> > > Out of these 19573 hypercalls, # of page encryption status hcalls are 
> > > 19479,
> > > so almost all hypercalls here are page encryption status hypercalls.
> >
> > Oof.
> >
> > > The above data indicates that there will be ~2% more Heavyweight VMEXITs
> > > during SEV guest boot if we do page encryption status hypercalls
> > > pass-through to host userspace.
> > >
> > > But, then Brijesh pointed out to me and highlighted that currently
> > > OVMF is doing lot of VMEXITs because they don't use the DMA pool to 
> > > minimize the C-bit toggles,
> > > in other words, OVMF bounce buffer does page state change on every DMA 
> > > allocate and free.
> > >
> > > So here is the performance analysis after kernel and initrd have been
> > > loaded into memory using grub and then starting perf just before booting 
> > > the kernel.
> > >
> > > These are the performance #'s after kernel and initrd have been loaded 
> > > into memory,
> > > then perf is attached and kernel is booted :
> > >
> > > # Samples: 1M of event 'kvm:kvm_userspace_exit'
> > > # Event count (approx.): 1081235
> > > #
> > > # Overhead  Trace output
> > > #   
> > > #
> > > 99.77%  reason KVM_EXIT_IO (2)
> > >  0.23%  reason KVM_EXIT_MMIO (6)
> > >
> > > # Samples: 1K of event 'kvm:kvm_hypercall'
> > > # Event count (approx.): 1279
> > > #
> > >
> > > So as the above data indicates, Linux is only making ~1K hypercalls,
> > > compared to ~18K hypercalls made by OVMF in the above use case.
> > >
> > > Does the above adds a prerequisite that OVMF needs to be optimized if
> > > and before hypercall pass-through can be done ?
> >
> > Disclaimer: my math could be totally wrong.
> >
> > I doubt it's a hard requirement.  Assuming a conversative roundtrip time of 
> > 50k
> > cycles, those 18K hypercalls will add well under a 1/2 a second of boot 
> > time.
> > If userspace can push the roundtrip time down to 10k cycles, the overhead is
> > more like 50 milliseconds.
> >
> > That being said, this does seem like a good OVMF cleanup, irrespective of 
> > this
> > new hypercall.  I assume it's not cheap to convert a page between encrypted 
> > and
> > decrypted.
> >
> > Thanks much for getting the numbers!
>
> Considering the above data and guest boot time latencies
> (and potential issues with OVMF and optimizations required there),
> do we have any consensus on whether we want to do page encryption
> status hypercall passthrough or not ?
>
> Thanks,
> Ashish

Thanks for grabbing the data!

I am fine with both paths. Sean has stated an explicit desire for
hypercall exiting, so I think that would be the current consensus.

If we want to do hypercall exiting, this should be in a follow-up
series where we implement something more generic, e.g. a hypercall
exiting bitmap or hypercall exit list. If we are taking the hypercall
exit route, we can drop the kvm side of the hypercall. Userspace could
also handle the MSR using MSR filters (would need to confirm that).
Then userspace could also be in control of the cpuid bit.

Essentially, I think you could drop most of the host kernel work if
there were generic support for hypercall exiting. Then userspace would
be responsible for all of that. Thoughts on this?

--Steve


Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-02-25 Thread Ashish Kalra
On Wed, Feb 24, 2021 at 10:22:33AM -0800, Sean Christopherson wrote:
> On Wed, Feb 24, 2021, Ashish Kalra wrote:
> > # Samples: 19K of event 'kvm:kvm_hypercall'
> > # Event count (approx.): 19573
> > #
> > # Overhead  Command  Shared Object Symbol
> > #   ...    .
> > #
> >100.00%  qemu-system-x86  [kernel.vmlinux]  [k] kvm_emulate_hypercall
> > 
> > Out of these 19573 hypercalls, # of page encryption status hcalls are 19479,
> > so almost all hypercalls here are page encryption status hypercalls.
> 
> Oof.
> 
> > The above data indicates that there will be ~2% more Heavyweight VMEXITs
> > during SEV guest boot if we do page encryption status hypercalls 
> > pass-through to host userspace.
> > 
> > But, then Brijesh pointed out to me and highlighted that currently
> > OVMF is doing lot of VMEXITs because they don't use the DMA pool to 
> > minimize the C-bit toggles,
> > in other words, OVMF bounce buffer does page state change on every DMA 
> > allocate and free.
> > 
> > So here is the performance analysis after kernel and initrd have been
> > loaded into memory using grub and then starting perf just before booting 
> > the kernel.
> > 
> > These are the performance #'s after kernel and initrd have been loaded into 
> > memory, 
> > then perf is attached and kernel is booted : 
> > 
> > # Samples: 1M of event 'kvm:kvm_userspace_exit'
> > # Event count (approx.): 1081235
> > #
> > # Overhead  Trace output
> > #   
> > #
> > 99.77%  reason KVM_EXIT_IO (2)
> >  0.23%  reason KVM_EXIT_MMIO (6)
> > 
> > # Samples: 1K of event 'kvm:kvm_hypercall'
> > # Event count (approx.): 1279
> > #
> > 
> > So as the above data indicates, Linux is only making ~1K hypercalls,
> > compared to ~18K hypercalls made by OVMF in the above use case.
> > 
> > Does the above adds a prerequisite that OVMF needs to be optimized if 
> > and before hypercall pass-through can be done ? 
> 
> Disclaimer: my math could be totally wrong.
> 
> I doubt it's a hard requirement.  Assuming a conversative roundtrip time of 
> 50k
> cycles, those 18K hypercalls will add well under a 1/2 a second of boot time.
> If userspace can push the roundtrip time down to 10k cycles, the overhead is
> more like 50 milliseconds.
> 
> That being said, this does seem like a good OVMF cleanup, irrespective of this
> new hypercall.  I assume it's not cheap to convert a page between encrypted 
> and
> decrypted.
> 
> Thanks much for getting the numbers!

Considering the above data and guest boot time latencies 
(and potential issues with OVMF and optimizations required there),
do we have any consensus on whether we want to do page encryption
status hypercall passthrough or not ?

Thanks,
Ashish


Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-02-24 Thread Sean Christopherson
On Wed, Feb 24, 2021, Ashish Kalra wrote:
> # Samples: 19K of event 'kvm:kvm_hypercall'
> # Event count (approx.): 19573
> #
> # Overhead  Command  Shared Object Symbol
> #   ...    .
> #
>100.00%  qemu-system-x86  [kernel.vmlinux]  [k] kvm_emulate_hypercall
> 
> Out of these 19573 hypercalls, # of page encryption status hcalls are 19479,
> so almost all hypercalls here are page encryption status hypercalls.

Oof.

> The above data indicates that there will be ~2% more Heavyweight VMEXITs
> during SEV guest boot if we do page encryption status hypercalls 
> pass-through to host userspace.
> 
> But, then Brijesh pointed out to me and highlighted that currently
> OVMF is doing lot of VMEXITs because they don't use the DMA pool to minimize 
> the C-bit toggles,
> in other words, OVMF bounce buffer does page state change on every DMA 
> allocate and free.
> 
> So here is the performance analysis after kernel and initrd have been
> loaded into memory using grub and then starting perf just before booting the 
> kernel.
> 
> These are the performance #'s after kernel and initrd have been loaded into 
> memory, 
> then perf is attached and kernel is booted : 
> 
> # Samples: 1M of event 'kvm:kvm_userspace_exit'
> # Event count (approx.): 1081235
> #
> # Overhead  Trace output
> #   
> #
> 99.77%  reason KVM_EXIT_IO (2)
>  0.23%  reason KVM_EXIT_MMIO (6)
> 
> # Samples: 1K of event 'kvm:kvm_hypercall'
> # Event count (approx.): 1279
> #
> 
> So as the above data indicates, Linux is only making ~1K hypercalls,
> compared to ~18K hypercalls made by OVMF in the above use case.
> 
> Does the above adds a prerequisite that OVMF needs to be optimized if 
> and before hypercall pass-through can be done ? 

Disclaimer: my math could be totally wrong.

I doubt it's a hard requirement.  Assuming a conversative roundtrip time of 50k
cycles, those 18K hypercalls will add well under a 1/2 a second of boot time.
If userspace can push the roundtrip time down to 10k cycles, the overhead is
more like 50 milliseconds.

That being said, this does seem like a good OVMF cleanup, irrespective of this
new hypercall.  I assume it's not cheap to convert a page between encrypted and
decrypted.

Thanks much for getting the numbers!


Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-02-24 Thread Ashish Kalra
On Thu, Feb 18, 2021 at 12:32:47PM -0600, Kalra, Ashish wrote:
> [AMD Public Use]
> 
> 
> -Original Message-
> From: Sean Christopherson  
> Sent: Tuesday, February 16, 2021 7:03 PM
> To: Kalra, Ashish 
> Cc: pbonz...@redhat.com; t...@linutronix.de; mi...@redhat.com; 
> h...@zytor.com; rkrc...@redhat.com; j...@8bytes.org; b...@suse.de; Lendacky, 
> Thomas ; x...@kernel.org; k...@vger.kernel.org; 
> linux-kernel@vger.kernel.org; srutherf...@google.com; 
> venu.busire...@oracle.com; Singh, Brijesh 
> Subject: Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST 
> ioctl
> 
> On Thu, Feb 04, 2021, Ashish Kalra wrote:
> > From: Brijesh Singh 
> > 
> > The ioctl is used to retrieve a guest's shared pages list.
> 
> >What's the performance hit to boot time if KVM_HC_PAGE_ENC_STATUS is passed 
> >through to userspace?  That way, userspace could manage the set of pages >in 
> >whatever data structure they want, and these get/set ioctls go away.
> 
> I will be more concerned about performance hit during guest DMA I/O if the 
> page encryption status hypercalls are passed through to user-space, 
> a lot of guest DMA I/O dynamically sets up pages for encryption and then 
> flips them at DMA completion, so guest I/O will surely take a performance 
> hit with this pass-through stuff.
> 

Here are some rough performance numbers comparing # of heavy-weight VMEXITs 
compared to # of hypercalls, 
during a SEV guest boot: (launch of a ubuntu 18.04 guest)

# ./perf record -e kvm:kvm_userspace_exit -e kvm:kvm_hypercall -a 
./qemu-system-x86_64 -enable-kvm -cpu host -machine q35 -smp 16,maxcpus=64 -m 
512M -drive 
if=pflash,format=raw,unit=0,file=/home/ashish/sev-migration/qemu-5.1.50/OVMF_CODE.fd,readonly
 -drive if=pflash,format=raw,unit=1,file=OVMF_VARS.fd -drive 
file=../ubuntu-18.04.qcow2,if=none,id=disk0,format=qcow2 -device 
virtio-scsi-pci,id=scsi,disable-legacy=on,iommu_platform=true -device 
scsi-hd,drive=disk0 -object 
sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=1,policy=0x0 -machine 
memory-encryption=sev0 -trace events=/tmp/events -nographic -monitor pty 
-monitor unix:monitor-source,server,nowait -qmp 
unix:/tmp/qmp-sock,server,nowait -device 
virtio-rng-pci,disable-legacy=on,iommu_platform=true

...
...

root@diesel2540:/home/ashish/sev-migration/qemu-5.1.50# ./perf report
# To display the perf.data header info, please use --header/--header-only 
options.
#
#
# Total Lost Samples: 0
#
# Samples: 981K of event 'kvm:kvm_userspace_exit'
# Event count (approx.): 981021
#
# Overhead  Command  Shared Object Symbol
#   ...    ..
#
   100.00%  qemu-system-x86  [kernel.vmlinux]  [k] kvm_vcpu_ioctl


# Samples: 19K of event 'kvm:kvm_hypercall'
# Event count (approx.): 19573
#
# Overhead  Command  Shared Object Symbol
#   ...    .
#
   100.00%  qemu-system-x86  [kernel.vmlinux]  [k] kvm_emulate_hypercall

Out of these 19573 hypercalls, # of page encryption status hcalls are 19479,
so almost all hypercalls here are page encryption status hypercalls.

The above data indicates that there will be ~2% more Heavyweight VMEXITs
during SEV guest boot if we do page encryption status hypercalls 
pass-through to host userspace.

But, then Brijesh pointed out to me and highlighted that currently
OVMF is doing lot of VMEXITs because they don't use the DMA pool to minimize 
the C-bit toggles,
in other words, OVMF bounce buffer does page state change on every DMA allocate 
and free.

So here is the performance analysis after kernel and initrd have been
loaded into memory using grub and then starting perf just before booting the 
kernel.

These are the performance #'s after kernel and initrd have been loaded into 
memory, 
then perf is attached and kernel is booted : 

# Samples: 1M of event 'kvm:kvm_userspace_exit'
# Event count (approx.): 1081235
#
# Overhead  Trace output
#   
#
99.77%  reason KVM_EXIT_IO (2)
 0.23%  reason KVM_EXIT_MMIO (6)

# Samples: 1K of event 'kvm:kvm_hypercall'
# Event count (approx.): 1279
#

So as the above data indicates, Linux is only making ~1K hypercalls,
compared to ~18K hypercalls made by OVMF in the above use case.

Does the above adds a prerequisite that OVMF needs to be optimized if 
and before hypercall pass-through can be done ? 

Thanks,
Ashish


RE: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-02-18 Thread Kalra, Ashish
[AMD Public Use]


-Original Message-
From: Sean Christopherson  
Sent: Tuesday, February 16, 2021 7:03 PM
To: Kalra, Ashish 
Cc: pbonz...@redhat.com; t...@linutronix.de; mi...@redhat.com; h...@zytor.com; 
rkrc...@redhat.com; j...@8bytes.org; b...@suse.de; Lendacky, Thomas 
; x...@kernel.org; k...@vger.kernel.org; 
linux-kernel@vger.kernel.org; srutherf...@google.com; 
venu.busire...@oracle.com; Singh, Brijesh 
Subject: Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST 
ioctl

On Thu, Feb 04, 2021, Ashish Kalra wrote:
> From: Brijesh Singh 
> 
> The ioctl is used to retrieve a guest's shared pages list.

>What's the performance hit to boot time if KVM_HC_PAGE_ENC_STATUS is passed 
>through to userspace?  That way, userspace could manage the set of pages >in 
>whatever data structure they want, and these get/set ioctls go away.

I will be more concerned about performance hit during guest DMA I/O if the page 
encryption status hypercalls are passed through to user-space, 
a lot of guest DMA I/O dynamically sets up pages for encryption and then flips 
them at DMA completion, so guest I/O will surely take a performance 
hit with this pass-through stuff.

Thanks,
Ashish


Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-02-18 Thread Sean Christopherson
On Thu, Feb 18, 2021, Kalra, Ashish wrote:
> From: Sean Christopherson  
> 
> On Thu, Feb 18, 2021, Kalra, Ashish wrote:
> > From: Sean Christopherson 
> > 
> > On Wed, Feb 17, 2021, Kalra, Ashish wrote:
> > >> From: Sean Christopherson  On Thu, Feb 04, 2021, 
> > >> Ashish Kalra wrote:
> > >> > From: Brijesh Singh 
> > >> > 
> > >> > The ioctl is used to retrieve a guest's shared pages list.
> > >> 
> > >> >What's the performance hit to boot time if KVM_HC_PAGE_ENC_STATUS 
> > >> >is passed through to userspace?  That way, userspace could manage 
> > >> >the set of pages >in whatever data structure they want, and these 
> > >> >get/set ioctls go away.
> > >> 
> > >> What is the advantage of passing KVM_HC_PAGE_ENC_STATUS through to 
> > >> user-space ?
> > >> 
> > >> As such it is just a simple interface to get the shared page list 
> > >> via the get/set ioctl's. simply an array is passed to these ioctl 
> > >> to get/set the shared pages list.
> >> 
> >> > It eliminates any probability of the kernel choosing the wrong data 
> >> > structure, and it's two fewer ioctls to maintain and test.
> >> 
> >> The set shared pages list ioctl cannot be avoided as it needs to be 
> >> issued to setup the shared pages list on the migrated VM, it cannot be 
> >> achieved by passing KVM_HC_PAGE_ENC_STATUS through to user-space.
> 
> >Why's that?  AIUI, KVM doesn't do anything with the list other than pass it
> >back to userspace.  Assuming that's the case, userspace can just hold onto
> >the list >for the next migration.
> 
> KVM does use it as part of the SEV DBG_DECTYPT API, within sev_dbg_decrypt()
> to check if the guest page(s) are encrypted or not, and accordingly use it to
> decide whether to decrypt the guest page(s) and return that back to
> user-space or just return it as it is.

Why is handling shared memory KVM's responsibility?  Userspace shouldn't be
asking KVM to decrypt memory it knows isn't encrypted.  My understanding is that
bogus decryption won't harm the kernel, it will only corrupt the guest.  In
other words, patch 16 can be dropped if managing the set of shared pages is
punted to userspace.


RE: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-02-18 Thread Kalra, Ashish
[AMD Public Use]


-Original Message-
From: Sean Christopherson  
Sent: Thursday, February 18, 2021 10:39 AM
To: Kalra, Ashish 
Cc: pbonz...@redhat.com; t...@linutronix.de; mi...@redhat.com; h...@zytor.com; 
rkrc...@redhat.com; j...@8bytes.org; b...@suse.de; Lendacky, Thomas 
; x...@kernel.org; k...@vger.kernel.org; 
linux-kernel@vger.kernel.org; srutherf...@google.com; 
venu.busire...@oracle.com; Singh, Brijesh 
Subject: Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST 
ioctl

On Thu, Feb 18, 2021, Kalra, Ashish wrote:
> From: Sean Christopherson 
> 
> On Wed, Feb 17, 2021, Kalra, Ashish wrote:
> >> From: Sean Christopherson  On Thu, Feb 04, 2021, 
> >> Ashish Kalra wrote:
> >> > From: Brijesh Singh 
> >> > 
> >> > The ioctl is used to retrieve a guest's shared pages list.
> >> 
> >> >What's the performance hit to boot time if KVM_HC_PAGE_ENC_STATUS 
> >> >is passed through to userspace?  That way, userspace could manage 
> >> >the set of pages >in whatever data structure they want, and these get/set 
> >> >ioctls go away.
> >> 
> >> What is the advantage of passing KVM_HC_PAGE_ENC_STATUS through to 
> >> user-space ?
> >> 
> >> As such it is just a simple interface to get the shared page list 
> >> via the get/set ioctl's. simply an array is passed to these ioctl 
> >> to get/set the shared pages list.
>> 
>> > It eliminates any probability of the kernel choosing the wrong data 
>> > structure, and it's two fewer ioctls to maintain and test.
>> 
>> The set shared pages list ioctl cannot be avoided as it needs to be 
>> issued to setup the shared pages list on the migrated VM, it cannot be 
>> achieved by passing KVM_HC_PAGE_ENC_STATUS through to user-space.

>Why's that?  AIUI, KVM doesn't do anything with the list other than pass it 
>back to userspace.  Assuming that's the case, userspace can just hold onto the 
>list >for the next migration.

KVM does use it as part of the SEV DBG_DECTYPT API, within sev_dbg_decrypt() to 
check if the guest page(s) are encrypted or not,
and accordingly use it to decide whether to decrypt the guest page(s) and 
return that back to user-space or just return it as it is.

Thanks,
Ashish


Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-02-18 Thread Sean Christopherson
On Thu, Feb 18, 2021, Kalra, Ashish wrote:
> From: Sean Christopherson  
> 
> On Wed, Feb 17, 2021, Kalra, Ashish wrote:
> >> From: Sean Christopherson  On Thu, Feb 04, 2021, 
> >> Ashish Kalra wrote:
> >> > From: Brijesh Singh 
> >> > 
> >> > The ioctl is used to retrieve a guest's shared pages list.
> >> 
> >> >What's the performance hit to boot time if KVM_HC_PAGE_ENC_STATUS is 
> >> >passed through to userspace?  That way, userspace could manage the 
> >> >set of pages >in whatever data structure they want, and these get/set 
> >> >ioctls go away.
> >> 
> >> What is the advantage of passing KVM_HC_PAGE_ENC_STATUS through to 
> >> user-space ?
> >> 
> >> As such it is just a simple interface to get the shared page list via 
> >> the get/set ioctl's. simply an array is passed to these ioctl to 
> >> get/set the shared pages list.
> 
> > It eliminates any probability of the kernel choosing the wrong data
> > structure, and it's two fewer ioctls to maintain and test.
> 
> The set shared pages list ioctl cannot be avoided as it needs to be issued to
> setup the shared pages list on the migrated VM, it cannot be achieved by
> passing KVM_HC_PAGE_ENC_STATUS through to user-space.

Why's that?  AIUI, KVM doesn't do anything with the list other than pass it back
to userspace.  Assuming that's the case, userspace can just hold onto the list
for the next migration.

> So it makes sense to add both get/set shared pages list ioctl, passing
> through to user-space is just adding more complexity without any significant
> gains.

No, it reduces complexity for KVM by avoiding locking and memslot dependencies.


RE: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-02-17 Thread Kalra, Ashish
[AMD Public Use]

-Original Message-
From: Sean Christopherson  
Sent: Wednesday, February 17, 2021 10:13 AM
To: Kalra, Ashish 
Cc: pbonz...@redhat.com; t...@linutronix.de; mi...@redhat.com; h...@zytor.com; 
rkrc...@redhat.com; j...@8bytes.org; b...@suse.de; Lendacky, Thomas 
; x...@kernel.org; k...@vger.kernel.org; 
linux-kernel@vger.kernel.org; srutherf...@google.com; 
venu.busire...@oracle.com; Singh, Brijesh 
Subject: Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST 
ioctl

On Wed, Feb 17, 2021, Kalra, Ashish wrote:
>> From: Sean Christopherson  On Thu, Feb 04, 2021, 
>> Ashish Kalra wrote:
>> > From: Brijesh Singh 
>> > 
>> > The ioctl is used to retrieve a guest's shared pages list.
>> 
>> >What's the performance hit to boot time if KVM_HC_PAGE_ENC_STATUS is 
>> >passed through to userspace?  That way, userspace could manage the 
>> >set of pages >in whatever data structure they want, and these get/set 
>> >ioctls go away.
>> 
>> What is the advantage of passing KVM_HC_PAGE_ENC_STATUS through to 
>> user-space ?
>> 
>> As such it is just a simple interface to get the shared page list via 
>> the get/set ioctl's. simply an array is passed to these ioctl to 
>> get/set the shared pages list.

> It eliminates any probability of the kernel choosing the wrong data 
> structure, and it's two fewer ioctls to maintain and test.

The set shared pages list ioctl cannot be avoided as it needs to be issued to 
setup the shared pages list on the migrated
VM, it cannot be achieved by passing KVM_HC_PAGE_ENC_STATUS through to 
user-space.

So it makes sense to add both get/set shared pages list ioctl, passing through 
to user-space is just adding more complexity
without any significant gains.

> >Also, aren't there plans for an in-guest migration helper?  If so, do 
> >we have any idea what that interface will look like?  E.g. if we're 
> >going to end up with a full >fledged driver in the guest, why not 
> >bite the bullet now and bypass KVM entirely?
> 
> Even the in-guest migration helper will be using page encryption 
> status hypercalls, so some interface is surely required.

>If it's a driver with a more extensive interace, then the hypercalls can be 
>replaced by a driver operation.  That's obviously a big if, though.

> Also the in-guest migration will be mainly an OVMF component, won't  
> really be a full fledged kernel driver in the guest.

>Is there code and/or a description of what the proposed helper would look like?

Not right now, there are prototype(s) under development, I assume they will be 
posted upstream soon.

Thanks,
Ashish


Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-02-17 Thread Sean Christopherson
On Wed, Feb 17, 2021, Kalra, Ashish wrote:
> From: Sean Christopherson  
> On Thu, Feb 04, 2021, Ashish Kalra wrote:
> > From: Brijesh Singh 
> > 
> > The ioctl is used to retrieve a guest's shared pages list.
> 
> >What's the performance hit to boot time if KVM_HC_PAGE_ENC_STATUS is passed
> >through to userspace?  That way, userspace could manage the set of pages >in
> >whatever data structure they want, and these get/set ioctls go away.
> 
> What is the advantage of passing KVM_HC_PAGE_ENC_STATUS through to user-space
> ?
> 
> As such it is just a simple interface to get the shared page list via the
> get/set ioctl's. simply an array is passed to these ioctl to get/set the
> shared pages list.

It eliminates any probability of the kernel choosing the wrong data structure,
and it's two fewer ioctls to maintain and test.

> >Also, aren't there plans for an in-guest migration helper?  If so, do we
> >have any idea what that interface will look like?  E.g. if we're going to
> >end up with a full >fledged driver in the guest, why not bite the bullet now
> >and bypass KVM entirely?
> 
> Even the in-guest migration helper will be using page encryption status
> hypercalls, so some interface is surely required.

If it's a driver with a more extensive interace, then the hypercalls can be
replaced by a driver operation.  That's obviously a big if, though.

> Also the in-guest migration will be mainly an OVMF component, won't  really
> be a full fledged kernel driver in the guest.

Is there code and/or a description of what the proposed helper would look like?


RE: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-02-17 Thread Kalra, Ashish
[AMD Public Use]

-Original Message-
From: Sean Christopherson  
Sent: Tuesday, February 16, 2021 7:03 PM
To: Kalra, Ashish 
Cc: pbonz...@redhat.com; t...@linutronix.de; mi...@redhat.com; h...@zytor.com; 
rkrc...@redhat.com; j...@8bytes.org; b...@suse.de; Lendacky, Thomas 
; x...@kernel.org; k...@vger.kernel.org; 
linux-kernel@vger.kernel.org; srutherf...@google.com; 
venu.busire...@oracle.com; Singh, Brijesh 
Subject: Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST 
ioctl

On Thu, Feb 04, 2021, Ashish Kalra wrote:
> From: Brijesh Singh 
> 
> The ioctl is used to retrieve a guest's shared pages list.

>What's the performance hit to boot time if KVM_HC_PAGE_ENC_STATUS is passed 
>through to userspace?  That way, userspace could manage the set of pages >in 
>whatever data structure they want, and these get/set ioctls go away.

What is the advantage of passing KVM_HC_PAGE_ENC_STATUS through to user-space ?

As such it is just a simple interface to get the shared page list via the 
get/set ioctl's. simply an array is passed to these ioctl to get/set the shared 
pages
list.

>Also, aren't there plans for an in-guest migration helper?  If so, do we have 
>any idea what that interface will look like?  E.g. if we're going to end up 
>with a full >fledged driver in the guest, why not bite the bullet now and 
>bypass KVM entirely?

Even the in-guest migration helper will be using page encryption status 
hypercalls, so some interface is surely required.

Also the in-guest migration will be mainly an OVMF component, won't  really be 
a full fledged kernel driver in the guest.

Thanks,
Ashish


Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-02-16 Thread Sean Christopherson
On Thu, Feb 04, 2021, Ashish Kalra wrote:
> From: Brijesh Singh 
> 
> The ioctl is used to retrieve a guest's shared pages list.
> 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: Paolo Bonzini 
> Cc: "Radim Krčmář" 

AFAIK, Radim is no longer involved with KVM, and his RedHat email is bouncing.
Probably best to drop his Cc in future revisions.


Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-02-16 Thread Sean Christopherson
On Thu, Feb 04, 2021, Ashish Kalra wrote:
> From: Brijesh Singh 
> 
> The ioctl is used to retrieve a guest's shared pages list.

What's the performance hit to boot time if KVM_HC_PAGE_ENC_STATUS is passed
through to userspace?  That way, userspace could manage the set of pages in
whatever data structure they want, and these get/set ioctls go away.

Also, aren't there plans for an in-guest migration helper?  If so, do we have
any idea what that interface will look like?  E.g. if we're going to end up with
a full fledged driver in the guest, why not bite the bullet now and bypass KVM
entirely?


Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-02-04 Thread Ashish Kalra
Hello Tom,

On Thu, Feb 04, 2021 at 10:14:37AM -0600, Tom Lendacky wrote:
> On 2/3/21 6:39 PM, Ashish Kalra wrote:
> > From: Brijesh Singh 
> > 
> > The ioctl is used to retrieve a guest's shared pages list.
> > 
> 
> ...
> 
> > +int svm_get_shared_pages_list(struct kvm *kvm,
> > + struct kvm_shared_pages_list *list)
> > +{
> > +   struct kvm_sev_info *sev = _kvm_svm(kvm)->sev_info;
> > +   struct shared_region_array_entry *array;
> > +   struct shared_region *pos;
> > +   int ret, nents = 0;
> > +   unsigned long sz;
> > +
> > +   if (!sev_guest(kvm))
> > +   return -ENOTTY;
> > +
> > +   if (!list->size)
> > +   return -EINVAL;
> > +
> > +   if (!sev->shared_pages_list_count)
> > +   return put_user(0, list->pnents);
> > +
> > +   sz = sev->shared_pages_list_count * sizeof(struct 
> > shared_region_array_entry);
> > +   if (sz > list->size)
> > +   return -E2BIG;
> > +
> > +   array = kmalloc(sz, GFP_KERNEL);
> > +   if (!array)
> > +   return -ENOMEM;
> > +
> > +   mutex_lock(>lock);
> 
> I think this lock needs to be taken before the memory size is calculated. If
> the list is expanded after obtaining the size and before taking the lock,
> you will run off the end of the array.
> 

Yes, as the page encryption status hcalls can happen simultaneously to
this ioctl, therefore this is an issue, so i will take the lock before
the memory size is calculated.

Thanks,
Ashish

> 
> > +   list_for_each_entry(pos, >shared_pages_list, list) {
> > +   array[nents].gfn_start = pos->gfn_start;
> > +   array[nents++].gfn_end = pos->gfn_end;
> > +   }
> > +   mutex_unlock(>lock);
> > +
> > +   ret = -EFAULT;
> > +   if (copy_to_user(list->buffer, array, sz))
> > +   goto out;
> > +   if (put_user(nents, list->pnents))
> > +   goto out;
> > +   ret = 0;
> > +out:
> > +   kfree(array);
> > +   return ret;
> > +}
> > +


Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-02-04 Thread Tom Lendacky

On 2/3/21 6:39 PM, Ashish Kalra wrote:

From: Brijesh Singh 

The ioctl is used to retrieve a guest's shared pages list.



...

  
+int svm_get_shared_pages_list(struct kvm *kvm,

+ struct kvm_shared_pages_list *list)
+{
+   struct kvm_sev_info *sev = _kvm_svm(kvm)->sev_info;
+   struct shared_region_array_entry *array;
+   struct shared_region *pos;
+   int ret, nents = 0;
+   unsigned long sz;
+
+   if (!sev_guest(kvm))
+   return -ENOTTY;
+
+   if (!list->size)
+   return -EINVAL;
+
+   if (!sev->shared_pages_list_count)
+   return put_user(0, list->pnents);
+
+   sz = sev->shared_pages_list_count * sizeof(struct 
shared_region_array_entry);
+   if (sz > list->size)
+   return -E2BIG;
+
+   array = kmalloc(sz, GFP_KERNEL);
+   if (!array)
+   return -ENOMEM;
+
+   mutex_lock(>lock);


I think this lock needs to be taken before the memory size is calculated. 
If the list is expanded after obtaining the size and before taking the 
lock, you will run off the end of the array.


Thanks,
Tom


+   list_for_each_entry(pos, >shared_pages_list, list) {
+   array[nents].gfn_start = pos->gfn_start;
+   array[nents++].gfn_end = pos->gfn_end;
+   }
+   mutex_unlock(>lock);
+
+   ret = -EFAULT;
+   if (copy_to_user(list->buffer, array, sz))
+   goto out;
+   if (put_user(nents, list->pnents))
+   goto out;
+   ret = 0;
+out:
+   kfree(array);
+   return ret;
+}
+


[PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-02-03 Thread Ashish Kalra
From: Brijesh Singh 

The ioctl is used to retrieve a guest's shared pages list.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Paolo Bonzini 
Cc: "Radim Krčmář" 
Cc: Joerg Roedel 
Cc: Borislav Petkov 
Cc: Tom Lendacky 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Brijesh Singh 
Co-developed-by: Ashish Kalra 
Signed-off-by: Ashish Kalra 
---
 Documentation/virt/kvm/api.rst  | 24 
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/svm/sev.c  | 49 +
 arch/x86/kvm/svm/svm.c  |  1 +
 arch/x86/kvm/svm/svm.h  |  1 +
 arch/x86/kvm/x86.c  | 12 
 include/uapi/linux/kvm.h|  9 ++
 7 files changed, 98 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 99ceb978c8b0..59ef537c0cdd 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -4677,6 +4677,30 @@ This ioctl resets VCPU registers and control structures 
according to
 the clear cpu reset definition in the POP. However, the cpu is not put
 into ESA mode. This reset is a superset of the initial reset.
 
+4.125 KVM_GET_SHARED_PAGES_LIST (vm ioctl)
+---
+
+:Capability: basic
+:Architectures: x86
+:Type: vm ioctl
+:Parameters: struct kvm_shared_pages_list (in/out)
+:Returns: 0 on success, -1 on error
+
+/* for KVM_GET_SHARED_PAGES_LIST */
+struct kvm_shared_pages_list {
+   int __user *pnents;
+   void __user *buffer;
+   __u32 size;
+};
+
+The encrypted VMs have the concept of private and shared pages. The private
+pages are encrypted with the guest-specific key, while the shared pages may
+be encrypted with the hypervisor key. The KVM_GET_SHARED_PAGES_LIST can
+be used to get guest's shared/unencrypted memory regions list.
+This list can be used during the guest migration. If the page
+is private then the userspace need to use SEV migration commands to transmit
+the page.
+
 
 4.125 KVM_S390_PV_COMMAND
 -
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2da5f5e2a10e..cd354d830e13 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1303,6 +1303,8 @@ struct kvm_x86_ops {
void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
  unsigned long sz, unsigned long mode);
+   int (*get_shared_pages_list)(struct kvm *kvm,
+struct kvm_shared_pages_list *list);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 55c628df5155..701d74c8b15b 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -50,6 +50,11 @@ struct shared_region {
unsigned long gfn_start, gfn_end;
 };
 
+struct shared_region_array_entry {
+   unsigned long gfn_start;
+   unsigned long gfn_end;
+};
+
 static int sev_flush_asids(void)
 {
int ret, error = 0;
@@ -1622,6 +1627,50 @@ int svm_page_enc_status_hc(struct kvm *kvm, unsigned 
long gpa,
return ret;
 }
 
+int svm_get_shared_pages_list(struct kvm *kvm,
+ struct kvm_shared_pages_list *list)
+{
+   struct kvm_sev_info *sev = _kvm_svm(kvm)->sev_info;
+   struct shared_region_array_entry *array;
+   struct shared_region *pos;
+   int ret, nents = 0;
+   unsigned long sz;
+
+   if (!sev_guest(kvm))
+   return -ENOTTY;
+
+   if (!list->size)
+   return -EINVAL;
+
+   if (!sev->shared_pages_list_count)
+   return put_user(0, list->pnents);
+
+   sz = sev->shared_pages_list_count * sizeof(struct 
shared_region_array_entry);
+   if (sz > list->size)
+   return -E2BIG;
+
+   array = kmalloc(sz, GFP_KERNEL);
+   if (!array)
+   return -ENOMEM;
+
+   mutex_lock(>lock);
+   list_for_each_entry(pos, >shared_pages_list, list) {
+   array[nents].gfn_start = pos->gfn_start;
+   array[nents++].gfn_end = pos->gfn_end;
+   }
+   mutex_unlock(>lock);
+
+   ret = -EFAULT;
+   if (copy_to_user(list->buffer, array, sz))
+   goto out;
+   if (put_user(nents, list->pnents))
+   goto out;
+   ret = 0;
+out:
+   kfree(array);
+   return ret;
+}
+
 int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
struct kvm_sev_cmd sev_cmd;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index bb249ec625fc..533ce47ff158 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4538,6 +4538,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
 
.page_enc_status_hc = svm_page_enc_status_hc,
+   .get_shared_pages_list =