Re: [PATCH v13 00/12] Add AMD SEV guest live migration support

2021-04-16 Thread Steve Rutherford
On Thu, Apr 15, 2021 at 8:52 AM Ashish Kalra  wrote:
>
> From: Ashish Kalra 
>
> The series adds support for the AMD SEV guest live migration commands. To
> protect the confidentiality of SEV protected guest memory while in transit,
> we need to use the SEV commands defined in the SEV API spec [1].
>
> SEV guest VMs have the concept of private and shared memory. Private memory
> is encrypted with the guest-specific key, while shared memory may be
> encrypted with the hypervisor key. The commands provided by the SEV FW are
> meant to be used for the private memory only. The patch series introduces a
> new hypercall that the guest OS can use to notify the hypervisor of a page's
> encryption status. If the page is encrypted with the guest-specific key, we
> use the SEV commands during migration; if the page is not encrypted, we fall
> back to the default migration path.
>
> The patch uses the KVM_EXIT_HYPERCALL exit code and the hypercall to
> userspace exit functionality as a common interface from the guest back to
> the VMM, passing the guest's shared/unencrypted page information to the
> userspace VMM/Qemu. Qemu can consult this information during migration to
> know whether a page is encrypted.
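(As a rough sketch of the userspace side described above: a VMM could
consume this exit as below. The args[] layout follows the series'
hypercall (gpa, npages, enc); track_shared_region() is a hypothetical
helper, not something the series defines.)

#include <linux/kvm.h>

#define KVM_HC_PAGE_ENC_STATUS	12

static void track_shared_region(__u64 gpa, __u64 npages, int shared)
{
	/* Record the range in a VMM-side data structure so migration
	 * knows which pages may be sent in the clear. */
}

static void handle_page_enc_exit(struct kvm_run *run)
{
	if (run->exit_reason == KVM_EXIT_HYPERCALL &&
	    run->hypercall.nr == KVM_HC_PAGE_ENC_STATUS) {
		__u64 gpa    = run->hypercall.args[0];
		__u64 npages = run->hypercall.args[1];
		__u64 enc    = run->hypercall.args[2];

		track_shared_region(gpa, npages, !enc);
		/* ret is delivered to the guest when the next KVM_RUN
		 * re-enters and completes the hypercall. */
		run->hypercall.ret = 0;
	}
}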
>
> This section describes how the SEV live migration feature is negotiated
> between the host and guest. The host indicates support for this feature via
> KVM_FEATURE_CPUID. The guest firmware (OVMF) detects this feature and sets
> a UEFI environment variable indicating OVMF support for live migration. The
> guest kernel also detects host support for this feature via cpuid; in case
> of an EFI boot it verifies that OVMF supports the feature by reading the
> UEFI environment variable, and if the variable is set it enables live
> migration on the host by writing to a custom MSR. If not booted under EFI,
> the kernel simply enables the feature by writing to the custom MSR
> directly. The MSR is also handled by the userspace VMM/Qemu.
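(Condensed, the guest-side negotiation above is roughly the sketch
below, using the series' names; ovmf_live_migration_var_set() is a
hypothetical stand-in for the efi.get_variable() lookup done in the
actual patch.)

	/* Enable the feature only when the host advertises it and,
	 * under EFI boot, only when OVMF has opted in via the UEFI
	 * variable. */
	if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION)) {
		if (!efi_enabled(EFI_BOOT) || ovmf_live_migration_var_set())
			wrmsrl(MSR_KVM_SEV_LIVE_MIGRATION,
			       KVM_SEV_LIVE_MIGRATION_ENABLED);
	}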
>
> A branch containing these patches is available here:
> https://github.com/AMDESE/linux/tree/sev-migration-v13
>
> [1] https://developer.amd.com/wp-content/resources/55766.PDF
>
> Changes since v12:
> - Reset page encryption status during early boot instead of just
>   before the kexec to avoid SMP races during kvm_pv_guest_cpu_reboot().

Does this series need to disable the MSR during kvm_pv_guest_cpu_reboot()?

I _think_ going into blackout during the window after restart, but
before the MSR is explicitly re-enabled, would cause corruption: the
historical shared pages could be re-allocated as non-shared pages
during restart.
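If it does, the disable itself would be small; something like the
following in the reboot path (a sketch, not code from the series):

	/* Hypothetical addition to kvm_pv_guest_cpu_reboot(): clear the
	 * MSR so the host stops trusting the soon-to-be-stale shared
	 * pages list until the next kernel re-enables the feature. */
	if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION))
		wrmsrl(MSR_KVM_SEV_LIVE_MIGRATION, 0);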

Steve


Re: [PATCH v12 13/13] x86/kvm: Add kexec support for SEV Live Migration.

2021-04-13 Thread Steve Rutherford
On Tue, Apr 13, 2021 at 4:47 AM Ashish Kalra  wrote:
>
> On Mon, Apr 12, 2021 at 07:25:03PM -0700, Steve Rutherford wrote:
> > On Mon, Apr 12, 2021 at 6:48 PM Ashish Kalra  wrote:
> > >
> > > On Mon, Apr 12, 2021 at 06:23:32PM -0700, Steve Rutherford wrote:
> > > > On Mon, Apr 12, 2021 at 5:22 PM Steve Rutherford 
> > > >  wrote:
> > > > >
> > > > > On Mon, Apr 12, 2021 at 12:48 PM Ashish Kalra  
> > > > > wrote:
> > > > > >
> > > > > > From: Ashish Kalra 
> > > > > >
> > > > > > Reset the host's shared pages list related to kernel
> > > > > > specific page encryption status settings before we load a
> > > > > > new kernel by kexec. We cannot reset the complete
> > > > > > shared pages list here as we need to retain the
> > > > > > UEFI/OVMF firmware specific settings.
> > > > > >
> > > > > > The host's shared pages list is maintained for the
> > > > > > guest to keep track of all unencrypted guest memory regions,
> > > > > > therefore we need to explicitly mark all shared pages as
> > > > > > encrypted again before rebooting into the new guest kernel.
> > > > > >
> > > > > > Signed-off-by: Ashish Kalra 
> > > > > > ---
> > > > > >  arch/x86/kernel/kvm.c | 24 
> > > > > >  1 file changed, 24 insertions(+)
> > > > > >
> > > > > > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > > > > > index bcc82e0c9779..4ad3ed547ff1 100644
> > > > > > --- a/arch/x86/kernel/kvm.c
> > > > > > +++ b/arch/x86/kernel/kvm.c
> > > > > > @@ -39,6 +39,7 @@
> > > > > >  #include 
> > > > > >  #include 
> > > > > >  #include 
> > > > > > +#include 
> > > > > >
> > > > > >  DEFINE_STATIC_KEY_FALSE(kvm_async_pf_enabled);
> > > > > >
> > > > > > @@ -384,6 +385,29 @@ static void kvm_pv_guest_cpu_reboot(void 
> > > > > > *unused)
> > > > > >  */
> > > > > > if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
> > > > > > wrmsrl(MSR_KVM_PV_EOI_EN, 0);
> > > > > > +   /*
> > > > > > +* Reset the host's shared pages list related to kernel
> > > > > > +* specific page encryption status settings before we load a
> > > > > > +* new kernel by kexec. NOTE: We cannot reset the complete
> > > > > > +* shared pages list here as we need to retain the
> > > > > > +* UEFI/OVMF firmware specific settings.
> > > > > > +*/
> > > > > > +   if (sev_live_migration_enabled & (smp_processor_id() == 0)) {
> > > > > What happens if the reboot of CPU0 races with another CPU servicing a
> > > > > device request (while the reboot is pending for that CPU)?
> > > > > Seems like you could run into a scenario where you have hypercalls 
> > > > > racing.
> > > > >
> > > > > Calling this on every core isn't free, but it is an easy way to avoid 
> > > > > this race.
> > > > > You could also count cores, and have only last core do the job, but
> > > > > that seems more complicated.
> > > > On second thought, I think this may be insufficient as a fix, since my
> > > > read of kernel/reboot.c seems to imply that devices aren't shut down
> > > > until after these notifiers occur. As such, a single thread might be
> > > > able to race with itself. I could be wrong here though.
> > > >
> > > > The heavy hammer would be to disable migration through the MSR (which
> > > > the subsequent boot will re-enable).
> > > >
> > > > I'm curious if there is a less "blocking" way of handling kexecs (that
> > > > strategy would block LM while the guest booted).
> > > >
> > > > One option that comes to mind would be for the guest to "mute" the
> > > > encryption status hypercall after the call to reset the encryption
> > > > status. The problem would be that the encryption status for pages
> > > > would be very temporarily inaccurate in 

Re: [PATCH v12 13/13] x86/kvm: Add kexec support for SEV Live Migration.

2021-04-12 Thread Steve Rutherford
On Mon, Apr 12, 2021 at 6:48 PM Ashish Kalra  wrote:
>
> On Mon, Apr 12, 2021 at 06:23:32PM -0700, Steve Rutherford wrote:
> > On Mon, Apr 12, 2021 at 5:22 PM Steve Rutherford  
> > wrote:
> > >
> > > On Mon, Apr 12, 2021 at 12:48 PM Ashish Kalra  
> > > wrote:
> > > >
> > > > From: Ashish Kalra 
> > > >
> > > > Reset the host's shared pages list related to kernel
> > > > specific page encryption status settings before we load a
> > > > new kernel by kexec. We cannot reset the complete
> > > > shared pages list here as we need to retain the
> > > > UEFI/OVMF firmware specific settings.
> > > >
> > > > The host's shared pages list is maintained for the
> > > > guest to keep track of all unencrypted guest memory regions,
> > > > therefore we need to explicitly mark all shared pages as
> > > > encrypted again before rebooting into the new guest kernel.
> > > >
> > > > Signed-off-by: Ashish Kalra 
> > > > ---
> > > >  arch/x86/kernel/kvm.c | 24 
> > > >  1 file changed, 24 insertions(+)
> > > >
> > > > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > > > index bcc82e0c9779..4ad3ed547ff1 100644
> > > > --- a/arch/x86/kernel/kvm.c
> > > > +++ b/arch/x86/kernel/kvm.c
> > > > @@ -39,6 +39,7 @@
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > > +#include 
> > > >
> > > >  DEFINE_STATIC_KEY_FALSE(kvm_async_pf_enabled);
> > > >
> > > > @@ -384,6 +385,29 @@ static void kvm_pv_guest_cpu_reboot(void *unused)
> > > >  */
> > > > if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
> > > > wrmsrl(MSR_KVM_PV_EOI_EN, 0);
> > > > +   /*
> > > > +* Reset the host's shared pages list related to kernel
> > > > +* specific page encryption status settings before we load a
> > > > +* new kernel by kexec. NOTE: We cannot reset the complete
> > > > +* shared pages list here as we need to retain the
> > > > +* UEFI/OVMF firmware specific settings.
> > > > +*/
> > > > +   if (sev_live_migration_enabled & (smp_processor_id() == 0)) {
> > > What happens if the reboot of CPU0 races with another CPU servicing a
> > > device request (while the reboot is pending for that CPU)?
> > > Seems like you could run into a scenario where you have hypercalls racing.
> > >
> > > Calling this on every core isn't free, but it is an easy way to avoid 
> > > this race.
> > > You could also count cores, and have only last core do the job, but
> > > that seems more complicated.
> > On second thought, I think this may be insufficient as a fix, since my
> > read of kernel/reboot.c seems to imply that devices aren't shut down
> > until after these notifiers occur. As such, a single thread might be
> > able to race with itself. I could be wrong here though.
> >
> > The heavy hammer would be to disable migration through the MSR (which
> > the subsequent boot will re-enable).
> >
> > I'm curious if there is a less "blocking" way of handling kexecs (that
> > strategy would block LM while the guest booted).
> >
> > One option that comes to mind would be for the guest to "mute" the
> > encryption status hypercall after the call to reset the encryption
> > status. The problem would be that the encryption status for pages
> > would be very temporarily inaccurate in the window between that call
> > and the start of the next boot. That isn't ideal, but, on the other
> > hand, the VM was about to reboot anyway, so a corrupted shared page
> > for device communication probably isn't super important. Still, I'm
> > not really a fan of that. This would avoid corrupting the next boot,
> > which is clearly an improvement.
> >
> > Each time the kernel boots it could also choose something like a
> > generation ID, and pass that down each time it calls the hypercall.
> > This would then let userspace identify which requests were coming from
> > the subsequent boot.
> >
> > Everything here (except, perhaps, disabling migration through the MSR)
> > seems kind of complicated. I somewhat hope my interpretation of
> > kernel/reboot.c is wrong and this race just is not possible in the
> > first place.
> >
>
> Disabling mig

Re: [PATCH v12 13/13] x86/kvm: Add kexec support for SEV Live Migration.

2021-04-12 Thread Steve Rutherford
On Mon, Apr 12, 2021 at 5:22 PM Steve Rutherford  wrote:
>
> On Mon, Apr 12, 2021 at 12:48 PM Ashish Kalra  wrote:
> >
> > From: Ashish Kalra 
> >
> > Reset the host's shared pages list related to kernel
> > specific page encryption status settings before we load a
> > new kernel by kexec. We cannot reset the complete
> > shared pages list here as we need to retain the
> > UEFI/OVMF firmware specific settings.
> >
> > The host's shared pages list is maintained for the
> > guest to keep track of all unencrypted guest memory regions,
> > therefore we need to explicitly mark all shared pages as
> > encrypted again before rebooting into the new guest kernel.
> >
> > Signed-off-by: Ashish Kalra 
> > ---
> >  arch/x86/kernel/kvm.c | 24 
> >  1 file changed, 24 insertions(+)
> >
> > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > index bcc82e0c9779..4ad3ed547ff1 100644
> > --- a/arch/x86/kernel/kvm.c
> > +++ b/arch/x86/kernel/kvm.c
> > @@ -39,6 +39,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  DEFINE_STATIC_KEY_FALSE(kvm_async_pf_enabled);
> >
> > @@ -384,6 +385,29 @@ static void kvm_pv_guest_cpu_reboot(void *unused)
> >  */
> > if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
> > wrmsrl(MSR_KVM_PV_EOI_EN, 0);
> > +   /*
> > +* Reset the host's shared pages list related to kernel
> > +* specific page encryption status settings before we load a
> > +* new kernel by kexec. NOTE: We cannot reset the complete
> > +* shared pages list here as we need to retain the
> > +* UEFI/OVMF firmware specific settings.
> > +*/
> > +   if (sev_live_migration_enabled & (smp_processor_id() == 0)) {
> What happens if the reboot of CPU0 races with another CPU servicing a
> device request (while the reboot is pending for that CPU)?
> Seems like you could run into a scenario where you have hypercalls racing.
>
> Calling this on every core isn't free, but it is an easy way to avoid this 
> race.
> You could also count cores, and have only last core do the job, but
> that seems more complicated.
On second thought, I think this may be insufficient as a fix, since my
read of kernel/reboot.c seems to imply that devices aren't shut down
until after these notifiers occur. As such, a single thread might be
able to race with itself. I could be wrong here though.

The heavy hammer would be to disable migration through the MSR (which
the subsequent boot will re-enable).

I'm curious if there is a less "blocking" way of handling kexecs (that
strategy would block LM while the guest booted).

One option that comes to mind would be for the guest to "mute" the
encryption status hypercall after the call to reset the encryption
status. The problem would be that the encryption status for pages
would be very temporarily inaccurate in the window between that call
and the start of the next boot. That isn't ideal, but, on the other
hand, the VM was about to reboot anyway, so a corrupted shared page
for device communication probably isn't super important. Still, I'm
not really a fan of that. This would avoid corrupting the next boot,
which is clearly an improvement.
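Sketched, the mute would just be a guard in the guest's hypercall
wrapper (hypothetical names, not from the series):

	/* Hypothetical guard: once the kexec path has reset the list,
	 * stop issuing further status updates until the next boot. */
	static bool page_enc_status_muted;

	static void notify_page_enc_status(unsigned long pfn,
					   unsigned long npages, bool enc)
	{
		if (READ_ONCE(page_enc_status_muted))
			return;
		kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
				   pfn << PAGE_SHIFT, npages, enc);
	}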

Each time the kernel boots it could also choose something like a
generation ID, and pass that down each time it calls the hypercall.
This would then let userspace identify which requests were coming from
the subsequent boot.
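That variant could take a shape like this (purely hypothetical; the
series only defines the 3-argument kvm_sev_hypercall3(), so the
4-argument wrapper below is invented for illustration):

	/* Tag every status update with a per-boot nonce so userspace
	 * can discard updates from the previous kernel. */
	static u64 boot_generation;

	static int __init pick_boot_generation(void)
	{
		boot_generation = get_random_u64();
		return 0;
	}
	early_initcall(pick_boot_generation);

	/* Each update would then carry the nonce as a fourth argument: */
	kvm_sev_hypercall4(KVM_HC_PAGE_ENC_STATUS, gpa, npages, enc,
			   boot_generation);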

Everything here (except, perhaps, disabling migration through the MSR)
seems kind of complicated. I somewhat hope my interpretation of
kernel/reboot.c is wrong and this race just is not possible in the
first place.

Steve
> Steve
> > +   int i;
> > +   unsigned long nr_pages;
> > +
> > +   for (i = 0; i < e820_table->nr_entries; i++) {
> > +   struct e820_entry *entry = &e820_table->entries[i];
> > +
> > +   if (entry->type != E820_TYPE_RAM)
> > +   continue;
> > +
> > +   nr_pages = DIV_ROUND_UP(entry->size, PAGE_SIZE);
> > +
> > +   kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
> > +  entry->addr, nr_pages, 1);
> > +   }
> > +   }
> > kvm_pv_disable_apf();
> > kvm_disable_steal_time();
> >  }
> > --
> > 2.17.1
> >


Re: [PATCH v12 12/13] x86/kvm: Add guest support for detecting and enabling SEV Live Migration feature.

2021-04-12 Thread Steve Rutherford
le_guid, NULL, , );
> +
> +   if (status == EFI_NOT_FOUND) {
> +   pr_info("%s : EFI live migration variable not found\n", 
> __func__);
> +   return 0;
> +   }
> +
> +   if (status != EFI_SUCCESS) {
> +   pr_info("%s : EFI variable retrieval failed\n", __func__);
> +   return 0;
> +   }
> +
> +   if (enabled == 0) {
> +   pr_info("%s: live migration disabled in EFI\n", __func__);
> +   return 0;
> +   }
> +
> +   pr_info("%s : live migration enabled in EFI\n", __func__);
> +   wrmsrl(MSR_KVM_SEV_LIVE_MIGRATION, KVM_SEV_LIVE_MIGRATION_ENABLED);
> +
> +   return true;
> +}
> +
> +late_initcall(setup_kvm_sev_migration);
> +
>  /*
>   * Iterate through all possible CPUs and map the memory region pointed
>   * by apf_reason, steal_time and kvm_apic_eoi as decrypted at once.
> @@ -747,6 +798,7 @@ static bool __init kvm_msi_ext_dest_id(void)
>
>  static void __init kvm_init_platform(void)
>  {
> +   check_kvm_sev_migration();
> kvmclock_init();
> x86_platform.apic_post_init = kvm_apic_init;
>  }
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index fae9ccbd0da7..4de417333c09 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -20,6 +20,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include 
>  #include 
> @@ -48,6 +49,8 @@ EXPORT_SYMBOL_GPL(sev_enable_key);
>
>  bool sev_enabled __section(".data");
>
> +bool sev_live_migration_enabled __section(".data");
> +
>  /* Buffer used for early in-place encryption by BSP, no locking needed */
>  static char sme_early_buffer[PAGE_SIZE] __initdata __aligned(PAGE_SIZE);
>
> @@ -237,6 +240,9 @@ static void set_memory_enc_dec_hypercall(unsigned long 
> vaddr, int npages,
> unsigned long sz = npages << PAGE_SHIFT;
> unsigned long vaddr_end, vaddr_next;
>
> +   if (!sev_live_migration_enabled)
> +   return;
> +
> vaddr_end = vaddr + sz;
>
> for (; vaddr < vaddr_end; vaddr = vaddr_next) {
> @@ -407,6 +413,12 @@ int __init early_set_memory_encrypted(unsigned long 
> vaddr, unsigned long size)
> return early_set_memory_enc_dec(vaddr, size, true);
>  }
>
> +void __init early_set_mem_enc_dec_hypercall(unsigned long vaddr, int npages,
> +   bool enc)
> +{
> +   set_memory_enc_dec_hypercall(vaddr, npages, enc);
> +}
> +
>  /*
>   * SME and SEV are very similar but they are not the same, so there are
>   * times that the kernel will need to distinguish between SME and SEV. The
> @@ -462,6 +474,35 @@ bool force_dma_unencrypted(struct device *dev)
> return false;
>  }
>
> +void __init check_kvm_sev_migration(void)
> +{
> +   if (sev_active() &&
> +   kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION)) {
> +   unsigned long nr_pages;
> +
> +   pr_info("KVM enable live migration\n");
> +   sev_live_migration_enabled = true;
> +
> +   /*
> +* Ensure that _bss_decrypted section is marked as decrypted in
> +* the shared pages list.
> +*/
> +   nr_pages = DIV_ROUND_UP(__end_bss_decrypted - __start_bss_decrypted,
> +   PAGE_SIZE);
> +   early_set_mem_enc_dec_hypercall((unsigned long)__start_bss_decrypted,
> +   nr_pages, 0);
> +
> +   /*
> +* If not booted using EFI, enable Live migration support.
> +*/
> +   if (!efi_enabled(EFI_BOOT))
> +   wrmsrl(MSR_KVM_SEV_LIVE_MIGRATION,
> +  KVM_SEV_LIVE_MIGRATION_ENABLED);
> +   } else {
> +   pr_info("KVM enable live migration feature 
> unsupported\n");
I might be misunderstanding this, but I'm not sure this log message is
correct: isn't the intention that the late initcall will be the one to
check whether this should be enabled later in this case?

I have a similar question above about the log message after
"!efi_enabled(EFI_RUNTIME_SERVICES)": shouldn't that avoid logging if
!efi_enabled(EFI_BOOT) (since the wrmsrl call has already been made
here)?
> +   }
> +}
> +
>  void __init mem_encrypt_free_decrypted_mem(void)
>  {
> unsigned long vaddr, vaddr_end, npages;
> --
> 2.17.1
>

Other than these:
Reviewed-by: Steve Rutherford 


Re: [PATCH v12 13/13] x86/kvm: Add kexec support for SEV Live Migration.

2021-04-12 Thread Steve Rutherford
On Mon, Apr 12, 2021 at 12:48 PM Ashish Kalra  wrote:
>
> From: Ashish Kalra 
>
> Reset the host's shared pages list related to kernel
> specific page encryption status settings before we load a
> new kernel by kexec. We cannot reset the complete
> shared pages list here as we need to retain the
> UEFI/OVMF firmware specific settings.
>
> The host's shared pages list is maintained for the
> guest to keep track of all unencrypted guest memory regions,
> therefore we need to explicitly mark all shared pages as
> encrypted again before rebooting into the new guest kernel.
>
> Signed-off-by: Ashish Kalra 
> ---
>  arch/x86/kernel/kvm.c | 24 
>  1 file changed, 24 insertions(+)
>
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index bcc82e0c9779..4ad3ed547ff1 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -39,6 +39,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  DEFINE_STATIC_KEY_FALSE(kvm_async_pf_enabled);
>
> @@ -384,6 +385,29 @@ static void kvm_pv_guest_cpu_reboot(void *unused)
>  */
> if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
> wrmsrl(MSR_KVM_PV_EOI_EN, 0);
> +   /*
> +* Reset the host's shared pages list related to kernel
> +* specific page encryption status settings before we load a
> +* new kernel by kexec. NOTE: We cannot reset the complete
> +* shared pages list here as we need to retain the
> +* UEFI/OVMF firmware specific settings.
> +*/
> +   if (sev_live_migration_enabled & (smp_processor_id() == 0)) {
What happens if the reboot of CPU0 races with another CPU servicing a
device request (while the reboot is pending for that CPU)?
Seems like you could run into a scenario where you have hypercalls racing.

Calling this on every core isn't free, but it is an easy way to avoid this race.
You could also count cores, and have only last core do the job, but
that seems more complicated.
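For the counting variant, something like the sketch below would do
(hypothetical names; the counter has to be primed before the reboot
callbacks fire on each CPU):

	static atomic_t reboot_cpus_left;

	/* Primed once before on_each_cpu(kvm_pv_guest_cpu_reboot, ...): */
	atomic_set(&reboot_cpus_left, num_online_cpus());

	/* Then, inside kvm_pv_guest_cpu_reboot(), only the last CPU
	 * through performs the single shared-list reset: */
	if (atomic_dec_and_test(&reboot_cpus_left))
		reset_kernel_shared_pages();	/* the e820 loop quoted below */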

Steve
> +   int i;
> +   unsigned long nr_pages;
> +
> +   for (i = 0; i < e820_table->nr_entries; i++) {
> +   struct e820_entry *entry = &e820_table->entries[i];
> +
> +   if (entry->type != E820_TYPE_RAM)
> +   continue;
> +
> +   nr_pages = DIV_ROUND_UP(entry->size, PAGE_SIZE);
> +
> +   kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
> +  entry->addr, nr_pages, 1);
> +   }
> +   }
> kvm_pv_disable_apf();
> kvm_disable_steal_time();
>  }
> --
> 2.17.1
>


Re: [PATCH v12 08/13] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall

2021-04-12 Thread Steve Rutherford
 vcpu->run->hypercall.longmode = op_64_bit;
> +   vcpu->arch.complete_userspace_io = complete_hypercall_exit;
> +   return 0;
> +   }
> default:
> ret = -KVM_ENOSYS;
> break;
> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> index 8b86609849b9..847b83b75dc8 100644
> --- a/include/uapi/linux/kvm_para.h
> +++ b/include/uapi/linux/kvm_para.h
> @@ -29,6 +29,7 @@
>  #define KVM_HC_CLOCK_PAIRING   9
>  #define KVM_HC_SEND_IPI10
>  #define KVM_HC_SCHED_YIELD 11
> +#define KVM_HC_PAGE_ENC_STATUS 12
>
>  /*
>   * hypercalls use architecture specific
> --
> 2.17.1
>
Reviewed-by: Steve Rutherford 


Re: [PATCH v12 06/13] KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command

2021-04-12 Thread Steve Rutherford
On Mon, Apr 12, 2021 at 12:45 PM Ashish Kalra  wrote:
>
> From: Brijesh Singh 
>
> The command finalizes the guest receiving process and makes the SEV guest
> ready for execution.
>
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: Paolo Bonzini 
> Cc: Joerg Roedel 
> Cc: Borislav Petkov 
> Cc: Tom Lendacky 
> Cc: x...@kernel.org
> Cc: k...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Reviewed-by: Steve Rutherford 
> Signed-off-by: Brijesh Singh 
> Signed-off-by: Ashish Kalra 
> ---
>  .../virt/kvm/amd-memory-encryption.rst|  8 +++
>  arch/x86/kvm/svm/sev.c| 23 +++
>  2 files changed, 31 insertions(+)
>
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst 
> b/Documentation/virt/kvm/amd-memory-encryption.rst
> index c6ed5b26d841..0466c0febff9 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -396,6 +396,14 @@ Returns: 0 on success, -negative on error
>  __u32 trans_len;
>  };
>
> +15. KVM_SEV_RECEIVE_FINISH
> +
> +
> +After completion of the migration flow, the KVM_SEV_RECEIVE_FINISH command 
> can be
> +issued by the hypervisor to make the guest ready for execution.
> +
> +Returns: 0 on success, -negative on error
> +
>  References
>  ==
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 2c95657cc9bf..c9795a22e502 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -1524,6 +1524,26 @@ static int sev_receive_update_data(struct kvm *kvm, 
> struct kvm_sev_cmd *argp)
> return ret;
>  }
>
> +static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +   struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +   struct sev_data_receive_finish *data;
> +   int ret;
> +
> +   if (!sev_guest(kvm))
> +   return -ENOTTY;
> +
> +   data = kzalloc(sizeof(*data), GFP_KERNEL);
> +   if (!data)
> +   return -ENOMEM;
> +
> +   data->handle = sev->handle;
> +   ret = sev_issue_cmd(kvm, SEV_CMD_RECEIVE_FINISH, data, &argp->error);
> +
> +   kfree(data);
> +   return ret;
> +}
> +
>  int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  {
> struct kvm_sev_cmd sev_cmd;
> @@ -1592,6 +1612,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> case KVM_SEV_RECEIVE_UPDATE_DATA:
> r = sev_receive_update_data(kvm, &sev_cmd);
> break;
> +   case KVM_SEV_RECEIVE_FINISH:
> +   r = sev_receive_finish(kvm, &sev_cmd);
> +   break;
> default:
> r = -EINVAL;
> goto out;
> --
> 2.17.1
>
Reviewed-by: Steve Rutherford 


Re: [PATCH v12 09/13] mm: x86: Invoke hypercall when page encryption status is changed

2021-04-12 Thread Steve Rutherford
eak;
> +   default:
> +   return;
> +   }
> +
> +   psize = page_level_size(level);
> +   pmask = page_level_mask(level);
> +
> +   kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
> +  pfn << PAGE_SHIFT, psize >> PAGE_SHIFT, 
> enc);
> +
> +   vaddr_next = (vaddr & pmask) + psize;
> +   }
> +}
> +
>  static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
>  {
> pgprot_t old_prot, new_prot;
> @@ -286,12 +329,13 @@ static void __init __set_clr_pte_enc(pte_t *kpte, int 
> level, bool enc)
>  static int __init early_set_memory_enc_dec(unsigned long vaddr,
>unsigned long size, bool enc)
>  {
> -   unsigned long vaddr_end, vaddr_next;
> +   unsigned long vaddr_end, vaddr_next, start;
> unsigned long psize, pmask;
> int split_page_size_mask;
> int level, ret;
> pte_t *kpte;
>
> +   start = vaddr;
> vaddr_next = vaddr;
> vaddr_end = vaddr + size;
>
> @@ -346,6 +390,8 @@ static int __init early_set_memory_enc_dec(unsigned long 
> vaddr,
>
> ret = 0;
>
> +   set_memory_enc_dec_hypercall(start, PAGE_ALIGN(size) >> PAGE_SHIFT,
> +   enc);
>  out:
> __flush_tlb_all();
> return ret;
> @@ -481,6 +527,15 @@ void __init mem_encrypt_init(void)
> if (sev_active() && !sev_es_active())
> static_branch_enable(_enable_key);
>
> +#ifdef CONFIG_PARAVIRT
> +   /*
> +* With SEV, we need to make a hypercall when page encryption state is
> +* changed.
> +*/
> +   if (sev_active())
> +   pv_ops.mmu.page_encryption_changed = 
> set_memory_enc_dec_hypercall;
> +#endif
> +
> print_mem_encrypt_feature_info();
>  }
>
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index 16f878c26667..3576b583ac65 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include "../mm_internal.h"
>
> @@ -2012,6 +2013,12 @@ static int __set_memory_enc_dec(unsigned long addr, 
> int numpages, bool enc)
>  */
> cpa_flush(, 0);
>
> +   /* Notify hypervisor that a given memory range is mapped encrypted
> +* or decrypted. The hypervisor will use this information during the
> +* VM migration.
> +*/
> +   page_encryption_changed(addr, numpages, enc);
> +
> return ret;
>  }
>
> --
> 2.17.1
>
Reviewed-by: Steve Rutherford 


Re: [PATCH v12 05/13] KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command

2021-04-12 Thread Steve Rutherford
a->guest_len = params.guest_len;
> +   data->handle = sev->handle;
> +
> +   ret = sev_issue_cmd(kvm, SEV_CMD_RECEIVE_UPDATE_DATA, data,
> +   &argp->error);
> +
> +   sev_unpin_memory(kvm, guest_page, n);
> +
> +e_free:
> +   kfree(data);
> +e_free_trans:
> +   kfree(trans);
> +e_free_hdr:
> +   kfree(hdr);
> +
> +   return ret;
> +}
> +
>  int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  {
> struct kvm_sev_cmd sev_cmd;
> @@ -1513,6 +1589,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> case KVM_SEV_RECEIVE_START:
> r = sev_receive_start(kvm, &sev_cmd);
> break;
> +   case KVM_SEV_RECEIVE_UPDATE_DATA:
> +   r = sev_receive_update_data(kvm, &sev_cmd);
> +   break;
> default:
> r = -EINVAL;
> goto out;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 29c25e641a0c..3a656d43fc6c 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1759,6 +1759,15 @@ struct kvm_sev_receive_start {
> __u32 session_len;
>  };
>
> +struct kvm_sev_receive_update_data {
> +   __u64 hdr_uaddr;
> +   __u32 hdr_len;
> +   __u64 guest_uaddr;
> +   __u32 guest_len;
> +   __u64 trans_uaddr;
> +   __u32 trans_len;
> +};
> +
>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU(1 << 0)
>  #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
>  #define KVM_DEV_ASSIGN_MASK_INTX   (1 << 2)
> --
> 2.17.1
>
Reviewed-by: Steve Rutherford 


Re: [PATCH v12 04/13] KVM: SVM: Add support for KVM_SEV_RECEIVE_START command

2021-04-12 Thread Steve Rutherford
 goto e_free;
> +
> +   /* Bind ASID to this guest */
> +   ret = sev_bind_asid(kvm, start->handle, error);
> +   if (ret)
> +   goto e_free;
> +
> +   params.handle = start->handle;
> +   if (copy_to_user((void __user *)(uintptr_t)argp->data,
> +&params, sizeof(struct kvm_sev_receive_start))) {
> +   ret = -EFAULT;
> +   sev_unbind_asid(kvm, start->handle);
> +   goto e_free;
> +   }
> +
> +   sev->handle = start->handle;
> +   sev->fd = argp->sev_fd;
> +
> +e_free:
> +   kfree(start);
> +e_free_session:
> +   kfree(session_data);
> +e_free_pdh:
> +   kfree(pdh_data);
> +
> +   return ret;
> +}
> +
>  int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  {
> struct kvm_sev_cmd sev_cmd;
> @@ -1432,6 +1510,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> case KVM_SEV_SEND_FINISH:
> r = sev_send_finish(kvm, &sev_cmd);
> break;
> +   case KVM_SEV_RECEIVE_START:
> +   r = sev_receive_start(kvm, &sev_cmd);
> +   break;
> default:
> r = -EINVAL;
> goto out;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index d45af34c31be..29c25e641a0c 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1750,6 +1750,15 @@ struct kvm_sev_send_update_data {
> __u32 trans_len;
>  };
>
> +struct kvm_sev_receive_start {
> +   __u32 handle;
> +   __u32 policy;
> +   __u64 pdh_uaddr;
> +   __u32 pdh_len;
> +   __u64 session_uaddr;
> +   __u32 session_len;
> +};
> +
>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU(1 << 0)
>  #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
>  #define KVM_DEV_ASSIGN_MASK_INTX   (1 << 2)
> --
> 2.17.1
>
Reviewed-by: Steve Rutherford 


Re: [PATCH v12 11/13] EFI: Introduce the new AMD Memory Encryption GUID.

2021-04-12 Thread Steve Rutherford
On Mon, Apr 12, 2021 at 12:46 PM Ashish Kalra  wrote:
>
> From: Ashish Kalra 
>
> Introduce a new AMD Memory Encryption GUID which is currently
> used for defining a new UEFI environment variable which indicates
> UEFI/OVMF support for the SEV live migration feature. This variable
> is set up when UEFI/OVMF detects host/hypervisor support for SEV
> live migration and later this variable is read by the kernel using
> EFI runtime services to verify if OVMF supports the live migration
> feature.
>
> Signed-off-by: Ashish Kalra 
> ---
>  include/linux/efi.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/include/linux/efi.h b/include/linux/efi.h
> index 6b5d36babfcc..6f364ace82cb 100644
> --- a/include/linux/efi.h
> +++ b/include/linux/efi.h
> @@ -362,6 +362,7 @@ void efi_native_runtime_setup(void);
>
>  /* OEM GUIDs */
>  #define DELLEMC_EFI_RCI2_TABLE_GUIDEFI_GUID(0x2d9f28a2, 0xa886, 
> 0x456a,  0x97, 0xa8, 0xf1, 0x1e, 0xf2, 0x4f, 0xf4, 0x55)
> +#define MEM_ENCRYPT_GUID   EFI_GUID(0x0cf29b71, 0x9e51, 
> 0x433a,  0xa3, 0xb7, 0x81, 0xf3, 0xab, 0x16, 0xb8, 0x75)
>
>  typedef struct {
> efi_guid_t guid;
> --
> 2.17.1
>

Reviewed-by: Steve Rutherford 


Re: [PATCH v12 10/13] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR.

2021-04-12 Thread Steve Rutherford
On Mon, Apr 12, 2021 at 12:46 PM Ashish Kalra  wrote:
>
> From: Ashish Kalra 
>
> Add a new KVM_FEATURE_SEV_LIVE_MIGRATION feature for the guest to check
> for host-side support for SEV live migration. Also add a new custom
> MSR_KVM_SEV_LIVE_MIGRATION for the guest to enable the SEV live migration
> feature.
>
> The MSR is handled by userspace using MSR filters.
>
> Signed-off-by: Ashish Kalra 
> ---
>  Documentation/virt/kvm/cpuid.rst |  5 +
>  Documentation/virt/kvm/msr.rst   | 12 
>  arch/x86/include/uapi/asm/kvm_para.h |  4 
>  arch/x86/kvm/cpuid.c |  3 ++-
>  4 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/virt/kvm/cpuid.rst 
> b/Documentation/virt/kvm/cpuid.rst
> index cf62162d4be2..0bdb6cdb12d3 100644
> --- a/Documentation/virt/kvm/cpuid.rst
> +++ b/Documentation/virt/kvm/cpuid.rst
> @@ -96,6 +96,11 @@ KVM_FEATURE_MSI_EXT_DEST_ID15  guest 
> checks this feature bit
> before using extended 
> destination
> ID bits in MSI address bits 
> 11-5.
>
> +KVM_FEATURE_SEV_LIVE_MIGRATION 16  guest checks this feature bit 
> before
> +   using the page encryption 
> state
> +   hypercall to notify the page 
> state
> +   change
> +
>  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24  host will warn if no 
> guest-side
> per-cpu warps are expected in
> kvmclock
> diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
> index e37a14c323d2..020245d16087 100644
> --- a/Documentation/virt/kvm/msr.rst
> +++ b/Documentation/virt/kvm/msr.rst
> @@ -376,3 +376,15 @@ data:
> write '1' to bit 0 of the MSR, this causes the host to re-scan its 
> queue
> and check if there are more notifications pending. The MSR is 
> available
> if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
> +
> +MSR_KVM_SEV_LIVE_MIGRATION:
> +0x4b564d08
> +
> +   Control SEV Live Migration features.
> +
> +data:
> +Bit 0 enables (1) or disables (0) the host-side SEV Live Migration
> +feature; in other words, this is the guest->host communication that it
> +is properly handling the shared pages list.
> +
> +All other bits are reserved.
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
> b/arch/x86/include/uapi/asm/kvm_para.h
> index 950afebfba88..f6bfa138874f 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -33,6 +33,7 @@
>  #define KVM_FEATURE_PV_SCHED_YIELD 13
>  #define KVM_FEATURE_ASYNC_PF_INT   14
>  #define KVM_FEATURE_MSI_EXT_DEST_ID15
> +#define KVM_FEATURE_SEV_LIVE_MIGRATION 16
>
>  #define KVM_HINTS_REALTIME  0
>
> @@ -54,6 +55,7 @@
>  #define MSR_KVM_POLL_CONTROL   0x4b564d05
>  #define MSR_KVM_ASYNC_PF_INT   0x4b564d06
>  #define MSR_KVM_ASYNC_PF_ACK   0x4b564d07
> +#define MSR_KVM_SEV_LIVE_MIGRATION 0x4b564d08
>
>  struct kvm_steal_time {
> __u64 steal;
> @@ -136,4 +138,6 @@ struct kvm_vcpu_pv_apf_data {
>  #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
>  #define KVM_PV_EOI_DISABLED 0x0
>
> +#define KVM_SEV_LIVE_MIGRATION_ENABLED BIT_ULL(0)
> +
>  #endif /* _UAPI_ASM_X86_KVM_PARA_H */
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 6bd2f8b830e4..4e2e69a692aa 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -812,7 +812,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array 
> *array, u32 function)
>  (1 << KVM_FEATURE_PV_SEND_IPI) |
>  (1 << KVM_FEATURE_POLL_CONTROL) |
>  (1 << KVM_FEATURE_PV_SCHED_YIELD) |
> -        (1 << KVM_FEATURE_ASYNC_PF_INT);
> +(1 << KVM_FEATURE_ASYNC_PF_INT) |
> +(1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
>
> if (sched_info_on())
> entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
> --
> 2.17.1
>
Reviewed-by: Steve Rutherford 


Re: [PATCH v12 03/13] KVM: SVM: Add KVM_SEV_SEND_FINISH command

2021-04-12 Thread Steve Rutherford
On Mon, Apr 12, 2021 at 12:44 PM Ashish Kalra  wrote:
>
> From: Brijesh Singh 
>
> The command is used to finalize the encryption context created with the
> KVM_SEV_SEND_START command.
>
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: Paolo Bonzini 
> Cc: Joerg Roedel 
> Cc: Borislav Petkov 
> Cc: Tom Lendacky 
> Cc: x...@kernel.org
> Cc: k...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Brijesh Singh 
> Signed-off-by: Ashish Kalra 
> ---
>  .../virt/kvm/amd-memory-encryption.rst|  8 +++
>  arch/x86/kvm/svm/sev.c| 23 +++
>  2 files changed, 31 insertions(+)
>
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst 
> b/Documentation/virt/kvm/amd-memory-encryption.rst
> index 3c5456e0268a..26c4e6c83f62 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -335,6 +335,14 @@ Returns: 0 on success, -negative on error
>  __u32 trans_len;
>  };
>
> +12. KVM_SEV_SEND_FINISH
> +
> +
> +After completion of the migration flow, the KVM_SEV_SEND_FINISH command can 
> be
> +issued by the hypervisor to delete the encryption context.
> +
> +Returns: 0 on success, -negative on error
> +
>  References
>  ==
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 30527285a39a..92325d9527ce 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -1350,6 +1350,26 @@ static int sev_send_update_data(struct kvm *kvm, 
> struct kvm_sev_cmd *argp)
> return ret;
>  }
>
> +static int sev_send_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> +   struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +   struct sev_data_send_finish *data;
> +   int ret;
> +
> +   if (!sev_guest(kvm))
> +   return -ENOTTY;
> +
> +   data = kzalloc(sizeof(*data), GFP_KERNEL);
> +   if (!data)
> +   return -ENOMEM;
> +
> +   data->handle = sev->handle;
> +   ret = sev_issue_cmd(kvm, SEV_CMD_SEND_FINISH, data, &argp->error);
> +
> +   kfree(data);
> +   return ret;
> +}
> +
>  int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  {
> struct kvm_sev_cmd sev_cmd;
> @@ -1409,6 +1429,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
> case KVM_SEV_SEND_UPDATE_DATA:
> r = sev_send_update_data(kvm, &sev_cmd);
> break;
> +   case KVM_SEV_SEND_FINISH:
> +   r = sev_send_finish(kvm, &sev_cmd);
> +   break;
> default:
> r = -EINVAL;
> goto out;
> --
> 2.17.1
>
Reviewed-by: Steve Rutherford 


Re: [PATCH v2] KVM: SVM: Add support for KVM_SEV_SEND_CANCEL command

2021-04-12 Thread Steve Rutherford
On Sun, Apr 11, 2021 at 1:56 AM kernel test robot  wrote:
>
> Hi Steve,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on kvm/queue]
> [also build test ERROR on vhost/linux-next cryptodev/master linux/master 
> linus/master v5.12-rc6 next-20210409]
> [cannot apply to crypto/master]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url:
> https://github.com/0day-ci/linux/commits/Steve-Rutherford/KVM-SVM-Add-support-for-KVM_SEV_SEND_CANCEL-command/20210410-060941
> base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
> config: x86_64-allyesconfig (attached as .config)
> compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
> reproduce (this is a W=1 build):
> # 
> https://github.com/0day-ci/linux/commit/16f9122ec5c3ee772f1edb80c2c2526650b60868
> git remote add linux-review https://github.com/0day-ci/linux
> git fetch --no-tags linux-review 
> Steve-Rutherford/KVM-SVM-Add-support-for-KVM_SEV_SEND_CANCEL-command/20210410-060941
> git checkout 16f9122ec5c3ee772f1edb80c2c2526650b60868
> # save the attached .config to linux build tree
> make W=1 ARCH=x86_64
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot 
>
> All errors (new ones prefixed by >>):
>
>drivers/crypto/ccp/sev-dev.c: In function 'sev_cmd_buffer_len':
> >> drivers/crypto/ccp/sev-dev.c:132:7: error: 'SEV_SEND_CANCEL' undeclared 
> >> (first use in this function); did you mean 'SEV_CMD_SEND_CANCEL'?
>  132 |  case SEV_SEND_CANCEL:return sizeof(struct 
> sev_data_send_cancel);
>  |   ^~~
>  |   SEV_CMD_SEND_CANCEL
>drivers/crypto/ccp/sev-dev.c:132:7: note: each undeclared identifier is 
> reported only once for each function it appears in
>
>
> vim +132 drivers/crypto/ccp/sev-dev.c
>
>100
>101  static int sev_cmd_buffer_len(int cmd)
>102  {
>103  switch (cmd) {
>104  case SEV_CMD_INIT:  return sizeof(struct 
> sev_data_init);
>105  case SEV_CMD_PLATFORM_STATUS:   return sizeof(struct 
> sev_user_data_status);
>106  case SEV_CMD_PEK_CSR:   return sizeof(struct 
> sev_data_pek_csr);
>107  case SEV_CMD_PEK_CERT_IMPORT:   return sizeof(struct 
> sev_data_pek_cert_import);
>108  case SEV_CMD_PDH_CERT_EXPORT:   return sizeof(struct 
> sev_data_pdh_cert_export);
>109  case SEV_CMD_LAUNCH_START:  return sizeof(struct 
> sev_data_launch_start);
>110  case SEV_CMD_LAUNCH_UPDATE_DATA:return sizeof(struct 
> sev_data_launch_update_data);
>111  case SEV_CMD_LAUNCH_UPDATE_VMSA:return sizeof(struct 
> sev_data_launch_update_vmsa);
>112  case SEV_CMD_LAUNCH_FINISH: return sizeof(struct 
> sev_data_launch_finish);
>113  case SEV_CMD_LAUNCH_MEASURE:return sizeof(struct 
> sev_data_launch_measure);
>114  case SEV_CMD_ACTIVATE:  return sizeof(struct 
> sev_data_activate);
>115  case SEV_CMD_DEACTIVATE:return sizeof(struct 
> sev_data_deactivate);
>116  case SEV_CMD_DECOMMISSION:  return sizeof(struct 
> sev_data_decommission);
>117  case SEV_CMD_GUEST_STATUS:  return sizeof(struct 
> sev_data_guest_status);
>118  case SEV_CMD_DBG_DECRYPT:   return sizeof(struct 
> sev_data_dbg);
>119  case SEV_CMD_DBG_ENCRYPT:   return sizeof(struct 
> sev_data_dbg);
>120  case SEV_CMD_SEND_START:return sizeof(struct 
> sev_data_send_start);
>121  case SEV_CMD_SEND_UPDATE_DATA:  return sizeof(struct 
> sev_data_send_update_data);
>122  case SEV_CMD_SEND_UPDATE_VMSA:  return sizeof(struct 
> sev_data_send_update_vmsa);
>123  case SEV_CMD_SEND_FINISH:   return sizeof(struct 
> sev_data_send_finish);
>124  case SEV_CMD_RECEIVE_START: return sizeof(struct 
> sev_data_receive_start);
>125  case SEV_CMD_RECEIVE_FINISH:return sizeof(struct 
> sev_data_receive_finish);
>126  case SEV_CMD_RECEIVE_UPDATE_DATA:   return sizeof(struct 
> sev_data_receive_update_data);
>127  case SEV_CMD_RECEIVE_UPDATE_VMSA:   return sizeof(struct 
> sev_data_receive_update_vmsa);
>128  case SE

[PATCH v3] KVM: SVM: Add support for KVM_SEV_SEND_CANCEL command

2021-04-12 Thread Steve Rutherford
After completion of SEND_START, but before SEND_FINISH, the source VMM can
issue the SEND_CANCEL command to stop a migration. This is necessary so
that a cancelled migration can restart with a new target later.

Reviewed-by: Nathan Tempelman 
Reviewed-by: Brijesh Singh 
Signed-off-by: Steve Rutherford 
---
 .../virt/kvm/amd-memory-encryption.rst|  9 
 arch/x86/kvm/svm/sev.c| 23 +++
 drivers/crypto/ccp/sev-dev.c  |  1 +
 include/linux/psp-sev.h   | 10 
 include/uapi/linux/kvm.h  |  2 ++
 5 files changed, 45 insertions(+)

diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst 
b/Documentation/virt/kvm/amd-memory-encryption.rst
index 469a6308765b1..9e018a3eec03b 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -284,6 +284,15 @@ Returns: 0 on success, -negative on error
 __u32 len;
 };
 
+16. KVM_SEV_SEND_CANCEL
+
+
+After completion of SEND_START, but before SEND_FINISH, the source VMM can 
issue the
+SEND_CANCEL command to stop a migration. This is necessary so that a cancelled
+migration can restart with a new target later.
+
+Returns: 0 on success, -negative on error
+
 References
 ==
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 83e00e5245136..16d75b39e5e78 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1110,6 +1110,26 @@ static int sev_get_attestation_report(struct kvm *kvm, 
struct kvm_sev_cmd *argp)
return ret;
 }
 
+static int sev_send_cancel(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+   struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+   struct sev_data_send_cancel *data;
+   int ret;
+
+   if (!sev_guest(kvm))
+   return -ENOTTY;
+
+   data = kzalloc(sizeof(*data), GFP_KERNEL);
+   if (!data)
+   return -ENOMEM;
+
+   data->handle = sev->handle;
+   ret = sev_issue_cmd(kvm, SEV_CMD_SEND_CANCEL, data, &argp->error);
+
+   kfree(data);
+   return ret;
+}
+
 int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
struct kvm_sev_cmd sev_cmd;
@@ -1163,6 +1183,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
case KVM_SEV_GET_ATTESTATION_REPORT:
r = sev_get_attestation_report(kvm, &sev_cmd);
break;
+   case KVM_SEV_SEND_CANCEL:
+   r = sev_send_cancel(kvm, &sev_cmd);
+   break;
default:
r = -EINVAL;
goto out;
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index cb9b4c4e371ed..4172a1afa0db9 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -129,6 +129,7 @@ static int sev_cmd_buffer_len(int cmd)
case SEV_CMD_DOWNLOAD_FIRMWARE: return sizeof(struct 
sev_data_download_firmware);
case SEV_CMD_GET_ID:return sizeof(struct 
sev_data_get_id);
case SEV_CMD_ATTESTATION_REPORT:return sizeof(struct 
sev_data_attestation_report);
+   case SEV_CMD_SEND_CANCEL:   return sizeof(struct 
sev_data_send_cancel);
default:return 0;
}
 
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index b801ead1e2bb5..74f2babffc574 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -73,6 +73,7 @@ enum sev_cmd {
SEV_CMD_SEND_UPDATE_DATA= 0x041,
SEV_CMD_SEND_UPDATE_VMSA= 0x042,
SEV_CMD_SEND_FINISH = 0x043,
+   SEV_CMD_SEND_CANCEL = 0x044,
 
/* Guest migration commands (incoming) */
SEV_CMD_RECEIVE_START   = 0x050,
@@ -392,6 +393,15 @@ struct sev_data_send_finish {
u32 handle; /* In */
 } __packed;
 
+/**
+ * struct sev_data_send_cancel - SEND_CANCEL command parameters
+ *
+ * @handle: handle of the VM to process
+ */
+struct sev_data_send_cancel {
+   u32 handle; /* In */
+} __packed;
+
 /**
  * struct sev_data_receive_start - RECEIVE_START command parameters
  *
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f6afee209620d..707469b6b7072 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1671,6 +1671,8 @@ enum sev_cmd_id {
KVM_SEV_CERT_EXPORT,
/* Attestation report */
KVM_SEV_GET_ATTESTATION_REPORT,
+   /* Guest Migration Extension */
+   KVM_SEV_SEND_CANCEL,
 
KVM_SEV_NR_MAX,
 };
-- 
2.31.1.295.g9ea45b61b8-goog
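For reference, userspace reaches this through the existing per-VM
KVM_MEMORY_ENCRYPT_OP ioctl; a minimal usage sketch (vm_fd and sev_fd
are assumed to be already-open descriptors):

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Sketch: ask KVM to issue SEND_CANCEL for this VM's SEV context. */
static int sev_cancel_outgoing_migration(int vm_fd, int sev_fd)
{
	struct kvm_sev_cmd cmd = {
		.id = KVM_SEV_SEND_CANCEL,
		.sev_fd = sev_fd,	/* open handle to /dev/sev */
	};

	return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
}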



[PATCH v2] KVM: SVM: Add support for KVM_SEV_SEND_CANCEL command

2021-04-09 Thread Steve Rutherford
After completion of SEND_START, but before SEND_FINISH, the source VMM can
issue the SEND_CANCEL command to stop a migration. This is necessary so
that a cancelled migration can restart with a new target later.

Reviewed-by: Nathan Tempelman 
Reviewed-by: Brijesh Singh 
Signed-off-by: Steve Rutherford 
---
 .../virt/kvm/amd-memory-encryption.rst|  9 
 arch/x86/kvm/svm/sev.c| 23 +++
 drivers/crypto/ccp/sev-dev.c  |  1 +
 include/linux/psp-sev.h   | 10 
 include/uapi/linux/kvm.h  |  2 ++
 5 files changed, 45 insertions(+)

diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst 
b/Documentation/virt/kvm/amd-memory-encryption.rst
index 469a6308765b1..9e018a3eec03b 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -284,6 +284,15 @@ Returns: 0 on success, -negative on error
 __u32 len;
 };
 
+16. KVM_SEV_SEND_CANCEL
+
+
+After completion of SEND_START, but before SEND_FINISH, the source VMM can 
issue the
+SEND_CANCEL command to stop a migration. This is necessary so that a cancelled
+migration can restart with a new target later.
+
+Returns: 0 on success, -negative on error
+
 References
 ==
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 83e00e5245136..16d75b39e5e78 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1110,6 +1110,26 @@ static int sev_get_attestation_report(struct kvm *kvm, 
struct kvm_sev_cmd *argp)
return ret;
 }
 
+static int sev_send_cancel(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+   struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+   struct sev_data_send_cancel *data;
+   int ret;
+
+   if (!sev_guest(kvm))
+   return -ENOTTY;
+
+   data = kzalloc(sizeof(*data), GFP_KERNEL);
+   if (!data)
+   return -ENOMEM;
+
+   data->handle = sev->handle;
+   ret = sev_issue_cmd(kvm, SEV_CMD_SEND_CANCEL, data, &argp->error);
+
+   kfree(data);
+   return ret;
+}
+
 int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
struct kvm_sev_cmd sev_cmd;
@@ -1163,6 +1183,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
case KVM_SEV_GET_ATTESTATION_REPORT:
r = sev_get_attestation_report(kvm, &sev_cmd);
break;
+   case KVM_SEV_SEND_CANCEL:
+   r = sev_send_cancel(kvm, &sev_cmd);
+   break;
default:
r = -EINVAL;
goto out;
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index cb9b4c4e371ed..2c0a60120c785 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -129,6 +129,7 @@ static int sev_cmd_buffer_len(int cmd)
case SEV_CMD_DOWNLOAD_FIRMWARE: return sizeof(struct 
sev_data_download_firmware);
case SEV_CMD_GET_ID:return sizeof(struct 
sev_data_get_id);
case SEV_CMD_ATTESTATION_REPORT:return sizeof(struct 
sev_data_attestation_report);
+   case SEV_SEND_CANCEL:   return sizeof(struct 
sev_data_send_cancel);
default:return 0;
}
 
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index b801ead1e2bb5..74f2babffc574 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -73,6 +73,7 @@ enum sev_cmd {
SEV_CMD_SEND_UPDATE_DATA= 0x041,
SEV_CMD_SEND_UPDATE_VMSA= 0x042,
SEV_CMD_SEND_FINISH = 0x043,
+   SEV_CMD_SEND_CANCEL = 0x044,
 
/* Guest migration commands (incoming) */
SEV_CMD_RECEIVE_START   = 0x050,
@@ -392,6 +393,15 @@ struct sev_data_send_finish {
u32 handle; /* In */
 } __packed;
 
+/**
+ * struct sev_data_send_cancel - SEND_CANCEL command parameters
+ *
+ * @handle: handle of the VM to process
+ */
+struct sev_data_send_cancel {
+   u32 handle; /* In */
+} __packed;
+
 /**
  * struct sev_data_receive_start - RECEIVE_START command parameters
  *
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f6afee209620d..707469b6b7072 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1671,6 +1671,8 @@ enum sev_cmd_id {
KVM_SEV_CERT_EXPORT,
/* Attestation report */
KVM_SEV_GET_ATTESTATION_REPORT,
+   /* Guest Migration Extension */
+   KVM_SEV_SEND_CANCEL,
 
KVM_SEV_NR_MAX,
 };
-- 
2.31.1.295.g9ea45b61b8-goog



Re: [RFC v2] KVM: x86: Support KVM VMs sharing SEV context

2021-04-09 Thread Steve Rutherford
On Fri, Apr 9, 2021 at 1:14 AM Paolo Bonzini  wrote:
>
> On 09/04/21 03:18, James Bottomley wrote:
> > If you want to share ASIDs you have to share the firmware that the
> > running VM has been attested to.  Once the VM moves from LAUNCH to
> > RUNNING, the PSP won't allow the VMM to inject any more firmware or do
> > any more attestations.
>
> I think Steve is suggesting to just change the RIP of the mirror VM,
> which would work for SEV but not SEV-ES (the RAM migration helper won't
> *suffice* for SEV-ES, but perhaps you could use the PSP to migrate the
> VMSA and the migration helper for the rest?).
Exactly: you can use the existing PSP flows to migrate both the
migration helper itself and the necessary VMSAs.
>
> If you want to use a single firmware binary, SEC does almost no I/O
> accesses (the exception being the library constructor from
> SourceLevelDebugPkg's SecPeiDebugAgentLib), so you probably can:
>
> - detect the migration helper hardware in PlatformPei, either from
> fw_cfg or based on the lack of it
>
> - either divert execution to the migration helper through
> gEfiEndOfPeiSignalPpiGuid, or if it's too late add a new boot mode and
> PPI to DxeLoadCore.
>
> Paolo
>
> > What you mirror after this point can thus only
> > contain what has already been measured or what the guest added.  This
> > is why we think there has to be a new entry path into the VM for the
> > mirror vCPU.
>


Re: [PATCH] KVM: SVM: Add support for KVM_SEV_SEND_CANCEL command

2021-04-08 Thread Steve Rutherford
On Thu, Apr 8, 2021 at 3:27 PM Brijesh Singh  wrote:
>
>
> On 4/1/21 8:44 PM, Steve Rutherford wrote:
> > After completion of SEND_START, but before SEND_FINISH, the source VMM can
> > issue the SEND_CANCEL command to stop a migration. This is necessary so
> > that a cancelled migration can restart with a new target later.
> >
> > Signed-off-by: Steve Rutherford 
> > ---
> >  .../virt/kvm/amd-memory-encryption.rst|  9 +++
> >  arch/x86/kvm/svm/sev.c| 24 +++
> >  include/linux/psp-sev.h   | 10 
> >  include/uapi/linux/kvm.h  |  2 ++
> >  4 files changed, 45 insertions(+)
>
>
> Can we add a new case statement in sev_cmd_buffer_len()
> [drivers/crypto/ccp/sev-dev.c] for this command ? I understand that the
> command just contains the handle. I have found dyndbg very helpful. If
> the command is not added in the sev_cmd_buffer_len() then we don't dump
> the command buffer.
>
> With that fixed.
>
> Reviewed-by: Brijesh Singh 

Nice catch, will follow up shortly.


Steve


Re: [RFC v2] KVM: x86: Support KVM VMs sharing SEV context

2021-04-08 Thread Steve Rutherford
On Thu, Apr 8, 2021 at 2:15 PM James Bottomley  wrote:
>
> On Thu, 2021-04-08 at 12:48 -0700, Steve Rutherford wrote:
> > On Thu, Apr 8, 2021 at 10:43 AM James Bottomley 
> > wrote:
> > > On Fri, 2021-04-02 at 16:20 +0200, Paolo Bonzini wrote:
> > > > On 02/04/21 13:58, Ashish Kalra wrote:
> > > > > Hi Nathan,
> > > > >
> > > > > Will you be posting a corresponding Qemu patch for this ?
> > > >
> > > > Hi Ashish,
> > > >
> > > > as far as I know IBM is working on QEMU patches for guest-based
> > > > migration helpers.
> > >
> > > Yes, that's right, we'll take on this part.
> > >
> > > > However, it would be nice to collaborate on the low-level
> > > > (SEC/PEI) firmware patches to detect whether a CPU is part of the
> > > > primary VM or the mirror.  If Google has any OVMF patches already
> > > > done for that, it would be great to combine it with IBM's SEV
> > > > migration code and merge it into upstream OVMF.
> > >
> > > We've reached the stage with our prototyping where not having the
> > > OVMF support is blocking us from working on QEMU.  If we're going
> > > to have to reinvent the wheel in OVMF because Google is unwilling
> > > to publish the patches, can you at least give some hints about how
> > > you did it?
> > >
> > > Thanks,
> > >
> > > James
> >
> > Hey James,
> > It's not strictly necessary to modify OVMF to make SEV VMs live
> > migrate. If we were to modify OVMF, we would contribute those changes
> > upstream.
>
> Well, no, we already published an OVMF RFC to this list that does
> migration.  However, the mirror approach requires a different boot
> mechanism for the extra vCPU in the mirror.  I assume you're doing this
> bootstrap through OVMF so the hypervisor can interrogate it to get the
> correct entry point?  That's the code we're asking to see because
> that's what replaces our use of the MP service in the RFC.
>
> James

Hey James,
The intention would be to have a separate, stand-alone firmware-like
binary run by the mirror. Since the VMM is in control of where it
places that binary in the guest physical address space and the initial
configuration of the vCPUs, it can point the vCPUs at an entry point
contained within that binary, rather than at the standard x86 reset
vector.

Thanks,
Steve


Re: [RFC v2] KVM: x86: Support KVM VMs sharing SEV context

2021-04-08 Thread Steve Rutherford
On Thu, Apr 8, 2021 at 10:43 AM James Bottomley  wrote:
>
> On Fri, 2021-04-02 at 16:20 +0200, Paolo Bonzini wrote:
> > On 02/04/21 13:58, Ashish Kalra wrote:
> > > Hi Nathan,
> > >
> > > Will you be posting a corresponding Qemu patch for this ?
> >
> > Hi Ashish,
> >
> > as far as I know IBM is working on QEMU patches for guest-based
> > migration helpers.
>
> Yes, that's right, we'll take on this part.
>
> > However, it would be nice to collaborate on the low-level (SEC/PEI)
> > firmware patches to detect whether a CPU is part of the primary VM
> > or the mirror.  If Google has any OVMF patches already done for that,
> > it would be great to combine it with IBM's SEV migration code and
> > merge it into upstream OVMF.
>
> We've reached the stage with our prototyping where not having the OVMF
> support is blocking us from working on QEMU.  If we're going to have to
> reinvent the wheel in OVMF because Google is unwilling to publish the
> patches, can you at least give some hints about how you did it?
>
> Thanks,
>
> James

Hey James,
It's not strictly necessary to modify OVMF to make SEV VMs live
migrate. If we were to modify OVMF, we would contribute those changes
upstream.

Thanks,
Steve


Re: [PATCH v11 08/13] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall

2021-04-06 Thread Steve Rutherford
On Tue, Apr 6, 2021 at 9:08 AM Ashish Kalra  wrote:
>
> On Tue, Apr 06, 2021 at 03:48:20PM +, Sean Christopherson wrote:
> > On Mon, Apr 05, 2021, Ashish Kalra wrote:
> > > From: Ashish Kalra 
> >
> > ...
> >
> > > diff --git a/arch/x86/include/asm/kvm_host.h 
> > > b/arch/x86/include/asm/kvm_host.h
> > > index 3768819693e5..78284ebbbee7 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -1352,6 +1352,8 @@ struct kvm_x86_ops {
> > > int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
> > >
> > > void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
> > > +   int (*page_enc_status_hc)(struct kvm_vcpu *vcpu, unsigned long gpa,
> > > + unsigned long sz, unsigned long mode);
> > >  };
> > >
> > >  struct kvm_x86_nested_ops {
> > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > index c9795a22e502..fb3a315e5827 100644
> > > --- a/arch/x86/kvm/svm/sev.c
> > > +++ b/arch/x86/kvm/svm/sev.c
> > > @@ -1544,6 +1544,67 @@ static int sev_receive_finish(struct kvm *kvm, 
> > > struct kvm_sev_cmd *argp)
> > > return ret;
> > >  }
> > >
> > > +static int sev_complete_userspace_page_enc_status_hc(struct kvm_vcpu 
> > > *vcpu)
> > > +{
> > > +   vcpu->run->exit_reason = 0;
> > > +   kvm_rax_write(vcpu, vcpu->run->dma_sharing.ret);
> > > +   ++vcpu->stat.hypercalls;
> > > +   return kvm_skip_emulated_instruction(vcpu);
> > > +}
> > > +
> > > +int svm_page_enc_status_hc(struct kvm_vcpu *vcpu, unsigned long gpa,
> > > +  unsigned long npages, unsigned long enc)
> > > +{
> > > +   kvm_pfn_t pfn_start, pfn_end;
> > > +   struct kvm *kvm = vcpu->kvm;
> > > +   gfn_t gfn_start, gfn_end;
> > > +
> > > +   if (!sev_guest(kvm))
> > > +   return -EINVAL;
> > > +
> > > +   if (!npages)
> > > +   return 0;
> >
> > Part of me thinks passing a zero size should be an error, not a nop.  
> > Either way
> > works, just feels a bit weird to allow this to be a nop.
> >
> > > +
> > > +   gfn_start = gpa_to_gfn(gpa);
> >
> > This should check that @gpa is aligned.
> >
> > > +   gfn_end = gfn_start + npages;
> > > +
> > > +   /* out of bound access error check */
> > > +   if (gfn_end <= gfn_start)
> > > +   return -EINVAL;
> > > +
> > > +   /* lets make sure that gpa exist in our memslot */
> > > +   pfn_start = gfn_to_pfn(kvm, gfn_start);
> > > +   pfn_end = gfn_to_pfn(kvm, gfn_end);
> > > +
> > > +   if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
> > > +   /*
> > > +* Allow guest MMIO range(s) to be added
> > > +* to the shared pages list.
> > > +*/
> > > +   return -EINVAL;
> > > +   }
> > > +
> > > +   if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
> > > +   /*
> > > +* Allow guest MMIO range(s) to be added
> > > +* to the shared pages list.
> > > +*/
> > > +   return -EINVAL;
> > > +   }
> >
> > I don't think KVM should do any checks beyond gfn_end <= gfn_start.  Just 
> > punt
> > to userspace and give userspace full say over what is/isn't legal.
> >
> > > +
> > > +   if (enc)
> > > +   vcpu->run->exit_reason = KVM_EXIT_DMA_UNSHARE;
> > > +   else
> > > +   vcpu->run->exit_reason = KVM_EXIT_DMA_SHARE;
> >
> > Use a single exit and pass "enc" via kvm_run.  I also strongly dislike 
> > "DMA",
> > there's no guarantee the guest is sharing memory for DMA.
> >
> > I think we can usurp KVM_EXIT_HYPERCALL for this?  E.g.
> >
>
> I see the following in Documentation/virt/kvm/api.rst :
> ..
> ..
> /* KVM_EXIT_HYPERCALL */
> struct {
> __u64 nr;
> __u64 args[6];
> __u64 ret;
> __u32 longmode;
> __u32 pad;
> } hypercall;
>
> Unused.  This was once used for 'hypercall to userspace'.  To implement
> such functionality, use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO (all except s390).
>
> This mentions this exitcode to be unused and implementing this
> functionality using KVM_EXIT_IO for x86?

I suspect this description is historical. It was originally from 2009.
KVM_EXIT_IO is used for IO port reads/writes.

Personally, I would prefer to stay the course and use a name similar
to KVM_EXIT_DMA_SHARE, say KVM_EXIT_MEM_SHARE and
KVM_EXIT_MEM_UNSHARE. These just seem very clear, which I appreciate.
Reusing hypercall would work, but shoehorning this into
KVM_EXIT_HYPERCALL when we don't have generic hypercall exits feels a
bit off in my mind. Note: that preference isn't particularly strong.
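
For concreteness, a purely hypothetical kvm_run payload for such exits
(no such struct exists in this series; it just mirrors the dma_sharing
fields used in the current patches):

    /* Hypothetical KVM_EXIT_MEM_SHARE / KVM_EXIT_MEM_UNSHARE payload. */
    struct {
            __u64 gpa;      /* start of the range changing encryption status */
            __u64 len;      /* length of the range in bytes */
            __u64 ret;      /* filled in by userspace on completion */
    } mem_share;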

Steve
>
> Thanks,
> Ashish
>
> >   vcpu->run->exit_reason= KVM_EXIT_HYPERCALL;
> >   vcpu->run->hypercall.nr   = KVM_HC_PAGE_ENC_STATUS;
> >   vcpu->run->hypercall.args[0]  = gfn_start << PAGE_SHIFT;
> >   vcpu->run->hypercall.args[1]  = npages * PAGE_SIZE;
> >   

Re: [PATCH v11 10/13] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR.

2021-04-06 Thread Steve Rutherford
On Tue, Apr 6, 2021 at 7:00 AM Ashish Kalra  wrote:
>
> Hello Paolo,
>
> On Tue, Apr 06, 2021 at 03:47:59PM +0200, Paolo Bonzini wrote:
> > On 06/04/21 15:26, Ashish Kalra wrote:
> > > > It's a little unintuitive to see KVM_MSR_RET_FILTERED here, since
> > > > userspace can make this happen on its own without having an entry in
> > > > this switch statement (by setting it in the msr filter bitmaps). When
> > > > using MSR filters, I would only expect to get MSR filter exits for
> > > > MSRs I specifically asked for.
> > > >
> > > > Not a huge deal, just a little unintuitive. I'm not sure other options
> > > > are much better (you could put KVM_MSR_RET_INVALID, or you could just
> > > > not have these entries in svm_{get,set}_msr).
> > > >
> > > Actually KVM_MSR_RET_FILTERED seems more logical to use, especially in
> > > comparison with KVM_MSR_RET_INVALID.
> > >
> > > Also, hooking this msr in svm_{get,set}_msr allows some in-kernel error
> > > pre-processsing before doing the pass-through to userspace.
> >
> > I agree that it should be up to userspace to set up the filter since we now
> > have that functionality.
> >
>
> The userspace is still setting up the filter and handling this MSR, it
> is only some basic error pre-processing being done in-kernel here.
The bit that is unintuitive is that userspace will still get the
kvm_exit from an msr filter for KVM_MSR_RET_FILTERED even if they did
not add it to the filters. I don't think this is a huge deal:
userspace asked for it indirectly (through cpuid+sev enablement).
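
As a concrete sketch of that indirect opt-in, host userspace would
enable MSR exits and deflect the custom MSR with a filter, roughly as
below (an untested sketch; MSR_KVM_SEV_LIVE_MIGRATION is the value
proposed in this series):

    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    #define MSR_KVM_SEV_LIVE_MIGRATION 0x4b564d08

    static int deflect_live_migration_msr(int vm_fd)
    {
            /* Exit to userspace for filtered MSR accesses. */
            struct kvm_enable_cap cap = {
                    .cap = KVM_CAP_X86_USER_SPACE_MSR,
                    .args[0] = KVM_MSR_EXIT_REASON_FILTER,
            };
            struct kvm_msr_filter filter = {
                    .flags = KVM_MSR_FILTER_DEFAULT_ALLOW,
            };
            __u8 bitmap[1] = { 0 }; /* bit clear => deny => exit to userspace */

            if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
                    return -1;

            filter.ranges[0].flags = KVM_MSR_FILTER_READ | KVM_MSR_FILTER_WRITE;
            filter.ranges[0].base = MSR_KVM_SEV_LIVE_MIGRATION;
            filter.ranges[0].nmsrs = 1;
            filter.ranges[0].bitmap = bitmap;
            return ioctl(vm_fd, KVM_X86_SET_MSR_FILTER, &filter);
    }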
>
> Thanks,
> Ashish
>
> > Let me read the whole threads for the past versions to see what the
> > objections were...
> >
> > Paolo
> >


Re: [PATCH v11 00/13] Add AMD SEV guest live migration support

2021-04-05 Thread Steve Rutherford
On Mon, Apr 5, 2021 at 7:20 AM Ashish Kalra  wrote:
>
> From: Ashish Kalra 
>
> The series add support for AMD SEV guest live migration commands. To protect 
> the
> confidentiality of an SEV protected guest memory while in transit we need to
> use the SEV commands defined in SEV API spec [1].
>
> SEV guest VMs have the concept of private and shared memory. Private memory
> is encrypted with the guest-specific key, while shared memory may be encrypted
> with hypervisor key. The commands provided by the SEV FW are meant to be used
> for the private memory only. The patch series introduces a new hypercall.
> The guest OS can use this hypercall to notify the page encryption status.
> If the page is encrypted with guest specific-key then we use SEV command 
> during
> the migration. If page is not encrypted then fallback to default.
>
> The patch adds new KVM_EXIT_DMA_SHARE/KVM_EXIT_DMA_UNSHARE hypercall to
> userspace exit functionality as a common interface from the guest back to the
> VMM and passing on the guest shared/unencrypted page information to the
> userspace VMM/Qemu. Qemu can consult this information during migration to know
> whether the page is encrypted.
>
> This section describes how the SEV live migration feature is negotiated
> between the host and guest. The host indicates this feature support via
> KVM_FEATURE_CPUID. The guest firmware (OVMF) detects this feature and
> sets a UEFI environment variable indicating OVMF support for live
> migration. The guest kernel also detects the host support for this
> feature via cpuid and, in case of an EFI boot, verifies that OVMF also
> supports this feature by reading the UEFI environment variable; if it
> is set, the kernel enables the live migration feature on the host by
> writing to a custom MSR. If not booted under EFI, it simply enables the
> feature by writing to the custom MSR directly. The MSR is also handled
> by the userspace VMM/Qemu.
>
> A branch containing these patches is available here:
> https://github.com/AMDESE/linux/tree/sev-migration-v11
>
> [1] https://developer.amd.com/wp-content/resources/55766.PDF
>
> Changes since v10:
> - Adds new KVM_EXIT_DMA_SHARE/KVM_EXIT_DMA_UNSHARE hypercall to
>   userspace exit functionality as a common interface from the guest back to 
> the
>   KVM and passing on the guest shared/unencrypted region information to the
>   userspace VMM/Qemu. KVM/host kernel does not maintain the guest shared
>   memory regions information anymore.
> - Remove implicit enabling of SEV live migration feature for an SEV
>   guest, now this is explicitly in control of the userspace VMM/Qemu.
> - Custom MSR handling is also now moved into userspace VMM/Qemu.
> - As KVM does not maintain the guest shared memory region information
>   anymore, sev_dbg_crypt() cannot bypass unencrypted guest memory
>   regions without support from userspace VMM/Qemu.
>
> Changes since v9:
> - Transitioning from page encryption bitmap to the shared pages list
>   to keep track of guest's shared/unencrypted memory regions.
> - Move back to marking the complete _bss_decrypted section as
>   decrypted in the shared pages list.
> - Invoke a new function check_kvm_sev_migration() via kvm_init_platform()
>   for guest to query for host-side support for SEV live migration
>   and to enable the SEV live migration feature, to avoid
>   #ifdefs in code
> - Rename MSR_KVM_SEV_LIVE_MIG_EN to MSR_KVM_SEV_LIVE_MIGRATION.
> - Invoke a new function handle_unencrypted_region() from
>   sev_dbg_crypt() to bypass unencrypted guest memory regions.
>
> Changes since v8:
> - Rebasing to kvm next branch.
> - Fixed and added comments as per review feedback on v8 patches.
> - Removed implicitly enabling live migration for incoming VMs in
>   in KVM_SET_PAGE_ENC_BITMAP, it is now done via KVM_SET_MSR ioctl.
> - Adds support for bypassing unencrypted guest memory regions for
>   DBG_DECRYPT API calls, guest memory region encryption status in
>   sev_dbg_decrypt() is referenced using the page encryption bitmap.
>
> Changes since v7:
> - Removed the hypervisor specific hypercall/paravirt callback for
>   SEV live migration and moved back to calling kvm_sev_hypercall3
>   directly.
> - Fix build errors as
>   Reported-by: kbuild test robot , specifically fixed
>   build error when CONFIG_HYPERVISOR_GUEST=y and
>   CONFIG_AMD_MEM_ENCRYPT=n.
> - Implicitly enabled live migration for incoming VM(s) to handle
>   A->B->C->... VM migrations.
> - Fixed Documentation as per comments on v6 patches.
> - Fixed error return path in sev_send_update_data() as per comments
>   on v6 patches.
>
> Changes since v6:
> - Rebasing to mainline and refactoring to the new split SVM
>   infrastructre.
> - Move to static allocation of the unified Page Encryption bitmap
>   instead of the dynamic resizing of the bitmap, the static allocation
>   is done implicitly by extending kvm_arch_commit_memory_region() callack
>   to add svm specific x86_ops which can read the userspace provided memory
>   region/memslots and 

Re: [PATCH v11 10/13] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR.

2021-04-05 Thread Steve Rutherford
On Mon, Apr 5, 2021 at 7:30 AM Ashish Kalra  wrote:
>
> From: Ashish Kalra 
>
> Add new KVM_FEATURE_SEV_LIVE_MIGRATION feature for guest to check
> for host-side support for SEV live migration. Also add a new custom
> MSR_KVM_SEV_LIVE_MIGRATION for guest to enable the SEV live migration
> feature.
>
> MSR is handled by userspace using MSR filters.
>
> Signed-off-by: Ashish Kalra 
> ---
>  Documentation/virt/kvm/cpuid.rst |  5 +
>  Documentation/virt/kvm/msr.rst   | 12 
>  arch/x86/include/uapi/asm/kvm_para.h |  4 
>  arch/x86/kvm/cpuid.c |  3 ++-
>  arch/x86/kvm/svm/svm.c   | 22 ++
>  5 files changed, 45 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/virt/kvm/cpuid.rst 
> b/Documentation/virt/kvm/cpuid.rst
> index cf62162d4be2..0bdb6cdb12d3 100644
> --- a/Documentation/virt/kvm/cpuid.rst
> +++ b/Documentation/virt/kvm/cpuid.rst
> @@ -96,6 +96,11 @@ KVM_FEATURE_MSI_EXT_DEST_ID15  guest 
> checks this feature bit
> before using extended 
> destination
> ID bits in MSI address bits 
> 11-5.
>
> +KVM_FEATURE_SEV_LIVE_MIGRATION 16  guest checks this feature bit 
> before
> +   using the page encryption 
> state
> +   hypercall to notify the page 
> state
> +   change
> +
>  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24  host will warn if no 
> guest-side
> per-cpu warps are expected in
> kvmclock
> diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
> index e37a14c323d2..020245d16087 100644
> --- a/Documentation/virt/kvm/msr.rst
> +++ b/Documentation/virt/kvm/msr.rst
> @@ -376,3 +376,15 @@ data:
> write '1' to bit 0 of the MSR, this causes the host to re-scan its 
> queue
> and check if there are more notifications pending. The MSR is 
> available
> if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
> +
> +MSR_KVM_SEV_LIVE_MIGRATION:
> +0x4b564d08
> +
> +   Control SEV Live Migration features.
> +
> +data:
> +Bit 0 enables (1) or disables (0) host-side SEV Live Migration 
> feature,
> +in other words, this is guest->host communication that it's properly
> +handling the shared pages list.
> +
> +All other bits are reserved.
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
> b/arch/x86/include/uapi/asm/kvm_para.h
> index 950afebfba88..f6bfa138874f 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -33,6 +33,7 @@
>  #define KVM_FEATURE_PV_SCHED_YIELD 13
>  #define KVM_FEATURE_ASYNC_PF_INT   14
>  #define KVM_FEATURE_MSI_EXT_DEST_ID15
> +#define KVM_FEATURE_SEV_LIVE_MIGRATION 16
>
>  #define KVM_HINTS_REALTIME  0
>
> @@ -54,6 +55,7 @@
>  #define MSR_KVM_POLL_CONTROL   0x4b564d05
>  #define MSR_KVM_ASYNC_PF_INT   0x4b564d06
>  #define MSR_KVM_ASYNC_PF_ACK   0x4b564d07
> +#define MSR_KVM_SEV_LIVE_MIGRATION 0x4b564d08
>
>  struct kvm_steal_time {
> __u64 steal;
> @@ -136,4 +138,6 @@ struct kvm_vcpu_pv_apf_data {
>  #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
>  #define KVM_PV_EOI_DISABLED 0x0
>
> +#define KVM_SEV_LIVE_MIGRATION_ENABLED BIT_ULL(0)
> +
>  #endif /* _UAPI_ASM_X86_KVM_PARA_H */
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 6bd2f8b830e4..4e2e69a692aa 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -812,7 +812,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array 
> *array, u32 function)
>  (1 << KVM_FEATURE_PV_SEND_IPI) |
>  (1 << KVM_FEATURE_POLL_CONTROL) |
>  (1 << KVM_FEATURE_PV_SCHED_YIELD) |
> -(1 << KVM_FEATURE_ASYNC_PF_INT);
> +(1 << KVM_FEATURE_ASYNC_PF_INT) |
> +(1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
>
> if (sched_info_on())
> entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 3cbf000beff1..1ac79e2f2a6c 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2800,6 +2800,17 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct 
> msr_data *msr_info)
> case MSR_F10H_DECFG:
> msr_info->data = svm->msr_decfg;
> break;
> +   case MSR_KVM_SEV_LIVE_MIGRATION:
> +   if (!sev_guest(vcpu->kvm))
> +   return 1;
> +
> +   if (!guest_cpuid_has(vcpu, KVM_FEATURE_SEV_LIVE_MIGRATION))
> +   return 1;
> +
> +   /*
> +* Let userspace 

Re: [PATCH v11 08/13] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall

2021-04-05 Thread Steve Rutherford
I don't believe you need to clear exit_reason: it's universally set on exit.

> +   kvm_rax_write(vcpu, vcpu->run->dma_sharing.ret);
> +   ++vcpu->stat.hypercalls;
> +   return kvm_skip_emulated_instruction(vcpu);
> +}
> +
> +int svm_page_enc_status_hc(struct kvm_vcpu *vcpu, unsigned long gpa,
> +  unsigned long npages, unsigned long enc)
> +{
> +   kvm_pfn_t pfn_start, pfn_end;
> +   struct kvm *kvm = vcpu->kvm;
> +   gfn_t gfn_start, gfn_end;
> +
> +   if (!sev_guest(kvm))
> +   return -EINVAL;
> +
> +   if (!npages)
> +   return 0;
> +
> +   gfn_start = gpa_to_gfn(gpa);
> +   gfn_end = gfn_start + npages;
> +
> +   /* out of bound access error check */
> +   if (gfn_end <= gfn_start)
> +   return -EINVAL;
> +
> +   /* lets make sure that gpa exist in our memslot */
> +   pfn_start = gfn_to_pfn(kvm, gfn_start);
> +   pfn_end = gfn_to_pfn(kvm, gfn_end);
> +
> +   if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
> +   /*
> +* Allow guest MMIO range(s) to be added
> +* to the shared pages list.
> +*/
> +   return -EINVAL;
> +   }
> +
> +   if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
> +   /*
> +* Allow guest MMIO range(s) to be added
> +* to the shared pages list.
> +*/
> +   return -EINVAL;
> +   }
> +
> +   if (enc)
> +   vcpu->run->exit_reason = KVM_EXIT_DMA_UNSHARE;
> +   else
> +   vcpu->run->exit_reason = KVM_EXIT_DMA_SHARE;
> +
> +   vcpu->run->dma_sharing.addr = gfn_start;
> +   vcpu->run->dma_sharing.len = npages * PAGE_SIZE;
> +   vcpu->arch.complete_userspace_io =
> +   sev_complete_userspace_page_enc_status_hc;
> +
> +   return 0;
> +}
> +
>  int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  {
> struct kvm_sev_cmd sev_cmd;
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 58a45bb139f8..3cbf000beff1 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4620,6 +4620,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
> .complete_emulated_msr = svm_complete_emulated_msr,
>
> .vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
> +
> +   .page_enc_status_hc = svm_page_enc_status_hc,
>  };
>
>  static struct kvm_x86_init_ops svm_init_ops __initdata = {
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 39e071fdab0c..9cc16d2c0b8f 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -451,6 +451,8 @@ int nested_svm_check_exception(struct vcpu_svm *svm, 
> unsigned nr,
>bool has_error_code, u32 error_code);
>  int nested_svm_exit_special(struct vcpu_svm *svm);
>  void sync_nested_vmcb_control(struct vcpu_svm *svm);
> +int svm_page_enc_status_hc(struct kvm_vcpu *vcpu, unsigned long gpa,
> +  unsigned long npages, unsigned long enc);
>
>  extern struct kvm_x86_nested_ops svm_nested_ops;
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 32cf8287d4a7..2c98a5ed554b 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7748,6 +7748,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
> .can_emulate_instruction = vmx_can_emulate_instruction,
> .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> .migrate_timers = vmx_migrate_timers,
> +   .page_enc_status_hc = NULL,
>
> .msr_filter_changed = vmx_msr_filter_changed,
> .complete_emulated_msr = kvm_complete_insn_gp,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f7d12fca397b..ef5c77d59651 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8273,6 +8273,18 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> kvm_sched_yield(vcpu->kvm, a0);
> ret = 0;
> break;
> +   case KVM_HC_PAGE_ENC_STATUS: {
> +   int r;
> +
> +   ret = -KVM_ENOSYS;
> +   if (kvm_x86_ops.page_enc_status_hc) {
> +   r = kvm_x86_ops.page_enc_status_hc(vcpu, a0, a1, a2);
> +   if (r >= 0)
> +   return r;
> +   ret = r;
Style nit: Why not just set ret, and return ret if ret >=0?
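
For illustration, the restructure being suggested would read roughly
(untested sketch):

    case KVM_HC_PAGE_ENC_STATUS:
            ret = -KVM_ENOSYS;
            if (kvm_x86_ops.page_enc_status_hc)
                    ret = kvm_x86_ops.page_enc_status_hc(vcpu, a0, a1, a2);
            if (ret >= 0)
                    return ret;
            break;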

This looks good. I just had a few nitpicks.
Reviewed-by: Steve Rutherford 


Re: [PATCH v11 00/13] Add AMD SEV guest live migration support

2021-04-05 Thread Steve Rutherford
On Mon, Apr 5, 2021 at 8:17 AM Peter Gonda  wrote:
>
> Could this patch set include support for the SEND_CANCEL command?
>
That's separate from this patchset. I sent up an implementation last week.


[PATCH] KVM: SVM: Add support for KVM_SEV_SEND_CANCEL command

2021-04-01 Thread Steve Rutherford
After completion of SEND_START, but before SEND_FINISH, the source VMM can
issue the SEND_CANCEL command to stop a migration. This is necessary so
that a cancelled migration can restart with a new target later.

Signed-off-by: Steve Rutherford 
---
 .../virt/kvm/amd-memory-encryption.rst|  9 +++
 arch/x86/kvm/svm/sev.c| 24 +++
 include/linux/psp-sev.h   | 10 
 include/uapi/linux/kvm.h  |  2 ++
 4 files changed, 45 insertions(+)

diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst 
b/Documentation/virt/kvm/amd-memory-encryption.rst
index 469a6308765b1..9e018a3eec03b 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -284,6 +284,15 @@ Returns: 0 on success, -negative on error
 __u32 len;
 };
 
+16. KVM_SEV_SEND_CANCEL
+
+
+After completion of SEND_START, but before SEND_FINISH, the source VMM can 
issue the
+SEND_CANCEL command to stop a migration. This is necessary so that a cancelled
+migration can restart with a new target later.
+
+Returns: 0 on success, -negative on error
+
 References
 ==
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 83e00e5245136..88e72102cb900 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1110,6 +1110,27 @@ static int sev_get_attestation_report(struct kvm *kvm, 
struct kvm_sev_cmd *argp)
return ret;
 }
 
+static int sev_send_cancel(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+   struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+   struct sev_data_send_cancel *data;
+   int ret;
+
+   if (!sev_guest(kvm))
+   return -ENOTTY;
+
+   data = kzalloc(sizeof(*data), GFP_KERNEL);
+   if (!data)
+   return -ENOMEM;
+
+   data->handle = sev->handle;
+   ret = sev_issue_cmd(kvm, SEV_CMD_SEND_CANCEL, data, &argp->error);
+
+   kfree(data);
+   return ret;
+}
+
+
 int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
struct kvm_sev_cmd sev_cmd;
@@ -1163,6 +1184,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
case KVM_SEV_GET_ATTESTATION_REPORT:
r = sev_get_attestation_report(kvm, &sev_cmd);
break;
+   case KVM_SEV_SEND_CANCEL:
r = sev_send_cancel(kvm, &sev_cmd);
+   break;
default:
r = -EINVAL;
goto out;
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index b801ead1e2bb5..74f2babffc574 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -73,6 +73,7 @@ enum sev_cmd {
SEV_CMD_SEND_UPDATE_DATA= 0x041,
SEV_CMD_SEND_UPDATE_VMSA= 0x042,
SEV_CMD_SEND_FINISH = 0x043,
+   SEV_CMD_SEND_CANCEL = 0x044,
 
/* Guest migration commands (incoming) */
SEV_CMD_RECEIVE_START   = 0x050,
@@ -392,6 +393,15 @@ struct sev_data_send_finish {
u32 handle; /* In */
 } __packed;
 
+/**
+ * struct sev_data_send_cancel - SEND_CANCEL command parameters
+ *
+ * @handle: handle of the VM to process
+ */
+struct sev_data_send_cancel {
+   u32 handle; /* In */
+} __packed;
+
 /**
  * struct sev_data_receive_start - RECEIVE_START command parameters
  *
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f6afee209620d..707469b6b7072 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1671,6 +1671,8 @@ enum sev_cmd_id {
KVM_SEV_CERT_EXPORT,
/* Attestation report */
KVM_SEV_GET_ATTESTATION_REPORT,
+   /* Guest Migration Extension */
+   KVM_SEV_SEND_CANCEL,
 
KVM_SEV_NR_MAX,
 };
-- 
2.31.0.208.g409f899ff0-goog
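
For illustration, a minimal userspace caller for the new command could
look like the sketch below (not part of the patch; assumes an SEV guest
has already been initialized on vm_fd):

    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    /* Sketch: abort an in-flight migration after SEND_START but before
     * SEND_FINISH, so a new SEND_START can target a different host. */
    static int sev_cancel_migration(int vm_fd)
    {
            struct kvm_sev_cmd cmd = {
                    .id = KVM_SEV_SEND_CANCEL,
            };

            return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
    }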



Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-04-01 Thread Steve Rutherford
On Fri, Mar 19, 2021 at 11:00 AM Ashish Kalra  wrote:
>
> On Thu, Mar 11, 2021 at 12:48:07PM -0800, Steve Rutherford wrote:
> > On Thu, Mar 11, 2021 at 10:15 AM Ashish Kalra  wrote:
> > >
> > > On Wed, Mar 03, 2021 at 06:54:41PM +, Will Deacon wrote:
> > > > [+Marc]
> > > >
> > > > On Tue, Mar 02, 2021 at 02:55:43PM +, Ashish Kalra wrote:
> > > > > On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > > > > > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > > > > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > > > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra 
> > > > > > > >  wrote:
> > > > > > > > Thanks for grabbing the data!
> > > > > > > >
> > > > > > > > I am fine with both paths. Sean has stated an explicit desire 
> > > > > > > > for
> > > > > > > > hypercall exiting, so I think that would be the current 
> > > > > > > > consensus.
> > > > > >
> > > > > > Yep, though it'd be good to get Paolo's input, too.
> > > > > >
> > > > > > > > If we want to do hypercall exiting, this should be in a 
> > > > > > > > follow-up
> > > > > > > > series where we implement something more generic, e.g. a 
> > > > > > > > hypercall
> > > > > > > > exiting bitmap or hypercall exit list. If we are taking the 
> > > > > > > > hypercall
> > > > > > > > exit route, we can drop the kvm side of the hypercall.
> > > > > >
> > > > > > I don't think this is a good candidate for arbitrary hypercall 
> > > > > > interception.  Or
> > > > > > rather, I think hypercall interception should be an orthogonal 
> > > > > > implementation.
> > > > > >
> > > > > > The guest, including guest firmware, needs to be aware that the 
> > > > > > hypercall is
> > > > > > supported, and the ABI needs to be well-defined.  Relying on 
> > > > > > userspace VMMs to
> > > > > > implement a common ABI is an unnecessary risk.
> > > > > >
> > > > > > We could make KVM's default behavior be a nop, i.e. have KVM 
> > > > > > enforce the ABI but
> > > > > > require further VMM intervention.  But, I just don't see the point, 
> > > > > > it would
> > > > > > save only a few lines of code.  It would also limit what KVM could 
> > > > > > do in the
> > > > > > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to 
> > > > > > userspace,
> > > > > > then mandatory interception would essentially make it impossible 
> > > > > > for KVM to do
> > > > > > bookkeeping while still honoring the interception request.
> > > > > >
> > > > > > However, I do think it would make sense to have the userspace exit 
> > > > > > be a generic
> > > > > > exit type.  But hey, we already have the necessary ABI defined for 
> > > > > > that!  It's
> > > > > > just not used anywhere.
> > > > > >
> > > > > >   /* KVM_EXIT_HYPERCALL */
> > > > > >   struct {
> > > > > >   __u64 nr;
> > > > > >   __u64 args[6];
> > > > > >   __u64 ret;
> > > > > >   __u32 longmode;
> > > > > >   __u32 pad;
> > > > > >   } hypercall;
> > > > > >
> > > > > >
> > > > > > > > Userspace could also handle the MSR using MSR filters (would 
> > > > > > > > need to
> > > > > > > > confirm that).  Then userspace could also be in control of the 
> > > > > > > > cpuid bit.
> > > > > >
> > > > > > An MSR is not a great fit; it's x86 specific and limited to 64 bits 
> > > > > > of data.
> > > > > > The data limitation could be fudged by shoving data into 
> > > > > > non-standard GPRs, but
> > > > > > that will result in truly heinous guest code, and extensibility

Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-03-11 Thread Steve Rutherford
On Thu, Mar 11, 2021 at 10:15 AM Ashish Kalra  wrote:
>
> On Wed, Mar 03, 2021 at 06:54:41PM +, Will Deacon wrote:
> > [+Marc]
> >
> > On Tue, Mar 02, 2021 at 02:55:43PM +, Ashish Kalra wrote:
> > > On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > > > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra 
> > > > > >  wrote:
> > > > > > Thanks for grabbing the data!
> > > > > >
> > > > > > I am fine with both paths. Sean has stated an explicit desire for
> > > > > > hypercall exiting, so I think that would be the current consensus.
> > > >
> > > > Yep, though it'd be good to get Paolo's input, too.
> > > >
> > > > > > If we want to do hypercall exiting, this should be in a follow-up
> > > > > > series where we implement something more generic, e.g. a hypercall
> > > > > > exiting bitmap or hypercall exit list. If we are taking the 
> > > > > > hypercall
> > > > > > exit route, we can drop the kvm side of the hypercall.
> > > >
> > > > I don't think this is a good candidate for arbitrary hypercall 
> > > > interception.  Or
> > > > rather, I think hypercall interception should be an orthogonal 
> > > > implementation.
> > > >
> > > > The guest, including guest firmware, needs to be aware that the 
> > > > hypercall is
> > > > supported, and the ABI needs to be well-defined.  Relying on userspace 
> > > > VMMs to
> > > > implement a common ABI is an unnecessary risk.
> > > >
> > > > We could make KVM's default behavior be a nop, i.e. have KVM enforce 
> > > > the ABI but
> > > > require further VMM intervention.  But, I just don't see the point, it 
> > > > would
> > > > save only a few lines of code.  It would also limit what KVM could do 
> > > > in the
> > > > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to 
> > > > userspace,
> > > > then mandatory interception would essentially make it impossible for 
> > > > KVM to do
> > > > bookkeeping while still honoring the interception request.
> > > >
> > > > However, I do think it would make sense to have the userspace exit be a 
> > > > generic
> > > > exit type.  But hey, we already have the necessary ABI defined for 
> > > > that!  It's
> > > > just not used anywhere.
> > > >
> > > >   /* KVM_EXIT_HYPERCALL */
> > > >   struct {
> > > >   __u64 nr;
> > > >   __u64 args[6];
> > > >   __u64 ret;
> > > >   __u32 longmode;
> > > >   __u32 pad;
> > > >   } hypercall;
> > > >
> > > >
> > > > > > Userspace could also handle the MSR using MSR filters (would need to
> > > > > > confirm that).  Then userspace could also be in control of the 
> > > > > > cpuid bit.
> > > >
> > > > An MSR is not a great fit; it's x86 specific and limited to 64 bits of 
> > > > data.
> > > > The data limitation could be fudged by shoving data into non-standard 
> > > > GPRs, but
> > > > that will result in truly heinous guest code, and extensibility issues.
> > > >
> > > > The data limitation is a moot point, because the x86-only thing is a 
> > > > deal
> > > > breaker.  arm64's pKVM work has a near-identical use case for a guest 
> > > > to share
> > > > memory with a host.  I can't think of a clever way to avoid having to 
> > > > support
> > > > TDX's and SNP's hypervisor-agnostic variants, but we can at least not 
> > > > have
> > > > multiple KVM variants.
> > >
> > > Looking at arm64's pKVM work, I see that it is a recently introduced RFC
> > > patch-set and probably relevant to the arm64 nVHE hypervisor
> > > mode/implementation, and potentially makes sense as it adds guest
> > > memory protection as both host and guest kernels are running on the same
> > > privilege level?
> > >
> > > Though I do see that the pKVM stuff adds two hyp

Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-03-09 Thread Steve Rutherford
On Tue, Mar 9, 2021 at 7:42 PM Kalra, Ashish  wrote:
>
>
>
> > On Mar 9, 2021, at 3:22 AM, Steve Rutherford  wrote:
> >
> > On Mon, Mar 8, 2021 at 1:11 PM Brijesh Singh  wrote:
> >>
> >>
> >>> On 3/8/21 1:51 PM, Sean Christopherson wrote:
> >>> On Mon, Mar 08, 2021, Ashish Kalra wrote:
> >>>> On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> >>>>> +Will and Quentin (arm64)
> >>>>>
> >>>>> Moving the non-KVM x86 folks to bcc, I don't think they care about KVM 
> >>>>> details at this
> >>>>> point.
> >>>>>
> >>>>> On Fri, Feb 26, 2021, Ashish Kalra wrote:
> >>>>>> On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> >>>>>>> On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra  
> >>>>>>> wrote:
> >>>>>>> Thanks for grabbing the data!
> >>>>>>>
> >>>>>>> I am fine with both paths. Sean has stated an explicit desire for
> >>>>>>> hypercall exiting, so I think that would be the current consensus.
> >>>>> Yep, though it'd be good to get Paolo's input, too.
> >>>>>
> >>>>>>> If we want to do hypercall exiting, this should be in a follow-up
> >>>>>>> series where we implement something more generic, e.g. a hypercall
> >>>>>>> exiting bitmap or hypercall exit list. If we are taking the hypercall
> >>>>>>> exit route, we can drop the kvm side of the hypercall.
> >>>>> I don't think this is a good candidate for arbitrary hypercall 
> >>>>> interception.  Or
> >>>>> rather, I think hypercall interception should be an orthogonal 
> >>>>> implementation.
> >>>>>
> >>>>> The guest, including guest firmware, needs to be aware that the 
> >>>>> hypercall is
> >>>>> supported, and the ABI needs to be well-defined.  Relying on userspace 
> >>>>> VMMs to
> >>>>> implement a common ABI is an unnecessary risk.
> >>>>>
> >>>>> We could make KVM's default behavior be a nop, i.e. have KVM enforce 
> >>>>> the ABI but
> >>>>> require further VMM intervention.  But, I just don't see the point, it 
> >>>>> would
> >>>>> save only a few lines of code.  It would also limit what KVM could do 
> >>>>> in the
> >>>>> future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to 
> >>>>> userspace,
> >>>>> then mandatory interception would essentially make it impossible for 
> >>>>> KVM to do
> >>>>> bookkeeping while still honoring the interception request.
> >>>>>
> >>>>> However, I do think it would make sense to have the userspace exit be a 
> >>>>> generic
> >>>>> exit type.  But hey, we already have the necessary ABI defined for 
> >>>>> that!  It's
> >>>>> just not used anywhere.
> >>>>>
> >>>>>/* KVM_EXIT_HYPERCALL */
> >>>>>struct {
> >>>>>__u64 nr;
> >>>>>__u64 args[6];
> >>>>>__u64 ret;
> >>>>>__u32 longmode;
> >>>>>__u32 pad;
> >>>>>} hypercall;
> >>>>>
> >>>>>
> >>>>>>> Userspace could also handle the MSR using MSR filters (would need to
> >>>>>>> confirm that).  Then userspace could also be in control of the cpuid 
> >>>>>>> bit.
> >>>>> An MSR is not a great fit; it's x86 specific and limited to 64 bits of 
> >>>>> data.
> >>>>> The data limitation could be fudged by shoving data into non-standard 
> >>>>> GPRs, but
> >>>>> that will result in truly heinous guest code, and extensibility issues.
> >>>>>
> >>>>> The data limitation is a moot point, because the x86-only thing is a 
> >>>>> deal
> >>>>> breaker.  arm64's pKVM work has a near-identical use case for a guest 
> >>>>> to share
> >>>>> memory with a host.  I can't think of a clever way to avoid having to 
> 

Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-03-08 Thread Steve Rutherford
On Mon, Mar 8, 2021 at 1:11 PM Brijesh Singh  wrote:
>
>
> On 3/8/21 1:51 PM, Sean Christopherson wrote:
> > On Mon, Mar 08, 2021, Ashish Kalra wrote:
> >> On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> >>> +Will and Quentin (arm64)
> >>>
> >>> Moving the non-KVM x86 folks to bcc, I don't think they care about KVM details 
> >>> at this
> >>> point.
> >>>
> >>> On Fri, Feb 26, 2021, Ashish Kalra wrote:
> >>>> On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> >>>>> On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra  
> >>>>> wrote:
> >>>>> Thanks for grabbing the data!
> >>>>>
> >>>>> I am fine with both paths. Sean has stated an explicit desire for
> >>>>> hypercall exiting, so I think that would be the current consensus.
> >>> Yep, though it'd be good to get Paolo's input, too.
> >>>
> >>>>> If we want to do hypercall exiting, this should be in a follow-up
> >>>>> series where we implement something more generic, e.g. a hypercall
> >>>>> exiting bitmap or hypercall exit list. If we are taking the hypercall
> >>>>> exit route, we can drop the kvm side of the hypercall.
> >>> I don't think this is a good candidate for arbitrary hypercall 
> >>> interception.  Or
> >>> rather, I think hypercall interception should be an orthogonal 
> >>> implementation.
> >>>
> >>> The guest, including guest firmware, needs to be aware that the hypercall 
> >>> is
> >>> supported, and the ABI needs to be well-defined.  Relying on userspace 
> >>> VMMs to
> >>> implement a common ABI is an unnecessary risk.
> >>>
> >>> We could make KVM's default behavior be a nop, i.e. have KVM enforce the 
> >>> ABI but
> >>> require further VMM intervention.  But, I just don't see the point, it 
> >>> would
> >>> save only a few lines of code.  It would also limit what KVM could do in 
> >>> the
> >>> future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to 
> >>> userspace,
> >>> then mandatory interception would essentially make it impossible for KVM 
> >>> to do
> >>> bookkeeping while still honoring the interception request.
> >>>
> >>> However, I do think it would make sense to have the userspace exit be a 
> >>> generic
> >>> exit type.  But hey, we already have the necessary ABI defined for that!  
> >>> It's
> >>> just not used anywhere.
> >>>
> >>> /* KVM_EXIT_HYPERCALL */
> >>> struct {
> >>> __u64 nr;
> >>> __u64 args[6];
> >>> __u64 ret;
> >>> __u32 longmode;
> >>> __u32 pad;
> >>> } hypercall;
> >>>
> >>>
> >>>>> Userspace could also handle the MSR using MSR filters (would need to
> >>>>> confirm that).  Then userspace could also be in control of the cpuid 
> >>>>> bit.
> >>> An MSR is not a great fit; it's x86 specific and limited to 64 bits of 
> >>> data.
> >>> The data limitation could be fudged by shoving data into non-standard 
> >>> GPRs, but
> >>> that will result in truly heinous guest code, and extensibility issues.
> >>>
> >>> The data limitation is a moot point, because the x86-only thing is a deal
> >>> breaker.  arm64's pKVM work has a near-identical use case for a guest to 
> >>> share
> >>> memory with a host.  I can't think of a clever way to avoid having to 
> >>> support
> >>> TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> >>> multiple KVM variants.
> >>>
> >> Potentially, there is another reason for in-kernel hypercall handling
> >> considering SEV-SNP. In case of SEV-SNP the RMP table tracks the state
> >> of each guest page, for instance pages in hypervisor state, i.e., pages
> >> with C=0 and pages in guest valid state with C=1.
> >>
> >> Now, there shouldn't be a need for page encryption status hypercalls on
> >> SEV-SNP as KVM can track & reference guest page status directly using
> >> the RMP table.
> > Relying on the RMP table itself would require locking the RMP table

Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-03-08 Thread Steve Rutherford
On Mon, Mar 8, 2021 at 11:52 AM Sean Christopherson  wrote:
>
> On Mon, Mar 08, 2021, Ashish Kalra wrote:
> > On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > > +Will and Quentin (arm64)
> > >
> > > Moving the non-KVM x86 folks to bcc, I don't think they care about KVM details 
> > > at this
> > > point.
> > >
> > > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra  
> > > > > wrote:
> > > > > Thanks for grabbing the data!
> > > > >
> > > > > I am fine with both paths. Sean has stated an explicit desire for
> > > > > hypercall exiting, so I think that would be the current consensus.
> > >
> > > Yep, though it'd be good to get Paolo's input, too.
> > >
> > > > > If we want to do hypercall exiting, this should be in a follow-up
> > > > > series where we implement something more generic, e.g. a hypercall
> > > > > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > > > > exit route, we can drop the kvm side of the hypercall.
> > >
> > > I don't think this is a good candidate for arbitrary hypercall 
> > > interception.  Or
> > > rather, I think hypercall interception should be an orthogonal 
> > > implementation.
> > >
> > > The guest, including guest firmware, needs to be aware that the hypercall 
> > > is
> > > supported, and the ABI needs to be well-defined.  Relying on userspace 
> > > VMMs to
> > > implement a common ABI is an unnecessary risk.
> > >
> > > We could make KVM's default behavior be a nop, i.e. have KVM enforce the 
> > > ABI but
> > > require further VMM intervention.  But, I just don't see the point, it 
> > > would
> > > save only a few lines of code.  It would also limit what KVM could do in 
> > > the
> > > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to 
> > > userspace,
> > > then mandatory interception would essentially make it impossible for KVM 
> > > to do
> > > bookkeeping while still honoring the interception request.
> > >
> > > However, I do think it would make sense to have the userspace exit be a 
> > > generic
> > > exit type.  But hey, we already have the necessary ABI defined for that!  
> > > It's
> > > just not used anywhere.
> > >
> > > /* KVM_EXIT_HYPERCALL */
> > > struct {
> > > __u64 nr;
> > > __u64 args[6];
> > > __u64 ret;
> > > __u32 longmode;
> > > __u32 pad;
> > > } hypercall;
> > >
> > >
> > > > > Userspace could also handle the MSR using MSR filters (would need to
> > > > > confirm that).  Then userspace could also be in control of the cpuid 
> > > > > bit.
> > >
> > > An MSR is not a great fit; it's x86 specific and limited to 64 bits of 
> > > data.
> > > The data limitation could be fudged by shoving data into non-standard 
> > > GPRs, but
> > > that will result in truly heinous guest code, and extensibility issues.
> > >
> > > The data limitation is a moot point, because the x86-only thing is a deal
> > > breaker.  arm64's pKVM work has a near-identical use case for a guest to 
> > > share
> > > memory with a host.  I can't think of a clever way to avoid having to 
> > > support
> > > TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
> > > multiple KVM variants.
> > >
> >
> > Potentially, there is another reason for in-kernel hypercall handling
> > considering SEV-SNP. In case of SEV-SNP the RMP table tracks the state
> > of each guest page, for instance pages in hypervisor state, i.e., pages
> > with C=0 and pages in guest valid state with C=1.
> >
> > Now, there shouldn't be a need for page encryption status hypercalls on
> > SEV-SNP as KVM can track & reference guest page status directly using
> > the RMP table.
>
> Relying on the RMP table itself would require locking the RMP table for an
> extended duration, and walking the entire RMP to find shared pages would be
> very inefficient.
>
> > As KVM maintains the RMP table, therefore we will need SET/GET type of
> > interfaces to provide the

Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-02-25 Thread Steve Rutherford
On Thu, Feb 25, 2021 at 2:59 PM Steve Rutherford  wrote:
>
> On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra  wrote:
> >
> > On Wed, Feb 24, 2021 at 10:22:33AM -0800, Sean Christopherson wrote:
> > > On Wed, Feb 24, 2021, Ashish Kalra wrote:
> > > > # Samples: 19K of event 'kvm:kvm_hypercall'
> > > > # Event count (approx.): 19573
> > > > #
> > > > # Overhead  Command          Shared Object     Symbol
> > > > # ........  ...............  ................  .........................
> > > > #
> > > >    100.00%  qemu-system-x86  [kernel.vmlinux]  [k] kvm_emulate_hypercall
> > > >
> > > > Out of these 19573 hypercalls, # of page encryption status hcalls are 
> > > > 19479,
> > > > so almost all hypercalls here are page encryption status hypercalls.
> > >
> > > Oof.
> > >
> > > > The above data indicates that there will be ~2% more Heavyweight VMEXITs
> > > > during SEV guest boot if we do page encryption status hypercalls
> > > > pass-through to host userspace.
> > > >
> > > > But, then Brijesh pointed out to me and highlighted that currently
> > > > OVMF is doing a lot of VMEXITs because they don't use the DMA pool to 
> > > > minimize the C-bit toggles,
> > > > in other words, OVMF bounce buffer does page state change on every DMA 
> > > > allocate and free.
> > > >
> > > > So here is the performance analysis after kernel and initrd have been
> > > > loaded into memory using grub and then starting perf just before 
> > > > booting the kernel.
> > > >
> > > > These are the performance #'s after kernel and initrd have been loaded 
> > > > into memory,
> > > > then perf is attached and kernel is booted :
> > > >
> > > > # Samples: 1M of event 'kvm:kvm_userspace_exit'
> > > > # Event count (approx.): 1081235
> > > > #
> > > > # Overhead  Trace output
> > > > # ........  ............
> > > > #
> > > > 99.77%  reason KVM_EXIT_IO (2)
> > > >  0.23%  reason KVM_EXIT_MMIO (6)
> > > >
> > > > # Samples: 1K of event 'kvm:kvm_hypercall'
> > > > # Event count (approx.): 1279
> > > > #
> > > >
> > > > So as the above data indicates, Linux is only making ~1K hypercalls,
> > > > compared to ~18K hypercalls made by OVMF in the above use case.
> > > >
> > > > Does the above adds a prerequisite that OVMF needs to be optimized if
> > > > and before hypercall pass-through can be done ?
> > >
> > > Disclaimer: my math could be totally wrong.
> > >
> > > I doubt it's a hard requirement.  Assuming a conservative roundtrip time 
> > > of 50k
> > > cycles, those 18K hypercalls will add well under a 1/2 a second of boot 
> > > time.
> > > If userspace can push the roundtrip time down to 10k cycles, the overhead 
> > > is
> > > more like 50 milliseconds.
> > >
> > > That being said, this does seem like a good OVMF cleanup, irrespective of 
> > > this
> > > new hypercall.  I assume it's not cheap to convert a page between 
> > > encrypted and
> > > decrypted.
> > >
> > > Thanks much for getting the numbers!
> >
> > Considering the above data and guest boot time latencies
> > (and potential issues with OVMF and optimizations required there),
> > do we have any consensus on whether we want to do page encryption
> > status hypercall passthrough or not ?
> >
> > Thanks,
> > Ashish
>
> Thanks for grabbing the data!
>
> I am fine with both paths. Sean has stated an explicit desire for
> hypercall exiting, so I think that would be the current consensus.
>
> If we want to do hypercall exiting, this should be in a follow-up
> series where we implement something more generic, e.g. a hypercall
> exiting bitmap or hypercall exit list. If we are taking the hypercall
> exit route, we can drop the kvm side of the hypercall. Userspace could
> also handle the MSR using MSR filters (would need to confirm that).
> Then userspace could also be in control of the cpuid bit.
>
> Essentially, I think you could drop most of the host kernel work if
> there were generic support for hypercall exiting. Then userspace would
> be responsible for all of that. Thoughts on this?
>
> --Steve

This could even go a step further, and use an MSR write from within
the guest instead of a hypercall, which could be patched through to
userspace without host modification, if I understand the MSR filtering
correctly.
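
For illustration, the guest side of that would reduce to an ordinary
MSR write, roughly as below (a sketch; the MSR and feature-bit names
are the ones proposed in this series, not upstream definitions):

    #include <linux/bits.h>
    #include <asm/kvm_para.h>
    #include <asm/msr.h>

    #define KVM_FEATURE_SEV_LIVE_MIGRATION 16           /* from this series */
    #define MSR_KVM_SEV_LIVE_MIGRATION     0x4b564d08   /* from this series */
    #define KVM_SEV_LIVE_MIGRATION_ENABLED BIT_ULL(0)

    /* Sketch: opt in by writing the custom MSR; with an MSR filter
     * installed, the write exits straight to host userspace. */
    static void sev_live_migration_opt_in(void)
    {
            if (kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION))
                    wrmsrl(MSR_KVM_SEV_LIVE_MIGRATION,
                           KVM_SEV_LIVE_MIGRATION_ENABLED);
    }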
--Steve


Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-02-25 Thread Steve Rutherford
On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra  wrote:
>
> On Wed, Feb 24, 2021 at 10:22:33AM -0800, Sean Christopherson wrote:
> > On Wed, Feb 24, 2021, Ashish Kalra wrote:
> > > # Samples: 19K of event 'kvm:kvm_hypercall'
> > > # Event count (approx.): 19573
> > > #
> > > # Overhead  Command          Shared Object     Symbol
> > > # ........  ...............  ................  .........................
> > > #
> > >    100.00%  qemu-system-x86  [kernel.vmlinux]  [k] kvm_emulate_hypercall
> > >
> > > Out of these 19573 hypercalls, # of page encryption status hcalls are 
> > > 19479,
> > > so almost all hypercalls here are page encryption status hypercalls.
> >
> > Oof.
> >
> > > The above data indicates that there will be ~2% more Heavyweight VMEXITs
> > > during SEV guest boot if we do page encryption status hypercalls
> > > pass-through to host userspace.
> > >
> > > But, then Brijesh pointed out to me and highlighted that currently
> > > OVMF is doing a lot of VMEXITs because they don't use the DMA pool to 
> > > minimize the C-bit toggles,
> > > in other words, OVMF bounce buffer does page state change on every DMA 
> > > allocate and free.
> > >
> > > So here is the performance analysis after kernel and initrd have been
> > > loaded into memory using grub and then starting perf just before booting 
> > > the kernel.
> > >
> > > These are the performance #'s after kernel and initrd have been loaded 
> > > into memory,
> > > then perf is attached and kernel is booted :
> > >
> > > # Samples: 1M of event 'kvm:kvm_userspace_exit'
> > > # Event count (approx.): 1081235
> > > #
> > > # Overhead  Trace output
> > > # ........  ............
> > > #
> > > 99.77%  reason KVM_EXIT_IO (2)
> > >  0.23%  reason KVM_EXIT_MMIO (6)
> > >
> > > # Samples: 1K of event 'kvm:kvm_hypercall'
> > > # Event count (approx.): 1279
> > > #
> > >
> > > So as the above data indicates, Linux is only making ~1K hypercalls,
> > > compared to ~18K hypercalls made by OVMF in the above use case.
> > >
> > > Does the above adds a prerequisite that OVMF needs to be optimized if
> > > and before hypercall pass-through can be done ?
> >
> > Disclaimer: my math could be totally wrong.
> >
> > I doubt it's a hard requirement.  Assuming a conservative roundtrip time of 
> > 50k
> > cycles, those 18K hypercalls will add well under a 1/2 a second of boot 
> > time.
> > If userspace can push the roundtrip time down to 10k cycles, the overhead is
> > more like 50 milliseconds.
> >
> > That being said, this does seem like a good OVMF cleanup, irrespective of 
> > this
> > new hypercall.  I assume it's not cheap to convert a page between encrypted 
> > and
> > decrypted.
> >
> > Thanks much for getting the numbers!
>
> Considering the above data and guest boot time latencies
> (and potential issues with OVMF and optimizations required there),
> do we have any consensus on whether we want to do page encryption
> status hypercall passthrough or not ?
>
> Thanks,
> Ashish

Thanks for grabbing the data!

I am fine with both paths. Sean has stated an explicit desire for
hypercall exiting, so I think that would be the current consensus.

If we want to do hypercall exiting, this should be in a follow-up
series where we implement something more generic, e.g. a hypercall
exiting bitmap or hypercall exit list. If we are taking the hypercall
exit route, we can drop the kvm side of the hypercall. Userspace could
also handle the MSR using MSR filters (would need to confirm that).
Then userspace could also be in control of the cpuid bit.

Essentially, I think you could drop most of the host kernel work if
there were generic support for hypercall exiting. Then userspace would
be responsible for all of that. Thoughts on this?

--Steve


Re: [RFC] KVM: x86: Support KVM VMs sharing SEV context

2021-02-25 Thread Steve Rutherford
On Thu, Feb 25, 2021 at 6:57 AM Tom Lendacky  wrote:
> >> +int svm_vm_copy_asid_to(struct kvm *kvm, unsigned int mirror_kvm_fd)
> >> +{
> >> +   struct file *mirror_kvm_file;
> >> +   struct kvm *mirror_kvm;
> >> +   struct kvm_sev_info *mirror_kvm_sev;
> >> +   unsigned int asid;
> >> +   int ret;
> >> +
> >> +   if (!sev_guest(kvm))
> >> +   return -ENOTTY;
> >
> > You definitely don't want this: this is the function that turns the vm
> > into an SEV guest (marks SEV as active).
>
> The sev_guest() function does not set sev->active, it only checks it. The
> sev_guest_init() function is where sev->active is set.
Sorry, bad use of English on my part: the "this" was referring to
svm_vm_copy_asid_to. Right now, you could only pass this sev_guest
check if you had already called sev_guest_init, which seems incorrect.
>
> >
> > (Not an issue with this patch, but a broader issue) I believe
> > sev_guest lacks the necessary acquire/release barriers on sev->active,
>
> The svm_mem_enc_op() takes the kvm lock and that is the only way into the
> sev_guest_init() function where sev->active is set.
There are a few places that check sev->active which don't have the kvm
lock, which is not problematic if we add in a few compiler barriers
(ala irqchip_split et al).
>
> Thanks,
> Tom

Thanks,
Steve


Re: [RFC] KVM: x86: Support KVM VMs sharing SEV context

2021-02-24 Thread Steve Rutherford
On Wed, Feb 24, 2021 at 9:37 AM Sean Christopherson  wrote:
> > + unsigned int asid;
> > + int ret;
> > +
> > + if (!sev_guest(kvm))
> > + return -ENOTTY;
> > +
> > + mutex_lock(&kvm->lock);
> > +
> > + /* Mirrors of mirrors should work, but let's not get silly */
>
> Do we really care?
Yes, unless you reparent mirrors of mirrors to the original ASID
owner. If you don't do that, I think userspace could pump a chain of
mirrors to blow the kernel stack when it closes the leaf vm, since you
could build up a chain of sev_vm_destroys. Refcounting the ASIDs
directly would also fix this.

Nate's early implementation did the reparenting, but I pushed for the
simplification since it made the locking a bit hairy.
>
> > + if (is_mirroring_enc_context(kvm)) {
> > + ret = -ENOTTY;
> > + goto failed;
> > + }
> > +


Re: [RFC] KVM: x86: Support KVM VMs sharing SEV context

2021-02-24 Thread Steve Rutherford
On Wed, Feb 24, 2021 at 1:00 AM Nathan Tempelman  wrote:
>
> @@ -1186,6 +1195,10 @@ int svm_register_enc_region(struct kvm *kvm,
> if (!sev_guest(kvm))
> return -ENOTTY;
>
> +   /* If kvm is mirroring encryption context it isn't responsible for it 
> */
> +   if (is_mirroring_enc_context(kvm))
> +   return -ENOTTY;
> +

Is this necessary? Same for unregister. When we looked at
sev_pin_memory, I believe we concluded that double pinning was safe.
>
> if (range->addr > ULONG_MAX || range->size > ULONG_MAX)
> return -EINVAL;
>
> @@ -1252,6 +1265,10 @@ int svm_unregister_enc_region(struct kvm *kvm,
> struct enc_region *region;
> int ret;
>
> +   /* If kvm is mirroring encryption context it isn't responsible for it 
> */
> +   if (is_mirroring_enc_context(kvm))
> +   return -ENOTTY;
> +
> mutex_lock(&kvm->lock);
>
> if (!sev_guest(kvm)) {
> @@ -1282,6 +1299,65 @@ int svm_unregister_enc_region(struct kvm *kvm,
> return ret;
>  }
>
> +int svm_vm_copy_asid_to(struct kvm *kvm, unsigned int mirror_kvm_fd)
> +{
> +   struct file *mirror_kvm_file;
> +   struct kvm *mirror_kvm;
> +   struct kvm_sev_info *mirror_kvm_sev;
> +   unsigned int asid;
> +   int ret;
> +
> +   if (!sev_guest(kvm))
> +   return -ENOTTY;

You definitely don't want this: this is the function that turns the vm
into an SEV guest (marks SEV as active).

(Not an issue with this patch, but a broader issue) I believe
sev_guest lacks the necessary acquire/release barriers on sev->active,
since it's called without the kvm lock. I mean, it's x86, so the only
one that's going to hose you is the compiler for this type of access.
There should be an smp_rmb() after the access in sev_guest and an
smp_wmb() before the access in SEV_GUEST_INIT and here.
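
A sketch of the pairing being described (illustrative only, not taken
from the patch):

    /* Writer side, e.g. in sev_guest_init(), under the kvm lock:
     *
     *         ...initialize the rest of the sev state...
     *         smp_wmb();           // publish state before the flag
     *         sev->active = true;
     *
     * Reader side, possibly called without the kvm lock: */
    static bool sev_guest(struct kvm *kvm)
    {
            struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
            bool active = READ_ONCE(sev->active);

            smp_rmb();      /* pairs with the smp_wmb() in sev_guest_init() */
            return active;
    }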
>
> +
> +   mutex_lock(&kvm->lock);
> +
> +   /* Mirrors of mirrors should work, but let's not get silly */
> +   if (is_mirroring_enc_context(kvm)) {
> +   ret = -ENOTTY;
> +   goto failed;
> +   }
> +
> +   mirror_kvm_file = fget(mirror_kvm_fd);
> +   if (!kvm_is_kvm(mirror_kvm_file)) {
> +   ret = -EBADF;
> +   goto failed;
> +   }
> +
> +   mirror_kvm = mirror_kvm_file->private_data;
> +
> +   if (mirror_kvm == kvm || is_mirroring_enc_context(mirror_kvm)) {
Just check if the source is an sev_guest and that the destination is
not an sev_guest.

I reviewed earlier incarnations of this, and think the high-level idea
is sound. I'd like to see kvm-selftests for this patch, and plan on
collaborating with AMD to help make those happen.


Re: [PATCH v10 12/16] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR.

2021-02-10 Thread Steve Rutherford
On Wed, Feb 10, 2021 at 2:01 PM Steve Rutherford  wrote:
>
> Hi Ashish,
>
> On Wed, Feb 10, 2021 at 12:37 PM Ashish Kalra  wrote:
> >
> > Hello Steve,
> >
> > We can remove the implicit enabling of this live migration feature
> > from svm_vcpu_after_set_cpuid() callback invoked afer KVM_SET_CPUID2
> > ioctl, and let this feature flag be controlled by the userspace
> > VMM/qemu.
> >
> > Userspace can set this feature flag explicitly by calling the
> > KVM_SET_CPUID2 ioctl and enable this feature whenever it is ready to
> > do so.
> >
> > I have tested this as part of Qemu code :
> >
> > int kvm_arch_init_vcpu(CPUState *cs)
> > {
> > ...
> > ...
> > c->function = KVM_CPUID_FEATURES | kvm_base;
> > c->eax = env->features[FEAT_KVM];
> > c->eax |= (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
> > ...
> > ...
> >
> > r = kvm_vcpu_ioctl(cs, KVM_SET_CPUID2, &cpuid_data);
> > ...
> >
> > Let me know if this addresses your concerns.
> Removing implicit enablement is one part of the equation.
> The other two are:
> 1) Host userspace being able to ask the kernel if it supports SEV Live 
> Migration
> 2) Host userspace being able to disable access to the MSR/hypercall
>
> Feature flagging for paravirt features is pretty complicated, since
> you need all three parties to negotiate (host userspace/host
> kernel/guest), and every single one has veto power. In the end, the
> feature should only be available to the guest if every single party
> says yes.
>
> For an example of how to handle 1), the new feature flag could be
> checked when asking the kernel which cpuid bits it supports by adding
> it to the list of features that the kernel mentions in
> KVM_GET_SUPPORTED_CPUID.
>
> For example (in KVM's arch/x86/kvm/cpuid.c):
> case KVM_CPUID_FEATURES:
> ==
> entry->eax = (1 << KVM_FEATURE_CLOCKSOURCE) |
> (1 << KVM_FEATURE_NOP_IO_DELAY) |
> ...
> (1 << KVM_FEATURE_PV_SCHED_YIELD) |
> +  (1 << KVM_FEATURE_ASYNC_PF_INT) |
> -   (1 << KVM_FEATURE_ASYNC_PF_INT);
> +  (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
> ==
>
> Without this, userspace has to infer if the kernel it is on supports that 
> flag.
>
> For an example of how to handle 2), in the new msr handler, KVM should
> throw a GP `if (!guest_pv_has(vcpu, KVM_FEATURE_SEV_LIVE_MIGRATION))`
> (it can do this by returning th. The issue here is "what if the guest
Correction: (it can do this by returning 1).


Re: [PATCH v10 12/16] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR.

2021-02-10 Thread Steve Rutherford
Hi Ashish,

On Wed, Feb 10, 2021 at 12:37 PM Ashish Kalra  wrote:
>
> Hello Steve,
>
> We can remove the implicit enabling of this live migration feature
> from svm_vcpu_after_set_cpuid() callback invoked after KVM_SET_CPUID2
> ioctl, and let this feature flag be controlled by the userspace
> VMM/qemu.
>
> Userspace can set this feature flag explicitly by calling the
> KVM_SET_CPUID2 ioctl and enable this feature whenever it is ready to
> do so.
>
> I have tested this as part of Qemu code:
>
> int kvm_arch_init_vcpu(CPUState *cs)
> {
> ...
> ...
> c->function = KVM_CPUID_FEATURES | kvm_base;
> c->eax = env->features[FEAT_KVM];
> c->eax |= (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
> ...
> ...
>
> r = kvm_vcpu_ioctl(cs, KVM_SET_CPUID2, &cpuid_data);
> ...
>
> Let me know if this addresses your concerns.
Removing implicit enablement is one part of the equation.
The other two are:
1) Host userspace being able to ask the kernel if it supports SEV Live Migration
2) Host userspace being able to disable access to the MSR/hypercall

Feature flagging for paravirt features is pretty complicated, since
you need all three parties to negotiate (host userspace/host
kernel/guest), and every single one has veto power. In the end, the
feature should only be available to the guest if every single party
says yes.

For an example of how to handle 1), the new feature flag could be
checked when asking the kernel which cpuid bits it supports by adding
it to the list of features that the kernel mentions in
KVM_GET_SUPPORTED_CPUID.

For example (in KVM's arch/x86/kvm/cpuid.c):
case KVM_CPUID_FEATURES:
==
entry->eax = (1 << KVM_FEATURE_CLOCKSOURCE) |
(1 << KVM_FEATURE_NOP_IO_DELAY) |
...
(1 << KVM_FEATURE_PV_SCHED_YIELD) |
+  (1 << KVM_FEATURE_ASYNC_PF_INT) |
-   (1 << KVM_FEATURE_ASYNC_PF_INT);
+  (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
==

Without this, userspace has to infer if the kernel it is on supports that flag.

For an example of how to handle 2), in the new msr handler, KVM should
throw a GP `if (!guest_pv_has(vcpu, KVM_FEATURE_SEV_LIVE_MIGRATION))`
(it can do this by returning th. The issue here is "what if the guest
ignores CPUID and calls the MSR/hypercall anyway". This is a less
important issue as it requires the guest to be malicious, but still
worth resolving. Additionally, the hypercall itself should check if
the MSR has been toggled by the guest.

Thanks,
Steve
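For concreteness, a sketch of the two veto points described above, using
this series' names (the guest_pv_has() check is the suggested addition,
not what v10 does):
==
/* In svm_set_msr(): refuse the write unless userspace exposed the bit. */
case MSR_KVM_SEV_LIVE_MIGRATION:
	if (!guest_pv_has(vcpu, KVM_FEATURE_SEV_LIVE_MIGRATION))
		return 1;	/* raises #GP in the guest */
	sev_update_migration_flags(vcpu->kvm, data);
	break;

/* In the KVM_HC_PAGE_ENC_STATUS handler: honor the guest's MSR toggle. */
if (!sev->live_migration_enabled)
	return -KVM_ENOSYS;
==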


Re: [PATCH v10 12/16] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR.

2021-02-08 Thread Steve Rutherford
Hi Ashish,

On Sun, Feb 7, 2021 at 4:29 PM Ashish Kalra  wrote:
>
> Hello Steve,
>
> On Sat, Feb 06, 2021 at 01:56:46PM +, Ashish Kalra wrote:
> > Hello Steve,
> >
> > On Sat, Feb 06, 2021 at 05:46:17AM +, Ashish Kalra wrote:
> > > Hello Steve,
> > >
> > > Continued response to your queries, especially related to userspace
> > > control of the SEV live migration feature:
> > >
> > > On Fri, Feb 05, 2021 at 06:54:21PM -0800, Steve Rutherford wrote:
> > > > On Thu, Feb 4, 2021 at 7:08 PM Ashish Kalra  
> > > > wrote:
> > > > >
> > > > > Hello Steve,
> > > > >
> > > > > On Thu, Feb 04, 2021 at 04:56:35PM -0800, Steve Rutherford wrote:
> > > > > > On Wed, Feb 3, 2021 at 4:39 PM Ashish Kalra  
> > > > > > wrote:
> > > > > > >
> > > > > > > From: Ashish Kalra 
> > > > > > >
> > > > > > > Add new KVM_FEATURE_SEV_LIVE_MIGRATION feature for guest to check
> > > > > > > for host-side support for SEV live migration. Also add a new 
> > > > > > > custom
> > > > > > > MSR_KVM_SEV_LIVE_MIGRATION for guest to enable the SEV live 
> > > > > > > migration
> > > > > > > feature.
> > > > > > >
> > > > > > > Signed-off-by: Ashish Kalra 
> > > > > > > ---
> > > > > > >  Documentation/virt/kvm/cpuid.rst |  5 +
> > > > > > >  Documentation/virt/kvm/msr.rst   | 12 
> > > > > > >  arch/x86/include/uapi/asm/kvm_para.h |  4 
> > > > > > >  arch/x86/kvm/svm/sev.c   | 13 +
> > > > > > >  arch/x86/kvm/svm/svm.c   | 16 
> > > > > > >  arch/x86/kvm/svm/svm.h   |  2 ++
> > > > > > >  6 files changed, 52 insertions(+)
> > > > > > >
> > > > > > > diff --git a/Documentation/virt/kvm/cpuid.rst 
> > > > > > > b/Documentation/virt/kvm/cpuid.rst
> > > > > > > index cf62162d4be2..0bdb6cdb12d3 100644
> > > > > > > --- a/Documentation/virt/kvm/cpuid.rst
> > > > > > > +++ b/Documentation/virt/kvm/cpuid.rst
> > > > > > > @@ -96,6 +96,11 @@ KVM_FEATURE_MSI_EXT_DEST_ID 15  
> > > > > > > guest checks this feature bit
> > > > > > > before using 
> > > > > > > extended destination
> > > > > > > ID bits in MSI 
> > > > > > > address bits 11-5.
> > > > > > >
> > > > > > > +KVM_FEATURE_SEV_LIVE_MIGRATION 16  guest checks this 
> > > > > > > feature bit before
> > > > > > > +   using the page 
> > > > > > > encryption state
> > > > > > > +   hypercall to 
> > > > > > > notify the page state
> > > > > > > +   change
> > > > > > > +
> > > > > > >  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24  host will warn if 
> > > > > > > no guest-side
> > > > > > > per-cpu warps are 
> > > > > > > expected in
> > > > > > > kvmclock
> > > > > > > diff --git a/Documentation/virt/kvm/msr.rst 
> > > > > > > b/Documentation/virt/kvm/msr.rst
> > > > > > > index e37a14c323d2..020245d16087 100644
> > > > > > > --- a/Documentation/virt/kvm/msr.rst
> > > > > > > +++ b/Documentation/virt/kvm/msr.rst
> > > > > > > @@ -376,3 +376,15 @@ data:
> > > > > > > write '1' to bit 0 of the MSR, this causes the host to 
> > > > > > > re-scan its queue
> > > > > > > and check if there are more notifications pending. The 
> > > > > > > MSR is available
> > > > > > > if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
> > > > > > > +
> > > >

Re: [PATCH v10 12/16] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR.

2021-02-05 Thread Steve Rutherford
On Thu, Feb 4, 2021 at 7:08 PM Ashish Kalra  wrote:
>
> Hello Steve,
>
> On Thu, Feb 04, 2021 at 04:56:35PM -0800, Steve Rutherford wrote:
> > On Wed, Feb 3, 2021 at 4:39 PM Ashish Kalra  wrote:
> > >
> > > From: Ashish Kalra 
> > >
> > > Add new KVM_FEATURE_SEV_LIVE_MIGRATION feature for guest to check
> > > for host-side support for SEV live migration. Also add a new custom
> > > MSR_KVM_SEV_LIVE_MIGRATION for guest to enable the SEV live migration
> > > feature.
> > >
> > > Signed-off-by: Ashish Kalra 
> > > ---
> > >  Documentation/virt/kvm/cpuid.rst |  5 +
> > >  Documentation/virt/kvm/msr.rst   | 12 
> > >  arch/x86/include/uapi/asm/kvm_para.h |  4 
> > >  arch/x86/kvm/svm/sev.c   | 13 +
> > >  arch/x86/kvm/svm/svm.c   | 16 
> > >  arch/x86/kvm/svm/svm.h   |  2 ++
> > >  6 files changed, 52 insertions(+)
> > >
> > > diff --git a/Documentation/virt/kvm/cpuid.rst 
> > > b/Documentation/virt/kvm/cpuid.rst
> > > index cf62162d4be2..0bdb6cdb12d3 100644
> > > --- a/Documentation/virt/kvm/cpuid.rst
> > > +++ b/Documentation/virt/kvm/cpuid.rst
> > > @@ -96,6 +96,11 @@ KVM_FEATURE_MSI_EXT_DEST_ID 15  guest 
> > > checks this feature bit
> > > before using extended 
> > > destination
> > > ID bits in MSI address 
> > > bits 11-5.
> > >
> > > +KVM_FEATURE_SEV_LIVE_MIGRATION 16  guest checks this feature 
> > > bit before
> > > +   using the page encryption 
> > > state
> > > +   hypercall to notify the 
> > > page state
> > > +   change
> > > +
> > >  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24  host will warn if no 
> > > guest-side
> > > per-cpu warps are 
> > > expected in
> > > kvmclock
> > > diff --git a/Documentation/virt/kvm/msr.rst 
> > > b/Documentation/virt/kvm/msr.rst
> > > index e37a14c323d2..020245d16087 100644
> > > --- a/Documentation/virt/kvm/msr.rst
> > > +++ b/Documentation/virt/kvm/msr.rst
> > > @@ -376,3 +376,15 @@ data:
> > > write '1' to bit 0 of the MSR, this causes the host to re-scan 
> > > its queue
> > > and check if there are more notifications pending. The MSR is 
> > > available
> > > if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
> > > +
> > > +MSR_KVM_SEV_LIVE_MIGRATION:
> > > +0x4b564d08
> > > +
> > > +   Control SEV Live Migration features.
> > > +
> > > +data:
> > > +Bit 0 enables (1) or disables (0) host-side SEV Live Migration 
> > > feature,
> > > +in other words, this is guest->host communication that it's 
> > > properly
> > > +handling the shared pages list.
> > > +
> > > +All other bits are reserved.
> > > diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
> > > b/arch/x86/include/uapi/asm/kvm_para.h
> > > index 950afebfba88..f6bfa138874f 100644
> > > --- a/arch/x86/include/uapi/asm/kvm_para.h
> > > +++ b/arch/x86/include/uapi/asm/kvm_para.h
> > > @@ -33,6 +33,7 @@
> > >  #define KVM_FEATURE_PV_SCHED_YIELD 13
> > >  #define KVM_FEATURE_ASYNC_PF_INT   14
> > >  #define KVM_FEATURE_MSI_EXT_DEST_ID15
> > > +#define KVM_FEATURE_SEV_LIVE_MIGRATION 16
> > >
> > >  #define KVM_HINTS_REALTIME  0
> > >
> > > @@ -54,6 +55,7 @@
> > >  #define MSR_KVM_POLL_CONTROL   0x4b564d05
> > >  #define MSR_KVM_ASYNC_PF_INT   0x4b564d06
> > >  #define MSR_KVM_ASYNC_PF_ACK   0x4b564d07
> > > +#define MSR_KVM_SEV_LIVE_MIGRATION 0x4b564d08
> > >
> > >  struct kvm_steal_time {
> > > __u64 steal;
> > > @@ -136,4 +138,6 @@ struct kvm_vcpu_pv_apf_data {
> > >  #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
> > >  #define KVM_PV_EOI_DISABLED 0x0
> > >
> > > +#define KVM_SEV_LIVE_MIGRATION_ENABLED BIT_ULL(0)
> > > +
> > >  #endif /* _UAPI_ASM_X86_KVM_PARA_H */
> > > diff --git a/arch/x86/kvm/svm

Re: [PATCH v10 08/16] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall

2021-02-04 Thread Steve Rutherford
On Wed, Feb 3, 2021 at 4:38 PM Ashish Kalra  wrote:
>
> From: Brijesh Singh 
>
> This hypercall is used by the SEV guest to notify a change in the page
> encryption status to the hypervisor. The hypercall should be invoked
> only when the encryption attribute is changed from encrypted -> decrypted
> and vice versa. By default all guest pages are considered encrypted.
>
> The patch introduces a new shared pages list implemented as a
> sorted linked list to track the shared/unencrypted regions marked by the
> guest hypercall.
>
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: Paolo Bonzini 
> Cc: "Radim Krčmář" 
> Cc: Joerg Roedel 
> Cc: Borislav Petkov 
> Cc: Tom Lendacky 
> Cc: x...@kernel.org
> Cc: k...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Brijesh Singh 
> Co-developed-by: Ashish Kalra 
> Signed-off-by: Ashish Kalra 
> ---
>  Documentation/virt/kvm/hypercalls.rst |  15 +++
>  arch/x86/include/asm/kvm_host.h   |   2 +
>  arch/x86/kvm/svm/sev.c| 150 ++
>  arch/x86/kvm/svm/svm.c|   2 +
>  arch/x86/kvm/svm/svm.h|   5 +
>  arch/x86/kvm/vmx/vmx.c|   1 +
>  arch/x86/kvm/x86.c|   6 ++
>  include/uapi/linux/kvm_para.h |   1 +
>  8 files changed, 182 insertions(+)
>
> diff --git a/Documentation/virt/kvm/hypercalls.rst 
> b/Documentation/virt/kvm/hypercalls.rst
> index ed4fddd364ea..7aff0cebab7c 100644
> --- a/Documentation/virt/kvm/hypercalls.rst
> +++ b/Documentation/virt/kvm/hypercalls.rst
> @@ -169,3 +169,18 @@ a0: destination APIC ID
>
>  :Usage example: When sending a call-function IPI-many to vCPUs, yield if
> any of the IPI target vCPUs was preempted.
> +
> +
> +8. KVM_HC_PAGE_ENC_STATUS
> +-
> +:Architecture: x86
> +:Status: active
> +:Purpose: Notify the encryption status changes in guest page table (SEV 
> guest)
> +
> +a0: the guest physical address of the start page
> +a1: the number of pages
> +a2: encryption attribute
> +
> +   Where:
> +   * 1: Encryption attribute is set
> +   * 0: Encryption attribute is cleared
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 3d6616f6f6ef..2da5f5e2a10e 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1301,6 +1301,8 @@ struct kvm_x86_ops {
> int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
>
> void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
> +   int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
> + unsigned long sz, unsigned long mode);
>  };
>
>  struct kvm_x86_nested_ops {
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 25eaf35ba51d..55c628df5155 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -45,6 +45,11 @@ struct enc_region {
> unsigned long size;
>  };
>
> +struct shared_region {
> +   struct list_head list;
> +   unsigned long gfn_start, gfn_end;
> +};
> +
>  static int sev_flush_asids(void)
>  {
> int ret, error = 0;
> @@ -196,6 +201,8 @@ static int sev_guest_init(struct kvm *kvm, struct 
> kvm_sev_cmd *argp)
> sev->active = true;
> sev->asid = asid;
> +   INIT_LIST_HEAD(&sev->regions_list);
> +   INIT_LIST_HEAD(&sev->shared_pages_list);
> +   sev->shared_pages_list_count = 0;
>
> return 0;
>
> @@ -1473,6 +1480,148 @@ static int sev_receive_finish(struct kvm *kvm, struct 
> kvm_sev_cmd *argp)
> return ret;
>  }
>
> +static int remove_shared_region(unsigned long start, unsigned long end,
> +   struct list_head *head)
> +{
> +   struct shared_region *pos;
> +
> +   list_for_each_entry(pos, head, list) {
> +   if (pos->gfn_start == start &&
> +   pos->gfn_end == end) {
> +   list_del(&pos->list);
> +   kfree(pos);
> +   return -1;
> +   } else if (start >= pos->gfn_start && end <= pos->gfn_end) {
> +   if (start == pos->gfn_start)
> +   pos->gfn_start = end + 1;
> +   else if (end == pos->gfn_end)
> +   pos->gfn_end = start - 1;
> +   else {
> +   /* Do a de-merge -- split linked list nodes */
> +   unsigned long tmp;
> +   struct shared_region *shrd_region;
> +
> +   tmp = pos->gfn_end;
> +   pos->gfn_end = start-1;
> +   shrd_region = kzalloc(sizeof(*shrd_region), 
> GFP_KERNEL_ACCOUNT);
> +   if (!shrd_region)
> +   return -ENOMEM;
> +   
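The hunk is truncated here by the archive. From the surrounding logic, the
split path plausibly finishes by inserting the new node for the upper half
of the range; this is a reconstruction for readability, not the patch's
verbatim text:
==
shrd_region->gfn_start = end + 1;
shrd_region->gfn_end = tmp;
list_add(&shrd_region->list, &pos->list);
return 0;
==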

Re: [PATCH v10 12/16] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR.

2021-02-04 Thread Steve Rutherford
On Wed, Feb 3, 2021 at 4:39 PM Ashish Kalra  wrote:
>
> From: Ashish Kalra 
>
> Add new KVM_FEATURE_SEV_LIVE_MIGRATION feature for guest to check
> for host-side support for SEV live migration. Also add a new custom
> MSR_KVM_SEV_LIVE_MIGRATION for guest to enable the SEV live migration
> feature.
>
> Signed-off-by: Ashish Kalra 
> ---
>  Documentation/virt/kvm/cpuid.rst |  5 +
>  Documentation/virt/kvm/msr.rst   | 12 
>  arch/x86/include/uapi/asm/kvm_para.h |  4 
>  arch/x86/kvm/svm/sev.c   | 13 +
>  arch/x86/kvm/svm/svm.c   | 16 
>  arch/x86/kvm/svm/svm.h   |  2 ++
>  6 files changed, 52 insertions(+)
>
> diff --git a/Documentation/virt/kvm/cpuid.rst 
> b/Documentation/virt/kvm/cpuid.rst
> index cf62162d4be2..0bdb6cdb12d3 100644
> --- a/Documentation/virt/kvm/cpuid.rst
> +++ b/Documentation/virt/kvm/cpuid.rst
> @@ -96,6 +96,11 @@ KVM_FEATURE_MSI_EXT_DEST_ID 15  guest checks this feature 
> checks this feature bit
> before using extended 
> destination
> ID bits in MSI address bits 
> 11-5.
>
> +KVM_FEATURE_SEV_LIVE_MIGRATION 16  guest checks this feature bit 
> before
> +   using the page encryption 
> state
> +   hypercall to notify the page 
> state
> +   change
> +
>  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24  host will warn if no 
> guest-side
> per-cpu warps are expected in
> kvmclock
> diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
> index e37a14c323d2..020245d16087 100644
> --- a/Documentation/virt/kvm/msr.rst
> +++ b/Documentation/virt/kvm/msr.rst
> @@ -376,3 +376,15 @@ data:
> write '1' to bit 0 of the MSR, this causes the host to re-scan its 
> queue
> and check if there are more notifications pending. The MSR is 
> available
> if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
> +
> +MSR_KVM_SEV_LIVE_MIGRATION:
> +0x4b564d08
> +
> +   Control SEV Live Migration features.
> +
> +data:
> +Bit 0 enables (1) or disables (0) host-side SEV Live Migration 
> feature,
> +in other words, this is guest->host communication that it's properly
> +handling the shared pages list.
> +
> +All other bits are reserved.
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
> b/arch/x86/include/uapi/asm/kvm_para.h
> index 950afebfba88..f6bfa138874f 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -33,6 +33,7 @@
>  #define KVM_FEATURE_PV_SCHED_YIELD 13
>  #define KVM_FEATURE_ASYNC_PF_INT   14
>  #define KVM_FEATURE_MSI_EXT_DEST_ID15
> +#define KVM_FEATURE_SEV_LIVE_MIGRATION 16
>
>  #define KVM_HINTS_REALTIME  0
>
> @@ -54,6 +55,7 @@
>  #define MSR_KVM_POLL_CONTROL   0x4b564d05
>  #define MSR_KVM_ASYNC_PF_INT   0x4b564d06
>  #define MSR_KVM_ASYNC_PF_ACK   0x4b564d07
> +#define MSR_KVM_SEV_LIVE_MIGRATION 0x4b564d08
>
>  struct kvm_steal_time {
> __u64 steal;
> @@ -136,4 +138,6 @@ struct kvm_vcpu_pv_apf_data {
>  #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
>  #define KVM_PV_EOI_DISABLED 0x0
>
> +#define KVM_SEV_LIVE_MIGRATION_ENABLED BIT_ULL(0)
> +
>  #endif /* _UAPI_ASM_X86_KVM_PARA_H */
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index b0d324aed515..93f42b3d3e33 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -1627,6 +1627,16 @@ int svm_page_enc_status_hc(struct kvm *kvm, unsigned 
> long gpa,
> return ret;
>  }
>
> +void sev_update_migration_flags(struct kvm *kvm, u64 data)
> +{
> +   struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +
> +   if (!sev_guest(kvm))
> +   return;

This should assert that userspace wanted the guest to be able to make
these calls (see more below).

>
> +
> +   sev->live_migration_enabled = !!(data & 
> KVM_SEV_LIVE_MIGRATION_ENABLED);
> +}
> +
>  int svm_get_shared_pages_list(struct kvm *kvm,
>   struct kvm_shared_pages_list *list)
>  {
> @@ -1639,6 +1649,9 @@ int svm_get_shared_pages_list(struct kvm *kvm,
> if (!sev_guest(kvm))
> return -ENOTTY;
>
> +   if (!sev->live_migration_enabled)
> +   return -EINVAL;
> +
> if (!list->size)
> return -EINVAL;
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 58f89f83caab..43ea5061926f 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2903,6 +2903,9 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct 
> msr_data *msr)
> svm->msr_decfg = data;
> break;
> }
> +   case 

Re: [PATCH v9 00/18] Add AMD SEV guest live migration support

2021-01-14 Thread Steve Rutherford
Forgot to ask this: is there an intention to support SEND_CANCEL in a
follow up patch?


On Tue, Dec 8, 2020 at 2:03 PM Ashish Kalra  wrote:
>
> From: Ashish Kalra 
>
> The series adds support for AMD SEV guest live migration commands. To
> protect the confidentiality of an SEV protected guest's memory while in
> transit, we need to use the SEV commands defined in the SEV API spec [1].
>
> SEV guest VMs have the concept of private and shared memory. Private memory
> is encrypted with the guest-specific key, while shared memory may be
> encrypted with the hypervisor key. The commands provided by the SEV FW are
> meant to be used for the private memory only. The patch series introduces a
> new hypercall. The guest OS can use this hypercall to notify the page
> encryption status. If the page is encrypted with the guest-specific key
> then we use the SEV commands during the migration; if the page is not
> encrypted then we fall back to the default.
>
> The patch adds new ioctls KVM_{SET,GET}_PAGE_ENC_BITMAP. The ioctls can be
> used by qemu to get the page encryption bitmap. Qemu can consult this
> bitmap during the migration to know whether the page is encrypted.
>
> This section describes how the SEV live migration feature is negotiated
> between the host and guest: the host indicates this feature support via
> KVM_FEATURE_CPUID. The guest firmware (OVMF) detects this feature and
> sets a UEFI environment variable indicating OVMF support for live
> migration. The guest kernel also detects the host support for this
> feature via cpuid and, in the case of an EFI boot, verifies if OVMF also
> supports this feature by getting the UEFI environment variable; if it is
> set, it enables the live migration feature on the host by writing to a
> custom MSR. If not booted under EFI, it simply enables the feature by
> again writing to the custom MSR. The host returns an error as part of
> the SET_PAGE_ENC_BITMAP ioctl if the guest has not enabled live migration.
>
> A branch containing these patches is available here:
> https://github.com/AMDESE/linux/tree/sev-migration-v9
>
> [1] https://developer.amd.com/wp-content/resources/55766.PDF
>
> Changes since v8:
> - Rebasing to kvm next branch.
> - Fixed and added comments as per review feedback on v8 patches.
> - Removed implicitly enabling live migration for incoming VMs in
>   in KVM_SET_PAGE_ENC_BITMAP, it is now done via KVM_SET_MSR ioctl.
> - Adds support for bypassing unencrypted guest memory regions for
>   DBG_DECRYPT API calls, guest memory region encryption status in
>   sev_dbg_decrypt() is referenced using the page encryption bitmap.
>
> Changes since v7:
> - Removed the hypervisor specific hypercall/paravirt callback for
>   SEV live migration and moved back to calling kvm_sev_hypercall3
>   directly.
> - Fix build errors as
>   Reported-by: kbuild test robot , specifically fixed
>   build error when CONFIG_HYPERVISOR_GUEST=y and
>   CONFIG_AMD_MEM_ENCRYPT=n.
> - Implicitly enabled live migration for incoming VM(s) to handle
>   A->B->C->... VM migrations.
> - Fixed Documentation as per comments on v6 patches.
> - Fixed error return path in sev_send_update_data() as per comments
>   on v6 patches.
>
> Changes since v6:
> - Rebasing to mainline and refactoring to the new split SVM
>   infrastructure.
> - Move to static allocation of the unified Page Encryption bitmap
>   instead of the dynamic resizing of the bitmap, the static allocation
>   is done implicitly by extending kvm_arch_commit_memory_region() callback
>   to add svm specific x86_ops which can read the userspace provided memory
>   region/memslots and calculate the amount of guest RAM managed by the KVM
>   and grow the bitmap.
> - Fixed KVM_SET_PAGE_ENC_BITMAP ioctl to set the whole bitmap instead
>   of simply clearing specific bits.
> - Removed KVM_PAGE_ENC_BITMAP_RESET ioctl, which is now performed using
>   KVM_SET_PAGE_ENC_BITMAP.
> - Extended guest support for enabling Live Migration feature by adding a
>   check for UEFI environment variable indicating OVMF support for Live
>   Migration feature and additionally checking for KVM capability for the
>   same feature. If not booted under EFI, then we simply check for KVM
>   capability.
> - Add hypervisor specific hypercall for SEV live migration by adding
>   a new paravirt callback as part of x86_hyper_runtime.
>   (x86 hypervisor specific runtime callbacks)
> - Moving MSR handling for MSR_KVM_SEV_LIVE_MIG_EN into svm/sev code
>   and adding check for SEV live migration enabled by guest in the
>   KVM_GET_PAGE_ENC_BITMAP ioctl.
> - Instead of the complete __bss_decrypted section, only specific variables
>   such as hv_clock_boot and wall_clock are marked as decrypted in the
>   page encryption bitmap
>
> Changes since v5:
> - Fix build errors as
>   Reported-by: kbuild test robot 
>
> Changes since v4:
> - Host support has been added to extend KVM capabilities/feature bits to
>   include a new KVM_FEATURE_SEV_LIVE_MIGRATION, which the guest can
>   query for host-side support for SEV live 

Re: [PATCH v2 1/9] KVM: x86: Add AMD SEV specific Hypercall3

2021-01-07 Thread Steve Rutherford
Supporting merging of consecutive entries (or not) is less important
to get right since it doesn't change any of the APIs. If someone runs
into performance issues, they can loop back and fix this then. I'm
slightly concerned with the behavior for overlapping regions. I also
have slight concerns with how we handle re-encrypting small chunks of
larger unencrypted regions. I don't think we've seen these in
practice, but nothing precludes them afaik.

On Thu, Jan 7, 2021 at 11:23 AM Sean Christopherson  wrote:
>
> On Thu, Jan 07, 2021, Ashish Kalra wrote:
> > On Thu, Jan 07, 2021 at 09:26:25AM -0800, Sean Christopherson wrote:
> > > On Thu, Jan 07, 2021, Ashish Kalra wrote:
> > > > Hello Steve,
> > > >
> > > > On Wed, Jan 06, 2021 at 05:01:33PM -0800, Steve Rutherford wrote:
> > > > > Avoiding an rbtree for such a small (but unstable) list seems correct.
> > > > >
> > > > > For the unencrypted region list strategy, the only questions that I
> > > > > have are fairly secondary.
> > > > > - How should the kernel upper bound the size of the list in the face
> > > > > of malicious guests, but still support large guests? (Something
> > > > > similar to the size provided in the bitmap API would work).
> > > >
> > > > I am thinking of another scenario, where a malicious guest can make
> > > > infinite/repetitive hypercalls and DoS attack the host.
> > > >
> > > > But probably this is a more generic issue; this can be done by any guest
> > > > under any hypervisor. I don't know what kind of mitigations exist
> > > > for such a scenario?
> > > >
> > > > Potentially, the guest memory donation model can handle such an attack,
> > > > because in this model, the hypervisor will expect only one hypercall,
> > > > and any repetitive hypercalls can make the hypervisor disable the guest?
> > >
> > > KVM doesn't need to explicitly bound its tracking structures, it just 
> > > needs to
> > > use GFP_KERNEL_ACCOUNT when allocating kernel memory for the structures 
> > > so that
> > > the memory will be accounted to the task/process/VM.  Shadow MMU pages 
> > > are the
> > > only exception that comes to mind; they're still accounted properly, but 
> > > KVM
> > > also explicitly limits them for a variety of reasons.
> > >
> > > The size of the list will naturally be bounded by the size of the guest; 
> > > and
> > > assuming KVM proactively merges adjacent regions, that upper bound is 
> > > probably
> > > reasonably low compared to other allocations, e.g. the aforementioned MMU 
> > > pages.
> > >
> > > And, using a list means a malicious guest will get automatically 
> > > throttled as
> > > the latency of walking the list (to merge/delete existing entries) will 
> > > increase
> > > with the size of the list.
> >
> > Just to add here, potentially there won't be any proactive
> > merging/deletion of existing entries, as the only static entries will be
> > initial guest MMIO regions, which are contiguous guest PA ranges but not
> > necessarily adjacent.
>
> My point was that, if the guest is malicious, eventually there will be 
> adjacent
> entries, e.g. the worst case scenario is that the encrypted status changes on
> every 4k page.  Anyways, not really all that important, I mostly thinking out
> loud :-)

Agreed. Tagging this with GFP_KERNEL_ACCOUNT means we don't need to
upper bound the number of pages. I now don't think there is any
unusual DoS potential here. Perhaps, if the guest tries really hard to
make a massive list, they could get a softlockup on the host. Not sure
how important that is to fix.
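If it ever does need fixing, the conventional mitigation is to yield
inside the walk. A sketch, assuming the handler holds kvm->lock (a mutex,
so sleeping is allowed) at that point:
==
struct shared_region *pos, *next;

list_for_each_entry_safe(pos, next, head, list) {
	/* ... merge/split/delete logic from the hypercall handler ... */
	cond_resched();	/* keep a pathological list from soft-locking the host */
}
==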


Re: [PATCH v2 1/9] KVM: x86: Add AMD SEV specific Hypercall3

2021-01-07 Thread Steve Rutherford
On Thu, Jan 7, 2021 at 4:48 PM Ashish Kalra  wrote:
>
> > On Thu, Jan 07, 2021 at 01:34:14AM +, Ashish Kalra wrote:
> > > Hello Steve,
> > >
> > > My thoughts here ...
> > >
> > > On Wed, Jan 06, 2021 at 05:01:33PM -0800, Steve Rutherford wrote:
> > > > Avoiding an rbtree for such a small (but unstable) list seems correct.
> > > >
> > >
> > > I agree.
> > >
> > > > For the unencrypted region list strategy, the only questions that I
> > > > have are fairly secondary.
> > > > - How should the kernel upper bound the size of the list in the face
> > > > of malicious guests, but still support large guests? (Something
> > > > similar to the size provided in the bitmap API would work).
> > > > - What serialization format should be used for the ioctl API?
> > > > (Usermode could send down a pointer to a user region and a size. The
> > > > kernel could then populate that with an array of structs containing
> > > > bases and limits for unencrypted regions.)
> > > > - How will the kernel tag a guest as having exceeded its maximum list
> > > > size, in order to indicate that the list is now incomplete? (Track a
> > > > poison bit, and send it up when getting the serialized list of
> > > > regions).
> > > >
> > > > In my view, there are two main competitors to this strategy:
> > > > - (Existing) Bitmap API
> > > > - A guest memory donation based model
> > > >
> > > > The existing bitmap API avoids any issues with growing too large,
> > > > since it's size is predictable.
> > > >
> > > > To elaborate on the memory donation based model, the guest could put
> > > > an encryption status data structure into unencrypted guest memory, and
> > > > then use a hypercall to inform the host where the base of that
> > > > structure is located. The main advantage of this is that it side steps
> > > > any issues around malicious guests causing large allocations.
> > > >
> > > > The unencrypted region list seems very practical. It's biggest
> > > > advantage over the bitmap is how cheap it will be to pass the
> > > > structure up from the kernel. A memory donation based model could
> > > > achieve similar performance, but with some additional complexity.
> > > >
> > > > Does anyone view the memory donation model as worth the complexity?
> > > > Does anyone think the simplicity of the bitmap is a better tradeoff
> > > > compared to an unencrypted region list?
> > >
> > > One advantage in sticking with the bitmap is that it maps very nicely to
> > > the dirty bitmap page tracking logic in KVM/Qemu. The way Brijesh
> > > designed and implemented it is very similar to dirty page bitmap tracking
> > > and syncing between KVM and Qemu. The same logic is re-used for the page
> > > encryption bitmap which means quite minimal changes and code reuse in
> > > Qemu.
> > >
> > > Any changes to the backing data structure, will require additional
> > > mapping logic to be added to Qemu.
> > >
> > > This is one advantage in keeping the bitmap logic.
> > >
>
> So if nobody is in favor of keeping the (current) bitmap logic, we will
> move to the unencrypted region list approach.
Sounds good to me.

>
> Thanks,
> Ashish


Re: [PATCH v2 1/9] KVM: x86: Add AMD SEV specific Hypercall3

2021-01-06 Thread Steve Rutherford
Avoiding an rbtree for such a small (but unstable) list seems correct.

For the unencrypted region list strategy, the only questions that I
have are fairly secondary.
- How should the kernel upper bound the size of the list in the face
of malicious guests, but still support large guests? (Something
similar to the size provided in the bitmap API would work).
- What serialization format should be used for the ioctl API?
(Usermode could send down a pointer to a user region and a size. The
kernel could then populate that with an array of structs containing
bases and limits for unencrypted regions.)
- How will the kernel tag a guest as having exceeded its maximum list
size, in order to indicate that the list is now incomplete? (Track a
poison bit, and send it up when getting the serialized list of
regions; one possible shape is sketched below).
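One hedged illustration of how those three pieces could fit together in a
single ioctl (the struct and field names are hypothetical, not from this
series):
==
/* Userspace supplies a buffer of entries; the kernel fills it in. */
struct kvm_enc_region {
	__u64 gfn_start;
	__u64 gfn_end;
};

struct kvm_enc_region_list {
	__u64 nent;	/* in: buffer capacity; out: entries written */
	__u64 flags;	/* out: bit 0 = "poison", list exceeded the bound */
	struct kvm_enc_region regions[];
};
==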

In my view, there are two main competitors to this strategy:
- (Existing) Bitmap API
- A guest memory donation based model

The existing bitmap API avoids any issues with growing too large,
since it's size is predictable.

To elaborate on the memory donation based model, the guest could put
an encryption status data structure into unencrypted guest memory, and
then use a hypercall to inform the host where the base of that
structure is located. The main advantage of this is that it side steps
any issues around malicious guests causing large allocations.

The unencrypted region list seems very practical. It's biggest
advantage over the bitmap is how cheap it will be to pass the
structure up from the kernel. A memory donation based model could
achieve similar performance, but with some additional complexity.

Does anyone view the memory donation model as worth the complexity?
Does anyone think the simplicity of the bitmap is a better tradeoff
compared to an unencrypted region list?
Or have other ideas that are not mentioned here?


On Wed, Jan 6, 2021 at 3:06 PM Ashish Kalra  wrote:
>
> On Fri, Dec 18, 2020 at 07:56:41PM +, Dr. David Alan Gilbert wrote:
> > * Kalra, Ashish (ashish.ka...@amd.com) wrote:
> > > Hello Dave,
> > >
> > > On Dec 18, 2020, at 1:40 PM, Dr. David Alan Gilbert  
> > > wrote:
> > >
> > > * Ashish Kalra (ashish.ka...@amd.com) wrote:
> > > On Fri, Dec 11, 2020 at 10:55:42PM +, Ashish Kalra wrote:
> > > Hello All,
> > >
> > > On Tue, Dec 08, 2020 at 10:29:05AM -0600, Brijesh Singh wrote:
> > >
> > > On 12/7/20 9:09 PM, Steve Rutherford wrote:
> > > On Mon, Dec 7, 2020 at 12:42 PM Sean Christopherson  
> > > wrote:
> > > On Sun, Dec 06, 2020, Paolo Bonzini wrote:
> > > On 03/12/20 01:34, Sean Christopherson wrote:
> > > On Tue, Dec 01, 2020, Ashish Kalra wrote:
> > > From: Brijesh Singh 
> > >
> > > KVM hypercall framework relies on alternative framework to patch the
> > > VMCALL -> VMMCALL on AMD platform. If a hypercall is made before
> > > apply_alternative() is called then it defaults to VMCALL. The approach
> > > works fine on non-SEV guests. A VMCALL would cause #UD, and the hypervisor
> > > will be able to decode the instruction and do the right things. But
> > > when SEV is active, guest memory is encrypted with guest key and
> > > hypervisor will not be able to decode the instruction bytes.
> > >
> > > Add an SEV-specific hypercall3; it unconditionally uses VMMCALL. The
> > > hypercall will be used by the SEV guest to notify encrypted pages to
> > > the hypervisor.
> > > What if we invert KVM_HYPERCALL and X86_FEATURE_VMMCALL to default to 
> > > VMMCALL
> > > and opt into VMCALL?  It's a synthetic feature flag either way, and I 
> > > don't
> > > think there are any existing KVM hypercalls that happen before 
> > > alternatives are
> > > patched, i.e. it'll be a nop for sane kernel builds.
> > >
> > > I'm also skeptical that a KVM specific hypercall is the right approach 
> > > for the
> > > encryption behavior, but I'll take that up in the patches later in the 
> > > series.
> > > Do you think that it's the guest that should "donate" memory for the 
> > > bitmap
> > > instead?
> > > No.  Two things I'd like to explore:
> > >
> > >  1. Making the hypercall to announce/request private vs. shared common 
> > > across
> > > hypervisors (KVM, Hyper-V, VMware, etc...) and technologies (SEV-* 
> > > and TDX).
> > > I'm concerned that we'll end up with multiple hypercalls that do more 
> > > or
> > > less the same thing, e.g. KVM+SEV, Hyper-V+SEV, TDX, etc...  Maybe 
> > > it's a
> > > pipe dream, but I'd like to at least explore options before shov

Re: [PATCH v2 1/9] KVM: x86: Add AMD SEV specific Hypercall3

2020-12-07 Thread Steve Rutherford
On Mon, Dec 7, 2020 at 12:42 PM Sean Christopherson  wrote:
>
> On Sun, Dec 06, 2020, Paolo Bonzini wrote:
> > On 03/12/20 01:34, Sean Christopherson wrote:
> > > On Tue, Dec 01, 2020, Ashish Kalra wrote:
> > > > From: Brijesh Singh 
> > > >
> > > > KVM hypercall framework relies on alternative framework to patch the
> > > > VMCALL -> VMMCALL on AMD platform. If a hypercall is made before
> > > > apply_alternative() is called then it defaults to VMCALL. The approach
> > > > works fine on non-SEV guests. A VMCALL would cause #UD, and the hypervisor
> > > > will be able to decode the instruction and do the right things. But
> > > > when SEV is active, guest memory is encrypted with guest key and
> > > > hypervisor will not be able to decode the instruction bytes.
> > > >
> > > > Add an SEV-specific hypercall3; it unconditionally uses VMMCALL. The
> > > > hypercall will be used by the SEV guest to notify encrypted pages to
> > > > the hypervisor.
> > >
> > > What if we invert KVM_HYPERCALL and X86_FEATURE_VMMCALL to default to 
> > > VMMCALL
> > > and opt into VMCALL?  It's a synthetic feature flag either way, and I 
> > > don't
> > > think there are any existing KVM hypercalls that happen before 
> > > alternatives are
> > > patched, i.e. it'll be a nop for sane kernel builds.
> > >
> > > I'm also skeptical that a KVM specific hypercall is the right approach 
> > > for the
> > > encryption behavior, but I'll take that up in the patches later in the 
> > > series.
> >
> > Do you think that it's the guest that should "donate" memory for the bitmap
> > instead?
>
> No.  Two things I'd like to explore:
>
>   1. Making the hypercall to announce/request private vs. shared common across
>  hypervisors (KVM, Hyper-V, VMware, etc...) and technologies (SEV-* and 
> TDX).
>  I'm concerned that we'll end up with multiple hypercalls that do more or
>  less the same thing, e.g. KVM+SEV, Hyper-V+SEV, TDX, etc...  Maybe it's a
>  pipe dream, but I'd like to at least explore options before shoving in 
> KVM-
>  only hypercalls.
>
>
>   2. Tracking shared memory via a list of ranges instead of a using bitmap to
>  track all of guest memory.  For most use cases, the vast majority of 
> guest
>  memory will be private, most ranges will be 2mb+, and conversions between
>  private and shared will be uncommon events, i.e. the overhead to walk and
>  split/merge list entries is hopefully not a big concern.  I suspect a 
> list
>  would consume far less memory, hopefully without impacting performance.

For a fancier data structure, I'd suggest an interval tree. Linux
already has an rbtree-based interval tree implementation, which would
likely work, and would probably assuage any performance concerns.

Something like this would not be worth doing unless most of the shared
pages were physically contiguous. A sample Ubuntu 20.04 VM on GCP had
60ish discontiguous shared regions. This is by no means a thorough
search, but it's suggestive. If this is typical, then the bitmap would
be far less efficient than most any interval-based data structure.

You'd have to allow userspace to upper bound the number of intervals
(similar to the maximum bitmap size), to prevent host OOMs due to
malicious guests. There's something nice about the guest donating
memory for this, since that would eliminate the OOM risk.
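For reference, a minimal sketch of the in-tree helpers mentioned above
(include/linux/interval_tree.h), tracking unencrypted GFN ranges; merging
of adjacent ranges is omitted and the function names are illustrative:
==
#include <linux/interval_tree.h>
#include <linux/slab.h>

static struct rb_root_cached shared_ranges = RB_ROOT_CACHED;

static int track_shared_range(unsigned long start, unsigned long last)
{
	struct interval_tree_node *node;

	node = kzalloc(sizeof(*node), GFP_KERNEL_ACCOUNT);
	if (!node)
		return -ENOMEM;
	node->start = start;	/* first GFN of the range */
	node->last = last;	/* last GFN, inclusive */
	interval_tree_insert(node, &shared_ranges);
	return 0;
}

static bool range_is_shared(unsigned long start, unsigned long last)
{
	return interval_tree_iter_first(&shared_ranges, start, last) != NULL;
}
==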


Re: [PATCH v8 00/18] Add AMD SEV guest live migration support

2020-08-05 Thread Steve Rutherford
Are these likely to get merged into 5.9?


On Wed, Jun 3, 2020 at 3:14 PM Ashish Kalra  wrote:
>
> Hello Steve,
>
> On Mon, Jun 01, 2020 at 01:02:23PM -0700, Steve Rutherford wrote:
> > On Mon, May 18, 2020 at 12:07 PM Ashish Kalra  wrote:
> > >
> > > Hello All,
> > >
> > > Any other feedback, review or comments on this patch-set ?
> > >
> > > Thanks,
> > > Ashish
> > >
> > > On Tue, May 05, 2020 at 09:13:49PM +, Ashish Kalra wrote:
> > > > From: Ashish Kalra 
> > > >
> > > > The series adds support for AMD SEV guest live migration commands.
> > > > To protect the confidentiality of an SEV protected guest's memory
> > > > while in transit, we need to use the SEV commands defined in the SEV
> > > > API spec [1].
> > > >
> > > > SEV guest VMs have the concept of private and shared memory. Private
> > > > memory is encrypted with the guest-specific key, while shared memory
> > > > may be encrypted with the hypervisor key. The commands provided by
> > > > the SEV FW are meant to be used for the private memory only. The
> > > > patch series introduces a new hypercall. The guest OS can use this
> > > > hypercall to notify the page encryption status. If the page is
> > > > encrypted with the guest-specific key then we use the SEV commands
> > > > during the migration; if the page is not encrypted then we fall back
> > > > to the default.
> > > >
> > > > The patch adds new ioctls KVM_{SET,GET}_PAGE_ENC_BITMAP. The ioctls
> > > > can be used by qemu to get the page encryption bitmap. Qemu can
> > > > consult this bitmap during the migration to know whether the page is
> > > > encrypted.
> > > >
> > > > This section describes how the SEV live migration feature is negotiated
> > > > between the host and guest: the host indicates this feature support via
> > > > KVM_FEATURE_CPUID. The guest firmware (OVMF) detects this feature and
> > > > sets a UEFI environment variable indicating OVMF support for live
> > > > migration. The guest kernel also detects the host support for this
> > > > feature via cpuid and, in the case of an EFI boot, verifies if OVMF also
> > > > supports this feature by getting the UEFI environment variable; if it is
> > > > set, it enables the live migration feature on the host by writing to a
> > > > custom MSR. If not booted under EFI, it simply enables the feature by
> > > > again writing to the custom MSR. The host returns an error as part of
> > > > the SET_PAGE_ENC_BITMAP ioctl if the guest has not enabled live migration.
> > > >
> > > > A branch containing these patches is available here:
> > > > https://github.com/AMDESE/linux/tree/sev-migration-v8
> > > >
> > > > [1] 
> > > > https://developer.amd.com/wp-content/resources/55766.PDF
> > > >
> > > > Changes since v7:
> > > > - Removed the hypervisor specific hypercall/paravirt callback for
> > > >   SEV live migration and moved back to calling kvm_sev_hypercall3
> > > >   directly.
> > > > - Fix build errors as
> > > >   Reported-by: kbuild test robot , specifically fixed
> > > >   build error when CONFIG_HYPERVISOR_GUEST=y and
> > > >   CONFIG_AMD_MEM_ENCRYPT=n.
> > > > - Implicitly enabled live migration for incoming VM(s) to handle
> > > >   A->B->C->... VM migrations.
> > > > - Fixed Documentation as per comments on v6 patches.
> > > > - Fixed error return path in sev_send_update_data() as per comments
> > > >   on v6 patches.
> > > >
> > > > Changes since v6:
> > > > - Rebasing to mainline and refactoring to the new split SVM
> > > >   infrastructure.
> > > > - Move to static allocation of the unified Page Encryption bitmap
> >

Re: [PATCH v8 00/18] Add AMD SEV guest live migration support

2020-06-01 Thread Steve Rutherford
On Mon, May 18, 2020 at 12:07 PM Ashish Kalra  wrote:
>
> Hello All,
>
> Any other feedback, review or comments on this patch-set ?
>
> Thanks,
> Ashish
>
> On Tue, May 05, 2020 at 09:13:49PM +, Ashish Kalra wrote:
> > From: Ashish Kalra 
> >
> > The series adds support for AMD SEV guest live migration commands. To
> > protect the confidentiality of an SEV protected guest's memory while in
> > transit, we need to use the SEV commands defined in the SEV API spec [1].
> >
> > SEV guest VMs have the concept of private and shared memory. Private memory
> > is encrypted with the guest-specific key, while shared memory may be
> > encrypted with the hypervisor key. The commands provided by the SEV FW are
> > meant to be used for the private memory only. The patch series introduces a
> > new hypercall. The guest OS can use this hypercall to notify the page
> > encryption status. If the page is encrypted with the guest-specific key
> > then we use the SEV commands during the migration; if the page is not
> > encrypted then we fall back to the default.
> >
> > The patch adds new ioctls KVM_{SET,GET}_PAGE_ENC_BITMAP. The ioctls can be
> > used by qemu to get the page encryption bitmap. Qemu can consult this
> > bitmap during the migration to know whether the page is encrypted.
> >
> > This section describes how the SEV live migration feature is negotiated
> > between the host and guest: the host indicates this feature support via
> > KVM_FEATURE_CPUID. The guest firmware (OVMF) detects this feature and
> > sets a UEFI environment variable indicating OVMF support for live
> > migration. The guest kernel also detects the host support for this
> > feature via cpuid and, in the case of an EFI boot, verifies if OVMF also
> > supports this feature by getting the UEFI environment variable; if it is
> > set, it enables the live migration feature on the host by writing to a
> > custom MSR. If not booted under EFI, it simply enables the feature by
> > again writing to the custom MSR. The host returns an error as part of
> > the SET_PAGE_ENC_BITMAP ioctl if the guest has not enabled live migration.
> >
> > A branch containing these patches is available here:
> > https://github.com/AMDESE/linux/tree/sev-migration-v8
> >
> > [1] https://developer.amd.com/wp-content/resources/55766.PDF
> >
> > Changes since v7:
> > - Removed the hypervisor specific hypercall/paravirt callback for
> >   SEV live migration and moved back to calling kvm_sev_hypercall3
> >   directly.
> > - Fix build errors as
> >   Reported-by: kbuild test robot , specifically fixed
> >   build error when CONFIG_HYPERVISOR_GUEST=y and
> >   CONFIG_AMD_MEM_ENCRYPT=n.
> > - Implicitly enabled live migration for incoming VM(s) to handle
> >   A->B->C->... VM migrations.
> > - Fixed Documentation as per comments on v6 patches.
> > - Fixed error return path in sev_send_update_data() as per comments
> >   on v6 patches.
> >
> > Changes since v6:
> > - Rebasing to mainline and refactoring to the new split SVM
> >   infrastructure.
> > - Move to static allocation of the unified Page Encryption bitmap
> >   instead of the dynamic resizing of the bitmap, the static allocation
> >   is done implicitly by extending kvm_arch_commit_memory_region() callback
> >   to add svm specific x86_ops which can read the userspace provided memory
> >   region/memslots and calculate the amount of guest RAM managed by the KVM
> >   and grow the bitmap.
> > - Fixed KVM_SET_PAGE_ENC_BITMAP ioctl to set the whole bitmap instead
> >   of simply clearing specific bits.
> > - Removed KVM_PAGE_ENC_BITMAP_RESET ioctl, which is now performed using
> >   KVM_SET_PAGE_ENC_BITMAP.
> > - Extended guest support for enabling Live Migration feature by adding a
> >   check for UEFI environment variable indicating OVMF support for Live
> >   Migration feature and additionally checking for KVM capability for the
> >   same feature. If not booted under EFI, then we simply check for KVM
> >   capability.
> > - Add hypervisor specific hypercall for SEV live migration by adding
> >   a new paravirt callback as part of x86_hyper_runtime.
> >   (x86 hypervisor specific runtime callbacks)
> > - Moving MSR handling for MSR_KVM_SEV_LIVE_MIG_EN into svm/sev code
> >   and adding check for SEV live migration enabled by guest in the
> >   KVM_GET_PAGE_ENC_BITMAP ioctl.
> > - Instead of the complete __bss_decrypted section, only specific variables
> >   such as hv_clock_boot and wall_clock are marked as decrypted in the
> >   page encryption bitmap
> >
> > Changes since v5:
> > - Fix build errors as
> >   Reported-by: kbuild test robot 
> >
> > Changes since v4:
> > - Host support has been added to extend KVM capabilities/feature bits to
> >   include a new KVM_FEATURE_SEV_LIVE_MIGRATION, which the guest can
> >   query for host-side support for SEV live migration and a new custom MSR
> >   MSR_KVM_SEV_LIVE_MIG_EN is added for guest to enable the SEV live
> >   migration feature.
> > - Ensure that _bss_decrypted section is marked as decrypted in the
> >  

Re: [PATCH v8 17/18] KVM: x86: Add kexec support for SEV Live Migration.

2020-05-29 Thread Steve Rutherford
On Tue, May 5, 2020 at 2:21 PM Ashish Kalra  wrote:
>
> From: Ashish Kalra 
>
> Reset the host's page encryption bitmap related to kernel
> specific page encryption status settings before we load a
> new kernel by kexec. We cannot reset the complete
> page encryption bitmap here as we need to retain the
> UEFI/OVMF firmware specific settings.
>
> The host's page encryption bitmap is maintained for the
> guest to keep the encrypted/decrypted state of the guest pages,
> therefore we need to explicitly mark all shared pages as
> encrypted again before rebooting into the new guest kernel.
>
> Signed-off-by: Ashish Kalra 
> ---
>  arch/x86/kernel/kvm.c | 28 
>  1 file changed, 28 insertions(+)
>
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 4b29815de873..a8bc30d5b15b 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -35,6 +35,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  static int kvmapf = 1;
>
> @@ -358,6 +359,33 @@ static void kvm_pv_guest_cpu_reboot(void *unused)
>  */
> if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
> wrmsrl(MSR_KVM_PV_EOI_EN, 0);
> +   /*
> +* Reset the host's page encryption bitmap related to kernel
> +* specific page encryption status settings before we load a
> +* new kernel by kexec. NOTE: We cannot reset the complete
> +* page encryption bitmap here as we need to retain the
> +* UEFI/OVMF firmware specific settings.
> +*/
> +   if (sev_live_migration_enabled() && (smp_processor_id() == 0)) {
> +   int i;
> +   unsigned long nr_pages;
> +
> +   for (i = 0; i < e820_table->nr_entries; i++) {
> +   struct e820_entry *entry = &e820_table->entries[i];
> +   unsigned long start_pfn;
> +   unsigned long end_pfn;
> +
> +   if (entry->type != E820_TYPE_RAM)
> +   continue;
What should the behavior be for other memory types that are not
expected to be mucked with by firmware? Should we avoid resetting the
enc status of pmem/pram pages?

My intuition here is that we should only preserve the enc status of
those bits that are set by the firmware (one possible filter is sketched
after this patch).

> +
> +   start_pfn = entry->addr >> PAGE_SHIFT;
> +   end_pfn = (entry->addr + entry->size) >> PAGE_SHIFT;
> +   nr_pages = DIV_ROUND_UP(entry->size, PAGE_SIZE);
> +
> +   kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
> +  entry->addr, nr_pages, 1);
> +   }
> +   }
> kvm_pv_disable_apf();
> kvm_disable_steal_time();
>  }
> --
> 2.17.1
>
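Acting on the question above, one option is a type filter rather than the
bare E820_TYPE_RAM check; a sketch of what the loop's filter could become
(whether E820 classification is the right signal for firmware-owned
regions is exactly the open question):
==
switch (entry->type) {
case E820_TYPE_RAM:
	break;		/* kernel-owned RAM: reset to encrypted */
case E820_TYPE_PMEM:
case E820_TYPE_PRAM:
default:
	continue;	/* leave persistent memory and firmware-owned
			 * ranges untouched */
}
==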


Re: [PATCH v8 18/18] KVM: SVM: Enable SEV live migration feature implicitly on Incoming VM(s).

2020-05-29 Thread Steve Rutherford
On Tue, May 5, 2020 at 2:22 PM Ashish Kalra  wrote:
>
> From: Ashish Kalra 
>
> For the source VM, the live migration feature is enabled explicitly
> when the guest is booting; for the incoming VM(s) it is implied.
> This is required for handling A->B->C->... VM migrations case.
>
> Signed-off-by: Ashish Kalra 
> ---
>  arch/x86/kvm/svm/sev.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 6f69c3a47583..ba7c0ebfa1f3 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -1592,6 +1592,13 @@ int svm_set_page_enc_bitmap(struct kvm *kvm,
> if (ret)
> goto unlock;
>
> +   /*
> +* For source VM, live migration feature is enabled
> +* explicitly when the guest is booting, for the
> +* incoming VM(s) it is implied.
> +*/
> +   sev_update_migration_flags(kvm, KVM_SEV_LIVE_MIGRATION_ENABLED);
> +
> bitmap_copy(sev->page_enc_bmap + BIT_WORD(gfn_start), bitmap,
> (gfn_end - gfn_start));
>
> --
> 2.17.1
>

Reviewed-by: Steve Rutherford 


Re: [PATCH v8 16/18] KVM: x86: Mark _bss_decrypted section variables as decrypted in page encryption bitmap.

2020-05-29 Thread Steve Rutherford
On Tue, May 5, 2020 at 2:20 PM Ashish Kalra  wrote:
>
> From: Ashish Kalra 
>
> Ensure that _bss_decrypted section variables such as hv_clock_boot and
> wall_clock are marked as decrypted in the page encryption bitmap if
> SEV live migration is supported.
>
> Signed-off-by: Ashish Kalra 
> ---
>  arch/x86/kernel/kvmclock.c | 12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index 34b18f6eeb2c..65777bf1218d 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -334,6 +334,18 @@ void __init kvmclock_init(void)
> pr_info("kvm-clock: Using msrs %x and %x",
> msr_kvm_system_time, msr_kvm_wall_clock);
>
> +   if (sev_live_migration_enabled()) {
> +   unsigned long nr_pages;
> +   /*
> +* sizeof(hv_clock_boot) is already PAGE_SIZE aligned
> +*/
> +   early_set_mem_enc_dec_hypercall((unsigned long)hv_clock_boot,
> +   1, 0);
> +   nr_pages = DIV_ROUND_UP(sizeof(wall_clock), PAGE_SIZE);
> +   early_set_mem_enc_dec_hypercall((unsigned long)&wall_clock,
> +   nr_pages, 0);
> +   }
> +
> this_cpu_write(hv_clock_per_cpu, &hv_clock_boot[0]);
> kvm_register_clock("primary cpu clock");
> pvclock_set_pvti_cpu0_va(hv_clock_boot);
> --
> 2.17.1
>

Reviewed-by: Steve Rutherford 


Re: [PATCH v8 14/18] EFI: Introduce the new AMD Memory Encryption GUID.

2020-05-29 Thread Steve Rutherford
On Tue, May 5, 2020 at 2:20 PM Ashish Kalra  wrote:
>
> From: Ashish Kalra 
>
> Introduce a new AMD Memory Encryption GUID which is currently
> used for defining a new UEFI environment variable which indicates
> UEFI/OVMF support for the SEV live migration feature. This variable
> is set up when UEFI/OVMF detects host/hypervisor support for SEV
> live migration and later this variable is read by the kernel using
> EFI runtime services to verify if OVMF supports the live migration
> feature.
>
> Signed-off-by: Ashish Kalra 
> ---
>  include/linux/efi.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/include/linux/efi.h b/include/linux/efi.h
> index 251f1f783cdf..2efb42ccf3a8 100644
> --- a/include/linux/efi.h
> +++ b/include/linux/efi.h
> @@ -358,6 +358,7 @@ void efi_native_runtime_setup(void);
>
>  /* OEM GUIDs */
>  #define DELLEMC_EFI_RCI2_TABLE_GUID EFI_GUID(0x2d9f28a2, 0xa886, 
> 0x456a,  0x97, 0xa8, 0xf1, 0x1e, 0xf2, 0x4f, 0xf4, 0x55)
> +#define MEM_ENCRYPT_GUID   EFI_GUID(0x0cf29b71, 0x9e51, 
> 0x433a,  0xa3, 0xb7, 0x81, 0xf3, 0xab, 0x16, 0xb8, 0x75)
>
>  typedef struct {
> efi_guid_t guid;
> --
> 2.17.1
>
Have you gotten this GUID upstreamed into edk2?

Reviewed-by: Steve Rutherford 


Re: [PATCH v8 15/18] KVM: x86: Add guest support for detecting and enabling SEV Live Migration feature.

2020-05-29 Thread Steve Rutherford
) {
> +   pr_info("setup_kvm_sev_migration: live migration disabled in 
> OVMF\n");
> +   return false;
> +   }
> +
> +   pr_info("setup_kvm_sev_migration: live migration enabled in OVMF\n");
> +   wrmsrl(MSR_KVM_SEV_LIVE_MIG_EN, KVM_SEV_LIVE_MIGRATION_ENABLED);
> +
> +   return true;
> +}
> +
> +late_initcall(setup_kvm_sev_migration);
> +#endif
> +
>  /*
>   * Iterate through all possible CPUs and map the memory region pointed
>   * by apf_reason, steal_time and kvm_apic_eoi as decrypted at once.
> @@ -725,6 +773,20 @@ static void __init kvm_apic_init(void)
>
>  static void __init kvm_init_platform(void)
>  {
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +   if (sev_active() &&
> +   kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION)) {
> +   printk(KERN_INFO "KVM enable live migration\n");
> +   sev_live_mig_enabled = true;
> +   /*
> +* If not booted using EFI, enable Live migration support.
> +*/
> +   if (!efi_enabled(EFI_BOOT))
> +   wrmsrl(MSR_KVM_SEV_LIVE_MIG_EN,
> +  KVM_SEV_LIVE_MIGRATION_ENABLED);
> +   } else
> +   printk(KERN_INFO "KVM enable live migration feature 
> unsupported\n");
> +#endif
> kvmclock_init();
> x86_platform.apic_post_init = kvm_apic_init;
>  }
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index c9800fa811f6..f54be71bc75f 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -46,6 +46,8 @@ EXPORT_SYMBOL_GPL(sev_enable_key);
>
>  bool sev_enabled __section(.data);
>
> +bool sev_live_mig_enabled __section(.data);
> +
>  /* Buffer used for early in-place encryption by BSP, no locking needed */
>  static char sme_early_buffer[PAGE_SIZE] __initdata __aligned(PAGE_SIZE);
>
> @@ -204,6 +206,9 @@ static void set_memory_enc_dec_hypercall(unsigned long 
> vaddr, int npages,
> unsigned long sz = npages << PAGE_SHIFT;
> unsigned long vaddr_end, vaddr_next;
>
> +   if (!sev_live_migration_enabled())
> +       return;
> +
> vaddr_end = vaddr + sz;
>
> for (; vaddr < vaddr_end; vaddr = vaddr_next) {
> @@ -374,6 +379,12 @@ int __init early_set_memory_encrypted(unsigned long 
> vaddr, unsigned long size)
> return early_set_memory_enc_dec(vaddr, size, true);
>  }
>
> +void __init early_set_mem_enc_dec_hypercall(unsigned long vaddr, int npages,
> +   bool enc)
> +{
> +   set_memory_enc_dec_hypercall(vaddr, npages, enc);
> +}
> +
>  /*
>   * SME and SEV are very similar but they are not the same, so there are
>   * times that the kernel will need to distinguish between SME and SEV. The
> --
> 2.17.1
>


Reviewed-by: Steve Rutherford 


Re: [PATCH v8 12/18] KVM: SVM: Add support for static allocation of unified Page Encryption Bitmap.

2020-05-29 Thread Steve Rutherford
On Tue, May 5, 2020 at 2:18 PM Ashish Kalra  wrote:
>
> From: Ashish Kalra 
>
> Add support for static allocation of the unified Page encryption bitmap by
> extending kvm_arch_commit_memory_region() callback to add svm specific x86_ops
> which can read the userspace provided memory region/memslots and calculate
> the amount of guest RAM managed by the KVM and grow the bitmap based
> on that information, i.e. the highest guest PA that is mapped by a memslot.
>
> Signed-off-by: Ashish Kalra 
> ---
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/svm/sev.c  | 35 +
>  arch/x86/kvm/svm/svm.c  |  1 +
>  arch/x86/kvm/svm/svm.h  |  1 +
>  arch/x86/kvm/x86.c  |  5 +
>  5 files changed, 43 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index fc74144d5ab0..b573ea85b57e 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1254,6 +1254,7 @@ struct kvm_x86_ops {
>
> bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
> int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> +   void (*commit_memory_region)(struct kvm *kvm, enum kvm_mr_change change);
> int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
>   unsigned long sz, unsigned long mode);
> int (*get_page_enc_bitmap)(struct kvm *kvm,
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 30efc1068707..c0d7043a0627 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -1377,6 +1377,41 @@ static int sev_resize_page_enc_bitmap(struct kvm *kvm, unsigned long new_size)
> return 0;
>  }
>
> +void svm_commit_memory_region(struct kvm *kvm, enum kvm_mr_change change)
> +{
> +   struct kvm_memslots *slots;
> +   struct kvm_memory_slot *memslot;
> +   gfn_t start, end = 0;
> +
> +   spin_lock(&kvm->mmu_lock);
> +   if (change == KVM_MR_CREATE) {
> +   slots = kvm_memslots(kvm);
> +   kvm_for_each_memslot(memslot, slots) {
> +   start = memslot->base_gfn;
> +   end = memslot->base_gfn + memslot->npages;
> +   /*
> +* KVM memslots is a sorted list, starting with
> +* the highest mapped guest PA, so pick the topmost
> +* valid guest PA.
> +*/
> +   if (memslot->npages)
> +   break;
> +   }
> +   }
> +   spin_unlock(&kvm->mmu_lock);
> +
> +   if (end) {
> +   /*
> +* NOTE: This callback is invoked in vm ioctl
> +* set_user_memory_region, hence we can use a
> +* mutex here.
> +*/
> +   mutex_lock(&kvm->lock);
> +   sev_resize_page_enc_bitmap(kvm, end);
> +   mutex_unlock(&kvm->lock);
> +   }
> +}
> +
>  int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
>   unsigned long npages, unsigned long enc)
>  {
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 501e82f5593c..442adbbb0641 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4015,6 +4015,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>
> .check_nested_events = svm_check_nested_events,
>
> +   .commit_memory_region = svm_commit_memory_region,
> .page_enc_status_hc = svm_page_enc_status_hc,
> .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> .set_page_enc_bitmap = svm_set_page_enc_bitmap,
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 2ebdcce50312..fd99e0a5417a 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -406,6 +406,7 @@ int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
>   unsigned long npages, unsigned long enc);
>  int svm_get_page_enc_bitmap(struct kvm *kvm, struct kvm_page_enc_bitmap *bmap);
>  int svm_set_page_enc_bitmap(struct kvm *kvm, struct kvm_page_enc_bitmap *bmap);
> +void svm_commit_memory_region(struct kvm *kvm, enum kvm_mr_change change);
>
>  /* avic.c */
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c4166d7a0493..8938de868d42 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10133,6 +10133,11 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
> kvm_mmu_change_mmu_pages(kvm,
> kvm_mmu_calculate_default_mmu_pages(kvm));
>
> +   if (change == KVM_MR_CREATE || change == KVM_MR_DELETE) {
> +   if (kvm_x86_ops.commit_memory_region)
> +   kvm_x86_ops.commit_memory_region(kvm, change);
Why not just call this every time (if it exists) and have the
kvm_x86_op determine if it should do anything?

It 

Re: [PATCH v8 13/18] KVM: x86: Introduce new KVM_FEATURE_SEV_LIVE_MIGRATION feature & Custom MSR.

2020-05-29 Thread Steve Rutherford
h/x86/kvm/svm/svm.c
> @@ -2633,6 +2633,9 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct 
> msr_data *msr)
> svm->msr_decfg = data;
> break;
> }
> +   case MSR_KVM_SEV_LIVE_MIG_EN:
> +   sev_update_migration_flags(vcpu->kvm, data);
> +   break;
> case MSR_IA32_APICBASE:
> if (kvm_vcpu_apicv_active(vcpu))
> avic_update_vapic_bar(to_svm(vcpu), data);
> @@ -3493,6 +3496,19 @@ static void svm_cpuid_update(struct kvm_vcpu *vcpu)
> svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) &&
>  guest_cpuid_has(>vcpu, X86_FEATURE_NRIPS);
>
> +/*
> + * If SEV guest then enable the Live migration feature.
> + */
> +if (sev_guest(vcpu->kvm)) {
> +  struct kvm_cpuid_entry2 *best;
> +
> +  best = kvm_find_cpuid_entry(vcpu, KVM_CPUID_FEATURES, 0);
> +  if (!best)
> +  return;
> +
> +  best->eax |= (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
> +}
> +
> if (!kvm_vcpu_apicv_active(vcpu))
> return;
>
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index fd99e0a5417a..77f132a6fead 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -65,6 +65,7 @@ struct kvm_sev_info {
> int fd; /* SEV device fd */
> unsigned long pages_locked; /* Number of pages locked */
> struct list_head regions_list;  /* List of registered regions */
> +   bool live_migration_enabled;
> unsigned long *page_enc_bmap;
> unsigned long page_enc_bmap_size;
>  };
> @@ -494,5 +495,6 @@ int svm_unregister_enc_region(struct kvm *kvm,
>  void pre_sev_run(struct vcpu_svm *svm, int cpu);
>  int __init sev_hardware_setup(void);
>  void sev_hardware_teardown(void);
> +void sev_update_migration_flags(struct kvm *kvm, u64 data);
>
>  #endif
> --
> 2.17.1
>

Reviewed-by: Steve Rutherford 
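
The body of sev_update_migration_flags() is not quoted above; a plausible
sketch, reconstructed only from the call site and the live_migration_enabled
field this patch adds (illustrative, not the patch text):

void sev_update_migration_flags(struct kvm *kvm, u64 data)
{
	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;

	/* Only SEV guests negotiate live migration through this MSR. */
	if (!sev_guest(kvm))
		return;

	sev->live_migration_enabled = !!(data & KVM_SEV_LIVE_MIGRATION_ENABLED);
}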


Re: [PATCH v8 11/18] KVM: x86: Introduce KVM_SET_PAGE_ENC_BITMAP ioctl

2020-05-29 Thread Steve Rutherford
t;page_enc_bmap,
> +   sev->page_enc_bmap_size);
> +   mutex_unlock(&kvm->lock);
> +   return 0;
> +   }
>
> +
> +   gfn_start = bmap->start_gfn;
> +   gfn_end = gfn_start + bmap->num_pages;
> +
> +   sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / 8;
> +   bitmap = kmalloc(sz, GFP_KERNEL);
> +   if (!bitmap)
> +   return -ENOMEM;
> +
> +   ret = -EFAULT;
> +   if (copy_from_user(bitmap, bmap->enc_bitmap, sz))
> +   goto out;
> +
> +   mutex_lock(&kvm->lock);
> +   ret = sev_resize_page_enc_bitmap(kvm, gfn_end);
> +   if (ret)
> +   goto unlock;
> +
> +   bitmap_copy(sev->page_enc_bmap + BIT_WORD(gfn_start), bitmap, (gfn_end - gfn_start));

I *think* this assumes that gfn_start is a multiple of 8. I'm not
certain I have a clean suggestion for fixing this, other than
advertising that this is an expectation, and returning an error if
that is not true.

If I'm reading bitmap_copy correctly, I also think it assumes all
bitmaps have lengths that are unsigned long aligned, which surprised
me.
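
A minimal sketch of the guard being suggested, assuming the ABI simply
rejects unaligned requests (bitmap_copy() and BIT_WORD() work in
unsigned-long granules, so an unaligned start_gfn would silently misplace
bits):

	/* Illustrative guard, not part of the patch. */
	if (!IS_ALIGNED(bmap->start_gfn, BITS_PER_LONG))
		return -EINVAL;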
>
> +
> +   ret = 0;
> +unlock:
> +   mutex_unlock(&kvm->lock);
> +out:
> +   kfree(bitmap);
> +   return ret;
> +}
> +
>  int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  {
> struct kvm_sev_cmd sev_cmd;
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 588709a9f68e..501e82f5593c 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4017,6 +4017,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>
> .page_enc_status_hc = svm_page_enc_status_hc,
> .get_page_enc_bitmap = svm_get_page_enc_bitmap,
> +   .set_page_enc_bitmap = svm_set_page_enc_bitmap,
>  };
>
>  static struct kvm_x86_init_ops svm_init_ops __initdata = {
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index f087fa7b380c..2ebdcce50312 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -405,6 +405,7 @@ int nested_svm_exit_special(struct vcpu_svm *svm);
>  int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
>   unsigned long npages, unsigned long enc);
>  int svm_get_page_enc_bitmap(struct kvm *kvm, struct kvm_page_enc_bitmap *bmap);
> +int svm_set_page_enc_bitmap(struct kvm *kvm, struct kvm_page_enc_bitmap *bmap);
>
>  /* avic.c */
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 937797cfaf9a..c4166d7a0493 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5220,6 +5220,18 @@ long kvm_arch_vm_ioctl(struct file *filp,
> r = kvm_x86_ops.get_page_enc_bitmap(kvm, &bitmap);
> break;
> }
> +   case KVM_SET_PAGE_ENC_BITMAP: {
> +   struct kvm_page_enc_bitmap bitmap;
> +
> +   r = -EFAULT;
> +   if (copy_from_user(&bitmap, argp, sizeof(bitmap)))
> +   goto out;
> +
> +   r = -ENOTTY;
> +   if (kvm_x86_ops.set_page_enc_bitmap)
> +   r = kvm_x86_ops.set_page_enc_bitmap(kvm, );
> +   break;
> +   }
> default:
> r = -ENOTTY;
> }
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index af62f2afaa5d..2798b17484d0 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1529,6 +1529,7 @@ struct kvm_pv_cmd {
>  #define KVM_S390_PV_COMMAND	_IOWR(KVMIO, 0xc5, struct kvm_pv_cmd)
>
>  #define KVM_GET_PAGE_ENC_BITMAP	_IOW(KVMIO, 0xc6, struct kvm_page_enc_bitmap)
> +#define KVM_SET_PAGE_ENC_BITMAP	_IOW(KVMIO, 0xc7, struct kvm_page_enc_bitmap)
>
>  /* Secure Encrypted Virtualization command */
>  enum sev_cmd_id {
> --
> 2.17.1
>

Otherwise, this looks good to me. Thanks for merging the ioctls together.

Reviewed-by: Steve Rutherford 
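
For context, a sketch of how the target-side VMM might drive this ioctl when
restoring a received bitmap; it assumes userspace headers built from this
series, and the helper name and parameters are illustrative:

#include <err.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* 'bits' holds one bit per gfn (1 = encrypted) received from the source. */
static void restore_page_enc_bitmap(int vm_fd, uint64_t npages, void *bits)
{
	struct kvm_page_enc_bitmap bmap = {
		.start_gfn  = 0,
		.num_pages  = npages,
		.enc_bitmap = bits,
	};

	if (ioctl(vm_fd, KVM_SET_PAGE_ENC_BITMAP, &bmap) < 0)
		err(1, "KVM_SET_PAGE_ENC_BITMAP");
}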


Re: [PATCH v8 10/18] mm: x86: Invoke hypercall when page encryption status is changed

2020-05-29 Thread Steve Rutherford
 pud_pfn(*(pud_t *)kpte);
> +   break;
> +   default:
> +   return;
> +   }
> +
> +   psize = page_level_size(level);
> +   pmask = page_level_mask(level);
> +
> +   kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
> +  pfn << PAGE_SHIFT, psize >> PAGE_SHIFT, enc);
> +
> +   vaddr_next = (vaddr & pmask) + psize;
> +   }
> +}
> +
>  static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
>  {
> pgprot_t old_prot, new_prot;
> @@ -253,12 +296,13 @@ static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
>  static int __init early_set_memory_enc_dec(unsigned long vaddr,
>unsigned long size, bool enc)
>  {
> -   unsigned long vaddr_end, vaddr_next;
> +   unsigned long vaddr_end, vaddr_next, start;
> unsigned long psize, pmask;
> int split_page_size_mask;
> int level, ret;
> pte_t *kpte;
>
> +   start = vaddr;
> vaddr_next = vaddr;
> vaddr_end = vaddr + size;
>
> @@ -313,6 +357,8 @@ static int __init early_set_memory_enc_dec(unsigned long vaddr,
>
> ret = 0;
>
> +   set_memory_enc_dec_hypercall(start, PAGE_ALIGN(size) >> PAGE_SHIFT,
> +   enc);
>  out:
> __flush_tlb_all();
> return ret;
> @@ -451,6 +497,15 @@ void __init mem_encrypt_init(void)
> if (sev_active())
> static_branch_enable(_enable_key);
>
> +#ifdef CONFIG_PARAVIRT
> +   /*
> +* With SEV, we need to make a hypercall when page encryption state is
> +* changed.
> +*/
> +   if (sev_active())
> +   pv_ops.mmu.page_encryption_changed = set_memory_enc_dec_hypercall;
> +#endif
> +
> pr_info("AMD %s active\n",
> sev_active() ? "Secure Encrypted Virtualization (SEV)"
>  : "Secure Memory Encryption (SME)");
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index 59eca6a94ce7..9aaf1b6f5a1b 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include "../mm_internal.h"
>
> @@ -2003,6 +2004,12 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
>  */
> cpa_flush(, 0);
>
> +   /* Notify hypervisor that a given memory range is mapped encrypted
> +* or decrypted. The hypervisor will use this information during the
> +* VM migration.
> +*/
> +   page_encryption_changed(addr, numpages, enc);
> +
> return ret;
>  }
>
> --
> 2.17.1
>


Reviewed-by: Steve Rutherford 


Re: [PATCH v8 08/18] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall

2020-05-29 Thread Steve Rutherford
= gfn_to_pfn(kvm, gfn_end);
> +
> +   if (is_error_noslot_pfn(pfn_start) && !is_noslot_pfn(pfn_start)) {
> +   /*
> +* Allow guest MMIO range(s) to be added
> +* to the page encryption bitmap.
> +*/
> +   return -EINVAL;
> +   }
> +
> +   if (is_error_noslot_pfn(pfn_end) && !is_noslot_pfn(pfn_end)) {
> +   /*
> +* Allow guest MMIO range(s) to be added
> +* to the page encryption bitmap.
> +*/
> +   return -EINVAL;
> +   }
> +
> +   mutex_lock(&kvm->lock);
> +
> +   if (sev->page_enc_bmap_size < gfn_end)
> +   goto unlock;
> +
> +   if (enc)
> +   __bitmap_set(sev->page_enc_bmap, gfn_start,
> +   gfn_end - gfn_start);
> +   else
> +   __bitmap_clear(sev->page_enc_bmap, gfn_start,
> +   gfn_end - gfn_start);
> +
> +unlock:
> +   mutex_unlock(&kvm->lock);
> +   return 0;
> +}
> +
>  int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>  {
> struct kvm_sev_cmd sev_cmd;
> @@ -1560,6 +1647,9 @@ void sev_vm_destroy(struct kvm *kvm)
>
> sev_unbind_asid(kvm, sev->handle);
> sev_asid_free(sev->asid);
> +
> +   kvfree(sev->page_enc_bmap);
> +   sev->page_enc_bmap = NULL;
>  }
>
>  int __init sev_hardware_setup(void)
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 2f379bacbb26..1013ef0f4ce2 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4014,6 +4014,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
> .apic_init_signal_blocked = svm_apic_init_signal_blocked,
>
> .check_nested_events = svm_check_nested_events,
> +
> +   .page_enc_status_hc = svm_page_enc_status_hc,
>  };
>
>  static struct kvm_x86_init_ops svm_init_ops __initdata = {
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index df3474f4fb02..6a562f5928a2 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -65,6 +65,8 @@ struct kvm_sev_info {
> int fd; /* SEV device fd */
> unsigned long pages_locked; /* Number of pages locked */
> struct list_head regions_list;  /* List of registered regions */
> +   unsigned long *page_enc_bmap;
> +   unsigned long page_enc_bmap_size;
>  };
>
>  struct kvm_svm {
> @@ -400,6 +402,8 @@ int nested_svm_check_exception(struct vcpu_svm *svm, 
> unsigned nr,
>bool has_error_code, u32 error_code);
>  int svm_check_nested_events(struct kvm_vcpu *vcpu);
>  int nested_svm_exit_special(struct vcpu_svm *svm);
> +int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> + unsigned long npages, unsigned long enc);
>
>  /* avic.c */
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index c2c6335a998c..7d01d3aa6461 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7838,6 +7838,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
> .nested_get_evmcs_version = NULL,
> .need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
> .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> +   .page_enc_status_hc = NULL,
>  };
>
>  static __init int hardware_setup(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c5835f9cb9ad..5f5ddb5765e2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7605,6 +7605,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> kvm_sched_yield(vcpu->kvm, a0);
> ret = 0;
> break;
> +   case KVM_HC_PAGE_ENC_STATUS:
> +   ret = -KVM_ENOSYS;
> +   if (kvm_x86_ops.page_enc_status_hc)
> +   ret = kvm_x86_ops.page_enc_status_hc(vcpu->kvm,
> +   a0, a1, a2);
> +   break;
> default:
> ret = -KVM_ENOSYS;
> break;
> diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
> index 8b86609849b9..847b83b75dc8 100644
> --- a/include/uapi/linux/kvm_para.h
> +++ b/include/uapi/linux/kvm_para.h
> @@ -29,6 +29,7 @@
>  #define KVM_HC_CLOCK_PAIRING   9
>  #define KVM_HC_SEND_IPI10
>  #define KVM_HC_SCHED_YIELD 11
> +#define KVM_HC_PAGE_ENC_STATUS 12
>
>  /*
>   * hypercalls use architecture specific
> --
> 2.17.1
>


Reviewed-by: Steve Rutherford 
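
For reference, the guest side of this hypercall (patch 10/18 calls it through
kvm_sev_hypercall3) reduces to a VMMCALL with the number in RAX and
gpa/npages/enc in RBX/RCX/RDX; a sketch, illustrative rather than quoted:

static inline long kvm_sev_hypercall3(unsigned int nr, unsigned long p1,
				      unsigned long p2, unsigned long p3)
{
	long ret;

	/* SEV guests run on AMD hardware, so use VMMCALL unconditionally. */
	asm volatile("vmmcall"
		     : "=a"(ret)
		     : "a"(nr), "b"(p1), "c"(p2), "d"(p3)
		     : "memory");
	return ret;
}

/* e.g. mark one 4K page at 'gpa' shared (enc = 0): */
/* kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS, gpa, 1, 0); */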


Re: [PATCH v8 09/18] KVM: x86: Introduce KVM_GET_PAGE_ENC_BITMAP ioctl

2020-05-29 Thread Steve Rutherford
On Tue, May 5, 2020 at 2:17 PM Ashish Kalra  wrote:
>
> From: Brijesh Singh 
>
> The ioctl can be used to retrieve page encryption bitmap for a given
> gfn range.
>
> Return the correct bitmap as per the number of pages being requested
> by the user. Ensure that we only copy bmap->num_pages bytes in the
> userspace buffer, if bmap->num_pages is not byte aligned we read
> the trailing bits from the userspace and copy those bits as is.
>
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: Paolo Bonzini 
> Cc: "Radim Krčmář" 
> Cc: Joerg Roedel 
> Cc: Borislav Petkov 
> Cc: Tom Lendacky 
> Cc: x...@kernel.org
> Cc: k...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Reviewed-by: Venu Busireddy 
> Signed-off-by: Brijesh Singh 
> Signed-off-by: Ashish Kalra 
> ---
>  Documentation/virt/kvm/api.rst  | 27 +
>  arch/x86/include/asm/kvm_host.h |  2 +
>  arch/x86/kvm/svm/sev.c  | 70 +
>  arch/x86/kvm/svm/svm.c  |  1 +
>  arch/x86/kvm/svm/svm.h  |  1 +
>  arch/x86/kvm/x86.c  | 12 ++
>  include/uapi/linux/kvm.h| 12 ++
>  7 files changed, 125 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index efbbe570aa9b..ecad84086892 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -4636,6 +4636,33 @@ This ioctl resets VCPU registers and control structures according to
>  the clear cpu reset definition in the POP. However, the cpu is not put
>  into ESA mode. This reset is a superset of the initial reset.
>
> +4.125 KVM_GET_PAGE_ENC_BITMAP (vm ioctl)
> +----------------------------------------
> +
> +:Capability: basic
> +:Architectures: x86
> +:Type: vm ioctl
> +:Parameters: struct kvm_page_enc_bitmap (in/out)
> +:Returns: 0 on success, -1 on error
> +
> +/* for KVM_GET_PAGE_ENC_BITMAP */
> +struct kvm_page_enc_bitmap {
> +   __u64 start_gfn;
> +   __u64 num_pages;
> +   union {
> +   void __user *enc_bitmap; /* one bit per page */
> +   __u64 padding2;
> +   };
> +};
> +
> +The encrypted VMs have the concept of private and shared pages. The private
> +pages are encrypted with the guest-specific key, while the shared pages may
> +be encrypted with the hypervisor key. The KVM_GET_PAGE_ENC_BITMAP can
> +be used to get the bitmap indicating whether the guest page is private
> +or shared. The bitmap can be used during the guest migration. If the page
> +is private then the userspace need to use SEV migration commands to transmit
> +the page.
> +
>
>  4.125 KVM_S390_PV_COMMAND
>  -------------------------
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 4a8ee22f4f5b..9e428befb6a4 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1256,6 +1256,8 @@ struct kvm_x86_ops {
> int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
> int (*page_enc_status_hc)(struct kvm *kvm, unsigned long gpa,
>   unsigned long sz, unsigned long mode);
> +   int (*get_page_enc_bitmap)(struct kvm *kvm,
> +   struct kvm_page_enc_bitmap *bmap);
>  };
>
>  struct kvm_x86_init_ops {
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index f088467708f0..387045902470 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -1434,6 +1434,76 @@ int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> return 0;
>  }
>
> +int svm_get_page_enc_bitmap(struct kvm *kvm,
> +  struct kvm_page_enc_bitmap *bmap)
> +{
> +   struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> +   unsigned long gfn_start, gfn_end;
> +   unsigned long sz, i, sz_bytes;
> +   unsigned long *bitmap;
> +   int ret, n;
> +
> +   if (!sev_guest(kvm))
> +   return -ENOTTY;
> +
> +   gfn_start = bmap->start_gfn;
> +   gfn_end = gfn_start + bmap->num_pages;
> +
> +   sz = ALIGN(bmap->num_pages, BITS_PER_LONG) / BITS_PER_BYTE;
> +   bitmap = kmalloc(sz, GFP_KERNEL);
> +   if (!bitmap)
> +   return -ENOMEM;
> +
> +   /* by default all pages are marked encrypted */
> +   memset(bitmap, 0xff, sz);
> +
> +   mutex_lock(&kvm->lock);
> +   if (sev->page_enc_bmap) {
> +   i = gfn_start;
> +   for_each_clear_bit_from(i, sev->page_enc_bmap,
> + min(sev->page_enc_bmap_size, gfn_end))
gfn_end is not a size? I believe you want either gfn_end - gfn_start
or bmap->num_pages.

> +   clear_bit(i - gfn_start, bitmap);
> +   }
> +   mutex_unlock(&kvm->lock);
> +
> +   ret = -EFAULT;
> +
> +   n = bmap->num_pages % BITS_PER_BYTE;
> +   sz_bytes = ALIGN(bmap->num_pages, BITS_PER_BYTE) / BITS_PER_BYTE;
> +
> +   /*
> +* Return the correct bitmap as per 
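
For context, a sketch of the source-side caller of KVM_GET_PAGE_ENC_BITMAP
during migration, assuming userspace headers built from this series (helper
name and buffer sizing are illustrative):

#include <err.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Returns a caller-freed buffer with one bit per gfn (1 = encrypted). */
static void *fetch_page_enc_bitmap(int vm_fd, uint64_t npages)
{
	struct kvm_page_enc_bitmap bmap = {
		.start_gfn = 0,
		.num_pages = npages,
	};

	bmap.enc_bitmap = calloc((npages + 7) / 8, 1);
	if (!bmap.enc_bitmap)
		err(1, "calloc");

	if (ioctl(vm_fd, KVM_GET_PAGE_ENC_BITMAP, &bmap) < 0)
		err(1, "KVM_GET_PAGE_ENC_BITMAP");

	return bmap.enc_bitmap;
}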

Re: [PATCH 2/2] KVM: x86: Allow userspace to define what's the microcode version

2017-11-27 Thread Steve Rutherford
On Mon, Nov 27, 2017 at 3:58 AM, Paolo Bonzini  wrote:
> On 26/11/2017 17:41, Filippo Sironi wrote:
>> ... that the guest should see.
>> Guest operating systems may check the microcode version to decide whether
>> to disable certain features that are known to be buggy up to certain
>> microcode versions.  Address the issue by making the microcode version
>> that the guest should see settable.
>> The rationale for having userspace specifying the microcode version, rather
>> than having the kernel picking it, is to ensure consistency for live-migrated
>> instances; we don't want them to see a microcode version increase without a
>> reset.
>>
>> Signed-off-by: Filippo Sironi 
>> ---
>>  arch/x86/kvm/x86.c   | 23 +++
>>  include/uapi/linux/kvm.h |  3 +++
>>  2 files changed, 26 insertions(+)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 925c3e29cad3..741588f27ebc 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -4033,6 +4033,29 @@ long kvm_arch_vm_ioctl(struct file *filp,
>>   } u;
>>
>>   switch (ioctl) {
>> + case KVM_GET_MICROCODE_VERSION: {
>> + r = -EFAULT;
>> + if (copy_to_user(argp,
>> +  &kvm->arch.microcode_version,
>> +  sizeof(kvm->arch.microcode_version)))
>> + goto out;
>> + break;
>> + }
>> + case KVM_SET_MICROCODE_VERSION: {
>> + u32 microcode_version;
>> +
>> + r = -EFAULT;
>> + if (copy_from_user(&microcode_version,
>> +argp,
>> +sizeof(microcode_version)))
>> + goto out;
>> + r = -EINVAL;
>> + if (!microcode_version)
>> + goto out;
>> + kvm->arch.microcode_version = microcode_version;
>> + r = 0;
>> + break;
>> + }
>
> Also, there's no need to define new ioctls, instead you can just place
> it in the vcpu and use KVM_GET_MSR/KVM_SET_MSR.  I'd agree that's
> slightly less polished, but it matches what we do already for e.g.
> nested VMX model specific registers.  And it spares you for writing the
> documentation that you didn't include in this patch. :)
>
> Paolo

This feels good time to mention Peter Hornyack's old MSR KVM_EXIT
patches. With something like them, there would be no need to push this
into the kernel at all.
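
A sketch of Paolo's suggestion as seen from userspace: pin the microcode
revision through the existing MSR plumbing instead of new VM ioctls. It
assumes a later revision wires up MSR_IA32_UCODE_REV (0x8b) as settable in
KVM; the helper is illustrative:

#include <err.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#define MSR_IA32_UCODE_REV 0x0000008b

static void set_guest_microcode_rev(int vcpu_fd, uint64_t rev)
{
	struct {
		struct kvm_msrs hdr;
		struct kvm_msr_entry entry;
	} msrs;

	memset(&msrs, 0, sizeof(msrs));
	msrs.hdr.nmsrs = 1;
	msrs.entry.index = MSR_IA32_UCODE_REV;
	msrs.entry.data = rev;

	/* KVM_SET_MSRS returns the number of MSRs actually set. */
	if (ioctl(vcpu_fd, KVM_SET_MSRS, &msrs.hdr) != 1)
		err(1, "KVM_SET_MSRS");
}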


Re: [Part1 PATCH v7 00/17] x86: Secure Encrypted Virtualization (AMD)

2017-11-21 Thread Steve Rutherford
On Thu, Nov 16, 2017 at 6:41 AM, Tom Lendacky <thomas.lenda...@amd.com> wrote:
> On 11/16/2017 4:02 AM, Borislav Petkov wrote:
>>
>> On Wed, Nov 15, 2017 at 03:57:13PM -0800, Steve Rutherford wrote:
>>>
>>> One piece that seems missing here is the handling of the vmm
>>> communication exception. What's the plan for non-automatic exits? In
>>> particular, what's the plan for emulated devices that are currently
>>> accessed through MMIO (e.g. the IOAPIC)?
>>
>>
>> First of all, please do not top-post.
>>
>> Then, maybe this would answer some of your questions:
>>
>>
>> http://support.amd.com/TechDocs/Protecting%20VM%20Register%20State%20with%20SEV-ES.pdf
>>
>> But I'd look in Tom's direction for further comments.
>
>
> I'm not sure what the question really is...
>
> MMIO works just fine using the data contained in the VMCB on exit
> (exit_info_1, exit_info_2, insn_bytes, etc.).
>
> These patches are for SEV support.  If the question is related to SEV-ES
> (based on the non-automatic exit comment), that support is not part of
> these patches and will require additional changes to be able to both
> launch a guest as an SEV-ES guest and run as an SEV-ES guest.

I conflated SEV with SEV-ES, which I suspect answers everything here.
The reason it doesn't have support for the #VC exception is because
it's not supposed to... yet.

I'm still interested in the plan for the #VC exception handler, but
this thread doesn't seem like the place.

>
>>
>>> Maybe I'm getting ahead of myself: What's the testing story? (since I
>>> don't think linux would boot with these patches, I'm curious what you
>>> are doing to ensure these pieces work)
>>
>>
>> Seems to boot fine here :)
>
>
> Using these patches we have successfully booted and tested a guest both
> with and without SEV enabled.
>
> Thanks,
> Tom
>
>>
>

Thanks,
Steve


Re: [Part1 PATCH v7 00/17] x86: Secure Encrypted Virtualization (AMD)

2017-11-15 Thread Steve Rutherford
One piece that seems missing here is the handling of the vmm
communication exception. What's the plan for non-automatic exits? In
particular, what's the plan for emulated devices that are currently
accessed through MMIO (e.g. the IOAPIC)?

Maybe I'm getting ahead of myself: What's the testing story? (since I
don't think linux would boot with these patches, I'm curious what you
are doing to ensure these pieces work)

On Fri, Oct 20, 2017 at 7:30 AM, Brijesh Singh  wrote:
> This part of Secure Encrypted Virtualization (SEV) series focuses on the
> changes required in a guest OS for SEV support.
>
> When SEV is active, the memory content of guest OS will be transparently
> encrypted with a key unique to the guest VM.
>
> SEV guests have concept of private and shared memory. Private memory is
> encrypted with the guest-specific key, while shared memory may be encrypted with
> hypervisor key. Certain type of memory (namely instruction pages and guest page
> tables) are always treated as private. Due to security reasons all DMA
> operations inside the guest must be performed on shared memory.
>
> The SEV feature is enabled by the hypervisor, and guest can identify it through
> CPUID function and the 0xc0010131 (F17H_SEV) MSR. When enabled, page table
> entries will determine how memory is accessed. If a page table entry has the
> memory encryption mask set, then that memory will be accessed using
> guest-specific key. Certain memory (instruction pages, page tables) will always
> be accessed using guest-specific key.
>
> This patch series builds upon the Secure Memory Encryption (SME) feature. Unlike
> SME, when SEV is enabled, all the data (e.g EFI, kernel, initrd, etc) will have
> been placed into memory as encrypted by the guest BIOS.
>
> The approach that this patch series takes is to encrypt everything possible
> starting early in the boot. Since the DMA operations inside guest must be
> performed on shared memory hence it uses SW-IOTLB to complete the DMA operations.
>
> The following links provide additional details:
>
> AMD Memory Encryption whitepaper:
> http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf
>
> AMD64 Architecture Programmer's Manual:
> http://support.amd.com/TechDocs/24593.pdf
> SME is section 7.10
> SEV is section 15.34
>
> Secure Encrypted Virutualization Key Management:
> http://support.amd.com/TechDocs/55766_SEV-KM API_Specification.pdf
>
> KVM Forum Presentation:
> http://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf
>
> SEV Guest BIOS support:
>   SEV support has been accepted into EDKII/OVMF BIOS
>   https://github.com/tianocore/edk2/commits/master
>
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: Borislav Petkov 
> Cc: Andy Lutomirski 
> Cc: Tom Lendacky 
> Cc: Brijesh Singh 
> Cc: Paolo Bonzini 
> Cc: "Radim KrĠmář" 
> Cc: k...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: x...@kernel.org
>
> ---
> This series is based on tip/master commit : 7ffee292ddca (Merge branch 
> 'x86/urgent')
>
> Complete git tree is available: 
> https://github.com/codomania/tip/tree/sev-v7-p1
>
> Changes since v6:
>  * include jump_label.h to fix the build error seen with one of the randconfig
>
> Changes since v5:
>  * enhance early_set_memory_decrypted() to do memory contents encrypt/decrypt in
>    addition to C bit changes.
>
> Changes since v4:
>  * rename per-CPU define to DEFINE_PER_CPU_DECRYPTED
>  * add more comments in per-CPU section definition
>  * rename __sev_active() to sev_key_active() to use more obvious naming
>  * changes to address v4 feedbacks
>
> Changes since v3:
>  * use static key to branch the unrolling of rep ins/outs when SEV is active
>  * simplify the memory encryption detection logic
>  * rename per-cpu define to DEFINE_PER_CPU_UNENCRYPTED
>  * simplify the logic to map per-cpu as unencrypted
>  * changes to address v3 feedbacks
>
> Changes since v2:
>  * add documentation
>  * update early_set_memory_* to use kernel_physical_mapping_init()
>to split larger page into smaller (recommended by Boris)
>  * changes to address v2 feedback
>  * drop hypervisor specific patches, those patches will be included in part 2
>
> Brijesh Singh (5):
>   Documentation/x86: Add AMD Secure Encrypted Virtualization (SEV)
> description
>   x86: Add support for changing memory encryption attribute in early
> boot
>   percpu: Introduce DEFINE_PER_CPU_DECRYPTED
>   X86/KVM: Decrypt shared per-cpu variables when SEV is active
>   X86/KVM: Clear encryption attribute when SEV is active
>
> Tom Lendacky (12):
>   x86/mm: Add Secure Encrypted Virtualization (SEV) support
>   x86/mm: Don't attempt to encrypt initrd under SEV
>   x86/realmode: Don't decrypt trampoline area under SEV
>   x86/mm: Use encrypted access of boot related data with SEV
>   x86/mm: Include SEV for encryption memory attribute 

Re: kvm: use-after-free in process_srcu

2017-01-12 Thread Steve Rutherford
I'm not that familiar with the kernel's workqueues, but this seems
like the classic "callback outlives the memory it references"
use-after-free, where the process_srcu callback is outliving struct
kvm (which contains the srcu_struct). If that's right, then calling
srcu_barrier (which should wait for all of the call_srcu callbacks to
complete, which are what enqueue the process_srcu callbacks) before
cleanup_srcu_struct in kvm_destroy_vm probably fixes this.

The corresponding patch to virt/kvm/kvm_main.c looks something like:
static void kvm_destroy_vm(struct kvm *kvm)
{
...
for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
kvm_free_memslots(kvm, kvm->memslots[i]);
+  srcu_barrier(&kvm->irq_srcu);
cleanup_srcu_struct(&kvm->irq_srcu);
+  srcu_barrier(&kvm->srcu);
cleanup_srcu_struct(&kvm->srcu);
...


Since we don't have a repro, this obviously won't be readily testable.
I find srcu subtle enough that I don't trust my reasoning fully (in
particular, I don't trust that waiting for all of the call_srcu
callbacks to complete also waits for all of the process_srcu
callbacks). Someone else know if that's the case?

Steve

On Sun, Dec 11, 2016 at 12:49 AM, Dmitry Vyukov  wrote:
> On Sun, Dec 11, 2016 at 9:40 AM, Vegard Nossum  
> wrote:
>> On 11 December 2016 at 07:46, Dmitry Vyukov  wrote:
>>> Hello,
>>>
>>> I am getting the following use-after-free reports while running
>>> syzkaller fuzzer.
>>> On commit 318c8932ddec5c1c26a4af0f3c053784841c598e (Dec 7).
>>> Unfortunately it is not reproducible, but all reports look sane and
>>> very similar, so I would assume that it is some hard to trigger race.
>>> In all cases the use-after-free offset within struct kvm is 344 bytes.
>>> This points to srcu field, which starts at 208 with size 360 (I have
>>> some debug configs enabled).
>> [...]
>>>  [  376.024345] [] __fput+0x34e/0x910 fs/file_table.c:208
>>>  [  376.024345] [] fput+0x1a/0x20 fs/file_table.c:244
>>
>> I've been hitting what I think is a struct file refcounting bug which
>> causes similar symptoms as you have here (the struct file is freed
>> while somebody still has an active reference to it).
>>
>>>  [  376.024345] [] task_work_run+0x1a0/0x280
>>> kernel/task_work.c:116
>>>  [  376.024345] [< inline >] exit_task_work 
>>> include/linux/task_work.h:21
>>>  [  376.024345] [] do_exit+0x1842/0x2650 kernel/exit.c:828
>>>  [  376.024345] [] do_group_exit+0x14e/0x420 
>>> kernel/exit.c:932
>>>  [  376.024345] [] get_signal+0x663/0x1880
>>> kernel/signal.c:2307
>>>  [  376.024345] [] do_signal+0xc5/0x2190
>>> arch/x86/kernel/signal.c:807
>>
>> Was this or any other process by any chance killed by the OOM killer?
>> That seems to be a pattern in the crashes I've seen. If not, do you
>> know what killed this process?
>
>
> Difficult to say as I can't reproduce them.
> I've looked at the logs I have and there are no OOM kills, only some
> kvm-related messages:
>
> [  372.188708] kvm [12528]: vcpu0, guest rIP: 0xfff0
> kvm_set_msr_common: MSR_IA32_DEBUGCTLMSR 0x2, nop
> [  372.321334] kvm [12528]: vcpu0, guest rIP: 0xfff0 unhandled wrmsr:
> 0x0 data 0x0
> [  372.426831] kvm [12593]: vcpu512, guest rIP: 0xfff0 unhandled
> wrmsr: 0x5 data 0x200
> [  372.646417] irq bypass consumer (token 880052f74780)
> registration fails: -16
> [  373.001273] pit: kvm: requested 1676 ns i8254 timer period limited
> to 50 ns
> [  375.541449] kvm [13011]: vcpu0, guest rIP: 0x11 unhandled
> wrmsr: 0x0 data 0x2
> [  376.005387] 
> ==
> [  376.024345] BUG: KASAN: use-after-free in process_srcu+0x27a/0x280
> at addr 88005e29a418
>
> [  720.214985] kvm: vcpu 0: requested 244148 ns lapic timer period
> limited to 50 ns
> [  720.271334] kvm: vcpu 0: requested 244148 ns lapic timer period
> limited to 50 ns
> [  720.567985] kvm_vm_ioctl_assign_device: host device not found
> [  721.094589] kvm [22114]: vcpu0, guest rIP: 0x2 unhandled wrmsr: 0x6 data 
> 0x8
> [  723.829467] 
> ==
> [  723.829467] BUG: KASAN: use-after-free in process_srcu+0x27a/0x280
> at addr 88005a4d10d8
>
> Logs capture ~3-4 seconds before the crash.
> However, syzkaller test processes tend to consume lots of memory from
> time to time and cause low memory conditions.
>
> Kills are usually caused by my test driver that kills test processes
> after short time.
>
> However, I do see other assorted bugs caused by kvm that are induced
> by OOM kills:
> https://groups.google.com/d/msg/syzkaller/ytVPh93HLnI/KhZdengZBwAJ


Re: [PATCH 0/3] KVM: Fix lost IRQ acks for RTC

2016-03-01 Thread Steve Rutherford
This issue seems generic to level triggered interrupts as well as RTC
interrupts. It looks like KVM hacks around the issue with level
triggered interrupts by clearing the remote IRR when an IRQ is
reconfigured. Seems like an (admittedly lossy) way to handle this
issue with the RTC-IRQ would be to follow the lead of level-triggered
interrupts, and clear the pending EOIs when reconfiguring the RTC-IRQ.

[Given that we are already talking about this, this could be viewed as
a good time to go back and fix the issues with the remote IRR in the
IOAPIC.]

On Mon, Feb 29, 2016 at 7:30 AM, Joerg Roedel  wrote:
> On Mon, Feb 29, 2016 at 04:12:42PM +0100, Paolo Bonzini wrote:
>> > This information is then used to match EOI signals from the
>> > guest to the RTC. This explicit back-tracking fixes the
>> > issue.
>>
>> Nice patches, really.  Ok to wait until 4.6?
>
> Thanks. Putting them into v4.6 is fine for me.
>
>
> Joerg
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: x86: set TMR when the interrupt is accepted

2015-09-02 Thread Steve Rutherford
On Thu, Aug 13, 2015 at 09:31:48AM +0200, Paolo Bonzini wrote:
Pinging this thread.

Should I put together a patch to make split irqchip work properly with the old 
TMR behavior?

> 
> 
> On 13/08/2015 08:35, Zhang, Yang Z wrote:
> >> You may be right. It is safe if no future hardware plans to use
> >> it. Let me check with our hardware team to see whether it will be
> >> used or not in future.
> > 
> > After checking with Jun, there is no guarantee that the guest running
> > on another CPU will operate properly if hypervisor modify the vTMR
> > from another CPU. So the hypervisor should not to do it.
> 
> I guess I can cause a vmexit on level-triggered interrupts, it's not a
> big deal, but no weasel words, please.
> 
> What's going to break, and where is it documented?
> 
> Paolo
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] KVM: x86: set TMR when the interrupt is accepted

2015-07-30 Thread Steve Rutherford
On Thu, Jul 30, 2015 at 11:26:28PM +, Zhang, Yang Z wrote:
> Paolo Bonzini wrote on 2015-07-29:
> > Do not compute TMR in advance.  Instead, set the TMR just before the
> > interrupt is accepted into the IRR.  This limits the coupling between
> > IOAPIC and LAPIC.
> > 
> 
> Uh.., it back to original way which is wrong. You cannot modify the apic 
> page(here is the TMR reg) directly when the corresponding VMCS may be used at 
> same time.
Oh... Yeah. That's a damn good point, given that the interrupt can be injected 
from another thread while one is in that guest vcpu. 

Easiest time to update the TMR should be on guest entry through 
vcpu_scan_ioapic, as before. 

The best way to go is probably to ditch the new per vcpu EOI exit bitmap, and 
just update/use the TMR. There's no reason to duplicate that data in the 
representation of the apic (I suspect that the duplication was inspired by my 
mistaken notion of the TMR). The IOAPIC exit check can use the TMR instead. 

Based upon my reading of the SDM, the only reason that the eoi exit bitmaps are 
not the exact same as the TMR is that it is possible to have virtual-interrupt 
delivery enabled /without/ an apic access page (Note: V-ID => EOI exit bitmap 
enabled).

Yang, do you happen to know if that is the case?

[Note: Just looked back at the old code for updating the EOI exit bitmaps, 
which for some reason was configured to trigger EOI exits for all IOAPIC 
interrupts, not just level-triggered IOAPIC interrupts. Which is weird, and I 
believe totally unnecessary.]


> 
> 
> > Signed-off-by: Paolo Bonzini 
> > ---
> >  arch/x86/kvm/ioapic.c |  9 ++---
> >  arch/x86/kvm/ioapic.h |  3 +--
> >  arch/x86/kvm/lapic.c  | 19 ++-
> >  arch/x86/kvm/lapic.h  |  1 -
> >  arch/x86/kvm/x86.c|  5 +
> >  5 files changed, 14 insertions(+), 23 deletions(-)
> > diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> > index 856f79105bb5..eaf4ec38d980 100644
> > --- a/arch/x86/kvm/ioapic.c
> > +++ b/arch/x86/kvm/ioapic.c
> > @@ -246,8 +246,7 @@ static void update_handled_vectors(struct kvm_ioapic *ioapic)
> > smp_wmb();
> >  }
> > -void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
> > -   u32 *tmr)
> > +void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
> >  {
> > struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;
> > union kvm_ioapic_redirect_entry *e;
> > @@ -260,13 +259,9 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
> > if (kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index) ||
> > index == RTC_GSI) {
> > if (kvm_apic_match_dest(vcpu, NULL, 0,
> > -   e->fields.dest_id, e->fields.dest_mode)) {
> > +   e->fields.dest_id, e->fields.dest_mode))
> > __set_bit(e->fields.vector,
> > (unsigned long *)eoi_exit_bitmap);
> > -   if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG)
> > -   __set_bit(e->fields.vector,
> > -   (unsigned long *)tmr);
> > -   }
> > }
> > }
> > spin_unlock(&ioapic->lock);
> > diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
> > index ca0b0b4e6256..3dbd0e2aac4e 100644
> > --- a/arch/x86/kvm/ioapic.h
> > +++ b/arch/x86/kvm/ioapic.h
> > @@ -120,7 +120,6 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
> > struct kvm_lapic_irq *irq, unsigned long *dest_map);
> >  int kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
> >  int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
> > -void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
> > -   u32 *tmr);
> > +void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
> > 
> >  #endif
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 2a5ca97c263b..9be64c77d6db 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -551,15 +551,6 @@ static void pv_eoi_clr_pending(struct kvm_vcpu
> > *vcpu)
> > 	__clear_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
> >  }
> > -void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr)
> > -{
> > -   struct kvm_lapic *apic = vcpu->arch.apic;
> > -   int i;
> > -
> > -   for (i = 0; i < 8; i++)
> > -   apic_set_reg(apic, APIC_TMR + 0x10 * i, tmr[i]);
> > -}
> > -
> >  static void apic_update_ppr(struct kvm_lapic *apic)
> >  {
> > u32 tpr, isrv, ppr, old_ppr;
> > @@ -781,6 +772,9 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int
> > delivery_mode,
> > case APIC_DM_LOWEST:
> > vcpu->arch.apic_arb_prio++;
> > case APIC_DM_FIXED:
> > +   if (unlikely(trig_mode && !level))
> > +   break;
> > +
> > 


Re: [PATCH 3/4] KVM: i8254: remove unnecessary irqchip_in_kernel check

2015-07-29 Thread Steve Rutherford
On Wed, Jul 29, 2015 at 03:28:57PM +0200, Paolo Bonzini wrote:
> The PIT is only created if irqchip_in_kernel returns true, so the
> check is superfluous.
I poked around. Looks to me like the existence of an IOAPIC is not
checked on the creation of the in-kernel PIT. Userspace might limit itself to
that scenario (PIT implies IOAPIC in-kernel), but that isn't enforced at PIT
creation.

It's worth adding that check in.
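
Something along these lines in kvm_arch_vm_ioctl, I'd think (a sketch; the 
error code and exact placement are guesses on my part, not the eventual patch):

	case KVM_CREATE_PIT2:
		/* copy_from_user of u.pit_config elided */
		mutex_lock(&kvm->slots_lock);
		r = -EEXIST;
		if (kvm->arch.vpit)
			goto create_pit_unlock;
		r = -ENOENT;
		/* New: refuse an in-kernel PIT without an in-kernel IOAPIC. */
		if (!ioapic_irqchip(kvm))
			goto create_pit_unlock;
		r = -ENOMEM;
		kvm->arch.vpit = kvm_create_pit(kvm, u.pit_config.flags);
		if (kvm->arch.vpit)
			r = 0;
	create_pit_unlock:
		mutex_unlock(&kvm->slots_lock);
		break;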

> 
> Signed-off-by: Paolo Bonzini 
> ---
>  arch/x86/kvm/i8254.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> index f90952f64e79..f588eb7bdf45 100644
> --- a/arch/x86/kvm/i8254.c
> +++ b/arch/x86/kvm/i8254.c
> @@ -333,7 +333,7 @@ static void create_pit_timer(struct kvm *kvm, u32 val, 
> int is_period)
>   struct kvm_kpit_state *ps = &kvm->arch.vpit->pit_state;
>   s64 interval;
>  
> - if (!irqchip_in_kernel(kvm) || ps->flags & KVM_PIT_FLAGS_HPET_LEGACY)
> + if (ps->flags & KVM_PIT_FLAGS_HPET_LEGACY)
>   return;
>  
>   interval = muldiv64(val, NSEC_PER_SEC, KVM_PIT_FREQ);
> -- 
> 1.8.3.1
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] KVM: x86: store IOAPIC-handled vectors in each VCPU

2015-07-29 Thread Steve Rutherford
On Wed, Jul 29, 2015 at 03:37:35PM +0200, Paolo Bonzini wrote:
> We can reuse the algorithm that computes the EOI exit bitmap to figure
> out which vectors are handled by the IOAPIC.  The only difference
> between the two is for edge-triggered interrupts other than IRQ8
> that have no notifiers active; however, the IOAPIC does not have to
> do anything special for these interrupts anyway.
> 
> This again limits the interactions between the IOAPIC and the LAPIC,
> making it easier to move the former to userspace.
> 
> Inspired by a patch from Steve Rutherford.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  arch/x86/include/asm/kvm_host.h |  3 ++-
>  arch/x86/kvm/ioapic.c   | 18 ++
>  arch/x86/kvm/ioapic.h   |  8 
>  arch/x86/kvm/lapic.c| 10 --
>  arch/x86/kvm/svm.c  |  2 +-
>  arch/x86/kvm/vmx.c  |  3 ++-
>  arch/x86/kvm/x86.c  |  8 +++-
>  7 files changed, 18 insertions(+), 34 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 2f9e504f9f0c..d0e991ef6ef0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -383,6 +383,7 @@ struct kvm_vcpu_arch {
>   u64 efer;
>   u64 apic_base;
>   struct kvm_lapic *apic;/* kernel irqchip context */
> + u64 eoi_exit_bitmap[4];
>   unsigned long apic_attention;
>   int32_t apic_arb_prio;
>   int mp_state;
> @@ -808,7 +809,7 @@ struct kvm_x86_ops {
>   int (*vm_has_apicv)(struct kvm *kvm);
>   void (*hwapic_irr_update)(struct kvm_vcpu *vcpu, int max_irr);
>   void (*hwapic_isr_update)(struct kvm *kvm, int isr);
> - void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
> + void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu);
>   void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
>   void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu, hpa_t hpa);
>   void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> index eaf4ec38d980..2dcda0f188ba 100644
> --- a/arch/x86/kvm/ioapic.c
> +++ b/arch/x86/kvm/ioapic.c
> @@ -233,19 +233,6 @@ static void kvm_ioapic_inject_all(struct kvm_ioapic 
> *ioapic, unsigned long irr)
>  }
>  
>  
> -static void update_handled_vectors(struct kvm_ioapic *ioapic)
> -{
> - DECLARE_BITMAP(handled_vectors, 256);
> - int i;
> -
> - memset(handled_vectors, 0, sizeof(handled_vectors));
> - for (i = 0; i < IOAPIC_NUM_PINS; ++i)
> - __set_bit(ioapic->redirtbl[i].fields.vector, handled_vectors);
> - memcpy(ioapic->handled_vectors, handled_vectors,
> -sizeof(handled_vectors));
> - smp_wmb();
> -}
> -
>  void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
>  {
>   struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;
> @@ -310,7 +297,6 @@ static void ioapic_write_indirect(struct kvm_ioapic 
> *ioapic, u32 val)
>   e->bits |= (u32) val;
>   e->fields.remote_irr = 0;
>   }
> - update_handled_vectors(ioapic);
>   mask_after = e->fields.mask;
>   if (mask_before != mask_after)
>   kvm_fire_mask_notifiers(ioapic->kvm, 
> KVM_IRQCHIP_IOAPIC, index, mask_after);
> @@ -594,7 +580,6 @@ static void kvm_ioapic_reset(struct kvm_ioapic *ioapic)
>   ioapic->id = 0;
>   memset(ioapic->irq_eoi, 0x00, IOAPIC_NUM_PINS);
>   rtc_irq_eoi_tracking_reset(ioapic);
> - update_handled_vectors(ioapic);
>  }
>  
>  static const struct kvm_io_device_ops ioapic_mmio_ops = {
> @@ -623,8 +608,10 @@ int kvm_ioapic_init(struct kvm *kvm)
>   if (ret < 0) {
>   kvm->arch.vioapic = NULL;
>   kfree(ioapic);
> + return ret;
>   }
>  
> + kvm_vcpu_request_scan_ioapic(kvm);
>   return ret;
>  }
>  
> @@ -661,7 +648,6 @@ int kvm_set_ioapic(struct kvm *kvm, struct 
> kvm_ioapic_state *state)
>   memcpy(ioapic, state, sizeof(struct kvm_ioapic_state));
>   ioapic->irr = 0;
>   ioapic->irr_delivered = 0;
> - update_handled_vectors(ioapic);
>   kvm_vcpu_request_scan_ioapic(kvm);
>   kvm_ioapic_inject_all(ioapic, state->irr);
>   spin_unlock(&ioapic->lock);
> diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
> index 3dbd0e2aac4e..bf36d66a1951 100644
> --- a/arch/x86/kvm/ioapic.h
> +++ b/arch/x86/kvm/ioapic.h
> @@ -73,7 +73,6 @@ struct kvm_ioapic {
>   struct kvm *kvm;
>   void (*ack_notifier)(

Re: [PATCH 1/2] KVM: x86: set TMR when the interrupt is accepted

2015-07-29 Thread Steve Rutherford
On Wed, Jul 29, 2015 at 03:37:34PM +0200, Paolo Bonzini wrote:
> Do not compute TMR in advance.  Instead, set the TMR just before the interrupt
> is accepted into the IRR.  This limits the coupling between IOAPIC and LAPIC.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  arch/x86/kvm/ioapic.c |  9 ++---
>  arch/x86/kvm/ioapic.h |  3 +--
>  arch/x86/kvm/lapic.c  | 19 ++-
>  arch/x86/kvm/lapic.h  |  1 -
>  arch/x86/kvm/x86.c|  5 +
>  5 files changed, 14 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> index 856f79105bb5..eaf4ec38d980 100644
> --- a/arch/x86/kvm/ioapic.c
> +++ b/arch/x86/kvm/ioapic.c
> @@ -246,8 +246,7 @@ static void update_handled_vectors(struct kvm_ioapic 
> *ioapic)
>   smp_wmb();
>  }
>  
> -void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
> - u32 *tmr)
> +void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
>  {
>   struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;
>   union kvm_ioapic_redirect_entry *e;
> @@ -260,13 +259,9 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 
> *eoi_exit_bitmap,
>   kvm_irq_has_notifier(ioapic->kvm, KVM_IRQCHIP_IOAPIC, 
> index) ||
>   index == RTC_GSI) {
>   if (kvm_apic_match_dest(vcpu, NULL, 0,
> - e->fields.dest_id, e->fields.dest_mode)) {
> + e->fields.dest_id, e->fields.dest_mode))
>   __set_bit(e->fields.vector,
>   (unsigned long *)eoi_exit_bitmap);
> - if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG)
> - __set_bit(e->fields.vector,
> - (unsigned long *)tmr);
> - }
>   }
>   }
>   spin_unlock(&ioapic->lock);
> diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
> index ca0b0b4e6256..3dbd0e2aac4e 100644
> --- a/arch/x86/kvm/ioapic.h
> +++ b/arch/x86/kvm/ioapic.h
> @@ -120,7 +120,6 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
> kvm_lapic *src,
>   struct kvm_lapic_irq *irq, unsigned long *dest_map);
>  int kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
>  int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
> -void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
> - u32 *tmr);
> +void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
>  
>  #endif
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 2a5ca97c263b..9be64c77d6db 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -551,15 +551,6 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
>   __clear_bit(KVM_APIC_PV_EOI_PENDING, &vcpu->arch.apic_attention);
>  }
>  
> -void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr)
> -{
> - struct kvm_lapic *apic = vcpu->arch.apic;
> - int i;
> -
> - for (i = 0; i < 8; i++)
> - apic_set_reg(apic, APIC_TMR + 0x10 * i, tmr[i]);
> -}
> -
>  static void apic_update_ppr(struct kvm_lapic *apic)
>  {
>   u32 tpr, isrv, ppr, old_ppr;
> @@ -781,6 +772,9 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int 
> delivery_mode,
>   case APIC_DM_LOWEST:
>   vcpu->arch.apic_arb_prio++;
>   case APIC_DM_FIXED:
> + if (unlikely(trig_mode && !level))
> + break;
> +
>   /* FIXME add logic for vcpu on reset */
>   if (unlikely(!apic_enabled(apic)))
>   break;
> @@ -790,6 +784,13 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int 
> delivery_mode,
>   if (dest_map)
>   __set_bit(vcpu->vcpu_id, dest_map);
>  
> + if (apic_test_vector(vector, apic->regs + APIC_TMR) != 
> !!trig_mode) {
> + if (trig_mode)
> + apic_set_vector(vector, apic->regs + APIC_TMR);
> + else
> + apic_clear_vector(vector, apic->regs + 
> APIC_TMR);
> + }
> +
>   if (kvm_x86_ops->deliver_posted_interrupt)
>   kvm_x86_ops->deliver_posted_interrupt(vcpu, vector);
>   else {
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 764037991d26..eb46d6bcaa75 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -57,7 +57,6 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value);
>  u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu);
>  void kvm_apic_set_version(struct kvm_vcpu *vcpu);
>  
> -void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr);
>  void __kvm_apic_update_irr(u32 *pir, void *regs);
>  void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
>  int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
> diff 

Re: [PATCH 4/4] KVM: x86: clean/fix memory barriers in irqchip_in_kernel

2015-07-29 Thread Steve Rutherford
On Wed, Jul 29, 2015 at 03:28:58PM +0200, Paolo Bonzini wrote:
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2d62229aac26..23e47a0b054b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3626,30 +3626,25 @@ long kvm_arch_vm_ioctl(struct file *filp,
>   r = kvm_ioapic_init(kvm);
>   if (r) {
>   mutex_lock(&kvm->slots_lock);
> - kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS,
> -   &vpic->dev_master);
> - kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS,
> -   &vpic->dev_slave);
> - kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS,
> -   &vpic->dev_eclr);
> + kvm_destroy_pic(vpic);
>   mutex_unlock(&kvm->slots_lock);
> - kfree(vpic);
>   goto create_irqchip_unlock;
>   }
>   } else
>   goto create_irqchip_unlock;
> - smp_wmb();
> - kvm->arch.vpic = vpic;
> - smp_wmb();
>   r = kvm_setup_default_irq_routing(kvm);
>   if (r) {
>   mutex_lock(&kvm->slots_lock);
>   mutex_lock(&kvm->irq_lock);
>   kvm_ioapic_destroy(kvm);
> - kvm_destroy_pic(kvm);
> + kvm_destroy_pic(vpic);
>   mutex_unlock(&kvm->irq_lock);
>   mutex_unlock(&kvm->slots_lock);
> + goto create_irqchip_unlock;
>   }
> + /* Write kvm->irq_routing before kvm->arch.vpic.  */
> + smp_wmb();
I assume this pairs with irqchip_in_kernel? (See the read-side sketch after 
the quoted patch.) 
> + kvm->arch.vpic = vpic;
>   create_irqchip_unlock:
>   mutex_unlock(&kvm->lock);
>   break;
> -- 
> 1.8.3.1
> 
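
For reference, the read side I have in mind (a sketch of the helper in 
arch/x86/kvm/irq.h; the comment is mine):

	static inline int irqchip_in_kernel(struct kvm *kvm)
	{
		int ret;

		ret = (pic_irqchip(kvm) != NULL);
		/* Read kvm->arch.vpic before kvm->irq_routing; pairs with
		 * the smp_wmb() before the vpic store in kvm_arch_vm_ioctl. */
		smp_rmb();
		return ret;
	}
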
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

