Re: [PATCH 2/2] KVM: x86/mmu: Exclude the MMU_PRESENT bit from MMIO SPTE's generation

2021-03-09 Thread Tom Lendacky
On 3/8/21 8:19 PM, Sean Christopherson wrote:
> Drop bit 11, used for the MMU_PRESENT flag, from the set of bits used to
> store the generation number in MMIO SPTEs.  MMIO SPTEs with bit 11 set,
> which occurs when userspace creates 128+ memslots in an address space,
> get false positives for is_shadow_present_spte(), which lead to a variety
> of fireworks, crashes KVM, and likely hangs the host kernel.
> 
> Fixes: b14e28f37e9b ("KVM: x86/mmu: Use a dedicated bit to track 
> shadow/MMU-present SPTEs")
> Reported-by: Tom Lendacky 

Fixes the issue for me. Thanks, Sean.

Tested-by: Tom Lendacky 

> Reported-by: Paolo Bonzini 
> Signed-off-by: Sean Christopherson 
> ---
>  arch/x86/kvm/mmu/spte.h | 12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
> index b53036d9ddf3..bca0ba11cccf 100644
> --- a/arch/x86/kvm/mmu/spte.h
> +++ b/arch/x86/kvm/mmu/spte.h
> @@ -101,11 +101,11 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & 
> SHADOW_ACC_TRACK_SAVED_MASK));
>  #undef SHADOW_ACC_TRACK_SAVED_MASK
>  
>  /*
> - * Due to limited space in PTEs, the MMIO generation is a 20 bit subset of
> + * Due to limited space in PTEs, the MMIO generation is a 19 bit subset of
>   * the memslots generation and is derived as follows:
>   *
> - * Bits 0-8 of the MMIO generation are propagated to spte bits 3-11
> - * Bits 9-19 of the MMIO generation are propagated to spte bits 52-62
> + * Bits 0-7 of the MMIO generation are propagated to spte bits 3-10
> + * Bits 8-18 of the MMIO generation are propagated to spte bits 52-62
>   *
>   * The KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS flag is intentionally not included 
> in
>   * the MMIO generation number, as doing so would require stealing a bit from
> @@ -116,7 +116,7 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & 
> SHADOW_ACC_TRACK_SAVED_MASK));
>   */
>  
>  #define MMIO_SPTE_GEN_LOW_START  3
> -#define MMIO_SPTE_GEN_LOW_END11
> +#define MMIO_SPTE_GEN_LOW_END10
>  
>  #define MMIO_SPTE_GEN_HIGH_START 52
>  #define MMIO_SPTE_GEN_HIGH_END   62
> @@ -125,12 +125,14 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & 
> SHADOW_ACC_TRACK_SAVED_MASK));
>   MMIO_SPTE_GEN_LOW_START)
>  #define MMIO_SPTE_GEN_HIGH_MASK  
> GENMASK_ULL(MMIO_SPTE_GEN_HIGH_END, \
>   MMIO_SPTE_GEN_HIGH_START)
> +static_assert(!(SPTE_MMU_PRESENT_MASK &
> + (MMIO_SPTE_GEN_LOW_MASK | MMIO_SPTE_GEN_HIGH_MASK)));
>  
>  #define MMIO_SPTE_GEN_LOW_BITS   (MMIO_SPTE_GEN_LOW_END - 
> MMIO_SPTE_GEN_LOW_START + 1)
>  #define MMIO_SPTE_GEN_HIGH_BITS  (MMIO_SPTE_GEN_HIGH_END - 
> MMIO_SPTE_GEN_HIGH_START + 1)
>  
>  /* remember to adjust the comment above as well if you change these */
> -static_assert(MMIO_SPTE_GEN_LOW_BITS == 9 && MMIO_SPTE_GEN_HIGH_BITS == 11);
> +static_assert(MMIO_SPTE_GEN_LOW_BITS == 8 && MMIO_SPTE_GEN_HIGH_BITS == 11);
>  
>  #define MMIO_SPTE_GEN_LOW_SHIFT  (MMIO_SPTE_GEN_LOW_START - 0)
>  #define MMIO_SPTE_GEN_HIGH_SHIFT (MMIO_SPTE_GEN_HIGH_START - 
> MMIO_SPTE_GEN_LOW_BITS)
> 


Re: [PATCH 2/2] KVM: x86/mmu: Exclude the MMU_PRESENT bit from MMIO SPTE's generation

2021-03-09 Thread Maxim Levitsky
On Tue, 2021-03-09 at 14:12 +0100, Paolo Bonzini wrote:
> On 09/03/21 11:09, Maxim Levitsky wrote:
> > What happens if mmio generation overflows (e.g if userspace keeps on 
> > updating the memslots)?
> > In theory if we have a SPTE with a stale generation, it can became valid, 
> > no?
> > 
> > I think that we should in the case of the overflow zap all mmio sptes.
> > What do you think?
> 
> Zapping all MMIO SPTEs is done by updating the generation count.  When 
> it overflows, all SPs are zapped:
> 
>  /*
>   * The very rare case: if the MMIO generation number has wrapped,
>   * zap all shadow pages.
>   */
>  if (unlikely(gen == 0)) {
>  kvm_debug_ratelimited("kvm: zapping shadow pages for 
> mmio generation wraparound\n");
>  kvm_mmu_zap_all_fast(kvm);
>  }
> 
> So giving it more bits make this more rare, at the same time having to 
> remove one or two bits is not the end of the world.

This is exactly what I expected to happen, I just didn't find that code.
Thanks for explanation, and it shows that I didn't study the mmio spte
code much.

Best regards,
Maxim Levitsky

> 
> Paolo
> 




Re: [PATCH 2/2] KVM: x86/mmu: Exclude the MMU_PRESENT bit from MMIO SPTE's generation

2021-03-09 Thread Paolo Bonzini

On 09/03/21 11:09, Maxim Levitsky wrote:

What happens if mmio generation overflows (e.g if userspace keeps on updating 
the memslots)?
In theory if we have a SPTE with a stale generation, it can became valid, no?

I think that we should in the case of the overflow zap all mmio sptes.
What do you think?


Zapping all MMIO SPTEs is done by updating the generation count.  When 
it overflows, all SPs are zapped:


/*
 * The very rare case: if the MMIO generation number has wrapped,
 * zap all shadow pages.
 */
if (unlikely(gen == 0)) {
kvm_debug_ratelimited("kvm: zapping shadow pages for 
mmio generation wraparound\n");

kvm_mmu_zap_all_fast(kvm);
}

So giving it more bits make this more rare, at the same time having to 
remove one or two bits is not the end of the world.


Paolo



Re: [PATCH 2/2] KVM: x86/mmu: Exclude the MMU_PRESENT bit from MMIO SPTE's generation

2021-03-09 Thread Maxim Levitsky
On Mon, 2021-03-08 at 18:19 -0800, Sean Christopherson wrote:
> Drop bit 11, used for the MMU_PRESENT flag, from the set of bits used to
> store the generation number in MMIO SPTEs.  MMIO SPTEs with bit 11 set,
> which occurs when userspace creates 128+ memslots in an address space,
> get false positives for is_shadow_present_spte(), which lead to a variety
> of fireworks, crashes KVM, and likely hangs the host kernel.
> 
> Fixes: b14e28f37e9b ("KVM: x86/mmu: Use a dedicated bit to track 
> shadow/MMU-present SPTEs")
> Reported-by: Tom Lendacky 
> Reported-by: Paolo Bonzini 
> Signed-off-by: Sean Christopherson 
> ---
>  arch/x86/kvm/mmu/spte.h | 12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
> index b53036d9ddf3..bca0ba11cccf 100644
> --- a/arch/x86/kvm/mmu/spte.h
> +++ b/arch/x86/kvm/mmu/spte.h
> @@ -101,11 +101,11 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & 
> SHADOW_ACC_TRACK_SAVED_MASK));
>  #undef SHADOW_ACC_TRACK_SAVED_MASK
>  
>  /*
> - * Due to limited space in PTEs, the MMIO generation is a 20 bit subset of
> + * Due to limited space in PTEs, the MMIO generation is a 19 bit subset of
>   * the memslots generation and is derived as follows:
>   *
> - * Bits 0-8 of the MMIO generation are propagated to spte bits 3-11
> - * Bits 9-19 of the MMIO generation are propagated to spte bits 52-62
> + * Bits 0-7 of the MMIO generation are propagated to spte bits 3-10
> + * Bits 8-18 of the MMIO generation are propagated to spte bits 52-62
>   *
>   * The KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS flag is intentionally not included 
> in
>   * the MMIO generation number, as doing so would require stealing a bit from
> @@ -116,7 +116,7 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & 
> SHADOW_ACC_TRACK_SAVED_MASK));
>   */
>  
>  #define MMIO_SPTE_GEN_LOW_START  3
> -#define MMIO_SPTE_GEN_LOW_END11
> +#define MMIO_SPTE_GEN_LOW_END10
>  
>  #define MMIO_SPTE_GEN_HIGH_START 52
>  #define MMIO_SPTE_GEN_HIGH_END   62
> @@ -125,12 +125,14 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & 
> SHADOW_ACC_TRACK_SAVED_MASK));
>   MMIO_SPTE_GEN_LOW_START)
>  #define MMIO_SPTE_GEN_HIGH_MASK  
> GENMASK_ULL(MMIO_SPTE_GEN_HIGH_END, \
>   MMIO_SPTE_GEN_HIGH_START)
> +static_assert(!(SPTE_MMU_PRESENT_MASK &
> + (MMIO_SPTE_GEN_LOW_MASK | MMIO_SPTE_GEN_HIGH_MASK)));
>  
>  #define MMIO_SPTE_GEN_LOW_BITS   (MMIO_SPTE_GEN_LOW_END - 
> MMIO_SPTE_GEN_LOW_START + 1)
>  #define MMIO_SPTE_GEN_HIGH_BITS  (MMIO_SPTE_GEN_HIGH_END - 
> MMIO_SPTE_GEN_HIGH_START + 1)
>  
>  /* remember to adjust the comment above as well if you change these */
> -static_assert(MMIO_SPTE_GEN_LOW_BITS == 9 && MMIO_SPTE_GEN_HIGH_BITS == 11);
> +static_assert(MMIO_SPTE_GEN_LOW_BITS == 8 && MMIO_SPTE_GEN_HIGH_BITS == 11);
>  
>  #define MMIO_SPTE_GEN_LOW_SHIFT  (MMIO_SPTE_GEN_LOW_START - 0)
>  #define MMIO_SPTE_GEN_HIGH_SHIFT (MMIO_SPTE_GEN_HIGH_START - 
> MMIO_SPTE_GEN_LOW_BITS)
I bisected this and I reached the same conclusion that bit 11 has to be removed 
from mmio generation mask.

Reviewed-by: Maxim Levitsky 
 
I do wonder, why do we need 19 (and now 18 bits) for the mmio generation:

What happens if mmio generation overflows (e.g if userspace keeps on updating 
the memslots)? 
In theory if we have a SPTE with a stale generation, it can became valid, no?

I think that we should in the case of the overflow zap all mmio sptes.
What do you think?

Best regards,
Maxim Levitsky