Andrew Jones <drjo...@redhat.com> writes:

> On Tue, Jul 28, 2020 at 04:37:40PM +0200, Vitaly Kuznetsov wrote:
>> PCIe config space can (depending on the configuration) be quite big but
>> usually is sparsely populated. Guest may scan it by accessing individual
>> device's page which, when device is missing, is supposed to have 'pci
>> hole' semantics: reads return '0xff' and writes get discarded. Compared
>> to the already existing KVM_MEM_READONLY, VMM doesn't need to allocate
>> real memory and stuff it with '0xff'.
>> 
>> Suggested-by: Michael S. Tsirkin <m...@redhat.com>
>> Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
>> ---
>>  Documentation/virt/kvm/api.rst  | 19 +++++++++++-----
>>  arch/x86/include/uapi/asm/kvm.h |  1 +
>>  arch/x86/kvm/mmu/mmu.c          |  5 ++++-
>>  arch/x86/kvm/mmu/paging_tmpl.h  |  3 +++
>>  arch/x86/kvm/x86.c              | 10 ++++++---
>>  include/linux/kvm_host.h        |  7 +++++-
>>  include/uapi/linux/kvm.h        |  3 ++-
>>  virt/kvm/kvm_main.c             | 39 +++++++++++++++++++++++++++------
>>  8 files changed, 68 insertions(+), 19 deletions(-)
>> 
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> index 644e5326aa50..fbbf533a331b 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -1241,6 +1241,7 @@ yet and must be cleared on entry.
>>    /* for kvm_memory_region::flags */
>>    #define KVM_MEM_LOG_DIRTY_PAGES   (1UL << 0)
>>    #define KVM_MEM_READONLY  (1UL << 1)
>> +  #define KVM_MEM_PCI_HOLE          (1UL << 2)
>>  
>>  This ioctl allows the user to create, modify or delete a guest physical
>>  memory slot.  Bits 0-15 of "slot" specify the slot id and this value
>> @@ -1268,12 +1269,18 @@ It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
>>  be identical.  This allows large pages in the guest to be backed by large
>>  pages in the host.
>>  
>> -The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
>> -KVM_MEM_READONLY.  The former can be set to instruct KVM to keep track of
>> -writes to memory within the slot.  See KVM_GET_DIRTY_LOG ioctl to know how to
>> -use it.  The latter can be set, if KVM_CAP_READONLY_MEM capability allows it,
>> -to make a new slot read-only.  In this case, writes to this memory will be
>> -posted to userspace as KVM_EXIT_MMIO exits.
>> +The flags field supports the following flags: KVM_MEM_LOG_DIRTY_PAGES,
>> +KVM_MEM_READONLY, KVM_MEM_READONLY:
>
> The second KVM_MEM_READONLY should be KVM_MEM_PCI_HOLE. Or just drop the
> list here, as they're listed below anyway.
>
>> +- KVM_MEM_LOG_DIRTY_PAGES can be set to instruct KVM to keep track of writes to
>> +  memory within the slot.  See KVM_GET_DIRTY_LOG ioctl to know how to use it.
>> +- KVM_MEM_READONLY can be set, if KVM_CAP_READONLY_MEM capability allows it,
>> +  to make a new slot read-only.  In this case, writes to this memory will be
>> +  posted to userspace as KVM_EXIT_MMIO exits.
>> +- KVM_MEM_PCI_HOLE can be set, if KVM_CAP_PCI_HOLE_MEM capability allows it,
>> +  to create a new virtual read-only slot which will always return '0xff' when
>> +  guest reads from it. 'userspace_addr' has to be set to NULL. This flag is
>> +  mutually exclusive with KVM_MEM_LOG_DIRTY_PAGES/KVM_MEM_READONLY. All writes
>> +  to this memory will be posted to userspace as KVM_EXIT_MMIO exits.
>
> I see 2/3's of this text is copy+pasted from above, but how about this
>
>  - KVM_MEM_LOG_DIRTY_PAGES: log writes.  Use KVM_GET_DIRTY_LOG to retrieve
>    the log.
>  - KVM_MEM_READONLY: exit to userspace with KVM_EXIT_MMIO on writes.  Only
>    available when KVM_CAP_READONLY_MEM is present.
>  - KVM_MEM_PCI_HOLE: always return 0xff on reads, exit to userspace with
>    KVM_EXIT_MMIO on writes.  Only available when KVM_CAP_PCI_HOLE_MEM is
>    present.  When setting the memory region 'userspace_addr' must be NULL.
>    This flag is mutually exclusive with KVM_MEM_LOG_DIRTY_PAGES and with
>    KVM_MEM_READONLY.

Sounds better, thanks! Will add in v2.

>
>>  
>>  When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
>>  the memory region are automatically reflected into the guest.  For example, an
>> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
>> index 17c5a038f42d..cf80a26d74f5 100644
>> --- a/arch/x86/include/uapi/asm/kvm.h
>> +++ b/arch/x86/include/uapi/asm/kvm.h
>> @@ -48,6 +48,7 @@
>>  #define __KVM_HAVE_XSAVE
>>  #define __KVM_HAVE_XCRS
>>  #define __KVM_HAVE_READONLY_MEM
>> +#define __KVM_HAVE_PCI_HOLE_MEM
>>  
>>  /* Architectural interrupt line count. */
>>  #define KVM_NR_INTERRUPTS 256
>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>> index 8597e8102636..c2e3a1deafdd 100644
>> --- a/arch/x86/kvm/mmu/mmu.c
>> +++ b/arch/x86/kvm/mmu/mmu.c
>> @@ -3253,7 +3253,7 @@ static int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
>>              return PG_LEVEL_4K;
>>  
>>      slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, true);
>> -    if (!slot)
>> +    if (!slot || (slot->flags & KVM_MEM_PCI_HOLE))
>>              return PG_LEVEL_4K;
>>  
>>      max_level = min(max_level, max_page_level);
>> @@ -4104,6 +4104,9 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
>>  
>>      slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
>>  
>> +    if (!write && slot && (slot->flags & KVM_MEM_PCI_HOLE))
>> +            return RET_PF_EMULATE;
>> +
>>      if (try_async_pf(vcpu, slot, prefault, gfn, gpa, &pfn, write,
>>                       &map_writable))
>>              return RET_PF_RETRY;
>> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
>> index 5c6a895f67c3..27abd69e69f6 100644
>> --- a/arch/x86/kvm/mmu/paging_tmpl.h
>> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
>> @@ -836,6 +836,9 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
>>  
>>      slot = kvm_vcpu_gfn_to_memslot(vcpu, walker.gfn);
>>  
>> +    if (!write_fault && slot && (slot->flags & KVM_MEM_PCI_HOLE))
>> +            return RET_PF_EMULATE;
>> +
>>      if (try_async_pf(vcpu, slot, prefault, walker.gfn, addr, &pfn,
>>                       write_fault, &map_writable))
>>              return RET_PF_RETRY;
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 95ef62922869..dc312b8bfa05 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3515,6 +3515,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>      case KVM_CAP_EXCEPTION_PAYLOAD:
>>      case KVM_CAP_SET_GUEST_DEBUG:
>>      case KVM_CAP_LAST_CPU:
>> +    case KVM_CAP_PCI_HOLE_MEM:
>>              r = 1;
>>              break;
>>      case KVM_CAP_SYNC_REGS:
>> @@ -10115,9 +10116,11 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
>>              ugfn = slot->userspace_addr >> PAGE_SHIFT;
>>              /*
>>               * If the gfn and userspace address are not aligned wrt each
>> -             * other, disable large page support for this slot.
>> +             * other, disable large page support for this slot. Also,
>> +             * disable large page support for KVM_MEM_PCI_HOLE slots.
>>               */
>> -            if ((slot->base_gfn ^ ugfn) & (KVM_PAGES_PER_HPAGE(level) - 1)) {
>> +            if (slot->flags & KVM_MEM_PCI_HOLE || ((slot->base_gfn ^ ugfn) &
>
> Please add () around the first expression
>

Ack

>> +                                  (KVM_PAGES_PER_HPAGE(level) - 1))) {
>>                      unsigned long j;
>>  
>>                      for (j = 0; j < lpages; ++j)
>> @@ -10179,7 +10182,8 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
>>       * Nothing to do for RO slots or CREATE/MOVE/DELETE of a slot.
>>       * See comments below.
>>       */
>> -    if ((change != KVM_MR_FLAGS_ONLY) || (new->flags & KVM_MEM_READONLY))
>> +    if ((change != KVM_MR_FLAGS_ONLY) || (new->flags & KVM_MEM_READONLY) ||
>> +        (new->flags & KVM_MEM_PCI_HOLE))
>
> How about
>
>  if ((change != KVM_MR_FLAGS_ONLY) ||
>      (new->flags & (KVM_MEM_READONLY|KVM_MEM_PCI_HOLE)))
>

Ack

>>              return;
>>  
>>      /*
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index 989afcbe642f..63c2d93ef172 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -1081,7 +1081,12 @@ __gfn_to_memslot(struct kvm_memslots *slots, gfn_t gfn)
>>  static inline unsigned long
>>  __gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
>>  {
>> -    return slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE;
>> +    if (likely(!(slot->flags & KVM_MEM_PCI_HOLE))) {
>> +            return slot->userspace_addr +
>> +                    (gfn - slot->base_gfn) * PAGE_SIZE;
>> +    } else {
>> +            BUG();
>
> Debug code you forgot to remove? I see below you've modified
> __gfn_to_hva_many() to return KVM_HVA_ERR_BAD already when
> given a PCI hole slot. I think that's the only check we should add.

No, this was intentional. We have at least two users of
__gfn_to_hva_memslot() today, and in case we ever reach here with a
KVM_MEM_PCI_HOLE slot we're doomed anyway, but it would be much easier
to debug the immediate BUG() than an invalid pointer access some time
later.

Anyway, I don't feel strongly about it and I'm fine with dropping the
check. Alternatively, I can suggest we add

BUG_ON(!slot->userspace_addr);

to the beginning of __gfn_to_hva_memslot() instead.

>
>> +    }
>>  }
>>  
>>  static inline int memslot_id(struct kvm *kvm, gfn_t gfn)
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 2c73dcfb3dbb..59d631cbb71d 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -109,6 +109,7 @@ struct kvm_userspace_memory_region {
>>   */
>>  #define KVM_MEM_LOG_DIRTY_PAGES     (1UL << 0)
>>  #define KVM_MEM_READONLY    (1UL << 1)
>> +#define KVM_MEM_PCI_HOLE            (1UL << 2)
>>  
>>  /* for KVM_IRQ_LINE */
>>  struct kvm_irq_level {
>> @@ -1034,7 +1035,7 @@ struct kvm_ppc_resize_hpt {
>>  #define KVM_CAP_ASYNC_PF_INT 183
>>  #define KVM_CAP_LAST_CPU 184
>>  #define KVM_CAP_SMALLER_MAXPHYADDR 185
>> -
>> +#define KVM_CAP_PCI_HOLE_MEM 186
>>  
>>  #ifdef KVM_CAP_IRQ_ROUTING
>>  
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 2c2c0254c2d8..3f69ae711021 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -1107,6 +1107,10 @@ static int check_memory_region_flags(const struct kvm_userspace_memory_region *m
>>      valid_flags |= KVM_MEM_READONLY;
>>  #endif
>>  
>> +#ifdef __KVM_HAVE_PCI_HOLE_MEM
>> +    valid_flags |= KVM_MEM_PCI_HOLE;
>> +#endif
>> +
>>      if (mem->flags & ~valid_flags)
>>              return -EINVAL;
>>  
>> @@ -1284,11 +1288,26 @@ int __kvm_set_memory_region(struct kvm *kvm,
>>              return -EINVAL;
>>      if (mem->guest_phys_addr & (PAGE_SIZE - 1))
>>              return -EINVAL;
>> -    /* We can read the guest memory with __xxx_user() later on. */
>> -    if ((mem->userspace_addr & (PAGE_SIZE - 1)) ||
>> -         !access_ok((void __user *)(unsigned long)mem->userspace_addr,
>> -                    mem->memory_size))
>> +
>> +    /*
>> +     * KVM_MEM_PCI_HOLE is mutually exclusive with KVM_MEM_READONLY/
>> +     * KVM_MEM_LOG_DIRTY_PAGES.
>> +     */
>> +    if ((mem->flags & KVM_MEM_PCI_HOLE) &&
>> +        (mem->flags & (KVM_MEM_READONLY | KVM_MEM_LOG_DIRTY_PAGES)))
>>              return -EINVAL;
>> +
>> +    if (!(mem->flags & KVM_MEM_PCI_HOLE)) {
>> +            /* We can read the guest memory with __xxx_user() later on. */
>> +            if ((mem->userspace_addr & (PAGE_SIZE - 1)) ||
>> +                !access_ok((void __user *)(unsigned long)mem->userspace_addr,
>> +                           mem->memory_size))
>> +                    return -EINVAL;
>> +    } else {
>> +            if (mem->userspace_addr)
>> +                    return -EINVAL;
>> +    }
>> +
>>      if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_MEM_SLOTS_NUM)
>>              return -EINVAL;
>>      if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
>> @@ -1328,7 +1347,8 @@ int __kvm_set_memory_region(struct kvm *kvm,
>>      } else { /* Modify an existing slot. */
>>              if ((new.userspace_addr != old.userspace_addr) ||
>>                  (new.npages != old.npages) ||
>> -                ((new.flags ^ old.flags) & KVM_MEM_READONLY))
>> +                ((new.flags ^ old.flags) & KVM_MEM_READONLY) ||
>> +                ((new.flags ^ old.flags) & KVM_MEM_PCI_HOLE))
>>                      return -EINVAL;
>>  
>>              if (new.base_gfn != old.base_gfn)
>> @@ -1715,13 +1735,13 @@ unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn)
>>  
>>  static bool memslot_is_readonly(struct kvm_memory_slot *slot)
>>  {
>> -    return slot->flags & KVM_MEM_READONLY;
>> +    return slot->flags & (KVM_MEM_READONLY | KVM_MEM_PCI_HOLE);
>>  }
>>  
>>  static unsigned long __gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn,
>>                                     gfn_t *nr_pages, bool write)
>>  {
>> -    if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
>> +    if (!slot || (slot->flags & (KVM_MEMSLOT_INVALID | KVM_MEM_PCI_HOLE)))
>>              return KVM_HVA_ERR_BAD;
>>  
>>      if (memslot_is_readonly(slot) && write)
>> @@ -2318,6 +2338,11 @@ static int __kvm_read_guest_page(struct kvm_memory_slot *slot, gfn_t gfn,
>>      int r;
>>      unsigned long addr;
>>  
>> +    if (unlikely(slot && (slot->flags & KVM_MEM_PCI_HOLE))) {
>> +            memset(data, 0xff, len);
>> +            return 0;
>> +    }
>> +
>>      addr = gfn_to_hva_memslot_prot(slot, gfn, NULL);
>>      if (kvm_is_error_hva(addr))
>>              return -EFAULT;
>> -- 
>> 2.25.4
>>
>
> I didn't really review this patch, as it's touching lots of x86 mm
> functions that I didn't want to delve into, but I took a quick look
> since I was curious about the feature.

The x86 part is really negligible; I think it would be very easy to
expand the scope to other arches if needed.

Thanks!

-- 
Vitaly
