Re: [PATCH v2 2/4] eventfd: simplify eventfd_signal()

2023-11-22 Thread Xu Yilun
On Wed, Nov 22, 2023 at 01:48:23PM +0100, Christian Brauner wrote:
> Ever since the eventfd type was introduced back in 2007 in commit
> e1ad7468c77d ("signal/timer/event: eventfd core") the eventfd_signal()
> function only ever passed 1 as a value for @n. There's no point in
> keeping that additional argument.
> 
> Signed-off-by: Christian Brauner 
> ---
>  arch/x86/kvm/hyperv.c                     |  2 +-
>  arch/x86/kvm/xen.c                        |  2 +-
>  drivers/accel/habanalabs/common/device.c  |  2 +-
>  drivers/fpga/dfl.c                        |  2 +-
>  drivers/gpu/drm/drm_syncobj.c             |  6 +++---
>  drivers/gpu/drm/i915/gvt/interrupt.c      |  2 +-
>  drivers/infiniband/hw/mlx5/devx.c         |  2 +-
>  drivers/misc/ocxl/file.c                  |  2 +-
>  drivers/s390/cio/vfio_ccw_chp.c           |  2 +-
>  drivers/s390/cio/vfio_ccw_drv.c           |  4 ++--
>  drivers/s390/cio/vfio_ccw_ops.c           |  6 +++---
>  drivers/s390/crypto/vfio_ap_ops.c         |  2 +-
>  drivers/usb/gadget/function/f_fs.c        |  4 ++--
>  drivers/vdpa/vdpa_user/vduse_dev.c        |  6 +++---
>  drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c    |  2 +-
>  drivers/vfio/pci/vfio_pci_core.c          |  6 +++---
>  drivers/vfio/pci/vfio_pci_intrs.c         | 12 ++--
>  drivers/vfio/platform/vfio_platform_irq.c |  4 ++--
>  drivers/vhost/vdpa.c                      |  4 ++--
>  drivers/vhost/vhost.c                     | 10 +-
>  drivers/vhost/vhost.h                     |  2 +-
>  drivers/virt/acrn/ioeventfd.c             |  2 +-
>  drivers/xen/privcmd.c                     |  2 +-
>  fs/aio.c                                  |  2 +-
>  fs/eventfd.c                              |  9 +++--
>  include/linux/eventfd.h                   |  4 ++--
>  mm/memcontrol.c                           | 10 +-
>  mm/vmpressure.c                           |  2 +-
>  samples/vfio-mdev/mtty.c                  |  4 ++--
>  virt/kvm/eventfd.c                        |  4 ++--
>  30 files changed, 60 insertions(+), 63 deletions(-)
> 
> diff --git a/drivers/fpga/dfl.c b/drivers/fpga/dfl.c
> index dd7a783d53b5..e73f88050f08 100644
> --- a/drivers/fpga/dfl.c
> +++ b/drivers/fpga/dfl.c
> @@ -1872,7 +1872,7 @@ static irqreturn_t dfl_irq_handler(int irq, void *arg)
>  {
>   struct eventfd_ctx *trigger = arg;
>  
> - eventfd_signal(trigger, 1);
> + eventfd_signal(trigger);

For FPGA part,

Acked-by: Xu Yilun 

>   return IRQ_HANDLED;
>  }


Re: [PATCH v13 20/35] KVM: x86/mmu: Handle page fault for private memory

2023-11-06 Thread Xu Yilun
On Sun, Nov 05, 2023 at 05:19:36PM +0100, Paolo Bonzini wrote:
> On Sun, Nov 5, 2023 at 2:04 PM Xu Yilun  wrote:
> >
> > > +static void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
> > > +   struct kvm_page_fault *fault)
> > > +{
> > > + kvm_prepare_memory_fault_exit(vcpu, fault->gfn << PAGE_SHIFT,
> > > +   PAGE_SIZE, fault->write, fault->exec,
> > > +   fault->is_private);
> > > +}
> > > +
> > > +static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
> > > +struct kvm_page_fault *fault)
> > > +{
> > > + int max_order, r;
> > > +
> > > + if (!kvm_slot_can_be_private(fault->slot)) {
> > > + kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> > > + return -EFAULT;
> > > + }
> > > +
> > > + r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn,
> > > +  &max_order);
> > > + if (r) {
> > > + kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> > > + return r;
> >
> > Why report KVM_EXIT_MEMORY_FAULT here, even with a ret != -EFAULT?
> 
> The cases are EFAULT, EHWPOISON (which can report
> KVM_EXIT_MEMORY_FAULT) and ENOMEM. I think it's fine
> that even -ENOMEM can return KVM_EXIT_MEMORY_FAULT,
> and it doesn't violate the documentation.  The docs tell you "what
> can you do if the error is EFAULT or EHWPOISON?"; they don't
> exclude that other errnos result in KVM_EXIT_MEMORY_FAULT,
> it's just that you're not supposed to look at it

Thanks, it's OK for ENOMEM + KVM_EXIT_MEMORY_FAULT.

Another concern is that there are now 3 places that report EFAULT + KVM_EXIT_MEMORY_FAULT:

  if (!kvm_slot_can_be_private(fault->slot)) {
        kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
        return -EFAULT;
  }

  file = kvm_gmem_get_file(slot);
  if (!file)
        return -EFAULT;

  if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
        kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
        return -EFAULT;
  }

They are different cases, and it seems userspace should handle them
differently, but there is not enough information to distinguish them.
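
For illustration, a minimal sketch of the VMM side (the handler and helper
names are mine, not from the patch; only the documented memory_fault fields
are used):

	/* hypothetical userspace exit handler */
	static int handle_memory_fault_exit(struct kvm_run *run)
	{
		/*
		 * Whichever of the call sites above returned -EFAULT, the
		 * fields here carry no indication of why: the VMM cannot tell
		 * whether the slot cannot be private, the guest_memfd file is
		 * missing, or the access mismatches the private/shared
		 * attribute.
		 */
		return vmm_convert_or_abort(run->memory_fault.gpa,
					    run->memory_fault.size,
					    run->memory_fault.flags);
	}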

Thanks,
Yilun

> 
> Paolo
> 
> 


Re: [PATCH v13 20/35] KVM: x86/mmu: Handle page fault for private memory

2023-11-05 Thread Xu Yilun
> +static void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
> +   struct kvm_page_fault *fault)
> +{
> + kvm_prepare_memory_fault_exit(vcpu, fault->gfn << PAGE_SHIFT,
> +   PAGE_SIZE, fault->write, fault->exec,
> +   fault->is_private);
> +}
> +
> +static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
> +struct kvm_page_fault *fault)
> +{
> + int max_order, r;
> +
> + if (!kvm_slot_can_be_private(fault->slot)) {
> + kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> + return -EFAULT;
> + }
> +
> + r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn,
> +  &max_order);
> + if (r) {
> + kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> + return r;

Why report KVM_EXIT_MEMORY_FAULT here, even with a ret != -EFAULT? This is
different from the description where KVM_EXIT_MEMORY_FAULT is introduced:

  KVM_EXIT_MEMORY_FAULT will be used to report memory faults that appear to
  be implicit conversions.

  To allow for future possibilities where KVM reports KVM_EXIT_MEMORY_FAULT
  and fills run->memory_fault on _any_ unresolved fault, KVM returns
  "-EFAULT"

Thanks,
Yilun

> + }
> +
> + fault->max_level = min(kvm_max_level_for_order(max_order),
> +fault->max_level);
> + fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
> +
> + return RET_PF_CONTINUE;
> +}


Re: [PATCH v13 16/35] KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory

2023-11-04 Thread Xu Yilun
> +KVM_SET_USER_MEMORY_REGION2 is an extension to KVM_SET_USER_MEMORY_REGION 
> that
> +allows mapping guest_memfd memory into a guest.  All fields shared with
> +KVM_SET_USER_MEMORY_REGION identically.  Userspace can set KVM_MEM_PRIVATE in
> +flags to have KVM bind the memory region to a given guest_memfd range of
> +[guest_memfd_offset, guest_memfd_offset + memory_size].  The target 
> guest_memfd
^
The range end should be exclusive, shouldn't it?
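
E.g. with guest_memfd_offset = 0 and memory_size = 0x1000 the binding covers
file offsets 0x0..0xfff, i.e. [0, 0x1000) in half-open form, while the text
above reads as if offset 0x1000 were included.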

> +must point at a file created via KVM_CREATE_GUEST_MEMFD on the current VM, 
> and
> +the target range must not be bound to any other memory region.  All standard
> +bounds checks apply (use common sense).
> +
>  ::
>  
>struct kvm_userspace_memory_region2 {
> @@ -6087,9 +6096,24 @@ applied.
>   __u64 guest_phys_addr;
>   __u64 memory_size; /* bytes */
>   __u64 userspace_addr; /* start of the userspace allocated memory */
> +  __u64 guest_memfd_offset;
> + __u32 guest_memfd;
> + __u32 pad1;
> + __u64 pad2[14];
>};
>  

[...]

> +static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
> +{
> + const char *anon_name = "[kvm-gmem]";
> + struct kvm_gmem *gmem;
> + struct inode *inode;
> + struct file *file;
> + int fd, err;
> +
> + fd = get_unused_fd_flags(0);
> + if (fd < 0)
> + return fd;
> +
> + gmem = kzalloc(sizeof(*gmem), GFP_KERNEL);
> + if (!gmem) {
> + err = -ENOMEM;
> + goto err_fd;
> + }
> +
> + /*
> +  * Use the so called "secure" variant, which creates a unique inode
> +  * instead of reusing a single inode.  Each guest_memfd instance needs
> +  * its own inode to track the size, flags, etc.
> +  */
> + file = anon_inode_getfile_secure(anon_name, &kvm_gmem_fops, gmem,
> +  O_RDWR, NULL);
> + if (IS_ERR(file)) {
> + err = PTR_ERR(file);
> + goto err_gmem;
> + }
> +
> + file->f_flags |= O_LARGEFILE;
> +
> + inode = file->f_inode;
> + WARN_ON(file->f_mapping != inode->i_mapping);

Just curious, why should we check the mapping fields, which are guaranteed
by another subsystem?

> +
> + inode->i_private = (void *)(unsigned long)flags;
> + inode->i_op = &kvm_gmem_iops;
> + inode->i_mapping->a_ops = &kvm_gmem_aops;
> + inode->i_mode |= S_IFREG;
> + inode->i_size = size;
> + mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
> + mapping_set_unmovable(inode->i_mapping);
> + /* Unmovable mappings are supposed to be marked unevictable as well. */
> + WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
> +
> + kvm_get_kvm(kvm);
> + gmem->kvm = kvm;
> + xa_init(&gmem->bindings);
> + list_add(&gmem->entry, &inode->i_mapping->private_list);
> +
> + fd_install(fd, file);
> + return fd;
> +
> +err_gmem:
> + kfree(gmem);
> +err_fd:
> + put_unused_fd(fd);
> + return err;
> +}

[...]

> +int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
> +   unsigned int fd, loff_t offset)
> +{
> + loff_t size = slot->npages << PAGE_SHIFT;
> + unsigned long start, end;
> + struct kvm_gmem *gmem;
> + struct inode *inode;
> + struct file *file;
> +
> + BUILD_BUG_ON(sizeof(gfn_t) != sizeof(slot->gmem.pgoff));
> +
> + file = fget(fd);
> + if (!file)
> + return -EBADF;
> +
> + if (file->f_op != &kvm_gmem_fops)
> + goto err;
> +
> + gmem = file->private_data;
> + if (gmem->kvm != kvm)
> + goto err;
> +
> + inode = file_inode(file);
> +
> + if (offset < 0 || !PAGE_ALIGNED(offset))
> + return -EINVAL;

Should also "goto err" here.
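
i.e. something like (sketch):

	if (offset < 0 || !PAGE_ALIGNED(offset))
		goto err;

otherwise the reference taken by fget() above is leaked on this error path.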

> +
> + if (offset + size > i_size_read(inode))
> + goto err;
> +
> + filemap_invalidate_lock(inode->i_mapping);
> +
> + start = offset >> PAGE_SHIFT;
> + end = start + slot->npages;
> +
> + if (!xa_empty(&gmem->bindings) &&
> + xa_find(&gmem->bindings, &start, end - 1, XA_PRESENT)) {
> + filemap_invalidate_unlock(inode->i_mapping);
> + goto err;
> + }
> +
> + /*
> +  * No synchronize_rcu() needed, any in-flight readers are guaranteed to
> +  * see either a NULL file or this new file, no need for them to go
> +  * away.
> +  */
> + rcu_assign_pointer(slot->gmem.file, file);
> + slot->gmem.pgoff = start;
> +
> + xa_store_range(&gmem->bindings, start, end - 1, slot, GFP_KERNEL);
> + filemap_invalidate_unlock(inode->i_mapping);
> +
> + /*
> +  * Drop the reference to the file, even on success.  The file pins KVM,
> +  * not the other way 'round.  Active bindings are invalidated if the
 ^
around?

Thanks,
Yilun

> +  * file is closed before memslots are destroyed.
> +  */
> + fput(file);
> + return 0;
> +
> +err:
> + fput(file);
> + return -EINVAL;
> +}


Re: [PATCH v13 09/35] KVM: Add KVM_EXIT_MEMORY_FAULT exit to report faults to userspace

2023-11-02 Thread Xu Yilun
On Fri, Oct 27, 2023 at 11:21:51AM -0700, Sean Christopherson wrote:
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6723,6 +6723,26 @@ array field represents return values. The userspace 
> should update the return
>  values of SBI call before resuming the VCPU. For more details on RISC-V SBI
>  spec refer, https://github.com/riscv/riscv-sbi-doc.
>  
> +::
> +
> + /* KVM_EXIT_MEMORY_FAULT */
> + struct {
> + __u64 flags;
> + __u64 gpa;
> + __u64 size;
> + } memory;
  ^

Should update to "memory_fault" to align with other places.

[...]

> @@ -520,6 +521,12 @@ struct kvm_run {
>  #define KVM_NOTIFY_CONTEXT_INVALID   (1 << 0)
>   __u32 flags;
>   } notify;
> + /* KVM_EXIT_MEMORY_FAULT */
> + struct {
> + __u64 flags;
> + __u64 gpa;
> + __u64 size;
> + } memory_fault;
>   /* Fix the size of the union. */
>   char padding[256];
>   };

Thanks,
Yilun

> 


Re: [PATCH v13 03/35] KVM: Use gfn instead of hva for mmu_notifier_retry

2023-11-01 Thread Xu Yilun
On Fri, Oct 27, 2023 at 11:21:45AM -0700, Sean Christopherson wrote:
> From: Chao Peng 
> 
> Currently in mmu_notifier invalidate path, hva range is recorded and
> then checked against by mmu_notifier_retry_hva() in the page fault
  ^

should be mmu_invalidate_retry_hva().

Besides this, Reviewed-by: Xu Yilun 

Thanks


Re: [RFC PATCH v12 02/33] KVM: Use gfn instead of hva for mmu_notifier_retry

2023-09-20 Thread Xu Yilun
On 2023-09-20 at 06:55:05 -0700, Sean Christopherson wrote:
> On Wed, Sep 20, 2023, Xu Yilun wrote:
> > On 2023-09-13 at 18:55:00 -0700, Sean Christopherson wrote:
> > > +void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t 
> > > end)
> > > +{
> > > + lockdep_assert_held_write(&kvm->mmu_lock);
> > > +
> > > + WARN_ON_ONCE(!kvm->mmu_invalidate_in_progress);
> > > +
> > >   if (likely(kvm->mmu_invalidate_in_progress == 1)) {
> > >   kvm->mmu_invalidate_range_start = start;
> > >   kvm->mmu_invalidate_range_end = end;
> > 
> > IIUC, now we only add or override a part of the invalidate range in
> > these fields; IOW, only the range in the last slot is stored when we unlock.
> 
> Ouch.  Good catch!
> 
> > That may break mmu_invalidate_retry_gfn() because it can never know the
> > whole invalidate range.
> > 
> > How about we extend the mmu_invalidate_range_start/end every time so that
> > it records the whole invalidate range:
> > 
> > if (kvm->mmu_invalidate_range_start == INVALID_GPA) {
> >         kvm->mmu_invalidate_range_start = start;
> >         kvm->mmu_invalidate_range_end = end;
> > } else {
> >         kvm->mmu_invalidate_range_start =
> >                 min(kvm->mmu_invalidate_range_start, start);
> >         kvm->mmu_invalidate_range_end =
> >                 max(kvm->mmu_invalidate_range_end, end);
> > }
> 
> Yeah, that does seem to be the easiest solution.
> 
> I'll post a fixup patch, unless you want the honors.

Please go ahead, because on second thought I'm wondering whether this
simple range extension is reasonable.

When the invalidation crosses multiple slots, I'm not sure whether the
contiguous HVA range must correspond to a contiguous GFN range. If not,
are we producing a larger range than required?
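
For example (made-up numbers): if one slot maps HVA [0x7f0000000000, +2M) to
GFN 0x100000 and the next slot maps the following 2M of HVA to GFN 0x800000,
a single invalidation spanning both HVAs would, with the min/max merge above,
record the GFN range [0x100000, 0x800200), forcing retries for every fault on
the unrelated GFNs in between.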

And when the invalidation crosses multiple address spaces, I'm almost sure
it is wrong to merge GFN ranges from different address spaces. But I have
no clear solution yet.

Thanks,
Yilun


Re: [RFC PATCH v12 02/33] KVM: Use gfn instead of hva for mmu_notifier_retry

2023-09-20 Thread Xu Yilun
On 2023-09-13 at 18:55:00 -0700, Sean Christopherson wrote:
> From: Chao Peng 
> 
> Currently in mmu_notifier invalidate path, hva range is recorded and
> then checked against by mmu_notifier_retry_hva() in the page fault
  ^

Now it is mmu_invalidate_retry_hva().

> handling path. However, for the to be introduced private memory, a page
> fault may not have a hva associated, checking gfn(gpa) makes more sense.
> 
> For existing hva based shared memory, gfn is expected to also work. The
> only downside is when aliasing multiple gfns to a single hva, the
> current algorithm of checking multiple ranges could result in a much
> larger range being rejected. Such aliasing should be uncommon, so the
> impact is expected small.
> 
> Suggested-by: Sean Christopherson 
> Signed-off-by: Chao Peng 
> Reviewed-by: Fuad Tabba 
> Tested-by: Fuad Tabba 
> [sean: convert vmx_set_apic_access_page_addr() to gfn-based API]
> Signed-off-by: Sean Christopherson 
> ---
>  arch/x86/kvm/mmu/mmu.c   | 10 ++
>  arch/x86/kvm/vmx/vmx.c   | 11 +--
>  include/linux/kvm_host.h | 33 +
>  virt/kvm/kvm_main.c      | 40 +++-
>  4 files changed, 63 insertions(+), 31 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index e1d011c67cc6..0f0231d2b74f 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3056,7 +3056,7 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, 
> u64 *sptep)
>   *
>   * There are several ways to safely use this helper:
>   *
> - * - Check mmu_invalidate_retry_hva() after grabbing the mapping level, 
> before
> + * - Check mmu_invalidate_retry_gfn() after grabbing the mapping level, 
> before
>   *   consuming it.  In this case, mmu_lock doesn't need to be held during the
>   *   lookup, but it does need to be held while checking the MMU notifier.
>   *
> @@ -4358,7 +4358,7 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
>   return true;
>  
>   return fault->slot &&
> -mmu_invalidate_retry_hva(vcpu->kvm, fault->mmu_seq, fault->hva);
> +mmu_invalidate_retry_gfn(vcpu->kvm, fault->mmu_seq, fault->gfn);
>  }
>  
>  static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault 
> *fault)
> @@ -6253,7 +6253,9 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t 
> gfn_start, gfn_t gfn_end)
>  
>   write_lock(&kvm->mmu_lock);
>  
> - kvm_mmu_invalidate_begin(kvm, 0, -1ul);
> + kvm_mmu_invalidate_begin(kvm);
> +
> + kvm_mmu_invalidate_range_add(kvm, gfn_start, gfn_end);
>  
>   flush = kvm_rmap_zap_gfn_range(kvm, gfn_start, gfn_end);
>  
> @@ -6266,7 +6268,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t 
> gfn_start, gfn_t gfn_end)
>   if (flush)
>   kvm_flush_remote_tlbs_range(kvm, gfn_start, gfn_end - 
> gfn_start);
>  
> - kvm_mmu_invalidate_end(kvm, 0, -1ul);
> + kvm_mmu_invalidate_end(kvm);
>  
>   write_unlock(&kvm->mmu_lock);
>  }
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 72e3943f3693..6e502ba93141 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6757,10 +6757,10 @@ static void vmx_set_apic_access_page_addr(struct 
> kvm_vcpu *vcpu)
>   return;
>  
>   /*
> -  * Grab the memslot so that the hva lookup for the mmu_notifier retry
> -  * is guaranteed to use the same memslot as the pfn lookup, i.e. rely
> -  * on the pfn lookup's validation of the memslot to ensure a valid hva
> -  * is used for the retry check.
> +  * Explicitly grab the memslot using KVM's internal slot ID to ensure
> +  * KVM doesn't unintentionally grab a userspace memslot.  It _should_
> +  * be impossible for userspace to create a memslot for the APIC when
> +  * APICv is enabled, but paranoia won't hurt in this case.
>*/
>   slot = id_to_memslot(slots, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT);
>   if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
> @@ -6785,8 +6785,7 @@ static void vmx_set_apic_access_page_addr(struct 
> kvm_vcpu *vcpu)
>   return;
>  
>   read_lock(&vcpu->kvm->mmu_lock);
> - if (mmu_invalidate_retry_hva(kvm, mmu_seq,
> -  gfn_to_hva_memslot(slot, gfn))) {
> + if (mmu_invalidate_retry_gfn(kvm, mmu_seq, gfn)) {
>   kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
>   read_unlock(&vcpu->kvm->mmu_lock);
>   goto out;
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index fb6c6109fdca..11d091688346 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -787,8 +787,8 @@ struct kvm {
>   struct mmu_notifier mmu_notifier;
>   unsigned long mmu_invalidate_seq;
>   long mmu_invalidate_in_progress;
> - unsigned long mmu_invalidate_range_start;
> - unsigned long mmu_invalidate_range_end;
> + gfn_t mmu_invalidate_range_start;
> + gfn_t mmu_invalidate_range_end;

Re: [RFC PATCH v11 08/29] KVM: Introduce per-page memory attributes

2023-07-26 Thread Xu Yilun
On 2023-07-26 at 08:59:53 -0700, Sean Christopherson wrote:
> On Mon, Jul 24, 2023, Xu Yilun wrote:
> > On 2023-07-18 at 16:44:51 -0700, Sean Christopherson wrote:
> > > + if (WARN_ON_ONCE(start == end))
> > > + return -EINVAL;
> > 
> > Also, can this check ever be hit? Maybe remove it?
> 
> It should be impossible to, hence the WARN.  I added the check for two 
> reasons:
> (1) to help document that end is exclusive, and (2) to guard against future 
> bugs.

Understood. I'm good with it.


Re: [RFC PATCH v11 08/29] KVM: Introduce per-page memory attributes

2023-07-23 Thread Xu Yilun
On 2023-07-18 at 16:44:51 -0700, Sean Christopherson wrote:
> From: Chao Peng 
> 
> In confidential computing usages, whether a page is private or shared is
> necessary information for KVM to perform operations like page fault
> handling, page zapping etc. There are other potential use cases for
> per-page memory attributes, e.g. to make memory read-only (or no-exec,
> or exec-only, etc.) without having to modify memslots.
> 
> Introduce two ioctls (advertised by KVM_CAP_MEMORY_ATTRIBUTES) to allow
> userspace to operate on the per-page memory attributes.
>   - KVM_SET_MEMORY_ATTRIBUTES to set the per-page memory attributes to
> a guest memory range.
>   - KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES to return the KVM supported
> memory attributes.
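
(Not part of the patch -- a minimal userspace sketch of the new ioctl for
illustration; vm_fd/gpa/len and the KVM_MEMORY_ATTRIBUTE_PRIVATE constant are
assumptions here, the field names follow the checks in
kvm_vm_ioctl_set_mem_attributes() below.)

	struct kvm_memory_attributes attrs = {
		.address    = gpa,	/* must be page aligned */
		.size       = len,	/* must be page aligned and non-zero */
		.attributes = KVM_MEMORY_ATTRIBUTE_PRIVATE,
		.flags      = 0,	/* currently must be zero */
	};

	if (ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs) < 0)
		/* handle error, e.g. errno == EINVAL for bad alignment */
		return -1;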
> 
> Use an xarray to store the per-page attributes internally, with a naive,
> not fully optimized implementation, i.e. prioritize correctness over
> performance for the initial implementation.
> 
> Because setting memory attributes is roughly analogous to mprotect() on
> memory that is mapped into the guest, zap existing mappings prior to
> updating the memory attributes.  Opportunistically provide an arch hook
> for the post-set path (needed to complete invalidation anyways) in
> anticipation of x86 needing the hook to update metadata related to
> determining whether or not a given gfn can be backed with various sizes
> of hugepages.
> 
> It's possible that future usages may not require an invalidation, e.g.
> if KVM ends up supporting RWX protections and userspace grants _more_
> protections, but again opt for simplicity and punt optimizations to
> if/when they are needed.
> 
> Suggested-by: Sean Christopherson 
> Link: https://lore.kernel.org/all/y2wb48kd0j4vg...@google.com
> Cc: Fuad Tabba 
> Signed-off-by: Chao Peng 
> Co-developed-by: Sean Christopherson 
> Signed-off-by: Sean Christopherson 
> ---
>  Documentation/virt/kvm/api.rst |  60 
>  include/linux/kvm_host.h       |  14 +++
>  include/uapi/linux/kvm.h       |  14 +++
>  virt/kvm/Kconfig               |   4 +
>  virt/kvm/kvm_main.c            | 170 +
>  5 files changed, 262 insertions(+)
>

Only some trivial concerns below.

[...]
 
> @@ -1175,6 +1176,9 @@ static struct kvm *kvm_create_vm(unsigned long type, 
> const char *fdname)
>   spin_lock_init(&kvm->mn_invalidate_lock);
>   rcuwait_init(&kvm->mn_memslots_update_rcuwait);
>   xa_init(&kvm->vcpu_array);
> +#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> + xa_init(&kvm->mem_attr_array);
> +#endif
>  
>   INIT_LIST_HEAD(&kvm->gpc_list);
>   spin_lock_init(&kvm->gpc_lock);
> @@ -1346,6 +1350,9 @@ static void kvm_destroy_vm(struct kvm *kvm)
>   kvm_free_memslots(kvm, &kvm->__memslots[i][0]);
>   kvm_free_memslots(kvm, &kvm->__memslots[i][1]);
>   }
> +#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> + xa_destroy(&kvm->mem_attr_array);
> +#endif

Would it be better to do the destruction in reverse order from the creation?
I.e. either put xa_destroy(&kvm->mem_attr_array) after cleanup_srcu_struct(&kvm->srcu),
or put xa_init(&kvm->mem_attr_array) after init_srcu_struct(&kvm->irq_srcu).
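
A sketch of the first option (untested):

	cleanup_srcu_struct(&kvm->irq_srcu);
	cleanup_srcu_struct(&kvm->srcu);
#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
	xa_destroy(&kvm->mem_attr_array);
#endif
	kvm_arch_free_vm(kvm);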

>   cleanup_srcu_struct(&kvm->irq_srcu);
>   cleanup_srcu_struct(&kvm->srcu);
>   kvm_arch_free_vm(kvm);
> @@ -2346,6 +2353,145 @@ static int kvm_vm_ioctl_clear_dirty_log(struct kvm 
> *kvm,
>  }
>  #endif /* CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */

[...]

> +static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
> +struct kvm_memory_attributes *attrs)
> +{
> + gfn_t start, end;
> +
> + /* flags is currently not used. */
> + if (attrs->flags)
> + return -EINVAL;
> + if (attrs->attributes & ~kvm_supported_mem_attributes(kvm))
> + return -EINVAL;
> + if (attrs->size == 0 || attrs->address + attrs->size < attrs->address)
> + return -EINVAL;
> + if (!PAGE_ALIGNED(attrs->address) || !PAGE_ALIGNED(attrs->size))
> + return -EINVAL;
> +
> + start = attrs->address >> PAGE_SHIFT;
> + end = (attrs->address + attrs->size - 1 + PAGE_SIZE) >> PAGE_SHIFT;

As attrs->address/size are both guaranteed to be non-zero, non-wrapping and
page aligned by the previous checks, is it OK to simplify the calculation,
like:

  end = (attrs->address + attrs->size) >> PAGE_SHIFT;
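
For example, with attrs->address = 0x1000 and attrs->size = 0x2000
(PAGE_SIZE = 0x1000), the current formula gives
(0x1000 + 0x2000 - 1 + 0x1000) >> 12 = 0x3fff >> 12 = 3, and the simplified
one gives 0x3000 >> 12 = 3; the "- 1 + PAGE_SIZE" rounding only matters when
address + size is not already page aligned.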

> +
> + if (WARN_ON_ONCE(start == end))
> + return -EINVAL;

Also, can this check ever be hit? Maybe remove it?

Thanks,
Yilun

> +
> + /*
> +  * xarray tracks data using "unsigned long", and as a result so does
> +  * KVM.  For simplicity, supports generic attributes only on 64-bit
> +  * architectures.
> +  */
> + BUILD_BUG_ON(sizeof(attrs->attributes) != sizeof(unsigned long));
> +
> + return kvm_vm_set_mem_attributes(kvm, attrs->attributes, start, end);
> +}
> +#endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */


Re: [RFC PATCH v11 01/29] KVM: Wrap kvm_gfn_range.pte in a per-action union

2023-07-21 Thread Xu Yilun
On 2023-07-21 at 14:26:11 +0800, Yan Zhao wrote:
> On Tue, Jul 18, 2023 at 04:44:44PM -0700, Sean Christopherson wrote:
> 
> May I know why KVM now needs to register to callback .change_pte()?

I can see the original purpose is "setting a pte in the shadow page
table directly, instead of flushing the shadow page table entry and then
getting vmexit to set it" [1].

IIUC, KVM is expected to directly make the new pte present for new
pages in this callback, like for COW.
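
E.g. on a host COW fault the backing page of a guest-mapped HVA is replaced
with a new writable copy; .change_pte() was meant to let KVM update the SPTE
to the new pfn directly instead of zapping it and taking another guest page
fault.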

> As also commented in kvm_mmu_notifier_change_pte(), .change_pte() must be
> surrounded by .invalidate_range_{start,end}().
> 
> While kvm_mmu_notifier_invalidate_range_start() has called 
> kvm_unmap_gfn_range()
> to zap all leaf SPTEs, and page fault path will not install new SPTEs
> successfully before kvm_mmu_notifier_invalidate_range_end(),
> kvm_set_spte_gfn() should not be able to find any shadow present leaf entries 
> to
> update PFN.

I also failed to figure out how kvm_set_spte_gfn() could pass the several
!is_shadow_present_pte(iter.old_spte) checks and then write the new pte.


[1] 
https://lore.kernel.org/all/200909222039.n8mkd4tl002...@imap1.linux-foundation.org/

Thanks,
Yilun

> 
> Or could we just delete completely
> "kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn);"
> from kvm_mmu_notifier_change_pte() ?


Re: [RFC] fpga: dfl: fme: Fix cpu hotplug code

2021-06-28 Thread Xu Yilun
It's a good fix, you can drop the RFC in the commit title. :)

The title could be more specific, like:

fpga: dfl: fme: Fix cpu hotplug issue in performance reporting

So we know it is for the performance reporting feature at first glance.

On Mon, Jun 28, 2021 at 12:45:46PM +0530, Kajol Jain wrote:

> Commit 724142f8c42a ("fpga: dfl: fme: add performance
> reporting support") added performance reporting support
> for FPGA management engine via perf.

You may drop this section; it is indicated in the Fixes tag.

> 
> It also added cpu hotplug feature but it didn't add

The performance reporting driver added cpu hotplug ...

> pmu migration call in cpu offline function.
> This can create an issue incase the current designated
> cpu being used to collect fme pmu data got offline,
> as based on current code we are not migrating fme pmu to
> new target cpu. Because of that perf will still try to
> fetch data from that offline cpu and hence we will not
> get counter data.
> 
> Patch fixed this issue by adding pmu_migrate_context call
> in fme_perf_offline_cpu function.
> 
> Fixes: 724142f8c42a ("fpga: dfl: fme: add performance reporting support")
> Signed-off-by: Kajol Jain 

Tested-by: Xu Yilun 

Thanks,
Yilun

> ---
>  drivers/fpga/dfl-fme-perf.c | 4 
>  1 file changed, 4 insertions(+)
> 
> ---
> - This fix patch is not tested (as I don't have required environment).
>   But issue mentioned in the commit msg can be re-created, by starting any
>   fme_perf event and while its still running, offline current designated
>   cpu pointed by cpumask file. Since current code didn't migrating pmu,
>   perf gonna try getting counts from that offlined cpu and hence we will
>   not get event data.
> ---
> diff --git a/drivers/fpga/dfl-fme-perf.c b/drivers/fpga/dfl-fme-perf.c
> index 4299145ef347..b9a54583e505 100644
> --- a/drivers/fpga/dfl-fme-perf.c
> +++ b/drivers/fpga/dfl-fme-perf.c
> @@ -953,6 +953,10 @@ static int fme_perf_offline_cpu(unsigned int cpu, struct 
> hlist_node *node)
>   return 0;
>  
>   priv->cpu = target;
> +
> + /* Migrate fme_perf pmu events to the new target cpu */
> + perf_pmu_migrate_context(&priv->pmu, cpu, target);
> +
>   return 0;
>  }
>  
> -- 
> 2.31.1