[PATCH 1/1] vhost scsi: allocate vhost_scsi with GFP_NOWAIT to avoid delay

2021-01-20 Thread Dongli Zhang
The size of 'struct vhost_scsi' is order-10 (~2.3MB). kzalloc() may incur
a long delay compacting memory pages when high-order pages are scarce. As
a result, there is latency when creating a VM (with vhost-scsi) or
hot-adding vhost-scsi-based storage.

The prior commit 595cb754983d ("vhost/scsi: use vmalloc for order-10
allocation") falls back to vmalloc only when really needed, while this
patch switches the allocation to GFP_NOWAIT in order to avoid the delay
caused by memory compaction.

Cc: Aruna Ramakrishna 
Cc: Joe Jin 
Signed-off-by: Dongli Zhang 
---
Another option is to reduce the size of 'struct vhost_scsi', e.g., by
replacing the inline vhost_scsi.vqs array with plain pointers, allocating
each vhost_scsi.vqs[i] separately. Please let me know if that option is
better.
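The layout change suggested above can be sketched in plain C. All names
and sizes here are hypothetical stand-ins, not the real vhost
definitions; the point is only that replacing inline structs with
pointers shrinks the top-level allocation from megabytes to a handful of
pointer slots:

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_MAX_VQ 128          /* stand-in for the vhost-scsi vq count */

struct fake_vq {
    char state[16384];           /* stand-in for a large vhost_virtqueue */
};

/* Current layout: all virtqueue structs embedded inline, which is what
 * pushes the top-level struct into an order-10 allocation. */
struct vs_inline {
    struct fake_vq vqs[FAKE_MAX_VQ];
};

/* Proposed layout: pointers only; each queue is allocated separately,
 * so the top-level struct stays small. */
struct vs_split {
    struct fake_vq *vqs[FAKE_MAX_VQ];
};
```

With these stand-in sizes, the inline variant is 2 MB while the split
variant is just the pointer array, which is why the top-level allocation
would no longer need high-order pages.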

 drivers/vhost/scsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 4ce9f00ae10e..85eaa4e883f4 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1814,7 +1814,7 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
struct vhost_virtqueue **vqs;
int r = -ENOMEM, i;
 
-   vs = kzalloc(sizeof(*vs), GFP_KERNEL | __GFP_NOWARN | __GFP_RETRY_MAYFAIL);
+   vs = kzalloc(sizeof(*vs), GFP_NOWAIT | __GFP_NOWARN);
if (!vs) {
vs = vzalloc(sizeof(*vs));
if (!vs)
-- 
2.17.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [RFC 3/3] vfio: Share the KVM instance with Vdmabuf

2021-01-20 Thread Kasireddy, Vivek
Hi Alex,

> -Original Message-
> From: Alex Williamson 
> Sent: Tuesday, January 19, 2021 7:37 PM
> To: Tian, Kevin 
> Cc: Kasireddy, Vivek ; Kim, Dongwon
> ; virtualization@lists.linux-foundation.org; Zhao, Yan 
> Y
> 
> Subject: Re: [RFC 3/3] vfio: Share the KVM instance with Vdmabuf
> 
> On Wed, 20 Jan 2021 03:05:49 +
> "Tian, Kevin"  wrote:
> 
> > > From: Alex Williamson
> > > Sent: Wednesday, January 20, 2021 8:51 AM
> > >
> > > On Wed, 20 Jan 2021 00:14:49 +
> > > "Kasireddy, Vivek"  wrote:
> > >
> > > > Hi Alex,
> > > >
> > > > > -Original Message-
> > > > > From: Alex Williamson 
> > > > > Sent: Tuesday, January 19, 2021 7:40 AM
> > > > > To: Kasireddy, Vivek 
> > > > > Cc: virtualization@lists.linux-foundation.org; Kim, Dongwon
> > > 
> > > > > Subject: Re: [RFC 3/3] vfio: Share the KVM instance with Vdmabuf
> > > > >
> > > > > On Tue, 19 Jan 2021 00:28:12 -0800
> > > > > Vivek Kasireddy  wrote:
> > > > >
> > > > > > Getting a copy of the KVM instance is necessary for mapping Guest
> > > > > > pages in the Host.
> > > > > >
> > > > > > TODO: Instead of invoking the symbol directly, there needs to be a
> > > > > > better way of getting a copy of the KVM instance probably by using
> > > > > > other notifiers. However, currently, KVM shares its instance only
> > > > > > with VFIO and therefore we are compelled to bind the passthrough'd
> > > > > > device to vfio-pci.
> > > > >
> > > > > Yeah, this is a bad solution, sorry, vfio is not going to gratuitously
> > > > > call out to vhost to share a kvm pointer.  I'd prefer to get rid of
> > > > > vfio having any knowledge or visibility of the kvm pointer.  Thanks,
> > > >
> > > > [Kasireddy, Vivek] I agree that this is definitely not ideal as I 
> > > > recognize it
> > > > in the TODO. However, it looks like VFIO also gets a copy of the KVM
> > > > pointer in a similar manner:
> > > >
> > > > virt/kvm/vfio.c
> > > >
> > > > static void kvm_vfio_group_set_kvm(struct vfio_group *group, struct kvm
> > > *kvm)
> > > > {
> > > > void (*fn)(struct vfio_group *, struct kvm *);
> > > >
> > > > fn = symbol_get(vfio_group_set_kvm);
> > > > if (!fn)
> > > > return;
> > > >
> > > > fn(group, kvm);
> > > >
> > > > symbol_put(vfio_group_set_kvm);
> > > > }
> > >
> > > You're equating the mechanism with the architecture.  We use symbols
> > > here to avoid module dependencies between kvm and vfio, but this is
> > > just propagating data that userspace is specifically registering
> > > between kvm and vfio.  vhost doesn't get to piggyback on that channel.
> > >
> > > > With this patch, I am not suggesting that this is a precedent that
> > > > should be followed but it appears there doesn't seem to be an
> > > > alternative way of getting a copy of the KVM pointer that is clean
> > > > and elegant -- unless I have not looked hard enough. I guess we
> > > > could create a notifier chain with callbacks for VFIO and Vhost
> > > > that KVM would call but this would mean modifying KVM.
> > > >
> > > > Also, if I understand correctly, if VFIO does not want to share the
> > > > KVM pointer with VFIO groups, then I think it would break stuff
> > > > like mdev which counts on it.
> > >
> > > Only kvmgt requires the kvm pointer and the use case there is pretty
> > > questionable, I wonder if it actually still exists now that we have the
> > > DMA r/w interface through vfio.  Thanks,
> > >
> >
> > IIRC, kvmgt still needs the kvm pointer to use kvm page tracking interface
> > for write-protecting guest pgtable.
> 
> Thanks, Kevin.  Either way, a vhost device has no stake in the game wrt
> the kvm pointer lifecycle here and no business adding a callout.  I'm
> reluctant to add any further use cases even for mdevs as ideally mdevs
> should have no dependency on kvm.  Thanks,

[Kasireddy, Vivek] All I am trying to do is leverage existing mechanism(s)
instead of creating new ones. So, if Vhost cannot get the kvm pointer from
VFIO in any manner, my only option, as it appears, is to add a new
notifier_block to KVM that gets triggered in kvm_create_vm() and
kvm_destroy_vm(). However, I am not sure whether that would be acceptable
to the KVM maintainers. Does anyone know of a cleaner option that avoids
having to modify KVM?
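For illustration, the notifier idea above can be sketched as a minimal
plain-C chain. This is conceptual only — it is not the kernel's
notifier_block API, and every name here is hypothetical — but it shows
the shape of what KVM would call from kvm_create_vm()/kvm_destroy_vm()
so that subscribers such as vhost could receive the kvm pointer without
going through VFIO:

```c
#include <assert.h>
#include <stddef.h>

enum vm_event { VM_CREATED, VM_DESTROYED };

/* Hypothetical subscriber node: a callback plus an intrusive link. */
struct vm_notifier {
    void (*notify)(enum vm_event ev, void *kvm);
    struct vm_notifier *next;
};

static struct vm_notifier *chain;

/* Subscribers (e.g. a vhost driver) register themselves on the chain. */
static void vm_notifier_register(struct vm_notifier *nb)
{
    nb->next = chain;
    chain = nb;
}

/* KVM would invoke this at VM creation/destruction, passing the kvm
 * pointer to every registered subscriber. */
static void vm_notify_all(enum vm_event ev, void *kvm)
{
    for (struct vm_notifier *nb = chain; nb; nb = nb->next)
        nb->notify(ev, kvm);
}

/* Sample subscriber that records what it was told. */
static void *seen_kvm;
static enum vm_event seen_ev;
static void record_cb(enum vm_event ev, void *kvm)
{
    seen_ev = ev;
    seen_kvm = kvm;
}
```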

Thanks,
Vivek

> 
> Alex



Re: [PATCH v5 06/16] x86/hyperv: allocate output arg pages if required

2021-01-20 Thread Pavel Tatashin
On Wed, Jan 20, 2021 at 7:01 AM Wei Liu  wrote:
>
> When Linux runs as the root partition, it will need to make hypercalls
> which return data from the hypervisor.
>
> Allocate pages for storing results when Linux runs as the root
> partition.
>
> Signed-off-by: Lillian Grassin-Drake 
> Co-Developed-by: Lillian Grassin-Drake 
> Signed-off-by: Wei Liu 

Reviewed-by: Pavel Tatashin 

The new warnings reported by the robot are the same as for the input argument.

Pasha

> ---
> v3: Fix hv_cpu_die to use free_pages.
> v2: Address Vitaly's comments
> ---
>  arch/x86/hyperv/hv_init.c   | 35 -
>  arch/x86/include/asm/mshyperv.h |  1 +
>  2 files changed, 31 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index e04d90af4c27..6f4cb40e53fe 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -41,6 +41,9 @@ EXPORT_SYMBOL_GPL(hv_vp_assist_page);
>  void  __percpu **hyperv_pcpu_input_arg;
>  EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
>
> +void  __percpu **hyperv_pcpu_output_arg;
> +EXPORT_SYMBOL_GPL(hyperv_pcpu_output_arg);
> +
>  u32 hv_max_vp_index;
>  EXPORT_SYMBOL_GPL(hv_max_vp_index);
>
> @@ -73,12 +76,19 @@ static int hv_cpu_init(unsigned int cpu)
> void **input_arg;
> struct page *pg;
>
> -   input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> /* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
> -   pg = alloc_page(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL);
> +   pg = alloc_pages(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL, hv_root_partition ? 1 : 0);
> if (unlikely(!pg))
> return -ENOMEM;
> +
> +   input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> *input_arg = page_address(pg);
> +   if (hv_root_partition) {
> +   void **output_arg;
> +
> +   output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> +   *output_arg = page_address(pg + 1);
> +   }
>
> hv_get_vp_index(msr_vp_index);
>
> @@ -205,14 +215,23 @@ static int hv_cpu_die(unsigned int cpu)
> unsigned int new_cpu;
> unsigned long flags;
> void **input_arg;
> -   void *input_pg = NULL;
> +   void *pg;
>
> local_irq_save(flags);
> input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> -   input_pg = *input_arg;
> +   pg = *input_arg;
> *input_arg = NULL;
> +
> +   if (hv_root_partition) {
> +   void **output_arg;
> +
> +   output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> +   *output_arg = NULL;
> +   }
> +
> local_irq_restore(flags);
> -   free_page((unsigned long)input_pg);
> +
> +   free_pages((unsigned long)pg, hv_root_partition ? 1 : 0);
>
> if (hv_vp_assist_page && hv_vp_assist_page[cpu])
> wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
> @@ -346,6 +365,12 @@ void __init hyperv_init(void)
>
> BUG_ON(hyperv_pcpu_input_arg == NULL);
>
> +   /* Allocate the per-CPU state for output arg for root */
> +   if (hv_root_partition) {
> +   hyperv_pcpu_output_arg = alloc_percpu(void *);
> +   BUG_ON(hyperv_pcpu_output_arg == NULL);
> +   }
> +
> /* Allocate percpu VP index */
> hv_vp_index = kmalloc_array(num_possible_cpus(), sizeof(*hv_vp_index),
> GFP_KERNEL);
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ac2b0d110f03..62d9390f1ddf 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -76,6 +76,7 @@ static inline void hv_disable_stimer0_percpu_irq(int irq) {}
>  #if IS_ENABLED(CONFIG_HYPERV)
>  extern void *hv_hypercall_pg;
>  extern void  __percpu  **hyperv_pcpu_input_arg;
> +extern void  __percpu  **hyperv_pcpu_output_arg;
>
>  static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
>  {
> --
> 2.20.1
>
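The allocation scheme in hv_cpu_init() above — a single two-page
(order-1) allocation when running as root, with the output argument page
immediately following the input page, mirroring page_address(pg + 1) —
can be sketched in plain userspace C. The names and the fixed page size
below are hypothetical stand-ins for the kernel calls:

```c
#include <assert.h>
#include <stdlib.h>

#define FAKE_PAGE_SIZE 4096

struct hv_args {
    void *base;     /* the contiguous block backing both pages */
    void *input;    /* per-cpu hypercall input page */
    void *output;   /* per-cpu hypercall output page (root only) */
};

/* Stand-in for hv_cpu_init(): one allocation sized by whether we need
 * an output page, input at offset 0, output one page later. */
static int fake_cpu_init(struct hv_args *a, int root_partition)
{
    size_t npages = root_partition ? 2 : 1;   /* order-1 vs order-0 */

    a->base = aligned_alloc(FAKE_PAGE_SIZE, npages * FAKE_PAGE_SIZE);
    if (!a->base)
        return -1;
    a->input = a->base;
    a->output = root_partition ? (char *)a->base + FAKE_PAGE_SIZE : NULL;
    return 0;
}
```

The matching teardown (hv_cpu_die() in the patch) frees the whole block
once, which is why the patch switches from free_page() to free_pages()
with the same order.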


Re: [RFC PATCH 3/7] tun: allow use of BPF_PROG_TYPE_SCHED_CLS program type

2021-01-20 Thread Alexei Starovoitov
On Tue, Jan 12, 2021 at 12:55 PM Yuri Benditovich
 wrote:
>
> On Tue, Jan 12, 2021 at 10:40 PM Yuri Benditovich
>  wrote:
> >
> > On Tue, Jan 12, 2021 at 9:42 PM Yuri Benditovich
> >  wrote:
> > >
> > > This program type can set the skb hash value. It will be useful
> > > when tun supports the hash reporting feature of virtio-net.
> > >
> > > Signed-off-by: Yuri Benditovich 
> > > ---
> > >  drivers/net/tun.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > > index 7959b5c2d11f..455f7afc1f36 100644
> > > --- a/drivers/net/tun.c
> > > +++ b/drivers/net/tun.c
> > > @@ -2981,6 +2981,8 @@ static int tun_set_ebpf(struct tun_struct *tun, struct tun_prog __rcu **prog_p,
> > > prog = NULL;
> > > } else {
> > > prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_FILTER);
> > > +   if (IS_ERR(prog))
> > > +   prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SCHED_CLS);
> > > if (IS_ERR(prog))
> > > return PTR_ERR(prog);
> > > }
> >
> > Comment from Alexei Starovoitov:
> > Patches 1 and 2 are missing for me, so I couldn't review properly,
> > but this diff looks odd.
> > It allows sched_cls prog type to attach to tun.
> > That means everything that sched_cls progs can do will be done from tun 
> > hook?
>
> We do not have an intention to modify the packet in this steering eBPF.

The intent is irrelevant. Using SCHED_CLS here will let users modify the packet
and some users will do so. Hence the tun code has to support it.

> There is just one function, unavailable to BPF_PROG_TYPE_SOCKET_FILTER,
> that the eBPF needs in order to deliver the hash to the guest
> VM - it is 'bpf_set_hash'.
>
> Does it mean that we need to define a new eBPF type for socket filter
> operations + set_hash?
>
> Our problem is that the eBPF calculates 32-bit hash, 16-bit queue
> index and 8-bit of hash type.
> But it is able to return only 32-bit integer, so in this set of
> patches the eBPF returns
> queue index and hash type and saves the hash in skb->hash using 
> bpf_set_hash().

bpf prog can only return a 32-bit integer. That's true.
But the prog can use helpers to set any number of bits and variables.
bpf_set_hash_v2() with hash, queue and index arguments could fit this purpose,
but if you allow it for SCHED_CLS type,
tc side of the code should be ready to deal with that too and this extended
helper should be meaningful for both tc and tun.

In general, if the purpose of the prog is to compute three values, they
had better be grouped together. Returning two of them via an ORed 32-bit
integer while setting the third via bpf_set_hash() is an awkward API.
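For reference, the split return scheme under discussion — a 16-bit queue
index and an 8-bit hash-report type ORed into the program's 32-bit
return value, with the 32-bit hash delivered separately via
bpf_set_hash() — looks roughly like the following. The exact bit layout
is an assumption for illustration, not taken from the patch series:

```c
#include <assert.h>
#include <stdint.h>

/* Pack the two small values the tun hook would receive in the eBPF
 * program's 32-bit return value (layout hypothetical: queue index in
 * the low 16 bits, hash-report type in the next 8 bits). */
static uint32_t pack_ret(uint16_t queue, uint8_t hash_type)
{
    return ((uint32_t)hash_type << 16) | queue;
}

/* The tun side would then unpack them from the return value. */
static uint16_t unpack_queue(uint32_t ret)
{
    return ret & 0xffff;
}

static uint8_t unpack_hash_type(uint32_t ret)
{
    return (ret >> 16) & 0xff;
}
```

Alexei's point is that a single helper taking hash, queue and type as
explicit arguments would avoid this bit-packing entirely.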


Re: [PATCH v5 02/16] x86/hyperv: detect if Linux is the root partition

2021-01-20 Thread Pavel Tatashin
On Wed, Jan 20, 2021 at 7:01 AM Wei Liu  wrote:
>
> For now we can use the privilege flag to check. Stash the value to be
> used later.
>
> Put in a bunch of defines for future use when we want to have more
> fine-grained detection.
>
> Signed-off-by: Wei Liu 
> ---
> v3: move hv_root_partition to mshyperv.c
> ---
>  arch/x86/include/asm/hyperv-tlfs.h | 10 ++
>  arch/x86/include/asm/mshyperv.h|  2 ++
>  arch/x86/kernel/cpu/mshyperv.c | 20 
>  3 files changed, 32 insertions(+)
>
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index 6bf42aed387e..204010350604 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -21,6 +21,7 @@
>  #define HYPERV_CPUID_FEATURES  0x4003
>  #define HYPERV_CPUID_ENLIGHTMENT_INFO  0x4004
>  #define HYPERV_CPUID_IMPLEMENT_LIMITS  0x4005
> +#define HYPERV_CPUID_CPU_MANAGEMENT_FEATURES   0x4007
>  #define HYPERV_CPUID_NESTED_FEATURES   0x400A
>
>  #define HYPERV_CPUID_VIRT_STACK_INTERFACE  0x4081
> @@ -110,6 +111,15 @@
>  /* Recommend using enlightened VMCS */
>  #define HV_X64_ENLIGHTENED_VMCS_RECOMMENDEDBIT(14)
>
> +/*
> + * CPU management features identification.
> + * These are HYPERV_CPUID_CPU_MANAGEMENT_FEATURES.EAX bits.
> + */
> +#define HV_X64_START_LOGICAL_PROCESSOR BIT(0)
> +#define HV_X64_CREATE_ROOT_VIRTUAL_PROCESSOR   BIT(1)
> +#define HV_X64_PERFORMANCE_COUNTER_SYNCBIT(2)
> +#define HV_X64_RESERVED_IDENTITY_BIT   BIT(31)
> +
>  /*
>   * Virtual processor will never share a physical core with another virtual
>   * processor, except for virtual processors that are reported as sibling SMT
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ffc289992d1b..ac2b0d110f03 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -237,6 +237,8 @@ int hyperv_fill_flush_guest_mapping_list(
> struct hv_guest_mapping_flush_list *flush,
> u64 start_gfn, u64 end_gfn);
>
> +extern bool hv_root_partition;
> +
>  #ifdef CONFIG_X86_64
>  void hv_apic_init(void);
>  void __init hv_init_spinlocks(void);
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index f628e3dc150f..c376d191a260 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -32,6 +32,10 @@
>  #include 
>  #include 
>
> +/* Is Linux running as the root partition? */
> +bool hv_root_partition;
> +EXPORT_SYMBOL_GPL(hv_root_partition);
> +
>  struct ms_hyperv_info ms_hyperv;
>  EXPORT_SYMBOL_GPL(ms_hyperv);
>
> @@ -237,6 +241,22 @@ static void __init ms_hyperv_init_platform(void)
> pr_debug("Hyper-V: max %u virtual processors, %u logical processors\n",
>  ms_hyperv.max_vp_index, ms_hyperv.max_lp_index);
>
> +   /*
> +* Check CPU management privilege.
> +*
> +* To mirror what Windows does we should extract CPU management
> +* features and use the ReservedIdentityBit to detect if Linux is the
> +* root partition. But that requires negotiating CPU management
> +* interface (a process to be finalized).

Is this comment relevant? Do we have to mirror what Windows does?

> +*
> +* For now, use the privilege flag as the indicator for running as
> +* root.
> +*/
> +   if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_CPU_MANAGEMENT) {
> +   hv_root_partition = true;
> +   pr_info("Hyper-V: running as root partition\n");
> +   }
> +

Reviewed-by: Pavel Tatashin 

> /*
>  * Extract host information.
>  */
> --
> 2.20.1
>


[PATCH v2 3/3] VMCI: Enforce queuepair max size for IOCTL_VMCI_QUEUEPAIR_ALLOC

2021-01-20 Thread Jorgen Hansen
When creating the VMCI queue pair tracking data structures on the host
side, the IOCTL for creating the VMCI queue pair didn't validate the
queue pair size parameters. This change adds checks for this.

This avoids a memory allocation issue in qp_host_alloc_queue, as
reported by nslusa...@gmx.net. The check in qp_host_alloc_queue
has also been updated to enforce the maximum queue pair size
as defined by VMCI_MAX_GUEST_QP_MEMORY.
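The combined check that the patch centralizes in QP_SIZES_ARE_VALID can
be sketched in userspace C. Note how the first comparison doubles as an
unsigned-overflow guard: if produce + consume wraps, the wrapped sum is
necessarily smaller than the larger operand. The constant mirrors
VMCI_MAX_GUEST_QP_MEMORY; the function name is a stand-in:

```c
#include <assert.h>
#include <stdint.h>

#define FAKE_MAX_QP_MEMORY ((uint64_t)128 * 1024 * 1024)  /* 128 MB */

/* Valid iff the sum didn't overflow and fits under the global limit. */
static int qp_sizes_are_valid(uint64_t produce, uint64_t consume)
{
    uint64_t larger = produce > consume ? produce : consume;

    return produce + consume >= larger &&        /* overflow guard */
           produce + consume <= FAKE_MAX_QP_MEMORY;
}
```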

The fix has been verified using sample code supplied by
nslusa...@gmx.net.

Reported-by: nslusa...@gmx.net
Reviewed-by: Vishnu Dasa 
Signed-off-by: Jorgen Hansen 
---
 drivers/misc/vmw_vmci/vmci_queue_pair.c | 12 
 include/linux/vmw_vmci_defs.h   |  4 ++--
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/misc/vmw_vmci/vmci_queue_pair.c b/drivers/misc/vmw_vmci/vmci_queue_pair.c
index 525ef96..d787dde 100644
--- a/drivers/misc/vmw_vmci/vmci_queue_pair.c
+++ b/drivers/misc/vmw_vmci/vmci_queue_pair.c
@@ -237,7 +237,9 @@ static struct qp_list qp_guest_endpoints = {
 #define QPE_NUM_PAGES(_QPE) ((u32) \
 (DIV_ROUND_UP(_QPE.produce_size, PAGE_SIZE) + \
  DIV_ROUND_UP(_QPE.consume_size, PAGE_SIZE) + 2))
-
+#define QP_SIZES_ARE_VALID(_prod_qsize, _cons_qsize) \
+   ((_prod_qsize) + (_cons_qsize) >= max(_prod_qsize, _cons_qsize) && \
+(_prod_qsize) + (_cons_qsize) <= VMCI_MAX_GUEST_QP_MEMORY)
 
 /*
  * Frees kernel VA space for a given queue and its queue header, and
@@ -528,7 +530,7 @@ static struct vmci_queue *qp_host_alloc_queue(u64 size)
u64 num_pages;
const size_t queue_size = sizeof(*queue) + sizeof(*(queue->kernel_if));
 
-   if (size > SIZE_MAX - PAGE_SIZE)
+   if (size > min_t(size_t, VMCI_MAX_GUEST_QP_MEMORY, SIZE_MAX - PAGE_SIZE))
return NULL;
num_pages = DIV_ROUND_UP(size, PAGE_SIZE) + 1;
if (num_pages > (SIZE_MAX - queue_size) /
@@ -1929,6 +1931,9 @@ int vmci_qp_broker_alloc(struct vmci_handle handle,
 struct vmci_qp_page_store *page_store,
 struct vmci_ctx *context)
 {
+   if (!QP_SIZES_ARE_VALID(produce_size, consume_size))
+   return VMCI_ERROR_NO_RESOURCES;
+
return qp_broker_alloc(handle, peer, flags, priv_flags,
   produce_size, consume_size,
   page_store, context, NULL, NULL, NULL, NULL);
@@ -2685,8 +2690,7 @@ int vmci_qpair_alloc(struct vmci_qp **qpair,
 * used by the device is NO_RESOURCES, so use that here too.
 */
 
-   if (produce_qsize + consume_qsize < max(produce_qsize, consume_qsize) ||
-   produce_qsize + consume_qsize > VMCI_MAX_GUEST_QP_MEMORY)
+   if (!QP_SIZES_ARE_VALID(produce_qsize, consume_qsize))
return VMCI_ERROR_NO_RESOURCES;
 
retval = vmci_route(&src, &dst, false, &route);
diff --git a/include/linux/vmw_vmci_defs.h b/include/linux/vmw_vmci_defs.h
index be0afe6..e36cb11 100644
--- a/include/linux/vmw_vmci_defs.h
+++ b/include/linux/vmw_vmci_defs.h
@@ -66,7 +66,7 @@ enum {
  * consists of at least two pages, the memory limit also dictates the
  * number of queue pairs a guest can create.
  */
-#define VMCI_MAX_GUEST_QP_MEMORY (128 * 1024 * 1024)
+#define VMCI_MAX_GUEST_QP_MEMORY ((size_t)(128 * 1024 * 1024))
 #define VMCI_MAX_GUEST_QP_COUNT  (VMCI_MAX_GUEST_QP_MEMORY / PAGE_SIZE / 2)
 
 /*
@@ -80,7 +80,7 @@ enum {
  * too much kernel memory (especially on vmkernel).  We limit a queuepair to
  * 32 KB, or 16 KB per queue for symmetrical pairs.
  */
-#define VMCI_MAX_PINNED_QP_MEMORY (32 * 1024)
+#define VMCI_MAX_PINNED_QP_MEMORY ((size_t)(32 * 1024))
 
 /*
  * We have a fixed set of resource IDs available in the VMX.
-- 
2.6.2



[PATCH v2 2/3] VMCI: Use set_page_dirty_lock() when unregistering guest memory

2021-01-20 Thread Jorgen Hansen
When the VMCI host support releases guest memory in the case where
the VM was killed, the pinned guest pages aren't locked. Use
set_page_dirty_lock() instead of set_page_dirty().

Testing done: Killed VM while having an active VMCI based vSocket
connection and observed warning from ext4. With this fix, no
warning was observed. Ran various vSocket tests without issues.

Fixes: 06164d2b72aa ("VMCI: queue pairs implementation.")
Reviewed-by: Vishnu Dasa 
Signed-off-by: Jorgen Hansen 
---
 drivers/misc/vmw_vmci/vmci_queue_pair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/vmw_vmci/vmci_queue_pair.c b/drivers/misc/vmw_vmci/vmci_queue_pair.c
index a3691c1..525ef96 100644
--- a/drivers/misc/vmw_vmci/vmci_queue_pair.c
+++ b/drivers/misc/vmw_vmci/vmci_queue_pair.c
@@ -630,7 +630,7 @@ static void qp_release_pages(struct page **pages,
 
for (i = 0; i < num_pages; i++) {
if (dirty)
-   set_page_dirty(pages[i]);
+   set_page_dirty_lock(pages[i]);
 
put_page(pages[i]);
pages[i] = NULL;
-- 
2.6.2



[PATCH v2 1/3] VMCI: Stop log spew when qp allocation isn't possible

2021-01-20 Thread Jorgen Hansen
VMCI queue pair allocation is disabled if a VM is in FT mode. In
these cases, VMware Tools may still occasionally attempt to
create a vSocket stream connection, resulting in multiple
warnings in the kernel logs. Therefore, downgrade the error log to
a debug log.

Reviewed-by: Vishnu Dasa 
Signed-off-by: Jorgen Hansen 
---
 drivers/misc/vmw_vmci/vmci_queue_pair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/vmw_vmci/vmci_queue_pair.c b/drivers/misc/vmw_vmci/vmci_queue_pair.c
index c490658..a3691c1 100644
--- a/drivers/misc/vmw_vmci/vmci_queue_pair.c
+++ b/drivers/misc/vmw_vmci/vmci_queue_pair.c
@@ -1207,7 +1207,7 @@ static int qp_alloc_guest_work(struct vmci_handle *handle,
} else {
result = qp_alloc_hypercall(queue_pair_entry);
if (result < VMCI_SUCCESS) {
-   pr_warn("qp_alloc_hypercall result = %d\n", result);
+   pr_devel("qp_alloc_hypercall result = %d\n", result);
goto error;
}
}
-- 
2.6.2



[PATCH v2 0/3] VMCI: Queue pair bug fixes

2021-01-20 Thread Jorgen Hansen
This series contains three bug fixes for the queue pair
implementation in the VMCI driver.

v1 -> v2:
  - format patches as a series
  - use min_t instead of min to ensure size_t comparison
(issue pointed out by kernel test robot )

Jorgen Hansen (3):
  VMCI: Stop log spew when qp allocation isn't possible
  VMCI: Use set_page_dirty_lock() when unregistering guest memory
  VMCI: Enforce queuepair max size for IOCTL_VMCI_QUEUEPAIR_ALLOC

 drivers/misc/vmw_vmci/vmci_queue_pair.c | 16 ++--
 include/linux/vmw_vmci_defs.h   |  4 ++--
 2 files changed, 12 insertions(+), 8 deletions(-)

-- 
2.6.2



Re: [PATCH v5 01/16] asm-generic/hyperv: change HV_CPU_POWER_MANAGEMENT to HV_CPU_MANAGEMENT

2021-01-20 Thread Pavel Tatashin
On Wed, Jan 20, 2021 at 7:01 AM Wei Liu  wrote:
>
> This makes the name match Hyper-V TLFS.
>
> Signed-off-by: Wei Liu 
> Reviewed-by: Vitaly Kuznetsov 
> ---
>  include/asm-generic/hyperv-tlfs.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index e73a11850055..e6903589a82a 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -88,7 +88,7 @@
>  #define HV_CONNECT_PORTBIT(7)
>  #define HV_ACCESS_STATSBIT(8)
>  #define HV_DEBUGGING   BIT(11)
> -#define HV_CPU_POWER_MANAGEMENTBIT(12)
> +#define HV_CPU_MANAGEMENT  BIT(12)

Reviewed-by: Pavel Tatashin 


Re: [PATCH v5 05/16] clocksource/hyperv: use MSR-based access if running as root

2021-01-20 Thread Pavel Tatashin
On Wed, Jan 20, 2021 at 7:01 AM Wei Liu  wrote:
>
> When Linux runs as the root partition, the setup required for TSC page
> is different.

Why would we need a TSC page as a clock source for root partition at
all? I think the above can be removed.

> Luckily Linux also has access to the MSR based
> clocksource. We can just disable the TSC page clocksource if Linux is
> the root partition.
>
> Signed-off-by: Wei Liu 
> Acked-by: Daniel Lezcano 
> ---
>  drivers/clocksource/hyperv_timer.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
> index ba04cb381cd3..269a691bd2c4 100644
> --- a/drivers/clocksource/hyperv_timer.c
> +++ b/drivers/clocksource/hyperv_timer.c
> @@ -426,6 +426,9 @@ static bool __init hv_init_tsc_clocksource(void)
> if (!(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE))
> return false;
>
> +   if (hv_root_partition)
> +   return false;
> +

Reviewed-by: Pavel Tatashin 


Re: [PATCH v5 04/16] iommu/hyperv: don't setup IRQ remapping when running as root

2021-01-20 Thread Pavel Tatashin
On Wed, Jan 20, 2021 at 7:01 AM Wei Liu  wrote:
>
> The IOMMU code needs more work. We're sure for now the IRQ remapping
> hooks are not applicable when Linux is the root partition.
>
> Signed-off-by: Wei Liu 
> Acked-by: Joerg Roedel 
> Reviewed-by: Vitaly Kuznetsov 
> ---
>  drivers/iommu/hyperv-iommu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> index 1d21a0b5f724..b7db6024e65c 100644
> --- a/drivers/iommu/hyperv-iommu.c
> +++ b/drivers/iommu/hyperv-iommu.c
> @@ -20,6 +20,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include "irq_remapping.h"
>
> @@ -122,7 +123,7 @@ static int __init hyperv_prepare_irq_remapping(void)
>
> if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
> x86_init.hyper.msi_ext_dest_id() ||
> -   !x2apic_supported())
> +   !x2apic_supported() || hv_root_partition)
> return -ENODEV;

Reviewed-by: Pavel Tatashin 


Re: [PATCH v5 03/16] Drivers: hv: vmbus: skip VMBus initialization if Linux is root

2021-01-20 Thread Pavel Tatashin
On Wed, Jan 20, 2021 at 7:01 AM Wei Liu  wrote:
>
> There is no VMBus and the other infrastructures initialized in
> hv_acpi_init when Linux is running as the root partition.
>
> Signed-off-by: Wei Liu 
> ---
> v3: Return 0 instead of -ENODEV.
> ---
>  drivers/hv/vmbus_drv.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 502f8cd95f6d..ee27b3670a51 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -2620,6 +2620,9 @@ static int __init hv_acpi_init(void)
> if (!hv_is_hyperv_initialized())
> return -ENODEV;
>
> +   if (hv_root_partition)
> +   return 0;
> +

Reviewed-by: Pavel Tatashin 


Re: [PATCH v5 06/16] x86/hyperv: allocate output arg pages if required

2021-01-20 Thread kernel test robot
Hi Wei,

I love your patch! Perhaps something to improve:

[auto build test WARNING on e71ba9452f0b5b2e8dc8aa5445198cd9214a6a62]

url:
https://github.com/0day-ci/linux/commits/Wei-Liu/Introducing-Linux-root-partition-support-for-Microsoft-Hypervisor/20210120-215640
base:e71ba9452f0b5b2e8dc8aa5445198cd9214a6a62
config: x86_64-randconfig-s021-20210120 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.3-208-g46a52ca4-dirty
# 
https://github.com/0day-ci/linux/commit/f93337fc44e13a1506633f5d308bf74a8311dada
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Wei-Liu/Introducing-Linux-root-partition-support-for-Microsoft-Hypervisor/20210120-215640
git checkout f93337fc44e13a1506633f5d308bf74a8311dada
# save the attached .config to linux build tree
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 


"sparse warnings: (new ones prefixed by >>)"
   arch/x86/hyperv/hv_init.c:84:30: sparse: sparse: incorrect type in 
initializer (different address spaces) @@ expected void const [noderef] 
__percpu *__vpp_verify @@ got void [noderef] __percpu ** @@
   arch/x86/hyperv/hv_init.c:84:30: sparse: expected void const [noderef] 
__percpu *__vpp_verify
   arch/x86/hyperv/hv_init.c:84:30: sparse: got void [noderef] __percpu **
   arch/x86/hyperv/hv_init.c:89:39: sparse: sparse: incorrect type in 
initializer (different address spaces) @@ expected void const [noderef] 
__percpu *__vpp_verify @@ got void [noderef] __percpu ** @@
   arch/x86/hyperv/hv_init.c:89:39: sparse: expected void const [noderef] 
__percpu *__vpp_verify
   arch/x86/hyperv/hv_init.c:89:39: sparse: got void [noderef] __percpu **
   arch/x86/hyperv/hv_init.c:221:30: sparse: sparse: incorrect type in 
initializer (different address spaces) @@ expected void const [noderef] 
__percpu *__vpp_verify @@ got void [noderef] __percpu ** @@
   arch/x86/hyperv/hv_init.c:221:30: sparse: expected void const [noderef] 
__percpu *__vpp_verify
   arch/x86/hyperv/hv_init.c:221:30: sparse: got void [noderef] __percpu **
   arch/x86/hyperv/hv_init.c:228:39: sparse: sparse: incorrect type in 
initializer (different address spaces) @@ expected void const [noderef] 
__percpu *__vpp_verify @@ got void [noderef] __percpu ** @@
   arch/x86/hyperv/hv_init.c:228:39: sparse: expected void const [noderef] 
__percpu *__vpp_verify
   arch/x86/hyperv/hv_init.c:228:39: sparse: got void [noderef] __percpu **
   arch/x86/hyperv/hv_init.c:364:31: sparse: sparse: incorrect type in 
assignment (different address spaces) @@ expected void [noderef] __percpu 
**extern [addressable] [toplevel] hyperv_pcpu_input_arg @@ got void 
*[noderef] __percpu * @@
   arch/x86/hyperv/hv_init.c:364:31: sparse: expected void [noderef] 
__percpu **extern [addressable] [toplevel] hyperv_pcpu_input_arg
   arch/x86/hyperv/hv_init.c:364:31: sparse: got void *[noderef] __percpu *
>> arch/x86/hyperv/hv_init.c:370:40: sparse: sparse: incorrect type in 
>> assignment (different address spaces) @@ expected void [noderef] 
>> __percpu **extern [addressable] [toplevel] hyperv_pcpu_output_arg @@ got 
>> void *[noderef] __percpu * @@
   arch/x86/hyperv/hv_init.c:370:40: sparse: expected void [noderef] 
__percpu **extern [addressable] [toplevel] hyperv_pcpu_output_arg
   arch/x86/hyperv/hv_init.c:370:40: sparse: got void *[noderef] __percpu *

vim +370 arch/x86/hyperv/hv_init.c

   211  
   212  static int hv_cpu_die(unsigned int cpu)
   213  {
   214  struct hv_reenlightenment_control re_ctrl;
   215  unsigned int new_cpu;
   216  unsigned long flags;
   217  void **input_arg;
   218  void *pg;
   219  
   220  local_irq_save(flags);
 > 221  input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
   222  pg = *input_arg;
   223  *input_arg = NULL;
   224  
   225  if (hv_root_partition) {
   226  void **output_arg;
   227  
   228  output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
   229  *output_arg = NULL;
   230  }
   231  
   232  local_irq_restore(flags);
   233  
   234  free_pages((unsigned long)pg, hv_root_partition ? 1 : 0);
   235  
   236  if (hv_vp_assist_page && hv_vp_assist_page[cpu])
   237  wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
   238  
   239  if (hv_reenlightenment_cb == NULL)
   240  return 0;
   241  
   242  rdmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
   243  if (re_ctrl.target_vp == hv_vp_index[cpu])

Re: [PATCH v2] vdpa_sim: use iova module to allocate IOVA addresses

2021-01-20 Thread Stefano Garzarella

Hi Michael,
I'm restarting the work on the vdpa-blk simulator, and this patch is
needed to have it working properly.


Do you plan to queue this patch or would you prefer that I include it in 
my next vdpa-blk-sim series?


Thanks,
Stefano

On Wed, Dec 23, 2020 at 10:06:08AM +0100, Stefano Garzarella wrote:

The identical mapping used until now created issues when mapping
different virtual pages with the same physical address.
To solve this issue, we can use the iova module to handle the IOVA
allocation.
For simplicity, we use an IOVA allocator with byte granularity.

We add two new functions, vdpasim_map_range() and vdpasim_unmap_range(),
to handle the IOVA allocation and the registration into the IOMMU/IOTLB.
These functions are used by dma_map_ops callbacks.

Acked-by: Jason Wang 
Signed-off-by: Stefano Garzarella 
---
v2:
- used ULONG_MAX instead of ~0UL [Jason]
- fixed typos in comment and patch description [Jason]
---
drivers/vdpa/vdpa_sim/vdpa_sim.h |   2 +
drivers/vdpa/vdpa_sim/vdpa_sim.c | 108 +++
drivers/vdpa/Kconfig |   1 +
3 files changed, 69 insertions(+), 42 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
index b02142293d5b..6efe205e583e 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
@@ -6,6 +6,7 @@
#ifndef _VDPA_SIM_H
#define _VDPA_SIM_H

+#include 
#include 
#include 
#include 
@@ -55,6 +56,7 @@ struct vdpasim {
/* virtio config according to device type */
void *config;
struct vhost_iotlb *iommu;
+   struct iova_domain iova;
void *buffer;
u32 status;
u32 generation;
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index b3fcc67bfdf0..edc930719fb8 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -17,6 +17,7 @@
#include 
#include 
#include 
+#include 

#include "vdpa_sim.h"

@@ -128,30 +129,57 @@ static int dir_to_perm(enum dma_data_direction dir)
return perm;
}

+static dma_addr_t vdpasim_map_range(struct vdpasim *vdpasim, phys_addr_t paddr,
+   size_t size, unsigned int perm)
+{
+   struct iova *iova;
+   dma_addr_t dma_addr;
+   int ret;
+
+   /* We set the limit_pfn to the maximum (ULONG_MAX - 1) */
+   iova = alloc_iova(&vdpasim->iova, size, ULONG_MAX - 1, true);
+   if (!iova)
+   return DMA_MAPPING_ERROR;
+
+   dma_addr = iova_dma_addr(&vdpasim->iova, iova);
+
+   spin_lock(&vdpasim->iommu_lock);
+   ret = vhost_iotlb_add_range(vdpasim->iommu, (u64)dma_addr,
+   (u64)dma_addr + size - 1, (u64)paddr, perm);
+   spin_unlock(&vdpasim->iommu_lock);
+
+   if (ret) {
+   __free_iova(&vdpasim->iova, iova);
+   return DMA_MAPPING_ERROR;
+   }
+
+   return dma_addr;
+}
+
+static void vdpasim_unmap_range(struct vdpasim *vdpasim, dma_addr_t dma_addr,
+   size_t size)
+{
+   spin_lock(&vdpasim->iommu_lock);
+   vhost_iotlb_del_range(vdpasim->iommu, (u64)dma_addr,
+ (u64)dma_addr + size - 1);
+   spin_unlock(&vdpasim->iommu_lock);
+
+   free_iova(&vdpasim->iova, iova_pfn(&vdpasim->iova, dma_addr));
+}
+
static dma_addr_t vdpasim_map_page(struct device *dev, struct page *page,
   unsigned long offset, size_t size,
   enum dma_data_direction dir,
   unsigned long attrs)
{
struct vdpasim *vdpasim = dev_to_sim(dev);
-   struct vhost_iotlb *iommu = vdpasim->iommu;
-   u64 pa = (page_to_pfn(page) << PAGE_SHIFT) + offset;
-   int ret, perm = dir_to_perm(dir);
+   phys_addr_t paddr = page_to_phys(page) + offset;
+   int perm = dir_to_perm(dir);

if (perm < 0)
return DMA_MAPPING_ERROR;

-   /* For simplicity, use identical mapping to avoid e.g iova
-* allocator.
-*/
-   spin_lock(&vdpasim->iommu_lock);
-   ret = vhost_iotlb_add_range(iommu, pa, pa + size - 1,
-   pa, dir_to_perm(dir));
-   spin_unlock(&vdpasim->iommu_lock);
-   if (ret)
-   return DMA_MAPPING_ERROR;
-
-   return (dma_addr_t)(pa);
+   return vdpasim_map_range(vdpasim, paddr, size, perm);
}

static void vdpasim_unmap_page(struct device *dev, dma_addr_t dma_addr,
@@ -159,12 +187,8 @@ static void vdpasim_unmap_page(struct device *dev, dma_addr_t dma_addr,
   unsigned long attrs)
{
struct vdpasim *vdpasim = dev_to_sim(dev);
-   struct vhost_iotlb *iommu = vdpasim->iommu;

-   spin_lock(&vdpasim->iommu_lock);
-   vhost_iotlb_del_range(iommu, (u64)dma_addr,
- (u64)dma_addr + size - 1);
-   spin_unlock(&vdpasim->iommu_lock);
+   vdpasim_unmap_range(vdpasim, dma_add

[PATCH v4 14/15] x86/paravirt: switch functions with custom code to ALTERNATIVE

2021-01-20 Thread Juergen Gross via Virtualization
Instead of using paravirt patching for custom code sequences, use
ALTERNATIVE for the functions with custom code replacements.

Instead of patching an ud2 instruction for unpopulated vector entries
into the caller site, use a simple function just calling BUG() as a
replacement.

Simplify the register defines for assembler paravirt calling, as there
isn't much usage left.

Signed-off-by: Juergen Gross 
---
V4:
- fixed SAVE_FLAGS() (kernel test robot)
- added assembler paravirt cleanup
---
 arch/x86/entry/entry_64.S |  2 +-
 arch/x86/include/asm/irqflags.h   |  2 +-
 arch/x86/include/asm/paravirt.h   | 99 +--
 arch/x86/include/asm/paravirt_types.h |  6 --
 arch/x86/kernel/paravirt.c| 16 ++---
 arch/x86/kernel/paravirt_patch.c  | 88 
 6 files changed, 56 insertions(+), 157 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index ce0464d630a2..714af508fe30 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -305,7 +305,7 @@ SYM_CODE_END(ret_from_fork)
 .macro DEBUG_ENTRY_ASSERT_IRQS_OFF
 #ifdef CONFIG_DEBUG_ENTRY
pushq %rax
-   SAVE_FLAGS(CLBR_RAX)
+   SAVE_FLAGS
testl $X86_EFLAGS_IF, %eax
jz .Lokay_\@
ud2
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index a0efbcd24b86..c5ce9845c999 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -111,7 +111,7 @@ static __always_inline unsigned long arch_local_irq_save(void)
 
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_DEBUG_ENTRY
-#define SAVE_FLAGS(x)  pushfq; popq %rax
+#define SAVE_FLAGS pushfq; popq %rax
 #endif
 
 #define INTERRUPT_RETURN   jmp native_iret
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 36cd71fa097f..04b3067f31b5 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -137,7 +137,8 @@ static inline void write_cr0(unsigned long x)
 
 static inline unsigned long read_cr2(void)
 {
-   return PVOP_CALLEE0(unsigned long, mmu.read_cr2);
+   return PVOP_ALT_CALLEE0(unsigned long, mmu.read_cr2,
+   "mov %%cr2, %%rax;", ~X86_FEATURE_XENPV);
 }
 
 static inline void write_cr2(unsigned long x)
@@ -147,12 +148,14 @@ static inline void write_cr2(unsigned long x)
 
 static inline unsigned long __read_cr3(void)
 {
-   return PVOP_CALL0(unsigned long, mmu.read_cr3);
+   return PVOP_ALT_CALL0(unsigned long, mmu.read_cr3,
+ "mov %%cr3, %%rax;", ~X86_FEATURE_XENPV);
 }
 
 static inline void write_cr3(unsigned long x)
 {
-   PVOP_VCALL1(mmu.write_cr3, x);
+   PVOP_ALT_VCALL1(mmu.write_cr3, x,
+   "mov %%rdi, %%cr3", ~X86_FEATURE_XENPV);
 }
 
 static inline void __write_cr4(unsigned long x)
@@ -172,7 +175,7 @@ static inline void halt(void)
 
 static inline void wbinvd(void)
 {
-   PVOP_VCALL0(cpu.wbinvd);
+   PVOP_ALT_VCALL0(cpu.wbinvd, "wbinvd", ~X86_FEATURE_XENPV);
 }
 
 static inline u64 paravirt_read_msr(unsigned msr)
@@ -386,22 +389,28 @@ static inline void paravirt_release_p4d(unsigned long pfn)
 
 static inline pte_t __pte(pteval_t val)
 {
-   return (pte_t) { PVOP_CALLEE1(pteval_t, mmu.make_pte, val) };
+   return (pte_t) { PVOP_ALT_CALLEE1(pteval_t, mmu.make_pte, val,
+ "mov %%rdi, %%rax",
+ ~X86_FEATURE_XENPV) };
 }
 
 static inline pteval_t pte_val(pte_t pte)
 {
-   return PVOP_CALLEE1(pteval_t, mmu.pte_val, pte.pte);
+   return PVOP_ALT_CALLEE1(pteval_t, mmu.pte_val, pte.pte,
+   "mov %%rdi, %%rax", ~X86_FEATURE_XENPV);
 }
 
 static inline pgd_t __pgd(pgdval_t val)
 {
-   return (pgd_t) { PVOP_CALLEE1(pgdval_t, mmu.make_pgd, val) };
+   return (pgd_t) { PVOP_ALT_CALLEE1(pgdval_t, mmu.make_pgd, val,
+ "mov %%rdi, %%rax",
+ ~X86_FEATURE_XENPV) };
 }
 
 static inline pgdval_t pgd_val(pgd_t pgd)
 {
-   return PVOP_CALLEE1(pgdval_t, mmu.pgd_val, pgd.pgd);
+   return PVOP_ALT_CALLEE1(pgdval_t, mmu.pgd_val, pgd.pgd,
+   "mov %%rdi, %%rax", ~X86_FEATURE_XENPV);
 }
 
 #define  __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
@@ -434,12 +443,15 @@ static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
 
 static inline pmd_t __pmd(pmdval_t val)
 {
-   return (pmd_t) { PVOP_CALLEE1(pmdval_t, mmu.make_pmd, val) };
+   return (pmd_t) { PVOP_ALT_CALLEE1(pmdval_t, mmu.make_pmd, val,
+ "mov %%rdi, %%rax",
+ ~X86_FEATURE_XENPV) };
 }
 
 static inline pmdval_t pmd_val(pmd_t pmd)
 {
-   return PVOP_CALLEE1(pmdval_t, mmu.pmd_val, pmd.pmd);
+   return PVOP_ALT_CALLEE1(pmdval_t, mmu.pmd_val, pmd.pmd,
+  

[PATCH v4 10/15] x86/paravirt: remove no longer needed 32-bit pvops cruft

2021-01-20 Thread Juergen Gross via Virtualization
PVOP_VCALL4() is only used for Xen PV, while PVOP_CALL4() isn't used
at all. Keep PVOP_CALL4() for 64 bits due to symmetry reasons.

This allows removing the 32-bit definitions of those macros, leading
to a substantial simplification of the paravirt macros, as those were
the only ones needing non-empty "pre" and "post" parameters.

PVOP_CALLEE2() and PVOP_VCALLEE2() are used nowhere, so remove them.

Another no longer needed case is special handling of return types
larger than unsigned long. Replace that with a BUILD_BUG_ON().

DISABLE_INTERRUPTS() is used in 32-bit code only, so it can just be
replaced by cli.

INTERRUPT_RETURN in 32-bit code can be replaced by iret.

ENABLE_INTERRUPTS is used nowhere, so it can be removed.

Signed-off-by: Juergen Gross 
---
 arch/x86/entry/entry_32.S |   4 +-
 arch/x86/include/asm/irqflags.h   |   5 --
 arch/x86/include/asm/paravirt.h   |  35 +---
 arch/x86/include/asm/paravirt_types.h | 112 --
 arch/x86/kernel/asm-offsets.c |   2 -
 5 files changed, 35 insertions(+), 123 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index df8c017e6161..765487e57d6e 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -430,7 +430,7 @@
 * will soon execute iret and the tracer was already set to
 * the irqstate after the IRET:
 */
-   DISABLE_INTERRUPTS(CLBR_ANY)
+   cli
lss (%esp), %esp/* switch to espfix segment */
 .Lend_\@:
 #endif /* CONFIG_X86_ESPFIX32 */
@@ -1077,7 +1077,7 @@ restore_all_switch_stack:
 * when returning from IPI handler and when returning from
 * scheduler to user-space.
 */
-   INTERRUPT_RETURN
+   iret
 
 .section .fixup, "ax"
 SYM_CODE_START(asm_iret_error)
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 144d70ea4393..a0efbcd24b86 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -109,9 +109,6 @@ static __always_inline unsigned long arch_local_irq_save(void)
 }
 #else
 
-#define ENABLE_INTERRUPTS(x)   sti
-#define DISABLE_INTERRUPTS(x)  cli
-
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_DEBUG_ENTRY
 #define SAVE_FLAGS(x)  pushfq; popq %rax
@@ -119,8 +116,6 @@ static __always_inline unsigned long arch_local_irq_save(void)
 
 #define INTERRUPT_RETURN   jmp native_iret
 
-#else
-#define INTERRUPT_RETURN   iret
 #endif
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 8c354099d9c3..c6496a82fad1 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -721,6 +721,7 @@ extern void default_banner(void);
.if ((~(set)) & mask); pop %reg; .endif
 
 #ifdef CONFIG_X86_64
+#ifdef CONFIG_PARAVIRT_XXL
 
 #define PV_SAVE_REGS(set)  \
COND_PUSH(set, CLBR_RAX, rax);  \
@@ -746,46 +747,12 @@ extern void default_banner(void);
 #define PARA_PATCH(off)((off) / 8)
 #define PARA_SITE(ptype, ops)  _PVSITE(ptype, ops, .quad, 8)
 #define PARA_INDIRECT(addr)*addr(%rip)
-#else
-#define PV_SAVE_REGS(set)  \
-   COND_PUSH(set, CLBR_EAX, eax);  \
-   COND_PUSH(set, CLBR_EDI, edi);  \
-   COND_PUSH(set, CLBR_ECX, ecx);  \
-   COND_PUSH(set, CLBR_EDX, edx)
-#define PV_RESTORE_REGS(set)   \
-   COND_POP(set, CLBR_EDX, edx);   \
-   COND_POP(set, CLBR_ECX, ecx);   \
-   COND_POP(set, CLBR_EDI, edi);   \
-   COND_POP(set, CLBR_EAX, eax)
-
-#define PARA_PATCH(off)((off) / 4)
-#define PARA_SITE(ptype, ops)  _PVSITE(ptype, ops, .long, 4)
-#define PARA_INDIRECT(addr)*%cs:addr
-#endif
 
-#ifdef CONFIG_PARAVIRT_XXL
 #define INTERRUPT_RETURN   \
PARA_SITE(PARA_PATCH(PV_CPU_iret),  \
  ANNOTATE_RETPOLINE_SAFE;  \
  jmp PARA_INDIRECT(pv_ops+PV_CPU_iret);)
 
-#define DISABLE_INTERRUPTS(clobbers)   \
-   PARA_SITE(PARA_PATCH(PV_IRQ_irq_disable),   \
- PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE);\
- ANNOTATE_RETPOLINE_SAFE;  \
- call PARA_INDIRECT(pv_ops+PV_IRQ_irq_disable);\
- PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);)
-
-#define ENABLE_INTERRUPTS(clobbers)\
-   PARA_SITE(PARA_PATCH(PV_IRQ_irq_enable),\
- PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE);\
- ANNOTATE_RETPOLINE_SAFE;  \
- call PARA_INDIRECT(pv_ops+PV_IRQ_irq_enable); \
- PV_RESTORE_REGS(clobbers 

[PATCH v4 15/15] x86/paravirt: have only one paravirt patch function

2021-01-20 Thread Juergen Gross via Virtualization
There is no longer any need for different paravirt patch functions
for native and Xen. Eliminate native_patch() and rename
paravirt_patch_default() to paravirt_patch().

Signed-off-by: Juergen Gross 
---
V3:
- remove paravirt_patch_insns() (kernel test robot)
---
 arch/x86/include/asm/paravirt_types.h | 19 +--
 arch/x86/kernel/Makefile  |  3 +--
 arch/x86/kernel/alternative.c |  2 +-
 arch/x86/kernel/paravirt.c| 20 ++--
 arch/x86/kernel/paravirt_patch.c  | 11 ---
 arch/x86/xen/enlighten_pv.c   |  1 -
 6 files changed, 5 insertions(+), 51 deletions(-)
 delete mode 100644 arch/x86/kernel/paravirt_patch.c

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 588ff14ce969..62efbf8bd8f0 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -68,19 +68,6 @@ struct pv_info {
const char *name;
 };
 
-struct pv_init_ops {
-   /*
-* Patch may replace one of the defined code sequences with
-* arbitrary code, subject to the same register constraints.
-* This generally means the code is not free to clobber any
-* registers other than EAX.  The patch function should return
-* the number of bytes of code generated, as we nop pad the
-* rest in generic code.
-*/
-   unsigned (*patch)(u8 type, void *insn_buff,
- unsigned long addr, unsigned len);
-} __no_randomize_layout;
-
 #ifdef CONFIG_PARAVIRT_XXL
 struct pv_lazy_ops {
/* Set deferred update mode, used for batching operations. */
@@ -276,7 +263,6 @@ struct pv_lock_ops {
  * number for each function using the offset which we use to indicate
  * what to patch. */
 struct paravirt_patch_template {
-   struct pv_init_ops  init;
struct pv_cpu_ops   cpu;
struct pv_irq_ops   irq;
struct pv_mmu_ops   mmu;
@@ -317,10 +303,7 @@ extern void (*paravirt_iret)(void);
 /* Simple instruction patching code. */
 #define NATIVE_LABEL(a,x,b) "\n\t.globl " a #x "_" #b "\n" a #x "_" #b ":\n\t"
 
-unsigned paravirt_patch_default(u8 type, void *insn_buff, unsigned long addr, unsigned len);
-unsigned paravirt_patch_insns(void *insn_buff, unsigned len, const char *start, const char *end);
-
-unsigned native_patch(u8 type, void *insn_buff, unsigned long addr, unsigned len);
+unsigned paravirt_patch(u8 type, void *insn_buff, unsigned long addr, unsigned len);
 
 int paravirt_disable_iospace(void);
 
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 5eeb808eb024..853a83503120 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -35,7 +35,6 @@ KASAN_SANITIZE_sev-es.o := n
 KCSAN_SANITIZE := n
 
 OBJECT_FILES_NON_STANDARD_test_nx.o:= y
-OBJECT_FILES_NON_STANDARD_paravirt_patch.o := y
 
 ifdef CONFIG_FRAME_POINTER
 OBJECT_FILES_NON_STANDARD_ftrace_$(BITS).o := y
@@ -122,7 +121,7 @@ obj-$(CONFIG_AMD_NB)+= amd_nb.o
 obj-$(CONFIG_DEBUG_NMI_SELFTEST) += nmi_selftest.o
 
 obj-$(CONFIG_KVM_GUEST)+= kvm.o kvmclock.o
-obj-$(CONFIG_PARAVIRT) += paravirt.o paravirt_patch.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o
 obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o
 obj-$(CONFIG_PARAVIRT_CLOCK)   += pvclock.o
 obj-$(CONFIG_X86_PMEM_LEGACY_DEVICE) += pmem.o
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 221acb2b868a..fb0b83c85de7 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -613,7 +613,7 @@ void __init_or_module apply_paravirt(struct paravirt_patch_site *start,
BUG_ON(p->len > MAX_PATCH_LEN);
/* prep the buffer with the original instructions */
memcpy(insn_buff, p->instr, p->len);
-   used = pv_ops.init.patch(p->type, insn_buff, (unsigned long)p->instr, p->len);
+   used = paravirt_patch(p->type, insn_buff, (unsigned long)p->instr, p->len);
 
BUG_ON(used > p->len);
 
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 082954930809..3d7b989ed6be 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -99,8 +99,8 @@ void __init native_pv_lock_init(void)
static_branch_disable(&virt_spin_lock_key);
 }
 
-unsigned paravirt_patch_default(u8 type, void *insn_buff,
-   unsigned long addr, unsigned len)
+unsigned int paravirt_patch(u8 type, void *insn_buff, unsigned long addr,
+   unsigned int len)
 {
/*
 * Neat trick to map patch type back to the call within the
@@ -121,19 +121,6 @@ unsigned paravirt_patch_default(u8 type, void *insn_buff,
return ret;
 }
 
-unsigned paravirt_patch_insns(void *insn_buff, unsigned len,
-   

[PATCH v4 07/15] x86/paravirt: switch time pvops functions to use static_call()

2021-01-20 Thread Juergen Gross via Virtualization
The time pvops functions are the only ones left which might be
used in 32-bit mode and which return a 64-bit value.

Switch them to use the static_call() mechanism instead of pvops, as
this allows considerable simplification of the pvops implementation.

Signed-off-by: Juergen Gross 
---
V4:
- drop paravirt_time.h again
- don't move Hyper-V code (Michael Kelley)
---
 arch/x86/Kconfig  |  1 +
 arch/x86/include/asm/mshyperv.h   |  2 +-
 arch/x86/include/asm/paravirt.h   | 17 ++---
 arch/x86/include/asm/paravirt_types.h |  6 --
 arch/x86/kernel/cpu/vmware.c  |  5 +++--
 arch/x86/kernel/kvm.c |  2 +-
 arch/x86/kernel/kvmclock.c|  2 +-
 arch/x86/kernel/paravirt.c| 16 
 arch/x86/kernel/tsc.c |  2 +-
 arch/x86/xen/time.c   | 11 ---
 drivers/clocksource/hyperv_timer.c|  5 +++--
 drivers/xen/time.c|  2 +-
 12 files changed, 42 insertions(+), 29 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 21f851179ff0..7ccd4a80788c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -771,6 +771,7 @@ if HYPERVISOR_GUEST
 
 config PARAVIRT
bool "Enable paravirtualization code"
+   depends on HAVE_STATIC_CALL
help
  This changes the kernel so it can modify itself when it is run
  under a hypervisor, potentially improving performance significantly
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 30f76b966857..b4ee331d29a7 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -63,7 +63,7 @@ typedef int (*hyperv_fill_flush_list_func)(
 static __always_inline void hv_setup_sched_clock(void *sched_clock)
 {
 #ifdef CONFIG_PARAVIRT
-   pv_ops.time.sched_clock = sched_clock;
+   paravirt_set_sched_clock(sched_clock);
 #endif
 }
 
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 4abf110e2243..1e45b46fae84 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -15,11 +15,22 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
-static inline unsigned long long paravirt_sched_clock(void)
+u64 dummy_steal_clock(int cpu);
+u64 dummy_sched_clock(void);
+
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+DECLARE_STATIC_CALL(pv_sched_clock, dummy_sched_clock);
+
+extern bool paravirt_using_native_sched_clock;
+
+void paravirt_set_sched_clock(u64 (*func)(void));
+
+static inline u64 paravirt_sched_clock(void)
 {
-   return PVOP_CALL0(unsigned long long, time.sched_clock);
+   return static_call(pv_sched_clock)();
 }
 
 struct static_key;
@@ -33,7 +44,7 @@ bool pv_is_native_vcpu_is_preempted(void);
 
 static inline u64 paravirt_steal_clock(int cpu)
 {
-   return PVOP_CALL1(u64, time.steal_clock, cpu);
+   return static_call(pv_steal_clock)(cpu);
 }
 
 /* The paravirtualized I/O functions */
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index de87087d3bde..1fff349e4792 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -95,11 +95,6 @@ struct pv_lazy_ops {
 } __no_randomize_layout;
 #endif
 
-struct pv_time_ops {
-   unsigned long long (*sched_clock)(void);
-   unsigned long long (*steal_clock)(int cpu);
-} __no_randomize_layout;
-
 struct pv_cpu_ops {
/* hooks for various privileged instructions */
void (*io_delay)(void);
@@ -291,7 +286,6 @@ struct pv_lock_ops {
  * what to patch. */
 struct paravirt_patch_template {
struct pv_init_ops  init;
-   struct pv_time_ops  time;
struct pv_cpu_ops   cpu;
struct pv_irq_ops   irq;
struct pv_mmu_ops   mmu;
diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index c6ede3b3d302..84fb8e3f3d1b 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -336,11 +337,11 @@ static void __init vmware_paravirt_ops_setup(void)
vmware_cyc2ns_setup();
 
if (vmw_sched_clock)
-   pv_ops.time.sched_clock = vmware_sched_clock;
+   paravirt_set_sched_clock(vmware_sched_clock);
 
if (vmware_is_stealclock_available()) {
has_steal_clock = true;
-   pv_ops.time.steal_clock = vmware_steal_clock;
+   static_call_update(pv_steal_clock, vmware_steal_clock);
 
/* We use reboot notifier only to disable steal clock */
register_reboot_notifier(&vmware_pv_reboot_nb);
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 5e78e01ca3b4..351ba99f6009 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -650,7 +650,7 @@ static void __init kvm_guest_init(void)
 
if (kvm_para_has_feature(KVM_FEATUR

[PATCH v4 13/15] x86/paravirt: add new macros PVOP_ALT* supporting pvops in ALTERNATIVEs

2021-01-20 Thread Juergen Gross via Virtualization
Instead of using paravirt patching for custom code sequences, add
support for using ALTERNATIVE handling combined with paravirt call
patching.

Signed-off-by: Juergen Gross 
---
V3:
- drop PVOP_ALT_VCALL() macro
---
 arch/x86/include/asm/paravirt_types.h | 49 ++-
 1 file changed, 48 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 0afdac83f926..0ed976286d49 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -477,44 +477,91 @@ int paravirt_disable_iospace(void);
ret;\
})
 
+#define PVOP_ALT_CALL(ret, op, alt, cond, clbr, call_clbr, \
+ extra_clbr, ...)  \
+   ({  \
+   PVOP_CALL_ARGS; \
+   PVOP_TEST_NULL(op); \
+   asm volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),   \
+alt, cond) \
+: call_clbr, ASM_CALL_CONSTRAINT   \
+: paravirt_type(op),   \
+  paravirt_clobber(clbr),  \
+  ##__VA_ARGS__\
+: "memory", "cc" extra_clbr);  \
+   ret;\
+   })
+
 #define __PVOP_CALL(rettype, op, ...)  \
PVOP_CALL(PVOP_RETVAL(rettype), op, CLBR_ANY,   \
  PVOP_CALL_CLOBBERS, EXTRA_CLOBBERS, ##__VA_ARGS__)
 
+#define __PVOP_ALT_CALL(rettype, op, alt, cond, ...)   \
+   PVOP_ALT_CALL(PVOP_RETVAL(rettype), op, alt, cond, CLBR_ANY,\
+ PVOP_CALL_CLOBBERS, EXTRA_CLOBBERS,   \
+ ##__VA_ARGS__)
+
 #define __PVOP_CALLEESAVE(rettype, op, ...)\
PVOP_CALL(PVOP_RETVAL(rettype), op.func, CLBR_RET_REG,  \
  PVOP_CALLEE_CLOBBERS, , ##__VA_ARGS__)
 
+#define __PVOP_ALT_CALLEESAVE(rettype, op, alt, cond, ...) \
+   PVOP_ALT_CALL(PVOP_RETVAL(rettype), op.func, alt, cond, \
+ CLBR_RET_REG, PVOP_CALLEE_CLOBBERS, , ##__VA_ARGS__)
+
+
 #define __PVOP_VCALL(op, ...)  \
(void)PVOP_CALL(, op, CLBR_ANY, PVOP_VCALL_CLOBBERS,\
   VEXTRA_CLOBBERS, ##__VA_ARGS__)
 
+#define __PVOP_ALT_VCALL(op, alt, cond, ...)   \
+   (void)PVOP_ALT_CALL(, op, alt, cond, CLBR_ANY,  \
+   PVOP_VCALL_CLOBBERS, VEXTRA_CLOBBERS,   \
+   ##__VA_ARGS__)
+
 #define __PVOP_VCALLEESAVE(op, ...)\
(void)PVOP_CALL(, op.func, CLBR_RET_REG,\
- PVOP_VCALLEE_CLOBBERS, , ##__VA_ARGS__)
+   PVOP_VCALLEE_CLOBBERS, , ##__VA_ARGS__)
 
+#define __PVOP_ALT_VCALLEESAVE(op, alt, cond, ...) \
+   (void)PVOP_ALT_CALL(, op.func, alt, cond, CLBR_RET_REG, \
+   PVOP_VCALLEE_CLOBBERS, , ##__VA_ARGS__)
 
 
 #define PVOP_CALL0(rettype, op)				\
	__PVOP_CALL(rettype, op)
 #define PVOP_VCALL0(op)					\
	__PVOP_VCALL(op)
+#define PVOP_ALT_CALL0(rettype, op, alt, cond) \
+   __PVOP_ALT_CALL(rettype, op, alt, cond)
+#define PVOP_ALT_VCALL0(op, alt, cond) \
+   __PVOP_ALT_VCALL(op, alt, cond)
 
 #define PVOP_CALLEE0(rettype, op)  \
__PVOP_CALLEESAVE(rettype, op)
 #define PVOP_VCALLEE0(op)  \
__PVOP_VCALLEESAVE(op)
+#define PVOP_ALT_CALLEE0(rettype, op, alt, cond)   \
+   __PVOP_ALT_CALLEESAVE(rettype, op, alt, cond)
+#define PVOP_ALT_VCALLEE0(op, alt, cond)   \
+   __PVOP_ALT_VCALLEESAVE(op, alt, cond)
 
 
 #define PVOP_CALL1(rettype, op, arg1)  \
__PVOP_CALL(rettype, op, PVOP_CALL_ARG1(arg1))
 #define PVOP_VCALL1(op, arg1)  \
__PVOP_VCALL(op, PVOP_CALL_ARG1(arg1))
+#define PVOP_ALT_VCALL1(op, arg1, alt, cond)   \
+   __PVOP_ALT_VCALL(op, alt, cond, PVOP_CALL_ARG1(arg1))
 
 #define PVOP_CALLEE1(rettype, op, arg1)
\
 

[PATCH v4 11/15] x86/paravirt: simplify paravirt macros

2021-01-20 Thread Juergen Gross via Virtualization
The central pvops call macros PVOP_CALL() and PVOP_VCALL() now look
very similar.

The main differences are using PVOP_VCALL_ARGS or PVOP_CALL_ARGS, which
are identical, and the return value handling.

So drop PVOP_VCALL_ARGS and instead of PVOP_VCALL() just use
(void)PVOP_CALL(long, ...).

Note that it isn't easily possible to just redefine PVOP_VCALL()
to use PVOP_CALL() instead, as this would require further hiding of
commas in macro parameters.

Signed-off-by: Juergen Gross 
---
V3:
- new patch

V4:
- fix build warnings with clang (kernel test robot)
---
 arch/x86/include/asm/paravirt_types.h | 41 ---
 1 file changed, 12 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 42f9eef84131..45bd21647dd8 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -408,11 +408,9 @@ int paravirt_disable_iospace(void);
  * makes sure the incoming and outgoing types are always correct.
  */
 #ifdef CONFIG_X86_32
-#define PVOP_VCALL_ARGS					\
+#define PVOP_CALL_ARGS \
unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx;
 
-#define PVOP_CALL_ARGS PVOP_VCALL_ARGS
-
 #define PVOP_CALL_ARG1(x)  "a" ((unsigned long)(x))
 #define PVOP_CALL_ARG2(x)  "d" ((unsigned long)(x))
 #define PVOP_CALL_ARG3(x)  "c" ((unsigned long)(x))
@@ -428,12 +426,10 @@ int paravirt_disable_iospace(void);
 #define VEXTRA_CLOBBERS
 #else  /* CONFIG_X86_64 */
 /* [re]ax isn't an arg, but the return val */
-#define PVOP_VCALL_ARGS\
+#define PVOP_CALL_ARGS \
unsigned long __edi = __edi, __esi = __esi, \
__edx = __edx, __ecx = __ecx, __eax = __eax;
 
-#define PVOP_CALL_ARGS PVOP_VCALL_ARGS
-
 #define PVOP_CALL_ARG1(x)  "D" ((unsigned long)(x))
 #define PVOP_CALL_ARG2(x)  "S" ((unsigned long)(x))
 #define PVOP_CALL_ARG3(x)  "d" ((unsigned long)(x))
@@ -458,59 +454,46 @@ int paravirt_disable_iospace(void);
 #define PVOP_TEST_NULL(op) ((void)pv_ops.op)
 #endif
 
-#define PVOP_RETMASK(rettype)  \
+#define PVOP_RETVAL(rettype)   \
({  unsigned long __mask = ~0UL;\
+   BUILD_BUG_ON(sizeof(rettype) > sizeof(unsigned long));  \
switch (sizeof(rettype)) {  \
case 1: __mask =   0xffUL; break;   \
case 2: __mask = 0xUL; break;   \
case 4: __mask = 0xUL; break;   \
default: break; \
}   \
-   __mask; \
+   __mask & __eax; \
})
 
 
-#define PVOP_CALL(rettype, op, clbr, call_clbr, extra_clbr, ...)   \
+#define PVOP_CALL(ret, op, clbr, call_clbr, extra_clbr, ...)   \
({  \
PVOP_CALL_ARGS; \
PVOP_TEST_NULL(op); \
-   BUILD_BUG_ON(sizeof(rettype) > sizeof(unsigned long));  \
asm volatile(paravirt_alt(PARAVIRT_CALL)\
 : call_clbr, ASM_CALL_CONSTRAINT   \
 : paravirt_type(op),   \
   paravirt_clobber(clbr),  \
   ##__VA_ARGS__\
 : "memory", "cc" extra_clbr);  \
-   (rettype)(__eax & PVOP_RETMASK(rettype));   \
+   ret;\
})
 
 #define __PVOP_CALL(rettype, op, ...)  \
-   PVOP_CALL(rettype, op, CLBR_ANY, PVOP_CALL_CLOBBERS,\
- EXTRA_CLOBBERS, ##__VA_ARGS__)
+   PVOP_CALL(PVOP_RETVAL(rettype), op, CLBR_ANY,   \
+ PVOP_CALL_CLOBBERS, EXTRA_CLOBBERS, ##__VA_ARGS__)
 
 #define __PVOP_CALLEESAVE(rettype, op, ...)\
-   PVOP_CALL(rettype, op.func, CLBR_RET_REG,   \
+   PVOP_CALL(PVOP_RETVAL(rettype), op.func, CLBR_RET_REG,  \
  PVOP_CALLEE_CLOBBERS, , ##__VA_ARGS__)
 
-
-#define PVOP_VCALL(op, clbr, call_clbr, extra

[PATCH v4 06/15] x86: rework arch_local_irq_restore() to not use popf

2021-01-20 Thread Juergen Gross via Virtualization
"popf" is a rather expensive operation, so don't use it for restoring
irq flags. Instead test whether interrupts are enabled in the flags
parameter and enable interrupts via "sti" in that case.

This results in the restore_fl paravirt op to be no longer needed.

Suggested-by: Andy Lutomirski 
Signed-off-by: Juergen Gross 
---
 arch/x86/include/asm/irqflags.h   | 20 ++-
 arch/x86/include/asm/paravirt.h   |  5 -
 arch/x86/include/asm/paravirt_types.h |  7 ++-
 arch/x86/kernel/irqflags.S| 11 ---
 arch/x86/kernel/paravirt.c|  1 -
 arch/x86/kernel/paravirt_patch.c  |  3 ---
 arch/x86/xen/enlighten_pv.c   |  2 --
 arch/x86/xen/irq.c| 23 --
 arch/x86/xen/xen-asm.S| 28 ---
 arch/x86/xen/xen-ops.h|  1 -
 10 files changed, 8 insertions(+), 93 deletions(-)

diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index e585a4705b8d..144d70ea4393 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -35,15 +35,6 @@ extern __always_inline unsigned long native_save_fl(void)
return flags;
 }
 
-extern inline void native_restore_fl(unsigned long flags);
-extern inline void native_restore_fl(unsigned long flags)
-{
-   asm volatile("push %0 ; popf"
-: /* no output */
-:"g" (flags)
-:"memory", "cc");
-}
-
 static __always_inline void native_irq_disable(void)
 {
asm volatile("cli": : :"memory");
@@ -79,11 +70,6 @@ static __always_inline unsigned long arch_local_irq_save(void)
return native_save_fl();
 }
 
-static __always_inline void arch_local_irq_restore(unsigned long flags)
-{
-   native_restore_fl(flags);
-}
-
 static __always_inline void arch_local_irq_disable(void)
 {
native_irq_disable();
@@ -152,6 +138,12 @@ static __always_inline int arch_irqs_disabled(void)
 
return arch_irqs_disabled_flags(flags);
 }
+
+static __always_inline void arch_local_irq_restore(unsigned long flags)
+{
+   if (!arch_irqs_disabled_flags(flags))
+   arch_local_irq_enable();
+}
 #else
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_XEN_PV
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index dd43b1100a87..4abf110e2243 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -648,11 +648,6 @@ static inline notrace unsigned long arch_local_save_flags(void)
return PVOP_CALLEE0(unsigned long, irq.save_fl);
 }
 
-static inline notrace void arch_local_irq_restore(unsigned long f)
-{
-   PVOP_VCALLEE1(irq.restore_fl, f);
-}
-
 static inline notrace void arch_local_irq_disable(void)
 {
PVOP_VCALLEE0(irq.irq_disable);
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 0169365f1403..de87087d3bde 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -168,16 +168,13 @@ struct pv_cpu_ops {
 struct pv_irq_ops {
 #ifdef CONFIG_PARAVIRT_XXL
/*
-* Get/set interrupt state.  save_fl and restore_fl are only
-* expected to use X86_EFLAGS_IF; all other bits
-* returned from save_fl are undefined, and may be ignored by
-* restore_fl.
+* Get/set interrupt state.  save_fl is expected to use X86_EFLAGS_IF;
+* all other bits returned from save_fl are undefined.
 *
 * NOTE: These functions callers expect the callee to preserve
 * more registers than the standard C calling convention.
 */
struct paravirt_callee_save save_fl;
-   struct paravirt_callee_save restore_fl;
struct paravirt_callee_save irq_disable;
struct paravirt_callee_save irq_enable;
 
diff --git a/arch/x86/kernel/irqflags.S b/arch/x86/kernel/irqflags.S
index 0db0375235b4..8ef35063964b 100644
--- a/arch/x86/kernel/irqflags.S
+++ b/arch/x86/kernel/irqflags.S
@@ -13,14 +13,3 @@ SYM_FUNC_START(native_save_fl)
ret
 SYM_FUNC_END(native_save_fl)
 EXPORT_SYMBOL(native_save_fl)
-
-/*
- * void native_restore_fl(unsigned long flags)
- * %eax/%rdi: flags
- */
-SYM_FUNC_START(native_restore_fl)
-   push %_ASM_ARG1
-   popf
-   ret
-SYM_FUNC_END(native_restore_fl)
-EXPORT_SYMBOL(native_restore_fl)
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 18560b71e717..c60222ab8ab9 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -320,7 +320,6 @@ struct paravirt_patch_template pv_ops = {
 
/* Irq ops. */
.irq.save_fl= __PV_IS_CALLEE_SAVE(native_save_fl),
-   .irq.restore_fl = __PV_IS_CALLEE_SAVE(native_restore_fl),
.irq.irq_disable= __PV_IS_CALLEE_SAVE(native_irq_disable),
.irq.irq_enable = __PV_IS_CALLEE_SAVE(native_irq_enable),
.irq.safe_halt  = native

[PATCH v4 05/15] x86/xen: drop USERGS_SYSRET64 paravirt call

2021-01-20 Thread Juergen Gross via Virtualization
USERGS_SYSRET64 is used to return from a syscall via sysret, but
a Xen PV guest will nevertheless use the iret hypercall, as there
is no sysret PV hypercall defined.

So instead of testing all the prerequisites for doing a sysret and
then mangling the stack for Xen PV again for doing an iret, just use
the iret exit path from the beginning.

This can easily be done via an ALTERNATIVE, as is already done for
the sysenter compat case.

It should be noted that this drops the optimization in Xen for not
restoring a few registers when returning to user mode, but it seems
as if the saved instructions in the kernel more than compensate for
this drop (a kernel build in a Xen PV guest was slightly faster with
this patch applied).

While at it remove the stale sysret32 remnants.

Signed-off-by: Juergen Gross 
---
V3:
- simplify ALTERNATIVE (Boris Petkov)
---
 arch/x86/entry/entry_64.S | 16 +++-
 arch/x86/include/asm/irqflags.h   |  6 --
 arch/x86/include/asm/paravirt.h   |  5 -
 arch/x86/include/asm/paravirt_types.h |  8 
 arch/x86/kernel/asm-offsets_64.c  |  2 --
 arch/x86/kernel/paravirt.c|  5 +
 arch/x86/kernel/paravirt_patch.c  |  4 
 arch/x86/xen/enlighten_pv.c   |  1 -
 arch/x86/xen/xen-asm.S| 20 
 arch/x86/xen/xen-ops.h|  2 --
 10 files changed, 8 insertions(+), 61 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index a876204a73e0..ce0464d630a2 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -46,14 +46,6 @@
 .code64
 .section .entry.text, "ax"
 
-#ifdef CONFIG_PARAVIRT_XXL
-SYM_CODE_START(native_usergs_sysret64)
-   UNWIND_HINT_EMPTY
-   swapgs
-   sysretq
-SYM_CODE_END(native_usergs_sysret64)
-#endif /* CONFIG_PARAVIRT_XXL */
-
 /*
  * 64-bit SYSCALL instruction entry. Up to 6 arguments in registers.
  *
@@ -123,7 +115,12 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
 * Try to use SYSRET instead of IRET if we're returning to
 * a completely clean 64-bit userspace context.  If we're not,
 * go to the slow exit path.
+* In the Xen PV case we must use iret anyway.
 */
+
+   ALTERNATIVE "", "jmpswapgs_restore_regs_and_return_to_usermode", \
+   X86_FEATURE_XENPV
+
movqRCX(%rsp), %rcx
movqRIP(%rsp), %r11
 
@@ -215,7 +212,8 @@ syscall_return_via_sysret:
 
popq%rdi
popq%rsp
-   USERGS_SYSRET64
+   swapgs
+   sysretq
 SYM_CODE_END(entry_SYSCALL_64)
 
 /*
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 8c86edefa115..e585a4705b8d 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -132,12 +132,6 @@ static __always_inline unsigned long arch_local_irq_save(void)
 #endif
 
 #define INTERRUPT_RETURN   jmp native_iret
-#define USERGS_SYSRET64\
-   swapgs; \
-   sysretq;
-#define USERGS_SYSRET32\
-   swapgs; \
-   sysretl
 
 #else
 #define INTERRUPT_RETURN   iret
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index f2ebe109a37e..dd43b1100a87 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -776,11 +776,6 @@ extern void default_banner(void);
 
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_PARAVIRT_XXL
-#define USERGS_SYSRET64
\
-   PARA_SITE(PARA_PATCH(PV_CPU_usergs_sysret64),   \
- ANNOTATE_RETPOLINE_SAFE;  \
- jmp PARA_INDIRECT(pv_ops+PV_CPU_usergs_sysret64);)
-
 #ifdef CONFIG_DEBUG_ENTRY
 #define SAVE_FLAGS(clobbers)\
PARA_SITE(PARA_PATCH(PV_IRQ_save_fl),   \
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 130f428b0cc8..0169365f1403 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -156,14 +156,6 @@ struct pv_cpu_ops {
 
u64 (*read_pmc)(int counter);
 
-   /*
-* Switch to usermode gs and return to 64-bit usermode using
-* sysret.  Only used in 64-bit kernels to return to 64-bit
-* processes.  Usermode register state, including %rsp, must
-* already be restored.
-*/
-   void (*usergs_sysret64)(void);
-
/* Normal iret.  Jump to this with the standard iret stack
   frame set up. */
void (*iret)(void);
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index 1354bc30614d..b14533af7676 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -13,8 +13,6 @@ int main(voi

[PATCH v4 12/15] x86/paravirt: switch iret pvops to ALTERNATIVE

2021-01-20 Thread Juergen Gross via Virtualization
The iret paravirt op is rather special as it is using a jmp instead
of a call instruction. Switch it to ALTERNATIVE.

Signed-off-by: Juergen Gross 
---
V3:
- use ALTERNATIVE_TERNARY
---
 arch/x86/include/asm/paravirt.h   |  6 +++---
 arch/x86/include/asm/paravirt_types.h |  5 +
 arch/x86/kernel/asm-offsets.c |  5 -
 arch/x86/kernel/paravirt.c| 26 ++
 arch/x86/xen/enlighten_pv.c   |  3 +--
 5 files changed, 7 insertions(+), 38 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index c6496a82fad1..36cd71fa097f 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -749,9 +749,9 @@ extern void default_banner(void);
 #define PARA_INDIRECT(addr)*addr(%rip)
 
 #define INTERRUPT_RETURN   \
-   PARA_SITE(PARA_PATCH(PV_CPU_iret),  \
- ANNOTATE_RETPOLINE_SAFE;  \
- jmp PARA_INDIRECT(pv_ops+PV_CPU_iret);)
+   ANNOTATE_RETPOLINE_SAFE;\
+   ALTERNATIVE_TERNARY("jmp *paravirt_iret(%rip);",\
+   X86_FEATURE_XENPV, "jmp xen_iret;", "jmp native_iret;")
 
 #ifdef CONFIG_DEBUG_ENTRY
 #define SAVE_FLAGS(clobbers)\
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 45bd21647dd8..0afdac83f926 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -151,10 +151,6 @@ struct pv_cpu_ops {
 
u64 (*read_pmc)(int counter);
 
-   /* Normal iret.  Jump to this with the standard iret stack
-  frame set up. */
-   void (*iret)(void);
-
void (*start_context_switch)(struct task_struct *prev);
void (*end_context_switch)(struct task_struct *next);
 #endif
@@ -294,6 +290,7 @@ struct paravirt_patch_template {
 
 extern struct pv_info pv_info;
 extern struct paravirt_patch_template pv_ops;
+extern void (*paravirt_iret)(void);
 
 #define PARAVIRT_PATCH(x)  \
(offsetof(struct paravirt_patch_template, x) / sizeof(void *))
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 736508004b30..ecd3fd6993d1 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -61,11 +61,6 @@ static void __used common(void)
OFFSET(IA32_RT_SIGFRAME_sigcontext, rt_sigframe_ia32, uc.uc_mcontext);
 #endif
 
-#ifdef CONFIG_PARAVIRT_XXL
-   BLANK();
-   OFFSET(PV_CPU_iret, paravirt_patch_template, cpu.iret);
-#endif
-
 #ifdef CONFIG_XEN
BLANK();
OFFSET(XEN_vcpu_info_mask, vcpu_info, evtchn_upcall_mask);
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 44e5b0fe28cb..0553a339d850 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -86,25 +86,6 @@ u64 notrace _paravirt_ident_64(u64 x)
 {
return x;
 }
-
-static unsigned paravirt_patch_jmp(void *insn_buff, const void *target,
-  unsigned long addr, unsigned len)
-{
-   struct branch *b = insn_buff;
-   unsigned long delta = (unsigned long)target - (addr+5);
-
-   if (len < 5) {
-#ifdef CONFIG_RETPOLINE
-   WARN_ONCE(1, "Failing to patch indirect JMP in %ps\n", (void *)addr);
-#endif
-   return len; /* call too long for patch site */
-   }
-
-   b->opcode = 0xe9;   /* jmp */
-   b->delta = delta;
-
-   return 5;
-}
 #endif
 
 DEFINE_STATIC_KEY_TRUE(virt_spin_lock_key);
@@ -136,9 +117,6 @@ unsigned paravirt_patch_default(u8 type, void *insn_buff,
else if (opfunc == _paravirt_ident_64)
ret = paravirt_patch_ident_64(insn_buff, len);
 
-   else if (type == PARAVIRT_PATCH(cpu.iret))
-   /* If operation requires a jmp, then jmp */
-   ret = paravirt_patch_jmp(insn_buff, opfunc, addr, len);
 #endif
else
/* Otherwise call the function. */
@@ -316,8 +294,6 @@ struct paravirt_patch_template pv_ops = {
 
.cpu.load_sp0   = native_load_sp0,
 
-   .cpu.iret   = native_iret,
-
 #ifdef CONFIG_X86_IOPL_IOPERM
.cpu.invalidate_io_bitmap   = native_tss_invalidate_io_bitmap,
.cpu.update_io_bitmap   = native_tss_update_io_bitmap,
@@ -422,6 +398,8 @@ struct paravirt_patch_template pv_ops = {
 NOKPROBE_SYMBOL(native_get_debugreg);
 NOKPROBE_SYMBOL(native_set_debugreg);
 NOKPROBE_SYMBOL(native_load_idt);
+
+void (*paravirt_iret)(void) = native_iret;
 #endif
 
 EXPORT_SYMBOL(pv_ops);
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 32b295cc2716..4716383c64a9 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1057,8 +1057,6 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst =

[PATCH v4 00/15] x86: major paravirt cleanup

2021-01-20 Thread Juergen Gross via Virtualization
[Resend due to all the Cc:'s missing]

This is a major cleanup of the paravirt infrastructure aiming at
eliminating all custom code patching via paravirt patching.

This is achieved by using ALTERNATIVE instead, leading to the ability
to give objtool access to the patched-in instructions.

In order to remove most of the 32-bit special handling from pvops, the
time-related operations are switched to use static_call() instead.

At the end of this series all paravirt patching has to do is to
replace indirect calls with direct ones. In a further step this could
be switched to static_call(), too, but that would require a major
header file disentangling.

For a clean build without any objtool warnings a modified objtool is
required. Currently this is available in the "tip" tree in the
objtool/core branch.

Changes in V4:
- fixed several build failures
- removed objtool patch, as objtool patches are in tip now
- added patch 1 for making usage of static_call easier
- even more cleanup

Changes in V3:
- added patches 7 and 12
- addressed all comments

Changes in V2:
- added patches 5-12

Juergen Gross (14):
  x86/xen: use specific Xen pv interrupt entry for MCE
  x86/xen: use specific Xen pv interrupt entry for DF
  x86/pv: switch SWAPGS to ALTERNATIVE
  x86/xen: drop USERGS_SYSRET64 paravirt call
  x86: rework arch_local_irq_restore() to not use popf
  x86/paravirt: switch time pvops functions to use static_call()
  x86/alternative: support "not feature" and ALTERNATIVE_TERNARY
  x86: add new features for paravirt patching
  x86/paravirt: remove no longer needed 32-bit pvops cruft
  x86/paravirt: simplify paravirt macros
  x86/paravirt: switch iret pvops to ALTERNATIVE
  x86/paravirt: add new macros PVOP_ALT* supporting pvops in
ALTERNATIVEs
  x86/paravirt: switch functions with custom code to ALTERNATIVE
  x86/paravirt: have only one paravirt patch function

Peter Zijlstra (1):
  static_call: Pull some static_call declarations to the type headers

 arch/x86/Kconfig|   1 +
 arch/x86/entry/entry_32.S   |   4 +-
 arch/x86/entry/entry_64.S   |  28 ++-
 arch/x86/include/asm/alternative-asm.h  |   4 +
 arch/x86/include/asm/alternative.h  |   7 +
 arch/x86/include/asm/cpufeatures.h  |   2 +
 arch/x86/include/asm/idtentry.h |   6 +
 arch/x86/include/asm/irqflags.h |  53 ++
 arch/x86/include/asm/mshyperv.h |   2 +-
 arch/x86/include/asm/paravirt.h | 197 
 arch/x86/include/asm/paravirt_types.h   | 227 +---
 arch/x86/kernel/Makefile|   3 +-
 arch/x86/kernel/alternative.c   |  49 -
 arch/x86/kernel/asm-offsets.c   |   7 -
 arch/x86/kernel/asm-offsets_64.c|   3 -
 arch/x86/kernel/cpu/vmware.c|   5 +-
 arch/x86/kernel/irqflags.S  |  11 --
 arch/x86/kernel/kvm.c   |   2 +-
 arch/x86/kernel/kvmclock.c  |   2 +-
 arch/x86/kernel/paravirt-spinlocks.c|   9 +
 arch/x86/kernel/paravirt.c  |  83 +++--
 arch/x86/kernel/paravirt_patch.c| 109 
 arch/x86/kernel/tsc.c   |   2 +-
 arch/x86/xen/enlighten_pv.c |  36 ++--
 arch/x86/xen/irq.c  |  23 ---
 arch/x86/xen/time.c |  11 +-
 arch/x86/xen/xen-asm.S  |  52 +-
 arch/x86/xen/xen-ops.h  |   3 -
 drivers/clocksource/hyperv_timer.c  |   5 +-
 drivers/xen/time.c  |   2 +-
 include/linux/static_call.h |  20 ---
 include/linux/static_call_types.h   |  27 +++
 tools/include/linux/static_call_types.h |  27 +++
 33 files changed, 376 insertions(+), 646 deletions(-)
 delete mode 100644 arch/x86/kernel/paravirt_patch.c

-- 
2.26.2



[PATCH v4 04/15] x86/pv: switch SWAPGS to ALTERNATIVE

2021-01-20 Thread Juergen Gross via Virtualization
SWAPGS is used only for interrupts coming from user mode or for
returning to user mode. So there is no reason to use the PARAVIRT
framework, as it can easily be replaced by an ALTERNATIVE depending
on X86_FEATURE_XENPV.

There are several instances using the PV-aware SWAPGS macro in paths
which are never executed in a Xen PV guest. Replace those with the
plain swapgs instruction. For SWAPGS_UNSAFE_STACK the same applies.

Signed-off-by: Juergen Gross 
Acked-by: Andy Lutomirski 
Acked-by: Peter Zijlstra (Intel) 
Reviewed-by: Borislav Petkov 
Reviewed-by: Thomas Gleixner 
---
 arch/x86/entry/entry_64.S | 10 +-
 arch/x86/include/asm/irqflags.h   | 20 
 arch/x86/include/asm/paravirt.h   | 20 
 arch/x86/include/asm/paravirt_types.h |  2 --
 arch/x86/kernel/asm-offsets_64.c  |  1 -
 arch/x86/kernel/paravirt.c|  1 -
 arch/x86/kernel/paravirt_patch.c  |  3 ---
 arch/x86/xen/enlighten_pv.c   |  3 ---
 8 files changed, 13 insertions(+), 47 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index cad08703c4ad..a876204a73e0 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -669,7 +669,7 @@ native_irq_return_ldt:
 */
 
pushq   %rdi/* Stash user RDI */
-   SWAPGS  /* to kernel GS */
+   swapgs  /* to kernel GS */
SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi   /* to kernel CR3 */
 
movqPER_CPU_VAR(espfix_waddr), %rdi
@@ -699,7 +699,7 @@ native_irq_return_ldt:
orq PER_CPU_VAR(espfix_stack), %rax
 
SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
-   SWAPGS  /* to user GS */
+   swapgs  /* to user GS */
popq%rdi/* Restore user RDI */
 
movq%rax, %rsp
@@ -943,7 +943,7 @@ SYM_CODE_START_LOCAL(paranoid_entry)
ret
 
 .Lparanoid_entry_swapgs:
-   SWAPGS
+   swapgs
 
/*
 * The above SAVE_AND_SWITCH_TO_KERNEL_CR3 macro doesn't do an
@@ -1001,7 +1001,7 @@ SYM_CODE_START_LOCAL(paranoid_exit)
jnz restore_regs_and_return_to_kernel
 
/* We are returning to a context with user GSBASE */
-   SWAPGS_UNSAFE_STACK
+   swapgs
jmp restore_regs_and_return_to_kernel
 SYM_CODE_END(paranoid_exit)
 
@@ -1426,7 +1426,7 @@ nmi_no_fsgsbase:
jnz nmi_restore
 
 nmi_swapgs:
-   SWAPGS_UNSAFE_STACK
+   swapgs
 
 nmi_restore:
POP_REGS
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 2dfc8d380dab..8c86edefa115 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -131,18 +131,6 @@ static __always_inline unsigned long arch_local_irq_save(void)
 #define SAVE_FLAGS(x)  pushfq; popq %rax
 #endif
 
-#define SWAPGS swapgs
-/*
- * Currently paravirt can't handle swapgs nicely when we
- * don't have a stack we can rely on (such as a user space
- * stack).  So we either find a way around these or just fault
- * and emulate if a guest tries to call swapgs directly.
- *
- * Either way, this is a good way to document that we don't
- * have a reliable stack. x86_64 only.
- */
-#define SWAPGS_UNSAFE_STACKswapgs
-
 #define INTERRUPT_RETURN   jmp native_iret
 #define USERGS_SYSRET64\
swapgs; \
@@ -170,6 +158,14 @@ static __always_inline int arch_irqs_disabled(void)
 
return arch_irqs_disabled_flags(flags);
 }
+#else
+#ifdef CONFIG_X86_64
+#ifdef CONFIG_XEN_PV
+#define SWAPGS ALTERNATIVE "swapgs", "", X86_FEATURE_XENPV
+#else
+#define SWAPGS swapgs
+#endif
+#endif
 #endif /* !__ASSEMBLY__ */
 
 #endif
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index f8dce11d2bc1..f2ebe109a37e 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -776,26 +776,6 @@ extern void default_banner(void);
 
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_PARAVIRT_XXL
-/*
- * If swapgs is used while the userspace stack is still current,
- * there's no way to call a pvop.  The PV replacement *must* be
- * inlined, or the swapgs instruction must be trapped and emulated.
- */
-#define SWAPGS_UNSAFE_STACK\
-   PARA_SITE(PARA_PATCH(PV_CPU_swapgs), swapgs)
-
-/*
- * Note: swapgs is very special, and in practise is either going to be
- * implemented with a single "swapgs" instruction or something very
- * special.  Either way, we don't need to save any registers for
- * it.
- */
-#define SWAPGS \
-   PARA_SITE(PARA_PATCH(PV_CPU_swapgs),\
- ANNOTATE_RETPOLINE_SAFE;  \
- call

[PATCH v4 09/15] x86: add new features for paravirt patching

2021-01-20 Thread Juergen Gross via Virtualization
To be able to switch paravirt patching from special-cased custom
code sequences to ALTERNATIVE handling, some new X86_FEATURE_* flags
are needed. This makes it possible to have the standard indirect pv
call as the default code and to patch it with the non-Xen custom code
sequence via ALTERNATIVE patching later.

Make sure paravirt patching is performed before alternative patching.

Signed-off-by: Juergen Gross 
---
V3:
- add comment (Boris Petkov)
- no negative features (Boris Petkov)

V4:
- move paravirt_set_cap() to paravirt-spinlocks.c
---
 arch/x86/include/asm/cpufeatures.h   |  2 ++
 arch/x86/include/asm/paravirt.h  | 10 ++
 arch/x86/kernel/alternative.c| 30 ++--
 arch/x86/kernel/paravirt-spinlocks.c |  9 +
 4 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 84b887825f12..3ae8944b253a 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -238,6 +238,8 @@
#define X86_FEATURE_VMW_VMMCALL( 8*32+19) /* "" VMware prefers VMMCALL hypercall instruction */
#define X86_FEATURE_SEV_ES ( 8*32+20) /* AMD Secure Encrypted Virtualization - Encrypted State */
#define X86_FEATURE_VM_PAGE_FLUSH  ( 8*32+21) /* "" VM Page Flush MSR is supported */
+#define X86_FEATURE_PVUNLOCK   ( 8*32+22) /* "" PV unlock function */
+#define X86_FEATURE_VCPUPREEMPT( 8*32+23) /* "" PV vcpu_is_preempted function */
 
 /* Intel-defined CPU features, CPUID level 0x0007:0 (EBX), word 9 */
#define X86_FEATURE_FSGSBASE   ( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 1e45b46fae84..8c354099d9c3 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -47,6 +47,10 @@ static inline u64 paravirt_steal_clock(int cpu)
return static_call(pv_steal_clock)(cpu);
 }
 
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+void __init paravirt_set_cap(void);
+#endif
+
 /* The paravirtualized I/O functions */
 static inline void slow_down_io(void)
 {
@@ -811,5 +815,11 @@ static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
 {
 }
 #endif
+
+#ifndef CONFIG_PARAVIRT_SPINLOCKS
+static inline void paravirt_set_cap(void)
+{
+}
+#endif
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_X86_PARAVIRT_H */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 0a904fb2678b..221acb2b868a 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 int __read_mostly alternatives_patched;
 
@@ -730,6 +731,33 @@ void __init alternative_instructions(void)
 * patching.
 */
 
+   /*
+* Paravirt patching and alternative patching can be combined to
+* replace a function call with a short direct code sequence (e.g.
+* by setting a constant return value instead of doing that in an
+* external function).
+* In order to make this work the following sequence is required:
+* 1. set (artificial) features depending on used paravirt
+*functions which can later influence alternative patching
+* 2. apply paravirt patching (generally replacing an indirect
+*function call with a direct one)
+* 3. apply alternative patching (e.g. replacing a direct function
+*call with a custom code sequence)
+* Doing paravirt patching after alternative patching would clobber
+* the optimization of the custom code with a function call again.
+*/
+   paravirt_set_cap();
+
+   /*
+* First patch paravirt functions, such that we overwrite the indirect
+* call with the direct call.
+*/
+   apply_paravirt(__parainstructions, __parainstructions_end);
+
+   /*
+* Then patch alternatives, such that those paravirt calls that are in
+* alternatives can be overwritten by their immediate fragments.
+*/
apply_alternatives(__alt_instructions, __alt_instructions_end);
 
 #ifdef CONFIG_SMP
@@ -748,8 +776,6 @@ void __init alternative_instructions(void)
}
 #endif
 
-   apply_paravirt(__parainstructions, __parainstructions_end);
-
restart_nmi();
alternatives_patched = 1;
 }
diff --git a/arch/x86/kernel/paravirt-spinlocks.c b/arch/x86/kernel/paravirt-spinlocks.c
index 4f75d0cf6305..9e1ea99ad9df 100644
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c
@@ -32,3 +32,12 @@ bool pv_is_native_vcpu_is_preempted(void)
return pv_ops.lock.vcpu_is_preempted.func ==
__raw_callee_save___native_vcpu_is_preempted;
 }
+
+void __init paravirt_set_cap(void)
+{
+   if (!pv_is_native_spin_unlock())
+   setup_force_cpu_cap(X86_FEATURE_PVUNLOCK

Re: [PATCH v2] drm/virtio: Track total GPU memory for virtio driver

2021-01-20 Thread Gerd Hoffmann
  Hi,

> > > > > +   select TRACE_GPU_MEM

> > > > > +#ifdef CONFIG_TRACE_GPU_MEM

That doesn't make sense btw.

> > > > > +#ifdef CONFIG_TRACE_GPU_MEM
> > > > > +static inline void virtio_gpu_trace_total_mem(struct 
> > > > > virtio_gpu_device *vgdev,
> > > > > + s64 delta)
> > > > > +{
> > > > > +   u64 total_mem = atomic64_add_return(delta, &vgdev->total_mem);
> > > > > +
> > > > > +   trace_gpu_mem_total(0, 0, total_mem);

Hmm, so no per process tracking (pid arg hard-coded to zero)?
Any plans for that?
The cgroups patches mentioned by Daniel should address that btw.

The gpu_id is hardcoded to zero too.  Shouldn't that be something like
the minor number of the drm device?  Or maybe something else in case you
need drm and non-drm gpu devices work side-by-side?

> > > Thanks for your reply! Android Cuttlefish virtual platform is using
> > > the virtio-gpu driver, and we currently are carrying this small patch
> > > at the downstream side. This is essential for us because:
> > > (1) Android has deprecated debugfs on production devices already

IIRC there have been discussions about a statfs, so you can export stats
with a sane interface without also enabling all the power provided by
debugfs, exactly because of the concerns to do that on production
systems.

Not sure what the state is, seems to not be upstream yet.  That would be
(beside cgroups) another thing to look at.

> > > Android relies on this tracepoint + eBPF to make the GPU memory totals
> > > available at runtime on production devices, which has been enforced
> > > already. Not only game developers can have a reliable kernel total GPU
> > > memory to look at, but also Android leverages this to take GPU memory
> > > usage out from the system lost ram.

Sounds like you define "gpu memory" as "system memory used to store gpu
data".  Is that correct?  What about device memory?

> > > I'm not sure whether the other DRM drivers would like to integrate
> > > this tracepoint(maybe upstream drivers will move away from debugfs
> > > later as well?), but at least we hope virtio-gpu can take this.

Well, it is basically the same for all drivers using the gem shmem
helpers.  So I see little reason why we should do that at virtio-gpu
level.

> Android GPU vendors have integrated this tracepoint to track gpu
> memory usage total(mapped into the gpu address space), which consists
> of below:
> (1) directly allocated via physical page allocator
> (2) imported external memory backed by dma-bufs
> (3) allocated exportable memory backed by dma-bufs

Hmm, the tracepoint doesn't track which of the three groups the memory
belongs to.  Which I think is important, specifically group (2) because
that might already be accounted for by the exporting driver ...

> Our Android kernel team is leading the other side of effort to help
> remove the dma-bufs overlap(those mapped into a gpu device) as a joint
> effort, so that we can accurately explain the memory usage of the
> entire Android system.

I suspect once you figured that you'll notice that this little hack is
rather incomplete.

> For virtio-gpu, since that's used by our reference platform
> Cuttlefish(Cloud Android), we have to integrate the same tracepoint as
> well to enforce the use of this tracepoint and the eBPF stuff built on
> top to support runtime query of gpu memory on production devices. For
> virtio-gpu at this moment, we only want to track GEM allocations since
> PRIME import is currently not supported/used in Cuttlefish. That's all
> we are doing in this small patch.

take care,
  Gerd



[PATCH v4 10/15] x86/paravirt: remove no longer needed 32-bit pvops cruft

2021-01-20 Thread Juergen Gross via Virtualization
PVOP_VCALL4() is only used for Xen PV, while PVOP_CALL4() isn't used
at all. Keep PVOP_CALL4() for 64 bits due to symmetry reasons.

This allows removing the 32-bit definitions of those macros, leading
to a substantial simplification of the paravirt macros, as those were
the only ones needing non-empty "pre" and "post" parameters.

PVOP_CALLEE2() and PVOP_VCALLEE2() are used nowhere, so remove them.

Another case that is no longer needed is the special handling of
return types larger than unsigned long. Replace that with a BUILD_BUG_ON().

DISABLE_INTERRUPTS() is used in 32-bit code only, so it can just be
replaced by cli.

INTERRUPT_RETURN in 32-bit code can be replaced by iret.

ENABLE_INTERRUPTS is used nowhere, so it can be removed.

Signed-off-by: Juergen Gross 
---
 arch/x86/entry/entry_32.S |   4 +-
 arch/x86/include/asm/irqflags.h   |   5 --
 arch/x86/include/asm/paravirt.h   |  35 +---
 arch/x86/include/asm/paravirt_types.h | 112 --
 arch/x86/kernel/asm-offsets.c |   2 -
 5 files changed, 35 insertions(+), 123 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index df8c017e6161..765487e57d6e 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -430,7 +430,7 @@
 * will soon execute iret and the tracer was already set to
 * the irqstate after the IRET:
 */
-   DISABLE_INTERRUPTS(CLBR_ANY)
+   cli
lss (%esp), %esp/* switch to espfix segment */
 .Lend_\@:
 #endif /* CONFIG_X86_ESPFIX32 */
@@ -1077,7 +1077,7 @@ restore_all_switch_stack:
 * when returning from IPI handler and when returning from
 * scheduler to user-space.
 */
-   INTERRUPT_RETURN
+   iret
 
 .section .fixup, "ax"
 SYM_CODE_START(asm_iret_error)
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 144d70ea4393..a0efbcd24b86 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -109,9 +109,6 @@ static __always_inline unsigned long arch_local_irq_save(void)
 }
 #else
 
-#define ENABLE_INTERRUPTS(x)   sti
-#define DISABLE_INTERRUPTS(x)  cli
-
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_DEBUG_ENTRY
 #define SAVE_FLAGS(x)  pushfq; popq %rax
@@ -119,8 +116,6 @@ static __always_inline unsigned long arch_local_irq_save(void)
 
 #define INTERRUPT_RETURN   jmp native_iret
 
-#else
-#define INTERRUPT_RETURN   iret
 #endif
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 8c354099d9c3..c6496a82fad1 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -721,6 +721,7 @@ extern void default_banner(void);
.if ((~(set)) & mask); pop %reg; .endif
 
 #ifdef CONFIG_X86_64
+#ifdef CONFIG_PARAVIRT_XXL
 
 #define PV_SAVE_REGS(set)  \
COND_PUSH(set, CLBR_RAX, rax);  \
@@ -746,46 +747,12 @@ extern void default_banner(void);
 #define PARA_PATCH(off)((off) / 8)
 #define PARA_SITE(ptype, ops)  _PVSITE(ptype, ops, .quad, 8)
 #define PARA_INDIRECT(addr)*addr(%rip)
-#else
-#define PV_SAVE_REGS(set)  \
-   COND_PUSH(set, CLBR_EAX, eax);  \
-   COND_PUSH(set, CLBR_EDI, edi);  \
-   COND_PUSH(set, CLBR_ECX, ecx);  \
-   COND_PUSH(set, CLBR_EDX, edx)
-#define PV_RESTORE_REGS(set)   \
-   COND_POP(set, CLBR_EDX, edx);   \
-   COND_POP(set, CLBR_ECX, ecx);   \
-   COND_POP(set, CLBR_EDI, edi);   \
-   COND_POP(set, CLBR_EAX, eax)
-
-#define PARA_PATCH(off)((off) / 4)
-#define PARA_SITE(ptype, ops)  _PVSITE(ptype, ops, .long, 4)
-#define PARA_INDIRECT(addr)*%cs:addr
-#endif
 
-#ifdef CONFIG_PARAVIRT_XXL
 #define INTERRUPT_RETURN   \
PARA_SITE(PARA_PATCH(PV_CPU_iret),  \
  ANNOTATE_RETPOLINE_SAFE;  \
  jmp PARA_INDIRECT(pv_ops+PV_CPU_iret);)
 
-#define DISABLE_INTERRUPTS(clobbers)   \
-   PARA_SITE(PARA_PATCH(PV_IRQ_irq_disable),   \
- PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE);\
- ANNOTATE_RETPOLINE_SAFE;  \
- call PARA_INDIRECT(pv_ops+PV_IRQ_irq_disable);\
- PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);)
-
-#define ENABLE_INTERRUPTS(clobbers)\
-   PARA_SITE(PARA_PATCH(PV_IRQ_irq_enable),\
- PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE);\
- ANNOTATE_RETPOLINE_SAFE;  \
- call PARA_INDIRECT(pv_ops+PV_IRQ_irq_enable); \
- PV_RESTORE_REGS(clobbers 

[PATCH v4 15/15] x86/paravirt: have only one paravirt patch function

2021-01-20 Thread Juergen Gross via Virtualization
There is no longer any need to have different paravirt patch functions
for native and Xen. Eliminate native_patch() and rename
paravirt_patch_default() to paravirt_patch().

Signed-off-by: Juergen Gross 
---
V3:
- remove paravirt_patch_insns() (kernel test robot)
---
 arch/x86/include/asm/paravirt_types.h | 19 +--
 arch/x86/kernel/Makefile  |  3 +--
 arch/x86/kernel/alternative.c |  2 +-
 arch/x86/kernel/paravirt.c| 20 ++--
 arch/x86/kernel/paravirt_patch.c  | 11 ---
 arch/x86/xen/enlighten_pv.c   |  1 -
 6 files changed, 5 insertions(+), 51 deletions(-)
 delete mode 100644 arch/x86/kernel/paravirt_patch.c

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 588ff14ce969..62efbf8bd8f0 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -68,19 +68,6 @@ struct pv_info {
const char *name;
 };
 
-struct pv_init_ops {
-   /*
-* Patch may replace one of the defined code sequences with
-* arbitrary code, subject to the same register constraints.
-* This generally means the code is not free to clobber any
-* registers other than EAX.  The patch function should return
-* the number of bytes of code generated, as we nop pad the
-* rest in generic code.
-*/
-   unsigned (*patch)(u8 type, void *insn_buff,
- unsigned long addr, unsigned len);
-} __no_randomize_layout;
-
 #ifdef CONFIG_PARAVIRT_XXL
 struct pv_lazy_ops {
/* Set deferred update mode, used for batching operations. */
@@ -276,7 +263,6 @@ struct pv_lock_ops {
  * number for each function using the offset which we use to indicate
  * what to patch. */
 struct paravirt_patch_template {
-   struct pv_init_ops  init;
struct pv_cpu_ops   cpu;
struct pv_irq_ops   irq;
struct pv_mmu_ops   mmu;
@@ -317,10 +303,7 @@ extern void (*paravirt_iret)(void);
 /* Simple instruction patching code. */
 #define NATIVE_LABEL(a,x,b) "\n\t.globl " a #x "_" #b "\n" a #x "_" #b ":\n\t"
 
-unsigned paravirt_patch_default(u8 type, void *insn_buff, unsigned long addr, unsigned len);
-unsigned paravirt_patch_insns(void *insn_buff, unsigned len, const char *start, const char *end);
-
-unsigned native_patch(u8 type, void *insn_buff, unsigned long addr, unsigned len);
+unsigned paravirt_patch(u8 type, void *insn_buff, unsigned long addr, unsigned len);
 
 int paravirt_disable_iospace(void);
 
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 5eeb808eb024..853a83503120 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -35,7 +35,6 @@ KASAN_SANITIZE_sev-es.o := n
 KCSAN_SANITIZE := n
 
 OBJECT_FILES_NON_STANDARD_test_nx.o:= y
-OBJECT_FILES_NON_STANDARD_paravirt_patch.o := y
 
 ifdef CONFIG_FRAME_POINTER
 OBJECT_FILES_NON_STANDARD_ftrace_$(BITS).o := y
@@ -122,7 +121,7 @@ obj-$(CONFIG_AMD_NB)+= amd_nb.o
 obj-$(CONFIG_DEBUG_NMI_SELFTEST) += nmi_selftest.o
 
 obj-$(CONFIG_KVM_GUEST)+= kvm.o kvmclock.o
-obj-$(CONFIG_PARAVIRT) += paravirt.o paravirt_patch.o
+obj-$(CONFIG_PARAVIRT) += paravirt.o
 obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o
 obj-$(CONFIG_PARAVIRT_CLOCK)   += pvclock.o
 obj-$(CONFIG_X86_PMEM_LEGACY_DEVICE) += pmem.o
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 221acb2b868a..fb0b83c85de7 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -613,7 +613,7 @@ void __init_or_module apply_paravirt(struct paravirt_patch_site *start,
BUG_ON(p->len > MAX_PATCH_LEN);
/* prep the buffer with the original instructions */
memcpy(insn_buff, p->instr, p->len);
-   used = pv_ops.init.patch(p->type, insn_buff, (unsigned long)p->instr, p->len);
+   used = paravirt_patch(p->type, insn_buff, (unsigned long)p->instr, p->len);
 
BUG_ON(used > p->len);
 
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 082954930809..3d7b989ed6be 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -99,8 +99,8 @@ void __init native_pv_lock_init(void)
static_branch_disable(&virt_spin_lock_key);
 }
 
-unsigned paravirt_patch_default(u8 type, void *insn_buff,
-   unsigned long addr, unsigned len)
+unsigned int paravirt_patch(u8 type, void *insn_buff, unsigned long addr,
+   unsigned int len)
 {
/*
 * Neat trick to map patch type back to the call within the
@@ -121,19 +121,6 @@ unsigned paravirt_patch_default(u8 type, void *insn_buff,
return ret;
 }
 
-unsigned paravirt_patch_insns(void *insn_buff, unsigned len,
-   

[PATCH v4 04/15] x86/pv: switch SWAPGS to ALTERNATIVE

2021-01-20 Thread Juergen Gross via Virtualization
SWAPGS is used only for interrupts coming from user mode or for
returning to user mode. So there is no reason to use the PARAVIRT
framework, as it can easily be replaced by an ALTERNATIVE depending
on X86_FEATURE_XENPV.

There are several instances of the PV-aware SWAPGS macro in paths
that are never executed in a Xen PV guest. Replace those with the
plain swapgs instruction. The same applies to SWAPGS_UNSAFE_STACK.

Signed-off-by: Juergen Gross 
Acked-by: Andy Lutomirski 
Acked-by: Peter Zijlstra (Intel) 
Reviewed-by: Borislav Petkov 
Reviewed-by: Thomas Gleixner 
---
 arch/x86/entry/entry_64.S | 10 +-
 arch/x86/include/asm/irqflags.h   | 20 
 arch/x86/include/asm/paravirt.h   | 20 
 arch/x86/include/asm/paravirt_types.h |  2 --
 arch/x86/kernel/asm-offsets_64.c  |  1 -
 arch/x86/kernel/paravirt.c|  1 -
 arch/x86/kernel/paravirt_patch.c  |  3 ---
 arch/x86/xen/enlighten_pv.c   |  3 ---
 8 files changed, 13 insertions(+), 47 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index cad08703c4ad..a876204a73e0 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -669,7 +669,7 @@ native_irq_return_ldt:
 */
 
pushq   %rdi/* Stash user RDI */
-   SWAPGS  /* to kernel GS */
+   swapgs  /* to kernel GS */
SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi   /* to kernel CR3 */
 
movqPER_CPU_VAR(espfix_waddr), %rdi
@@ -699,7 +699,7 @@ native_irq_return_ldt:
orq PER_CPU_VAR(espfix_stack), %rax
 
SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
-   SWAPGS  /* to user GS */
+   swapgs  /* to user GS */
popq%rdi/* Restore user RDI */
 
movq%rax, %rsp
@@ -943,7 +943,7 @@ SYM_CODE_START_LOCAL(paranoid_entry)
ret
 
 .Lparanoid_entry_swapgs:
-   SWAPGS
+   swapgs
 
/*
 * The above SAVE_AND_SWITCH_TO_KERNEL_CR3 macro doesn't do an
@@ -1001,7 +1001,7 @@ SYM_CODE_START_LOCAL(paranoid_exit)
jnz restore_regs_and_return_to_kernel
 
/* We are returning to a context with user GSBASE */
-   SWAPGS_UNSAFE_STACK
+   swapgs
jmp restore_regs_and_return_to_kernel
 SYM_CODE_END(paranoid_exit)
 
@@ -1426,7 +1426,7 @@ nmi_no_fsgsbase:
jnz nmi_restore
 
 nmi_swapgs:
-   SWAPGS_UNSAFE_STACK
+   swapgs
 
 nmi_restore:
POP_REGS
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 2dfc8d380dab..8c86edefa115 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -131,18 +131,6 @@ static __always_inline unsigned long arch_local_irq_save(void)
 #define SAVE_FLAGS(x)  pushfq; popq %rax
 #endif
 
-#define SWAPGS swapgs
-/*
- * Currently paravirt can't handle swapgs nicely when we
- * don't have a stack we can rely on (such as a user space
- * stack).  So we either find a way around these or just fault
- * and emulate if a guest tries to call swapgs directly.
- *
- * Either way, this is a good way to document that we don't
- * have a reliable stack. x86_64 only.
- */
-#define SWAPGS_UNSAFE_STACKswapgs
-
 #define INTERRUPT_RETURN   jmp native_iret
 #define USERGS_SYSRET64\
swapgs; \
@@ -170,6 +158,14 @@ static __always_inline int arch_irqs_disabled(void)
 
return arch_irqs_disabled_flags(flags);
 }
+#else
+#ifdef CONFIG_X86_64
+#ifdef CONFIG_XEN_PV
+#define SWAPGS ALTERNATIVE "swapgs", "", X86_FEATURE_XENPV
+#else
+#define SWAPGS swapgs
+#endif
+#endif
 #endif /* !__ASSEMBLY__ */
 
 #endif
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index f8dce11d2bc1..f2ebe109a37e 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -776,26 +776,6 @@ extern void default_banner(void);
 
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_PARAVIRT_XXL
-/*
- * If swapgs is used while the userspace stack is still current,
- * there's no way to call a pvop.  The PV replacement *must* be
- * inlined, or the swapgs instruction must be trapped and emulated.
- */
-#define SWAPGS_UNSAFE_STACK\
-   PARA_SITE(PARA_PATCH(PV_CPU_swapgs), swapgs)
-
-/*
- * Note: swapgs is very special, and in practise is either going to be
- * implemented with a single "swapgs" instruction or something very
- * special.  Either way, we don't need to save any registers for
- * it.
- */
-#define SWAPGS \
-   PARA_SITE(PARA_PATCH(PV_CPU_swapgs),\
- ANNOTATE_RETPOLINE_SAFE;  \
- call

[PATCH v4 14/15] x86/paravirt: switch functions with custom code to ALTERNATIVE

2021-01-20 Thread Juergen Gross via Virtualization
Instead of using paravirt patching for custom code sequences, use
ALTERNATIVE for the functions that have custom code replacements.

Instead of patching a ud2 instruction into the caller site for
unpopulated vector entries, use a simple function that just calls
BUG() as a replacement.

Simplify the register defines for assembler paravirt calling, as there
isn't much usage left.

Signed-off-by: Juergen Gross 
---
V4:
- fixed SAVE_FLAGS() (kernel test robot)
- added assembler paravirt cleanup
---
 arch/x86/entry/entry_64.S |  2 +-
 arch/x86/include/asm/irqflags.h   |  2 +-
 arch/x86/include/asm/paravirt.h   | 99 +--
 arch/x86/include/asm/paravirt_types.h |  6 --
 arch/x86/kernel/paravirt.c| 16 ++---
 arch/x86/kernel/paravirt_patch.c  | 88 
 6 files changed, 56 insertions(+), 157 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index ce0464d630a2..714af508fe30 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -305,7 +305,7 @@ SYM_CODE_END(ret_from_fork)
 .macro DEBUG_ENTRY_ASSERT_IRQS_OFF
 #ifdef CONFIG_DEBUG_ENTRY
pushq %rax
-   SAVE_FLAGS(CLBR_RAX)
+   SAVE_FLAGS
testl $X86_EFLAGS_IF, %eax
jz .Lokay_\@
ud2
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index a0efbcd24b86..c5ce9845c999 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -111,7 +111,7 @@ static __always_inline unsigned long arch_local_irq_save(void)
 
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_DEBUG_ENTRY
-#define SAVE_FLAGS(x)  pushfq; popq %rax
+#define SAVE_FLAGS pushfq; popq %rax
 #endif
 
 #define INTERRUPT_RETURN   jmp native_iret
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 36cd71fa097f..04b3067f31b5 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -137,7 +137,8 @@ static inline void write_cr0(unsigned long x)
 
 static inline unsigned long read_cr2(void)
 {
-   return PVOP_CALLEE0(unsigned long, mmu.read_cr2);
+   return PVOP_ALT_CALLEE0(unsigned long, mmu.read_cr2,
+   "mov %%cr2, %%rax;", ~X86_FEATURE_XENPV);
 }
 
 static inline void write_cr2(unsigned long x)
@@ -147,12 +148,14 @@ static inline void write_cr2(unsigned long x)
 
 static inline unsigned long __read_cr3(void)
 {
-   return PVOP_CALL0(unsigned long, mmu.read_cr3);
+   return PVOP_ALT_CALL0(unsigned long, mmu.read_cr3,
+ "mov %%cr3, %%rax;", ~X86_FEATURE_XENPV);
 }
 
 static inline void write_cr3(unsigned long x)
 {
-   PVOP_VCALL1(mmu.write_cr3, x);
+   PVOP_ALT_VCALL1(mmu.write_cr3, x,
+   "mov %%rdi, %%cr3", ~X86_FEATURE_XENPV);
 }
 
 static inline void __write_cr4(unsigned long x)
@@ -172,7 +175,7 @@ static inline void halt(void)
 
 static inline void wbinvd(void)
 {
-   PVOP_VCALL0(cpu.wbinvd);
+   PVOP_ALT_VCALL0(cpu.wbinvd, "wbinvd", ~X86_FEATURE_XENPV);
 }
 
 static inline u64 paravirt_read_msr(unsigned msr)
@@ -386,22 +389,28 @@ static inline void paravirt_release_p4d(unsigned long pfn)
 
 static inline pte_t __pte(pteval_t val)
 {
-   return (pte_t) { PVOP_CALLEE1(pteval_t, mmu.make_pte, val) };
+   return (pte_t) { PVOP_ALT_CALLEE1(pteval_t, mmu.make_pte, val,
+ "mov %%rdi, %%rax",
+ ~X86_FEATURE_XENPV) };
 }
 
 static inline pteval_t pte_val(pte_t pte)
 {
-   return PVOP_CALLEE1(pteval_t, mmu.pte_val, pte.pte);
+   return PVOP_ALT_CALLEE1(pteval_t, mmu.pte_val, pte.pte,
+   "mov %%rdi, %%rax", ~X86_FEATURE_XENPV);
 }
 
 static inline pgd_t __pgd(pgdval_t val)
 {
-   return (pgd_t) { PVOP_CALLEE1(pgdval_t, mmu.make_pgd, val) };
+   return (pgd_t) { PVOP_ALT_CALLEE1(pgdval_t, mmu.make_pgd, val,
+ "mov %%rdi, %%rax",
+ ~X86_FEATURE_XENPV) };
 }
 
 static inline pgdval_t pgd_val(pgd_t pgd)
 {
-   return PVOP_CALLEE1(pgdval_t, mmu.pgd_val, pgd.pgd);
+   return PVOP_ALT_CALLEE1(pgdval_t, mmu.pgd_val, pgd.pgd,
+   "mov %%rdi, %%rax", ~X86_FEATURE_XENPV);
 }
 
 #define  __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
@@ -434,12 +443,15 @@ static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
 
 static inline pmd_t __pmd(pmdval_t val)
 {
-   return (pmd_t) { PVOP_CALLEE1(pmdval_t, mmu.make_pmd, val) };
+   return (pmd_t) { PVOP_ALT_CALLEE1(pmdval_t, mmu.make_pmd, val,
+ "mov %%rdi, %%rax",
+ ~X86_FEATURE_XENPV) };
 }
 
 static inline pmdval_t pmd_val(pmd_t pmd)
 {
-   return PVOP_CALLEE1(pmdval_t, mmu.pmd_val, pmd.pmd);
+   return PVOP_ALT_CALLEE1(pmdval_t, mmu.pmd_val, pmd.pmd,
+  

[PATCH v4 12/15] x86/paravirt: switch iret pvops to ALTERNATIVE

2021-01-20 Thread Juergen Gross via Virtualization
The iret paravirt op is rather special, as it uses a jmp instead
of a call instruction. Switch it to ALTERNATIVE.

Signed-off-by: Juergen Gross 
---
V3:
- use ALTERNATIVE_TERNARY
---
 arch/x86/include/asm/paravirt.h   |  6 +++---
 arch/x86/include/asm/paravirt_types.h |  5 +
 arch/x86/kernel/asm-offsets.c |  5 -
 arch/x86/kernel/paravirt.c| 26 ++
 arch/x86/xen/enlighten_pv.c   |  3 +--
 5 files changed, 7 insertions(+), 38 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index c6496a82fad1..36cd71fa097f 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -749,9 +749,9 @@ extern void default_banner(void);
 #define PARA_INDIRECT(addr)*addr(%rip)
 
 #define INTERRUPT_RETURN   \
-   PARA_SITE(PARA_PATCH(PV_CPU_iret),  \
- ANNOTATE_RETPOLINE_SAFE;  \
- jmp PARA_INDIRECT(pv_ops+PV_CPU_iret);)
+   ANNOTATE_RETPOLINE_SAFE;\
+   ALTERNATIVE_TERNARY("jmp *paravirt_iret(%rip);",\
+   X86_FEATURE_XENPV, "jmp xen_iret;", "jmp native_iret;")
 
 #ifdef CONFIG_DEBUG_ENTRY
 #define SAVE_FLAGS(clobbers)\
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 45bd21647dd8..0afdac83f926 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -151,10 +151,6 @@ struct pv_cpu_ops {
 
u64 (*read_pmc)(int counter);
 
-   /* Normal iret.  Jump to this with the standard iret stack
-  frame set up. */
-   void (*iret)(void);
-
void (*start_context_switch)(struct task_struct *prev);
void (*end_context_switch)(struct task_struct *next);
 #endif
@@ -294,6 +290,7 @@ struct paravirt_patch_template {
 
 extern struct pv_info pv_info;
 extern struct paravirt_patch_template pv_ops;
+extern void (*paravirt_iret)(void);
 
 #define PARAVIRT_PATCH(x)  \
(offsetof(struct paravirt_patch_template, x) / sizeof(void *))
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 736508004b30..ecd3fd6993d1 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -61,11 +61,6 @@ static void __used common(void)
OFFSET(IA32_RT_SIGFRAME_sigcontext, rt_sigframe_ia32, uc.uc_mcontext);
 #endif
 
-#ifdef CONFIG_PARAVIRT_XXL
-   BLANK();
-   OFFSET(PV_CPU_iret, paravirt_patch_template, cpu.iret);
-#endif
-
 #ifdef CONFIG_XEN
BLANK();
OFFSET(XEN_vcpu_info_mask, vcpu_info, evtchn_upcall_mask);
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 44e5b0fe28cb..0553a339d850 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -86,25 +86,6 @@ u64 notrace _paravirt_ident_64(u64 x)
 {
return x;
 }
-
-static unsigned paravirt_patch_jmp(void *insn_buff, const void *target,
-  unsigned long addr, unsigned len)
-{
-   struct branch *b = insn_buff;
-   unsigned long delta = (unsigned long)target - (addr+5);
-
-   if (len < 5) {
-#ifdef CONFIG_RETPOLINE
-   WARN_ONCE(1, "Failing to patch indirect JMP in %ps\n", (void *)addr);
-#endif
-   return len; /* call too long for patch site */
-   }
-
-   b->opcode = 0xe9;   /* jmp */
-   b->delta = delta;
-
-   return 5;
-}
 #endif
 
 DEFINE_STATIC_KEY_TRUE(virt_spin_lock_key);
@@ -136,9 +117,6 @@ unsigned paravirt_patch_default(u8 type, void *insn_buff,
else if (opfunc == _paravirt_ident_64)
ret = paravirt_patch_ident_64(insn_buff, len);
 
-   else if (type == PARAVIRT_PATCH(cpu.iret))
-   /* If operation requires a jmp, then jmp */
-   ret = paravirt_patch_jmp(insn_buff, opfunc, addr, len);
 #endif
else
/* Otherwise call the function. */
@@ -316,8 +294,6 @@ struct paravirt_patch_template pv_ops = {
 
.cpu.load_sp0   = native_load_sp0,
 
-   .cpu.iret   = native_iret,
-
 #ifdef CONFIG_X86_IOPL_IOPERM
.cpu.invalidate_io_bitmap   = native_tss_invalidate_io_bitmap,
.cpu.update_io_bitmap   = native_tss_update_io_bitmap,
@@ -422,6 +398,8 @@ struct paravirt_patch_template pv_ops = {
 NOKPROBE_SYMBOL(native_get_debugreg);
 NOKPROBE_SYMBOL(native_set_debugreg);
 NOKPROBE_SYMBOL(native_load_idt);
+
+void (*paravirt_iret)(void) = native_iret;
 #endif
 
 EXPORT_SYMBOL(pv_ops);
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 32b295cc2716..4716383c64a9 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1057,8 +1057,6 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst =

[PATCH v4 07/15] x86/paravirt: switch time pvops functions to use static_call()

2021-01-20 Thread Juergen Gross via Virtualization
The time pvops functions are the only ones left which might be
used in 32-bit mode and which return a 64-bit value.

Switch them to use the static_call() mechanism instead of pvops, as
this allows quite some simplification of the pvops implementation.

Signed-off-by: Juergen Gross 
---
V4:
- drop paravirt_time.h again
- don't move Hyper-V code (Michael Kelley)
---
 arch/x86/Kconfig  |  1 +
 arch/x86/include/asm/mshyperv.h   |  2 +-
 arch/x86/include/asm/paravirt.h   | 17 ++---
 arch/x86/include/asm/paravirt_types.h |  6 --
 arch/x86/kernel/cpu/vmware.c  |  5 +++--
 arch/x86/kernel/kvm.c |  2 +-
 arch/x86/kernel/kvmclock.c|  2 +-
 arch/x86/kernel/paravirt.c| 16 
 arch/x86/kernel/tsc.c |  2 +-
 arch/x86/xen/time.c   | 11 ---
 drivers/clocksource/hyperv_timer.c|  5 +++--
 drivers/xen/time.c|  2 +-
 12 files changed, 42 insertions(+), 29 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 21f851179ff0..7ccd4a80788c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -771,6 +771,7 @@ if HYPERVISOR_GUEST
 
 config PARAVIRT
bool "Enable paravirtualization code"
+   depends on HAVE_STATIC_CALL
help
  This changes the kernel so it can modify itself when it is run
  under a hypervisor, potentially improving performance significantly
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 30f76b966857..b4ee331d29a7 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -63,7 +63,7 @@ typedef int (*hyperv_fill_flush_list_func)(
 static __always_inline void hv_setup_sched_clock(void *sched_clock)
 {
 #ifdef CONFIG_PARAVIRT
-   pv_ops.time.sched_clock = sched_clock;
+   paravirt_set_sched_clock(sched_clock);
 #endif
 }
 
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 4abf110e2243..1e45b46fae84 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -15,11 +15,22 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
-static inline unsigned long long paravirt_sched_clock(void)
+u64 dummy_steal_clock(int cpu);
+u64 dummy_sched_clock(void);
+
+DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
+DECLARE_STATIC_CALL(pv_sched_clock, dummy_sched_clock);
+
+extern bool paravirt_using_native_sched_clock;
+
+void paravirt_set_sched_clock(u64 (*func)(void));
+
+static inline u64 paravirt_sched_clock(void)
 {
-   return PVOP_CALL0(unsigned long long, time.sched_clock);
+   return static_call(pv_sched_clock)();
 }
 
 struct static_key;
@@ -33,7 +44,7 @@ bool pv_is_native_vcpu_is_preempted(void);
 
 static inline u64 paravirt_steal_clock(int cpu)
 {
-   return PVOP_CALL1(u64, time.steal_clock, cpu);
+   return static_call(pv_steal_clock)(cpu);
 }
 
 /* The paravirtualized I/O functions */
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index de87087d3bde..1fff349e4792 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -95,11 +95,6 @@ struct pv_lazy_ops {
 } __no_randomize_layout;
 #endif
 
-struct pv_time_ops {
-   unsigned long long (*sched_clock)(void);
-   unsigned long long (*steal_clock)(int cpu);
-} __no_randomize_layout;
-
 struct pv_cpu_ops {
/* hooks for various privileged instructions */
void (*io_delay)(void);
@@ -291,7 +286,6 @@ struct pv_lock_ops {
  * what to patch. */
 struct paravirt_patch_template {
struct pv_init_ops  init;
-   struct pv_time_ops  time;
struct pv_cpu_ops   cpu;
struct pv_irq_ops   irq;
struct pv_mmu_ops   mmu;
diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index c6ede3b3d302..84fb8e3f3d1b 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -336,11 +337,11 @@ static void __init vmware_paravirt_ops_setup(void)
vmware_cyc2ns_setup();
 
if (vmw_sched_clock)
-   pv_ops.time.sched_clock = vmware_sched_clock;
+   paravirt_set_sched_clock(vmware_sched_clock);
 
if (vmware_is_stealclock_available()) {
has_steal_clock = true;
-   pv_ops.time.steal_clock = vmware_steal_clock;
+   static_call_update(pv_steal_clock, vmware_steal_clock);
 
/* We use reboot notifier only to disable steal clock */
register_reboot_notifier(&vmware_pv_reboot_nb);
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 5e78e01ca3b4..351ba99f6009 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -650,7 +650,7 @@ static void __init kvm_guest_init(void)
 
if (kvm_para_has_feature(KVM_FEATUR

[PATCH v4 11/15] x86/paravirt: simplify paravirt macros

2021-01-20 Thread Juergen Gross via Virtualization
The central pvops call macros PVOP_CALL() and PVOP_VCALL() now look
very similar.

The main differences are the use of PVOP_VCALL_ARGS vs. PVOP_CALL_ARGS,
which are identical, and the return value handling.

So drop PVOP_VCALL_ARGS and instead of PVOP_VCALL() just use
(void)PVOP_CALL(long, ...).

Note that it isn't easily possible to just redefine PVOP_VCALL()
to use PVOP_CALL() instead, as this would require further hiding of
commas in macro parameters.

Signed-off-by: Juergen Gross 
---
V3:
- new patch

V4:
- fix build warnings with clang (kernel test robot)
---
 arch/x86/include/asm/paravirt_types.h | 41 ---
 1 file changed, 12 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 42f9eef84131..45bd21647dd8 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -408,11 +408,9 @@ int paravirt_disable_iospace(void);
  * makes sure the incoming and outgoing types are always correct.
  */
 #ifdef CONFIG_X86_32
-#define PVOP_VCALL_ARGS                                                \
+#define PVOP_CALL_ARGS \
unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx;
 
-#define PVOP_CALL_ARGS PVOP_VCALL_ARGS
-
 #define PVOP_CALL_ARG1(x)  "a" ((unsigned long)(x))
 #define PVOP_CALL_ARG2(x)  "d" ((unsigned long)(x))
 #define PVOP_CALL_ARG3(x)  "c" ((unsigned long)(x))
@@ -428,12 +426,10 @@ int paravirt_disable_iospace(void);
 #define VEXTRA_CLOBBERS
 #else  /* CONFIG_X86_64 */
 /* [re]ax isn't an arg, but the return val */
-#define PVOP_VCALL_ARGS\
+#define PVOP_CALL_ARGS \
unsigned long __edi = __edi, __esi = __esi, \
__edx = __edx, __ecx = __ecx, __eax = __eax;
 
-#define PVOP_CALL_ARGS PVOP_VCALL_ARGS
-
 #define PVOP_CALL_ARG1(x)  "D" ((unsigned long)(x))
 #define PVOP_CALL_ARG2(x)  "S" ((unsigned long)(x))
 #define PVOP_CALL_ARG3(x)  "d" ((unsigned long)(x))
@@ -458,59 +454,46 @@ int paravirt_disable_iospace(void);
 #define PVOP_TEST_NULL(op) ((void)pv_ops.op)
 #endif
 
-#define PVOP_RETMASK(rettype)  \
+#define PVOP_RETVAL(rettype)   \
({  unsigned long __mask = ~0UL;\
+   BUILD_BUG_ON(sizeof(rettype) > sizeof(unsigned long));  \
switch (sizeof(rettype)) {  \
case 1: __mask =   0xffUL; break;   \
case 2: __mask = 0xUL; break;   \
case 4: __mask = 0xUL; break;   \
default: break; \
}   \
-   __mask; \
+   __mask & __eax; \
})
 
 
-#define PVOP_CALL(rettype, op, clbr, call_clbr, extra_clbr, ...)   \
+#define PVOP_CALL(ret, op, clbr, call_clbr, extra_clbr, ...)   \
({  \
PVOP_CALL_ARGS; \
PVOP_TEST_NULL(op); \
-   BUILD_BUG_ON(sizeof(rettype) > sizeof(unsigned long));  \
asm volatile(paravirt_alt(PARAVIRT_CALL)\
 : call_clbr, ASM_CALL_CONSTRAINT   \
 : paravirt_type(op),   \
   paravirt_clobber(clbr),  \
   ##__VA_ARGS__\
 : "memory", "cc" extra_clbr);  \
-   (rettype)(__eax & PVOP_RETMASK(rettype));   \
+   ret;\
})
 
 #define __PVOP_CALL(rettype, op, ...)  \
-   PVOP_CALL(rettype, op, CLBR_ANY, PVOP_CALL_CLOBBERS,\
- EXTRA_CLOBBERS, ##__VA_ARGS__)
+   PVOP_CALL(PVOP_RETVAL(rettype), op, CLBR_ANY,   \
+ PVOP_CALL_CLOBBERS, EXTRA_CLOBBERS, ##__VA_ARGS__)
 
 #define __PVOP_CALLEESAVE(rettype, op, ...)\
-   PVOP_CALL(rettype, op.func, CLBR_RET_REG,   \
+   PVOP_CALL(PVOP_RETVAL(rettype), op.func, CLBR_RET_REG,  \
  PVOP_CALLEE_CLOBBERS, , ##__VA_ARGS__)
 
-
-#define PVOP_VCALL(op, clbr, call_clbr, extra

[PATCH v4 13/15] x86/paravirt: add new macros PVOP_ALT* supporting pvops in ALTERNATIVEs

2021-01-20 Thread Juergen Gross via Virtualization
Instead of using paravirt patching for custom code sequences, add
support for combining ALTERNATIVE handling with paravirt call
patching.

Signed-off-by: Juergen Gross 
---
V3:
- drop PVOP_ALT_VCALL() macro
---
 arch/x86/include/asm/paravirt_types.h | 49 ++-
 1 file changed, 48 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 0afdac83f926..0ed976286d49 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -477,44 +477,91 @@ int paravirt_disable_iospace(void);
ret;\
})
 
+#define PVOP_ALT_CALL(ret, op, alt, cond, clbr, call_clbr, \
+ extra_clbr, ...)  \
+   ({  \
+   PVOP_CALL_ARGS; \
+   PVOP_TEST_NULL(op); \
+   asm volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),   \
+alt, cond) \
+: call_clbr, ASM_CALL_CONSTRAINT   \
+: paravirt_type(op),   \
+  paravirt_clobber(clbr),  \
+  ##__VA_ARGS__\
+: "memory", "cc" extra_clbr);  \
+   ret;\
+   })
+
 #define __PVOP_CALL(rettype, op, ...)  \
PVOP_CALL(PVOP_RETVAL(rettype), op, CLBR_ANY,   \
  PVOP_CALL_CLOBBERS, EXTRA_CLOBBERS, ##__VA_ARGS__)
 
+#define __PVOP_ALT_CALL(rettype, op, alt, cond, ...)   \
+   PVOP_ALT_CALL(PVOP_RETVAL(rettype), op, alt, cond, CLBR_ANY,\
+ PVOP_CALL_CLOBBERS, EXTRA_CLOBBERS,   \
+ ##__VA_ARGS__)
+
 #define __PVOP_CALLEESAVE(rettype, op, ...)\
PVOP_CALL(PVOP_RETVAL(rettype), op.func, CLBR_RET_REG,  \
  PVOP_CALLEE_CLOBBERS, , ##__VA_ARGS__)
 
+#define __PVOP_ALT_CALLEESAVE(rettype, op, alt, cond, ...) \
+   PVOP_ALT_CALL(PVOP_RETVAL(rettype), op.func, alt, cond, \
+ CLBR_RET_REG, PVOP_CALLEE_CLOBBERS, , ##__VA_ARGS__)
+
+
 #define __PVOP_VCALL(op, ...)  \
(void)PVOP_CALL(, op, CLBR_ANY, PVOP_VCALL_CLOBBERS,\
   VEXTRA_CLOBBERS, ##__VA_ARGS__)
 
+#define __PVOP_ALT_VCALL(op, alt, cond, ...)   \
+   (void)PVOP_ALT_CALL(, op, alt, cond, CLBR_ANY,  \
+   PVOP_VCALL_CLOBBERS, VEXTRA_CLOBBERS,   \
+   ##__VA_ARGS__)
+
 #define __PVOP_VCALLEESAVE(op, ...)\
(void)PVOP_CALL(, op.func, CLBR_RET_REG,\
- PVOP_VCALLEE_CLOBBERS, , ##__VA_ARGS__)
+   PVOP_VCALLEE_CLOBBERS, , ##__VA_ARGS__)
 
+#define __PVOP_ALT_VCALLEESAVE(op, alt, cond, ...) \
+   (void)PVOP_ALT_CALL(, op.func, alt, cond, CLBR_RET_REG, \
+   PVOP_VCALLEE_CLOBBERS, , ##__VA_ARGS__)
 
 
 #define PVOP_CALL0(rettype, op)                                        \
__PVOP_CALL(rettype, op)
 #define PVOP_VCALL0(op)                                                \
__PVOP_VCALL(op)
+#define PVOP_ALT_CALL0(rettype, op, alt, cond) \
+   __PVOP_ALT_CALL(rettype, op, alt, cond)
+#define PVOP_ALT_VCALL0(op, alt, cond) \
+   __PVOP_ALT_VCALL(op, alt, cond)
 
 #define PVOP_CALLEE0(rettype, op)  \
__PVOP_CALLEESAVE(rettype, op)
 #define PVOP_VCALLEE0(op)  \
__PVOP_VCALLEESAVE(op)
+#define PVOP_ALT_CALLEE0(rettype, op, alt, cond)   \
+   __PVOP_ALT_CALLEESAVE(rettype, op, alt, cond)
+#define PVOP_ALT_VCALLEE0(op, alt, cond)   \
+   __PVOP_ALT_VCALLEESAVE(op, alt, cond)
 
 
 #define PVOP_CALL1(rettype, op, arg1)  \
__PVOP_CALL(rettype, op, PVOP_CALL_ARG1(arg1))
 #define PVOP_VCALL1(op, arg1)  \
__PVOP_VCALL(op, PVOP_CALL_ARG1(arg1))
+#define PVOP_ALT_VCALL1(op, arg1, alt, cond)   \
+   __PVOP_ALT_VCALL(op, alt, cond, PVOP_CALL_ARG1(arg1))
 
 #define PVOP_CALLEE1(rettype, op, arg1)                                \
 

[PATCH v4 06/15] x86: rework arch_local_irq_restore() to not use popf

2021-01-20 Thread Juergen Gross via Virtualization
"popf" is a rather expensive operation, so don't use it for restoring
irq flags. Instead test whether interrupts are enabled in the flags
parameter and enable interrupts via "sti" in that case.

This results in the restore_fl paravirt op to be no longer needed.

Suggested-by: Andy Lutomirski 
Signed-off-by: Juergen Gross 
---
 arch/x86/include/asm/irqflags.h   | 20 ++-
 arch/x86/include/asm/paravirt.h   |  5 -
 arch/x86/include/asm/paravirt_types.h |  7 ++-
 arch/x86/kernel/irqflags.S| 11 ---
 arch/x86/kernel/paravirt.c|  1 -
 arch/x86/kernel/paravirt_patch.c  |  3 ---
 arch/x86/xen/enlighten_pv.c   |  2 --
 arch/x86/xen/irq.c| 23 --
 arch/x86/xen/xen-asm.S| 28 ---
 arch/x86/xen/xen-ops.h|  1 -
 10 files changed, 8 insertions(+), 93 deletions(-)

diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index e585a4705b8d..144d70ea4393 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -35,15 +35,6 @@ extern __always_inline unsigned long native_save_fl(void)
return flags;
 }
 
-extern inline void native_restore_fl(unsigned long flags);
-extern inline void native_restore_fl(unsigned long flags)
-{
-   asm volatile("push %0 ; popf"
-: /* no output */
-:"g" (flags)
-:"memory", "cc");
-}
-
 static __always_inline void native_irq_disable(void)
 {
asm volatile("cli": : :"memory");
@@ -79,11 +70,6 @@ static __always_inline unsigned long arch_local_save_flags(void)
return native_save_fl();
 }
 
-static __always_inline void arch_local_irq_restore(unsigned long flags)
-{
-   native_restore_fl(flags);
-}
-
 static __always_inline void arch_local_irq_disable(void)
 {
native_irq_disable();
@@ -152,6 +138,12 @@ static __always_inline int arch_irqs_disabled(void)
 
return arch_irqs_disabled_flags(flags);
 }
+
+static __always_inline void arch_local_irq_restore(unsigned long flags)
+{
+   if (!arch_irqs_disabled_flags(flags))
+   arch_local_irq_enable();
+}
 #else
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_XEN_PV
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index dd43b1100a87..4abf110e2243 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -648,11 +648,6 @@ static inline notrace unsigned long arch_local_save_flags(void)
return PVOP_CALLEE0(unsigned long, irq.save_fl);
 }
 
-static inline notrace void arch_local_irq_restore(unsigned long f)
-{
-   PVOP_VCALLEE1(irq.restore_fl, f);
-}
-
 static inline notrace void arch_local_irq_disable(void)
 {
PVOP_VCALLEE0(irq.irq_disable);
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 0169365f1403..de87087d3bde 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -168,16 +168,13 @@ struct pv_cpu_ops {
 struct pv_irq_ops {
 #ifdef CONFIG_PARAVIRT_XXL
/*
-* Get/set interrupt state.  save_fl and restore_fl are only
-* expected to use X86_EFLAGS_IF; all other bits
-* returned from save_fl are undefined, and may be ignored by
-* restore_fl.
+* Get/set interrupt state.  save_fl is expected to use X86_EFLAGS_IF;
+* all other bits returned from save_fl are undefined.
 *
 * NOTE: These functions callers expect the callee to preserve
 * more registers than the standard C calling convention.
 */
struct paravirt_callee_save save_fl;
-   struct paravirt_callee_save restore_fl;
struct paravirt_callee_save irq_disable;
struct paravirt_callee_save irq_enable;
 
diff --git a/arch/x86/kernel/irqflags.S b/arch/x86/kernel/irqflags.S
index 0db0375235b4..8ef35063964b 100644
--- a/arch/x86/kernel/irqflags.S
+++ b/arch/x86/kernel/irqflags.S
@@ -13,14 +13,3 @@ SYM_FUNC_START(native_save_fl)
ret
 SYM_FUNC_END(native_save_fl)
 EXPORT_SYMBOL(native_save_fl)
-
-/*
- * void native_restore_fl(unsigned long flags)
- * %eax/%rdi: flags
- */
-SYM_FUNC_START(native_restore_fl)
-   push %_ASM_ARG1
-   popf
-   ret
-SYM_FUNC_END(native_restore_fl)
-EXPORT_SYMBOL(native_restore_fl)
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 18560b71e717..c60222ab8ab9 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -320,7 +320,6 @@ struct paravirt_patch_template pv_ops = {
 
/* Irq ops. */
.irq.save_fl= __PV_IS_CALLEE_SAVE(native_save_fl),
-   .irq.restore_fl = __PV_IS_CALLEE_SAVE(native_restore_fl),
.irq.irq_disable= __PV_IS_CALLEE_SAVE(native_irq_disable),
.irq.irq_enable = __PV_IS_CALLEE_SAVE(native_irq_enable),
.irq.safe_halt  = native

[PATCH v4 05/15] x86/xen: drop USERGS_SYSRET64 paravirt call

2021-01-20 Thread Juergen Gross via Virtualization
USERGS_SYSRET64 is used to return from a syscall via sysret, but
a Xen PV guest will nevertheless use the iret hypercall, as there
is no sysret PV hypercall defined.

So instead of testing all the prerequisites for doing a sysret and
then mangling the stack for Xen PV again to do an iret, just use the
iret exit from the beginning.

This can easily be done via an ALTERNATIVE, as is already done for
the sysenter compat case.

It should be noted that this drops the optimization in Xen for not
restoring a few registers when returning to user mode, but it seems
as if the saved instructions in the kernel more than compensate for
this drop (a kernel build in a Xen PV guest was slightly faster with
this patch applied).

While at it remove the stale sysret32 remnants.
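The exit-path selection this patch introduces can be sketched as a small userspace C model (the names `pick_exit_path` and the boolean parameters are illustrative, not kernel code): the ALTERNATIVE on X86_FEATURE_XENPV sends Xen PV guests to the iret exit before any of the sysret prerequisites are even tested, while everyone else keeps the existing fast/slow path split.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model only: the real decision is made at patch time by
 * ALTERNATIVE, not by a runtime branch. */
enum exit_path { EXIT_SYSRET, EXIT_IRET };

static enum exit_path pick_exit_path(bool feature_xenpv,
                                     bool sysret_prereqs_ok)
{
    if (feature_xenpv)          /* ALTERNATIVE jump taken for Xen PV */
        return EXIT_IRET;
    if (!sysret_prereqs_ok)     /* unclean user context: slow exit */
        return EXIT_IRET;
    return EXIT_SYSRET;         /* swapgs; sysretq */
}
```

A Xen PV guest therefore never reaches the sysret prerequisite checks, which is exactly why the USERGS_SYSRET64 paravirt call can be dropped.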

Signed-off-by: Juergen Gross 
---
V3:
- simplify ALTERNATIVE (Boris Petkov)
---
 arch/x86/entry/entry_64.S | 16 +++-
 arch/x86/include/asm/irqflags.h   |  6 --
 arch/x86/include/asm/paravirt.h   |  5 -
 arch/x86/include/asm/paravirt_types.h |  8 
 arch/x86/kernel/asm-offsets_64.c  |  2 --
 arch/x86/kernel/paravirt.c|  5 +
 arch/x86/kernel/paravirt_patch.c  |  4 
 arch/x86/xen/enlighten_pv.c   |  1 -
 arch/x86/xen/xen-asm.S| 20 
 arch/x86/xen/xen-ops.h|  2 --
 10 files changed, 8 insertions(+), 61 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index a876204a73e0..ce0464d630a2 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -46,14 +46,6 @@
 .code64
 .section .entry.text, "ax"
 
-#ifdef CONFIG_PARAVIRT_XXL
-SYM_CODE_START(native_usergs_sysret64)
-   UNWIND_HINT_EMPTY
-   swapgs
-   sysretq
-SYM_CODE_END(native_usergs_sysret64)
-#endif /* CONFIG_PARAVIRT_XXL */
-
 /*
  * 64-bit SYSCALL instruction entry. Up to 6 arguments in registers.
  *
@@ -123,7 +115,12 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, 
SYM_L_GLOBAL)
 * Try to use SYSRET instead of IRET if we're returning to
 * a completely clean 64-bit userspace context.  If we're not,
 * go to the slow exit path.
+* In the Xen PV case we must use iret anyway.
 */
+
+   ALTERNATIVE "", "jmp swapgs_restore_regs_and_return_to_usermode", \
+   X86_FEATURE_XENPV
+
movqRCX(%rsp), %rcx
movqRIP(%rsp), %r11
 
@@ -215,7 +212,8 @@ syscall_return_via_sysret:
 
popq%rdi
popq%rsp
-   USERGS_SYSRET64
+   swapgs
+   sysretq
 SYM_CODE_END(entry_SYSCALL_64)
 
 /*
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 8c86edefa115..e585a4705b8d 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -132,12 +132,6 @@ static __always_inline unsigned long 
arch_local_irq_save(void)
 #endif
 
 #define INTERRUPT_RETURN   jmp native_iret
-#define USERGS_SYSRET64\
-   swapgs; \
-   sysretq;
-#define USERGS_SYSRET32\
-   swapgs; \
-   sysretl
 
 #else
 #define INTERRUPT_RETURN   iret
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index f2ebe109a37e..dd43b1100a87 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -776,11 +776,6 @@ extern void default_banner(void);
 
 #ifdef CONFIG_X86_64
 #ifdef CONFIG_PARAVIRT_XXL
-#define USERGS_SYSRET64
\
-   PARA_SITE(PARA_PATCH(PV_CPU_usergs_sysret64),   \
- ANNOTATE_RETPOLINE_SAFE;  \
- jmp PARA_INDIRECT(pv_ops+PV_CPU_usergs_sysret64);)
-
 #ifdef CONFIG_DEBUG_ENTRY
 #define SAVE_FLAGS(clobbers)\
PARA_SITE(PARA_PATCH(PV_IRQ_save_fl),   \
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 130f428b0cc8..0169365f1403 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -156,14 +156,6 @@ struct pv_cpu_ops {
 
u64 (*read_pmc)(int counter);
 
-   /*
-* Switch to usermode gs and return to 64-bit usermode using
-* sysret.  Only used in 64-bit kernels to return to 64-bit
-* processes.  Usermode register state, including %rsp, must
-* already be restored.
-*/
-   void (*usergs_sysret64)(void);
-
/* Normal iret.  Jump to this with the standard iret stack
   frame set up. */
void (*iret)(void);
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index 1354bc30614d..b14533af7676 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -13,8 +13,6 @@ int main(voi

[PATCH v4 09/15] x86: add new features for paravirt patching

2021-01-20 Thread Juergen Gross via Virtualization
In order to switch paravirt patching from special-cased custom code
sequences to ALTERNATIVE handling, some new X86_FEATURE_* flags are
needed. This makes it possible to have the standard indirect pv call
as the default code and to patch it with the non-Xen custom code
sequence via ALTERNATIVE patching later.

Make sure paravirt patching is performed before alternative patching.
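The required ordering can be modeled in a few lines of userspace C (a sketch only; `struct callsite` and the string "code" stand in for real instruction bytes): paravirt patching first turns the indirect call into a direct one, then alternative patching may overwrite the direct call with a short custom sequence keyed on an artificial feature such as X86_FEATURE_PVUNLOCK. Running the stages in the opposite order would re-clobber the custom sequence with a call again.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the two patching stages described above. */
struct callsite { char code[32]; };

static void apply_paravirt_model(struct callsite *cs)
{
    strcpy(cs->code, "call direct");        /* indirect -> direct call */
}

static void apply_alternatives_model(struct callsite *cs, bool feature)
{
    if (feature)
        strcpy(cs->code, "inline sequence"); /* direct call -> custom code */
}

static void patch_in_order(struct callsite *cs, bool feature)
{
    strcpy(cs->code, "call *indirect");  /* default code */
    apply_paravirt_model(cs);            /* step 2 in the commit message */
    apply_alternatives_model(cs, feature); /* step 3 */
}
```

With the feature set, the final bytes are the custom sequence; without it, the direct call survives as the default.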

Signed-off-by: Juergen Gross 
---
V3:
- add comment (Boris Petkov)
- no negative features (Boris Petkov)

V4:
- move paravirt_set_cap() to paravirt-spinlocks.c
---
 arch/x86/include/asm/cpufeatures.h   |  2 ++
 arch/x86/include/asm/paravirt.h  | 10 ++
 arch/x86/kernel/alternative.c| 30 ++--
 arch/x86/kernel/paravirt-spinlocks.c |  9 +
 4 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index 84b887825f12..3ae8944b253a 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -238,6 +238,8 @@
 #define X86_FEATURE_VMW_VMMCALL( 8*32+19) /* "" VMware prefers 
VMMCALL hypercall instruction */
 #define X86_FEATURE_SEV_ES ( 8*32+20) /* AMD Secure Encrypted 
Virtualization - Encrypted State */
 #define X86_FEATURE_VM_PAGE_FLUSH  ( 8*32+21) /* "" VM Page Flush MSR is 
supported */
+#define X86_FEATURE_PVUNLOCK   ( 8*32+22) /* "" PV unlock function */
+#define X86_FEATURE_VCPUPREEMPT( 8*32+23) /* "" PV 
vcpu_is_preempted function */
 
 /* Intel-defined CPU features, CPUID level 0x0007:0 (EBX), word 9 */
 #define X86_FEATURE_FSGSBASE   ( 9*32+ 0) /* RDFSBASE, WRFSBASE, 
RDGSBASE, WRGSBASE instructions*/
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 1e45b46fae84..8c354099d9c3 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -47,6 +47,10 @@ static inline u64 paravirt_steal_clock(int cpu)
return static_call(pv_steal_clock)(cpu);
 }
 
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+void __init paravirt_set_cap(void);
+#endif
+
 /* The paravirtualized I/O functions */
 static inline void slow_down_io(void)
 {
@@ -811,5 +815,11 @@ static inline void paravirt_arch_exit_mmap(struct 
mm_struct *mm)
 {
 }
 #endif
+
+#ifndef CONFIG_PARAVIRT_SPINLOCKS
+static inline void paravirt_set_cap(void)
+{
+}
+#endif
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_X86_PARAVIRT_H */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 0a904fb2678b..221acb2b868a 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 int __read_mostly alternatives_patched;
 
@@ -730,6 +731,33 @@ void __init alternative_instructions(void)
 * patching.
 */
 
+   /*
+* Paravirt patching and alternative patching can be combined to
+* replace a function call with a short direct code sequence (e.g.
+* by setting a constant return value instead of doing that in an
+* external function).
+* In order to make this work the following sequence is required:
+* 1. set (artificial) features depending on used paravirt
+*functions which can later influence alternative patching
+* 2. apply paravirt patching (generally replacing an indirect
+*function call with a direct one)
+* 3. apply alternative patching (e.g. replacing a direct function
+*call with a custom code sequence)
+* Doing paravirt patching after alternative patching would clobber
+* the optimization of the custom code with a function call again.
+*/
+   paravirt_set_cap();
+
+   /*
+* First patch paravirt functions, such that we overwrite the indirect
+* call with the direct call.
+*/
+   apply_paravirt(__parainstructions, __parainstructions_end);
+
+   /*
+* Then patch alternatives, such that those paravirt calls that are in
+* alternatives can be overwritten by their immediate fragments.
+*/
apply_alternatives(__alt_instructions, __alt_instructions_end);
 
 #ifdef CONFIG_SMP
@@ -748,8 +776,6 @@ void __init alternative_instructions(void)
}
 #endif
 
-   apply_paravirt(__parainstructions, __parainstructions_end);
-
restart_nmi();
alternatives_patched = 1;
 }
diff --git a/arch/x86/kernel/paravirt-spinlocks.c 
b/arch/x86/kernel/paravirt-spinlocks.c
index 4f75d0cf6305..9e1ea99ad9df 100644
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c
@@ -32,3 +32,12 @@ bool pv_is_native_vcpu_is_preempted(void)
return pv_ops.lock.vcpu_is_preempted.func ==
__raw_callee_save___native_vcpu_is_preempted;
 }
+
+void __init paravirt_set_cap(void)
+{
+   if (!pv_is_native_spin_unlock())
+   setup_force_cpu_cap(X86_FEATURE_PVUNLOCK

[PATCH v4 00/15] x86: major paravirt cleanup

2021-01-20 Thread Juergen Gross via Virtualization
This is a major cleanup of the paravirt infrastructure aiming at
eliminating all custom code patching via paravirt patching.

This is achieved by using ALTERNATIVE instead, leading to the ability
to give objtool access to the patched in instructions.

In order to remove most of the 32-bit special handling from pvops,
the time-related operations are switched to use static_call() instead.

At the end of this series all paravirt patching has to do is to
replace indirect calls with direct ones. In a further step this could
be switched to static_call(), too, but that would require a major
header file disentangling.
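The static_call() pattern the series moves the time pvops to can be sketched with a userspace stand-in (in the kernel, `static_call_update()` patches a direct call site rather than loading a pointer; the `*_model` names below are inventions for illustration):

```c
#include <assert.h>

/* Stand-in implementations for a pv clock hook. */
static unsigned long native_sched_clock_model(void) { return 100; }
static unsigned long xen_sched_clock_model(void)    { return 200; }

/* Models DEFINE_STATIC_CALL(pv_sched_clock, native_sched_clock). */
static unsigned long (*pv_sched_clock_key)(void) = native_sched_clock_model;

/* Models static_call_update(pv_sched_clock, fn), done once at boot. */
static void static_call_update_model(unsigned long (*fn)(void))
{
    pv_sched_clock_key = fn;
}

/* Models a caller going through static_call(pv_sched_clock)(). */
static unsigned long sched_clock_model(void)
{
    return pv_sched_clock_key();
}
```

The default implementation is used until a hypervisor-specific one is installed, which is the same shape the 32-bit and 64-bit time pvops now share.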

For a clean build without any objtool warnings a modified objtool is
required. Currently this is available in the "tip" tree in the
objtool/core branch.

Changes in V4:
- fixed several build failures
- removed objtool patch, as objtool patches are in tip now
- added patch 1 for making usage of static_call easier
- even more cleanup

Changes in V3:
- added patches 7 and 12
- addressed all comments

Changes in V2:
- added patches 5-12

Juergen Gross (14):
  x86/xen: use specific Xen pv interrupt entry for MCE
  x86/xen: use specific Xen pv interrupt entry for DF
  x86/pv: switch SWAPGS to ALTERNATIVE
  x86/xen: drop USERGS_SYSRET64 paravirt call
  x86: rework arch_local_irq_restore() to not use popf
  x86/paravirt: switch time pvops functions to use static_call()
  x86/alternative: support "not feature" and ALTERNATIVE_TERNARY
  x86: add new features for paravirt patching
  x86/paravirt: remove no longer needed 32-bit pvops cruft
  x86/paravirt: simplify paravirt macros
  x86/paravirt: switch iret pvops to ALTERNATIVE
  x86/paravirt: add new macros PVOP_ALT* supporting pvops in
ALTERNATIVEs
  x86/paravirt: switch functions with custom code to ALTERNATIVE
  x86/paravirt: have only one paravirt patch function

Peter Zijlstra (1):
  static_call: Pull some static_call declarations to the type headers

 arch/x86/Kconfig|   1 +
 arch/x86/entry/entry_32.S   |   4 +-
 arch/x86/entry/entry_64.S   |  28 ++-
 arch/x86/include/asm/alternative-asm.h  |   4 +
 arch/x86/include/asm/alternative.h  |   7 +
 arch/x86/include/asm/cpufeatures.h  |   2 +
 arch/x86/include/asm/idtentry.h |   6 +
 arch/x86/include/asm/irqflags.h |  53 ++
 arch/x86/include/asm/mshyperv.h |   2 +-
 arch/x86/include/asm/paravirt.h | 197 
 arch/x86/include/asm/paravirt_types.h   | 227 +---
 arch/x86/kernel/Makefile|   3 +-
 arch/x86/kernel/alternative.c   |  49 -
 arch/x86/kernel/asm-offsets.c   |   7 -
 arch/x86/kernel/asm-offsets_64.c|   3 -
 arch/x86/kernel/cpu/vmware.c|   5 +-
 arch/x86/kernel/irqflags.S  |  11 --
 arch/x86/kernel/kvm.c   |   2 +-
 arch/x86/kernel/kvmclock.c  |   2 +-
 arch/x86/kernel/paravirt-spinlocks.c|   9 +
 arch/x86/kernel/paravirt.c  |  83 +++--
 arch/x86/kernel/paravirt_patch.c| 109 
 arch/x86/kernel/tsc.c   |   2 +-
 arch/x86/xen/enlighten_pv.c |  36 ++--
 arch/x86/xen/irq.c  |  23 ---
 arch/x86/xen/time.c |  11 +-
 arch/x86/xen/xen-asm.S  |  52 +-
 arch/x86/xen/xen-ops.h  |   3 -
 drivers/clocksource/hyperv_timer.c  |   5 +-
 drivers/xen/time.c  |   2 +-
 include/linux/static_call.h |  20 ---
 include/linux/static_call_types.h   |  27 +++
 tools/include/linux/static_call_types.h |  27 +++
 33 files changed, 376 insertions(+), 646 deletions(-)
 delete mode 100644 arch/x86/kernel/paravirt_patch.c

-- 
2.26.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Doctoral Symposium - CISTI 2021, Chaves, Portugal

2021-01-20 Thread ML
--------------------------------------------------------------------------------
Doctoral Symposium

CISTI 2021 - 16th Iberian Conference on Information Systems and Technologies, 
Chaves, Portugal, 23 - 26 June 2021

http://www.cisti.eu/ 

--------------------------------------------------------------------------------


The purpose of CISTI'2021's Doctoral Symposium is to provide graduate students 
a setting where they can informally present and discuss their work, collecting 
valuable expert opinions and sharing new ideas, methods and applications. The 
Doctoral Symposium is an excellent opportunity for PhD students to present and 
discuss their work in a Workshop format. Each presentation will be evaluated by 
a panel composed of at least three Information Systems and Technologies experts.



Contributions Submission

The Doctoral Symposium is opened to PhD students whose research area includes 
the themes proposed for this Conference. Submissions must include an extended 
abstract (maximum 4 pages), following the Conference style guide. All selected 
contributions will be published with the Conference Proceedings in electronic 
format with ISBN. These contributions will be available in the IEEE Xplore 
Digital Library and will be sent for indexing in ISI, Scopus, EI-Compendex, 
INSPEC and Google Scholar.

Submissions must include the field, the PhD institution and the number of 
months devoted to the development of the work. Additionally, they should 
include in a clear and succinct manner:

   • The problem approached and its significance or relevance
   • The research objectives and related investigation topics
   • A brief display of what is already known
   • A proposed solution methodology for the problem
   • Expected results



Important Dates

Paper submission: February 14, 2021

Notification of acceptance: March 28, 2021

Submission of accepted papers: April 11, 2021

Payment of registration, to ensure the inclusion of an accepted paper in the 
conference proceedings: April 11, 2021



Organizing Committee

Álvaro Rocha, ISEG, Universidade de Lisboa

Francisco García-Peñalvo, Universidad de Salamanca



Scientific Committee

Francisco García-Peñalvo, Universidad de Salamanca (Chair)

A. Augusto Sousa, FEUP, Universidade do Porto

Adérito Fernandes-Marcos, Universidade Aberta

Adolfo Lozano Tello, Universidad de Extremadura

Alicia García Holgado, Universidad de Salamanca

Álvaro Rocha, ISEG, Universidade de Lisboa

Ana Amélia Carvalho, Universidade de Coimbra

António Palma do Reis, ISEG, Universidade de Lisboa

Arnaldo Martins, Universidade de Aveiro

Borja Bordel, Universidad Politécnica de Madrid

Bráulio Alturas, ISCTE - Instituto Universitário de Lisboa

Carina Soledad González, Universidad de La Laguna

Carlos Costa, ISEG, Universidade de Lisboa

Carlos Ferrás Sexto, Universidad de Santiago de Compostela

Cesar Collazos, Universidad del Cauca

Daniel Amo, La Salle, Universidad Ramon Llull

David Fonseca, La Salle, Universitat Ramon Llull

Eduardo Sánchez Vila, Universidade de Santiago de Compostela

Fernando Moreira, Universidade Portucalense

Fernando Ramos, Universidade de Aveiro

Francisco Restivo, Universidade Católica Portuguesa

Gonçalo Paiva Dias, Universidade de Aveiro

João Costa, Universidade de Coimbra

João Manuel R.S. Tavares, FEUP, Universidade do Porto

João Pascoal Faria, FEUP, Universidade do Porto

José Machado, Universidade do Minho

Luis Camarinha-Matos, FCT, Universidade NOVA de Lisboa

Luís Paulo Reis, FEUP, Universidade do Porto

Marcelo Marciszack, Universidad Tecnológica Nacional

Marco Painho, NOVA IMS

María J Lado, Universidade de Vigo

María Pilar Mareca Lopez, Universidad Politécnica de Madrid

Mário Piattini, Universidad de Castilla-La Mancha

Martin Llamas Nistal, Universidad de Vigo

Miguel Casquilho, IST, Universidade de Lisboa

Miguel de Castro Neto, NOVA IMS

Miguel Ramón González-Castro, ENCE

Nelson Rocha, Universidade de Aveiro

Óscar Mealha, Universidade de Aveiro

Paulo Pinto, FCT, Universidade Nova de Lisboa

Ramiro Gonçalves, Universidade de Trás-os-Montes e Alto Douro

Tomas San Feliu, Universidad Politécnica de Madrid

Vitor Santos, NOVA IMS



Website of CISTI'2021: http://www.cisti.eu/ 



CISTI 2021 Team

http://www.cisti.eu/ 



Re: [PATCH v2] drm/virtio: Track total GPU memory for virtio driver

2021-01-20 Thread Daniel Vetter
On Wed, Jan 20, 2021 at 10:51 AM Yiwei Zhang‎  wrote:
>
> On Wed, Jan 20, 2021 at 1:11 AM Daniel Vetter  wrote:
> >
> > On Tue, Jan 19, 2021 at 11:08:12AM -0800, Yiwei Zhang wrote:
> > > On Mon, Jan 18, 2021 at 11:03 PM Daniel Vetter  wrote:
> > > >
> > > > On Tue, Jan 19, 2021 at 12:41 AM Yiwei Zhang  
> > > > wrote:
> > > > >
> > > > > On the success of virtio_gpu_object_create, add size of newly 
> > > > > allocated
> > > > > bo to the tracked total_mem. In drm_gem_object_funcs.free, after the 
> > > > > gem
> > > > > bo lost its last refcount, subtract the bo size from the tracked
> > > > > total_mem if the original underlying memory allocation is successful.
> > > > >
> > > > > Signed-off-by: Yiwei Zhang 
> > > >
> > > > Isn't this something that ideally we'd want for everyone? Also 
> > > > tracepoint
> > > > for showing the total feels like tracepoint abuse, usually we show
> > > > totals somewhere in debugfs or similar, and tracepoint just for what's
> > > > happening (i.e. which object got deleted/created).
> > > >
> > > > What is this for exactly?
> > > > -Daniel
> > > >
> > > > > ---
> > > > >  drivers/gpu/drm/virtio/Kconfig  |  1 +
> > > > >  drivers/gpu/drm/virtio/virtgpu_drv.h|  4 
> > > > >  drivers/gpu/drm/virtio/virtgpu_object.c | 19 +++
> > > > >  3 files changed, 24 insertions(+)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/virtio/Kconfig 
> > > > > b/drivers/gpu/drm/virtio/Kconfig
> > > > > index b925b8b1da16..e103b7e883b1 100644
> > > > > --- a/drivers/gpu/drm/virtio/Kconfig
> > > > > +++ b/drivers/gpu/drm/virtio/Kconfig
> > > > > @@ -5,6 +5,7 @@ config DRM_VIRTIO_GPU
> > > > > select DRM_KMS_HELPER
> > > > > select DRM_GEM_SHMEM_HELPER
> > > > > select VIRTIO_DMA_SHARED_BUFFER
> > > > > +   select TRACE_GPU_MEM
> > > > > help
> > > > >This is the virtual GPU driver for virtio.  It can be used 
> > > > > with
> > > > >QEMU based VMMs (like KVM or Xen).
> > > > > diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
> > > > > b/drivers/gpu/drm/virtio/virtgpu_drv.h
> > > > > index 6a232553c99b..7c60e7486bc4 100644
> > > > > --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
> > > > > +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
> > > > > @@ -249,6 +249,10 @@ struct virtio_gpu_device {
> > > > > spinlock_t resource_export_lock;
> > > > > /* protects map state and host_visible_mm */
> > > > > spinlock_t host_visible_lock;
> > > > > +
> > > > > +#ifdef CONFIG_TRACE_GPU_MEM
> > > > > +   atomic64_t total_mem;
> > > > > +#endif
> > > > >  };
> > > > >
> > > > >  struct virtio_gpu_fpriv {
> > > > > diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c 
> > > > > b/drivers/gpu/drm/virtio/virtgpu_object.c
> > > > > index d69a5b6da553..1e16226cebbe 100644
> > > > > --- a/drivers/gpu/drm/virtio/virtgpu_object.c
> > > > > +++ b/drivers/gpu/drm/virtio/virtgpu_object.c
> > > > > @@ -25,12 +25,29 @@
> > > > >
> > > > >  #include 
> > > > >  #include 
> > > > > +#ifdef CONFIG_TRACE_GPU_MEM
> > > > > +#include 
> > > > > +#endif
> > > > >
> > > > >  #include "virtgpu_drv.h"
> > > > >
> > > > >  static int virtio_gpu_virglrenderer_workaround = 1;
> > > > >  module_param_named(virglhack, virtio_gpu_virglrenderer_workaround, 
> > > > > int, 0400);
> > > > >
> > > > > +#ifdef CONFIG_TRACE_GPU_MEM
> > > > > +static inline void virtio_gpu_trace_total_mem(struct 
> > > > > virtio_gpu_device *vgdev,
> > > > > + s64 delta)
> > > > > +{
> > > > > +   u64 total_mem = atomic64_add_return(delta, &vgdev->total_mem);
> > > > > +
> > > > > +   trace_gpu_mem_total(0, 0, total_mem);
> > > > > +}
> > > > > +#else
> > > > > +static inline void virtio_gpu_trace_total_mem(struct 
> > > > > virtio_gpu_device *, s64)
> > > > > +{
> > > > > +}
> > > > > +#endif
> > > > > +
> > > > >  int virtio_gpu_resource_id_get(struct virtio_gpu_device *vgdev, 
> > > > > uint32_t *resid)
> > > > >  {
> > > > > if (virtio_gpu_virglrenderer_workaround) {
> > > > > @@ -104,6 +121,7 @@ static void virtio_gpu_free_object(struct 
> > > > > drm_gem_object *obj)
> > > > > struct virtio_gpu_device *vgdev = 
> > > > > bo->base.base.dev->dev_private;
> > > > >
> > > > > if (bo->created) {
> > > > > +   virtio_gpu_trace_total_mem(vgdev, -(obj->size));
> > > > > virtio_gpu_cmd_unref_resource(vgdev, bo);
> > > > > virtio_gpu_notify(vgdev);
> > > > > /* completion handler calls 
> > > > > virtio_gpu_cleanup_object() */
> > > > > @@ -265,6 +283,7 @@ int virtio_gpu_object_create(struct 
> > > > > virtio_gpu_device *vgdev,
> > > > > virtio_gpu_object_attach(vgdev, bo, ents, nents);
> > > > > }
> > > > >
> > > > > +   virtio_gpu_trace_total_mem(vgdev, shmem_obj->base.size);
> > > > > *bo_ptr = bo;
> > > > > return 0;
> > > > >
> > > > > --
> > > > > 2.30.0.284.gd98b1dd5eaa7-goog
> > > > >
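The accounting scheme in the patch quoted above can be modeled with C11 atomics in userspace (the kernel driver uses `atomic64_add_return`; `struct gpu_dev_model` and `track_total_mem` are illustrative names, not the driver's): the object size is added on create, subtracted on free, and the running total is what would be reported to the gpu_mem_total tracepoint.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Sketch of the per-device total-memory counter. */
struct gpu_dev_model { _Atomic int64_t total_mem; };

/* Returns the new total after applying the (possibly negative) delta,
 * mirroring atomic64_add_return() semantics. */
static int64_t track_total_mem(struct gpu_dev_model *dev, int64_t delta)
{
    return atomic_fetch_add(&dev->total_mem, delta) + delta;
}
```

Create paths would call this with `+obj->size` and the free path with `-obj->size`, so the counter returns to zero once every object is released.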

Re: [PATCH] virtio-mem: Assign boolean values to a bool variable

2021-01-20 Thread David Hildenbrand
>>> Want to send your ack on this one?
>>
>> Sure
>>
>> Acked-by: David Hildenbrand 
> 
> 
> Added yours and the original Signed-off-by.
> 
> Thanks!

Thanks Michael!


-- 
Thanks,

David / dhildenb



Re: [PATCH] virtio-mem: Assign boolean values to a bool variable

2021-01-20 Thread Michael S. Tsirkin
On Wed, Jan 20, 2021 at 12:14:14PM +0100, David Hildenbrand wrote:
> On 20.01.21 12:03, Michael S. Tsirkin wrote:
> > On Wed, Jan 20, 2021 at 11:04:18AM +0100, David Hildenbrand wrote:
> >> On 20.01.21 10:57, Michael S. Tsirkin wrote:
> >>> On Wed, Jan 20, 2021 at 10:40:37AM +0100, David Hildenbrand wrote:
>  On 20.01.21 08:50, Jiapeng Zhong wrote:
> > Fix the following coccicheck warnings:
> >
> > ./drivers/virtio/virtio_mem.c:2580:2-25: WARNING: Assignment
> > of 0/1 to bool variable.
> >
> > Reported-by: Abaci Robot 
> > Signed-off-by: Jiapeng Zhong 
> > ---
> >  drivers/virtio/virtio_mem.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
> > index 9fc9ec4..85a272c 100644
> > --- a/drivers/virtio/virtio_mem.c
> > +++ b/drivers/virtio/virtio_mem.c
> > @@ -2577,7 +2577,7 @@ static int virtio_mem_probe(struct virtio_device 
> > *vdev)
> >  * actually in use (e.g., trying to reload the driver).
> >  */
> > if (vm->plugged_size) {
> > -   vm->unplug_all_required = 1;
> > +   vm->unplug_all_required = true;
> > dev_info(&vm->vdev->dev, "unplugging all memory is 
> > required\n");
> > }
> >  
> >
> 
>  Hi,
> 
>  we already had a fix on the list for quite a while:
> 
>  https://lkml.kernel.org/r/1609233239-60313-1-git-send-email-tiant...@hisilicon.com
> >>>
> >>> Can't find that one.
> >>
> >> Looks like it was only on virtualization@ and a couple of people on cc.
> >>
> >> https://lists.linuxfoundation.org/pipermail/virtualization/2020-December/051662.html
> >>
> >> Interestingly, I cannot find the follow-up ("[PATCH] virtio-mem: use
> >> boolean value when setting vm->unplug_all_required") in the mailing list
> >> archives, even though it has virtualization@ on cc.
> > 
> > 
> > Unsurprising that I didn't merge it then ;)
> 
> Well, you were on cc ;)

Hmm true. Found it now.

> > Want to send your ack on this one?
> 
> Sure
> 
> Acked-by: David Hildenbrand 


Added yours and the original Signed-off-by.

Thanks!

> 
> -- 
> Thanks,
> 
> David / dhildenb



Re: [PATCH] virtio-mem: Assign boolean values to a bool variable

2021-01-20 Thread David Hildenbrand
On 20.01.21 12:03, Michael S. Tsirkin wrote:
> On Wed, Jan 20, 2021 at 11:04:18AM +0100, David Hildenbrand wrote:
>> On 20.01.21 10:57, Michael S. Tsirkin wrote:
>>> On Wed, Jan 20, 2021 at 10:40:37AM +0100, David Hildenbrand wrote:
 On 20.01.21 08:50, Jiapeng Zhong wrote:
> Fix the following coccicheck warnings:
>
> ./drivers/virtio/virtio_mem.c:2580:2-25: WARNING: Assignment
> of 0/1 to bool variable.
>
> Reported-by: Abaci Robot 
> Signed-off-by: Jiapeng Zhong 
> ---
>  drivers/virtio/virtio_mem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
> index 9fc9ec4..85a272c 100644
> --- a/drivers/virtio/virtio_mem.c
> +++ b/drivers/virtio/virtio_mem.c
> @@ -2577,7 +2577,7 @@ static int virtio_mem_probe(struct virtio_device 
> *vdev)
>* actually in use (e.g., trying to reload the driver).
>*/
>   if (vm->plugged_size) {
> - vm->unplug_all_required = 1;
> + vm->unplug_all_required = true;
>   dev_info(&vm->vdev->dev, "unplugging all memory is required\n");
>   }
>  
>

 Hi,

 we already had a fix on the list for quite a while:

 https://lkml.kernel.org/r/1609233239-60313-1-git-send-email-tiant...@hisilicon.com
>>>
>>> Can't find that one.
>>
>> Looks like it was only on virtualization@ and a couple of people on cc.
>>
>> https://lists.linuxfoundation.org/pipermail/virtualization/2020-December/051662.html
>>
>> Interestingly, I cannot find the follow-up ("[PATCH] virtio-mem: use
>> boolean value when setting vm->unplug_all_required") in the mailing list
>> archives, even though it has virtualization@ on cc.
> 
> 
> Unsurprising that I didn't merge it then ;)

Well, you were on cc ;)

> Want to send your ack on this one?

Sure

Acked-by: David Hildenbrand 


-- 
Thanks,

David / dhildenb



[PATCH v3 4/4] drm/qxl: handle shadow in primary destroy

2021-01-20 Thread Gerd Hoffmann
qxl_primary_atomic_disable must check whether the framebuffer bo has a
shadow surface and, if it does, check the shadow's primary status.
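The shadow indirection the check above needs can be sketched as a minimal C model (struct and function names are illustrative, not the driver's): when a shadow exists, the primary-surface state lives on the shadow bo, so the disable path must follow the pointer before testing and clearing `is_primary`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal model of the fixed disable path. */
struct bo_model {
    struct bo_model *shadow;   /* NULL when no shadow surface exists */
    bool is_primary;
};

/* Returns true when the primary surface was actually destroyed. */
static bool disable_primary_model(struct bo_model *bo)
{
    if (bo->shadow)
        bo = bo->shadow;       /* state lives on the shadow */
    if (bo->is_primary) {
        bo->is_primary = false; /* qxl_io_destroy_primary() in the driver */
        return true;
    }
    return false;
}
```

Without the shadow dereference, the framebuffer bo's own `is_primary` (false here) would be tested and the primary surface would never be destroyed.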

Signed-off-by: Gerd Hoffmann 
---
 drivers/gpu/drm/qxl/qxl_display.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/qxl/qxl_display.c 
b/drivers/gpu/drm/qxl/qxl_display.c
index 60331e31861a..f5ee8cd72b5b 100644
--- a/drivers/gpu/drm/qxl/qxl_display.c
+++ b/drivers/gpu/drm/qxl/qxl_display.c
@@ -562,6 +562,8 @@ static void qxl_primary_atomic_disable(struct drm_plane 
*plane,
if (old_state->fb) {
struct qxl_bo *bo = gem_to_qxl_bo(old_state->fb->obj[0]);
 
+   if (bo->shadow)
+   bo = bo->shadow;
if (bo->is_primary) {
qxl_io_destroy_primary(qdev);
bo->is_primary = false;
-- 
2.29.2



[PATCH v3 2/4] drm/qxl: unpin release objects

2021-01-20 Thread Gerd Hoffmann
Balances the qxl_create_bo(..., pinned=true, ...);
call in qxl_release_bo_alloc().
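The pin/unpin balance being restored here can be modeled in a few lines of C (a sketch with invented names; TTM's real bookkeeping is more involved): an object created pinned must have its pin count dropped exactly once before the final unref, otherwise it is destroyed while still pinned.

```c
#include <assert.h>

/* Toy model of a pinned buffer object's lifetime counters. */
struct pinned_bo_model { int pin_count; int refcount; };

static void create_pinned(struct pinned_bo_model *bo)
{
    bo->refcount = 1;
    bo->pin_count = 1;   /* models qxl_create_bo(..., pinned=true, ...) */
}

/* Returns 1 when the object may be destroyed cleanly. */
static int release_bo(struct pinned_bo_model *bo)
{
    bo->pin_count = 0;   /* the balancing unpin added by this patch */
    bo->refcount--;
    return bo->pin_count == 0 && bo->refcount == 0;
}
```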

Signed-off-by: Gerd Hoffmann 
---
 drivers/gpu/drm/qxl/qxl_release.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/qxl/qxl_release.c 
b/drivers/gpu/drm/qxl/qxl_release.c
index 0fcfc952d5e9..add979cba11b 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -166,6 +166,7 @@ qxl_release_free_list(struct qxl_release *release)
entry = container_of(release->bos.next,
 struct qxl_bo_list, tv.head);
bo = to_qxl_bo(entry->tv.bo);
+   bo->tbo.pin_count = 0; /* ttm_bo_unpin(&bo->tbo); */
qxl_bo_unref(&bo);
list_del(&entry->tv.head);
kfree(entry);
-- 
2.29.2



[PATCH v3 1/4] drm/qxl: use drmm_mode_config_init

2021-01-20 Thread Gerd Hoffmann
Signed-off-by: Gerd Hoffmann 
Reviewed-by: Daniel Vetter 
---
 drivers/gpu/drm/qxl/qxl_display.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_display.c 
b/drivers/gpu/drm/qxl/qxl_display.c
index 012bce0cdb65..38d6b596094d 100644
--- a/drivers/gpu/drm/qxl/qxl_display.c
+++ b/drivers/gpu/drm/qxl/qxl_display.c
@@ -1195,7 +1195,9 @@ int qxl_modeset_init(struct qxl_device *qdev)
int i;
int ret;
 
-   drm_mode_config_init(&qdev->ddev);
+   ret = drmm_mode_config_init(&qdev->ddev);
+   if (ret)
+   return ret;
 
ret = qxl_create_monitors_object(qdev);
if (ret)
@@ -1228,5 +1230,4 @@ int qxl_modeset_init(struct qxl_device *qdev)
 void qxl_modeset_fini(struct qxl_device *qdev)
 {
qxl_destroy_monitors_object(qdev);
-   drm_mode_config_cleanup(&qdev->ddev);
 }
-- 
2.29.2



[PATCH v3 3/4] drm/qxl: release shadow on shutdown

2021-01-20 Thread Gerd Hoffmann
In case we have a shadow surface on shutdown, release it so it
doesn't leak.

Signed-off-by: Gerd Hoffmann 
---
 drivers/gpu/drm/qxl/qxl_display.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/qxl/qxl_display.c 
b/drivers/gpu/drm/qxl/qxl_display.c
index 38d6b596094d..60331e31861a 100644
--- a/drivers/gpu/drm/qxl/qxl_display.c
+++ b/drivers/gpu/drm/qxl/qxl_display.c
@@ -1229,5 +1229,9 @@ int qxl_modeset_init(struct qxl_device *qdev)
 
 void qxl_modeset_fini(struct qxl_device *qdev)
 {
+   if (qdev->dumb_shadow_bo) {
+   drm_gem_object_put(&qdev->dumb_shadow_bo->tbo.base);
+   qdev->dumb_shadow_bo = NULL;
+   }
qxl_destroy_monitors_object(qdev);
 }
-- 
2.29.2



Re: [RFC v3 03/11] vdpa: Remove the restriction that only supports virtio-net devices

2021-01-20 Thread Stefano Garzarella

On Wed, Jan 20, 2021 at 11:46:38AM +0800, Jason Wang wrote:


On 2021/1/19 12:59 PM, Xie Yongji wrote:

With VDUSE, we should be able to support all kinds of virtio devices.

Signed-off-by: Xie Yongji 
---
 drivers/vhost/vdpa.c | 29 +++--
 1 file changed, 3 insertions(+), 26 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 29ed4173f04e..448be7875b6d 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "vhost.h"
@@ -185,26 +186,6 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 
__user *statusp)
return 0;
 }
-static int vhost_vdpa_config_validate(struct vhost_vdpa *v,
- struct vhost_vdpa_config *c)
-{
-   long size = 0;
-
-   switch (v->virtio_id) {
-   case VIRTIO_ID_NET:
-   size = sizeof(struct virtio_net_config);
-   break;
-   }
-
-   if (c->len == 0)
-   return -EINVAL;
-
-   if (c->len > size - c->off)
-   return -E2BIG;
-
-   return 0;
-}



I think we should use a separate patch for this.


For the vdpa-blk simulator I had the same issues and I'm adding a 
.get_config_size() callback to vdpa devices.


Do you think that makes sense, or is it better to remove this check in 
vhost/vdpa, delegating the boundary checks to the get_config/set_config 
callbacks?


Thanks,
Stefano


Re: [PATCH] virtio-mem: Assign boolean values to a bool variable

2021-01-20 Thread Michael S. Tsirkin
On Wed, Jan 20, 2021 at 11:04:18AM +0100, David Hildenbrand wrote:
> On 20.01.21 10:57, Michael S. Tsirkin wrote:
> > On Wed, Jan 20, 2021 at 10:40:37AM +0100, David Hildenbrand wrote:
> >> On 20.01.21 08:50, Jiapeng Zhong wrote:
> >>> Fix the following coccicheck warnings:
> >>>
> >>> ./drivers/virtio/virtio_mem.c:2580:2-25: WARNING: Assignment
> >>> of 0/1 to bool variable.
> >>>
> >>> Reported-by: Abaci Robot 
> >>> Signed-off-by: Jiapeng Zhong 
> >>> ---
> >>>  drivers/virtio/virtio_mem.c | 2 +-
> >>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
> >>> index 9fc9ec4..85a272c 100644
> >>> --- a/drivers/virtio/virtio_mem.c
> >>> +++ b/drivers/virtio/virtio_mem.c
> >>> @@ -2577,7 +2577,7 @@ static int virtio_mem_probe(struct virtio_device 
> >>> *vdev)
> >>>* actually in use (e.g., trying to reload the driver).
> >>>*/
> >>>   if (vm->plugged_size) {
> >>> - vm->unplug_all_required = 1;
> >>> + vm->unplug_all_required = true;
> >>>   dev_info(&vm->vdev->dev, "unplugging all memory is required\n");
> >>>   }
> >>>  
> >>>
> >>
> >> Hi,
> >>
> >> we already had a fix on the list for quite a while:
> >>
> >> https://lkml.kernel.org/r/1609233239-60313-1-git-send-email-tiant...@hisilicon.com
> > 
> > Can't find that one.
> 
> Looks like it was only on virtualization@ and a couple of people on cc.
> 
> https://lists.linuxfoundation.org/pipermail/virtualization/2020-December/051662.html
> 
> Interestingly, I cannot find the follow-up ("[PATCH] virtio-mem: use
> boolean value when setting vm->unplug_all_required") in the mailing list
> archives, even though it has virtualization@ on cc.


Unsurprising that I didn't merge it then ;)
Want to send your ack on this one?

> -- 
> Thanks,
> 
> David / dhildenb



Re: [PATCH] virtio-mem: Assign boolean values to a bool variable

2021-01-20 Thread David Hildenbrand
On 20.01.21 10:57, Michael S. Tsirkin wrote:
> On Wed, Jan 20, 2021 at 10:40:37AM +0100, David Hildenbrand wrote:
>> On 20.01.21 08:50, Jiapeng Zhong wrote:
>>> Fix the following coccicheck warnings:
>>>
>>> ./drivers/virtio/virtio_mem.c:2580:2-25: WARNING: Assignment
>>> of 0/1 to bool variable.
>>>
>>> Reported-by: Abaci Robot 
>>> Signed-off-by: Jiapeng Zhong 
>>> ---
>>>  drivers/virtio/virtio_mem.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
>>> index 9fc9ec4..85a272c 100644
>>> --- a/drivers/virtio/virtio_mem.c
>>> +++ b/drivers/virtio/virtio_mem.c
>>> @@ -2577,7 +2577,7 @@ static int virtio_mem_probe(struct virtio_device 
>>> *vdev)
>>>  * actually in use (e.g., trying to reload the driver).
>>>  */
>>> if (vm->plugged_size) {
>>> -   vm->unplug_all_required = 1;
>>> +   vm->unplug_all_required = true;
>>> dev_info(&vm->vdev->dev, "unplugging all memory is required\n");
>>> }
>>>  
>>>
>>
>> Hi,
>>
>> we already had a fix on the list for quite a while:
>>
>> https://lkml.kernel.org/r/1609233239-60313-1-git-send-email-tiant...@hisilicon.com
> 
> Can't find that one.

Looks like it was only on virtualization@ and a couple of people on cc.

https://lists.linuxfoundation.org/pipermail/virtualization/2020-December/051662.html

Interestingly, I cannot find the follow-up ("[PATCH] virtio-mem: use
boolean value when setting vm->unplug_all_required") in the mailing list
archives, even though it has virtualization@ on cc.

-- 
Thanks,

David / dhildenb



Re: [PATCH] virtio-mem: Assign boolean values to a bool variable

2021-01-20 Thread Michael S. Tsirkin
On Wed, Jan 20, 2021 at 10:40:37AM +0100, David Hildenbrand wrote:
> On 20.01.21 08:50, Jiapeng Zhong wrote:
> > Fix the following coccicheck warnings:
> > 
> > ./drivers/virtio/virtio_mem.c:2580:2-25: WARNING: Assignment
> > of 0/1 to bool variable.
> > 
> > Reported-by: Abaci Robot 
> > Signed-off-by: Jiapeng Zhong 
> > ---
> >  drivers/virtio/virtio_mem.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
> > index 9fc9ec4..85a272c 100644
> > --- a/drivers/virtio/virtio_mem.c
> > +++ b/drivers/virtio/virtio_mem.c
> > @@ -2577,7 +2577,7 @@ static int virtio_mem_probe(struct virtio_device 
> > *vdev)
> >  * actually in use (e.g., trying to reload the driver).
> >  */
> > if (vm->plugged_size) {
> > -   vm->unplug_all_required = 1;
> > +   vm->unplug_all_required = true;
> > dev_info(&vm->vdev->dev, "unplugging all memory is required\n");
> > }
> >  
> > 
> 
> Hi,
> 
> we already had a fix on the list for quite a while:
> 
> https://lkml.kernel.org/r/1609233239-60313-1-git-send-email-tiant...@hisilicon.com

Can't find that one.

> However, looks like Michael queued your patch on the vhost tree instead.
> 
> -- 
> Thanks,
> 
> David / dhildenb



Re: [PATCH] virtio-mem: Assign boolean values to a bool variable

2021-01-20 Thread David Hildenbrand
On 20.01.21 08:50, Jiapeng Zhong wrote:
> Fix the following coccicheck warnings:
> 
> ./drivers/virtio/virtio_mem.c:2580:2-25: WARNING: Assignment
> of 0/1 to bool variable.
> 
> Reported-by: Abaci Robot 
> Signed-off-by: Jiapeng Zhong 
> ---
>  drivers/virtio/virtio_mem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
> index 9fc9ec4..85a272c 100644
> --- a/drivers/virtio/virtio_mem.c
> +++ b/drivers/virtio/virtio_mem.c
> @@ -2577,7 +2577,7 @@ static int virtio_mem_probe(struct virtio_device *vdev)
>* actually in use (e.g., trying to reload the driver).
>*/
>   if (vm->plugged_size) {
> - vm->unplug_all_required = 1;
> + vm->unplug_all_required = true;
>   dev_info(&vm->vdev->dev, "unplugging all memory is required\n");
>   }
>  
> 

Hi,

we already had a fix on the list for quite a while:

https://lkml.kernel.org/r/1609233239-60313-1-git-send-email-tiant...@hisilicon.com

However, looks like Michael queued your patch on the vhost tree instead.

-- 
Thanks,

David / dhildenb



Re: [PATCH v2] drm/virtio: Track total GPU memory for virtio driver

2021-01-20 Thread Daniel Vetter
On Tue, Jan 19, 2021 at 11:08:12AM -0800, Yiwei Zhang wrote:
> On Mon, Jan 18, 2021 at 11:03 PM Daniel Vetter  wrote:
> >
> > On Tue, Jan 19, 2021 at 12:41 AM Yiwei Zhang  wrote:
> > >
> > > On the success of virtio_gpu_object_create, add size of newly allocated
> > > bo to the tracled total_mem. In drm_gem_object_funcs.free, after the gem
> > > bo lost its last refcount, subtract the bo size from the tracked
> > > total_mem if the original underlying memory allocation is successful.
> > >
> > > Signed-off-by: Yiwei Zhang 
> >
> > Isn't this something that ideally we'd do for everyone? Also a tracepoint
> > for showing the total feels like tracepoint abuse, usually we show
> > totals somewhere in debugfs or similar, and tracepoint just for what's
> > happening (i.e. which object got deleted/created).
> >
> > What is this for exactly?
> > -Daniel
> >
> > > ---
> > >  drivers/gpu/drm/virtio/Kconfig  |  1 +
> > >  drivers/gpu/drm/virtio/virtgpu_drv.h|  4 
> > >  drivers/gpu/drm/virtio/virtgpu_object.c | 19 +++
> > >  3 files changed, 24 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/virtio/Kconfig 
> > > b/drivers/gpu/drm/virtio/Kconfig
> > > index b925b8b1da16..e103b7e883b1 100644
> > > --- a/drivers/gpu/drm/virtio/Kconfig
> > > +++ b/drivers/gpu/drm/virtio/Kconfig
> > > @@ -5,6 +5,7 @@ config DRM_VIRTIO_GPU
> > > select DRM_KMS_HELPER
> > > select DRM_GEM_SHMEM_HELPER
> > > select VIRTIO_DMA_SHARED_BUFFER
> > > +   select TRACE_GPU_MEM
> > > help
> > >This is the virtual GPU driver for virtio.  It can be used with
> > >QEMU based VMMs (like KVM or Xen).
> > > diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
> > > b/drivers/gpu/drm/virtio/virtgpu_drv.h
> > > index 6a232553c99b..7c60e7486bc4 100644
> > > --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
> > > +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
> > > @@ -249,6 +249,10 @@ struct virtio_gpu_device {
> > > spinlock_t resource_export_lock;
> > > /* protects map state and host_visible_mm */
> > > spinlock_t host_visible_lock;
> > > +
> > > +#ifdef CONFIG_TRACE_GPU_MEM
> > > +   atomic64_t total_mem;
> > > +#endif
> > >  };
> > >
> > >  struct virtio_gpu_fpriv {
> > > diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c 
> > > b/drivers/gpu/drm/virtio/virtgpu_object.c
> > > index d69a5b6da553..1e16226cebbe 100644
> > > --- a/drivers/gpu/drm/virtio/virtgpu_object.c
> > > +++ b/drivers/gpu/drm/virtio/virtgpu_object.c
> > > @@ -25,12 +25,29 @@
> > >
> > >  #include 
> > >  #include 
> > > +#ifdef CONFIG_TRACE_GPU_MEM
> > > +#include 
> > > +#endif
> > >
> > >  #include "virtgpu_drv.h"
> > >
> > >  static int virtio_gpu_virglrenderer_workaround = 1;
> > >  module_param_named(virglhack, virtio_gpu_virglrenderer_workaround, int, 
> > > 0400);
> > >
> > > +#ifdef CONFIG_TRACE_GPU_MEM
> > > +static inline void virtio_gpu_trace_total_mem(struct virtio_gpu_device 
> > > *vgdev,
> > > + s64 delta)
> > > +{
> > > +   u64 total_mem = atomic64_add_return(delta, &vgdev->total_mem);
> > > +
> > > +   trace_gpu_mem_total(0, 0, total_mem);
> > > +}
> > > +#else
> > > +static inline void virtio_gpu_trace_total_mem(struct virtio_gpu_device 
> > > *, s64)
> > > +{
> > > +}
> > > +#endif
> > > +
> > >  int virtio_gpu_resource_id_get(struct virtio_gpu_device *vgdev, uint32_t 
> > > *resid)
> > >  {
> > > if (virtio_gpu_virglrenderer_workaround) {
> > > @@ -104,6 +121,7 @@ static void virtio_gpu_free_object(struct 
> > > drm_gem_object *obj)
> > > struct virtio_gpu_device *vgdev = bo->base.base.dev->dev_private;
> > >
> > > if (bo->created) {
> > > +   virtio_gpu_trace_total_mem(vgdev, -(obj->size));
> > > virtio_gpu_cmd_unref_resource(vgdev, bo);
> > > virtio_gpu_notify(vgdev);
> > > /* completion handler calls virtio_gpu_cleanup_object() */
> > > @@ -265,6 +283,7 @@ int virtio_gpu_object_create(struct virtio_gpu_device 
> > > *vgdev,
> > > virtio_gpu_object_attach(vgdev, bo, ents, nents);
> > > }
> > >
> > > +   virtio_gpu_trace_total_mem(vgdev, shmem_obj->base.size);
> > > *bo_ptr = bo;
> > > return 0;
> > >
> > > --
> > > 2.30.0.284.gd98b1dd5eaa7-goog
> > >
> >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
> 
> Thanks for your reply! Android Cuttlefish virtual platform is using
> the virtio-gpu driver, and we currently are carrying this small patch
> at the downstream side. This is essential for us because:
> (1) Android has deprecated debugfs on production devices already
> (2) Android GPU drivers are not DRM based, and this won't change in a
> short term.
> 
> Android relies on this tracepoint + eBPF to make the GPU memory totals
> available at runtime on production devices, which has been enforced
> already. Not only game developers c

Re: [PATCH v1] vdpa/mlx5: Fix memory key MTT population

2021-01-20 Thread Michael S. Tsirkin
On Wed, Jan 20, 2021 at 10:11:54AM +0200, Eli Cohen wrote:
> On Wed, Jan 20, 2021 at 02:57:05AM -0500, Michael S. Tsirkin wrote:
> > On Wed, Jan 20, 2021 at 07:36:19AM +0200, Eli Cohen wrote:
> > > On Fri, Jan 08, 2021 at 04:38:55PM +0800, Jason Wang wrote:
> > > 
> > > Hi Michael,
> > > this patch is a fix. Are you going to merge it?
> > 
> > yes - in the next pull request.
> > 
> 
> Great thanks.
> Can you send the path to your git tree where you keep the patches you
> intend to merge?

https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next

Note I often rebase it (e.g. just did).

-- 
MST



Re: [PATCH 5/7] ALSA: virtio: PCM substream operators

2021-01-20 Thread Michael S. Tsirkin
On Wed, Jan 20, 2021 at 01:36:33AM +0100, Anton Yakovlev wrote:
> Introduce the operators required for the operation of substreams.
> 
> Signed-off-by: Anton Yakovlev 
> ---
>  sound/virtio/Makefile |   3 +-
>  sound/virtio/virtio_pcm.c |   5 +-
>  sound/virtio/virtio_pcm.h |   2 +
>  sound/virtio/virtio_pcm_ops.c | 509 ++
>  4 files changed, 517 insertions(+), 2 deletions(-)
>  create mode 100644 sound/virtio/virtio_pcm_ops.c
> 
> diff --git a/sound/virtio/Makefile b/sound/virtio/Makefile
> index 626af3cc3ed7..34493226793f 100644
> --- a/sound/virtio/Makefile
> +++ b/sound/virtio/Makefile
> @@ -6,5 +6,6 @@ virtio_snd-objs := \
>   virtio_card.o \
>   virtio_ctl_msg.o \
>   virtio_pcm.o \
> - virtio_pcm_msg.o
> + virtio_pcm_msg.o \
> + virtio_pcm_ops.o
>  
> diff --git a/sound/virtio/virtio_pcm.c b/sound/virtio/virtio_pcm.c
> index 1ab50dcc88c8..6a1ca6b2c3ca 100644
> --- a/sound/virtio/virtio_pcm.c
> +++ b/sound/virtio/virtio_pcm.c
> @@ -121,7 +121,8 @@ static int virtsnd_pcm_build_hw(struct 
> virtio_pcm_substream *substream,
>   SNDRV_PCM_INFO_MMAP_VALID |
>   SNDRV_PCM_INFO_BATCH |
>   SNDRV_PCM_INFO_BLOCK_TRANSFER |
> - SNDRV_PCM_INFO_INTERLEAVED;
> + SNDRV_PCM_INFO_INTERLEAVED |
> + SNDRV_PCM_INFO_PAUSE;
>  
>   if (!info->channels_min || info->channels_min > info->channels_max) {
>   dev_err(&vdev->dev,
> @@ -503,6 +504,8 @@ int virtsnd_pcm_build_devs(struct virtio_snd *snd)
>   if (rc)
>   return rc;
>   }
> +
> + snd_pcm_set_ops(pcm->pcm, i, &virtsnd_pcm_ops);
>   }
>  
>   return 0;
> diff --git a/sound/virtio/virtio_pcm.h b/sound/virtio/virtio_pcm.h
> index d011b7e1d18d..fe467bc05d8b 100644
> --- a/sound/virtio/virtio_pcm.h
> +++ b/sound/virtio/virtio_pcm.h
> @@ -90,6 +90,8 @@ struct virtio_pcm {
>   struct virtio_pcm_stream streams[SNDRV_PCM_STREAM_LAST + 1];
>  };
>  
> +extern const struct snd_pcm_ops virtsnd_pcm_ops;
> +
>  int virtsnd_pcm_validate(struct virtio_device *vdev);
>  
>  int virtsnd_pcm_parse_cfg(struct virtio_snd *snd);
> diff --git a/sound/virtio/virtio_pcm_ops.c b/sound/virtio/virtio_pcm_ops.c
> new file mode 100644
> index ..8d26c1144ad6
> --- /dev/null
> +++ b/sound/virtio/virtio_pcm_ops.c
> @@ -0,0 +1,509 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Sound card driver for virtio
> + * Copyright (C) 2020  OpenSynergy GmbH
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see .
> + */
> +#include 
> +
> +#include "virtio_card.h"
> +
> +/*
> + * Our main concern here is maintaining the correct state of the underlying 
> I/O
> + * virtqueues. Thus, operators are implemented to support all of the 
> following
> + * possible control paths (excluding all trivial ones):
> + *
> + *+-+
> + *| open()  |<--+
> + *+++   |
> + * v|
> + *  +--+--+ |
> + *   +->| hw_params() |<-+  |
> + *   |  +-+  |  |
> + *   | v |  |
> + *   |   +---+   |  |
> + *   |   | prepare() |<---+  |  |
> + *   |   +---+|  |  |
> + *   | v  |  |  |
> + *   |+-+ |  |  |
> + * +---+  | trigger(START/  | |  |  |
> + * | restore() |  | PAUSE_RELEASE/  |<-+  |  |  |
> + * +---+  | RESUME) |  |  |  |  |
> + *   ^+-+  |  |  |  |
> + *   | v   |  |  |  |
> + *   |   +---+ |  |  |  |
> + *   |   | pointer() | |  |  |  |
> + *   |   +---+ |  |  |  |
> + *   | v   |  |  |  |
> + *   |  +-+|  |  |  |
> + * +---+| trigger

Re: [PATCH 2/7] uapi: virtio_snd: add the sound device header file

2021-01-20 Thread Michael S. Tsirkin
On Wed, Jan 20, 2021 at 03:19:55AM -0500, Michael S. Tsirkin wrote:
> On Wed, Jan 20, 2021 at 01:36:30AM +0100, Anton Yakovlev wrote:
> > The file contains the definitions for the sound device from the OASIS
> > virtio spec.
> > 
> > Signed-off-by: Anton Yakovlev 
> > ---
> >  MAINTAINERS |   6 +
> >  include/uapi/linux/virtio_snd.h | 361 
> >  2 files changed, 367 insertions(+)
> >  create mode 100644 include/uapi/linux/virtio_snd.h
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 00836f6452f0..6dfd59eafe82 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -18936,6 +18936,12 @@ W: https://virtio-mem.gitlab.io/
> >  F: drivers/virtio/virtio_mem.c
> >  F: include/uapi/linux/virtio_mem.h
> >  
> > +VIRTIO SOUND DRIVER
> > +M: Anton Yakovlev 
> > +L: virtualization@lists.linux-foundation.org
> > +S: Maintained
> > +F: include/uapi/linux/virtio_snd.h
> > +
> >  VIRTUAL BOX GUEST DEVICE DRIVER
> >  M: Hans de Goede 
> >  M: Arnd Bergmann 
> 
> You want sound/virtio here too, right?
> I'd just squash this with the next patch in series.


I meant just the MAINTAINERS part. Not a big deal, admittedly ...

> 
> > diff --git a/include/uapi/linux/virtio_snd.h 
> > b/include/uapi/linux/virtio_snd.h
> > new file mode 100644
> > index ..1ff6310e54d6
> > --- /dev/null
> > +++ b/include/uapi/linux/virtio_snd.h
> > @@ -0,0 +1,361 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause */
> > +/*
> > + * Copyright (C) 2020  OpenSynergy GmbH
> > + *
> > + * This header is BSD licensed so anyone can use the definitions to
> > + * implement compatible drivers/servers.
> > + *
> > + * Redistribution and use in source and binary forms, with or without
> > + * modification, are permitted provided that the following conditions
> > + * are met:
> > + * 1. Redistributions of source code must retain the above copyright
> > + *notice, this list of conditions and the following disclaimer.
> > + * 2. Redistributions in binary form must reproduce the above copyright
> > + *notice, this list of conditions and the following disclaimer in the
> > + *documentation and/or other materials provided with the distribution.
> > + * 3. Neither the name of OpenSynergy GmbH nor the names of its 
> > contributors
> > + *may be used to endorse or promote products derived from this software
> > + *without specific prior written permission.
> > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
> > + * FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR
> > + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
> > + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
> > + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
> > + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
> > + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> > + * SUCH DAMAGE.
> > + */
> > +#ifndef VIRTIO_SND_IF_H
> > +#define VIRTIO_SND_IF_H
> > +
> > +#include 
> > +
> > +/***
> > + * CONFIGURATION SPACE
> > + */
> > +struct virtio_snd_config {
> > +   /* # of available physical jacks */
> > +   __le32 jacks;
> > +   /* # of available PCM streams */
> > +   __le32 streams;
> > +   /* # of available channel maps */
> > +   __le32 chmaps;
> > +};
> > +
> > +enum {
> > +   /* device virtqueue indexes */
> > +   VIRTIO_SND_VQ_CONTROL = 0,
> > +   VIRTIO_SND_VQ_EVENT,
> > +   VIRTIO_SND_VQ_TX,
> > +   VIRTIO_SND_VQ_RX,
> > +   /* # of device virtqueues */
> > +   VIRTIO_SND_VQ_MAX
> > +};
> > +
> > +/***
> > + * COMMON DEFINITIONS
> > + */
> > +
> > +/* supported dataflow directions */
> > +enum {
> > +   VIRTIO_SND_D_OUTPUT = 0,
> > +   VIRTIO_SND_D_INPUT
> > +};
> > +
> > +enum {
> > +   /* jack control request types */
> > +   VIRTIO_SND_R_JACK_INFO = 1,
> > +   VIRTIO_SND_R_JACK_REMAP,
> > +
> > +   /* PCM control request types */
> > +   VIRTIO_SND_R_PCM_INFO = 0x0100,
> > +   VIRTIO_SND_R_PCM_SET_PARAMS,
> > +   VIRTIO_SND_R_PCM_PREPARE,
> > +   VIRTIO_SND_R_PCM_RELEASE,
> > +   VIRTIO_SND_R_PCM_START,
> > +   VIRTIO_SND_R_PCM_STOP,
> > +
> > +   /* channel map control request types */
> > +   VIRTIO_SND_R_CHMAP_INFO = 0x0200,
> > +
> > +   /* jack event types */
> > +   VIRTIO_SND_EVT_JACK_CONNECTED = 0x1000,
> > +   VIRTIO_SND_EVT_JACK_DISCONNECTED,
> > +
> > +   /* PCM event types */
> > +   VIRTIO_SND_EVT_PCM_PERIOD_ELAPSED = 0x1100,
> > +   VIRTIO_SND_EVT_PCM_XRUN,
> > +
> > +   /* common status codes */
> > +   VIRTIO_

Re: [PATCH 2/7] uapi: virtio_snd: add the sound device header file

2021-01-20 Thread Michael S. Tsirkin
On Wed, Jan 20, 2021 at 01:36:30AM +0100, Anton Yakovlev wrote:
> The file contains the definitions for the sound device from the OASIS
> virtio spec.
> 
> Signed-off-by: Anton Yakovlev 
> ---
>  MAINTAINERS |   6 +
>  include/uapi/linux/virtio_snd.h | 361 
>  2 files changed, 367 insertions(+)
>  create mode 100644 include/uapi/linux/virtio_snd.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 00836f6452f0..6dfd59eafe82 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -18936,6 +18936,12 @@ W:   https://virtio-mem.gitlab.io/
>  F:   drivers/virtio/virtio_mem.c
>  F:   include/uapi/linux/virtio_mem.h
>  
> +VIRTIO SOUND DRIVER
> +M:   Anton Yakovlev 
> +L:   virtualization@lists.linux-foundation.org
> +S:   Maintained
> +F:   include/uapi/linux/virtio_snd.h
> +
>  VIRTUAL BOX GUEST DEVICE DRIVER
>  M:   Hans de Goede 
>  M:   Arnd Bergmann 

Who's merging this driver? Me? If so pls add m...@redhat.com so I'm copied
on patches.

> diff --git a/include/uapi/linux/virtio_snd.h b/include/uapi/linux/virtio_snd.h
> new file mode 100644
> index ..1ff6310e54d6
> --- /dev/null
> +++ b/include/uapi/linux/virtio_snd.h
> @@ -0,0 +1,361 @@
> +/* SPDX-License-Identifier: BSD-3-Clause */
> +/*
> + * Copyright (C) 2020  OpenSynergy GmbH
> + *
> + * This header is BSD licensed so anyone can use the definitions to
> + * implement compatible drivers/servers.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *notice, this list of conditions and the following disclaimer in the
> + *documentation and/or other materials provided with the distribution.
> + * 3. Neither the name of OpenSynergy GmbH nor the names of its contributors
> + *may be used to endorse or promote products derived from this software
> + *without specific prior written permission.
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
> + * FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR
> + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
> + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
> + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
> + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
> + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +#ifndef VIRTIO_SND_IF_H
> +#define VIRTIO_SND_IF_H
> +
> +#include 
> +
> +/***
> + * CONFIGURATION SPACE
> + */
> +struct virtio_snd_config {
> + /* # of available physical jacks */
> + __le32 jacks;
> + /* # of available PCM streams */
> + __le32 streams;
> + /* # of available channel maps */
> + __le32 chmaps;
> +};
> +
> +enum {
> + /* device virtqueue indexes */
> + VIRTIO_SND_VQ_CONTROL = 0,
> + VIRTIO_SND_VQ_EVENT,
> + VIRTIO_SND_VQ_TX,
> + VIRTIO_SND_VQ_RX,
> + /* # of device virtqueues */
> + VIRTIO_SND_VQ_MAX
> +};
> +
> +/***
> + * COMMON DEFINITIONS
> + */
> +
> +/* supported dataflow directions */
> +enum {
> + VIRTIO_SND_D_OUTPUT = 0,
> + VIRTIO_SND_D_INPUT
> +};
> +
> +enum {
> + /* jack control request types */
> + VIRTIO_SND_R_JACK_INFO = 1,
> + VIRTIO_SND_R_JACK_REMAP,
> +
> + /* PCM control request types */
> + VIRTIO_SND_R_PCM_INFO = 0x0100,
> + VIRTIO_SND_R_PCM_SET_PARAMS,
> + VIRTIO_SND_R_PCM_PREPARE,
> + VIRTIO_SND_R_PCM_RELEASE,
> + VIRTIO_SND_R_PCM_START,
> + VIRTIO_SND_R_PCM_STOP,
> +
> + /* channel map control request types */
> + VIRTIO_SND_R_CHMAP_INFO = 0x0200,
> +
> + /* jack event types */
> + VIRTIO_SND_EVT_JACK_CONNECTED = 0x1000,
> + VIRTIO_SND_EVT_JACK_DISCONNECTED,
> +
> + /* PCM event types */
> + VIRTIO_SND_EVT_PCM_PERIOD_ELAPSED = 0x1100,
> + VIRTIO_SND_EVT_PCM_XRUN,
> +
> + /* common status codes */
> + VIRTIO_SND_S_OK = 0x8000,
> + VIRTIO_SND_S_BAD_MSG,
> + VIRTIO_SND_S_NOT_SUPP,
> + VIRTIO_SND_S_IO_ERR
> +};
> +
> +/* common header */
> +struct virtio_snd_hdr {
> + __le32 code;
> +};
> +
> +/* event notification */
> +struct virtio_snd_event {
> + /* VIRTIO_SND_EVT_XXX */
> + struct virtio_snd_hdr hdr;
> + 

Re: [PATCH 4/7] ALSA: virtio: handling control and I/O messages for the PCM device

2021-01-20 Thread Michael S. Tsirkin
On Wed, Jan 20, 2021 at 01:36:32AM +0100, Anton Yakovlev wrote:
> The driver implements a message-based transport for I/O substream
> operations. Before the start of the substream, the hardware buffer is
> sliced into I/O messages, the number of which is equal to the current
> number of periods. The size of each message is equal to the current
> size of one period.
> 
> I/O messages are organized in an ordered queue. The completion of the
> I/O message indicates an expired period (the only exception is the end
> of the stream for the capture substream). Upon completion, the message
> is automatically re-added to the end of the queue.
> 
> When an I/O message is completed, the hw_ptr value is incremented
> unconditionally (to ensure that the hw_ptr always correctly reflects
> the state of the messages in the virtqueue). Due to its asynchronous
> nature, a message can be completed when the runtime structure no longer
> exists. For this reason, the values from this structure are cached in
> the virtio substream, which are required to calculate the new value of
> the hw_ptr.
> 
> Signed-off-by: Anton Yakovlev 
> ---
>  sound/virtio/Makefile |   3 +-
>  sound/virtio/virtio_card.c|  33 
>  sound/virtio/virtio_card.h|   9 +
>  sound/virtio/virtio_pcm.c |   3 +
>  sound/virtio/virtio_pcm.h |  31 
>  sound/virtio/virtio_pcm_msg.c | 317 ++
>  6 files changed, 395 insertions(+), 1 deletion(-)
>  create mode 100644 sound/virtio/virtio_pcm_msg.c
> 
> diff --git a/sound/virtio/Makefile b/sound/virtio/Makefile
> index 69162a545a41..626af3cc3ed7 100644
> --- a/sound/virtio/Makefile
> +++ b/sound/virtio/Makefile
> @@ -5,5 +5,6 @@ obj-$(CONFIG_SND_VIRTIO) += virtio_snd.o
>  virtio_snd-objs := \
>   virtio_card.o \
>   virtio_ctl_msg.o \
> - virtio_pcm.o
> + virtio_pcm.o \
> + virtio_pcm_msg.o
>  
> diff --git a/sound/virtio/virtio_card.c b/sound/virtio/virtio_card.c
> index 293d497f24e7..dc703fc662f5 100644
> --- a/sound/virtio/virtio_card.c
> +++ b/sound/virtio/virtio_card.c
> @@ -145,6 +145,12 @@ static int virtsnd_find_vqs(struct virtio_snd *snd)
>   callbacks[VIRTIO_SND_VQ_CONTROL] = virtsnd_ctl_notify_cb;
>   callbacks[VIRTIO_SND_VQ_EVENT] = virtsnd_event_notify_cb;
>  
> + virtio_cread(vdev, struct virtio_snd_config, streams, &n);
> + if (n) {
> + callbacks[VIRTIO_SND_VQ_TX] = virtsnd_pcm_tx_notify_cb;
> + callbacks[VIRTIO_SND_VQ_RX] = virtsnd_pcm_rx_notify_cb;
> + }
> +
>   rc = virtio_find_vqs(vdev, VIRTIO_SND_VQ_MAX, vqs, callbacks, names,
>NULL);
>   if (rc) {
> @@ -186,15 +192,42 @@ static int virtsnd_find_vqs(struct virtio_snd *snd)
>   * virtsnd_enable_vqs() - Enable the event, tx and rx virtqueues.
>   * @snd: VirtIO sound device.
>   *
> + * The tx queue is enabled only if the device supports playback stream(s).
> + *
> + * The rx queue is enabled only if the device supports capture stream(s).
> + *
>   * Context: Any context.
>   */
>  static void virtsnd_enable_vqs(struct virtio_snd *snd)
>  {
> + struct virtio_device *vdev = snd->vdev;
>   struct virtqueue *vqueue;
> + struct virtio_pcm *pcm;
> + unsigned int npbs = 0;
> + unsigned int ncps = 0;
>  
>   vqueue = snd->queues[VIRTIO_SND_VQ_EVENT].vqueue;
>   if (!virtqueue_enable_cb(vqueue))
>   virtsnd_event_notify_cb(vqueue);
> +
> + list_for_each_entry(pcm, &snd->pcm_list, list) {
> + npbs += pcm->streams[SNDRV_PCM_STREAM_PLAYBACK].nsubstreams;
> + ncps += pcm->streams[SNDRV_PCM_STREAM_CAPTURE].nsubstreams;
> + }
> +
> + if (npbs) {
> + vqueue = snd->queues[VIRTIO_SND_VQ_TX].vqueue;
> + if (!virtqueue_enable_cb(vqueue))
> + dev_warn(&vdev->dev,
> +  "suspicious notification in the TX queue\n");
> + }
> +
> + if (ncps) {
> + vqueue = snd->queues[VIRTIO_SND_VQ_RX].vqueue;
> + if (!virtqueue_enable_cb(vqueue))
> + dev_warn(&vdev->dev,
> +  "suspicious notification in the RX queue\n");
> + }

Not sure how all this prevents use of the same vq from multiple threads ...
And why are we sure there are no buffers yet?  If that is because
nothing has happened yet, then I'd also like to point out that a vq starts
out with callbacks enabled, so you don't need to do that first thing ...



>  }
>  
>  /**
> diff --git a/sound/virtio/virtio_card.h b/sound/virtio/virtio_card.h
> index be6651a6aaf8..b11c09984882 100644
> --- a/sound/virtio/virtio_card.h
> +++ b/sound/virtio/virtio_card.h
> @@ -89,4 +89,13 @@ virtsnd_rx_queue(struct virtio_snd *snd)
>   return &snd->queues[VIRTIO_SND_VQ_RX];
>  }
>  
> +static inline struct virtio_snd_queue *
> +virtsnd_pcm_queue(struct virtio_pcm_substream *substream)
> +{
> + if (substream->direction == SNDRV_PCM_STREAM_PLAYBACK)
> + ret

Re: [PATCH 3/7] ALSA: virtio: add virtio sound driver

2021-01-20 Thread Michael S. Tsirkin
On Wed, Jan 20, 2021 at 01:36:31AM +0100, Anton Yakovlev wrote:
> Introduce the skeleton of the virtio sound driver. The driver implements
> the virtio sound device specification, which has become part of the
> virtio standard.
> 
> It performs initial initialization of the device and virtqueues and
> creates an empty ALSA sound device. It also handles the
> DEVICE_NEEDS_RESET device status.
> 
> Signed-off-by: Anton Yakovlev 
> ---
>  MAINTAINERS   |   2 +
>  sound/Kconfig |   2 +
>  sound/Makefile|   3 +-
>  sound/virtio/Kconfig  |  10 +
>  sound/virtio/Makefile |   9 +
>  sound/virtio/virtio_card.c| 473 ++
>  sound/virtio/virtio_card.h|  92 ++
>  sound/virtio/virtio_ctl_msg.c | 293 +++
>  sound/virtio/virtio_ctl_msg.h | 122 
>  sound/virtio/virtio_pcm.c | 536 ++
>  sound/virtio/virtio_pcm.h |  89 ++
>  11 files changed, 1630 insertions(+), 1 deletion(-)
>  create mode 100644 sound/virtio/Kconfig
>  create mode 100644 sound/virtio/Makefile
>  create mode 100644 sound/virtio/virtio_card.c
>  create mode 100644 sound/virtio/virtio_card.h
>  create mode 100644 sound/virtio/virtio_ctl_msg.c
>  create mode 100644 sound/virtio/virtio_ctl_msg.h
>  create mode 100644 sound/virtio/virtio_pcm.c
>  create mode 100644 sound/virtio/virtio_pcm.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 6dfd59eafe82..8a0e9f04402f 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -18939,8 +18939,10 @@ F:   include/uapi/linux/virtio_mem.h
>  VIRTIO SOUND DRIVER
>  M:   Anton Yakovlev 
>  L:   virtualization@lists.linux-foundation.org
> +L:   alsa-de...@alsa-project.org (moderated for non-subscribers)
>  S:   Maintained
>  F:   include/uapi/linux/virtio_snd.h
> +F:   sound/virtio/*
>  
>  VIRTUAL BOX GUEST DEVICE DRIVER
>  M:   Hans de Goede 
> diff --git a/sound/Kconfig b/sound/Kconfig
> index 36785410fbe1..e56d96d2b11c 100644
> --- a/sound/Kconfig
> +++ b/sound/Kconfig
> @@ -99,6 +99,8 @@ source "sound/synth/Kconfig"
>  
>  source "sound/xen/Kconfig"
>  
> +source "sound/virtio/Kconfig"
> +
>  endif # SND
>  
>  endif # !UML
> diff --git a/sound/Makefile b/sound/Makefile
> index 797ecdcd35e2..04ef04b1168f 100644
> --- a/sound/Makefile
> +++ b/sound/Makefile
> @@ -5,7 +5,8 @@
>  obj-$(CONFIG_SOUND) += soundcore.o
>  obj-$(CONFIG_DMASOUND) += oss/dmasound/
>  obj-$(CONFIG_SND) += core/ i2c/ drivers/ isa/ pci/ ppc/ arm/ sh/ synth/ usb/ \
> - firewire/ sparc/ spi/ parisc/ pcmcia/ mips/ soc/ atmel/ hda/ x86/ xen/
> + firewire/ sparc/ spi/ parisc/ pcmcia/ mips/ soc/ atmel/ hda/ x86/ xen/ \
> + virtio/
>  obj-$(CONFIG_SND_AOA) += aoa/
>  
>  # This one must be compilable even if sound is configured out
> diff --git a/sound/virtio/Kconfig b/sound/virtio/Kconfig
> new file mode 100644
> index ..094cba24ee5b
> --- /dev/null
> +++ b/sound/virtio/Kconfig
> @@ -0,0 +1,10 @@
> +# SPDX-License-Identifier: GPL-2.0+
> +# Sound card driver for virtio
> +
> +config SND_VIRTIO
> + tristate "Virtio sound driver"
> + depends on VIRTIO
> + select SND_PCM
> + select SND_JACK
> + help
> +  This is the virtual sound driver for virtio. Say Y or M.
> diff --git a/sound/virtio/Makefile b/sound/virtio/Makefile
> new file mode 100644
> index ..69162a545a41
> --- /dev/null
> +++ b/sound/virtio/Makefile
> @@ -0,0 +1,9 @@
> +# SPDX-License-Identifier: GPL-2.0+
> +
> +obj-$(CONFIG_SND_VIRTIO) += virtio_snd.o
> +
> +virtio_snd-objs := \
> + virtio_card.o \
> + virtio_ctl_msg.o \
> + virtio_pcm.o
> +
> diff --git a/sound/virtio/virtio_card.c b/sound/virtio/virtio_card.c
> new file mode 100644
> index ..293d497f24e7
> --- /dev/null
> +++ b/sound/virtio/virtio_card.c
> @@ -0,0 +1,473 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Sound card driver for virtio
> + * Copyright (C) 2020  OpenSynergy GmbH
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see .
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "virtio_card.h"
> +
> +int msg_timeout_ms = MSEC_PER_SEC;
> +module_param(msg_timeout_ms, int, 0644);
> +MODULE_PARM_DESC(msg_timeout_ms, "Message completion timeout in milliseconds");
> +
> +static int virtsnd_probe(struct virtio_device *vdev);
> +static v

Re: [PATCH 2/7] uapi: virtio_snd: add the sound device header file

2021-01-20 Thread Michael S. Tsirkin
On Wed, Jan 20, 2021 at 01:36:30AM +0100, Anton Yakovlev wrote:
> The file contains the definitions for the sound device from the OASIS
> virtio spec.
> 
> Signed-off-by: Anton Yakovlev 
> ---
>  MAINTAINERS |   6 +
>  include/uapi/linux/virtio_snd.h | 361 
>  2 files changed, 367 insertions(+)
>  create mode 100644 include/uapi/linux/virtio_snd.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 00836f6452f0..6dfd59eafe82 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -18936,6 +18936,12 @@ W:   https://virtio-mem.gitlab.io/
>  F:   drivers/virtio/virtio_mem.c
>  F:   include/uapi/linux/virtio_mem.h
>  
> +VIRTIO SOUND DRIVER
> +M:   Anton Yakovlev 
> +L:   virtualization@lists.linux-foundation.org
> +S:   Maintained
> +F:   include/uapi/linux/virtio_snd.h
> +
>  VIRTUAL BOX GUEST DEVICE DRIVER
>  M:   Hans de Goede 
>  M:   Arnd Bergmann 

You want sound/virtio here too, right?
I'd just squash this with the next patch in series.

> diff --git a/include/uapi/linux/virtio_snd.h b/include/uapi/linux/virtio_snd.h
> new file mode 100644
> index ..1ff6310e54d6
> --- /dev/null
> +++ b/include/uapi/linux/virtio_snd.h
> @@ -0,0 +1,361 @@
> +/* SPDX-License-Identifier: BSD-3-Clause */
> +/*
> + * Copyright (C) 2020  OpenSynergy GmbH
> + *
> + * This header is BSD licensed so anyone can use the definitions to
> + * implement compatible drivers/servers.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *notice, this list of conditions and the following disclaimer in the
> + *documentation and/or other materials provided with the distribution.
> + * 3. Neither the name of OpenSynergy GmbH nor the names of its contributors
> + *may be used to endorse or promote products derived from this software
> + *without specific prior written permission.
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
> + * FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR
> + * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
> + * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
> + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
> + * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
> + * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +#ifndef VIRTIO_SND_IF_H
> +#define VIRTIO_SND_IF_H
> +
> +#include 
> +
> +/***
> + * CONFIGURATION SPACE
> + */
> +struct virtio_snd_config {
> + /* # of available physical jacks */
> + __le32 jacks;
> + /* # of available PCM streams */
> + __le32 streams;
> + /* # of available channel maps */
> + __le32 chmaps;
> +};
> +
> +enum {
> + /* device virtqueue indexes */
> + VIRTIO_SND_VQ_CONTROL = 0,
> + VIRTIO_SND_VQ_EVENT,
> + VIRTIO_SND_VQ_TX,
> + VIRTIO_SND_VQ_RX,
> + /* # of device virtqueues */
> + VIRTIO_SND_VQ_MAX
> +};
> +
> +/***
> + * COMMON DEFINITIONS
> + */
> +
> +/* supported dataflow directions */
> +enum {
> + VIRTIO_SND_D_OUTPUT = 0,
> + VIRTIO_SND_D_INPUT
> +};
> +
> +enum {
> + /* jack control request types */
> + VIRTIO_SND_R_JACK_INFO = 1,
> + VIRTIO_SND_R_JACK_REMAP,
> +
> + /* PCM control request types */
> + VIRTIO_SND_R_PCM_INFO = 0x0100,
> + VIRTIO_SND_R_PCM_SET_PARAMS,
> + VIRTIO_SND_R_PCM_PREPARE,
> + VIRTIO_SND_R_PCM_RELEASE,
> + VIRTIO_SND_R_PCM_START,
> + VIRTIO_SND_R_PCM_STOP,
> +
> + /* channel map control request types */
> + VIRTIO_SND_R_CHMAP_INFO = 0x0200,
> +
> + /* jack event types */
> + VIRTIO_SND_EVT_JACK_CONNECTED = 0x1000,
> + VIRTIO_SND_EVT_JACK_DISCONNECTED,
> +
> + /* PCM event types */
> + VIRTIO_SND_EVT_PCM_PERIOD_ELAPSED = 0x1100,
> + VIRTIO_SND_EVT_PCM_XRUN,
> +
> + /* common status codes */
> + VIRTIO_SND_S_OK = 0x8000,
> + VIRTIO_SND_S_BAD_MSG,
> + VIRTIO_SND_S_NOT_SUPP,
> + VIRTIO_SND_S_IO_ERR
> +};
> +
> +/* common header */
> +struct virtio_snd_hdr {
> + __le32 code;
> +};
> +
> +/* event notification */
> +struct virtio_snd_event {
> + /* VIRTIO_SND_EVT_XXX */
> + struct virtio_snd_hdr hdr;

Re: [PATCH net-next v2 3/3] xsk: build skb by page

2021-01-20 Thread Michael S. Tsirkin
On Wed, Jan 20, 2021 at 03:11:04AM -0500, Michael S. Tsirkin wrote:
> On Wed, Jan 20, 2021 at 03:50:01PM +0800, Xuan Zhuo wrote:
> > This patch constructs the skb from pages to save memory copy
> > overhead.
> > 
> > This function relies on IFF_TX_SKB_NO_LINEAR. Only network cards whose
> > priv_flags include IFF_TX_SKB_NO_LINEAR will use pages to construct
> > the skb directly. If this feature is not supported, it is still
> > necessary to copy the data to construct the skb.
> > 
> >  Performance Testing 
> > 
> > The test environment is Aliyun ECS server.
> > Test cmd:
> > ```
> > xdpsock -i eth0 -t  -S -s 
> > ```
> > 
> > Test result data:
> > 
> > size     64       512      1024     1500
> > copy     1916747  1775988  1600203  1440054
> > page     1974058  1953655  1945463  1904478
> > percent  3.0%     10.0%    21.58%   32.3%
> > 
> > Signed-off-by: Xuan Zhuo 
> > Reviewed-by: Dust Li 
> 
> I can't see the cover letter or 1/3 in this series - was probably
> threaded incorrectly?


Hmm looked again and now I do see them. My mistake pls ignore.

> 
> > ---
> >  net/xdp/xsk.c | 104 
> > --
> >  1 file changed, 86 insertions(+), 18 deletions(-)
> > 
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index 8037b04..817a3a5 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct sk_buff *skb)
> > sock_wfree(skb);
> >  }
> >  
> > +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> > + struct xdp_desc *desc)
> > +{
> > +   u32 len, offset, copy, copied;
> > +   struct sk_buff *skb;
> > +   struct page *page;
> > +   char *buffer;
> 
> Actually, make this void *; that way you will not need
> casts down the road. I know this is from xsk_generic_xmit -
> I don't know why it's char * there, either.
> 
> > +   int err, i;
> > +   u64 addr;
> > +
> > +   skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);
> > +   if (unlikely(!skb))
> > +   return ERR_PTR(err);
> > +
> > +   addr = desc->addr;
> > +   len = desc->len;
> > +
> > +   buffer = xsk_buff_raw_get_data(xs->pool, addr);
> > +   offset = offset_in_page(buffer);
> > +   addr = buffer - (char *)xs->pool->addrs;
> > +
> > +   for (copied = 0, i = 0; copied < len; i++) {
> > +   page = xs->pool->umem->pgs[addr >> PAGE_SHIFT];
> > +
> > +   get_page(page);
> > +
> > +   copy = min_t(u32, PAGE_SIZE - offset, len - copied);
> > +
> > +   skb_fill_page_desc(skb, i, page, offset, copy);
> > +
> > +   copied += copy;
> > +   addr += copy;
> > +   offset = 0;
> > +   }
> > +
> > +   skb->len += len;
> > +   skb->data_len += len;
> > +   skb->truesize += len;
> > +
> > +   refcount_add(len, &xs->sk.sk_wmem_alloc);
> > +
> > +   return skb;
> > +}
> > +
> > +static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> > +struct xdp_desc *desc)
> > +{
> > +   struct sk_buff *skb = NULL;
> > +
> > +   if (xs->dev->priv_flags & IFF_TX_SKB_NO_LINEAR) {
> > +   skb = xsk_build_skb_zerocopy(xs, desc);
> > +   if (IS_ERR(skb))
> > +   return skb;
> > +   } else {
> > +   char *buffer;
> > +   u32 len;
> > +   int err;
> > +
> > +   len = desc->len;
> > +   skb = sock_alloc_send_skb(&xs->sk, len, 1, &err);
> > +   if (unlikely(!skb))
> > +   return ERR_PTR(err);
> > +
> > +   skb_put(skb, len);
> > +   buffer = xsk_buff_raw_get_data(xs->pool, desc->addr);
> > +   err = skb_store_bits(skb, 0, buffer, len);
> > +   if (unlikely(err)) {
> > +   kfree_skb(skb);
> > +   return ERR_PTR(err);
> > +   }
> > +   }
> > +
> > +   skb->dev = xs->dev;
> > +   skb->priority = xs->sk.sk_priority;
> > +   skb->mark = xs->sk.sk_mark;
> > +   skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr;
> > +   skb->destructor = xsk_destruct_skb;
> > +
> > +   return skb;
> > +}
> > +
> >  static int xsk_generic_xmit(struct sock *sk)
> >  {
> > struct xdp_sock *xs = xdp_sk(sk);
> > @@ -446,43 +527,30 @@ static int xsk_generic_xmit(struct sock *sk)
> > goto out;
> >  
> > while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
> > -   char *buffer;
> > -   u64 addr;
> > -   u32 len;
> > -
> > if (max_batch-- == 0) {
> > err = -EAGAIN;
> > goto out;
> > }
> >  
> > -   len = desc.len;
> > -   skb = sock_alloc_send_skb(sk, len, 1, &err);
> > -   if (unlikely(!skb))
> > +   skb = xsk_build_skb(xs, &desc);
> > +   if (IS_ERR(skb)) {
> > +   err = PTR_ERR(skb);
> > goto out;
> > +   }
> >  
> > -   skb_put(skb, len);
> > -   addr = desc.addr;
> > -  

Re: [PATCH net-next v2 3/3] xsk: build skb by page

2021-01-20 Thread Michael S. Tsirkin
On Wed, Jan 20, 2021 at 03:50:01PM +0800, Xuan Zhuo wrote:
> This patch constructs the skb from pages to save memory copy
> overhead.
> 
> This function relies on IFF_TX_SKB_NO_LINEAR. Only network cards whose
> priv_flags include IFF_TX_SKB_NO_LINEAR will use pages to construct
> the skb directly. If this feature is not supported, it is still
> necessary to copy the data to construct the skb.
> 
>  Performance Testing 
> 
> The test environment is Aliyun ECS server.
> Test cmd:
> ```
> xdpsock -i eth0 -t  -S -s 
> ```
> 
> Test result data:
> 
> size     64       512      1024     1500
> copy     1916747  1775988  1600203  1440054
> page     1974058  1953655  1945463  1904478
> percent  3.0%     10.0%    21.58%   32.3%
> 
> Signed-off-by: Xuan Zhuo 
> Reviewed-by: Dust Li 

I can't see the cover letter or 1/3 in this series - was probably
threaded incorrectly?


> ---
>  net/xdp/xsk.c | 104 
> --
>  1 file changed, 86 insertions(+), 18 deletions(-)
> 
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 8037b04..817a3a5 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct sk_buff *skb)
>   sock_wfree(skb);
>  }
>  
> +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> +   struct xdp_desc *desc)
> +{
> + u32 len, offset, copy, copied;
> + struct sk_buff *skb;
> + struct page *page;
> + char *buffer;

Actually, make this void *; that way you will not need
casts down the road. I know this is from xsk_generic_xmit -
I don't know why it's char * there, either.

> + int err, i;
> + u64 addr;
> +
> + skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);
> + if (unlikely(!skb))
> + return ERR_PTR(err);
> +
> + addr = desc->addr;
> + len = desc->len;
> +
> + buffer = xsk_buff_raw_get_data(xs->pool, addr);
> + offset = offset_in_page(buffer);
> + addr = buffer - (char *)xs->pool->addrs;
> +
> + for (copied = 0, i = 0; copied < len; i++) {
> + page = xs->pool->umem->pgs[addr >> PAGE_SHIFT];
> +
> + get_page(page);
> +
> + copy = min_t(u32, PAGE_SIZE - offset, len - copied);
> +
> + skb_fill_page_desc(skb, i, page, offset, copy);
> +
> + copied += copy;
> + addr += copy;
> + offset = 0;
> + }
> +
> + skb->len += len;
> + skb->data_len += len;
> + skb->truesize += len;
> +
> + refcount_add(len, &xs->sk.sk_wmem_alloc);
> +
> + return skb;
> +}
> +
> +static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> +  struct xdp_desc *desc)
> +{
> + struct sk_buff *skb = NULL;
> +
> + if (xs->dev->priv_flags & IFF_TX_SKB_NO_LINEAR) {
> + skb = xsk_build_skb_zerocopy(xs, desc);
> + if (IS_ERR(skb))
> + return skb;
> + } else {
> + char *buffer;
> + u32 len;
> + int err;
> +
> + len = desc->len;
> + skb = sock_alloc_send_skb(&xs->sk, len, 1, &err);
> + if (unlikely(!skb))
> + return ERR_PTR(err);
> +
> + skb_put(skb, len);
> + buffer = xsk_buff_raw_get_data(xs->pool, desc->addr);
> + err = skb_store_bits(skb, 0, buffer, len);
> + if (unlikely(err)) {
> + kfree_skb(skb);
> + return ERR_PTR(err);
> + }
> + }
> +
> + skb->dev = xs->dev;
> + skb->priority = xs->sk.sk_priority;
> + skb->mark = xs->sk.sk_mark;
> + skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr;
> + skb->destructor = xsk_destruct_skb;
> +
> + return skb;
> +}
> +
>  static int xsk_generic_xmit(struct sock *sk)
>  {
>   struct xdp_sock *xs = xdp_sk(sk);
> @@ -446,43 +527,30 @@ static int xsk_generic_xmit(struct sock *sk)
>   goto out;
>  
>   while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
> - char *buffer;
> - u64 addr;
> - u32 len;
> -
>   if (max_batch-- == 0) {
>   err = -EAGAIN;
>   goto out;
>   }
>  
> - len = desc.len;
> - skb = sock_alloc_send_skb(sk, len, 1, &err);
> - if (unlikely(!skb))
> + skb = xsk_build_skb(xs, &desc);
> + if (IS_ERR(skb)) {
> + err = PTR_ERR(skb);
>   goto out;
> + }
>  
> - skb_put(skb, len);
> - addr = desc.addr;
> - buffer = xsk_buff_raw_get_data(xs->pool, addr);
> - err = skb_store_bits(skb, 0, buffer, len);
>   /* This is the backpressure mechanism for the Tx path.
>* Reserve space in the completion queue and only proceed
>* if there