Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

2020-09-15 Thread Jason Wang


On 2020/9/15 11:47 PM, Kishon Vijay Abraham I wrote:

Hi Jason,

On 15/09/20 1:48 pm, Jason Wang wrote:

Hi Kishon:

On 2020/9/14 3:23 PM, Kishon Vijay Abraham I wrote:

Then you need something that is functionally equivalent to virtio PCI,
which is actually the concept of vDPA (e.g. vDPA provides alternatives if
the queue_sel is hard in the EP implementation).

Okay, I just tried to compare the 'struct vdpa_config_ops' and 'struct
vhost_config_ops' (introduced in [RFC PATCH 03/22] vhost: Add ops for
the VHOST driver to configure VHOST device).

struct vdpa_config_ops {
 /* Virtqueue ops */
 int (*set_vq_address)(struct vdpa_device *vdev,
   u16 idx, u64 desc_area, u64 driver_area,
   u64 device_area);
 void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
 void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
 void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
   struct vdpa_callback *cb);
 void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool ready);
 bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
 int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
     const struct vdpa_vq_state *state);
 int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
     struct vdpa_vq_state *state);
 struct vdpa_notification_area
 (*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
 /* vq irq is not expected to be changed once DRIVER_OK is set */
 int (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);

 /* Device ops */
 u32 (*get_vq_align)(struct vdpa_device *vdev);
 u64 (*get_features)(struct vdpa_device *vdev);
 int (*set_features)(struct vdpa_device *vdev, u64 features);
 void (*set_config_cb)(struct vdpa_device *vdev,
   struct vdpa_callback *cb);
 u16 (*get_vq_num_max)(struct vdpa_device *vdev);
 u32 (*get_device_id)(struct vdpa_device *vdev);
 u32 (*get_vendor_id)(struct vdpa_device *vdev);
 u8 (*get_status)(struct vdpa_device *vdev);
 void (*set_status)(struct vdpa_device *vdev, u8 status);
 void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
    void *buf, unsigned int len);
 void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
    const void *buf, unsigned int len);
 u32 (*get_generation)(struct vdpa_device *vdev);

 /* DMA ops */
 int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
 int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
    u64 pa, u32 perm);
 int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);

 /* Free device resources */
 void (*free)(struct vdpa_device *vdev);
};

+struct vhost_config_ops {
+    int (*create_vqs)(struct vhost_dev *vdev, unsigned int nvqs,
+  unsigned int num_bufs, struct vhost_virtqueue *vqs[],
+  vhost_vq_callback_t *callbacks[],
+  const char * const names[]);
+    void (*del_vqs)(struct vhost_dev *vdev);
+    int (*write)(struct vhost_dev *vdev, u64 vhost_dst, void *src, int len);
+    int (*read)(struct vhost_dev *vdev, void *dst, u64 vhost_src, int len);
+    int (*set_features)(struct vhost_dev *vdev, u64 device_features);
+    int (*set_status)(struct vhost_dev *vdev, u8 status);
+    u8 (*get_status)(struct vhost_dev *vdev);
+};
+
struct virtio_config_ops
I think there's some overlap here and some of the ops try to do the
same thing.

I think it differs in (*set_vq_address)() and (*create_vqs)().
[create_vqs() introduced in struct vhost_config_ops provides
complementary functionality to (*find_vqs)() in struct
virtio_config_ops. It seemingly encapsulates the functionality of
(*set_vq_address)(), (*set_vq_num)(), (*set_vq_cb)(),..].

Back to the difference between (*set_vq_address)() and (*create_vqs)():
set_vq_address() directly provides the virtqueue address to the vdpa
device, while create_vqs() only provides the parameters of the virtqueue
(like the number of virtqueues and the number of buffers) but does not
directly provide the address. IMO the backend client drivers (like net or
vhost) shouldn't/cannot by themselves know how to access the vring created
on the virtio front-end; the vdpa device/vhost device should have the
logic for that. That will help the client drivers work with different
types of vdpa/vhost devices and access the vring created by virtio
irrespective of whether the vring is reached via MMIO, kernel space, or
user space.
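
To make the contrast concrete, a minimal sketch of the two models follows;
the queue index, the literal values, and the variable names are hypothetical:

/* vDPA model: the client configures each virtqueue individually and
 * hands over the vring addresses it has chosen. */
vdpa_ops->set_vq_num(vdpa_dev, 0, 256);
vdpa_ops->set_vq_cb(vdpa_dev, 0, &cb);
vdpa_ops->set_vq_address(vdpa_dev, 0, desc_pa, driver_pa, device_pa);
vdpa_ops->set_vq_ready(vdpa_dev, 0, true);

/* Proposed vhost model: the client only passes parameters; the vhost
 * device decides where the vrings live and how to reach them (MMIO,
 * kernel space, or user space). */
ret = vhost_ops->create_vqs(vhost_dev, 2, 256, vqs, callbacks, names);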

I think vdpa always works with client drivers in userspace, providing a
userspace address for the vring.


Sorry for being unclear. What I meant is not replacing vDPA with the
vhost(bus) you proposed but the possibility of replacing virtio-pci-epf
with vDPA in:

Okay, so the virtio back-end still uses vhost and the front end should use
vDPA. I see. So the host-side PCI driver for the EPF should populate
vdpa_config_ops and invoke vdpa_register_device().



Yes.
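
As a rough illustration of that division of labour, the host-side driver
could look like the sketch below; all epf_vdpa_* names are hypothetical,
and the vdpa core calls follow the 5.9-era API:

struct epf_vdpa {
	struct vdpa_device vdpa;
	/* device-specific state: BAR mappings, doorbells, ... */
};

/* Each op would be backed by accesses to the endpoint's BARs (not shown). */
static const struct vdpa_config_ops epf_vdpa_config_ops = {
	.set_vq_address	= epf_vdpa_set_vq_address,
	.set_vq_num	= epf_vdpa_set_vq_num,
	.kick_vq	= epf_vdpa_kick_vq,
	/* ... the remaining ops from struct vdpa_config_ops ... */
};

static int epf_vdpa_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	struct epf_vdpa *ev;

	ev = vdpa_alloc_device(struct epf_vdpa, vdpa, &pdev->dev,
			       &epf_vdpa_config_ops);
	if (IS_ERR(ev))
		return PTR_ERR(ev);

	pci_set_drvdata(pdev, ev);
	return vdpa_register_device(&ev->vdpa);
}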



My 

Re: [PATCH] Rescan the entire target on transport reset when LUN is 0

2020-09-15 Thread Martin K. Petersen


Matej,

> This change introduces the behaviour described above by scanning the
> entire scsi target when LUN is set to 0. This is both a functional and a
> performance fix. It aligns the driver with the spec and allows control
> planes to hotplug targets with large numbers of LUNs without having to
> request a RESCAN for each one of them.
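
In code, the described behaviour boils down to something like this inside
virtio_scsi's transport-reset event handler (a sketch; the surrounding
handler and variable names are assumed):

	/* A rescan event addressed to LUN 0 means "scan the whole target". */
	if (lun == 0)
		scsi_scan_target(&shost->shost_gendev, 0, target,
				 SCAN_WILD_CARD, SCSI_SCAN_RESCAN);
	else
		scsi_add_device(shost, 0, target, lun);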

Applied to 5.10/scsi-staging, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v5 4/4] vhost: add an RPMsg API

2020-09-15 Thread Guennadi Liakhovetski
On Fri, Sep 11, 2020 at 11:33:13AM -0600, Mathieu Poirier wrote:
> On Fri, Sep 11, 2020 at 09:46:56AM +0200, Guennadi Liakhovetski wrote:
> > Hi Mathieu,
> > 
> > On Thu, Sep 10, 2020 at 11:22:11AM -0600, Mathieu Poirier wrote:
> > > Good morning Guennadi,
> > > 
> > > On Thu, Sep 10, 2020 at 10:38:54AM +0200, Guennadi Liakhovetski wrote:
> > > > Hi Mathieu,
> > > > 
> > > > On Wed, Sep 09, 2020 at 04:39:46PM -0600, Mathieu Poirier wrote:
> > > > > Good afternoon,
> > > > > 
> > > > > On Wed, Aug 26, 2020 at 07:46:36PM +0200, Guennadi Liakhovetski wrote:
> > > > > > Linux supports running the RPMsg protocol over the VirtIO transport
> > > > > > protocol, but currently there is only support for VirtIO clients and
> > > > > > no support for a VirtIO server. This patch adds a vhost-based RPMsg
> > > > > > server implementation.
> > > > > 
> > > > > This changelog is very confusing...  At this time the name service in
> > > > > the remoteproc space runs as a server on the application processor.
> > > > > But from the above, the remoteproc use case seems to be considered a
> > > > > client configuration.
> > > > 
> > > > I agree that this isn't very obvious. But I think it is common to call
> > > > the host "a server" and guests "clients." E.g. in vhost.c, in the
> > > > top-of-the-file comment:
> > > 
> > > Ok - that part we agree on.
> > > 
> > > > 
> > > >  * Generic code for virtio server in host kernel.
> > > > 
> > > > I think the generic concept behind this notation is that, as guests
> > > > boot, they send their requests to the host; e.g. VirtIO device drivers
> > > > on guests send requests over VirtQueues to VirtIO servers on the host,
> > > > which can run either in user or in kernel space. And I think you can
> > > > follow that
> > > 
> > > I can see that process taking place.  After all virtIO devices on guests 
> > > are
> > > only stubs that need host support for access to HW.
> > > 
> > > > logic in the case of devices or remote processors too: it's the main
> > > > CPU(s) that boot(s) and start(s) talking to devices and remote
> > > > processors, so in that sense devices are servers and the CPUs are
> > > > their clients.
> > > 
> > > In the remote processor case, the remoteproc core (application
> > > processor) sets up the name service but does not initiate the
> > > communication with a remote processor.  It simply waits there for a
> > > name space request to come in from the remote processor.
> > 
> > Hm, I don't see that in the two examples that I looked at: mtk and virtio.
> > In both cases the announcement seems to be coming directly from the
> > application processor, maybe after some initialisation.
>  
> Can you expand on that part - perhaps point me to the (virtio) code you are
> referring to? 

Ok, we're both right: it goes both ways.

Here's my understanding of the control flow of virtio_rpmsg_bus.c:

1. The driver registers a VirtIO driver with the VIRTIO_ID_RPMSG ID.
2. When the driver is probed, if the server / the application processor
   supports the VIRTIO_RPMSG_F_NS feature, the driver calls
   __rpmsg_create_ept() to create an endpoint with rpmsg_ns_cb() as a
   callback.
3. When a namespace announcement arrives from the server, the callback is
   called, which then registers a new channel (in the CREATE case). That
   then creates an rpmsg device.
4. If there's a matching rpmsg driver for that device, its .probe() method
   is called, so it can then add its own rpmsg endpoints, to be used for
   its proper communication. (A condensed code sketch follows below.)
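
Condensed into code, that flow looks roughly like the following (a heavily
simplified paraphrase of virtio_rpmsg_bus.c; error handling, locking, and
endianness conversions omitted):

/* Step 1: bind to VirtIO devices with the RPMsg device ID. */
static struct virtio_device_id id_table[] = {
	{ VIRTIO_ID_RPMSG, VIRTIO_DEV_ANY_ID },
	{ },
};

static int rpmsg_probe(struct virtio_device *vdev)
{
	/* ... allocate virtqueues and rx/tx buffers ... */

	/* Step 2: if the name service feature was negotiated, listen for
	 * announcements on the well-known NS address. */
	if (virtio_has_feature(vdev, VIRTIO_RPMSG_F_NS))
		vrp->ns_ept = __rpmsg_create_ept(vrp, NULL, rpmsg_ns_cb,
						 vrp, RPMSG_NS_ADDR);
	return 0;
}

/* Step 3: an announcement creates (or destroys) a channel. */
static int rpmsg_ns_cb(struct rpmsg_device *rpdev, void *data, int len,
		       void *priv, u32 src)
{
	struct rpmsg_ns_msg *msg = data;
	struct virtproc_info *vrp = priv;
	struct rpmsg_channel_info chinfo;

	strncpy(chinfo.name, msg->name, sizeof(chinfo.name));
	chinfo.src = RPMSG_ADDR_ANY;
	chinfo.dst = msg->addr;

	if (msg->flags & RPMSG_NS_DESTROY)
		rpmsg_unregister_device(&vrp->vdev->dev, &chinfo);
	else
		/* Registers a new rpmsg device; step 4 is the driver core
		 * matching it and calling the driver's .probe(). */
		rpmsg_create_channel(vrp, &chinfo);
	return 0;
}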

Now, there was indeed something in virtio_rpmsg_bus.c that I didn't fully
understand: the virtio_rpmsg_announce_create() and
virtio_rpmsg_announce_destroy() functions. Now I understand that, as the
client registers its custom channels, it can also then send name service
announcements to the application processor using those functions.
This is also described in [1] as:


Name Service sub-component (optional)

This subcomponent is a minimum implementation of the name service which is
present in the Linux Kernel implementation of RPMsg. It allows the
communicating node both to send announcements about "named" endpoint (in
other words, channel) creation or deletion and to receive these
announcements, taking any user-defined action in an application callback.


Also in Documentation/rpmsg.txt


...the remote processor announces the existence of a remote rpmsg service by 
sending a name service message (which contains the name and rpmsg addr of the 
remote service, see struct rpmsg_ns_msg).
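
For reference, the name service message mentioned there is tiny (as carried
by virtio_rpmsg_bus.c around this time; exact field types may differ by
tree):

struct rpmsg_ns_msg {
	char name[RPMSG_NAME_SIZE];	/* name of the remote service */
	u32 addr;			/* rpmsg address of the service */
	u32 flags;			/* RPMSG_NS_CREATE or RPMSG_NS_DESTROY */
} __packed;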


in [2]:


In the current protocol, at startup, the master sends a notification to the
remote to let it know that it can receive name service announcements.


> > > > And yes, the name-space announcement use-case seems confusing to me
> > > > too - it reverts the relationship in a way: once a guest has booted
> > > > and established

Re: [PATCH RFC v1 08/18] x86/hyperv: handling hypercall page setup for root

2020-09-15 Thread Vitaly Kuznetsov
Wei Liu  writes:

> On Tue, Sep 15, 2020 at 01:02:08PM +0200, Vitaly Kuznetsov wrote:
>> Wei Liu  writes:
>> 
>> > On Tue, Sep 15, 2020 at 12:32:29PM +0200, Vitaly Kuznetsov wrote:
>> >> Wei Liu  writes:
>> >> 
>> >> > When Linux is running as the root partition, the hypercall page will
>> >> > have already been set up by Hyper-V. Copy the content over to the
>> >> > allocated page.
>> >> 
>> >> And we can't setup a new hypercall page by writing something different
>> >> to HV_X64_MSR_HYPERCALL, right?
>> >> 
>> >
>> > My understanding is that we can't, but Sunil can maybe correct me.
>> >
>> >> >
>> >> > The suspend, resume and cleanup paths remain untouched because they are
>> >> > not supported in this setup yet.
>> >> >
>> >> > Signed-off-by: Lillian Grassin-Drake 
>> >> > Signed-off-by: Sunil Muthuswamy 
>> >> > Signed-off-by: Nuno Das Neves 
>> >> > Co-Developed-by: Lillian Grassin-Drake 
>> >> > Co-Developed-by: Sunil Muthuswamy 
>> >> > Co-Developed-by: Nuno Das Neves 
>> >> > Signed-off-by: Wei Liu 
>> >> > ---
>> >> >  arch/x86/hyperv/hv_init.c | 26 --
>> >> >  1 file changed, 24 insertions(+), 2 deletions(-)
>> >> >
>> >> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
>> >> > index 0eec1ed32023..26233aebc86c 100644
>> >> > --- a/arch/x86/hyperv/hv_init.c
>> >> > +++ b/arch/x86/hyperv/hv_init.c
>> >> > @@ -25,6 +25,7 @@
>> >> >  #include 
>> >> >  #include 
>> >> >  #include 
>> >> > +#include 
>> >> >  
>> >> >  /* Is Linux running as the root partition? */
>> >> >  bool hv_root_partition;
>> >> > @@ -448,8 +449,29 @@ void __init hyperv_init(void)
>> >> >  
>> >> > rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>> >> > hypercall_msr.enable = 1;
>> >> > -	hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
>> >> > -   wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>> >> > +
>> >> > +   if (hv_root_partition) {
>> >> > +   struct page *pg;
>> >> > +   void *src, *dst;
>> >> > +
>> >> > +	/*
>> >> > +	 * Order is important here. We must enable the hypercall page
>> >> > +	 * so it is populated with code, then copy the code to an
>> >> > +	 * executable page.
>> >> > +	 */
>> >> > +   wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>> >> > +
>> >> > +   pg = vmalloc_to_page(hv_hypercall_pg);
>> >> > +   dst = kmap(pg);
>> >> > +	src = memremap(hypercall_msr.guest_physical_address << PAGE_SHIFT,
>> >> > +		       PAGE_SIZE, MEMREMAP_WB);
>> >> 
>> >> memremap() can fail...
>> >
>> > And we don't care here, if it fails, we would rather it panic or oops.
>> >
>> > I was relying on the fact that copying from / to a NULL pointer will
>> > cause the kernel to crash. But of course it wouldn't hurt to explicitly
>> > panic here.
>> >
>> >> 
>> >> > +   memcpy(dst, src, PAGE_SIZE);
>> >> > +   memunmap(src);
>> >> > +   kunmap(pg);
>> >> > +   } else {
>> >> > +	hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
>> >> > +   wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>> >> > +   }
>> >> 
>> >> Why can't we do wrmsrl() for both cases here?
>> >> 
>> >
>> > Because the hypercall page has already been set up when Linux is the
>> > root.
>> 
>> But you already do wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64)
>> in 'if (hv_root_partition)' case above, that's why I asked.
>> 
>
> You mean extracting wrmsrl to this point?  The ordering matters. See the
> comment in the root branch -- we have to enable the page before copying
> the content.
>
> What can be done is:
>
>if (!root) {
>/* some stuff */
>}
>
>wrmsrl(...)
>
>if (root) {
>/* some stuff */
>}
>
> This is not looking any better than the existing code.
>

Oh, I missed the comment indeed. So the hypervisor already picked a page
for us; however, it didn't enable it and it's not populated? How can we be
sure that we didn't use it for something else already? Maybe we can
still give it a different known-to-be-empty page?

-- 
Vitaly



Re: [PATCH RFC v1 10/18] x86/hyperv: implement and use hv_smp_prepare_cpus

2020-09-15 Thread Vitaly Kuznetsov
Wei Liu  writes:

> Microsoft Hypervisor requires the root partition to make a few
> hypercalls to set up application processors before they can be used.
>
> Signed-off-by: Lillian Grassin-Drake 
> Signed-off-by: Sunil Muthuswamy 
> Co-Developed-by: Lillian Grassin-Drake 
> Co-Developed-by: Sunil Muthuswamy 
> Signed-off-by: Wei Liu 
> ---
> CPU hotplug and unplug is not yet supported in this setup, so those
> paths remain untouched.
> ---
>  arch/x86/kernel/cpu/mshyperv.c | 27 +++
>  1 file changed, 27 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 1bf57d310f78..7522cae02759 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -203,6 +203,31 @@ static void __init hv_smp_prepare_boot_cpu(void)
>   hv_init_spinlocks();
>  #endif
>  }
> +
> +static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
> +{
> +#if defined(CONFIG_X86_64)

I think it makes little sense to try to make Linux work as Hyper-V root
partition when !CONFIG_X86_64. If we still care about Hyper-V enablement
for !CONFIG_X86_64 we can probably introduce something like
CONFIG_HYPERV_ROOT and enable it automatically, e.g.

config HYPERV_ROOT
def_bool HYPERV && X86_64

and use it instead.

> + int i;
> + int vp_index = 1;
> + int ret;
> +
> + native_smp_prepare_cpus(max_cpus);
> +
> + for_each_present_cpu(i) {
> + if (i == 0)
> + continue;
> +		ret = hv_call_add_logical_proc(numa_cpu_node(i), i, cpu_physical_id(i));
> + BUG_ON(ret);
> + }
> +
> + for_each_present_cpu(i) {
> + if (i == 0)
> + continue;
> +		ret = hv_call_create_vp(numa_cpu_node(i), hv_current_partition_id, vp_index++, i);

So the vp_index variable is needed here to make sure there are no gaps?
(Or could we have just used 'i'?)

> + BUG_ON(ret);
> + }
> +#endif
> +}
>  #endif
>  
>  static void __init ms_hyperv_init_platform(void)
> @@ -359,6 +384,8 @@ static void __init ms_hyperv_init_platform(void)
>  
>  # ifdef CONFIG_SMP
>   smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu;
> + if (hv_root_partition)
> + smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
>  # endif
>  
>   /*

-- 
Vitaly



Re: [PATCH RFC v1 08/18] x86/hyperv: handling hypercall page setup for root

2020-09-15 Thread Vitaly Kuznetsov
Wei Liu  writes:

> On Tue, Sep 15, 2020 at 12:32:29PM +0200, Vitaly Kuznetsov wrote:
>> Wei Liu  writes:
>> 
>> > When Linux is running as the root partition, the hypercall page will
>> > have already been set up by Hyper-V. Copy the content over to the
>> > allocated page.
>> 
>> And we can't setup a new hypercall page by writing something different
>> to HV_X64_MSR_HYPERCALL, right?
>> 
>
> My understanding is that we can't, but Sunil can maybe correct me.
>
>> >
>> > The suspend, resume and cleanup paths remain untouched because they are
>> > not supported in this setup yet.
>> >
>> > Signed-off-by: Lillian Grassin-Drake 
>> > Signed-off-by: Sunil Muthuswamy 
>> > Signed-off-by: Nuno Das Neves 
>> > Co-Developed-by: Lillian Grassin-Drake 
>> > Co-Developed-by: Sunil Muthuswamy 
>> > Co-Developed-by: Nuno Das Neves 
>> > Signed-off-by: Wei Liu 
>> > ---
>> >  arch/x86/hyperv/hv_init.c | 26 --
>> >  1 file changed, 24 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
>> > index 0eec1ed32023..26233aebc86c 100644
>> > --- a/arch/x86/hyperv/hv_init.c
>> > +++ b/arch/x86/hyperv/hv_init.c
>> > @@ -25,6 +25,7 @@
>> >  #include 
>> >  #include 
>> >  #include 
>> > +#include 
>> >  
>> >  /* Is Linux running as the root partition? */
>> >  bool hv_root_partition;
>> > @@ -448,8 +449,29 @@ void __init hyperv_init(void)
>> >  
>> >rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>> >hypercall_msr.enable = 1;
>> > -  hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
>> > -  wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>> > +
>> > +  if (hv_root_partition) {
>> > +  struct page *pg;
>> > +  void *src, *dst;
>> > +
>> > +  /*
>> > +   * Order is important here. We must enable the hypercall page
>> > +   * so it is populated with code, then copy the code to an
>> > +   * executable page.
>> > +   */
>> > +  wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>> > +
>> > +  pg = vmalloc_to_page(hv_hypercall_pg);
>> > +  dst = kmap(pg);
>> > +	src = memremap(hypercall_msr.guest_physical_address << PAGE_SHIFT,
>> > +		       PAGE_SIZE, MEMREMAP_WB);
>> 
>> memremap() can fail...
>
> And we don't care here, if it fails, we would rather it panic or oops.
>
> I was relying on the fact that copying from / to a NULL pointer will
> cause the kernel to crash. But of course it wouldn't hurt to explicitly
> panic here.
>
>> 
>> > +  memcpy(dst, src, PAGE_SIZE);
>> > +  memunmap(src);
>> > +  kunmap(pg);
>> > +  } else {
>> > +	hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
>> > +  wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>> > +  }
>> 
>> Why can't we do wrmsrl() for both cases here?
>> 
>
> Because the hypercall page has already been set up when Linux is the
> root.

But you already do wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64)
in 'if (hv_root_partition)' case above, that's why I asked.

>
> I could've tried writing to the MSR again, but because the behaviour
> here is not documented and subject to change so I didn't bother trying.
>
> Wei.
>
>> >  
>> >/*
>> > * Ignore any errors in setting up stimer clockevents
>> 
>> -- 
>> Vitaly
>> 
>

-- 
Vitaly



Re: [PATCH RFC v1 09/18] x86/hyperv: provide a bunch of helper functions

2020-09-15 Thread Vitaly Kuznetsov
Wei Liu  writes:

> They are used to deposit pages into Microsoft Hypervisor and bring up
> logical and virtual processors.
>
> Signed-off-by: Lillian Grassin-Drake 
> Signed-off-by: Sunil Muthuswamy 
> Signed-off-by: Nuno Das Neves 
> Co-Developed-by: Lillian Grassin-Drake 
> Co-Developed-by: Sunil Muthuswamy 
> Co-Developed-by: Nuno Das Neves 
> Signed-off-by: Wei Liu 
> ---
>  arch/x86/hyperv/Makefile  |   2 +-
>  arch/x86/hyperv/hv_proc.c | 209 ++
>  arch/x86/include/asm/mshyperv.h   |   4 +
>  include/asm-generic/hyperv-tlfs.h |  56 
>  4 files changed, 270 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/hyperv/hv_proc.c
>
> diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
> index 89b1f74d3225..565358020921 100644
> --- a/arch/x86/hyperv/Makefile
> +++ b/arch/x86/hyperv/Makefile
> @@ -1,6 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  obj-y:= hv_init.o mmu.o nested.o
> -obj-$(CONFIG_X86_64) += hv_apic.o
> +obj-$(CONFIG_X86_64) += hv_apic.o hv_proc.o
>  
>  ifdef CONFIG_X86_64
>  obj-$(CONFIG_PARAVIRT_SPINLOCKS) += hv_spinlock.o
> diff --git a/arch/x86/hyperv/hv_proc.c b/arch/x86/hyperv/hv_proc.c
> new file mode 100644
> index ..847c72465d0e
> --- /dev/null
> +++ b/arch/x86/hyperv/hv_proc.c
> @@ -0,0 +1,209 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#define HV_DEPOSIT_MAX_ORDER (8)
> +#define HV_DEPOSIT_MAX (1 << HV_DEPOSIT_MAX_ORDER)
> +
> +#define MAX(a, b) ((a) > (b) ? (a) : (b))
> +#define MIN(a, b) ((a) < (b) ? (a) : (b))

Nit: include/linux/kernel.h defines min() and max() macros with type
checking.
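
For illustration, the type-checked variant would be a drop-in for the loop
below (illustrative only):

	/* min() type-checks its arguments at build time, unlike the raw
	 * ternary MIN()/MAX() macros defined in this patch. */
	desired_order = 31 - __builtin_clz(num_pages);
	order = min(desired_order, order);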

> +
> +/*
> + * Deposits exact number of pages
> + * Must be called with interrupts enabled
> + * Max 256 pages
> + */
> +int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
> +{
> + struct page **pages;
> + int *counts;
> + int num_allocations;
> + int i, j, page_count;
> + int order;
> + int desired_order;
> + int status;
> + int ret;
> + u64 base_pfn;
> + struct hv_deposit_memory *input_page;
> + unsigned long flags;
> +
> + if (num_pages > HV_DEPOSIT_MAX)
> + return -EINVAL;
> + if (!num_pages)
> + return 0;
> +
> + ret = -ENOMEM;
> +
> + /* One buffer for page pointers and counts */
> + pages = page_address(alloc_page(GFP_KERNEL));
> + if (!pages)
> + goto free_buf;

There is nothing to free, just do 'return -ENOMEM' here;

> +	counts = (int *)&pages[256];
> +

Oh this is weird. So 'pages' is an array of 512 'struct page *' items
and we use its second half (pages[256]) for an array of signed(!)
integers(!). Can we use a locally defined struct or something better for
that?
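
One possible shape for that suggestion (hypothetical and untested; the
point being that both arrays still fit in the single page allocated above):

struct hv_deposit_buf {
	struct page *pages[HV_DEPOSIT_MAX];	/* 256 * 8 = 2048 bytes */
	unsigned int counts[HV_DEPOSIT_MAX];	/* 256 * 4 = 1024 bytes */
};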

> + /* Allocate all the pages before disabling interrupts */
> + num_allocations = 0;
> + i = 0;
> + order = HV_DEPOSIT_MAX_ORDER;
> +
> + while (num_pages) {
> + /* Find highest order we can actually allocate */
> + desired_order = 31 - __builtin_clz(num_pages);
> + order = MIN(desired_order, order);
> + do {
> + pages[i] = alloc_pages_node(node, GFP_KERNEL, order);
> + if (!pages[i]) {
> + if (!order) {
> + goto err_free_allocations;
> + }
> + --order;
> + }
> + } while (!pages[i]);
> +
> + split_page(pages[i], order);
> + counts[i] = 1 << order;
> + num_pages -= counts[i];
> + i++;

So here we believe we will never overrun the 2048 bytes we 'allocated'
for 'counts' above. While 'if (num_pages > HV_DEPOSIT_MAX)' presumably
guarantees that, this is not really obvious.

> + num_allocations++;
> + }
> +
> + local_irq_save(flags);
> +
> + input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
> +
> + input_page->partition_id = partition_id;
> +
> + /* Populate gpa_page_list - these will fit on the input page */
> + for (i = 0, page_count = 0; i < num_allocations; ++i) {
> + base_pfn = page_to_pfn(pages[i]);
> + for (j = 0; j < counts[i]; ++j, ++page_count)
> + input_page->gpa_page_list[page_count] = base_pfn + j;
> + }
> + status = hv_do_rep_hypercall(HVCALL_DEPOSIT_MEMORY,
> +  page_count, 0, input_page,
> +  NULL) & HV_HYPERCALL_RESULT_MASK;
> + local_irq_restore(flags);
> +
> + if (status != HV_STATUS_SUCCESS) {

Nit: same as in one of the previous patches, status can be 'u16'.

> + pr_err("Failed to deposit pages: %d\n", status);
> +  

Re: [PATCH RFC v1 08/18] x86/hyperv: handling hypercall page setup for root

2020-09-15 Thread Vitaly Kuznetsov
Wei Liu  writes:

> When Linux is running as the root partition, the hypercall page will
> have already been set up by Hyper-V. Copy the content over to the
> allocated page.

And we can't setup a new hypercall page by writing something different
to HV_X64_MSR_HYPERCALL, right?

>
> The suspend, resume and cleanup paths remain untouched because they are
> not supported in this setup yet.
>
> Signed-off-by: Lillian Grassin-Drake 
> Signed-off-by: Sunil Muthuswamy 
> Signed-off-by: Nuno Das Neves 
> Co-Developed-by: Lillian Grassin-Drake 
> Co-Developed-by: Sunil Muthuswamy 
> Co-Developed-by: Nuno Das Neves 
> Signed-off-by: Wei Liu 
> ---
>  arch/x86/hyperv/hv_init.c | 26 --
>  1 file changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 0eec1ed32023..26233aebc86c 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  /* Is Linux running as the root partition? */
>  bool hv_root_partition;
> @@ -448,8 +449,29 @@ void __init hyperv_init(void)
>  
>   rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>   hypercall_msr.enable = 1;
> - hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
> - wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> +
> + if (hv_root_partition) {
> + struct page *pg;
> + void *src, *dst;
> +
> + /*
> +  * Order is important here. We must enable the hypercall page
> +  * so it is populated with code, then copy the code to an
> +  * executable page.
> +  */
> + wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> +
> + pg = vmalloc_to_page(hv_hypercall_pg);
> + dst = kmap(pg);
> +	src = memremap(hypercall_msr.guest_physical_address << PAGE_SHIFT,
> +		       PAGE_SIZE, MEMREMAP_WB);

memremap() can fail...

> + memcpy(dst, src, PAGE_SIZE);
> + memunmap(src);
> + kunmap(pg);
> + } else {
> +	hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
> + wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> + }

Why can't we do wrmsrl() for both cases here?

>  
>   /*
>* Ignore any errors in setting up stimer clockevents

-- 
Vitaly



Re: [PATCH RFC v1 07/18] x86/hyperv: extract partition ID from Microsoft Hypervisor if necessary

2020-09-15 Thread Vitaly Kuznetsov
Wei Liu  writes:

> We will need the partition ID for executing some hypercalls later.
>
> Signed-off-by: Lillian Grassin-Drake 
> Co-Developed-by: Sunil Muthuswamy 
> Signed-off-by: Wei Liu 
> ---
>  arch/x86/hyperv/hv_init.c | 26 ++
>  arch/x86/include/asm/mshyperv.h   |  2 ++
>  include/asm-generic/hyperv-tlfs.h |  6 ++
>  3 files changed, 34 insertions(+)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index ebba4be4185d..0eec1ed32023 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -30,6 +30,9 @@
>  bool hv_root_partition;
>  EXPORT_SYMBOL_GPL(hv_root_partition);
>  
> +u64 hv_current_partition_id;
> +EXPORT_SYMBOL_GPL(hv_current_partition_id);
> +
>  void *hv_hypercall_pg;
>  EXPORT_SYMBOL_GPL(hv_hypercall_pg);
>  
> @@ -345,6 +348,26 @@ static struct syscore_ops hv_syscore_ops = {
>   .resume = hv_resume,
>  };
>  
> +void __init hv_get_partition_id(void)
> +{
> + struct hv_get_partition_id *output_page;
> + int status;
> + unsigned long flags;
> +
> + local_irq_save(flags);
> + output_page = *this_cpu_ptr(hyperv_pcpu_output_arg);
> + status = hv_do_hypercall(HVCALL_GET_PARTITION_ID, NULL, output_page) &
> + HV_HYPERCALL_RESULT_MASK;

Nit: in this case status is 'u16', we can define it as such (instead of
signed int).

> + if (status != HV_STATUS_SUCCESS)
> + pr_err("Failed to get partition ID: %d\n", status);
> + else
> + hv_current_partition_id = output_page->partition_id;
> + local_irq_restore(flags);
> +
> + /* No point in proceeding if this failed */
> + BUG_ON(status != HV_STATUS_SUCCESS);
> +}
> +
>  /*
>   * This function is to be invoked early in the boot sequence after the
>   * hypervisor has been detected.
> @@ -440,6 +463,9 @@ void __init hyperv_init(void)
>  
>   register_syscore_ops(&hv_syscore_ops);
>  
> + if (hv_root_partition)
> + hv_get_partition_id();

According to the TLFS, the partition ID is available when the
AccessPartitionId privilege is granted. I'd suggest we check that instead
of hv_root_partition (and we can set hv_current_partition_id to something
like U64_MAX so we know it wasn't acquired). So the BUG_ON condition will
move here:

hv_get_partition_id();
BUG_ON(hv_root_partition && hv_current_partition_id == U64_MAX);
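
Spelled out a little more (the feature-bit name is an assumption; the TLFS
calls the privilege AccessPartitionId):

	hv_current_partition_id = U64_MAX;	/* "not acquired" marker */

	if (ms_hyperv.features & HV_X64_ACCESS_PARTITION_ID)
		hv_get_partition_id();

	/* Only the root partition strictly needs the ID to proceed. */
	BUG_ON(hv_root_partition && hv_current_partition_id == U64_MAX);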

> +
>   return;
>  
>  remove_cpuhp_state:
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index f5c62140f28d..4039302e0ae9 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -65,6 +65,8 @@ extern void *hv_hypercall_pg;
>  extern void  __percpu  **hyperv_pcpu_input_arg;
>  extern void  __percpu  **hyperv_pcpu_output_arg;
>  
> +extern u64 hv_current_partition_id;
> +
>  static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
>  {
>   u64 input_address = input ? virt_to_phys(input) : 0;
> diff --git a/include/asm-generic/hyperv-tlfs.h 
> b/include/asm-generic/hyperv-tlfs.h
> index e6903589a82a..87b1a79b19eb 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -141,6 +141,7 @@ struct ms_hyperv_tsc_page {
>  #define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX0x0013
>  #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX 0x0014
>  #define HVCALL_SEND_IPI_EX   0x0015
> +#define HVCALL_GET_PARTITION_ID  0x0046
>  #define HVCALL_GET_VP_REGISTERS  0x0050
>  #define HVCALL_SET_VP_REGISTERS  0x0051
>  #define HVCALL_POST_MESSAGE  0x005c
> @@ -407,6 +408,11 @@ struct hv_tlb_flush_ex {
>   u64 gva_list[];
>  } __packed;
>  
> +/* HvGetPartitionId hypercall (output only) */
> +struct hv_get_partition_id {
> + u64 partition_id;
> +} __packed;
> +
>  /* HvRetargetDeviceInterrupt hypercall */
>  union hv_msi_entry {
>   u64 as_uint64;

-- 
Vitaly



Re: [PATCH RFC v1 06/18] x86/hyperv: allocate output arg pages if required

2020-09-15 Thread Vitaly Kuznetsov
Wei Liu  writes:

> When Linux runs as the root partition, it will need to make hypercalls
> which return data from the hypervisor.
>
> Allocate pages for storing results when Linux runs as the root
> partition.
>
> Signed-off-by: Lillian Grassin-Drake 
> Co-Developed-by: Lillian Grassin-Drake 
> Signed-off-by: Wei Liu 
> ---
>  arch/x86/hyperv/hv_init.c   | 45 +
>  arch/x86/include/asm/mshyperv.h |  1 +
>  2 files changed, 41 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index cac8e4c56261..ebba4be4185d 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -45,6 +45,9 @@ EXPORT_SYMBOL_GPL(hv_vp_assist_page);
>  void  __percpu **hyperv_pcpu_input_arg;
>  EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
>  
> +void  __percpu **hyperv_pcpu_output_arg;
> +EXPORT_SYMBOL_GPL(hyperv_pcpu_output_arg);
> +
>  u32 hv_max_vp_index;
>  EXPORT_SYMBOL_GPL(hv_max_vp_index);
>  
> @@ -75,14 +78,29 @@ static int hv_cpu_init(unsigned int cpu)
>   u64 msr_vp_index;
>   struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
>   void **input_arg;
> - struct page *pg;
> + struct page *input_pg;
>  
>   input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
>   /* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
> - pg = alloc_page(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL);
> - if (unlikely(!pg))
> + input_pg = alloc_page(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL);
> + if (unlikely(!input_pg))
>   return -ENOMEM;
> - *input_arg = page_address(pg);
> + *input_arg = page_address(input_pg);
> +
> + if (hv_root_partition) {
> + struct page *output_pg;
> + void **output_arg;
> +
> +		output_pg = alloc_page(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL);

To simplify the code, can we just rename 'input_arg' to 'hypercall_args'
and do alloc_pages(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL, 1) to
allocate two pages above?
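
A sketch of that simplification (an order-1 allocation yields two
physically contiguous pages; variable names as in the patch):

	struct page *pg;

	pg = alloc_pages(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL,
			 hv_root_partition ? 1 : 0);
	if (unlikely(!pg))
		return -ENOMEM;
	*input_arg = page_address(pg);
	if (hv_root_partition)
		*output_arg = page_address(pg) + PAGE_SIZE;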

> + if (unlikely(!output_pg)) {
> + free_page((unsigned long)*input_arg);
> + *input_arg = NULL;
> + return -ENOMEM;
> + }
> +
> + output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> + *output_arg = page_address(output_pg);
> + }
>  
>   hv_get_vp_index(msr_vp_index);
>  
> @@ -209,14 +227,25 @@ static int hv_cpu_die(unsigned int cpu)
>   unsigned int new_cpu;
>   unsigned long flags;
>   void **input_arg;
> - void *input_pg = NULL;
> + void *input_pg = NULL, *output_pg = NULL;
>  
>   local_irq_save(flags);
>   input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
>   input_pg = *input_arg;
>   *input_arg = NULL;
> +
> + if (hv_root_partition) {
> + void **output_arg;
> +
> + output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> + output_pg = *output_arg;
> + *output_arg = NULL;
> + }
> +
>   local_irq_restore(flags);
> +
>   free_page((unsigned long)input_pg);
> + free_page((unsigned long)output_pg);
>  
>   if (hv_vp_assist_page && hv_vp_assist_page[cpu])
>   wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
> @@ -350,6 +379,12 @@ void __init hyperv_init(void)
>  
>   BUG_ON(hyperv_pcpu_input_arg == NULL);
>  
> + /* Allocate the per-CPU state for output arg for root */
> + if (hv_root_partition) {
> + hyperv_pcpu_output_arg = alloc_percpu(void  *);
redundant space ^

> + BUG_ON(hyperv_pcpu_output_arg == NULL);
> + }
> +
>   /* Allocate percpu VP index */
>   hv_vp_index = kmalloc_array(num_possible_cpus(), sizeof(*hv_vp_index),
>   GFP_KERNEL);
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 2a2cc81beac6..f5c62140f28d 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -63,6 +63,7 @@ static inline void hv_disable_stimer0_percpu_irq(int irq) {}
>  #if IS_ENABLED(CONFIG_HYPERV)
>  extern void *hv_hypercall_pg;
>  extern void  __percpu  **hyperv_pcpu_input_arg;
> +extern void  __percpu  **hyperv_pcpu_output_arg;
>  
>  static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
>  {

-- 
Vitaly



Re: [PATCH RFC v1 05/18] clocksource/hyperv: use MSR-based access if running as root

2020-09-15 Thread Vitaly Kuznetsov
Wei Liu  writes:

> Signed-off-by: Wei Liu 
> ---
>  drivers/clocksource/hyperv_timer.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/clocksource/hyperv_timer.c 
> b/drivers/clocksource/hyperv_timer.c
> index 09aa44cb8a91..fe96082ce85e 100644
> --- a/drivers/clocksource/hyperv_timer.c
> +++ b/drivers/clocksource/hyperv_timer.c
> @@ -426,6 +426,9 @@ static bool __init hv_init_tsc_clocksource(void)
>   if (!(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE))
>   return false;
>  
> + if (hv_root_partition)
> + return false;
> +

Out of pure curiosity,

TSC page clocksource seems to be available to the root partition (as
HV_MSR_REFERENCE_TSC_AVAILABLE is set), why don't we use it? (I
understand that with TSC scaling support in modern CPUs even migration
is a no-issue and we can use raw TSC but this all seems to be
independent from root/non-root partition question).

>   hv_read_reference_counter = read_hv_clock_tsc;
>   phys_addr = virt_to_phys(hv_get_tsc_page());

-- 
Vitaly



Re: [PATCH v2 1/7] kernel/resource: make release_mem_region_adjustable() never fail

2020-09-15 Thread David Hildenbrand
On 15.09.20 11:06, Wei Yang wrote:
> On Tue, Sep 15, 2020 at 09:35:30AM +0200, David Hildenbrand wrote:
>>
 static int __ref try_remove_memory(int nid, u64 start, u64 size)
 {
int rc = 0;
 @@ -1777,7 +1757,7 @@ static int __ref try_remove_memory(int nid, u64 
 start, u64 size)
memblock_remove(start, size);
}

 -  __release_memory_resource(start, size);
 +  release_mem_region_adjustable(&iomem_resource, start, size);

>>>
>>> Seems the only user of release_mem_region_adjustable() is here, can we move
>>> iomem_resource into the function body? Actually, we don't iterate the 
>>> resource
>>> tree from any level. We always start from the root.
>>
>> You mean, making iomem_resource implicit? I can spot that something
>> similar was done for
>>
>> #define devm_release_mem_region(dev, start, n) \
>>  __devm_release_region(dev, &iomem_resource, (start), (n))
>>
> 
> What I prefer is removing iomem_resource from the parameter list, and just
> using it in the function body.
> 
> For the example you listed, __release_region() would be called with various
> *parent* resources, so it looks reasonable to keep the parameter there.

Yeah I got that ("making iomem_resource implicit"), as I said:

>> I'll send an addon patch for that, ok? - thanks.

-- 
Thanks,

David / dhildenb



Re: [PATCH v3 3/7] mm/memory_hotplug: prepare passing flags to add_memory() and friends

2020-09-15 Thread kernel test robot
Hi David,

I love your patch! Perhaps something to improve:

[auto build test WARNING on next-20200909]
[cannot apply to mmotm/master hnaz-linux-mm/master xen-tip/linux-next 
powerpc/next linus/master v5.9-rc4 v5.9-rc3 v5.9-rc2 v5.9-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/David-Hildenbrand/mm-memory_hotplug-selective-merging-of-system-ram-resources/20200910-171630
base: 7204eaa2c1f509066486f488c9dcb065d7484494
:: branch date: 9 hours ago
:: commit date: 9 hours ago
config: powerpc-randconfig-r011-20200909 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 0a5dc7effb191eff740e0e7ae7bd8e1f6bdb3ad9)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install powerpc cross compiling tool for clang build
        # apt-get install binutils-powerpc-linux-gnu
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

   In file included from arch/powerpc/kernel/asm-offsets.c:14:
   In file included from include/linux/compat.h:17:
   In file included from include/linux/fs.h:15:
   In file included from include/linux/radix-tree.h:18:
   In file included from include/linux/xarray.h:14:
   In file included from include/linux/gfp.h:6:
   In file included from include/linux/mmzone.h:853:
   include/linux/memory_hotplug.h:354:55: error: unknown type name 'mhp_t'
   extern int __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
 ^
   include/linux/memory_hotplug.h:355:53: error: unknown type name 'mhp_t'
   extern int add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
   ^
   include/linux/memory_hotplug.h:357:11: error: unknown type name 'mhp_t'
  mhp_t mhp_flags);
  ^
   include/linux/memory_hotplug.h:360:10: error: unknown type name 'mhp_t'
mhp_t mhp_flags);
^
   In file included from arch/powerpc/kernel/asm-offsets.c:21:
>> include/linux/mman.h:137:9: warning: division by zero is undefined [-Wdivision-by-zero]
  _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED) |
  ^~~~
   include/linux/mman.h:115:21: note: expanded from macro '_calc_vm_trans'
  : ((x) & (bit1)) / ((bit1) / (bit2
   ^ ~
   include/linux/mman.h:138:9: warning: division by zero is undefined [-Wdivision-by-zero]
  _calc_vm_trans(flags, MAP_SYNC,   VM_SYNC  );
  ^~~~
   include/linux/mman.h:115:21: note: expanded from macro '_calc_vm_trans'
  : ((x) & (bit1)) / ((bit1) / (bit2
   ^ ~
   2 warnings and 4 errors generated.
   make[2]: *** [scripts/Makefile.build:117: arch/powerpc/kernel/asm-offsets.s] Error 1
   make[2]: Target '__build' not remade because of errors.
   make[1]: *** [Makefile:1196: prepare0] Error 2
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [Makefile:185: __sub-make] Error 2
   make: Target 'prepare' not remade because of errors.

# 
https://github.com/0day-ci/linux/commit/d88270d1c0783a7f99f24a85692be90fd2ae0d7d
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
David-Hildenbrand/mm-memory_hotplug-selective-merging-of-system-ram-resources/20200910-171630
git checkout d88270d1c0783a7f99f24a85692be90fd2ae0d7d
vim +137 include/linux/mman.h

^1da177e4c3f41 Linus Torvalds  2005-04-16  128  
^1da177e4c3f41 Linus Torvalds  2005-04-16  129  /*
^1da177e4c3f41 Linus Torvalds  2005-04-16  130   * Combine the mmap "flags" argument into "vm_flags" used internally.
^1da177e4c3f41 Linus Torvalds  2005-04-16  131   */
^1da177e4c3f41 Linus Torvalds  2005-04-16  132  static inline unsigned long
^1da177e4c3f41 Linus Torvalds  2005-04-16  133  calc_vm_flag_bits(unsigned long flags)
^1da177e4c3f41 Linus Torvalds  2005-04-16  134  {
^1da177e4c3f41 Linus Torvalds  2005-04-16  135  	return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
^1da177e4c3f41 Linus Torvalds  2005-04-16  136  	       _calc_vm_trans(flags, MAP_DENYWRITE,  VM_DENYWRITE ) |
b6fb293f2497a9 Jan Kara        2017-11-01 @137  	       _calc_vm_trans(flags, MAP_LOCKED,

Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

2020-09-15 Thread Jason Wang

Hi Kishon:

On 2020/9/14 3:23 PM, Kishon Vijay Abraham I wrote:

Then you need something that is functionally equivalent to virtio PCI,
which is actually the concept of vDPA (e.g. vDPA provides alternatives if
the queue_sel is hard in the EP implementation).

Okay, I just tried to compare the 'struct vdpa_config_ops' and 'struct
vhost_config_ops' (introduced in [RFC PATCH 03/22] vhost: Add ops for
the VHOST driver to configure VHOST device).

struct vdpa_config_ops {
/* Virtqueue ops */
int (*set_vq_address)(struct vdpa_device *vdev,
  u16 idx, u64 desc_area, u64 driver_area,
  u64 device_area);
void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
  struct vdpa_callback *cb);
void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool ready);
bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
const struct vdpa_vq_state *state);
int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
struct vdpa_vq_state *state);
struct vdpa_notification_area
(*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
/* vq irq is not expected to be changed once DRIVER_OK is set */
int (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);

/* Device ops */
u32 (*get_vq_align)(struct vdpa_device *vdev);
u64 (*get_features)(struct vdpa_device *vdev);
int (*set_features)(struct vdpa_device *vdev, u64 features);
void (*set_config_cb)(struct vdpa_device *vdev,
  struct vdpa_callback *cb);
u16 (*get_vq_num_max)(struct vdpa_device *vdev);
u32 (*get_device_id)(struct vdpa_device *vdev);
u32 (*get_vendor_id)(struct vdpa_device *vdev);
u8 (*get_status)(struct vdpa_device *vdev);
void (*set_status)(struct vdpa_device *vdev, u8 status);
void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
   void *buf, unsigned int len);
void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
   const void *buf, unsigned int len);
u32 (*get_generation)(struct vdpa_device *vdev);

/* DMA ops */
int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
   u64 pa, u32 perm);
int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);

/* Free device resources */
void (*free)(struct vdpa_device *vdev);
};

+struct vhost_config_ops {
+   int (*create_vqs)(struct vhost_dev *vdev, unsigned int nvqs,
+ unsigned int num_bufs, struct vhost_virtqueue *vqs[],
+ vhost_vq_callback_t *callbacks[],
+ const char * const names[]);
+   void (*del_vqs)(struct vhost_dev *vdev);
+   int (*write)(struct vhost_dev *vdev, u64 vhost_dst, void *src, int len);
+   int (*read)(struct vhost_dev *vdev, void *dst, u64 vhost_src, int len);
+   int (*set_features)(struct vhost_dev *vdev, u64 device_features);
+   int (*set_status)(struct vhost_dev *vdev, u8 status);
+   u8 (*get_status)(struct vhost_dev *vdev);
+};
+
struct virtio_config_ops
I think there's some overlap here and some of the ops try to do the
same thing.

I think it differs in (*set_vq_address)() and (*create_vqs)().
[create_vqs() introduced in struct vhost_config_ops provides
complementary functionality to (*find_vqs)() in struct
virtio_config_ops. It seemingly encapsulates the functionality of
(*set_vq_address)(), (*set_vq_num)(), (*set_vq_cb)(),..].

Back to the difference between (*set_vq_address)() and (*create_vqs)():
set_vq_address() directly provides the virtqueue address to the vdpa
device, while create_vqs() only provides the parameters of the virtqueue
(like the number of virtqueues and the number of buffers) but does not
directly provide the address. IMO the backend client drivers (like net or
vhost) shouldn't/cannot by themselves know how to access the vring created
on the virtio front-end; the vdpa device/vhost device should have the
logic for that. That will help the client drivers work with different
types of vdpa/vhost devices and access the vring created by virtio
irrespective of whether the vring is reached via MMIO, kernel space, or
user space.

I think vdpa always works with client drivers in userspace, providing a
userspace address for the vring.



Sorry for being unclear. What I meant is not replacing vDPA with the 
vhost(bus) you proposed but the possibility of replacing virtio-pci-epf 
with vDPA in:


My question is basically for the part of 

Re: [PATCH] vhost: reduce stack usage in log_used

2020-09-15 Thread Jason Wang


On 2020/9/15 2:08 AM, Li Wang wrote:

Fix the warning: [-Werror=-Wframe-larger-than=]

drivers/vhost/vhost.c: In function log_used:
drivers/vhost/vhost.c:1906:1: warning: the frame size of 1040 bytes is larger than 1024 bytes

Signed-off-by: Li Wang 
---
  drivers/vhost/vhost.c | 2 +-
  drivers/vhost/vhost.h | 1 +
  2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index b45519c..31837a5
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1884,7 +1884,7 @@ static int log_write_hva(struct vhost_virtqueue *vq, u64 hva, u64 len)
 
 static int log_used(struct vhost_virtqueue *vq, u64 used_offset, u64 len)
 {
-	struct iovec iov[64];
+	struct iovec *iov = vq->log_iov;
 	int i, ret;
 
 	if (!vq->iotlb)

diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 9032d3c..5fe4b47
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -123,6 +123,7 @@ struct vhost_virtqueue {
 	/* Log write descriptors */
 	void __user *log_base;
 	struct vhost_log *log;
+	struct iovec log_iov[64];
 
 	/* Ring endianness. Defaults to legacy native endianness.
 	 * Set to true when starting a modern virtio device. */



Acked-by: Jason Wang 



Re: [PATCH] vhost_vdpa: Fix duplicate included kernel.h

2020-09-15 Thread Jason Wang


On 2020/9/15 8:51 AM, Tian Tao wrote:

linux/kernel.h is included more than once; remove the one that isn't
necessary.

Signed-off-by: Tian Tao 
---
  drivers/vhost/vdpa.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 3fab94f..95e2b83 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -22,7 +22,6 @@
  #include 
  #include 
  #include 
-#include <linux/kernel.h>
  
  #include "vhost.h"
  



Acked-by: Jason Wang 



Re: [PATCH v2 2/7] kernel/resource: move and rename IORESOURCE_MEM_DRIVER_MANAGED

2020-09-15 Thread David Hildenbrand
On 15.09.20 04:20, Wei Yang wrote:
> On Tue, Sep 08, 2020 at 10:10:07PM +0200, David Hildenbrand wrote:
>> IORESOURCE_MEM_DRIVER_MANAGED currently uses an unused PnP bit, which is
>> always set to 0 by hardware. This is far from beautiful (and confusing),
>> and the bit only applies to SYSRAM. So let's move it out of the
>> bus-specific (PnP) defined bits.
>>
>> We'll add another SYSRAM specific bit soon. If we ever need more bits for
>> other purposes, we can steal some from "desc", or reshuffle/regroup what we
>> have.
> 
> I think you made this definition because we use IORESOURCE_SYSRAM_RAM for
> hotplugged memory? So we make them all part of the IORESOURCE_SYSRAM_XXX family?

Yeah, to specify based on the extended MEM type SYSRAM. Because it
really only applies to that.
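
The shape of the change, with illustrative (not authoritative) bit values:

/* Extended MEM modifier bits, grouped under SYSRAM: */
#define IORESOURCE_SYSRAM			0x01000000	/* System RAM (modifier) */
#define IORESOURCE_SYSRAM_DRIVER_MANAGED	0x02000000	/* Always detected via a driver. */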

-- 
Thanks,

David / dhildenb



Re: [PATCH v2 1/7] kernel/resource: make release_mem_region_adjustable() never fail

2020-09-15 Thread David Hildenbrand


>> static int __ref try_remove_memory(int nid, u64 start, u64 size)
>> {
>>  int rc = 0;
>> @@ -1777,7 +1757,7 @@ static int __ref try_remove_memory(int nid, u64 start, 
>> u64 size)
>>  memblock_remove(start, size);
>>  }
>>
>> -__release_memory_resource(start, size);
>> +release_mem_region_adjustable(&iomem_resource, start, size);
>>
> 
> Seems the only user of release_mem_region_adjustable() is here, can we move
> iomem_resource into the function body? Actually, we don't iterate the resource
> tree from any level. We always start from the root.

You mean, making iomem_resource implicit? I can spot that something
similar was done for

#define devm_release_mem_region(dev, start, n) \
__devm_release_region(dev, &iomem_resource, (start), (n))

I'll send an addon patch for that, ok? - thanks.
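
i.e., the addon would reduce the caller in try_remove_memory() to (sketch;
the actual patch had not been posted in this thread):

-	release_mem_region_adjustable(&iomem_resource, start, size);
+	release_mem_region_adjustable(start, size);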

-- 
Thanks,

David / dhildenb



Re: [PATCH v4 5/8] mm/memory_hotplug: MEMHP_MERGE_RESOURCE to specify merging of System RAM resources

2020-09-15 Thread David Hildenbrand
On 15.09.20 04:43, Wei Yang wrote:
> On Fri, Sep 11, 2020 at 12:34:56PM +0200, David Hildenbrand wrote:
>> Some add_memory*() users add memory in small, contiguous memory blocks.
>> Examples include virtio-mem, hyper-v balloon, and the XEN balloon.
>>
>> This can quickly result in a lot of memory resources, whereby the actual
>> resource boundaries are not of interest (e.g., it might be relevant for
>> DIMMs, exposed via /proc/iomem to user space). We really want to merge
>> added resources in this scenario where possible.
>>
>> Let's provide a flag (MEMHP_MERGE_RESOURCE) to specify that a resource
>> either created within add_memory*() or passed via add_memory_resource()
>> shall be marked mergeable and merged with applicable siblings.
>>
>> To implement that, we need a kernel/resource interface to mark selected
>> System RAM resources mergeable (IORESOURCE_SYSRAM_MERGEABLE) and trigger
>> merging.
>>
>> Note: We really want to merge after the whole operation succeeded, not
>> directly when adding a resource to the resource tree (it would break
>> add_memory_resource() and require splitting resources again when the
>> operation failed - e.g., due to -ENOMEM).
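
A hedged sketch of a caller opting in (the flag and the flags argument are
per this series; the virtio-mem-style context is assumed):

	/* Hotplug one memory block and let its resource merge into
	 * applicable siblings once the operation has succeeded. */
	rc = add_memory(nid, start, memory_block_size_bytes(),
			MEMHP_MERGE_RESOURCE);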
> 
> Oops, the latest version is here.

Yeah, sorry, I dropped the "mm" prefix on the subject of the cover
letter by mistake.

> 
> BTW, I don't see patch 4. Not sure if it was junked by my mail system?

At least you're in the CC list below with your old mail address (I
assume you monitor that).

I'll try to use your new address in the future.


-- 
Thanks,

David / dhildenb
