Re: [virtio-dev] Re: clarifying the handling of responses for virtio-rpmb

2020-09-11 Thread Alex Bennée

Harald Mommer  writes:

> Hello,
>
> I had my hands in a virtio RPMB device implementation the last few
> weeks. During the development process I had to apply some patches to the
> virtio RPMB driver:
>
>   * Change the device id from 0x to 28
>
>   * (Add some debug facilities. Needed to see the frames. Got first no
> request frames on the device side, nothing.)
>
>   * Fix descriptor directions. For the outgoing frames num_in was
> incremented instead of num_out.
>
> The frames in the for-loop may be outgoing or intended for incoming
> data. Decided on the RPMB_F_WRITE flag what to do with those frames:
>
>for (i = 0; i < ncmds; i++) {
>  ...
>
>  if (cmds[i].flags & RPMB_F_WRITE)
>  sgs[num_out++ + num_in] = [i];
>  else
>  sgs[num_out + num_in++] = [i];
>  }
>
>   * Got now too much data comparing to the virtio spec. Removed those
> additional frames in the beginning disabling some pieces of code in
> the virtio RPMB driver.
>
> You are probably puzzled by something which I think is a bug in the
> virtio RPMB driver regarding the descriptor directions. Could be that
> some device implementations do not really care about provided descriptor
> directions, in this case this may go unnoticed for a while.

I wonder if we've ended up making very similar changes to the virtio
driver? I suspect because the originally driver had a whole bunch of
command frames for something that never made it into the final spec.

FWIW my current hacked up tree is here:

  
https://git.linaro.org/people/alex.bennee/linux.git/log/?h=testing/ivshmem-and-rpm-aug2020

I was pondering if it was worth removing the file-system integration
patches and just posting a series which implements the rpmb char device,
userspace tool and the virtio-rpmb driver?

>
>
> Am 10.09.20 um 15:08 schrieb Alex Bennée:
>> CAUTION: This email originated from outside of the organization.
>> Do not click links or open attachments unless you recognize the sender and 
>> know the content is safe.
>>
>>
>> Alex Bennée  writes:
>>
>>> Hi,
>>>
>>> The specification lists a number of commands that have responses:
>>>
>>>The operation of a virtio RPMB device is driven by the requests placed
>>>on the virtqueue. The type of request can be program key
>>>(VIRTIO_RPMB_REQ_PROGRAM_KEY), get write counter
>>>(VIRTIO_RPMB_REQ_GET_WRITE_COUNTER), write
>>>(VIRTIO_RPMB_REQ_DATA_WRITE), and read (VIRTIO_RPMB_REQ_DATA_READ). A
>>>program key or write request can also combine with a result read
>>>(VIRTIO_RPMB_REQ_RESULT_READ) for a returned result.
>>>
>>> Now I'm deep in the guts of virt queues doing the implementation I'm
>>> trying to clarify exactly what frames should be sent to the device and
>>> if they should be out_sgs or in_sgs. I suspect there is some difference
>>> between the original prototype interface and what we have now.
>>>
>>> Some operations obviously have an implied response
>>> (VIRTIO_RPMB_REQ_GET_WRITE_COUNTER and VIRTIO_RPMB_REQ_DATA_READ). As
>>> far as I could tell the frame should be simple:
>>>
>>>sg[0] (out_sg=1) - frame with command
>>>sg[1..n] (in_sg=1..n) - space for the reply to be filled in by the device
>>>
>>> However the language for the program key and data write say they can be
>>> combined with a VIRTIO_RPMB_REQ_RESULT_READ to optionally return a
>>> result. My question is is this result read meant to be in a separate
>>> request frame and response frame so you get:
>>>
>>>   sg[0] - VIRTIO_RPMB_REQ_PROGRAM_KEY/VIRTIO_RPMB_REQ_DATA_WRITE frame
>>>   sg[1] - VIRTIO_RPMB_REQ_RESULT_READ (out_sg=2)
>>>   sg[2] - empty frame for response (in_sg=1)
> This is what works after applying the direction patch above in the
> virtio driver and which makes also sense to me. See also below my
> comment for the rpmb_ioctl() code.
>>>
>>> or:
>>>
>>>   sg[0] - VIRTIO_RPMB_REQ_PROGRAM_KEY/VIRTIO_RPMB_REQ_DATA_WRITE frame 
>>> (out_sg=1)
>>>   sg[1] - VIRTIO_RPMB_REQ_RESULT_READ (in_sg=1)
> Makes no sense for me. The VIRTIO_RPMB_REQ_RESULT_READ is a request
> (command) in the same way as the other requests.
>>>
>>> where the result frame is filled in and sent back?
>>>
>>> I must say I'm a little confused by the logic in rpmb_ioctl (in the
>>> userspace tool) which creates both out_frames and resp frames:
>
> Was also confused but it's not that complicated (after some hours). For
> REQ_PROGRAM_KEY/REQ_WRITE_DATA is always an additional REQ_RESULT_READ
> added. So in the end as last descriptor there is always an incoming
> frame to be filled either with the  RESULT_READ data or the response
> data for REQ_GET_WRITE_COUNTER/REQ_DATA_READ.
>
>>>static int rpmb_ioctl(uint8_t frame_type, int fd, uint16_t req,
>>>  const void *frames_in, unsigned int cnt_in,
>>>  void *frames_out, unsigned int cnt_out)
>>>{
>>>int ret;
>>>struct __packed {
>>>

[PATCH v4 8/8] hv_balloon: try to merge system ram resources

2020-09-11 Thread David Hildenbrand
Let's try to merge system ram resources we add, to minimize the number
of resources in /proc/iomem. We don't care about the boundaries of
individual chunks we added.

Reviewed-by: Wei Liu 
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: "K. Y. Srinivasan" 
Cc: Haiyang Zhang 
Cc: Stephen Hemminger 
Cc: Wei Liu 
Cc: Pankaj Gupta 
Cc: Baoquan He 
Cc: Wei Yang 
Signed-off-by: David Hildenbrand 
---
 drivers/hv/hv_balloon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 3c0d52e244520..b64d2efbefe71 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -726,7 +726,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned 
long size,
 
nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
ret = add_memory(nid, PFN_PHYS((start_pfn)),
-   (HA_CHUNK << PAGE_SHIFT), MHP_NONE);
+   (HA_CHUNK << PAGE_SHIFT), MEMHP_MERGE_RESOURCE);
 
if (ret) {
pr_err("hot_add memory failed error is %d\n", ret);
-- 
2.26.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v4 7/8] xen/balloon: try to merge system ram resources

2020-09-11 Thread David Hildenbrand
Let's try to merge system ram resources we add, to minimize the number
of resources in /proc/iomem. We don't care about the boundaries of
individual chunks we added.

Reviewed-by: Juergen Gross 
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Stefano Stabellini 
Cc: Roger Pau Monné 
Cc: Julien Grall 
Cc: Pankaj Gupta 
Cc: Baoquan He 
Cc: Wei Yang 
Signed-off-by: David Hildenbrand 
---
 drivers/xen/balloon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 9f40a294d398d..b57b2067ecbfb 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -331,7 +331,7 @@ static enum bp_state reserve_additional_memory(void)
mutex_unlock(_mutex);
/* add_memory_resource() requires the device_hotplug lock */
lock_device_hotplug();
-   rc = add_memory_resource(nid, resource, MHP_NONE);
+   rc = add_memory_resource(nid, resource, MEMHP_MERGE_RESOURCE);
unlock_device_hotplug();
mutex_lock(_mutex);
 
-- 
2.26.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH v4 6/8] virtio-mem: try to merge system ram resources

2020-09-11 Thread David Hildenbrand
virtio-mem adds memory in memory block granularity, to be able to
remove it in the same granularity again later, and to grow slowly on
demand. This, however, results in quite a lot of resources when
adding a lot of memory. Resources are effectively stored in a list-based
tree. Having a lot of resources not only wastes memory, it also makes
traversing that tree more expensive, and makes /proc/iomem explode in
size (e.g., requiring kexec-tools to manually merge resources later
when e.g., trying to create a kdump header).

Before this patch, we get (/proc/iomem) when hotplugging 2G via virtio-mem
on x86-64:
[...]
1-13fff : System RAM
14000-33fff : virtio0
  14000-147ff : System RAM (virtio_mem)
  14800-14fff : System RAM (virtio_mem)
  15000-157ff : System RAM (virtio_mem)
  15800-15fff : System RAM (virtio_mem)
  16000-167ff : System RAM (virtio_mem)
  16800-16fff : System RAM (virtio_mem)
  17000-177ff : System RAM (virtio_mem)
  17800-17fff : System RAM (virtio_mem)
  18000-187ff : System RAM (virtio_mem)
  18800-18fff : System RAM (virtio_mem)
  19000-197ff : System RAM (virtio_mem)
  19800-19fff : System RAM (virtio_mem)
  1a000-1a7ff : System RAM (virtio_mem)
  1a800-1afff : System RAM (virtio_mem)
  1b000-1b7ff : System RAM (virtio_mem)
  1b800-1bfff : System RAM (virtio_mem)
328000-32 : PCI Bus :00

With this patch, we get (/proc/iomem):
[...]
fffc- : Reserved
1-13fff : System RAM
14000-33fff : virtio0
  14000-1bfff : System RAM (virtio_mem)
328000-32 : PCI Bus :00

Of course, with more hotplugged memory, it gets worse. When unplugging
memory blocks again, try_remove_memory() (via
offline_and_remove_memory()) will properly split the resource up again.

Reviewed-by: Pankaj Gupta 
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Dan Williams 
Cc: Michael S. Tsirkin 
Cc: Jason Wang 
Cc: Pankaj Gupta 
Cc: Baoquan He 
Cc: Wei Yang 
Signed-off-by: David Hildenbrand 
---
 drivers/virtio/virtio_mem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index ed99e43354010..ba4de598f6636 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -424,7 +424,8 @@ static int virtio_mem_mb_add(struct virtio_mem *vm, 
unsigned long mb_id)
 
dev_dbg(>vdev->dev, "adding memory block: %lu\n", mb_id);
return add_memory_driver_managed(nid, addr, memory_block_size_bytes(),
-vm->resource_name, MHP_NONE);
+vm->resource_name,
+MEMHP_MERGE_RESOURCE);
 }
 
 /*
-- 
2.26.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v4 3/8] mm/memory_hotplug: guard more declarations by CONFIG_MEMORY_HOTPLUG

2020-09-11 Thread David Hildenbrand
We soon want to pass flags via a new type to add_memory() and friends.
That revealed that we currently don't guard some declarations by
CONFIG_MEMORY_HOTPLUG.

While some definitions could be moved to different places, let's keep it
minimal for now and use CONFIG_MEMORY_HOTPLUG for all functions only
compiled with CONFIG_MEMORY_HOTPLUG.

Wrap sparse_decode_mem_map() into CONFIG_MEMORY_HOTPLUG, it's only
called from CONFIG_MEMORY_HOTPLUG code.

While at it, remove allow_online_pfn_range(), which is no longer around,
and mhp_notimplemented(), which is unused.

Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Dan Williams 
Cc: Pankaj Gupta 
Cc: Baoquan He 
Cc: Wei Yang 
Signed-off-by: David Hildenbrand 
---
 include/linux/memory_hotplug.h | 12 +++-
 mm/sparse.c|  2 ++
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 51a877fec8da8..1504b4d5ae6ce 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -247,13 +247,6 @@ static inline void zone_span_writelock(struct zone *zone) 
{}
 static inline void zone_span_writeunlock(struct zone *zone) {}
 static inline void zone_seqlock_init(struct zone *zone) {}
 
-static inline int mhp_notimplemented(const char *func)
-{
-   printk(KERN_WARNING "%s() called, with CONFIG_MEMORY_HOTPLUG 
disabled\n", func);
-   dump_stack();
-   return -ENOSYS;
-}
-
 static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
 {
 }
@@ -344,6 +337,7 @@ static inline void __remove_memory(int nid, u64 start, u64 
size) {}
 extern void set_zone_contiguous(struct zone *zone);
 extern void clear_zone_contiguous(struct zone *zone);
 
+#ifdef CONFIG_MEMORY_HOTPLUG
 extern void __ref free_area_init_core_hotplug(int nid);
 extern int __add_memory(int nid, u64 start, u64 size);
 extern int add_memory(int nid, u64 start, u64 size);
@@ -364,8 +358,8 @@ extern void sparse_remove_section(struct mem_section *ms,
unsigned long map_offset, struct vmem_altmap *altmap);
 extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
  unsigned long pnum);
-extern bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long 
nr_pages,
-   int online_type);
 extern struct zone *zone_for_pfn_range(int online_type, int nid, unsigned 
start_pfn,
unsigned long nr_pages);
+#endif /* CONFIG_MEMORY_HOTPLUG */
+
 #endif /* __LINUX_MEMORY_HOTPLUG_H */
diff --git a/mm/sparse.c b/mm/sparse.c
index b25ad8e648392..7bd23f9d6cef6 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -312,6 +312,7 @@ static unsigned long sparse_encode_mem_map(struct page 
*mem_map, unsigned long p
return coded_mem_map;
 }
 
+#ifdef CONFIG_MEMORY_HOTPLUG
 /*
  * Decode mem_map from the coded memmap
  */
@@ -321,6 +322,7 @@ struct page *sparse_decode_mem_map(unsigned long 
coded_mem_map, unsigned long pn
coded_mem_map &= SECTION_MAP_MASK;
return ((struct page *)coded_mem_map) + section_nr_to_pfn(pnum);
 }
+#endif /* CONFIG_MEMORY_HOTPLUG */
 
 static void __meminit sparse_init_one_section(struct mem_section *ms,
unsigned long pnum, struct page *mem_map,
-- 
2.26.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v4 5/8] mm/memory_hotplug: MEMHP_MERGE_RESOURCE to specify merging of System RAM resources

2020-09-11 Thread David Hildenbrand
Some add_memory*() users add memory in small, contiguous memory blocks.
Examples include virtio-mem, hyper-v balloon, and the XEN balloon.

This can quickly result in a lot of memory resources, whereby the actual
resource boundaries are not of interest (e.g., it might be relevant for
DIMMs, exposed via /proc/iomem to user space). We really want to merge
added resources in this scenario where possible.

Let's provide a flag (MEMHP_MERGE_RESOURCE) to specify that a resource
either created within add_memory*() or passed via add_memory_resource()
shall be marked mergeable and merged with applicable siblings.

To implement that, we need a kernel/resource interface to mark selected
System RAM resources mergeable (IORESOURCE_SYSRAM_MERGEABLE) and trigger
merging.

Note: We really want to merge after the whole operation succeeded, not
directly when adding a resource to the resource tree (it would break
add_memory_resource() and require splitting resources again when the
operation failed - e.g., due to -ENOMEM).

Reviewed-by: Pankaj Gupta 
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Dan Williams 
Cc: Jason Gunthorpe 
Cc: Kees Cook 
Cc: Ard Biesheuvel 
Cc: Thomas Gleixner 
Cc: "K. Y. Srinivasan" 
Cc: Haiyang Zhang 
Cc: Stephen Hemminger 
Cc: Wei Liu 
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Stefano Stabellini 
Cc: Roger Pau Monné 
Cc: Julien Grall 
Cc: Pankaj Gupta 
Cc: Baoquan He 
Cc: Wei Yang 
Signed-off-by: David Hildenbrand 
---
 include/linux/ioport.h |  4 +++
 include/linux/memory_hotplug.h |  7 
 kernel/resource.c  | 60 ++
 mm/memory_hotplug.c|  7 
 4 files changed, 78 insertions(+)

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index d7620d7c941a0..7e61389dcb017 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -60,6 +60,7 @@ struct resource {
 
 /* IORESOURCE_SYSRAM specific bits. */
 #define IORESOURCE_SYSRAM_DRIVER_MANAGED   0x0200 /* Always detected 
via a driver. */
+#define IORESOURCE_SYSRAM_MERGEABLE0x0400 /* Resource can be 
merged. */
 
 #define IORESOURCE_EXCLUSIVE   0x0800  /* Userland may not map this 
resource */
 
@@ -253,6 +254,9 @@ extern void __release_region(struct resource *, 
resource_size_t,
 extern void release_mem_region_adjustable(struct resource *, resource_size_t,
  resource_size_t);
 #endif
+#ifdef CONFIG_MEMORY_HOTPLUG
+extern void merge_system_ram_resource(struct resource *res);
+#endif
 
 /* Wrappers for managed devices */
 struct device;
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 33eb80fdba22f..d65c6fdc5cfc3 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -62,6 +62,13 @@ typedef int __bitwise mhp_t;
 
 /* No special request */
 #define MHP_NONE   ((__force mhp_t)0)
+/*
+ * Allow merging of the added System RAM resource with adjacent,
+ * mergeable resources. After a successful call to add_memory_resource()
+ * with this flag set, the resource pointer must no longer be used as it
+ * might be stale, or the resource might have changed.
+ */
+#define MEMHP_MERGE_RESOURCE   ((__force mhp_t)BIT(0))
 
 /*
  * Extended parameters for memory hotplug:
diff --git a/kernel/resource.c b/kernel/resource.c
index 36b3552210120..7a91b935f4c20 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -1363,6 +1363,66 @@ void release_mem_region_adjustable(struct resource 
*parent,
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+static bool system_ram_resources_mergeable(struct resource *r1,
+  struct resource *r2)
+{
+   /* We assume either r1 or r2 is IORESOURCE_SYSRAM_MERGEABLE. */
+   return r1->flags == r2->flags && r1->end + 1 == r2->start &&
+  r1->name == r2->name && r1->desc == r2->desc &&
+  !r1->child && !r2->child;
+}
+
+/*
+ * merge_system_ram_resource - mark the System RAM resource mergeable and try 
to
+ * merge it with adjacent, mergeable resources
+ * @res: resource descriptor
+ *
+ * This interface is intended for memory hotplug, whereby lots of contiguous
+ * system ram resources are added (e.g., via add_memory*()) by a driver, and
+ * the actual resource boundaries are not of interest (e.g., it might be
+ * relevant for DIMMs). Only resources that are marked mergeable, that have the
+ * same parent, and that don't have any children are considered. All mergeable
+ * resources must be immutable during the request.
+ *
+ * Note:
+ * - The caller has to make sure that no pointers to resources that are
+ *   marked mergeable are used anymore after this call - the resource might
+ *   be freed and the pointer might be stale!
+ * - release_mem_region_adjustable() will split on demand on memory hotunplug
+ */
+void merge_system_ram_resource(struct resource *res)
+{
+   const unsigned long flags = IORESOURCE_SYSTEM_RAM | 

[PATCH v4 1/8] kernel/resource: make release_mem_region_adjustable() never fail

2020-09-11 Thread David Hildenbrand
Let's make sure splitting a resource on memory hotunplug will never fail.
This will become more relevant once we merge selected System RAM
resources - then, we'll trigger that case more often on memory hotunplug.

In general, this function is already unlikely to fail. When we remove
memory, we free up quite a lot of metadata (memmap, page tables, memory
block device, etc.). The only reason it could really fail would be when
injecting allocation errors.

All other error cases inside release_mem_region_adjustable() seem to be
sanity checks if the function would be abused in different context -
let's add WARN_ON_ONCE() in these cases so we can catch them.

Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Dan Williams 
Cc: Jason Gunthorpe 
Cc: Kees Cook 
Cc: Ard Biesheuvel 
Cc: Pankaj Gupta 
Cc: Baoquan He 
Cc: Wei Yang 
Signed-off-by: David Hildenbrand 
---
 include/linux/ioport.h |  4 ++--
 kernel/resource.c  | 49 --
 mm/memory_hotplug.c| 22 +--
 3 files changed, 31 insertions(+), 44 deletions(-)

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 6c2b06fe8beb7..52a91f5fa1a36 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -248,8 +248,8 @@ extern struct resource * __request_region(struct resource *,
 extern void __release_region(struct resource *, resource_size_t,
resource_size_t);
 #ifdef CONFIG_MEMORY_HOTREMOVE
-extern int release_mem_region_adjustable(struct resource *, resource_size_t,
-   resource_size_t);
+extern void release_mem_region_adjustable(struct resource *, resource_size_t,
+ resource_size_t);
 #endif
 
 /* Wrappers for managed devices */
diff --git a/kernel/resource.c b/kernel/resource.c
index f1175ce93a1d5..36b3552210120 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -1258,21 +1258,28 @@ EXPORT_SYMBOL(__release_region);
  *   assumes that all children remain in the lower address entry for
  *   simplicity.  Enhance this logic when necessary.
  */
-int release_mem_region_adjustable(struct resource *parent,
- resource_size_t start, resource_size_t size)
+void release_mem_region_adjustable(struct resource *parent,
+  resource_size_t start, resource_size_t size)
 {
+   struct resource *new_res = NULL;
+   bool alloc_nofail = false;
struct resource **p;
struct resource *res;
-   struct resource *new_res;
resource_size_t end;
-   int ret = -EINVAL;
 
end = start + size - 1;
-   if ((start < parent->start) || (end > parent->end))
-   return ret;
+   if (WARN_ON_ONCE((start < parent->start) || (end > parent->end)))
+   return;
 
-   /* The alloc_resource() result gets checked later */
-   new_res = alloc_resource(GFP_KERNEL);
+   /*
+* We free up quite a lot of memory on memory hotunplug (esp., memap),
+* just before releasing the region. This is highly unlikely to
+* fail - let's play save and make it never fail as the caller cannot
+* perform any error handling (e.g., trying to re-add memory will fail
+* similarly).
+*/
+retry:
+   new_res = alloc_resource(GFP_KERNEL | alloc_nofail ? __GFP_NOFAIL : 0);
 
p = >child;
write_lock(_lock);
@@ -1298,7 +1305,6 @@ int release_mem_region_adjustable(struct resource *parent,
 * so if we are dealing with them, let us just back off here.
 */
if (!(res->flags & IORESOURCE_SYSRAM)) {
-   ret = 0;
break;
}
 
@@ -1315,20 +1321,23 @@ int release_mem_region_adjustable(struct resource 
*parent,
/* free the whole entry */
*p = res->sibling;
free_resource(res);
-   ret = 0;
} else if (res->start == start && res->end != end) {
/* adjust the start */
-   ret = __adjust_resource(res, end + 1,
-   res->end - end);
+   WARN_ON_ONCE(__adjust_resource(res, end + 1,
+  res->end - end));
} else if (res->start != start && res->end == end) {
/* adjust the end */
-   ret = __adjust_resource(res, res->start,
-   start - res->start);
+   WARN_ON_ONCE(__adjust_resource(res, res->start,
+  start - res->start));
} else {
-   /* split into two entries */
+   /* split into two entries - we need a new resource */
if (!new_res) {
-

[PATCH v4 0/8] selective merging of system ram resources

2020-09-11 Thread David Hildenbrand
Some add_memory*() users add memory in small, contiguous memory blocks.
Examples include virtio-mem, hyper-v balloon, and the XEN balloon.

This can quickly result in a lot of memory resources, whereby the actual
resource boundaries are not of interest (e.g., it might be relevant for
DIMMs, exposed via /proc/iomem to user space). We really want to merge
added resources in this scenario where possible.

Resources are effectively stored in a list-based tree. Having a lot of
resources not only wastes memory, it also makes traversing that tree more
expensive, and makes /proc/iomem explode in size (e.g., requiring
kexec-tools to manually merge resources when creating a kdump header. The
current kexec-tools resource count limit does not allow for more than
~100GB of memory with a memory block size of 128MB on x86-64).

Let's allow to selectively merge system ram resources by specifying a
new flag for add_memory*(). Patch #5 contains a /proc/iomem example. Only
tested with virtio-mem.

v3 -> v4:
- "mm/memory_hotplug: guard more declarations by CONFIG_MEMORY_HOTPLUG"
-- Fix configs without CONFIG_MEMORY_HOTPLUG with the new mhp_t type
-- Did a buch of cross-compiles with different configs, hope there isn't
   anything I missed.

v2 -> v3:
- "mm/memory_hotplug: prepare passing flags to add_memory() and friends"
-- Use proper __bitwise type for flags
-- Use "MHP_NONE" for empty flags
- Rebased to latest -next, added rb's

v1 -> v2:
- I had another look at v1 after vacation and didn't like it - it felt like
  a hack. So I want forward and added a proper flag to add_memory*(), and
  introduce a clean (non-racy) way to mark System RAM resources mergeable.
- "kernel/resource: move and rename IORESOURCE_MEM_DRIVER_MANAGED"
-- Clean that flag up, felt wrong in the PnP section
- "mm/memory_hotplug: prepare passing flags to add_memory() and friends"
-- Previously sent in other context - decided to keep Wei's ack
- "mm/memory_hotplug: MEMHP_MERGE_RESOURCE to specify merging of System
   RAM resources"
-- Cleaner approach to get the job done by using proper flags and only
   merging the single, specified resource
- "virtio-mem: try to merge system ram resources"
  "xen/balloon: try to merge system ram resources"
  "hv_balloon: try to merge system ram resources"
-- Use the new flag MEMHP_MERGE_RESOURCE, much cleaner

RFC -> v1:
- Switch from rather generic "merge_child_mem_resources()" where a resource
  name has to be specified to "merge_system_ram_resources().
- Smaller comment/documentation/patch description changes/fixes

David Hildenbrand (8):
  kernel/resource: make release_mem_region_adjustable() never fail
  kernel/resource: move and rename IORESOURCE_MEM_DRIVER_MANAGED
  mm/memory_hotplug: guard more declarations by CONFIG_MEMORY_HOTPLUG
  mm/memory_hotplug: prepare passing flags to add_memory() and friends
  mm/memory_hotplug: MEMHP_MERGE_RESOURCE to specify merging of System
RAM resources
  virtio-mem: try to merge system ram resources
  xen/balloon: try to merge system ram resources
  hv_balloon: try to merge system ram resources

 arch/powerpc/platforms/powernv/memtrace.c |   2 +-
 .../platforms/pseries/hotplug-memory.c|   2 +-
 drivers/acpi/acpi_memhotplug.c|   3 +-
 drivers/base/memory.c |   3 +-
 drivers/dax/kmem.c|   2 +-
 drivers/hv/hv_balloon.c   |   2 +-
 drivers/s390/char/sclp_cmd.c  |   2 +-
 drivers/virtio/virtio_mem.c   |   3 +-
 drivers/xen/balloon.c |   2 +-
 include/linux/ioport.h|  12 +-
 include/linux/memory_hotplug.h|  35 +++---
 kernel/kexec_file.c   |   2 +-
 kernel/resource.c | 109 ++
 mm/memory_hotplug.c   |  47 +++-
 mm/sparse.c   |   2 +
 15 files changed, 151 insertions(+), 77 deletions(-)

-- 
2.26.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v4 4/8] mm/memory_hotplug: prepare passing flags to add_memory() and friends

2020-09-11 Thread David Hildenbrand
We soon want to pass flags, e.g., to mark added System RAM resources.
mergeable. Prepare for that.

This patch is based on a similar patch by Oscar Salvador:

https://lkml.kernel.org/r/20190625075227.15193-3-osalva...@suse.de

Acked-by: Wei Liu 
Reviewed-by: Juergen Gross  # Xen related part
Reviewed-by: Pankaj Gupta 
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Dan Williams 
Cc: Jason Gunthorpe 
Cc: Pankaj Gupta 
Cc: Baoquan He 
Cc: Wei Yang 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: "Rafael J. Wysocki" 
Cc: Len Brown 
Cc: Greg Kroah-Hartman 
Cc: Vishal Verma 
Cc: Dave Jiang 
Cc: "K. Y. Srinivasan" 
Cc: Haiyang Zhang 
Cc: Stephen Hemminger 
Cc: Wei Liu 
Cc: Heiko Carstens 
Cc: Vasily Gorbik 
Cc: Christian Borntraeger 
Cc: David Hildenbrand 
Cc: "Michael S. Tsirkin" 
Cc: Jason Wang 
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Stefano Stabellini 
Cc: "Oliver O'Halloran" 
Cc: Pingfan Liu 
Cc: Nathan Lynch 
Cc: Libor Pechacek 
Cc: Anton Blanchard 
Cc: Leonardo Bras 
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-a...@vger.kernel.org
Cc: linux-nvd...@lists.01.org
Cc: linux-hyp...@vger.kernel.org
Cc: linux-s...@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: xen-de...@lists.xenproject.org
Signed-off-by: David Hildenbrand 
---
 arch/powerpc/platforms/powernv/memtrace.c   |  2 +-
 arch/powerpc/platforms/pseries/hotplug-memory.c |  2 +-
 drivers/acpi/acpi_memhotplug.c  |  3 ++-
 drivers/base/memory.c   |  3 ++-
 drivers/dax/kmem.c  |  2 +-
 drivers/hv/hv_balloon.c |  2 +-
 drivers/s390/char/sclp_cmd.c|  2 +-
 drivers/virtio/virtio_mem.c |  2 +-
 drivers/xen/balloon.c   |  2 +-
 include/linux/memory_hotplug.h  | 16 
 mm/memory_hotplug.c | 14 +++---
 11 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c 
b/arch/powerpc/platforms/powernv/memtrace.c
index 13b369d2cc454..6828108486f83 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -224,7 +224,7 @@ static int memtrace_online(void)
ent->mem = 0;
}
 
-   if (add_memory(ent->nid, ent->start, ent->size)) {
+   if (add_memory(ent->nid, ent->start, ent->size, MHP_NONE)) {
pr_err("Failed to add trace memory to node %d\n",
ent->nid);
ret += 1;
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 0ea976d1cac47..e1c9fa0d730f5 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -615,7 +615,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
nid = memory_add_physaddr_to_nid(lmb->base_addr);
 
/* Add the memory */
-   rc = __add_memory(nid, lmb->base_addr, block_sz);
+   rc = __add_memory(nid, lmb->base_addr, block_sz, MHP_NONE);
if (rc) {
invalidate_lmb_associativity_index(lmb);
return rc;
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index e294f44a78504..2067c3bc55763 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -207,7 +207,8 @@ static int acpi_memory_enable_device(struct 
acpi_memory_device *mem_device)
if (node < 0)
node = memory_add_physaddr_to_nid(info->start_addr);
 
-   result = __add_memory(node, info->start_addr, info->length);
+   result = __add_memory(node, info->start_addr, info->length,
+ MHP_NONE);
 
/*
 * If the memory block has been used by the kernel, add_memory()
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 4db3c660de831..b4c297dd04755 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -432,7 +432,8 @@ static ssize_t probe_store(struct device *dev, struct 
device_attribute *attr,
 
nid = memory_add_physaddr_to_nid(phys_addr);
ret = __add_memory(nid, phys_addr,
-  MIN_MEMORY_BLOCK_SIZE * sections_per_block);
+  MIN_MEMORY_BLOCK_SIZE * sections_per_block,
+  MHP_NONE);
 
if (ret)
goto out;
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 7dcb2902e9b1b..896cb9444e727 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -95,7 +95,7 @@ int dev_dax_kmem_probe(struct dev_dax *dev_dax)
 * this as RAM automatically.
 */
rc = add_memory_driver_managed(numa_node, range.start,
-   range_len(), kmem_name);
+  

[PATCH v4 2/8] kernel/resource: move and rename IORESOURCE_MEM_DRIVER_MANAGED

2020-09-11 Thread David Hildenbrand
IORESOURCE_MEM_DRIVER_MANAGED currently uses an unused PnP bit, which is
always set to 0 by hardware. This is far from beautiful (and confusing),
and the bit only applies to SYSRAM. So let's move it out of the
bus-specific (PnP) defined bits.

We'll add another SYSRAM specific bit soon. If we ever need more bits for
other purposes, we can steal some from "desc", or reshuffle/regroup what we
have.

Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Dan Williams 
Cc: Jason Gunthorpe 
Cc: Kees Cook 
Cc: Ard Biesheuvel 
Cc: Pankaj Gupta 
Cc: Baoquan He 
Cc: Wei Yang 
Cc: Eric Biederman 
Cc: Thomas Gleixner 
Cc: Greg Kroah-Hartman 
Cc: ke...@lists.infradead.org
Signed-off-by: David Hildenbrand 
---
 include/linux/ioport.h | 4 +++-
 kernel/kexec_file.c| 2 +-
 mm/memory_hotplug.c| 4 ++--
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 52a91f5fa1a36..d7620d7c941a0 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -58,6 +58,9 @@ struct resource {
 #define IORESOURCE_EXT_TYPE_BITS 0x0100/* Resource extended types */
 #define IORESOURCE_SYSRAM  0x0100  /* System RAM (modifier) */
 
+/* IORESOURCE_SYSRAM specific bits. */
+#define IORESOURCE_SYSRAM_DRIVER_MANAGED   0x0200 /* Always detected 
via a driver. */
+
 #define IORESOURCE_EXCLUSIVE   0x0800  /* Userland may not map this 
resource */
 
 #define IORESOURCE_DISABLED0x1000
@@ -103,7 +106,6 @@ struct resource {
 #define IORESOURCE_MEM_32BIT   (3<<3)
 #define IORESOURCE_MEM_SHADOWABLE  (1<<5)  /* dup: IORESOURCE_SHADOWABLE */
 #define IORESOURCE_MEM_EXPANSIONROM(1<<6)
-#define IORESOURCE_MEM_DRIVER_MANAGED  (1<<7)
 
 /* PnP I/O specific bits (IORESOURCE_BITS) */
 #define IORESOURCE_IO_16BIT_ADDR   (1<<0)
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index ca40bef75a616..dfeeed1aed084 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -520,7 +520,7 @@ static int locate_mem_hole_callback(struct resource *res, 
void *arg)
/* Returning 0 will take to next memory range */
 
/* Don't use memory that will be detected and handled by a driver. */
-   if (res->flags & IORESOURCE_MEM_DRIVER_MANAGED)
+   if (res->flags & IORESOURCE_SYSRAM_DRIVER_MANAGED)
return 0;
 
if (sz < kbuf->memsz)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 4c47b68a9f4b5..8e1cd18b5cf14 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -105,7 +105,7 @@ static struct resource *register_memory_resource(u64 start, 
u64 size,
unsigned long flags =  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
 
if (strcmp(resource_name, "System RAM"))
-   flags |= IORESOURCE_MEM_DRIVER_MANAGED;
+   flags |= IORESOURCE_SYSRAM_DRIVER_MANAGED;
 
/*
 * Make sure value parsed from 'mem=' only restricts memory adding
@@ -1160,7 +1160,7 @@ EXPORT_SYMBOL_GPL(add_memory);
  *
  * For this memory, no entries in /sys/firmware/memmap ("raw firmware-provided
  * memory map") are created. Also, the created memory resource is flagged
- * with IORESOURCE_MEM_DRIVER_MANAGED, so in-kernel users can special-case
+ * with IORESOURCE_SYSRAM_DRIVER_MANAGED, so in-kernel users can special-case
  * this memory as well (esp., not place kexec images onto it).
  *
  * The resource_name (visible via /proc/iomem) has to have the format
-- 
2.26.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 3/7] mm/memory_hotplug: prepare passing flags to add_memory() and friends

2020-09-11 Thread David Hildenbrand
On 11.09.20 04:21, kernel test robot wrote:
> Hi David,
> 
> I love your patch! Yet something to improve:
> 
> [auto build test ERROR on next-20200909]
> [cannot apply to mmotm/master hnaz-linux-mm/master xen-tip/linux-next 
> powerpc/next linus/master v5.9-rc4 v5.9-rc3 v5.9-rc2 v5.9-rc4]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
> 
> url:
> https://github.com/0day-ci/linux/commits/David-Hildenbrand/mm-memory_hotplug-selective-merging-of-system-ram-resources/20200910-171630
> base:7204eaa2c1f509066486f488c9dcb065d7484494
> config: x86_64-randconfig-a016-20200909 (attached as .config)
> compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 
> 0a5dc7effb191eff740e0e7ae7bd8e1f6bdb3ad9)
> reproduce (this is a W=1 build):
> wget 
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # install x86_64 cross compiling tool for clang build
> # apt-get install binutils-x86-64-linux-gnu
> # save the attached .config to linux build tree
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross 
> ARCH=x86_64 
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot 
> 
> All errors (new ones prefixed by >>):
> 
>WARNING: unmet direct dependencies detected for PHY_SAMSUNG_UFS
>Depends on OF && (ARCH_EXYNOS || COMPILE_TEST
>Selected by
>- SCSI_UFS_EXYNOS && SCSI_LOWLEVEL && SCSI && SCSI_UFSHCD_PLATFORM && 
> (ARCH_EXYNOS || COMPILE_TEST
>In file included from arch/x86/kernel/asm-offsets.c:9:
>In file included from include/linux/crypto.h:20:
>In file included from include/linux/slab.h:15:
>In file included from include/linux/gfp.h:6:
>In file included from include/linux/mmzone.h:853:
>>> include/linux/memory_hotplug.h:354:55: error: unknown type name 'mhp_t'
>extern int __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
>^
>include/linux/memory_hotplug.h:355:53: error: unknown type name 'mhp_t'
>extern int add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
>^
>include/linux/memory_hotplug.h:357:11: error: unknown type name 'mhp_t'
>mhp_t mhp_flags);
>^
>include/linux/memory_hotplug.h:360:10: error: unknown type name 'mhp_t'
>mhp_t mhp_flags);
>^
>4 errors generated.
>Makefile Module.symvers System.map arch block certs crypto drivers fs 
> include init ipc kernel lib mm modules.builtin modules.builtin.modinfo 
> modules.order net scripts security sound source tools usr virt vmlinux 
> vmlinux.o vmlinux.symvers [scripts/Makefile.build:117: 
> arch/x86/kernel/asm-offsets.s] Error 1
>Target '__build' not remade because of errors.
>Makefile Module.symvers System.map arch block certs crypto drivers fs 
> include init ipc kernel lib mm modules.builtin modules.builtin.modinfo 
> modules.order net scripts security sound source tools usr virt vmlinux 
> vmlinux.o vmlinux.symvers [Makefile:1196: prepare0] Error 2
>Target 'prepare' not remade because of errors.
>make: Makefile Module.symvers System.map arch block certs crypto drivers 
> fs include init ipc kernel lib mm modules.builtin modules.builtin.modinfo 
> modules.order net scripts security sound source tools usr virt vmlinux 
> vmlinux.o vmlinux.symvers [Makefile:185: __sub-make] Error 2
>make: Target 'prepare' not remade because of errors.
> 
> # 
> https://github.com/0day-ci/linux/commit/d88270d1c0783a7f99f24a85692be90fd2ae0d7d
> git remote add linux-review https://github.com/0day-ci/linux
> git fetch --no-tags linux-review 
> David-Hildenbrand/mm-memory_hotplug-selective-merging-of-system-ram-resources/20200910-171630
> git checkout d88270d1c0783a7f99f24a85692be90fd2ae0d7d
> vim +/mhp_t +354 include/linux/memory_hotplug.h
> 
>352
>353extern void __ref free_area_init_core_hotplug(int nid);
>  > 354extern int __add_memory(int nid, u64 start, u64 size, mhp_t 
> mhp_flags);
> 

add_memory() and not protected by CONFIG_MEMORY_HOTPLUG, but the new
type is. Will look into it.

-- 
Thanks,

David / dhildenb

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v5 1/4] vhost: convert VHOST_VSOCK_SET_RUNNING to a generic ioctl

2020-09-11 Thread Guennadi Liakhovetski
On Thu, Sep 10, 2020 at 10:46:43AM -0600, Mathieu Poirier wrote:
> On Thu, Sep 10, 2020 at 09:15:13AM +0200, Guennadi Liakhovetski wrote:
> > Hi Mathieu,
> > 
> > On Wed, Sep 09, 2020 at 04:42:14PM -0600, Mathieu Poirier wrote:
> > > On Wed, Aug 26, 2020 at 07:46:33PM +0200, Guennadi Liakhovetski wrote:
> > > > VHOST_VSOCK_SET_RUNNING is used by the vhost vsock driver to perform
> > > > crucial VirtQueue initialisation, like assigning .private fields and
> > > > calling vhost_vq_init_access(), and clean up. However, this ioctl is
> > > > actually extremely useful for any vhost driver, that doesn't have a
> > > > side channel to inform it of a status change, e.g. upon a guest
> > > > reboot. This patch makes that ioctl generic, while preserving its
> > > > numeric value and also keeping the original alias.
> > > > 
> > > > Signed-off-by: Guennadi Liakhovetski 
> > > > 
> > > > ---
> > > >  include/uapi/linux/vhost.h | 4 +++-
> > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> > > > index 75232185324a..11a4948b6216 100644
> > > > --- a/include/uapi/linux/vhost.h
> > > > +++ b/include/uapi/linux/vhost.h
> > > > @@ -97,6 +97,8 @@
> > > >  #define VHOST_SET_BACKEND_FEATURES _IOW(VHOST_VIRTIO, 0x25, __u64)
> > > >  #define VHOST_GET_BACKEND_FEATURES _IOR(VHOST_VIRTIO, 0x26, __u64)
> > > >  
> > > > +#define VHOST_SET_RUNNING _IOW(VHOST_VIRTIO, 0x61, int)
> > > > +
> > > 
> > > I don't see it used in the next patches and as such should be part of 
> > > another
> > > series.
> > 
> > It isn't used in the next patches, it is used in this patch - see below.
> >
> 
> Right, but why is this part of this set?  What does it bring?  It should be 
> part
> of a patchset where "VHOST_SET_RUNNING" is used.

Ok, I can remove this patch from this series and make it a part of the series, 
containing [1] "vhost: add an SOF Audio DSP driver"

Thanks
Guennadi

[1] https://www.spinics.net/lists/linux-virtualization/msg43309.html

> > > >  /* VHOST_NET specific defines */
> > > >  
> > > >  /* Attach virtio net ring to a raw socket, or tap device.
> > > > @@ -118,7 +120,7 @@
> > > >  /* VHOST_VSOCK specific defines */
> > > >  
> > > >  #define VHOST_VSOCK_SET_GUEST_CID  _IOW(VHOST_VIRTIO, 0x60, __u64)
> > > > -#define VHOST_VSOCK_SET_RUNNING_IOW(VHOST_VIRTIO, 
> > > > 0x61, int)
> > > > +#define VHOST_VSOCK_SET_RUNNINGVHOST_SET_RUNNING
> > > >  
> > > >  /* VHOST_VDPA specific defines */
> > > >  
> > > > -- 
> > > > 2.28.0
> > > > 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v5 4/4] vhost: add an RPMsg API

2020-09-11 Thread Guennadi Liakhovetski
Hi Mathieu,

On Thu, Sep 10, 2020 at 11:22:11AM -0600, Mathieu Poirier wrote:
> Good morning Guennadi,
> 
> On Thu, Sep 10, 2020 at 10:38:54AM +0200, Guennadi Liakhovetski wrote:
> > Hi Mathieu,
> > 
> > On Wed, Sep 09, 2020 at 04:39:46PM -0600, Mathieu Poirier wrote:
> > > Good afternoon,
> > > 
> > > On Wed, Aug 26, 2020 at 07:46:36PM +0200, Guennadi Liakhovetski wrote:
> > > > Linux supports running the RPMsg protocol over the VirtIO transport
> > > > protocol, but currently there is only support for VirtIO clients and
> > > > no support for a VirtIO server. This patch adds a vhost-based RPMsg
> > > > server implementation.
> > > 
> > > This changelog is very confusing...  At this time the name service in the
> > > remoteproc space runs as a server on the application processor.  But from 
> > > the
> > > above the remoteproc usecase seems to be considered to be a client
> > > configuration.
> > 
> > I agree that this isn't very obvious. But I think it is common to call the 
> > host "a server" and guests "clients." E.g. in vhost.c in the top-of-thefile 
> > comment:
> 
> Ok - that part we agree on.
> 
> > 
> >  * Generic code for virtio server in host kernel.
> > 
> > I think the generic concept behind this notation is, that as guests boot, 
> > they send their requests to the host, e.g. VirtIO device drivers on guests 
> > send requests over VirtQueues to VirtIO servers on the host, which can run 
> > either in the user- or in the kernel-space. And I think you can follow that 
> 
> I can see that process taking place.  After all virtIO devices on guests are
> only stubs that need host support for access to HW.
> 
> > logic in case of devices or remote processors too: it's the main CPU(s) 
> > that boot(s) and start talking to devices and remote processors, so in that 
> > sence devices are servers and the CPUs are their clients.
> 
> In the remote processor case, the remoteproc core (application processor) 
> sets up
> the name service but does not initiate the communication with a remote
> processor.  It simply waits there for a name space request to come in from the
> remote processor.

Hm, I don't see that in two examples, that I looked at: mtk and virtio. In both 
cases the announcement seems to be directly coming from the application 
processor 
maybe after some initialisation.

> > And yes, the name-space announcement use-case seems confusing to me too - 
> > it 
> > reverts the relationship in a way: once a guest has booted and established 
> > connections to any rpmsg "devices," those send their namespace 
> > announcements 
> > back. But I think this can be regarded as server identification: you 
> > connect 
> > to a server and it replies with its identification and capabilities.
> 
> Based on the above can I assume vhost_rpmsg_ns_announce() is sent from the
> guest?

No, it's "vhost_..." so it's running on the host. The host (the server, an 
analogue of the application processor, IIUC) sends NS announcements to guests.

> I saw your V7, something I will look into.  In the mean time I need to bring
> your attention to this set [1] from Arnaud.  Please have a look as it will
> impact your work.
> 
> https://patchwork.kernel.org/project/linux-remoteproc/list/?series=338335

Yes, I've had a look at that series, thanks for forwarding it to me. TBH I 
don't quite understand some choices there, e.g. creating a separate driver and 
then having to register devices just for the namespace announcement. I don't 
think creating virtual devices is taken easily in Linux. But either way I 
don't think our series conflict a lot, but I do hope that I can merge my 
series first, he'd just have to switch to using the header, that I'm adding. 
Hardly too many changes otherwise.

> > > And I don't see a server implementation per se...  It is more like a 
> > > client
> > > implementation since vhost_rpmsg_announce() uses the RESPONSE queue, 
> > > which sends
> > > messages from host to guest.
> > > 
> > > Perhaps it is my lack of familiarity with vhost terminology.
> > > 
> > > > 
> > > > Signed-off-by: Guennadi Liakhovetski 
> > > > 
> > > > ---
> > > >  drivers/vhost/Kconfig   |   7 +
> > > >  drivers/vhost/Makefile  |   3 +
> > > >  drivers/vhost/rpmsg.c   | 373 
> > > >  drivers/vhost/vhost_rpmsg.h |  74 +++
> > > >  4 files changed, 457 insertions(+)
> > > >  create mode 100644 drivers/vhost/rpmsg.c
> > > >  create mode 100644 drivers/vhost/vhost_rpmsg.h
> > > > 
> > > > diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
> > > > index 587fbae06182..046b948fc411 100644
> > > > --- a/drivers/vhost/Kconfig
> > > > +++ b/drivers/vhost/Kconfig
> > > > @@ -38,6 +38,13 @@ config VHOST_NET
> > > >   To compile this driver as a module, choose M here: the module 
> > > > will
> > > >   be called vhost_net.
> > > >  
> > > > +config VHOST_RPMSG
> > > > +   tristate
> > > > +   select VHOST
> > > > +   help
> > > > + Vhost