Re: [PATCH v4 0/4] Add a vhost RPMsg API

2020-07-30 Thread Guennadi Liakhovetski
Hi Michael,

On Thu, Jul 30, 2020 at 12:08:29PM -0400, Michael S. Tsirkin wrote:
> On Wed, Jul 22, 2020 at 05:09:23PM +0200, Guennadi Liakhovetski wrote:
> > Hi,
> > 
> > Now that virtio-rpmsg endianness fixes have been merged we can 
> > proceed with the next step.
> 
> Which tree is this for?

The essential part of this series is for drivers/vhost, so I presume 
that should be the target tree as well. There is, however, a small part 
for drivers/rpmsg. Should I split this series in two, or shall we 
first review it as a whole to make its goals clearer?

Thanks
Guennadi

> > v4:
> > - add endianness conversions to comply with the VirtIO standard
> > 
> > v3:
> > - address several checkpatch warnings
> > - address comments from Mathieu Poirier
> > 
> > v2:
> > - update patch #5 with a correct vhost_dev_init() prototype
> > - drop patch #6 - it depends on a different patch, that is currently
> >   an RFC
> > - address comments from Pierre-Louis Bossart:
> >   * remove "default n" from Kconfig
> > 
> > Linux supports RPMsg over VirtIO for "remote processor" / AMP use
> > cases. It can however also be used for virtualisation scenarios,
> > e.g. when using KVM to run Linux on both the host and the guests.
> > This patch set adds a wrapper API to facilitate writing vhost
> > drivers for such RPMsg-based solutions. The first use case is an
> > audio DSP virtualisation project, currently under development, ready
> > for review and submission, available at
> > https://github.com/thesofproject/linux/pull/1501/commits
> > 
> > Thanks
> > Guennadi
> > 
> > Guennadi Liakhovetski (4):
> >   vhost: convert VHOST_VSOCK_SET_RUNNING to a generic ioctl
> >   rpmsg: move common structures and defines to headers
> >   rpmsg: update documentation
> >   vhost: add an RPMsg API
> > 
> >  Documentation/rpmsg.txt  |   6 +-
> >  drivers/rpmsg/virtio_rpmsg_bus.c |  78 +--
> >  drivers/vhost/Kconfig|   7 +
> >  drivers/vhost/Makefile   |   3 +
> >  drivers/vhost/rpmsg.c| 375 +++
> >  drivers/vhost/vhost_rpmsg.h  |  74 ++
> >  include/linux/virtio_rpmsg.h |  83 +++
> >  include/uapi/linux/rpmsg.h   |   3 +
> >  include/uapi/linux/vhost.h   |   4 +-
> >  9 files changed, 553 insertions(+), 80 deletions(-)
> >  create mode 100644 drivers/vhost/rpmsg.c
> >  create mode 100644 drivers/vhost/vhost_rpmsg.h
> >  create mode 100644 include/linux/virtio_rpmsg.h
> > 
> > -- 
> > 2.27.0
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH V4 0/6] IRQ offloading for vDPA

2020-07-30 Thread Jason Wang


On 2020/7/28 12:23 PM, Zhu Lingshan wrote:

This series intends to implement IRQ offloading for
vhost_vdpa.

With the help of IRQ forwarding facilities such as posted
interrupts on x86, IRQ bypass can deliver interrupts
to a vCPU directly.

vDPA devices have dedicated hardware backends, like VFIO
passed-through devices, so it is possible to set up IRQ
offloading (IRQ bypass) for vDPA devices and gain
performance improvements.

In my testing, this feature saves 0.1 ms on average
in a ping between two VFs.
Changes from V3:
(1) removed the vDPA IRQ allocate/free helpers from the vDPA core.
(2) added a new function, get_vq_irq(), to struct vdpa_config_ops;
an upper-layer driver can use it to: A. query the IRQ number
of a vq; B. detect whether a vq is enabled.
(3) implemented get_vq_irq() in the ifcvf driver.
(4) in vhost_vdpa, set_status() now sets up IRQ offloading when
DRIVER_OK is set, and tears it down on !DRIVER_OK.
(5) minor improvements.



OK, I think you can go ahead and post a V5. It's not bad to start with 
get_vq_irq(), and we can make further changes afterwards if it works for 
some cases.


Thanks


Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa

2020-07-30 Thread Jason Wang


On 2020/7/29 10:15 PM, Eli Cohen wrote:

OK, we have a mode of operation that does not require driver
intervention to manipulate the event queues so I think we're ok with
this design.



Good to know this.

Thanks



Re: [GIT PULL] virtio, qemu_fw: bugfixes

2020-07-30 Thread pr-tracker-bot
The pull request you sent on Thu, 30 Jul 2020 15:29:58 -0400:

> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/417385c47ef7ee0d4f48f63f70cca6c1ed6355f4

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


[GIT PULL] virtio, qemu_fw: bugfixes

2020-07-30 Thread Michael S. Tsirkin
The following changes since commit 92ed301919932f13b9172e525674157e983d:

  Linux 5.8-rc7 (2020-07-26 14:14:06 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to a96b0d061d476093cf86ca1c2de06fc57163588d:

  virtio-mem: Fix build error due to improper use 'select' (2020-07-30 11:28:17 -0400)


virtio, qemu_fw: bugfixes

A couple of last-minute bugfixes.

Signed-off-by: Michael S. Tsirkin 


Alexander Duyck (1):
  virtio-balloon: Document byte ordering of poison_val

Michael S. Tsirkin (2):
  vhost/scsi: fix up req type endian-ness
  virtio_balloon: fix up endian-ness for free cmd id

Qiushi Wu (1):
  firmware: Fix a reference count leak.

Weilong Chen (1):
  virtio-mem: Fix build error due to improper use 'select'

 drivers/firmware/qemu_fw_cfg.c  |  7 ---
 drivers/vhost/scsi.c|  2 +-
 drivers/virtio/Kconfig  |  2 +-
 drivers/virtio/virtio_balloon.c | 11 ++-
 4 files changed, 16 insertions(+), 6 deletions(-)



Re: [PATCH v5 00/75] x86: SEV-ES Guest Support

2020-07-30 Thread Joerg Roedel
Hi Mike,

On Thu, Jul 30, 2020 at 01:27:48AM +, Mike Stunes wrote:
> Thanks for the updated patches! I applied this patch-set onto commit
> 01634f2bd42e ("Merge branch 'x86/urgent'") from your tree. It boots,
> but CPU 1 (on a two-CPU VM) is offline at boot, and `chcpu -e 1` returns:
> 
> chcpu: CPU 1 enable failed: Input/output error
> 
> with nothing in dmesg to indicate why it failed. The first thing I thought
> of was anything relating to the AP jump table, but I haven’t changed
> anything there on the hypervisor side. Let me know what other data I can
> provide for you.

Hard to tell. Have you enabled FSGSBASE in the guest? If yes, can you
try disabling it?

Regards,

Joerg

Re: [PATCH v7 2/2] s390: virtio: PV needs VIRTIO I/O device protection

2020-07-30 Thread Pierre Morel


gentle ping.


On 2020-07-15 13:51, Michael S. Tsirkin wrote:

On Wed, Jul 15, 2020 at 06:16:59PM +0800, Jason Wang wrote:


On 2020/7/15 5:50 PM, Michael S. Tsirkin wrote:

On Wed, Jul 15, 2020 at 10:31:09AM +0200, Pierre Morel wrote:

If protected virtualization is active on s390, the virtio queues are
not accessible to the host, unless VIRTIO_F_IOMMU_PLATFORM has been
negotiated. Use the new arch_validate_virtio_features() interface to
fail probe if that's not the case, preventing a host error on access
attempt.

Signed-off-by: Pierre Morel 
Reviewed-by: Cornelia Huck 
Acked-by: Halil Pasic 
Acked-by: Christian Borntraeger 
---
   arch/s390/mm/init.c | 28 
   1 file changed, 28 insertions(+)

diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 6dc7c3b60ef6..d39af6554d4f 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -45,6 +45,7 @@
   #include 
   #include 
   #include 
+#include 
   pgd_t swapper_pg_dir[PTRS_PER_PGD] __section(.bss..swapper_pg_dir);
@@ -161,6 +162,33 @@ bool force_dma_unencrypted(struct device *dev)
return is_prot_virt_guest();
   }
+/*
+ * arch_validate_virtio_features
+ * @dev: the VIRTIO device being added
+ *
+ * Return an error if required features are missing on a guest running
+ * with protected virtualization.
+ */
+int arch_validate_virtio_features(struct virtio_device *dev)
+{
+   if (!is_prot_virt_guest())
+   return 0;
+
+   if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1)) {
+   dev_warn(&dev->dev,
+"legacy virtio not supported with protected 
virtualization\n");
+   return -ENODEV;
+   }
+
+   if (!virtio_has_feature(dev, VIRTIO_F_IOMMU_PLATFORM)) {
+   dev_warn(&dev->dev,
+"support for limited memory access required for protected 
virtualization\n");
+   return -ENODEV;
+   }
+
+   return 0;
+}
+
   /* protected virtualization */
   static void pv_init(void)
   {

What bothers me here is that arch code depends on virtio now.
It works even with a modular virtio when the functions are inline,
but it seems fragile: e.g. it breaks virtio as an out-of-tree module,
since the layout of struct virtio_device can change.



The code is only called from virtio.c, so it should be fine.

And my understanding is that we don't need to care about the kABI issue
during upstream development?

Thanks


No, but so far it has been convenient at least for me, for development,
to just be able to unload all of virtio and load a different version.






I'm not sure what to do with this yet, will try to think about it
over the weekend. Thanks!



--
2.25.1




--
Pierre Morel
IBM Lab Boeblingen

[PATCH v2 4/6] mm/page_isolation: cleanup set_migratetype_isolate()

2020-07-30 Thread David Hildenbrand
Let's clean it up a bit, simplifying error handling and getting rid of
the label.

Reviewed-by: Baoquan He 
Reviewed-by: Pankaj Gupta 
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Michael S. Tsirkin 
Cc: Mike Kravetz 
Signed-off-by: David Hildenbrand 
---
 mm/page_isolation.c | 17 +++--
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index d099aac479601..e65fe5d770849 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -17,12 +17,9 @@
 
 static int set_migratetype_isolate(struct page *page, int migratetype, int 
isol_flags)
 {
-   struct page *unmovable = NULL;
-   struct zone *zone;
+   struct zone *zone = page_zone(page);
+   struct page *unmovable;
unsigned long flags;
-   int ret = -EBUSY;
-
-   zone = page_zone(page);
 
spin_lock_irqsave(&zone->lock, flags);
 
@@ -51,13 +48,13 @@ static int set_migratetype_isolate(struct page *page, int 
migratetype, int isol_
NULL);
 
__mod_zone_freepage_state(zone, -nr_pages, mt);
-   ret = 0;
+   spin_unlock_irqrestore(&zone->lock, flags);
+   drain_all_pages(zone);
+   return 0;
}
 
spin_unlock_irqrestore(&zone->lock, flags);
-   if (!ret) {
-   drain_all_pages(zone);
-   } else if ((isol_flags & REPORT_FAILURE) && unmovable) {
+   if (isol_flags & REPORT_FAILURE) {
/*
 * printk() with zone->lock held will likely trigger a
 * lockdep splat, so defer it here.
@@ -65,7 +62,7 @@ static int set_migratetype_isolate(struct page *page, int 
migratetype, int isol_
dump_page(unmovable, "unmovable page");
}
 
-   return ret;
+   return -EBUSY;
 }
 
 static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
-- 
2.26.2



[PATCH v2 5/6] virtio-mem: don't special-case ZONE_MOVABLE

2020-07-30 Thread David Hildenbrand
Let's allow onlining partially plugged memory blocks to ZONE_MOVABLE
and also consider memory blocks that were onlined to ZONE_MOVABLE when
unplugging memory. While unplugged memory blocks are, in general,
unmovable, they can be skipped when offlining memory.

virtio-mem only unplugs fairly big chunks (in the megabyte range) and
tries to shrink the memory region rather than choosing memory at random. In
theory, if all other pages in the movable zone were movable, virtio-mem
would only shrink that zone and not create any kind of fragmentation.

In the future, we might want to remember the zone again and use the
information when (un)plugging memory. For now, let's keep it simple.

Note: Support for defragmentation is planned, to deal with fragmentation
after unplug due to memory chunks within memory blocks that could not
get unplugged before (e.g., somebody pinning pages within ZONE_MOVABLE
for a longer time).

Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Michael S. Tsirkin 
Cc: Jason Wang 
Cc: Mike Kravetz 
Cc: Pankaj Gupta 
Cc: Baoquan He 
Signed-off-by: David Hildenbrand 
---
 drivers/virtio/virtio_mem.c | 47 +++--
 1 file changed, 8 insertions(+), 39 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index f26f5f64ae822..2ddfc4a0e2ee0 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -36,18 +36,10 @@ enum virtio_mem_mb_state {
VIRTIO_MEM_MB_STATE_OFFLINE,
/* Partially plugged, fully added to Linux, offline. */
VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL,
-   /* Fully plugged, fully added to Linux, online (!ZONE_MOVABLE). */
+   /* Fully plugged, fully added to Linux, online. */
VIRTIO_MEM_MB_STATE_ONLINE,
-   /* Partially plugged, fully added to Linux, online (!ZONE_MOVABLE). */
+   /* Partially plugged, fully added to Linux, online. */
VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL,
-   /*
-* Fully plugged, fully added to Linux, online (ZONE_MOVABLE).
-* We are not allowed to allocate (unplug) parts of this block that
-* are not movable (similar to gigantic pages). We will never allow
-* to online OFFLINE_PARTIAL to ZONE_MOVABLE (as they would contain
-* unmovable parts).
-*/
-   VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE,
VIRTIO_MEM_MB_STATE_COUNT
 };
 
@@ -526,21 +518,10 @@ static bool virtio_mem_owned_mb(struct virtio_mem *vm, 
unsigned long mb_id)
 }
 
 static int virtio_mem_notify_going_online(struct virtio_mem *vm,
- unsigned long mb_id,
- enum zone_type zone)
+ unsigned long mb_id)
 {
switch (virtio_mem_mb_get_state(vm, mb_id)) {
case VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL:
-   /*
-* We won't allow to online a partially plugged memory block
-* to the MOVABLE zone - it would contain unmovable parts.
-*/
-   if (zone == ZONE_MOVABLE) {
-   dev_warn_ratelimited(&vm->vdev->dev,
-"memory block has holes, MOVABLE 
not supported\n");
-   return NOTIFY_BAD;
-   }
-   return NOTIFY_OK;
case VIRTIO_MEM_MB_STATE_OFFLINE:
return NOTIFY_OK;
default:
@@ -560,7 +541,6 @@ static void virtio_mem_notify_offline(struct virtio_mem *vm,
VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL);
break;
case VIRTIO_MEM_MB_STATE_ONLINE:
-   case VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE:
virtio_mem_mb_set_state(vm, mb_id,
VIRTIO_MEM_MB_STATE_OFFLINE);
break;
@@ -579,24 +559,17 @@ static void virtio_mem_notify_offline(struct virtio_mem 
*vm,
virtio_mem_retry(vm);
 }
 
-static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long 
mb_id,
-enum zone_type zone)
+static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long 
mb_id)
 {
unsigned long nb_offline;
 
switch (virtio_mem_mb_get_state(vm, mb_id)) {
case VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL:
-   BUG_ON(zone == ZONE_MOVABLE);
virtio_mem_mb_set_state(vm, mb_id,
VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL);
break;
case VIRTIO_MEM_MB_STATE_OFFLINE:
-   if (zone == ZONE_MOVABLE)
-   virtio_mem_mb_set_state(vm, mb_id,
-   VIRTIO_MEM_MB_STATE_ONLINE_MOVABLE);
-   else
-   virtio_mem_mb_set_state(vm, mb_id,
-   VIRTIO_MEM_MB_STATE_ONLINE);
+   virtio_mem_mb_set_state(vm, mb_id, VIRTIO_MEM_MB_STATE_ONLINE);
 

[PATCH v2 3/6] mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate()

2020-07-30 Thread David Hildenbrand
Inside has_unmovable_pages(), we have a comment describing how unmovable
data could end up in ZONE_MOVABLE - via "movablecore". Also, besides
checking if the first page in the pageblock is reserved, we don't
perform any further checks in case of ZONE_MOVABLE.

In case of memory offlining, we set REPORT_FAILURE, properly
dump_page() the page and handle the error gracefully.
alloc_contig_pages() users currently never allocate from ZONE_MOVABLE.
E.g., hugetlb uses alloc_contig_pages() for the allocation of gigantic
pages only, which will never end up on the MOVABLE zone
(see htlb_alloc_mask()).

Reviewed-by: Baoquan He 
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Michael S. Tsirkin 
Cc: Mike Kravetz 
Cc: Pankaj Gupta 
Signed-off-by: David Hildenbrand 
---
 mm/page_isolation.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 7d7d263ce7f4b..d099aac479601 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -57,15 +57,12 @@ static int set_migratetype_isolate(struct page *page, int 
migratetype, int isol_
spin_unlock_irqrestore(&zone->lock, flags);
if (!ret) {
drain_all_pages(zone);
-   } else {
-   WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
-
-   if ((isol_flags & REPORT_FAILURE) && unmovable)
-   /*
-* printk() with zone->lock held will likely trigger a
-* lockdep splat, so defer it here.
-*/
-   dump_page(unmovable, "unmovable page");
+   } else if ((isol_flags & REPORT_FAILURE) && unmovable) {
+   /*
+* printk() with zone->lock held will likely trigger a
+* lockdep splat, so defer it here.
+*/
+   dump_page(unmovable, "unmovable page");
}
 
return ret;
-- 
2.26.2



[PATCH v2 6/6] mm: document semantics of ZONE_MOVABLE

2020-07-30 Thread David Hildenbrand
Let's document what ZONE_MOVABLE means, how it's used, and which special
cases we have regarding unmovable pages (memory offlining vs. migration /
allocations).

Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Michael S. Tsirkin 
Cc: Mike Kravetz 
Cc: Mike Rapoport 
Cc: Pankaj Gupta 
Cc: Baoquan He 
Signed-off-by: David Hildenbrand 
---
 include/linux/mmzone.h | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index f6f884970511d..b8c49b2aff684 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -372,6 +372,40 @@ enum zone_type {
 */
ZONE_HIGHMEM,
 #endif
+   /*
+* ZONE_MOVABLE is similar to ZONE_NORMAL, except that it *primarily*
+* only contains movable pages. Main use cases are to make memory
+* offlining more likely to succeed, and to locally limit unmovable
+* allocations - e.g., to increase the number of THP/huge pages.
+* Notable special cases:
+*
+* 1. Pinned pages: (Long-term) pinning of movable pages might
+*essentially turn such pages unmovable. Memory offlining might
+*retry a long time.
+* 2. memblock allocations: kernelcore/movablecore setups might create
+*situations where ZONE_MOVABLE contains unmovable allocations
+*after boot. Memory offlining and allocations fail early.
+* 3. Memory holes: Such pages cannot be allocated. Applies only to
+*boot memory, not hotplugged memory. Memory offlining and
+*allocations fail early.
+* 4. PG_hwpoison pages: While poisoned pages can be skipped during
+*memory offlining, such pages cannot be allocated.
+* 5. Unmovable PG_offline pages: In paravirtualized environments,
+*hotplugged memory blocks might only partially be managed by the
+*buddy (e.g., via XEN-balloon, Hyper-V balloon, virtio-mem). The
+*parts not managed by the buddy are unmovable PG_offline pages. In
+*some cases (virtio-mem), such pages can be skipped during
+*memory offlining, however, cannot be moved/allocated. These
+*techniques might use alloc_contig_range() to hide previously
+*exposed pages from the buddy again (e.g., to implement some sort
+*of memory unplug in virtio-mem).
+*
+* In general, no unmovable allocations that degrade memory offlining
+* should end up in ZONE_MOVABLE. Allocators (like alloc_contig_range())
+* have to expect that migrating pages in ZONE_MOVABLE can fail (even
+* if has_unmovable_pages() states that there are no unmovable pages,
+* there can be false negatives).
+*/
ZONE_MOVABLE,
 #ifdef CONFIG_ZONE_DEVICE
ZONE_DEVICE,
-- 
2.26.2



[PATCH v2 2/6] mm/page_alloc: tweak comments in has_unmovable_pages()

2020-07-30 Thread David Hildenbrand
Let's move the split comment regarding bootmem allocations and memory
holes, especially in the context of ZONE_MOVABLE, to the PageReserved()
check.

Reviewed-by: Baoquan He 
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Michael S. Tsirkin 
Cc: Mike Kravetz 
Cc: Pankaj Gupta 
Signed-off-by: David Hildenbrand 
---
 mm/page_alloc.c | 22 ++
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e028b87ce2942..042ba09d70c5d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8207,14 +8207,6 @@ struct page *has_unmovable_pages(struct zone *zone, 
struct page *page,
unsigned long iter = 0;
unsigned long pfn = page_to_pfn(page);
 
-   /*
-* TODO we could make this much more efficient by not checking every
-* page in the range if we know all of them are in MOVABLE_ZONE and
-* that the movable zone guarantees that pages are migratable but
-* the later is not the case right now unfortunatelly. E.g. movablecore
-* can still lead to having bootmem allocations in zone_movable.
-*/
-
if (is_migrate_cma_page(page)) {
/*
 * CMA allocations (alloc_contig_range) really need to mark
@@ -8233,6 +8225,12 @@ struct page *has_unmovable_pages(struct zone *zone, 
struct page *page,
 
page = pfn_to_page(pfn + iter);
 
+   /*
+* Both, bootmem allocations and memory holes are marked
+* PG_reserved and are unmovable. We can even have unmovable
+* allocations inside ZONE_MOVABLE, for example when
+* specifying "movablecore".
+*/
if (PageReserved(page))
return page;
 
@@ -8306,14 +8304,6 @@ struct page *has_unmovable_pages(struct zone *zone, 
struct page *page,
 * it.  But now, memory offline itself doesn't call
 * shrink_node_slabs() and it still to be fixed.
 */
-   /*
-* If the page is not RAM, page_count()should be 0.
-* we don't need more check. This is an _used_ not-movable page.
-*
-* The problematic thing here is PG_reserved pages. PG_reserved
-* is set to both of a memory hole page and a _used_ kernel
-* page at boot.
-*/
return page;
}
return NULL;
-- 
2.26.2



[PATCH v2 0/6] mm / virtio-mem: support ZONE_MOVABLE

2020-07-30 Thread David Hildenbrand
@Andrew, @Mst, I suggest the whole series (including the virtio-mem
change) goes via the -mm tree.

Currently, virtio-mem does not really support ZONE_MOVABLE. While it allows
onlining fully plugged memory blocks to ZONE_MOVABLE, it does not allow
onlining partially plugged memory blocks to ZONE_MOVABLE and will never
consider such memory blocks when unplugging memory. This might be
surprising for users (especially if onlining suddenly fails).

Let's support partially plugged memory blocks in ZONE_MOVABLE, allowing
such memory blocks to be onlined to ZONE_MOVABLE and also allowing
unplug from such memory blocks.

This is especially helpful for testing, but also paves the way for
virtio-mem optimizations, allowing more memory to get reliably unplugged.

Clean up has_unmovable_pages() and set_migratetype_isolate(), providing
better documentation of how ZONE_MOVABLE interacts with different kinds of
unmovable pages (memory offlining vs. alloc_contig_range()).

v1 -> v2:
- "mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate()"
-- Move to position 1, add Fixes: tag
-- Drop unused "out:" label
- "mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate()"
-- Keep curly braces on "else" case
- Replace "[PATCH v1 5/6] mm/page_alloc: restrict ZONE_MOVABLE optimization
   in has_unmovable_pages() to memory offlining"
  by "mm: document semantics of ZONE_MOVABLE"
-- Brain dump of what I know about ZONE_MOVABLE :)

David Hildenbrand (6):
  mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate()
  mm/page_alloc: tweak comments in has_unmovable_pages()
  mm/page_isolation: drop WARN_ON_ONCE() in set_migratetype_isolate()
  mm/page_isolation: cleanup set_migratetype_isolate()
  virtio-mem: don't special-case ZONE_MOVABLE
  mm: document semantics of ZONE_MOVABLE

 drivers/virtio/virtio_mem.c | 47 +++--
 include/linux/mmzone.h  | 34 +++
 mm/page_alloc.c | 22 +
 mm/page_isolation.c | 39 ++
 4 files changed, 65 insertions(+), 77 deletions(-)

-- 
2.26.2



[PATCH v2 1/6] mm/page_isolation: don't dump_page(NULL) in set_migratetype_isolate()

2020-07-30 Thread David Hildenbrand
Right now, if we have two isolations racing, we might trigger the
WARN_ON_ONCE() and dump_page(NULL), dereferencing NULL. Let's just
return directly.

In the future, we might want to report -EAGAIN to the caller instead, as
this could indicate a temporary isolation failure only.

Reviewed-by: Baoquan He 
Reviewed-by: Pankaj Gupta 
Acked-by: Mike Kravetz 
Fixes: 4a55c0474a92 ("mm/hotplug: silence a lockdep splat with printk()")
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Michael S. Tsirkin 
Cc: Qian Cai 
Signed-off-by: David Hildenbrand 
---
 mm/page_isolation.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index f6d07c5f0d34d..7d7d263ce7f4b 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -29,10 +29,12 @@ static int set_migratetype_isolate(struct page *page, int 
migratetype, int isol_
/*
 * We assume the caller intended to SET migrate type to isolate.
 * If it is already set, then someone else must have raced and
-* set it before us.  Return -EBUSY
+* set it before us.
 */
-   if (is_migrate_isolate_page(page))
-   goto out;
+   if (is_migrate_isolate_page(page)) {
+   spin_unlock_irqrestore(&zone->lock, flags);
+   return -EBUSY;
+   }
 
/*
 * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
@@ -52,7 +54,6 @@ static int set_migratetype_isolate(struct page *page, int 
migratetype, int isol_
ret = 0;
}
 
-out:
spin_unlock_irqrestore(&zone->lock, flags);
if (!ret) {
drain_all_pages(zone);
-- 
2.26.2



Re: [PATCH] virtio-blk: fix discard buffer overrun

2020-07-30 Thread Jason Wang


On 2020/7/30 4:30 PM, Jeffle Xu wrote:

Before commit eded341c085b ("block: don't decrement nr_phys_segments for
physically contigous segments") was applied, the generic block layer did not
guarantee that @req->nr_phys_segments equals the number of bios in the
request. When limits.@max_discard_segments == 1 and the IO scheduler is
set to a scheduler other than "none" (mq-deadline, kyber or bfq, e.g.),
the request merge routine may be called when enqueuing a DISCARD bio.
When merging two requests, @req->nr_phys_segments is decremented by one if the
middle two bios of the requests can be merged into one physical segment,
leaving @req->nr_phys_segments one less than the number of bios in the
DISCARD request.

In this case, we are at risk of overrunning the virtio_blk_discard_write_zeroes
buffer. Though this issue has been fixed by commit eded341c085b
("block: don't decrement nr_phys_segments for physically contigous segments"),
it could recur if future refactoring of the generic block layer breaks
that guarantee again.

commit 8cb6af7b3a6d ("nvme: Fix discard buffer overrun") fixed a similar
issue in the nvme driver. This patch is also inspired by that commit.

Signed-off-by: Jeffle Xu 
Reviewed-by: Joseph Qi 
---
  drivers/block/virtio_blk.c | 23 ++-
  1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 980df853ee49..248c8f46b51c 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -130,12 +130,19 @@ static int virtblk_setup_discard_write_zeroes(struct 
request *req, bool unmap)
u64 sector = bio->bi_iter.bi_sector;
u32 num_sectors = bio->bi_iter.bi_size >> SECTOR_SHIFT;
  
-		range[n].flags = cpu_to_le32(flags);

-   range[n].num_sectors = cpu_to_le32(num_sectors);
-   range[n].sector = cpu_to_le64(sector);
+   if (n < segments) {
+   range[n].flags = cpu_to_le32(flags);
+   range[n].num_sectors = cpu_to_le32(num_sectors);
+   range[n].sector = cpu_to_le64(sector);
+   }



I'm not familiar with the block layer, but if we start to duplicate checks 
like this, it's a hint that the check should move into the core.




n++;
}
  
+	if (WARN_ON_ONCE(n != segments)) {

+   kfree(range);
+   return -EIO;
+   }
+
req->special_vec.bv_page = virt_to_page(range);
req->special_vec.bv_offset = offset_in_page(range);
req->special_vec.bv_len = sizeof(*range) * segments;
@@ -246,8 +253,14 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx 
*hctx,
  
  	if (type == VIRTIO_BLK_T_DISCARD || type == VIRTIO_BLK_T_WRITE_ZEROES) {

err = virtblk_setup_discard_write_zeroes(req, unmap);
-   if (err)
-   return BLK_STS_RESOURCE;
+   if (err) {
+   switch (err) {
+   case -ENOMEM:
+   return BLK_STS_RESOURCE;
+   default:
+   return BLK_STS_IOERR;
+   }
+   }



This doesn't look elegant; why not a simple if (err == -ENOMEM) ... else if 
(err) ...?


Thanks



}
  
  	num = blk_rq_map_sg(hctx->queue, req, vbr->sg);

