Re: [virtio-dev] Re: guest / host buffer sharing ...
> > > Without buffer sharing support the driver importing a virtio-gpu dma-buf > > > can send the buffer scatter list to the host. So both virtio-gpu and > > > the other device would actually access the same guest pages, but they > > > are not aware that the buffer is shared between devices. > > > > With the uuid approach, how should this case be handled? Should it be > > equivalent to exporting and importing the buffer which was created > > first? Should the spec say it's undefined behavior that might work as > > expected but might not, depending on the device implementation? Does > > the spec even need to say anything about it? > > Using the uuid is an optional optimization. I'd expect the workflow be > roughly this: > > (1) exporting driver exports a dma-buf as usual, additionally attaches > a uuid to it and notifies the host (using device-specific commands). > (2) importing driver will ask the host to use the buffer referenced by > the given uuid. > (3) if (2) fails for some reason use the dma-buf scatter list instead. > > Of course only virtio drivers would try step (2), other drivers (when > sharing buffers between intel gvt device and virtio-gpu for example) > would go straight to (3). For virtio-gpu as it is today, it's not clear to me that they're equivalent. As I read it, the virtio-gpu spec makes a distinction between the guest memory and the host resource. If virtio-gpu is communicating with non-virtio devices, then obviously you'd just be working with guest memory. But if it's communicating with another virtio device, then there are potentially distinct guest and host buffers that could be used. The spec shouldn't leave any room for ambiguity as to how this distinction is handled. > > Not just buffers not backed by guest ram, but things like fences. I > > would suggest the uuids represent 'exported resources' rather than > > 'exported buffers'. > > Hmm, I can't see how this is useful. Care to outline how you envision > this to work in a typical use case? Looking at the spec again, it seems like there's some more work that would need to be done before this would be possible. But the use case I was thinking of would be to export a fence from virtio-gpu and share it with a virtio decoder, to set up a decode pipeline that doesn't need to go back into the guest for synchronization. I'm fine dropping this point for now, though, and revisiting it as a separate proposal. -David - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] [PATCH RFC v4 13/13] virtio-mem: Drop slab objects when unplug continues to fail
Start dropping slab objects after 30 minutes and repeat every 30 minutes in case we can't unplug more memory using alloc_contig_range(). Log messages and make it configurable. Enable dropping slab objects as default (especially, reclaimable slab objects that are not movable). In the future, we might want to implement+use drop_slab_range(), which will also come in handy for other users (e.g., offlining, gigantic huge pages). Cc: Alexander Viro Cc: Andrew Morton Cc: Michal Hocko Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: Oscar Salvador Cc: Igor Mammedov Cc: Dave Young Cc: Dan Williams Cc: Pavel Tatashin Cc: Stefan Hajnoczi Cc: Vlastimil Babka Cc: linux-fsde...@vger.kernel.org Signed-off-by: David Hildenbrand --- drivers/virtio/virtio_mem.c | 30 ++ 1 file changed, 30 insertions(+) diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c index 3a57434f92ed..8f25f7453a08 100644 --- a/drivers/virtio/virtio_mem.c +++ b/drivers/virtio/virtio_mem.c @@ -25,6 +25,11 @@ static bool unplug_online = true; module_param(unplug_online, bool, 0644); MODULE_PARM_DESC(unplug_online, "Try to unplug online memory"); +static bool drop_slab_objects = true; +module_param(drop_slab_objects, bool, 0644); +MODULE_PARM_DESC(drop_slab_objects, +"Drop slab objects when unplug continues to fail"); + enum virtio_mem_mb_state { /* Unplugged, not added to Linux. Can be reused later. */ VIRTIO_MEM_MB_STATE_UNUSED = 0, @@ -1384,6 +1389,7 @@ static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm, static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff) { uint64_t nb_sb = diff / vm->subblock_size; + bool retried = false; unsigned long mb_id; int rc; @@ -1421,6 +1427,7 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff) return 0; } +retry_locked: /* Try to unplug subblocks of partially plugged online blocks. */ virtio_mem_for_each_mb_state(vm, mb_id, VIRTIO_MEM_MB_STATE_ONLINE_PARTIAL) { @@ -1445,6 +1452,29 @@ static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff) } mutex_unlock(>hotplug_mutex); + + /* +* If we can't unplug the requested amount of memory for a long time, +* start freeing up memory in caches. This might harm performance, +* is configurable, and we log a message. Retry imemdiately a second +* time - then wait another VIRTIO_MEM_RETRY_TIMER_MAX_MS. +*/ + if (nb_sb && !retried && drop_slab_objects && + vm->retry_timer_ms == VIRTIO_MEM_RETRY_TIMER_MAX_MS) { + if (vm->nid == NUMA_NO_NODE) { + dev_info(>vdev->dev, "dropping all slab objects\n"); + drop_slab(); + } else { + dev_info(>vdev->dev, +"dropping all slab objects on node=%d\n", +vm->nid); + drop_slab_node(vm->nid); + } + retried = true; + mutex_lock(>hotplug_mutex); + goto retry_locked; + } + return nb_sb ? -EBUSY : 0; out_unlock: mutex_unlock(>hotplug_mutex); -- 2.23.0 - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] [PATCH RFC v4 11/13] mm/vmscan: Move count_vm_event(DROP_SLAB) into drop_slab()
Let's count within the function itself, so every invocation (of future users) will be counted. Cc: Alexander Viro Cc: Andrew Morton Cc: linux-fsde...@vger.kernel.org Signed-off-by: David Hildenbrand --- fs/drop_caches.c | 4 +--- mm/vmscan.c | 1 + 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/fs/drop_caches.c b/fs/drop_caches.c index d31b6c72b476..a042da782fcd 100644 --- a/fs/drop_caches.c +++ b/fs/drop_caches.c @@ -61,10 +61,8 @@ int drop_caches_sysctl_handler(struct ctl_table *table, int write, iterate_supers(drop_pagecache_sb, NULL); count_vm_event(DROP_PAGECACHE); } - if (sysctl_drop_caches & 2) { + if (sysctl_drop_caches & 2) drop_slab(); - count_vm_event(DROP_SLAB); - } if (!stfu) { pr_info("%s (%d): drop_caches: %d\n", current->comm, task_pid_nr(current), diff --git a/mm/vmscan.c b/mm/vmscan.c index 5a6445e86328..c3e53502a84a 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -726,6 +726,7 @@ void drop_slab(void) for_each_online_node(nid) drop_slab_node(nid); + count_vm_event(DROP_SLAB); } static inline int is_page_cache_freeable(struct page *page) -- 2.23.0 - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] [PATCH RFC v4 09/13] virtio-mem: Offline and remove completely unplugged memory blocks
Let's offline+remove memory blocks once all subblocks are unplugged. We can use the new Linux MM interface for that. As no memory is in use anymore, this shouldn't take a long time and shouldn't fail. There might be corner cases where the offlining could still fail (especially, if another notifier NACKs the offlining request). Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: Oscar Salvador Cc: Michal Hocko Cc: Igor Mammedov Cc: Dave Young Cc: Andrew Morton Cc: Dan Williams Cc: Pavel Tatashin Cc: Stefan Hajnoczi Cc: Vlastimil Babka Signed-off-by: David Hildenbrand --- drivers/virtio/virtio_mem.c | 47 + 1 file changed, 43 insertions(+), 4 deletions(-) diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c index a12a0f9c076b..807d4e393427 100644 --- a/drivers/virtio/virtio_mem.c +++ b/drivers/virtio/virtio_mem.c @@ -436,6 +436,28 @@ static int virtio_mem_mb_remove(struct virtio_mem *vm, unsigned long mb_id) return remove_memory(nid, addr, memory_block_size_bytes()); } +/* + * Try to offline and remove a memory block from Linux. + * + * Must not be called with the vm->hotplug_mutex held (possible deadlock with + * onlining code). + * + * Will not modify the state of the memory block. + */ +static int virtio_mem_mb_offline_and_remove(struct virtio_mem *vm, + unsigned long mb_id) +{ + const uint64_t addr = virtio_mem_mb_id_to_phys(mb_id); + int nid = vm->nid; + + if (nid == NUMA_NO_NODE) + nid = memory_add_physaddr_to_nid(addr); + + dev_dbg(>vdev->dev, "offlining and removing memory block: %lu\n", + mb_id); + return offline_and_remove_memory(nid, addr, memory_block_size_bytes()); +} + /* * Trigger the workqueue so the device can perform its magic. */ @@ -529,7 +551,13 @@ static void virtio_mem_notify_offline(struct virtio_mem *vm, break; } - /* trigger the workqueue, maybe we can now unplug memory. */ + /* +* Trigger the workqueue, maybe we can now unplug memory. Also, +* when we offline and remove a memory block, this will re-trigger +* us immediately - which is often nice because the removal of +* the memory block (e.g., memmap) might have freed up memory +* on other memory blocks we manage. +*/ virtio_mem_retry(vm); } @@ -1275,7 +1303,8 @@ static int virtio_mem_mb_unplug_any_sb_offline(struct virtio_mem *vm, * Unplug the desired number of plugged subblocks of an online memory block. * Will skip subblock that are busy. * - * Will modify the state of the memory block. + * Will modify the state of the memory block. Might temporarily drop the + * hotplug_mutex. * * Note: Can fail after some subblocks were successfully unplugged. Can * return 0 even if subblocks were busy and could not get unplugged. @@ -1331,9 +1360,19 @@ static int virtio_mem_mb_unplug_any_sb_online(struct virtio_mem *vm, } /* -* TODO: Once all subblocks of a memory block were unplugged, we want -* to offline the memory block and remove it. +* Once all subblocks of a memory block were unplugged, offline and +* remove it. This will usually not fail, as no memory is in use +* anymore - however some other notifiers might NACK the request. */ + if (virtio_mem_mb_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) { + mutex_unlock(>hotplug_mutex); + rc = virtio_mem_mb_offline_and_remove(vm, mb_id); + mutex_lock(>hotplug_mutex); + if (!rc) + virtio_mem_mb_set_state(vm, mb_id, + VIRTIO_MEM_MB_STATE_UNUSED); + } + return 0; } -- 2.23.0 - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] [PATCH RFC v4 12/13] mm/vmscan: Export drop_slab() and drop_slab_node()
We already have a way to trigger reclaiming of all reclaimable slab objects from user space (echo 2 > /proc/sys/vm/drop_caches). Let's allow drivers to also trigger this when they really want to make progress and know what they are doing. virtio-mem wants to use these functions when it failed to unplug memory for quite some time (e.g., after 30 minutes). It will then try to free up reclaimable objects by dropping the slab caches every now and then (e.g., every 30 minutes) as long as necessary. There will be a way to disable this feature and info messages will be logged. In the future, we want to have a drop_slab_range() functionality instead. Memory offlining code has similar demands and also other alloc_contig_range() users (e.g., gigantic pages) could make good use of this feature. Adding it, however, requires more work/thought. Cc: Andrew Morton Cc: Michal Hocko Signed-off-by: David Hildenbrand --- include/linux/mm.h | 4 ++-- mm/vmscan.c| 2 ++ 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 64799c5cb39f..483300f58be8 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2706,8 +2706,8 @@ int drop_caches_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); #endif -void drop_slab(void); -void drop_slab_node(int nid); +extern void drop_slab(void); +extern void drop_slab_node(int nid); #ifndef CONFIG_MMU #define randomize_va_space 0 diff --git a/mm/vmscan.c b/mm/vmscan.c index c3e53502a84a..4e1cdaaec5e6 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -719,6 +719,7 @@ void drop_slab_node(int nid) } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL); } while (freed > 10); } +EXPORT_SYMBOL(drop_slab_node); void drop_slab(void) { @@ -728,6 +729,7 @@ void drop_slab(void) drop_slab_node(nid); count_vm_event(DROP_SLAB); } +EXPORT_SYMBOL(drop_slab); static inline int is_page_cache_freeable(struct page *page) { -- 2.23.0 - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] [PATCH RFC v4 10/13] virtio-mem: Better retry handling
Let's start with a retry interval of 30 seconds and double the time until we reach 30 minutes, in case we keep getting errors. Reset the retry interval in case we succeeded. Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: Oscar Salvador Cc: Michal Hocko Cc: Igor Mammedov Cc: Dave Young Cc: Andrew Morton Cc: Dan Williams Cc: Pavel Tatashin Cc: Stefan Hajnoczi Cc: Vlastimil Babka Signed-off-by: David Hildenbrand --- drivers/virtio/virtio_mem.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c index 807d4e393427..3a57434f92ed 100644 --- a/drivers/virtio/virtio_mem.c +++ b/drivers/virtio/virtio_mem.c @@ -137,7 +137,9 @@ struct virtio_mem { /* Timer for retrying to plug/unplug memory. */ struct hrtimer retry_timer; -#define VIRTIO_MEM_RETRY_TIMER_MS 3 + unsigned int retry_timer_ms; +#define VIRTIO_MEM_RETRY_TIMER_MIN_MS 3 +#define VIRTIO_MEM_RETRY_TIMER_MAX_MS 180 /* Memory notifier (online/offline events). */ struct notifier_block memory_notifier; @@ -1537,6 +1539,7 @@ static void virtio_mem_run_wq(struct work_struct *work) switch (rc) { case 0: + vm->retry_timer_ms = VIRTIO_MEM_RETRY_TIMER_MIN_MS; break; case -ENOSPC: /* @@ -1552,8 +1555,7 @@ static void virtio_mem_run_wq(struct work_struct *work) */ case -ENOMEM: /* Out of memory, try again later. */ - hrtimer_start(>retry_timer, - ms_to_ktime(VIRTIO_MEM_RETRY_TIMER_MS), + hrtimer_start(>retry_timer, ms_to_ktime(vm->retry_timer_ms), HRTIMER_MODE_REL); break; case -EAGAIN: @@ -1573,6 +1575,9 @@ static enum hrtimer_restart virtio_mem_timer_expired(struct hrtimer *timer) retry_timer); virtio_mem_retry(vm); + /* Racy (with reset in virtio_mem_run_wq), we ignore that for now. */ + vm->retry_timer_ms = min_t(unsigned int, vm->retry_timer_ms * 2, + VIRTIO_MEM_RETRY_TIMER_MAX_MS); return HRTIMER_NORESTART; } @@ -1746,6 +1751,7 @@ static int virtio_mem_probe(struct virtio_device *vdev) spin_lock_init(>removal_lock); hrtimer_init(>retry_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); vm->retry_timer.function = virtio_mem_timer_expired; + vm->retry_timer_ms = VIRTIO_MEM_RETRY_TIMER_MIN_MS; /* register the virtqueue */ rc = virtio_mem_init_vq(vm); -- 2.23.0 - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] [PATCH RFC v4 03/13] virtio-mem: Paravirtualized memory hotunplug part 1
Unplugging subblocks of memory blocks that are offline is easy. All we have to do is watch out for concurrent onlining acticity. Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: Oscar Salvador Cc: Michal Hocko Cc: Igor Mammedov Cc: Dave Young Cc: Andrew Morton Cc: Dan Williams Cc: Pavel Tatashin Cc: Stefan Hajnoczi Cc: Vlastimil Babka Signed-off-by: David Hildenbrand --- drivers/virtio/virtio_mem.c | 109 +++- 1 file changed, 107 insertions(+), 2 deletions(-) diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c index 8f4ceeab3d7c..f1af05def5df 100644 --- a/drivers/virtio/virtio_mem.c +++ b/drivers/virtio/virtio_mem.c @@ -119,7 +119,7 @@ struct virtio_mem { * * When this lock is held the pointers can't change, ONLINE and * OFFLINE blocks can't change the state and no subblocks will get -* plugged. +* plugged/unplugged. */ struct mutex hotplug_mutex; bool hotplug_active; @@ -321,6 +321,19 @@ static bool virtio_mem_mb_test_sb_plugged(struct virtio_mem *vm, bit + count; } +/* + * Test if all selected subblocks are unplugged. + */ +static bool virtio_mem_mb_test_sb_unplugged(struct virtio_mem *vm, + unsigned long mb_id, int sb_id, + int count) +{ + const int bit = (mb_id - vm->first_mb_id) * vm->nb_sb_per_mb + sb_id; + + /* TODO: Helper similar to bitmap_set() */ + return find_next_bit(vm->sb_bitmap, bit + count, bit) >= bit + count; +} + /* * Find the first plugged subblock. Returns vm->nb_sb_per_mb in case there is * none. @@ -511,6 +524,9 @@ static void virtio_mem_notify_offline(struct virtio_mem *vm, BUG(); break; } + + /* trigger the workqueue, maybe we can now unplug memory. */ + virtio_mem_retry(vm); } static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id, @@ -1123,6 +1139,93 @@ static int virtio_mem_plug_request(struct virtio_mem *vm, uint64_t diff) return rc; } +/* + * Unplug the desired number of plugged subblocks of an offline memory block. + * Will fail if any subblock cannot get unplugged (instead of skipping it). + * + * Will modify the state of the memory block. Might temporarily drop the + * hotplug_mutex. + * + * Note: Can fail after some subblocks were successfully unplugged. + */ +static int virtio_mem_mb_unplug_any_sb_offline(struct virtio_mem *vm, + unsigned long mb_id, + uint64_t *nb_sb) +{ + int rc; + + rc = virtio_mem_mb_unplug_any_sb(vm, mb_id, nb_sb); + + /* some subblocks might have been unplugged even on failure */ + if (!virtio_mem_mb_test_sb_plugged(vm, mb_id, 0, vm->nb_sb_per_mb)) + virtio_mem_mb_set_state(vm, mb_id, + VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL); + if (rc) + return rc; + + if (virtio_mem_mb_test_sb_unplugged(vm, mb_id, 0, vm->nb_sb_per_mb)) { + /* +* Remove the block from Linux - this should never fail. +* Hinder the block from getting onlined by marking it +* unplugged. Temporarily drop the mutex, so +* any pending GOING_ONLINE requests can be serviced/rejected. +*/ + virtio_mem_mb_set_state(vm, mb_id, + VIRTIO_MEM_MB_STATE_UNUSED); + + mutex_unlock(>hotplug_mutex); + rc = virtio_mem_mb_remove(vm, mb_id); + BUG_ON(rc); + mutex_lock(>hotplug_mutex); + } + return 0; +} + +/* + * Try to unplug the requested amount of memory. + */ +static int virtio_mem_unplug_request(struct virtio_mem *vm, uint64_t diff) +{ + uint64_t nb_sb = diff / vm->subblock_size; + unsigned long mb_id; + int rc; + + if (!nb_sb) + return 0; + + /* +* We'll drop the mutex a couple of times when it is safe to do so. +* This might result in some blocks switching the state (online/offline) +* and we could miss them in this run - we will retry again later. +*/ + mutex_lock(>hotplug_mutex); + + /* Try to unplug subblocks of partially plugged offline blocks. */ + virtio_mem_for_each_mb_state(vm, mb_id, +VIRTIO_MEM_MB_STATE_OFFLINE_PARTIAL) { + rc = virtio_mem_mb_unplug_any_sb_offline(vm, mb_id, +_sb); + if (rc || !nb_sb) + goto out_unlock; + cond_resched(); + } + + /* Try to unplug subblocks of plugged offline blocks. */ + virtio_mem_for_each_mb_state(vm, mb_id, VIRTIO_MEM_MB_STATE_OFFLINE) { +
[virtio-dev] [PATCH RFC v4 02/13] virtio-mem: Paravirtualized memory hotplug
Each virtio-mem device owns exactly one memory region. It is responsible for adding/removing memory from that memory region on request. When the device driver starts up, the requested amount of memory is queried and then plugged to Linux. On request, further memory can be plugged or unplugged. This patch only implements the plugging part. On x86-64, memory can currently be plugged in 4MB ("subblock") granularity. When required, a new memory block will be added (e.g., usually 128MB on x86-64) in order to plug more subblocks. Only x86-64 was tested for now. The online_page callback is used to keep unplugged subblocks offline when onlining memory - similar to the Hyper-V balloon driver. Unplugged pages are marked PG_offline, to tell dump tools (e.g., makedumpfile) to skip them. User space is usually responsible for onlining the added memory. The memory hotplug notifier is used to synchronize virtio-mem activity against memory onlining/offlining. Each virtio-mem device can belong to a NUMA node, which allows us to easily add/remove small chunks of memory to/from a specific NUMA node by using multiple virtio-mem devices. Something that works even when the guest has no idea about the NUMA topology. One way to view virtio-mem is as a "resizable DIMM" or a DIMM with many "sub-DIMMS". This patch directly introduces the basic infrastructure to implement memory unplug. Especially the memory block states and subblock bitmaps will be heavily used there. Notes: - Memory blocks that are partally unplugged must not be onlined to the MOVABLE_ZONE. They contain unmovable parts and might e.g., result in warnings when trying to offline them. - In the kdump kernel, we don't want to load the driver, to not mess with memory to be dumped. - In case memory is to be onlined by user space, we limit the amount of offline memory blocks, to not run out of memory. - Suspend/Hibernate is not supported due to the way virtio-mem devices behave. Limited support might be possible in the future. - Reloading the device driver is not supported. - When unloading the driver, we have to remove partially plugged offline memory blocks. Otherwise they could get fully onlined later on, when we no longer have control over via the online_page callback. - As the hypervisor might suddenly be busy (during reboot, in-between requests, when adding of memory fails), we have to take care of some corner cases - especially virtio_mem_unplug_pending_mb() and virtio_mem_send_unplug_all_request(). The hypervisor could for example be busy if it is currently migrating the guest. Future work: - Reduce the amount of memory resources if that turnes out to be an issue. Or try to speed up relevant code paths to deal with many resources. - Allocate the vmemmap from the added memory. Makes hotplug more likely to succeed, the vmemmap is stored on the same NUMA node and that unmovable memory will later not hinder unplug. Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: Oscar Salvador Cc: Michal Hocko Cc: Igor Mammedov Cc: Dave Young Cc: Andrew Morton Cc: Dan Williams Cc: Pavel Tatashin Cc: Stefan Hajnoczi Cc: Vlastimil Babka Cc: "Rafael J. Wysocki" Cc: Len Brown Cc: linux-a...@vger.kernel.org Signed-off-by: David Hildenbrand --- drivers/virtio/Kconfig | 17 + drivers/virtio/Makefile |1 + drivers/virtio/virtio_mem.c | 1569 +++ include/uapi/linux/virtio_ids.h |1 + include/uapi/linux/virtio_mem.h | 204 5 files changed, 1792 insertions(+) create mode 100644 drivers/virtio/virtio_mem.c create mode 100644 include/uapi/linux/virtio_mem.h diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig index 078615cf2afc..294720d53057 100644 --- a/drivers/virtio/Kconfig +++ b/drivers/virtio/Kconfig @@ -64,6 +64,23 @@ config VIRTIO_BALLOON If unsure, say M. +config VIRTIO_MEM + tristate "Virtio mem driver" + default m + depends on X86_64 + depends on VIRTIO + depends on MEMORY_HOTPLUG_SPARSE + depends on MEMORY_HOTREMOVE + help +This driver provides access to virtio-mem paravirtualized memory +devices, allowing to hotplug and hotunplug memory. + +This driver is was only tested under x86-64, but should +theoretically work on all architectures that support memory +hotplug and hotremove. + +If unsure, say M. + config VIRTIO_INPUT tristate "Virtio input driver" depends on VIRTIO diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile index 3a2b5c5dcf46..906d5a00ac85 100644 --- a/drivers/virtio/Makefile +++ b/drivers/virtio/Makefile @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o +obj-$(CONFIG_VIRTIO_MEM) += virtio_mem.o diff --git a/drivers/virtio/virtio_mem.c
[virtio-dev] [PATCH RFC v4 07/13] virtio-mem: Allow to offline partially unplugged memory blocks
Dropping the reference count of PageOffline() pages allows offlining code to skip them. However, we also have to convert PG_reserved to another flag - let's use PG_dirty - so has_unmovable_pages() will properly handle them. PG_reserved pages get detected as unmovable right away. We need the flag to see if we are onlining pages the first time, or if we allocated them via alloc_contig_range(). Properly take care of offlining code also modifying the stats and special handling in case the driver gets unloaded. Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: Oscar Salvador Cc: Michal Hocko Cc: Igor Mammedov Cc: Dave Young Cc: Andrew Morton Cc: Dan Williams Cc: Pavel Tatashin Cc: Stefan Hajnoczi Cc: Vlastimil Babka Signed-off-by: David Hildenbrand --- drivers/virtio/virtio_mem.c | 64 - 1 file changed, 63 insertions(+), 1 deletion(-) diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c index 5a142a371222..a12a0f9c076b 100644 --- a/drivers/virtio/virtio_mem.c +++ b/drivers/virtio/virtio_mem.c @@ -564,6 +564,53 @@ static void virtio_mem_notify_online(struct virtio_mem *vm, unsigned long mb_id, virtio_mem_retry(vm); } +static void virtio_mem_notify_going_offline(struct virtio_mem *vm, + unsigned long mb_id) +{ + const unsigned long nr_pages = PFN_DOWN(vm->subblock_size); + unsigned long pfn; + int sb_id, i; + + for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) { + if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1)) + continue; + /* +* Drop our reference to the pages so the memory can get +* offlined and add the unplugged pages to the managed +* page counters (so offlining code can correctly subtract +* them again). +*/ + pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) + + sb_id * vm->subblock_size); + adjust_managed_page_count(pfn_to_page(pfn), nr_pages); + for (i = 0; i < nr_pages; i++) + page_ref_dec(pfn_to_page(pfn + i)); + } +} + +static void virtio_mem_notify_cancel_offline(struct virtio_mem *vm, +unsigned long mb_id) +{ + const unsigned long nr_pages = PFN_DOWN(vm->subblock_size); + unsigned long pfn; + int sb_id, i; + + for (sb_id = 0; sb_id < vm->nb_sb_per_mb; sb_id++) { + if (virtio_mem_mb_test_sb_plugged(vm, mb_id, sb_id, 1)) + continue; + /* +* Get the reference we dropped when going offline and +* subtract the unplugged pages from the managed page +* counters. +*/ + pfn = PFN_DOWN(virtio_mem_mb_id_to_phys(mb_id) + + sb_id * vm->subblock_size); + adjust_managed_page_count(pfn_to_page(pfn), -nr_pages); + for (i = 0; i < nr_pages; i++) + page_ref_inc(pfn_to_page(pfn + i)); + } +} + /* * This callback will either be called synchonously from add_memory() or * asynchronously (e.g., triggered via user space). We have to be careful @@ -611,6 +658,7 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb, break; mutex_lock(>hotplug_mutex); vm->hotplug_active = true; + virtio_mem_notify_going_offline(vm, mb_id); break; case MEM_GOING_ONLINE: spin_lock_irq(>removal_lock); @@ -636,6 +684,12 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb, mutex_unlock(>hotplug_mutex); break; case MEM_CANCEL_OFFLINE: + if (!vm->hotplug_active) + break; + virtio_mem_notify_cancel_offline(vm, mb_id); + vm->hotplug_active = false; + mutex_unlock(>hotplug_mutex); + break; case MEM_CANCEL_ONLINE: if (!vm->hotplug_active) break; @@ -660,8 +714,11 @@ static void virtio_mem_set_fake_offline(unsigned long pfn, struct page *page = pfn_to_page(pfn); __SetPageOffline(page); - if (!onlined) + if (!onlined) { SetPageDirty(page); + /* FIXME: remove after cleanups */ + ClearPageReserved(page); + } } } @@ -1719,6 +1776,11 @@ static void virtio_mem_remove(struct virtio_device *vdev) rc = virtio_mem_mb_remove(vm, mb_id); BUG_ON(rc); } + /* +* After we unregistered our callbacks, user space can no longer +* offline partially plugged online memory blocks. No
[virtio-dev] [PATCH RFC v4 06/13] mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE
virtio-mem wants to allow to offline memory blocks of which some parts were unplugged (allocated via alloc_contig_range()), especially, to later offline and remove completely unplugged memory blocks. The important part is that PageOffline() has to remain set until the section is offline, so these pages will never get accessed (e.g., when dumping). The pages should not be handed back to the buddy (which would require clearing PageOffline() and result in issues if offlining fails and the pages are suddenly in the buddy). Let's allow to do that by allowing to isolate any PageOffline() page when offlining. This way, we can reach the memory hotplug notifier MEM_GOING_OFFLINE, where the driver can signal that he is fine with offlining this page by dropping its reference count. PageOffline() pages with a reference count of 0 can then be skipped when offlining the pages (like if they were free, however they are not in the buddy). Anybody who uses PageOffline() pages and does not agree to offline them (e.g., Hyper-V balloon, XEN balloon, VMWare balloon for 2MB pages) will not decrement the reference count and make offlining fail when trying to migrate such an unmovable page. So there should be no observerable change. Same applies to balloon compaction users (movable PageOffline() pages), the pages will simply be migrated. Note 1: If offlining fails, a driver has to increment the reference count again in MEM_CANCEL_OFFLINE. Note 2: A driver that makes use of this has to be aware that re-onlining the memory block has to be handled by hooking into onlining code (online_page_callback_t), resetting the page PageOffline() and not giving them to the buddy. Cc: Andrew Morton Cc: Juergen Gross Cc: Konrad Rzeszutek Wilk Cc: Pavel Tatashin Cc: Alexander Duyck Cc: Vlastimil Babka Cc: Johannes Weiner Cc: Anthony Yznaga Cc: Michal Hocko Cc: Oscar Salvador Cc: Mel Gorman Cc: Mike Rapoport Cc: Dan Williams Cc: Anshuman Khandual Cc: Qian Cai Cc: Pingfan Liu Signed-off-by: David Hildenbrand --- include/linux/page-flags.h | 10 ++ mm/memory_hotplug.c| 41 -- mm/page_alloc.c| 24 ++ mm/page_isolation.c| 9 + 4 files changed, 74 insertions(+), 10 deletions(-) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 1bf83c8fcaa7..ac1775082343 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -761,6 +761,16 @@ PAGE_TYPE_OPS(Buddy, buddy) * not onlined when onlining the section). * The content of these pages is effectively stale. Such pages should not * be touched (read/write/dump/save) except by their owner. + * + * If a driver wants to allow to offline unmovable PageOffline() pages without + * putting them back to the buddy, it can do so via the memory notifier by + * decrementing the reference count in MEM_GOING_OFFLINE and incrementing the + * reference count in MEM_CANCEL_OFFLINE. When offlining, the PageOffline() + * pages (now with a reference count of zero) are treated like free pages, + * allowing the containing memory block to get offlined. A driver that + * relies on this feature is aware that re-onlining the memory block will + * require to re-set the pages PageOffline() and not giving them to the + * buddy via online_page_callback_t. */ PAGE_TYPE_OPS(Offline, offline) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index fc617ad6f035..da01453a04e6 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1231,11 +1231,15 @@ int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn, /* * Scan pfn range [start,end) to find movable/migratable pages (LRU pages, - * non-lru movable pages and hugepages). We scan pfn because it's much - * easier than scanning over linked list. This function returns the pfn - * of the first found movable page if it's found, otherwise 0. + * non-lru movable pages and hugepages). + * + * Returns: + * 0 in case a movable page is found and movable_pfn was updated. + * -ENOENT in case no movable page was found. + * -EBUSY in case a definetly unmovable page was found. */ -static unsigned long scan_movable_pages(unsigned long start, unsigned long end) +static int scan_movable_pages(unsigned long start, unsigned long end, + unsigned long *movable_pfn) { unsigned long pfn; @@ -1247,18 +1251,29 @@ static unsigned long scan_movable_pages(unsigned long start, unsigned long end) continue; page = pfn_to_page(pfn); if (PageLRU(page)) - return pfn; + goto found; if (__PageMovable(page)) - return pfn; + goto found; + + /* +* Unmovable PageOffline() pages where somebody still holds +* a reference count (after
[virtio-dev] [PATCH RFC v4 04/13] mm: Export alloc_contig_range() / free_contig_range()
A virtio-mem device wants to allocate memory from the memory region it manages in order to unplug it in the hypervisor - similar to a balloon driver. Also, it might want to plug previously unplugged (allocated) memory and give it back to Linux. alloc_contig_range() / free_contig_range() seem to be the perfect interface for this task. In contrast to existing balloon devices, a virtio-mem device operates on bigger chunks (e.g., 4MB) and only on physical memory it manages. It tracks which chunks (subblocks) are still plugged, so it can go ahead and try to alloc_contig_range()+unplug them on unplug request, or plug+free_contig_range() unplugged chunks on plug requests. A virtio-mem device will use alloc_contig_range() / free_contig_range() only on ranges that belong to the same node/zone in at least MAX(MAX_ORDER - 1, pageblock_order) order granularity - e.g., 4MB on x86-64. The virtio-mem device added that memory, so the memory exists and does not contain any holes. virtio-mem will only try to allocate on ZONE_NORMAL, never on ZONE_MOVABLE, just like when allocating gigantic pages (we don't put unmovable data into the movable zone). Cc: Andrew Morton Cc: Michal Hocko Cc: Vlastimil Babka Cc: Oscar Salvador Cc: Mel Gorman Cc: Mike Rapoport Cc: Dan Williams Cc: Alexander Duyck Cc: Pavel Tatashin Cc: Alexander Potapenko Acked-by: Michal Hocko # to export contig range allocator API Signed-off-by: David Hildenbrand --- mm/page_alloc.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 62dcd6b76c80..5334decc9e06 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -8494,6 +8494,7 @@ int alloc_contig_range(unsigned long start, unsigned long end, pfn_max_align_up(end), migratetype); return ret; } +EXPORT_SYMBOL(alloc_contig_range); static int __alloc_contig_pages(unsigned long start_pfn, unsigned long nr_pages, gfp_t gfp_mask) @@ -8609,6 +8610,7 @@ void free_contig_range(unsigned long pfn, unsigned int nr_pages) } WARN(count != 0, "%d pages are still in use!\n", count); } +EXPORT_SYMBOL(free_contig_range); /* * The zone indicated has a new number of managed_pages; batch sizes and percpu -- 2.23.0 - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] [PATCH RFC v4 05/13] virtio-mem: Paravirtualized memory hotunplug part 2
We can use alloc_contig_range() to try to unplug subblocks. Unplugged blocks will be marked PG_offline, however, don't have the PG_reserved flag set. This way, we can differentiate these allocated subblocks from subblocks that were never onlined and handle them properly in virtio_mem_fake_online(). free_contig_range() is used to hand back subblocks to Linux. It is worth noting that there are no guarantess on how much memory can actually get unplugged again. All device memory might completely be fragmented with unmovable data, such that no subblock can get unplugged. We might want to improve the unplugging capability in the future. We are not touching the ZONE_MOVABLE. If memory is onlined to the ZONE_MOVABLE, it can only get unplugged after that memory was offlined manually by user space. In normal operation, virtio-mem memory is suggested to be onlined to ZONE_NORMAL. In the future, we will try to make unplug more likely to succeed. Add a module parameter to control if online memory shall be touched. Future work: - Performance improvements: -- Sense (lockless) if it make sense to try alloc_contig_range() at all before directly trying to isolate and taking locks. -- Try to unplug bigger chunks if possible first. -- Identify free areas first, that don't have to be evacuated. - Make unplug more likely to succeed: -- There are various idea to limit fragmentation on memory block granularity. (e.g., ZONE_PREFER_MOVABLE and smart balancing) -- Allocate memmap from added memory. This way, less unmovable data can end up on the memory blocks. - OOM handling, e.g., via an OOM handler. - Defragmentation -- Will require a new virtio-mem CMD to exchange plugged<->unplugged blocks Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: Oscar Salvador Cc: Michal Hocko Cc: Igor Mammedov Cc: Dave Young Cc: Andrew Morton Cc: Dan Williams Cc: Pavel Tatashin Cc: Stefan Hajnoczi Cc: Vlastimil Babka Signed-off-by: David Hildenbrand --- drivers/virtio/Kconfig | 1 + drivers/virtio/virtio_mem.c | 156 2 files changed, 143 insertions(+), 14 deletions(-) diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig index 294720d53057..75a760f32ec7 100644 --- a/drivers/virtio/Kconfig +++ b/drivers/virtio/Kconfig @@ -71,6 +71,7 @@ config VIRTIO_MEM depends on VIRTIO depends on MEMORY_HOTPLUG_SPARSE depends on MEMORY_HOTREMOVE + select CONTIG_ALLOC help This driver provides access to virtio-mem paravirtualized memory devices, allowing to hotplug and hotunplug memory. diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c index f1af05def5df..5a142a371222 100644 --- a/drivers/virtio/virtio_mem.c +++ b/drivers/virtio/virtio_mem.c @@ -21,6 +21,10 @@ #include +static bool unplug_online = true; +module_param(unplug_online, bool, 0644); +MODULE_PARM_DESC(unplug_online, "Try to unplug online memory"); + enum virtio_mem_mb_state { /* Unplugged, not added to Linux. Can be reused later. */ VIRTIO_MEM_MB_STATE_UNUSED = 0, @@ -646,23 +650,35 @@ static int virtio_mem_memory_notifier_cb(struct notifier_block *nb, } /* - * Set a range of pages PG_offline. + * Set a range of pages PG_offline. Remember pages that were never onlined + * (via generic_online_page()) using PageDirty(). */ static void virtio_mem_set_fake_offline(unsigned long pfn, - unsigned int nr_pages) + unsigned int nr_pages, bool onlined) { - for (; nr_pages--; pfn++) - __SetPageOffline(pfn_to_page(pfn)); + for (; nr_pages--; pfn++) { + struct page *page = pfn_to_page(pfn); + + __SetPageOffline(page); + if (!onlined) + SetPageDirty(page); + } } /* - * Clear PG_offline from a range of pages. + * Clear PG_offline from a range of pages. If the pages were never onlined, + * (via generic_online_page()), clear PageDirty(). */ static void virtio_mem_clear_fake_offline(unsigned long pfn, - unsigned int nr_pages) + unsigned int nr_pages, bool onlined) { - for (; nr_pages--; pfn++) - __ClearPageOffline(pfn_to_page(pfn)); + for (; nr_pages--; pfn++) { + struct page *page = pfn_to_page(pfn); + + __ClearPageOffline(page); + if (!onlined) + ClearPageDirty(page); + } } /* @@ -681,10 +697,26 @@ static void virtio_mem_fake_online(unsigned long pfn, unsigned int nr_pages) BUG_ON(!IS_ALIGNED(pfn, 1 << order)); BUG_ON(!IS_ALIGNED(nr_pages, 1 << order)); - virtio_mem_clear_fake_offline(pfn, nr_pages); + for (i = 0; i < nr_pages; i += 1 << order) { + struct page *page = pfn_to_page(pfn + i); - for (i = 0; i < nr_pages; i += 1 << order) -
[virtio-dev] [PATCH RFC v4 01/13] ACPI: NUMA: export pxm_to_node
Will be needed by virtio-mem to identify the node from a pxm. Cc: "Rafael J. Wysocki" Cc: Len Brown Cc: linux-a...@vger.kernel.org Signed-off-by: David Hildenbrand --- drivers/acpi/numa/srat.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c index eadbf90e65d1..d5847fa7ac69 100644 --- a/drivers/acpi/numa/srat.c +++ b/drivers/acpi/numa/srat.c @@ -35,6 +35,7 @@ int pxm_to_node(int pxm) return NUMA_NO_NODE; return pxm_to_node_map[pxm]; } +EXPORT_SYMBOL(pxm_to_node); int node_to_pxm(int node) { -- 2.23.0 - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
[virtio-dev] [PATCH RFC v4 00/13] virtio-mem: paravirtualized memory
This series is based on latest linux-next. The patches are located at: https://github.com/davidhildenbrand/linux.git virtio-mem-rfc-v4 The basic idea of virtio-mem is to provide a flexible, cross-architecture memory hot(un)plug solution that avoids many limitations imposed by existing technologies, architectures, and interfaces. More details can be found below and in linked material. This RFC is limited to x86-64, however, should theoretically work on any architecture that supports virtio and implements memory hot(un)plug under Linux - like s390x, powerpc64 and arm64. On x86-64, it is currently possible to add/remove memory to the system in >= 4MB granularity. Memory hotplug works very reliably. For memory unplug, there are no guarantees how much memory can actually get unplugged, it depends on the setup (especially: fragmentation of (unmovable) memory). I have plans to improve that in the future. -- 1. virtio-mem -- The basic idea behind virtio-mem was presented at KVM Forum 2018. The slides can be found at [1]. The previous RFC can be found at [2]. The first RFC can be found at [3]. However, the concept evolved over time. The KVM Forum slides roughly match the current design. Patch #2 ("virtio-mem: Paravirtualized memory hotplug") contains quite some information, especially in "include/uapi/linux/virtio_mem.h": Each virtio-mem device manages a dedicated region in physical address space. Each device can belong to a single NUMA node, multiple devices for a single NUMA node are possible. A virtio-mem device is like a "resizable DIMM" consisting of small memory blocks that can be plugged or unplugged. The device driver is responsible for (un)plugging memory blocks on demand. Virtio-mem devices can only operate on their assigned memory region in order to (un)plug memory. A device cannot (un)plug memory belonging to other devices. The "region_size" corresponds to the maximum amount of memory that can be provided by a device. The "size" corresponds to the amount of memory that is currently plugged. "requested_size" corresponds to a request from the device to the device driver to (un)plug blocks. The device driver should try to (un)plug blocks in order to reach the "requested_size". It is impossible to plug more memory than requested. The "usable_region_size" represents the memory region that can actually be used to (un)plug memory. It is always at least as big as the "requested_size" and will grow dynamically. It will only shrink when explicitly triggered (VIRTIO_MEM_REQ_UNPLUG). Memory in the usable region can usually be read, however, there are no guarantees. It can happen that the device cannot process a request, because it is busy. The device driver has to retry later. Usually, during system resets all memory will get unplugged, so the device driver can start with a clean state. However, in specific scenarios (if the device is busy) it can happen that the device still has memory plugged. The device driver can request to unplug all memory (VIRTIO_MEM_REQ_UNPLUG) - which might take a while to succeed if the device is busy. -- 2. Linux Implementation -- This RFC reuses quite some existing MM infrastructure, however, has to expose some additional functionality. Memory blocks (e.g., 128MB) are added/removed on demand. Within these memory blocks, subblocks (e.g., 4MB) are plugged/unplugged. The sizes depend on the target architecture, MAX_ORDER + pageblock_order, and the block size of a virtio-mem device. add_memory()/try_remove_memory() is used to add/remove memory blocks. virtio-mem will not online memory blocks itself. This has to be done by user space, or configured into the kernel (CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE). virtio-mem will only unplug memory that was online to the ZONE_NORMAL. Memory is suggested to be onlined to the ZONE_NORMAL for now. The memory hotplug notifier is used to properly synchronize against onlining/offlining of memory blocks and to track the states of memory blocks (including the zone memory blocks are onlined to). The set_online_page() callback is used to keep unplugged subblocks of a memory block fake-offline when onlining the memory block. generic_online_page() is used to fake-online plugged subblocks. This handling is similar to the Hyper-V balloon driver. PG_offline is used to mark unplugged subblocks as offline, so e.g., dumping tools (makedumpfile) will skip these pages. This is similar to other balloon drivers like virtio-balloon and Hyper-V. Memory offlining code is extended to allow drivers to drop their reference to PG_offline pages when MEM_GOING_OFFLINE, so these pages can be skipped when offlining memory
Re: [virtio-dev] Re: guest / host buffer sharing ...
On Thu, Dec 12, 2019 at 09:26:32PM +0900, David Stevens wrote: > > > > Second I think it is a bad idea > > > > from the security point of view. When explicitly exporting buffers it > > > > is easy to restrict access to the actual exports. > > > > > > Restricting access to actual exports could perhaps help catch bugs. > > > However, I don't think it provides any security guarantees, since the > > > guest can always just export every buffer before using it. > > > > Probably not on the guest/host boundary. > > > > It's important for security inside the guest though. You don't want > > process A being able to access process B private resources via buffer > > sharing support, by guessing implicit buffer identifiers. > > At least for the linux guest implementation, I wouldn't think the > uuids would be exposed from the kernel. To me, it seems like something > that should be handled internally by the virtio drivers. That would be one possible use case, yes. The exporting driver attaches a uuid to the dma-buf. The importing driver can see the attached uuid and use it (if supported, otherwise run with the scatter list). That will be transparent to userspace, apps will just export/import dma-bufs as usual and not even notice the uuid. I can see other valid use cases though: A wayland proxy could use virtio-gpu buffer exports for shared memory and send the buffer uuid to the host over some stream protocol (vsock, tcp, ...). For that to work we have to export the uuid to userspace, for example using a ioctl on the dma-buf file handle. > If you use some other guest with untrusted > userspace drivers, or if you're pulling the uuids out of the kernel to > give to some non-virtio transport, then I can see it being a concern. I strongly prefer a design where we don't have to worry about that concern in the first place instead of discussing whenever we should be worried or not. > > Without buffer sharing support the driver importing a virtio-gpu dma-buf > > can send the buffer scatter list to the host. So both virtio-gpu and > > the other device would actually access the same guest pages, but they > > are not aware that the buffer is shared between devices. > > With the uuid approach, how should this case be handled? Should it be > equivalent to exporting and importing the buffer which was created > first? Should the spec say it's undefined behavior that might work as > expected but might not, depending on the device implementation? Does > the spec even need to say anything about it? Using the uuid is an optional optimization. I'd expect the workflow be roughly this: (1) exporting driver exports a dma-buf as usual, additionally attaches a uuid to it and notifies the host (using device-specific commands). (2) importing driver will ask the host to use the buffer referenced by the given uuid. (3) if (2) fails for some reason use the dma-buf scatter list instead. Of course only virtio drivers would try step (2), other drivers (when sharing buffers between intel gvt device and virtio-gpu for example) would go straight to (3). > > With buffer sharing virtio-gpu would attach a uuid to the dma-buf, and > > the importing driver can send the uuid (instead of the scatter list) to > > the host. So the device can simply lookup the buffer on the host side > > and use it directly. Another advantage is that this enables some more > > use cases like sharing buffers between devices which are not backed by > > guest ram. > > Not just buffers not backed by guest ram, but things like fences. I > would suggest the uuids represent 'exported resources' rather than > 'exported buffers'. Hmm, I can't see how this is useful. Care to outline how you envision this to work in a typical use case? cheers, Gerd - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: guest / host buffer sharing ...
> > > Second I think it is a bad idea > > > from the security point of view. When explicitly exporting buffers it > > > is easy to restrict access to the actual exports. > > > > Restricting access to actual exports could perhaps help catch bugs. > > However, I don't think it provides any security guarantees, since the > > guest can always just export every buffer before using it. > > Probably not on the guest/host boundary. > > It's important for security inside the guest though. You don't want > process A being able to access process B private resources via buffer > sharing support, by guessing implicit buffer identifiers. At least for the linux guest implementation, I wouldn't think the uuids would be exposed from the kernel. To me, it seems like something that should be handled internally by the virtio drivers. Especially since the 'export' process would be very much a virtio-specific action, so it's likely that it wouldn't fit nicely into existing userspace software. If you use some other guest with untrusted userspace drivers, or if you're pulling the uuids out of the kernel to give to some non-virtio transport, then I can see it being a concern. > > > Instead of using a dedicated buffer sharing device we can also use > > > virtio-gpu (or any other driver which supports dma-buf exports) to > > > manage buffers. Ah, okay. I misunderstood the original statement. I read the sentence as 'we can use virtio-gpu in place of the dedicated buffer sharing device', rather than 'every device can manage its own buffers'. I can agree with the second meaning. > Without buffer sharing support the driver importing a virtio-gpu dma-buf > can send the buffer scatter list to the host. So both virtio-gpu and > the other device would actually access the same guest pages, but they > are not aware that the buffer is shared between devices. With the uuid approach, how should this case be handled? Should it be equivalent to exporting and importing the buffer which was created first? Should the spec say it's undefined behavior that might work as expected but might not, depending on the device implementation? Does the spec even need to say anything about it? > With buffer sharing virtio-gpu would attach a uuid to the dma-buf, and > the importing driver can send the uuid (instead of the scatter list) to > the host. So the device can simply lookup the buffer on the host side > and use it directly. Another advantage is that this enables some more > use cases like sharing buffers between devices which are not backed by > guest ram. Not just buffers not backed by guest ram, but things like fences. I would suggest the uuids represent 'exported resources' rather than 'exported buffers'. > Well, security-wise you want have buffer identifiers which can't be > easily guessed. And guessing uuid is pretty much impossible due to > the namespace being huge. I guess this depends on what you're passing around within the guest. If you're passing around the raw uuids, sure. But I would argue it's better to pass around unforgeable identifiers (e.g. fds), and to restrict the uuids to when talking directly to the virtio transport. But I guess there are likely situations where that's not possible. -David - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] [RFC RESEND] virtio-video: Add virtio video device specification
Hi Keiichi, Thank you for your comment. What do you thing about selection/crop rectangles? Should this be defined as a must have? Or we just assume the device always sticks to the stream resolution. Regards, Dmitry. On Donnerstag, 12. Dezember 2019 06:39:11 CET Keiichi Watanabe wrote: > On Tue, Dec 10, 2019 at 10:16 PM Dmitry Sepp > > wrote: > > Hi, > > > > Just to start, let's consider this v4l2 control: > > V4L2_CID_MPEG_VIDEO_FRAME_RC_ENABLE. > > As I can see, this control is referenced as a mandatory one in the > > Chromium > > sources [1]. > > > > So could someone from the Chromium team please explain why it is > > mandatory? > > (YouTube?) In fact, almost no encoders implement this control. Do we need > > it in virtio-video? > > That control is used to encode videos in constant bitrate (CBR) mode, > which is critical for > real-time use case like video conferencing. > > However, that Chromium source code just requires *host-side* drivers > to have the v4l2 > control. Also, I guess it's rare that a guest app wants to use CQP > instead of CBR from our > experience. > So, I suppose we can omit this control in virtio spec for now by > assuming that guests > always use CBR mode. > > Best regards, > Keiichi > > > [1] > > https://chromium.googlesource.com/chromium/src/media/+/refs/heads/master/ > > gpu/v4l2/v4l2_video_encode_accelerator.cc#1500 > > > > Regards, > > Dmitry. > > > > On Montag, 9. Dezember 2019 22:12:28 CET Enrico Granata wrote: > > > +Changyeon Jo for his awareness > > > > > > Thanks, > > > - Enrico > > > > > > > > > On Mon, Dec 9, 2019 at 6:20 AM Dmitry Sepp > > > > > > wrote: > > > > Hello, > > > > > > > > I'd like to invite everyone to share ideas regarding required encoder > > > > features > > > > in this separate sub-tree. > > > > > > > > In general, encoder devices are more complex compared to decoders. So > > > > the > > > > question I'd like to rise is in what way we define the minimal subset > > > > of > > > > features to be implemented by the virtio-video. > > > > > > > > We may look at the following to define the set of features: > > > > 1. USB video, 2.3.6 Encoding Unit [1]. > > > > 2. Android 10 Compatibility Definition [2]. > > > > > > > > Would be nice to hear about any specific requirements from the > > > > Chromium > > > > team as > > > > well. > > > > > > > > [1] https://www.usb.org/sites/default/files/USB_Video_Class_1_5.zip > > > > [2] > > > > https://source.android.com/compatibility/android-cdd#5_2_video_encodin > > > > g > > > > > > > > Thank you. > > > > > > > > Best regards, > > > > Dmitry. > > > > > > > > On Mittwoch, 4. Dezember 2019 10:16:20 CET Gerd Hoffmann wrote: > > > > > Hi, > > > > > > > > > > > 1. Focus on only decoder/encoder functionalities first. > > > > > > > > > > > > As Tomasz said earlier in this thread, it'd be too complicated to > > > > > > > > support > > > > > > > > > > camera usage at the same time. So, I'd suggest to make it just a > > > > > > > > generic > > > > > > > > > > mem-to-mem video processing device protocol for now. > > > > > > If we finally decide to support camera in this protocol, we can > > > > > > add it > > > > > > later. > > > > > > > > > > Agree. > > > > > > > > > > > 2. Only one feature bit can be specified for one device. > > > > > > > > > > > > I'd like to have a decoder device and encoder device separately. > > > > > > It'd be natural to assume it because a decoder and an encoder are > > > > > > > > provided > > > > > > > > > > as different hardware. > > > > > > > > > > Hmm, modern GPUs support both encoding and decoding ... > > > > > > > > > > I don't think we should bake that restriction into the > > > > > specification. > > > > > It probably makes sense to use one virtqueue per function though, > > > > > that > > > > > will simplify dispatching in both host and guest. > > > > > > > > > > > 3. Separate buffer allocation functionalities from virtio-video > > > > > > > > protocol. > > > > > > > > > > To support various ways of guest/host buffer sharing, we might > > > > > > want to > > > > > > have a dedicated buffer sharing device as we're discussing in > > > > > > another > > > > > > thread. Until we reach consensus there, it'd be good not to have > > > > > > buffer > > > > > > allocation > > > > > > functionalities in virtio-video. > > > > > > > > > > I think virtio-video should be able to work as stand-alone device, > > > > > so we need some way to allocate buffers ... > > > > > > > > > > Buffer sharing with other devices can be added later. > > > > > > > > > > > > +The virtio video device is a virtual video streaming device > > > > > > > that > > > > > > > supports the +following functions: encoder, decoder, capture, > > > > > > > output. > > > > > > > + > > > > > > > +\subsection{Device ID}\label{sec:Device Types / Video Device / > > > > > > > > Device > > > > > > > > > > > ID} > > > > > > > + > > > > > > > +TBD. > > > > > > > > > > > > I'm wondering how and when
Re: [virtio-dev] Re: guest / host buffer sharing ...
Hi, > > First the addressing is non-trivial, especially with the "transport > > specific device address" in the tuple. > > There is complexity here, but I think it would also be present in the > buffer sharing device case. With a buffer sharing device, the same > identifying information would need to be provided from the exporting > driver to the buffer sharing driver, so the buffer sharing device > would be able to identify the right device in the vmm. No. The idea is that the buffer sharing device will allocate and manage the buffers (including identifiers), i.e. it will only export buffers, never import. > > Second I think it is a bad idea > > from the security point of view. When explicitly exporting buffers it > > is easy to restrict access to the actual exports. > > Restricting access to actual exports could perhaps help catch bugs. > However, I don't think it provides any security guarantees, since the > guest can always just export every buffer before using it. Probably not on the guest/host boundary. It's important for security inside the guest though. You don't want process A being able to access process B private resources via buffer sharing support, by guessing implicit buffer identifiers. With explicit buffer exports that opportunity doesn't exist in the first place. Anything not exported can't be accessed via buffer sharing, period. And to access the exported buffers you need to know the uuid, which in turn allows the guest implement any access restrictions it wants. > > Instead of using a dedicated buffer sharing device we can also use > > virtio-gpu (or any other driver which supports dma-buf exports) to > > manage buffers. > > I don't think adding generic buffer management to virtio-gpu (or any > specific device type) is a good idea, There isn't much to add btw. virtio-gpu has buffer management, buffers are called "resources" in virtio-gpu terminology. You can already export them as dma-bufs (just landed in 5.5-rc1) and import them into other drivers. Without buffer sharing support the driver importing a virtio-gpu dma-buf can send the buffer scatter list to the host. So both virtio-gpu and the other device would actually access the same guest pages, but they are not aware that the buffer is shared between devices. With buffer sharing virtio-gpu would attach a uuid to the dma-buf, and the importing driver can send the uuid (instead of the scatter list) to the host. So the device can simply lookup the buffer on the host side and use it directly. Another advantage is that this enables some more use cases like sharing buffers between devices which are not backed by guest ram. > since that device would then > become a requirement for buffer sharing between unrelated devices. No. When we drop the buffer sharing device idea (which is quite likely), then any device can create buffers. If virtio-gpu is involved anyway, for example because you want show the images from the virtio-camera device on the virtio-gpu display, it makes sense to use virtio-gpu of course. But any other device can create and export buffers in a similar way. Without a buffer sharing device there is no central instance managing the buffers. A virtio-video spec (video encoder/decoder) is in discussion at the moment, it will probably get resource management simliar to virtio-gpu for the video frames, and it will be able to export/import those buffers (probably not in the first revision, but it is on the radar). > > With no central instance (buffer sharing device) being there managing > > the buffer identifiers I think using uuids as identifiers would be a > > good idea, to avoid clashes. Also good for security because it's pretty > > much impossible to guess buffer identifiers then. > > Using uuids to identify buffers would work. The fact that it provides > a single way to refer to both guest and host allocated buffers is > nice. And it could also directly apply to sharing resources other than > buffers (e.g. fences). Although unless we're positing that there are > different levels of trust within the guest, I don't think uuids really > provides much security. Well, security-wise you want have buffer identifiers which can't be easily guessed. And guessing uuid is pretty much impossible due to the namespace being huge. > If we're talking about uuids, they could also be used to simplify my > proposed implicit addressing scheme. Each device could be assigned a > uuid, which would simplify the shared resource identifier to > (device-uuid, shmid, offset). See above for the security aspects of implicit vs. explicit buffer identifiers. cheers, Gerd - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org