Re: [virtio-dev] Re: [PATCH 1/2] virtio: add dma-buf support for exported objects

2020-02-25 Thread David Stevens
On Tue, Feb 25, 2020 at 3:10 PM Gerd Hoffmann  wrote:
>
> On Wed, Feb 19, 2020 at 05:06:36PM +0900, David Stevens wrote:
> > This change adds a new flavor of dma-bufs that can be used by virtio
> > drivers to share exported objects. A virtio dma-buf can be queried by
> > virtio drivers to obtain the UUID which identifies the underlying
> > exported object.
>
> That duplicates a bunch of code from dma-buf.c in virtio_dma_buf.c.
>
> How about dma_buf_{get,set}_uuid, similar to dma_buf_set_name?

While I'm not opposed to such an API, I'm also hesitant to make
changes to the dma-buf API for a single use case.
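
For concreteness, if we did go that way, I'd imagine something roughly
like the following (rough sketch only; it assumes a new uuid field in
struct dma_buf, so this is not an existing API):

/* Sketch: assumes new 'uuid' and 'uuid_set' members in struct dma_buf. */
int dma_buf_set_uuid(struct dma_buf *dmabuf, const uuid_t *uuid)
{
	if (WARN_ON(!dmabuf || !uuid))
		return -EINVAL;

	dmabuf->uuid = *uuid;
	dmabuf->uuid_set = true;
	return 0;
}
EXPORT_SYMBOL_GPL(dma_buf_set_uuid);

int dma_buf_get_uuid(struct dma_buf *dmabuf, uuid_t *uuid)
{
	if (WARN_ON(!dmabuf || !uuid))
		return -EINVAL;
	/* Nothing to report for dma-bufs that were never assigned a UUID. */
	if (!dmabuf->uuid_set)
		return -ENODATA;

	*uuid = dmabuf->uuid;
	return 0;
}
EXPORT_SYMBOL_GPL(dma_buf_get_uuid);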

As for the duplicated code around virtio_dma_buf_export_info, it can
be removed by sacrificing a little bit of type safety, if that's
preferable.

-David


[virtio-dev] Re: [PATCH RFC v4 06/13] mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE

2020-02-25 Thread David Hildenbrand
On 25.02.20 22:46, Alexander Duyck wrote:
> On Tue, 2020-02-25 at 19:49 +0100, David Hildenbrand wrote:
  /*
   * Scan pfn range [start,end) to find movable/migratable pages (LRU pages,
 - * non-lru movable pages and hugepages). We scan pfn because it's much
 - * easier than scanning over linked list. This function returns the pfn
 - * of the first found movable page if it's found, otherwise 0.
 + * non-lru movable pages and hugepages).
 + *
 + * Returns:
 + *   0 in case a movable page is found and movable_pfn was updated.
 + *   -ENOENT in case no movable page was found.
 + *   -EBUSY in case a definitely unmovable page was found.
   */
 -static unsigned long scan_movable_pages(unsigned long start, unsigned 
 long end)
 +static int scan_movable_pages(unsigned long start, unsigned long end,
 +unsigned long *movable_pfn)
  {
unsigned long pfn;
  
 @@ -1247,18 +1251,29 @@ static unsigned long scan_movable_pages(unsigned 
 long start, unsigned long end)
continue;
page = pfn_to_page(pfn);
if (PageLRU(page))
 -  return pfn;
 +  goto found;
if (__PageMovable(page))
 -  return pfn;
 +  goto found;
 +
 +  /*
 +   * Unmovable PageOffline() pages where somebody still holds
 +   * a reference count (after MEM_GOING_OFFLINE) can definitely
 +   * not be offlined.
 +   */
 +  if (PageOffline(page) && page_count(page))
 +  return -EBUSY;
>>>
>>> So the comment confused me a bit because technically this function isn't
>>> about offlining memory, it is about finding movable pages. I had to do a
>>> bit of digging to find the only consumer is __offline_pages, but if we are
>>> going to talk about "offlining" instead of "moving" in this function it
>>> might make sense to rename it.
>>
>> Well, it's contained in memory_hotplug.c, and the only user of moving
>> pages around in there is offlining code :) And its job is to locate
>> movable pages, skip over some (temporarily unmovable ones?) and (now)
>> indicate definitely unmovable ones.
>>
>> Any idea for a better name?
>> scan_movable_pages_and_stop_on_definitely_unmovable() is not so nice :)
> 
> I dunno. What I was getting at is that the wording here would make it
> clearer if you simply stated that these pages "can definitely not be
> moved". Saying you cannot offline a page that is PageOffline seems kind of
> redundant, then again calling it an Unmovable and then saying it cannot be
> moved is also redundant I suppose. In the end you don't move them, but

So, in summary, there are
- PageOffline() pages that are movable (balloon compaction).
- PageOffline() pages that cannot be moved and cannot be offlined (e.g.,
  no balloon compaction enabled, XEN, Hyper-V, ...). page_count(page) >=
  1
- PageOffline() pages that cannot be moved, but can be offlined.
  page_count(page) == 0.
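
To make that classification concrete (sketch only, not part of the patch):

/* Sketch: classify a PageOffline() page into the three cases above. */
enum offline_page_class {
	OFFLINE_PAGE_MOVABLE,	/* balloon compaction: can be migrated */
	OFFLINE_PAGE_UNMOVABLE,	/* page_count() >= 1: blocks offlining */
	OFFLINE_PAGE_SKIPPABLE,	/* page_count() == 0: skipped when offlining */
};

static enum offline_page_class classify_offline_page(struct page *page)
{
	if (__PageMovable(page))
		return OFFLINE_PAGE_MOVABLE;
	if (page_count(page))
		return OFFLINE_PAGE_UNMOVABLE;
	return OFFLINE_PAGE_SKIPPABLE;
}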


> they can be switched to offline if the page count hits 0. When that
> happens you simply end up skipping over them in the code for
> __test_page_isolated_in_pageblock and __offline_isolated_pages.

Yes. The thing with the wording is that pages with (PageOffline(page) &&
!page_count(page)) can also not really be moved, but they can be skipped
when offlining. If we call that "moving them to /dev/null", then yes,
they can be moved to some degree :)

I can certainly do here e.g.,

/*
 * PageOffline() pages that are not marked __PageMovable() and have a
 * reference count > 0 (after MEM_GOING_OFFLINE) are definitely
 * unmovable. If their reference count would be 0, they could be skipped
 * when offlining memory sections.
 */

And maybe I'll add to the function doc, that unmovable pages that are
skipped in this function can include pages that can be skipped when
offlining (moving them to nirvana).

Other suggestions?

[...]

>>
>> [1] we detect a definite offlining blocker and
>>
 +  } while (!ret);
 +
 +  if (ret != -ENOENT) {
 +  reason = "unmovable page";
>>
>> [2] we abort offlining
>>
 +  goto failed_removal_isolated;
}
  
/*
> 
> Yeah, this is the piece I misread.  I knew the loop this was in previously
> was looping when returning -ENOENT so for some reason I had it in my head
> that you were still looping on -EBUSY.

Ah okay, I see. Yeah, that wouldn't make sense for the use case I have :)

> 
> So the one question I would have is whether at this point we are guaranteed
> that the balloon drivers have already taken care of the page count for all
> the pages they set to PageOffline? Based on the patch description I was
> thinking that this was going to be looping for a while waiting for the
> driver to clear the pages and then w

[virtio-dev] Re: [PATCH RFC v4 06/13] mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE

2020-02-25 Thread Alexander Duyck
On Tue, 2020-02-25 at 19:49 +0100, David Hildenbrand wrote:
> > >  /*
> > >   * Scan pfn range [start,end) to find movable/migratable pages (LRU 
> > > pages,
> > > - * non-lru movable pages and hugepages). We scan pfn because it's much
> > > - * easier than scanning over linked list. This function returns the pfn
> > > - * of the first found movable page if it's found, otherwise 0.
> > > + * non-lru movable pages and hugepages).
> > > + *
> > > + * Returns:
> > > + *   0 in case a movable page is found and movable_pfn was updated.
> > > + *   -ENOENT in case no movable page was found.
> > > + *   -EBUSY in case a definitely unmovable page was found.
> > >   */
> > > -static unsigned long scan_movable_pages(unsigned long start, unsigned 
> > > long end)
> > > +static int scan_movable_pages(unsigned long start, unsigned long end,
> > > +   unsigned long *movable_pfn)
> > >  {
> > >   unsigned long pfn;
> > >  
> > > @@ -1247,18 +1251,29 @@ static unsigned long scan_movable_pages(unsigned 
> > > long start, unsigned long end)
> > >   continue;
> > >   page = pfn_to_page(pfn);
> > >   if (PageLRU(page))
> > > - return pfn;
> > > + goto found;
> > >   if (__PageMovable(page))
> > > - return pfn;
> > > + goto found;
> > > +
> > > + /*
> > > +  * Unmovable PageOffline() pages where somebody still holds
> > > +  * a reference count (after MEM_GOING_OFFLINE) can definitely
> > > +  * not be offlined.
> > > +  */
> > > + if (PageOffline(page) && page_count(page))
> > > + return -EBUSY;
> > 
> > So the comment confused me a bit because technically this function isn't
> > about offlining memory, it is about finding movable pages. I had to do a
> > bit of digging to find the only consumer is __offline_pages, but if we are
> > going to talk about "offlining" instead of "moving" in this function it
> > might make sense to rename it.
> 
> Well, it's contained in memory_hotplug.c, and the only user of moving
> pages around in there is offlining code :) And its job is to locate
> movable pages, skip over some (temporarily unmovable ones?) and (now)
> indicate definitely unmovable ones.
> 
> Any idea for a better name?
> scan_movable_pages_and_stop_on_definitely_unmovable() is not so nice :)

I dunno. What I was getting at is that the wording here would make it
clearer if you simply stated that these pages "can definitely not be
moved". Saying you cannot offline a page that is PageOffline seems kind of
redundant, then again calling it an Unmovable and then saying it cannot be
moved is also redundant I suppose. In the end you don't move them, but
they can be switched to offline if the page count hits 0. When that
happens you simply end up skipping over them in the code for
__test_page_isolated_in_pageblock and __offline_isolated_pages.

> > >  
> > >   if (!PageHuge(page))
> > >   continue;
> > >   head = compound_head(page);
> > >   if (page_huge_active(head))
> > > - return pfn;
> > > + goto found;
> > >   skip = compound_nr(head) - (page - head);
> > >   pfn += skip - 1;
> > >   }
> > > + return -ENOENT;
> > > +found:
> > > + *movable_pfn = pfn;
> > >   return 0;
> > >  }
> > 
> > So I am looking at this function and it seems like your change completely
> > changes the behavior. The code before would walk the entire range and if
> > at least 1 page was available to move you would return the PFN of that
> > page. Now what seems to happen is that you will return -EBUSY as soon as
> > you encounter an offline page with a page count. I would think that would
> > slow down the offlining process since you have made the Unmovable
> > PageOffline() page a head of line blocker that you have to wait to get
> > around.
> 
> So, the comment says "Unmovable PageOffline() pages where somebody still
> holds a reference count (after MEM_GOING_OFFLINE) can definitely not be
> offlined". And the doc "-EBUSY in case a definitely unmovable page was
> found."
> 
> So why would this make offlining slow? Offlining will be aborted,
> because offlining is not possible.

Okay, my reading of the code was a bit off. In my mind I was thinking you
were effectively treating it almost like an EAGAIN and continuing the
loop. I didn't catch the part where you had rewritten the for loop as a
do-while in __offline_pages.
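
For anyone else following along, my rough reading of that restructured
loop (simplified sketch, omitting the outer retry loop; not the literal
hunk):

	pfn = start_pfn;
	do {
		if (signal_pending(current)) {
			ret = -EINTR;
			reason = "signal backoff";
			goto failed_removal_isolated;
		}

		ret = scan_movable_pages(pfn, end_pfn, &pfn);
		if (!ret) {
			/* Movable page found at pfn: try to migrate it. */
			do_migrate_range(pfn, end_pfn);
		}
	} while (!ret);

	if (ret != -ENOENT) {
		/* -EBUSY: a definitely unmovable page, abort offlining. */
		reason = "unmovable page";
		goto failed_removal_isolated;
	}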

> Please note that this is the exact old behavior, where isolating the
> page range would have failed directly and offlining would have been
> aborted early. The old offlining failure reason in this case would have
> been "failure to isolate range".
> 
> Also, note that the users of PageOffline() with unmovable pages are very
> rare (only balloon drivers for now).
> 
> > Would it perhaps make more sense to add a return value initialized to
> > EN

[virtio-dev] Re: [PATCH RFC v4 06/13] mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE

2020-02-25 Thread David Hildenbrand
>>  /*
>>   * Scan pfn range [start,end) to find movable/migratable pages (LRU pages,
>> - * non-lru movable pages and hugepages). We scan pfn because it's much
>> - * easier than scanning over linked list. This function returns the pfn
>> - * of the first found movable page if it's found, otherwise 0.
>> + * non-lru movable pages and hugepages).
>> + *
>> + * Returns:
>> + *  0 in case a movable page is found and movable_pfn was updated.
>> + *  -ENOENT in case no movable page was found.
>> + *  -EBUSY in case a definitely unmovable page was found.
>>   */
>> -static unsigned long scan_movable_pages(unsigned long start, unsigned long 
>> end)
>> +static int scan_movable_pages(unsigned long start, unsigned long end,
>> +  unsigned long *movable_pfn)
>>  {
>>  unsigned long pfn;
>>  
>> @@ -1247,18 +1251,29 @@ static unsigned long scan_movable_pages(unsigned 
>> long start, unsigned long end)
>>  continue;
>>  page = pfn_to_page(pfn);
>>  if (PageLRU(page))
>> -return pfn;
>> +goto found;
>>  if (__PageMovable(page))
>> -return pfn;
>> +goto found;
>> +
>> +/*
>> + * Unmovable PageOffline() pages where somebody still holds
>> + * a reference count (after MEM_GOING_OFFLINE) can definitely
>> + * not be offlined.
>> + */
>> +if (PageOffline(page) && page_count(page))
>> +return -EBUSY;
> 
> So the comment confused me a bit because technically this function isn't
> about offlining memory, it is about finding movable pages. I had to do a
> bit of digging to find the only consumer is __offline_pages, but if we are
> going to talk about "offlining" instead of "moving" in this function it
> might make sense to rename it.

Well, it's contained in memory_hotplug.c, and the only user of moving
pages around in there is offlining code :) And its job is to locate
movable pages, skip over some (temporarily unmovable ones?) and (now)
indicate definitely unmovable ones.

Any idea for a better name?
scan_movable_pages_and_stop_on_definitely_unmovable() is not so nice :)


> 
>>  
>>  if (!PageHuge(page))
>>  continue;
>>  head = compound_head(page);
>>  if (page_huge_active(head))
>> -return pfn;
>> +goto found;
>>  skip = compound_nr(head) - (page - head);
>>  pfn += skip - 1;
>>  }
>> +return -ENOENT;
>> +found:
>> +*movable_pfn = pfn;
>>  return 0;
>>  }
> 
> So I am looking at this function and it seems like your change completely
> changes the behavior. The code before would walk the entire range and if
> at least 1 page was available to move you would return the PFN of that
> page. Now what seems to happen is that you will return -EBUSY as soon as
> you encounter an offline page with a page count. I would think that would
> slow down the offlining process since you have made the Unmovable
> PageOffline() page a head of line blocker that you have to wait to get
> around.

So, the comment says "Unmovable PageOffline() pages where somebody still
holds a reference count (after MEM_GOING_OFFLINE) can definitely not be
offlined". And the doc "-EBUSY in case a definitely unmovable page was
found."

So why would this make offlining slow? Offlining will be aborted,
because offlining is not possible.

Please note that this is the exact old behavior, where isolating the
page range would have failed directly and offlining would have been
aborted early. The old offlining failure reason in this case would have
been "failure to isolate range".

Also, note that the users of PageOffline() with unmovable pages are very
rare (only balloon drivers for now).

> 
> Would it perhaps make more sense to add a return value initialized to
> ENOENT, and if you encounter one of these offline pages you change the
> return value to EBUSY, and then if you walk through the entire list
> without finding a movable page you just return the value?
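
If I understand that alternative correctly, it would look roughly like
this (sketch only, not the patch): keep scanning and only report -EBUSY
at the end if nothing movable was found at all.

static int scan_movable_pages(unsigned long start, unsigned long end,
			      unsigned long *movable_pfn)
{
	unsigned long pfn;
	int ret = -ENOENT;

	for (pfn = start; pfn < end; pfn++) {
		struct page *page;

		if (!pfn_valid(pfn))
			continue;
		page = pfn_to_page(pfn);
		if (PageLRU(page) || __PageMovable(page)) {
			*movable_pfn = pfn;
			return 0;
		}
		/* Remember that we hit a definitely unmovable page. */
		if (PageOffline(page) && page_count(page))
			ret = -EBUSY;
	}
	return ret;
}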

Did you have a look at the context in which this function is used,
especially [1] and [2]?

> 
> Otherwise you might want to add a comment explaining why the function
> should stall instead of skipping over the unmovable section that will
> hopefully become movable later.

So we have "-EBUSY in case a definitely unmovable page was found.". Do
you have a better suggestion?

> 
>> @@ -1528,7 +1543,8 @@ static int __ref __offline_pages(unsigned long 
>> start_pfn,
>>  }
>>  
>>  do {
>> -for (pfn = start_pfn; pfn;) {
>> +pfn = start_pfn;
>> +do {
>>  if (signal_pending(current)) {
>>  ret = -EINTR;
>>  reason = "signal backoff";
>> @@ -1538,14 +1554,19 @@ static int __ref __offline_pages(unsigned long 
>> 

[virtio-dev] Re: [PATCH RFC v4 06/13] mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE

2020-02-25 Thread Alexander Duyck
On Thu, 2019-12-12 at 18:11 +0100, David Hildenbrand wrote:
> virtio-mem wants to allow to offline memory blocks of which some parts
> were unplugged (allocated via alloc_contig_range()), especially, to later
> offline and remove completely unplugged memory blocks. The important part
> is that PageOffline() has to remain set until the section is offline, so
> these pages will never get accessed (e.g., when dumping). The pages should
> not be handed back to the buddy (which would require clearing PageOffline()
> and result in issues if offlining fails and the pages are suddenly in the
> buddy).
> 
> Let's allow to do that by allowing to isolate any PageOffline() page
> when offlining. This way, we can reach the memory hotplug notifier
> MEM_GOING_OFFLINE, where the driver can signal that it is fine with
> offlining this page by dropping its reference count. PageOffline() pages
> with a reference count of 0 can then be skipped when offlining the
> pages (like if they were free, however they are not in the buddy).
> 
> Anybody who uses PageOffline() pages and does not agree to offline them
> (e.g., Hyper-V balloon, XEN balloon, VMWare balloon for 2MB pages) will not
> decrement the reference count, making offlining fail when trying to
> migrate such an unmovable page. So there should be no observable change.
> Same applies to balloon compaction users (movable PageOffline() pages), the
> pages will simply be migrated.
> 
> Note 1: If offlining fails, a driver has to increment the reference
>   count again in MEM_CANCEL_OFFLINE.
> 
> Note 2: A driver that makes use of this has to be aware that re-onlining
>   the memory block has to be handled by hooking into onlining code
>   (online_page_callback_t), resetting the page PageOffline() and
>   not giving them to the buddy.
> 
> Cc: Andrew Morton 
> Cc: Juergen Gross 
> Cc: Konrad Rzeszutek Wilk 
> Cc: Pavel Tatashin 
> Cc: Alexander Duyck 
> Cc: Vlastimil Babka 
> Cc: Johannes Weiner 
> Cc: Anthony Yznaga 
> Cc: Michal Hocko 
> Cc: Oscar Salvador 
> Cc: Mel Gorman 
> Cc: Mike Rapoport 
> Cc: Dan Williams 
> Cc: Anshuman Khandual 
> Cc: Qian Cai 
> Cc: Pingfan Liu 
> Signed-off-by: David Hildenbrand 
> ---
>  include/linux/page-flags.h | 10 ++
>  mm/memory_hotplug.c        | 41 --
>  mm/page_alloc.c            | 24 ++
>  mm/page_isolation.c        |  9 +
>  4 files changed, 74 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 1bf83c8fcaa7..ac1775082343 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -761,6 +761,16 @@ PAGE_TYPE_OPS(Buddy, buddy)
>   * not onlined when onlining the section).
>   * The content of these pages is effectively stale. Such pages should not
>   * be touched (read/write/dump/save) except by their owner.
> + *
> + * If a driver wants to allow to offline unmovable PageOffline() pages 
> without
> + * putting them back to the buddy, it can do so via the memory notifier by
> + * decrementing the reference count in MEM_GOING_OFFLINE and incrementing the
> + * reference count in MEM_CANCEL_OFFLINE. When offlining, the PageOffline()
> + * pages (now with a reference count of zero) are treated like free pages,
> + * allowing the containing memory block to get offlined. A driver that
> + * relies on this feature is aware that re-onlining the memory block will
> + * require to re-set the pages PageOffline() and not giving them to the
> + * buddy via online_page_callback_t.
>   */
>  PAGE_TYPE_OPS(Offline, offline)
>  
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index fc617ad6f035..da01453a04e6 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1231,11 +1231,15 @@ int test_pages_in_a_zone(unsigned long start_pfn, 
> unsigned long end_pfn,
>  
>  /*
>   * Scan pfn range [start,end) to find movable/migratable pages (LRU pages,
> - * non-lru movable pages and hugepages). We scan pfn because it's much
> - * easier than scanning over linked list. This function returns the pfn
> - * of the first found movable page if it's found, otherwise 0.
> + * non-lru movable pages and hugepages).
> + *
> + * Returns:
> + *   0 in case a movable page is found and movable_pfn was updated.
> + *   -ENOENT in case no movable page was found.
> + *   -EBUSY in case a definetly unmovable page was found.
>   */
> -static unsigned long scan_movable_pages(unsigned long start, unsigned long 
> end)
> +static int scan_movable_pages(unsigned long start, unsigned long end,
> +   unsigned long *movable_pfn)
>  {
>   unsigned long pfn;
>  
> @@ -1247,18 +1251,29 @@ static unsigned long scan_movable_pages(unsigned long 
> start, unsigned long end)
>   continue;
>   page = pfn_to_page(pfn);
>   if (PageLRU(page))
> - return pfn;
> + goto found;
> 

[virtio-dev] Re: [PATCH RFC v4 12/13] mm/vmscan: Export drop_slab() and drop_slab_node()

2020-02-25 Thread David Hildenbrand
On 25.02.20 18:06, Michal Hocko wrote:
> On Tue 25-02-20 16:09:29, David Hildenbrand wrote:
>> On 25.02.20 15:58, Michal Hocko wrote:
>>> On Thu 12-12-19 18:11:36, David Hildenbrand wrote:
 We already have a way to trigger reclaiming of all reclaimable slab objects
 from user space (echo 2 > /proc/sys/vm/drop_caches). Let's allow drivers
 to also trigger this when they really want to make progress and know what
 they are doing.
>>>
>>> I cannot say I would be a fan of this. This is a global action with user
>>> visible performance impact. I am worried that we will find out that all
>>> sorts of drivers have a very good idea that dropping slab caches is
>>> going to help their problem whatever it is. We have seen the same pattern
>>> in the userspace already and that is the reason we are logging the usage
>>> to the log and counting invocations in the counter.
>>
>> Yeah, I decided to hold back patch 11-13 for the v1 (which I am planning
>> to post in March after more testing). What we really want is to make
>> memory offlining and alloc_contig_range() work better with reclaimable
>> objects.
>>
>>>
 virtio-mem wants to use these functions when it failed to unplug memory
 for quite some time (e.g., after 30 minutes). It will then try to
 free up reclaimable objects by dropping the slab caches every now and
 then (e.g., every 30 minutes) as long as necessary. There will be a way to
 disable this feature and info messages will be logged.

 In the future, we want to have a drop_slab_range() functionality
 instead. Memory offlining code has similar demands and also other
 alloc_contig_range() users (e.g., gigantic pages) could make good use of
 this feature. Adding it, however, requires more work/thought.
>>>
>>> We already do have a memory_notify(MEM_GOING_OFFLINE) for that purpose
>>> and slab allocator implements a callback (slab_mem_going_offline_callback).
>>> The callback is quite dumb and it doesn't really try to free objects
>>> from the given memory range or even try to drop active objects which
>>> might turn out to be hard but this sounds like a more robust way to
>>> achieve what you want.
>>
>> Two things:
>>
>> 1. memory_notify(MEM_GOING_OFFLINE) is called after trying to isolate
>> the page range and checking if we only have movable pages. Won't help
>> much I guess.
> 
> You are right, I have missed that. Can we reorder those two calls?

AFAIK no (would have to look up the details, but there was a good reason
for the order, e.g., avoid races with other users of page isolation like
alloc_contig_range()).

Especially, "[PATCH RFC v4 06/13] mm: Allow to offline unmovable
PageOffline() pages via MEM_GOING_OFFLINE" (which is still impatiently
waiting for an ACK ;) ) also works around that ordering issue in a way
we discussed back then.

-- 
Thanks,

David / dhildenb



Re: [virtio-dev] [RFC] Upstreaming virtio-wayland (or an alternative)

2020-02-25 Thread Alex Bennée


Boris Brezillon  writes:

> Hi all,
>
> On Fri, 7 Feb 2020 18:28:42 +0100
> Boris Brezillon  wrote:
>
>> Hello everyone,
>> 
>> I recently took over Tomeu's task of upstreaming virtio-wayland. After
>> spending quite a bit of time collecting information from his different
>> attempts [1][2] I wanted to sync with all the people that were involved
>> in the previous discussions (if I missed some of them, feel free to add
>> them back).
>> 
>> The goal here is to get a rough idea of the general direction this
>> should take so I can start implementing a PoC and see if it fits
>> everyone's needs.
>> 
>> virtio-wayland [3] started as a solution to pass wayland messages
>> between host and guests so the guest can execute wayland apps whose
>> surface buffers are passed to the wayland compositor running on the
>> host. While this was its primary use case, I've heard it's been used to
>> transport other protocols. And that's not surprising, when looking at
>> the code I noticed it was providing a protocol-agnostic message passing
>> interface between host and guests, similar to what VSOCK provides but
>> with FD passing as an extra feature.
>> 
>> Based on all previous discussions, I could identify 3 different
>> approaches:
>> 
>> 1/ Use VSOCK and extend it to support passing (some) FDs
>> 2/ Use a user space VSOCK-based proxy that's in charge of
>>a/ passing regular messages
>>b/ passing specific handles to describe objects shared
>>   between host and guest (most focus has been on dmabufs as
>>   this is what we really care about for the gfx use case,
>>   but other kind of FDs can be emulated through a
>>   VSOCK <-> UNIX_SOCK bridging)
>> 3/ Have a dedicated kernel space solution that provides features
>>exposed by #1 but through a virtio device interface (basically
>>what virtio-wayland does today)
>> 


> Option #1 sounds like a good idea on paper, but given that Google wants
> a solution that does not involve vhost, and given Gerd's concern about
> being able to use vsock in some cases and do !vsock message-passing in
> others, it might not be possible to use vsock (there can only be one
> vsock implementation active).

Forgive me for jumping in but what is the concern about vhost here? Is
it purely the increased attack surface of the kernel or some more
fundamental issue? Is this something that can be addressed with a
vhost-user backend where all the complications of the host end are dealt
with in user-space?

> That leaves us with option #3: have a dedicated virtio device
> providing a message+FD passing interface.
>
> So, I'm about to start working on virtio-pipe (I realize the name is
> not that great since pipes are normally unidirectional, but I'm sure
> we'll have plenty of time to bikeshed on that particular aspect once
> the other bits are sorted out :)). This device would be a singleton
> (there can only be one per VM), and would provide the following
> features:
>
> * allow you to connect to named/public pipes (similar to named unix
>   sockets). That's an easy way to expose host services to guests, and
>   AFAICT, that's not so far from the virtio-wayland way of doing things
>   (VIRTWL_IOCTL_NEW_CTX_NAMED).
> * manage a central UUID <-> 'struct file' map that allows virtio-pipe
>   to convert FDs to UUIDs, pass UUIDs through a pipe and convert those
>   UUIDs back to FDs on the other end
>   - we need to expose an API to let each subsystem register/unregister
> their UUID <-> FD mapping (subsystems are responsible for the UUID
> creation/negotiation); see the rough sketch after this list
> * allow you to create private/unnamed pipes whose FDs can be passed to
>   the other end (that implies creating a UUID <-> FD mapping for each
>   unnamed pipe). This feature is here to allow unnamed unix-socket <->
>   virtio-pipe bridging. AFAICT, that's what VIRTWL_IOCTL_NEW_CTX is
>   providing. Looks like virtio-wayland also provides support for actual
>   pipes (uni-directional msg-passing) with its VIRTWL_IOCTL_NEW_PIPE_*
>   ioctls, which we can always support at some point if we see the need.
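
To make the UUID <-> FD map concrete, the registration API could look
something like this (entirely hypothetical sketch, names invented for
illustration only):

/* Register an FD-backed object under a UUID so it can be passed
 * through a virtio-pipe; called by the owning subsystem (e.g. a
 * GPU driver when a buffer is exported as a dma-buf). */
int virtio_pipe_register_uuid(struct file *file, const uuid_t *uuid);

/* Drop the mapping again when the object goes away. */
void virtio_pipe_unregister_uuid(const uuid_t *uuid);

/* Resolve a UUID received from the other end back to a local file. */
struct file *virtio_pipe_lookup_uuid(const uuid_t *uuid);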
>
> Once we have those basic building blocks in place, we can start
> patching virtio-gpu to create a UUID <-> FD mapping when a buffer is
> exported as a dmabuf (using the virtio-pipe public API). And this can
> easily be applied to any kind of FDs we want to share.
>
> Note that this solution could be extended to support FD passing on top
> of VSOCK if we ever need/want to. We'd only have to move the UUID <->
> FD map/API out of virtio-pipe (virtio-fd?) and let virtio-vsock use it.
>
> Gerd, Stefan, Zach, what do you think? Should I start looking into
> that, maybe work on a PoC and post it as an RFC, or do you see any
> problem that would make this proposal impossible or irrelevant?
>
> Any feedback is welcome!
>
> Thanks,
>
> Boris
>

Re: [virtio-dev] [RFC] Upstreaming virtio-wayland (or an alternative)

2020-02-25 Thread Boris Brezillon
Hi all,

On Fri, 7 Feb 2020 18:28:42 +0100
Boris Brezillon  wrote:

> Hello everyone,
> 
> I recently took over Tomeu's task of upstreaming virtio-wayland. After
> spending quite a bit of time collecting information from his different
> attempts [1][2] I wanted to sync with all the people that were involved
> in the previous discussions (if I missed some of them, feel free to add
> them back).
> 
> The goal here is to get a rough idea of the general direction this
> should take so I can start implementing a PoC and see if it fits
> everyone's needs.
> 
> virtio-wayland [3] started as a solution to pass wayland messages
> between host and guests so the guest can execute wayland apps whose
> surface buffers are passed to the wayland compositor running on the
> host. While this was its primary use case, I've heard it's been used to
> transport other protocols. And that's not surprising, when looking at
> the code I noticed it was providing a protocol-agnostic message passing
> interface between host and guests, similar to what VSOCK provides but
> with FD passing as an extra feature.
> 
> Based on all previous discussions, I could identify 3 different
> approaches:
> 
> 1/ Use VSOCK and extend it to support passing (some) FDs
> 2/ Use a user space VSOCK-based proxy that's in charge of
>a/ passing regular messages
>b/ passing specific handles to describe objects shared
>   between host and guest (most focus has been on dmabufs as
>   this is what we really care about for the gfx use case,
>   but other kind of FDs can be emulated through a
>   VSOCK <-> UNIX_SOCK bridging)
> 3/ Have a dedicated kernel space solution that provides features
>exposed by #1 but through a virtio device interface (basically
>what virtio-wayland does today)
> 
> Each of them has its pros and cons, which I'll try to sum-up (please
> correct me if I'm wrong, and add new things if you think they are
> missing).
> 
> #1 might require extra care if we want to make it safe, as pointed
> out by Stefan here [4] (but I wonder if the problem is not the same
> for a virtio-wayland based solution). Of course you also need a bit of
> infrastructure to register FD <-> VFD mappings (VFD being a virtual
> file descriptor that's only used as unique IDs identifying the resource
> backed by the local FD). FD <-> VFD mappings would have to be created
> by the subsystem in charge of the object backing the FD (virtio-gpu for
> exported GEM buffers, virtio-vdec for video buffers, vsock for unix
> sockets if we decide to bridge unix and vsock sockets to make it
> transparent, ...). The FD <-> VFD mapping would also have to be created
> on the host side, probably by the virtio device implementation
> (virglrenderer for GEM bufs for instance), which means host and guest
> need a way to inform the other end that a new FD <-> VFD mapping has
> been created so the other end can create a similar mapping (I guess this
> requires extra device-specific commands to work). Note that this
> solution doesn't look so different from the virtio-dmabuf [5] approach
> proposed by Gerd a few months back, it's just extended to be a global
> VFD <-> FD registry instead of a dmabuf <-> unique-handle one. One
> great thing about this approach is that we can re-use it for any kind
> of FD sharing, not just dmabufs.
> 
> #2 is a bit challenging, since it requires the proxy to know about all
> possible kinds of FDs and do an FD <-> unique-handle conversion with some
> help from the subsystem backing the FD. For dmabufs, that means we
> need to know who created the dmabuf, or assume that only one device is
> used for all allocations (virtio-gpu?). AFAIU, there's also a security
> issue as one could pass random (but potentially valid) handles to the
> host proxy (pointed out by Tomasz [6]).
> 
> #3 is pretty similar to #1 in its design except that, instead of using
> the VSOCK infrastructure it's using a new type of virtio device. I
> guess it has the same pros and cons #1 has, and the name should probably
> be changed to reflect the fact that it can transmit any kind of data not
> just wayland.
> 
> This is just a high level view of the problem and the solutions proposed
> by various people over the years. I'm sure I'm missing tons of details
> and don't yet realize all the complexity behind solution #1, but looking
> at this summary, I wonder if I should investigate this solution as a
> priority. An alternative could be to rebrand virtio-wayland, but as I
> said, it's close enough to VSOCK to try to merge the missing features
> in VSOCK instead. This being said, I'm not yet set on any of those
> solutions, and the point of this email is to see with all of you which
> option I should investigate first.
> 
> Note that option #3 is already implemented (would have to be polished
> for upstream), IIRC option #2 has been partially implemented by Tomeu
> but I'm not sure it was finished, and option #1 has just been dis

[virtio-dev] Re: [PATCH RFC v4 12/13] mm/vmscan: Export drop_slab() and drop_slab_node()

2020-02-25 Thread David Hildenbrand
On 25.02.20 15:58, Michal Hocko wrote:
> On Thu 12-12-19 18:11:36, David Hildenbrand wrote:
>> We already have a way to trigger reclaiming of all reclaimable slab objects
>> from user space (echo 2 > /proc/sys/vm/drop_caches). Let's allow drivers
>> to also trigger this when they really want to make progress and know what
>> they are doing.
> 
> I cannot say I would be a fan of this. This is a global action with user
> visible performance impact. I am worried that we will find out that all
> sorts of drivers have a very good idea that dropping slab caches is
> going to help their problem whatever it is. We have seen the same pattern
> in the userspace already and that is the reason we are logging the usage
> to the log and counting invocations in the counter.

Yeah, I decided to hold back patch 11-13 for the v1 (which I am planning
to post in March after more testing). What we really want is to make
memory offlining and alloc_contig_range() work better with reclaimable
objects.

> 
>> virtio-mem wants to use these functions when it failed to unplug memory
>> for quite some time (e.g., after 30 minutes). It will then try to
>> free up reclaimable objects by dropping the slab caches every now and
>> then (e.g., every 30 minutes) as long as necessary. There will be a way to
>> disable this feature and info messages will be logged.
>>
>> In the future, we want to have a drop_slab_range() functionality
>> instead. Memory offlining code has similar demands and also other
>> alloc_contig_range() users (e.g., gigantic pages) could make good use of
>> this feature. Adding it, however, requires more work/thought.
> 
> We already do have a memory_notify(MEM_GOING_OFFLINE) for that purpose
> and slab allocator implements a callback (slab_mem_going_offline_callback).
> The callback is quite dumb and it doesn't really try to free objects
> from the given memory range or even try to drop active objects which
> might turn out to be hard but this sounds like a more robust way to
> achieve what you want.

Two things:

1. memory_notify(MEM_GOING_OFFLINE) is called after trying to isolate
the page range and checking if we only have movable pages. Won't help
much I guess.

2. alloc_contig_range() won't benefit from that.

Something like drop_slab_range() would be better, calling it from
the right spots in the core (e.g., set_migratetype_isolate(), see below).

Especially, have a look at mm/page_isolation.c:set_migratetype_isolate()

"FIXME: Now, memory hotplug doesn't call shrink_slab() by itself. We
just check MOVABLE pages."
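
For illustration, I'm thinking of something along these lines (very
rough, hypothetical sketch; a real implementation would have to teach
the shrinkers about pfn ranges instead of falling back to the node):

/*
 * Hypothetical sketch only: a range-aware variant of drop_slab().
 * Until shrinkers understand pfn ranges, this can only fall back
 * to dropping reclaimable slab objects on the affected node.
 */
void drop_slab_range(unsigned long start_pfn, unsigned long end_pfn)
{
	int nid = page_to_nid(pfn_to_page(start_pfn));

	/* TODO: restrict reclaim to [start_pfn, end_pfn). */
	drop_slab_node(nid);
}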

-- 
Thanks,

David / dhildenb



[virtio-dev] Re: [PATCH RFC v4 08/13] mm/memory_hotplug: Introduce offline_and_remove_memory()

2020-02-25 Thread David Hildenbrand
On 25.02.20 15:11, Michal Hocko wrote:
> On Thu 12-12-19 18:11:32, David Hildenbrand wrote:
>> virtio-mem wants to offline and remove a memory block once it unplugged
>> all subblocks (e.g., using alloc_contig_range()). Let's provide
>> an interface to do that from a driver. virtio-mem already supports
>> offlining partially unplugged memory blocks. Offlining a fully unplugged
>> memory block will not require migrating any pages. All unplugged
>> subblocks are PageOffline() and have a reference count of 0 - so
>> offlining code will simply skip them.
>>
>> All we need is an interface to trigger the "offlining" and the removal in
>> a single operation - to make sure the memory block cannot get onlined by
>> user space again before it gets removed.
> 
> Why does that matter? Is it really likely that the userspace would
> interfere? What would be the scenario?

I guess it's not that relevant after all (I think this comment dates
back to the times when we didn't have try_remove_memory() and could
actually BUG_ON() in remove_memory() if there had been a race).
I can drop that part.

> 
> Or is it still mostly about not requiring callers to open-code this general
> pattern?

From kernel module context, I cannot get access to the actual memory
block device (find_memory_block()) and call device_unregister().

Also, the device hotplug lock is not exported. So this is a clean
helper function to be used from kernel module context (e.g., Hyper-V
has also shown interest in using it).
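
For reference, the helper is essentially this (simplified sketch of the
idea, not the exact patch):

int offline_and_remove_memory(int nid, u64 start, u64 size)
{
	struct memory_block *mem;
	int rc = -EINVAL;

	/* Operate on exactly one memory block. */
	if (!IS_ALIGNED(start, memory_block_size_bytes()) ||
	    size != memory_block_size_bytes())
		return rc;

	lock_device_hotplug();
	mem = find_memory_block(__pfn_to_section(PFN_DOWN(start)));
	if (mem)
		rc = device_offline(&mem->dev);
	if (!rc)
		rc = try_remove_memory(nid, start, size);
	unlock_device_hotplug();

	return rc;
}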

-- 
Thanks,

David / dhildenb



[virtio-dev] Re: [PATCH v3 2/2] virtio-video: Define a feature for exported objects from different virtio devices

2020-02-25 Thread Gerd Hoffmann
  Hi,

> +/*
> + * Followed by either
> + * - struct virtio_video_mem_entry entries[]
> + *   for VIRTIO_VIDEO_MEM_TYPE_GUEST_PAGES
> + * - struct virtio_video_object_entry entries[]
> + *   for VIRTIO_VIDEO_MEM_TYPE_VIRTIO_OBJECT

Wouldn't that be a single virtio_video_object_entry?
Or could it be one per plane?

cheers,
  Gerd



[virtio-dev] Re: [PATCH v3 1/2] virtio-video: Add virtio video device specification

2020-02-25 Thread Gerd Hoffmann
On Thu, Feb 06, 2020 at 07:20:57PM +0900, Keiichi Watanabe wrote:
> From: Dmitry Sepp 
> 
> The virtio video encoder device and decoder device provide the functionality
> to encode and decode video streams, respectively.
> Though the video encoder and decoder are provided as different devices, they
> use the same protocol.
> 
> Signed-off-by: Dmitry Sepp 
> Signed-off-by: Keiichi Watanabe 

Finally found the time for a closer look.
Pretty good overall, some minor nits below ...

> +\begin{description}
> +\item[\field{version}] is the protocol version that the device talks.
> +  The device MUST set this to 0.

What is the intended use case for this?

Given that virtio has feature flags to negotiate support for optional
features and protocol extensions between driver and device, why do you
think this is needed?

> +The format description \field{virtio_video_format_desc} is defined as
> +follows:
> +\begin{lstlisting}
> +enum virtio_video_format {
> +/* Raw formats */
> +VIRTIO_VIDEO_FORMAT_RAW_MIN = 1,
> +VIRTIO_VIDEO_FORMAT_ARGB = VIRTIO_VIDEO_FORMAT_RAW_MIN,
> +VIRTIO_VIDEO_FORMAT_BGRA,
> +VIRTIO_VIDEO_FORMAT_NV12, /* 12  Y/CbCr 4:2:0  */
> +VIRTIO_VIDEO_FORMAT_YUV420, /* 12  YUV 4:2:0 */
> +VIRTIO_VIDEO_FORMAT_YVU420, /* 12  YVU 4:2:0 */
> +VIRTIO_VIDEO_FORMAT_RAW_MAX = VIRTIO_VIDEO_FORMAT_YVU420,

I'm wondering what the *_MIN and *_MAX values here (and elsewhere) are
good for?  I doubt drivers would actually loop over formats from min to
max; I'd expect them to check for specific formats they can handle instead.

If you want to define the range for valid raw formats, I'd suggest leaving
some room so new formats can be added without changing the MAX values, i.e.
use -- for example -- RAW_MIN = 0x100, RAW_MAX = 0x1ff, CODED_MIN = 0x200,
CODED_MAX = 0x2ff.  Or just drop them ...
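
To spell out what I mean (sketch of the suggestion only, values are
just for illustration):

enum virtio_video_format {
	/* Raw formats: 0x0100 - 0x01ff reserved */
	VIRTIO_VIDEO_FORMAT_RAW_MIN   = 0x0100,
	VIRTIO_VIDEO_FORMAT_ARGB      = VIRTIO_VIDEO_FORMAT_RAW_MIN,
	VIRTIO_VIDEO_FORMAT_BGRA      = 0x0101,
	VIRTIO_VIDEO_FORMAT_NV12      = 0x0102, /* 12  Y/CbCr 4:2:0 */
	VIRTIO_VIDEO_FORMAT_YUV420    = 0x0103, /* 12  YUV 4:2:0 */
	VIRTIO_VIDEO_FORMAT_YVU420    = 0x0104, /* 12  YVU 4:2:0 */
	VIRTIO_VIDEO_FORMAT_RAW_MAX   = 0x01ff,

	/* Coded formats: 0x0200 - 0x02ff reserved */
	VIRTIO_VIDEO_FORMAT_CODED_MIN = 0x0200,
	VIRTIO_VIDEO_FORMAT_CODED_MAX = 0x02ff,
};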

> +struct virtio_video_query_control_level {
> +le32 profile; /* One of VIRTIO_VIDEO_PROFILE_* */
^^^  LEVEL ?

cheers,
  Gerd



[virtio-dev] Re: [PATCH RFC v4 00/13] virtio-mem: paravirtualized memory

2020-02-25 Thread David Hildenbrand
On 29.01.20 10:41, David Hildenbrand wrote:
> On 09.01.20 14:48, David Hildenbrand wrote:
>> On 12.12.19 18:11, David Hildenbrand wrote:
>>> This series is based on latest linux-next. The patches are located at:
>>> https://github.com/davidhildenbrand/linux.git virtio-mem-rfc-v4
>>>
>>> The basic idea of virtio-mem is to provide a flexible,
>>> cross-architecture memory hot(un)plug solution that avoids many limitations
>>> imposed by existing technologies, architectures, and interfaces. More
>>> details can be found below and in linked material.
>>>
>>> This RFC is limited to x86-64, however, should theoretically work on any
>>> architecture that supports virtio and implements memory hot(un)plug under
>>> Linux - like s390x, powerpc64 and arm64. On x86-64, it is currently
>>> possible to add/remove memory to/from the system in >= 4MB granularity.
>>> Memory hotplug works very reliably. For memory unplug, there are no
>>> guarantees how much memory can actually get unplugged, it depends on the
>>> setup (especially: fragmentation of (unmovable) memory). I have plans to
>>> improve that in the future.
>>>
>>> --
>>> 1. virtio-mem
>>> --
>>>
>>> The basic idea behind virtio-mem was presented at KVM Forum 2018. The
>>> slides can be found at [1]. The previous RFC can be found at [2]. The
>>> first RFC can be found at [3]. However, the concept evolved over time. The
>>> KVM Forum slides roughly match the current design.
>>>
>>> Patch #2 ("virtio-mem: Paravirtualized memory hotplug") contains quite some
>>> information, especially in "include/uapi/linux/virtio_mem.h":
>>>
>>>   Each virtio-mem device manages a dedicated region in physical address
>>>   space. Each device can belong to a single NUMA node, multiple devices
>>>   for a single NUMA node are possible. A virtio-mem device is like a
>>>   "resizable DIMM" consisting of small memory blocks that can be plugged
>>>   or unplugged. The device driver is responsible for (un)plugging memory
>>>   blocks on demand.
>>>
>>>   Virtio-mem devices can only operate on their assigned memory region in
>>>   order to (un)plug memory. A device cannot (un)plug memory belonging to
>>>   other devices.
>>>
>>>   The "region_size" corresponds to the maximum amount of memory that can
>>>   be provided by a device. The "size" corresponds to the amount of memory
>>>   that is currently plugged. "requested_size" corresponds to a request
>>>   from the device to the device driver to (un)plug blocks. The
>>>   device driver should try to (un)plug blocks in order to reach the
>>>   "requested_size". It is impossible to plug more memory than requested.
>>>
>>>   The "usable_region_size" represents the memory region that can actually
>>>   be used to (un)plug memory. It is always at least as big as the
>>>   "requested_size" and will grow dynamically. It will only shrink when
>>>   explicitly triggered (VIRTIO_MEM_REQ_UNPLUG).
>>>
>>>   Memory in the usable region can usually be read, however, there are no
>>>   guarantees. It can happen that the device cannot process a request,
>>>   because it is busy. The device driver has to retry later.
>>>
>>>   Usually, during system resets all memory will get unplugged, so the
>>>   device driver can start with a clean state. However, in specific
>>>   scenarios (if the device is busy) it can happen that the device still
>>>   has memory plugged. The device driver can request to unplug all memory
>>>   (VIRTIO_MEM_REQ_UNPLUG) - which might take a while to succeed if the
>>>   device is busy.
>>>
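
For orientation, the config fields described above roughly suggest a
layout like the following (illustrative sketch only; see
include/uapi/linux/virtio_mem.h in patch #2 for the real definition):

struct virtio_mem_config {
	__le64 addr;			/* start of the managed region */
	__le64 region_size;		/* maximum amount of pluggable memory */
	__le64 usable_region_size;	/* part usable for (un)plug right now */
	__le64 plugged_size;		/* currently plugged ("size") */
	__le64 requested_size;		/* target the driver should reach */
	__le64 block_size;		/* device block size */
	__le16 node_id;			/* assigned NUMA node, if any */
};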
>>> --
>>> 2. Linux Implementation
>>> --
>>>
>>> This RFC reuses quite a bit of existing MM infrastructure; however, it has
>>> to expose some additional functionality.
>>>
>>> Memory blocks (e.g., 128MB) are added/removed on demand. Within these
>>> memory blocks, subblocks (e.g., 4MB) are plugged/unplugged. The sizes
>>> depend on the target architecture, MAX_ORDER + pageblock_order, and
>>> the block size of a virtio-mem device.
>>>
>>> add_memory()/try_remove_memory() is used to add/remove memory blocks.
>>> virtio-mem will not online memory blocks itself. This has to be done by
>>> user space, or configured into the kernel
>>> (CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE). virtio-mem will only unplug memory
>>> that was onlined to ZONE_NORMAL. Memory is suggested to be onlined to
>>> ZONE_NORMAL for now.
>>>
>>> The memory hotplug notifier is used to properly synchronize against
>>> onlining/offlining of memory blocks and to track the states of memory
>>> blocks (including the zone memory blocks are onlined to).
>>>
>>> The set_online_page() callback is used to keep unplugged subblocks
>>> of a memory block fake-offline when onlining the memory block.
>>> generic_onlin