Re: [PATCH 03/12] block: bio_release_pages: use flags arg instead of bool

2019-07-23 Thread Christoph Hellwig
On Tue, Jul 23, 2019 at 09:25:09PM -0700, john.hubb...@gmail.com wrote:
> From: John Hubbard 
> 
> In commit d241a95f3514 ("block: optionally mark pages dirty in
> bio_release_pages"), a new "bool mark_dirty" argument was added to
> bio_release_pages.
> 
> In upcoming work, another bool argument (to indicate that the pages came
> from get_user_pages) is going to be added. That's one bool too many,
> because it's not desirable to have calls of the form:

All pages released by bio_release_pages should come from
get_user_pages, so I don't really see the point here.
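
For reference, the call-site readability concern from the patch description
looks roughly like this. A minimal sketch; the BIO_RP_* names are
illustrative, not necessarily those of the posted series:

/*
 * With two bool arguments the call site is opaque:
 *
 *	bio_release_pages(bio, true, false);	// which bool is which?
 *
 * A flags word documents itself at the call site:
 */
enum bio_rp_flags {
	BIO_RP_MARK_DIRTY = (1 << 0),	/* set_page_dirty() on release */
	BIO_RP_FROM_GUP   = (1 << 1),	/* pages came from get_user_pages() */
};

void bio_release_pages(struct bio *bio, enum bio_rp_flags flags);

/* A call site then reads: */
bio_release_pages(bio, BIO_RP_MARK_DIRTY | BIO_RP_FROM_GUP);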
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: Reminder: 3 open syzbot bugs in vhost subsystem

2019-07-23 Thread Jason Wang


On 2019/7/24 10:38 AM, Eric Biggers wrote:

[This email was generated by a script.  Let me know if you have any suggestions
to make it better, or if you want it re-generated with the latest status.]

Of the currently open syzbot reports against the upstream kernel, I've manually
marked 3 of them as possibly being bugs in the vhost subsystem.  I've listed
these reports below, sorted by an algorithm that tries to list first the reports
most likely to be still valid, important, and actionable.

Of these 3 bugs, 2 were seen in mainline in the last week.

Of these 3 bugs, 2 were bisected to commits from the following person:

Jason Wang 

If you believe a bug is no longer valid, please close the syzbot report by
sending a '#syz fix', '#syz dup', or '#syz invalid' command in reply to the
original thread, as explained at https://goo.gl/tpsmEJ#status

If you believe I misattributed a bug to the vhost subsystem, please let me know,
and if possible forward the report to the correct people or mailing list.
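
For example, a closing command in a reply takes one of these forms (syntax
per the syzbot documentation linked above; the titles are placeholders):

  #syz fix: exact one-line subject of the fixing commit
  #syz dup: exact title of the duplicate syzbot report
  #syz invalid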

Here are the bugs:


Title:  KASAN: use-after-free Write in tlb_finish_mmu
Last occurred:  5 days ago
Reported:   4 days ago
Branches:   Mainline
Dashboard link: 
https://syzkaller.appspot.com/bug?id=d57b94f89e48c85ef7d95acc208209ea4bdc10de
Original thread:
https://lkml.kernel.org/lkml/45e7a1058e024...@google.com/T/#u

This bug has a syzkaller reproducer only.

This bug was bisected to:

commit 7f466032dc9e5a61217f22ea34b2df932786bbfc
Author: Jason Wang 
Date:   Fri May 24 08:12:18 2019 +

  vhost: access vq metadata through kernel virtual address

No one has replied to the original thread for this bug yet.

If you fix this bug, please add the following tag to the commit:
 Reported-by: syzbot+8267e9af795434ffa...@syzkaller.appspotmail.com

If you send any email or patch for this bug, please reply to the original
thread.  For the git send-email command to use, or tips on how to reply if the
thread isn't in your mailbox, see the "Reply instructions" at
https://lkml.kernel.org/r/45e7a1058e024...@google.com


Title:  KASAN: use-after-free Read in finish_task_switch (2)
Last occurred:  5 days ago
Reported:   4 days ago
Branches:   Mainline
Dashboard link: 
https://syzkaller.appspot.com/bug?id=9a98fcad6c8bd31f5c3afbdc6c75de9f082c0ffa
Original thread:
https://lkml.kernel.org/lkml/490679058e024...@google.com/T/#u

This bug has a syzkaller reproducer only.

This bug was bisected to:

commit 7f466032dc9e5a61217f22ea34b2df932786bbfc
Author: Jason Wang 
Date:   Fri May 24 08:12:18 2019 +

  vhost: access vq metadata through kernel virtual address

No one has replied to the original thread for this bug yet.



Hi:

We believe the above two bugs are duplicates of the report "WARNING in
__mmdrop". Can I just dup them with


#syz dup "WARNING in __mmdrop"

(If yes, I just wonder how syzbot differentiates bugs; technically, several
different bugs can hit the same warning).





If you fix this bug, please add the following tag to the commit:
 Reported-by: syzbot+7f067c796eee2acbc...@syzkaller.appspotmail.com

If you send any email or patch for this bug, please reply to the original
thread.  For the git send-email command to use, or tips on how to reply if the
thread isn't in your mailbox, see the "Reply instructions" at
https://lkml.kernel.org/r/490679058e024...@google.com


Title:  memory leak in vhost_net_ioctl
Last occurred:  22 days ago
Reported:   48 days ago
Branches:   Mainline
Dashboard link: 
https://syzkaller.appspot.com/bug?id=12ba349d7e26ccfe95317bc376e812ebbae2ee0f
Original thread:
https://lkml.kernel.org/lkml/188da1058a9c2...@google.com/T/#u

This bug has a C reproducer.

The original thread for this bug has received 4 replies; the last was 39 days
ago.

If you fix this bug, please add the following tag to the commit:
 Reported-by: syzbot+0789f0c7e45efd7bb...@syzkaller.appspotmail.com



I do remember it could not be reproduced upstream; let me double check and
close this one.


Thanks




If you send any email or patch for this bug, please consider replying to the
original thread.  For the git send-email command to use, or tips on how to reply
if the thread isn't in your mailbox, see the "Reply instructions" at
https://lkml.kernel.org/r/188da1058a9c2...@google.com


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH 1/1] virtio/s390: fix race on airq_areas[]

2019-07-23 Thread Halil Pasic
The access to airq_areas was racy ever since adapter interrupts were
introduced to virtio-ccw, but since commit 39c7dcb15892 ("virtio/s390:
make airq summary indicators DMA") this became an issue in practice as
well. Before that commit, the airq_info that got overwritten was still
functional. After that commit, however, the two infos share a
summary_indicator, which aggravates the situation. As a consequence, the
auto-online mechanism occasionally hangs the boot with virtio_blk.

Signed-off-by: Halil Pasic 
Reported-by: Marc Hartmayer 
Fixes: 96b14536d935 ("virtio-ccw: virtio-ccw adapter interrupt support.")
---
* We definitely need this fixed for 5.3. For older stable kernels it is
to be discussed. @Connie what do you think: do we need a cc stable?

* I have a variant that does not need the extra mutex but uses cmpxchg().
I decided to post this one because the cmpxchg() variant is more complex.
But if there is interest we can have a look at it as well.
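
For reference, a rough sketch of how that variant could look (untested, and
destroy_airq_info() is a hypothetical teardown helper, not existing code):

	if (!airq_areas[i]) {
		struct airq_info *new = new_airq_info(i);

		/* Install only if the slot is still empty; if we lose the
		 * race, free our allocation instead of overwriting the
		 * winner's slot. */
		if (new && cmpxchg(&airq_areas[i], NULL, new) != NULL)
			destroy_airq_info(new);	/* hypothetical helper */
	}
	info = airq_areas[i];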
---
 drivers/s390/virtio/virtio_ccw.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
index 1a55e5942d36..d97742662755 100644
--- a/drivers/s390/virtio/virtio_ccw.c
+++ b/drivers/s390/virtio/virtio_ccw.c
@@ -145,6 +145,8 @@ struct airq_info {
struct airq_iv *aiv;
 };
 static struct airq_info *airq_areas[MAX_AIRQ_AREAS];
+DEFINE_MUTEX(airq_areas_lock);
+
 static u8 *summary_indicators;
 
 static inline u8 *get_summary_indicator(struct airq_info *info)
@@ -265,9 +267,11 @@ static unsigned long get_airq_indicator(struct virtqueue *vqs[], int nvqs,
unsigned long bit, flags;
 
for (i = 0; i < MAX_AIRQ_AREAS && !indicator_addr; i++) {
+   mutex_lock(&airq_areas_lock);
if (!airq_areas[i])
airq_areas[i] = new_airq_info(i);
info = airq_areas[i];
+   mutex_unlock(&airq_areas_lock);
if (!info)
return 0;
write_lock_irqsave(&info->lock, flags);
-- 
2.17.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] virtio/virtio_ring: Fix the dma_max_mapping_size call

2019-07-23 Thread Michael S. Tsirkin
On Tue, Jul 23, 2019 at 05:38:30PM +0200, Christoph Hellwig wrote:
> On Mon, Jul 22, 2019 at 11:33:35AM -0400, Michael S. Tsirkin wrote:
> > On Mon, Jul 22, 2019 at 04:55:09PM +0200, Eric Auger wrote:
> > > Do not call dma_max_mapping_size for devices that have no DMA
> > > mask set, otherwise we can hit a NULL pointer dereference.
> > > 
> > > This occurs when a virtio-blk-pci device is protected with
> > > a virtual IOMMU.
> > > 
> > > Fixes: e6d6dd6c875e ("virtio: Introduce virtio_max_dma_size()")
> > > Signed-off-by: Eric Auger 
> > > Suggested-by: Christoph Hellwig 
> > 
> > Christoph, I wonder why you suggested this?
> > The connection between dma_mask and dma_max_mapping_size
> > is far from obvious.  The documentation doesn't exist.
> > Do we really have to teach all users about this hack?
> > Why not just make dma_max_mapping_size DTRT?
> 
> Because we should not call into dma API functions for devices that
> are not DMA capable.

I'd rather call is_device_dma_capable() then; that's better than poking
at DMA internals.
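
Concretely, that would look something like this in virtio_max_dma_size()
(a sketch against the 5.3-era code; is_device_dma_capable() from
linux/dma-mapping.h checks dev->dma_mask && *dev->dma_mask):

size_t virtio_max_dma_size(struct virtio_device *vdev)
{
	size_t max_segment_size = SIZE_MAX;

	/* Gate on DMA capability instead of open-coding the mask check. */
	if (vring_use_dma_api(vdev) && is_device_dma_capable(&vdev->dev))
		max_segment_size = dma_max_mapping_size(&vdev->dev);

	return max_segment_size;
}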

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] virtio/virtio_ring: Fix the dma_max_mapping_size call

2019-07-23 Thread Christoph Hellwig
On Mon, Jul 22, 2019 at 04:36:09PM +0100, Robin Murphy wrote:
>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>> index c8be1c4f5b55..37c143971211 100644
>> --- a/drivers/virtio/virtio_ring.c
>> +++ b/drivers/virtio/virtio_ring.c
>> @@ -262,7 +262,7 @@ size_t virtio_max_dma_size(struct virtio_device *vdev)
>>   {
>>  size_t max_segment_size = SIZE_MAX;
>>   -  if (vring_use_dma_api(vdev))
>> +if (vring_use_dma_api(vdev) && vdev->dev.dma_mask)
>
> Hmm, might it make sense to roll that check up into vring_use_dma_api() 
> itself? After all, if the device has no mask then it's likely that other 
> DMA API ops wouldn't really work as expected either.

Makes sense to me.
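
That is, something along these lines (a sketch only; the elided tail stands
for the existing VIRTIO_F_IOMMU_PLATFORM and platform-specific checks,
unchanged):

static bool vring_use_dma_api(struct virtio_device *vdev)
{
	/* A device with no DMA mask cannot use the DMA API at all,
	 * whatever the checks below would conclude. */
	if (!vdev->dev.dma_mask)
		return false;

	if (!virtio_has_iommu_quirk(vdev))
		return true;

	/* ... existing platform checks (e.g. Xen) ... */
	return false;
}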
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] virtio/virtio_ring: Fix the dma_max_mapping_size call

2019-07-23 Thread Christoph Hellwig
On Mon, Jul 22, 2019 at 11:33:35AM -0400, Michael S. Tsirkin wrote:
> On Mon, Jul 22, 2019 at 04:55:09PM +0200, Eric Auger wrote:
> > Do not call dma_max_mapping_size for devices that have no DMA
> > mask set, otherwise we can hit a NULL pointer dereference.
> > 
> > This occurs when a virtio-blk-pci device is protected with
> > a virtual IOMMU.
> > 
> > Fixes: e6d6dd6c875e ("virtio: Introduce virtio_max_dma_size()")
> > Signed-off-by: Eric Auger 
> > Suggested-by: Christoph Hellwig 
> 
> Christoph, I wonder why you suggested this?
> The connection between dma_mask and dma_max_mapping_size
> is far from obvious.  The documentation doesn't exist.
> Do we really have to teach all users about this hack?
> Why not just make dma_max_mapping_size DTRT?

Because we should not call into dma API functions for devices that
are not DMA capable.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2] dma-mapping: Use dma_get_mask in dma_addressing_limited

2019-07-23 Thread Christoph Hellwig
On Mon, Jul 22, 2019 at 06:56:49PM +0200, Auger Eric wrote:
> Hi Christoph,
> 
> On 7/22/19 6:51 PM, Eric Auger wrote:
> > We currently have cases where the dma_addressing_limited() gets
> > called with dma_mask unset. This causes a NULL pointer dereference.
> > 
> > Use dma_get_mask() accessor to prevent the crash.
> > 
> > Fixes: b866455423e0 ("dma-mapping: add a dma_addressing_limited helper")
> > Signed-off-by: Eric Auger 
> 
> As a follow-up to my last email, here is a patch featuring
> dma_get_mask(). But you no longer have the WARN_ON_ONCE pointing
> out suspect users.
> 
> Feel free to pick up your preferred approach

I can take this for now as we are past the merge window.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/6] vhost: validate MMU notifier registration

2019-07-23 Thread Jason Wang


On 2019/7/23 5:17 PM, Michael S. Tsirkin wrote:

On Tue, Jul 23, 2019 at 03:57:14AM -0400, Jason Wang wrote:

The return value of mmu_notifier_register() is not checked in
vhost_vring_set_num_addr(). This will cause mm and the MMU notifier to
get out of sync, leading to a double free. To solve this, introduce a
boolean flag to track whether the MMU notifier is registered, and only
unregister when it is true.

Reported-and-tested-by: syzbot+e58112d71f77113dd...@syzkaller.appspotmail.com
Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
Signed-off-by: Jason Wang 

Right. This fixes the bug.
But it's not great that simple things like
setting the vq address put pressure on the memory allocator.
Also, if we get a signal during processing,
notifier registration will fail, disabling the optimization permanently.



Yes, but do we really care about this case? E.g. we even fail with
-ENOMEM for set owner.





In fact, see below:



---
  drivers/vhost/vhost.c | 19 +++++++++++++++----
  drivers/vhost/vhost.h |  1 +
  2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 34c0d970bcbc..058191d5efad 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -630,6 +630,7 @@ void vhost_dev_init(struct vhost_dev *dev,
dev->iov_limit = iov_limit;
dev->weight = weight;
dev->byte_weight = byte_weight;
+   dev->has_notifier = false;
	init_llist_head(&dev->work_list);
	init_waitqueue_head(&dev->wait);
	INIT_LIST_HEAD(&dev->read_list);
@@ -731,6 +732,7 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
if (err)
goto err_mmu_notifier;
  #endif
+   dev->has_notifier = true;
  
  	return 0;
  

I just noticed that set owner now fails if we get a signal.
Userspace could retry in theory but it does not:
this is a userspace ABI breakage since it used to only
fail on invalid input.



Well, at least kthread_create() and vhost_dev_alloc_iovecs() will 
allocate memory.


Thanks





@@ -960,7 +962,11 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
}
if (dev->mm) {
  #if VHOST_ARCH_CAN_ACCEL_UACCESS
-   mmu_notifier_unregister(&dev->mmu_notifier, dev->mm);
+   if (dev->has_notifier) {
+   mmu_notifier_unregister(&dev->mmu_notifier,
+   dev->mm);
+   dev->has_notifier = false;
+   }
  #endif
mmput(dev->mm);
}
@@ -2065,8 +2071,10 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d,
	/* Unregister the MMU notifier so that the invalidation callback
	 * can access vq->uaddrs[] without holding a lock.
 */
-   if (d->mm)
+   if (d->has_notifier) {
mmu_notifier_unregister(&d->mmu_notifier, d->mm);
+   d->has_notifier = false;
+   }
  
  	vhost_uninit_vq_maps(vq);

  #endif
@@ -2086,8 +2094,11 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d,
if (r == 0)
vhost_setup_vq_uaddr(vq);
  
-	if (d->mm)
-   mmu_notifier_register(&d->mmu_notifier, d->mm);
+   if (d->mm) {
+   r = mmu_notifier_register(&d->mmu_notifier, d->mm);
+   if (!r)
+   d->has_notifier = true;
+   }
  #endif
  
	mutex_unlock(&vq->mutex);

diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 819296332913..a62f56a4cf72 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -214,6 +214,7 @@ struct vhost_dev {
int iov_limit;
int weight;
int byte_weight;
+   bool has_notifier;
  };
  
  bool vhost_exceeds_weight(struct vhost_virtqueue *vq, int pkts, int total_len);

--
2.18.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH 4/6] vhost: reset invalidate_count in vhost_set_vring_num_addr()

2019-07-23 Thread Jason Wang


On 2019/7/23 5:17 PM, Michael S. Tsirkin wrote:

On Tue, Jul 23, 2019 at 03:57:16AM -0400, Jason Wang wrote:

vhost_set_vring_num_addr() could be called in the middle of
invalidate_range_start() and invalidate_range_end(). If we don't reset
invalidate_count after unregistering the MMU notifier, the
invalidate_count will run out of sync (e.g. never reach zero). This will
in fact disable the fast accessor path. Fix this by resetting the count
to zero.

Reported-by: Michael S. Tsirkin 
Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
Signed-off-by: Jason Wang 
---
  drivers/vhost/vhost.c | 4 ++++
  1 file changed, 4 insertions(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 03666b702498..89c9f08b5146 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2074,6 +2074,10 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d,
d->has_notifier = false;
}
  
+	/* reset invalidate_count in case we are in the middle of
+	 * invalidate_start() and invalidate_end().
+	 */
+   vq->invalidate_count = 0;

I think that the code is ok but the comments are not very clear:
- we are never in the middle since we just removed the notifier



If I read the code correctly, mmu_notifier_unregister() can only
guarantee to wait for pending callbacks to complete. So we can have:

invalidate_start()
mmu_notifier_unregister()
invalidate_end()



- the result is not just disabling the optimization:
   if the invalidate count becomes negative, then later we
   can think it's ok to map when it isn't, since
   the notifier is active.



I don't get how it could be negative; the only possible outcome is a
positive value.


Thanks





vhost_uninit_vq_maps(vq);
  #endif
  
--

2.18.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH 5/6] vhost: mark dirty pages during map uninit

2019-07-23 Thread Jason Wang


On 2019/7/23 5:17 PM, Michael S. Tsirkin wrote:

On Tue, Jul 23, 2019 at 03:57:17AM -0400, Jason Wang wrote:

We don't mark dirty pages if the map was torn down outside the MMU
notifier. This will lead to untracked dirty pages. Fix this by marking
dirty pages during map uninit.

Reported-by: Michael S. Tsirkin
Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
Signed-off-by: Jason Wang
---
  drivers/vhost/vhost.c | 22 ++++++++++++++++------
  1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 89c9f08b5146..5b8821d00fe4 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -306,6 +306,18 @@ static void vhost_map_unprefetch(struct vhost_map *map)
kfree(map);
  }
  
+static void vhost_set_map_dirty(struct vhost_virtqueue *vq,
+   struct vhost_map *map, int index)
+{
+   struct vhost_uaddr *uaddr = &vq->uaddrs[index];
+   int i;
+
+   if (uaddr->write) {
+   for (i = 0; i < map->npages; i++)
+   set_page_dirty(map->pages[i]);
+   }
+}
+
  static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq)
  {
struct vhost_map *map[VHOST_NUM_ADDRS];
@@ -315,8 +327,10 @@ static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq)
for (i = 0; i < VHOST_NUM_ADDRS; i++) {
map[i] = rcu_dereference_protected(vq->maps[i],
  lockdep_is_held(&vq->mmu_lock));
-   if (map[i])
+   if (map[i]) {
+   vhost_set_map_dirty(vq, map[i], i);
rcu_assign_pointer(vq->maps[i], NULL);
+   }
}
	spin_unlock(&vq->mmu_lock);
  
@@ -354,7 +368,6 @@ static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
  {
	struct vhost_uaddr *uaddr = &vq->uaddrs[index];
struct vhost_map *map;
-   int i;
  
  	if (!vhost_map_range_overlap(uaddr, start, end))

return;
@@ -365,10 +378,7 @@ static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
	map = rcu_dereference_protected(vq->maps[index],
					lockdep_is_held(&vq->mmu_lock));
if (map) {
-   if (uaddr->write) {
-   for (i = 0; i < map->npages; i++)
-   set_page_dirty(map->pages[i]);
-   }
+   vhost_set_map_dirty(vq, map, index);
rcu_assign_pointer(vq->maps[index], NULL);
}
	spin_unlock(&vq->mmu_lock);

OK, and the reason it's safe is that the invalidate counter
got incremented, so we know the page will not get mapped again.

But we *do* need to wait for the page not to be mapped.
And if that means waiting for VQ processing to finish,
then I worry that is a very long time.



I'm not sure I get you here. If we don't have such a map, we will fall
back to the normal uaccess helpers. And in the memory accessors, the RCU
critical section is pretty small.
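
For reference, the accessor-side pattern in question looks roughly like this
(simplified from the 7f466032dc9e metadata-acceleration code; error handling
and the other metadata types are omitted):

	/* Fast path: copy through the pinned kernel mapping inside a short
	 * RCU read-side section; on a miss, fall back to uaccess outside it. */
	rcu_read_lock();
	map = rcu_dereference(vq->maps[VHOST_ADDR_DESC]);
	if (likely(map)) {
		memcpy(desc, map->addr + offset, sizeof(*desc));
		rcu_read_unlock();
		return 0;
	}
	rcu_read_unlock();

	return copy_from_user(desc, uaddr, sizeof(*desc)) ? -EFAULT : 0;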


Thanks



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH 6/6] vhost: don't do synchronize_rcu() in vhost_uninit_vq_maps()

2019-07-23 Thread Jason Wang


On 2019/7/23 5:16 PM, Michael S. Tsirkin wrote:

On Tue, Jul 23, 2019 at 03:57:18AM -0400, Jason Wang wrote:

There's no need for RCU synchronization in vhost_uninit_vq_maps()
since we've already serialized with the readers (memory accessors). This
also avoids a possible userspace DoS through ioctl() because of the
potentially high latency caused by synchronize_rcu().

Reported-by: Michael S. Tsirkin 
Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
Signed-off-by: Jason Wang 

I agree that synchronize_rcu() in both the MMU notifiers and ioctl
is a problem we must fix.


---
  drivers/vhost/vhost.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 5b8821d00fe4..a17df1f4069a 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -334,7 +334,9 @@ static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq)
}
	spin_unlock(&vq->mmu_lock);
  
-	synchronize_rcu();
+   /* No need for synchronize_rcu() or kfree_rcu() since we are
+    * serialized with memory accessors (e.g. vq mutex held).
+    */
  
  	for (i = 0; i < VHOST_NUM_ADDRS; i++)

if (map[i])
--
2.18.1

.. however, we cannot use RCU with no synchronization in sight.
Sometimes there are hacks like using a lock/unlock
pair instead of a sync, but here no one bothers.

Specifically, the notifiers call reset vq maps, which calls
uninit vq maps, which is not under any lock.



The notifier did this:

	if (map) {
		if (uaddr->write) {
			for (i = 0; i < map->npages; i++)
				set_page_dirty(map->pages[i]);
		}
		rcu_assign_pointer(vq->maps[index], NULL);
	}
	spin_unlock(&vq->mmu_lock);

	if (map) {
		synchronize_rcu();
		vhost_map_unprefetch(map);
	}

So it indeed has a synchronize_rcu() there.

Thanks




You will get a use-after-free when the map is then accessed.

If you always have a lock, then just take that lock
and there's no need for RCU.


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH 2/6] vhost: validate MMU notifier registration

2019-07-23 Thread Michael S. Tsirkin
On Tue, Jul 23, 2019 at 03:57:14AM -0400, Jason Wang wrote:
> The return value of mmu_notifier_register() is not checked in
> vhost_vring_set_num_addr(). This will cause mm and the MMU notifier to
> get out of sync, leading to a double free. To solve this, introduce a
> boolean flag to track whether the MMU notifier is registered, and only
> unregister when it is true.
> 
> Reported-and-tested-by: syzbot+e58112d71f77113dd...@syzkaller.appspotmail.com
> Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
> Signed-off-by: Jason Wang 

Right. This fixes the bug.
But it's not great that simple things like
setting the vq address put pressure on the memory allocator.
Also, if we get a signal during processing,
notifier registration will fail, disabling the optimization permanently.

In fact, see below:


> ---
>  drivers/vhost/vhost.c | 19 +++++++++++++++----
>  drivers/vhost/vhost.h |  1 +
>  2 files changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 34c0d970bcbc..058191d5efad 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -630,6 +630,7 @@ void vhost_dev_init(struct vhost_dev *dev,
>   dev->iov_limit = iov_limit;
>   dev->weight = weight;
>   dev->byte_weight = byte_weight;
> + dev->has_notifier = false;
>   init_llist_head(&dev->work_list);
>   init_waitqueue_head(&dev->wait);
>   INIT_LIST_HEAD(&dev->read_list);
> @@ -731,6 +732,7 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
>   if (err)
>   goto err_mmu_notifier;
>  #endif
> + dev->has_notifier = true;
>  
>   return 0;
>  

I just noticed that set owner now fails if we get a signal.
Userspace could retry in theory but it does not:
this is a userspace ABI breakage since it used to only
fail on invalid input.

> @@ -960,7 +962,11 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>   }
>   if (dev->mm) {
>  #if VHOST_ARCH_CAN_ACCEL_UACCESS
> - mmu_notifier_unregister(&dev->mmu_notifier, dev->mm);
> + if (dev->has_notifier) {
> + mmu_notifier_unregister(&dev->mmu_notifier,
> + dev->mm);
> + dev->has_notifier = false;
> + }
>  #endif
>   mmput(dev->mm);
>   }
> @@ -2065,8 +2071,10 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d,
>   /* Unregister the MMU notifier so that the invalidation callback
>    * can access vq->uaddrs[] without holding a lock.
>    */
> - if (d->mm)
> + if (d->has_notifier) {
>   mmu_notifier_unregister(&d->mmu_notifier, d->mm);
> + d->has_notifier = false;
> + }
>  
>   vhost_uninit_vq_maps(vq);
>  #endif
> @@ -2086,8 +2094,11 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d,
>   if (r == 0)
>   vhost_setup_vq_uaddr(vq);
>  
> - if (d->mm)
> - mmu_notifier_register(&d->mmu_notifier, d->mm);
> + if (d->mm) {
> + r = mmu_notifier_register(&d->mmu_notifier, d->mm);
> + if (!r)
> + d->has_notifier = true;
> + }
>  #endif
>  
>   mutex_unlock(&vq->mutex);
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 819296332913..a62f56a4cf72 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -214,6 +214,7 @@ struct vhost_dev {
>   int iov_limit;
>   int weight;
>   int byte_weight;
> + bool has_notifier;
>  };
>  
>  bool vhost_exceeds_weight(struct vhost_virtqueue *vq, int pkts, int total_len);
> -- 
> 2.18.1
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 4/6] vhost: reset invalidate_count in vhost_set_vring_num_addr()

2019-07-23 Thread Michael S. Tsirkin
On Tue, Jul 23, 2019 at 03:57:16AM -0400, Jason Wang wrote:
> vhost_set_vring_num_addr() could be called in the middle of
> invalidate_range_start() and invalidate_range_end(). If we don't reset
> invalidate_count after unregistering the MMU notifier, the
> invalidate_count will run out of sync (e.g. never reach zero). This will
> in fact disable the fast accessor path. Fix this by resetting the count
> to zero.
> 
> Reported-by: Michael S. Tsirkin 
> Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
> Signed-off-by: Jason Wang 
> ---
>  drivers/vhost/vhost.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 03666b702498..89c9f08b5146 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2074,6 +2074,10 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d,
>   d->has_notifier = false;
>   }
>  
> + /* reset invalidate_count in case we are in the middle of
> +  * invalidate_start() and invalidate_end().
> +  */
> + vq->invalidate_count = 0;

I think that the code is ok but the comments are not very clear:
- we are never in the middle since we just removed the notifier
- the result is not just disabling the optimization:
  if the invalidate count becomes negative, then later we
  can think it's ok to map when it isn't, since
  the notifier is active.

>   vhost_uninit_vq_maps(vq);
>  #endif
>  
> -- 
> 2.18.1
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 5/6] vhost: mark dirty pages during map uninit

2019-07-23 Thread Michael S. Tsirkin
On Tue, Jul 23, 2019 at 03:57:17AM -0400, Jason Wang wrote:
> We don't mark dirty pages if the map was torn down outside the MMU
> notifier. This will lead to untracked dirty pages. Fix this by marking
> dirty pages during map uninit.
> 
> Reported-by: Michael S. Tsirkin 
> Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
> Signed-off-by: Jason Wang 
> ---
>  drivers/vhost/vhost.c | 22 ++++++++++++++++------
>  1 file changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 89c9f08b5146..5b8821d00fe4 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -306,6 +306,18 @@ static void vhost_map_unprefetch(struct vhost_map *map)
>   kfree(map);
>  }
>  
> +static void vhost_set_map_dirty(struct vhost_virtqueue *vq,
> + struct vhost_map *map, int index)
> +{
> + struct vhost_uaddr *uaddr = &vq->uaddrs[index];
> + int i;
> +
> + if (uaddr->write) {
> + for (i = 0; i < map->npages; i++)
> + set_page_dirty(map->pages[i]);
> + }
> +}
> +
>  static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq)
>  {
>   struct vhost_map *map[VHOST_NUM_ADDRS];
> @@ -315,8 +327,10 @@ static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq)
>   for (i = 0; i < VHOST_NUM_ADDRS; i++) {
>   map[i] = rcu_dereference_protected(vq->maps[i],
> lockdep_is_held(&vq->mmu_lock));
> - if (map[i])
> + if (map[i]) {
> + vhost_set_map_dirty(vq, map[i], i);
>   rcu_assign_pointer(vq->maps[i], NULL);
> + }
>   }
>   spin_unlock(&vq->mmu_lock);
>  
> @@ -354,7 +368,6 @@ static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
>  {
>   struct vhost_uaddr *uaddr = >uaddrs[index];
>   struct vhost_map *map;
> - int i;
>  
>   if (!vhost_map_range_overlap(uaddr, start, end))
>   return;
> @@ -365,10 +378,7 @@ static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
>   map = rcu_dereference_protected(vq->maps[index],
>   lockdep_is_held(&vq->mmu_lock));
>   if (map) {
> - if (uaddr->write) {
> - for (i = 0; i < map->npages; i++)
> - set_page_dirty(map->pages[i]);
> - }
> + vhost_set_map_dirty(vq, map, index);
>   rcu_assign_pointer(vq->maps[index], NULL);
>   }
>   spin_unlock(&vq->mmu_lock);

OK, and the reason it's safe is that the invalidate counter
got incremented, so we know the page will not get mapped again.

But we *do* need to wait for the page not to be mapped.
And if that means waiting for VQ processing to finish,
then I worry that is a very long time.


> -- 
> 2.18.1
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 6/6] vhost: don't do synchronize_rcu() in vhost_uninit_vq_maps()

2019-07-23 Thread Michael S. Tsirkin
On Tue, Jul 23, 2019 at 03:57:18AM -0400, Jason Wang wrote:
> There's no need for RCU synchronization in vhost_uninit_vq_maps()
> since we've already serialized with the readers (memory accessors). This
> also avoids a possible userspace DoS through ioctl() because of the
> potentially high latency caused by synchronize_rcu().
> 
> Reported-by: Michael S. Tsirkin 
> Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
> Signed-off-by: Jason Wang 

I agree that synchronize_rcu() in both the MMU notifiers and ioctl
is a problem we must fix.

> ---
>  drivers/vhost/vhost.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 5b8821d00fe4..a17df1f4069a 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -334,7 +334,9 @@ static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq)
>   }
>   spin_unlock(&vq->mmu_lock);
>  
> - synchronize_rcu();
> + /* No need for synchronize_rcu() or kfree_rcu() since we are
> +  * serialized with memory accessors (e.g vq mutex held).
> +  */
>  
>   for (i = 0; i < VHOST_NUM_ADDRS; i++)
>   if (map[i])
> -- 
> 2.18.1

.. however, we cannot use RCU with no synchronization in sight.
Sometimes there are hacks like using a lock/unlock
pair instead of a sync, but here no one bothers.

Specifically, the notifiers call reset vq maps, which calls
uninit vq maps, which is not under any lock.

You will get a use-after-free when the map is then accessed.

If you always have a lock, then just take that lock
and there's no need for RCU.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 6/6] vhost: don't do synchronize_rcu() in vhost_uninit_vq_maps()

2019-07-23 Thread Jason Wang
There's no need for RCU synchronization in vhost_uninit_vq_maps()
since we've already serialized with the readers (memory accessors). This
also avoids a possible userspace DoS through ioctl() because of the
potentially high latency caused by synchronize_rcu().

Reported-by: Michael S. Tsirkin 
Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
Signed-off-by: Jason Wang 
---
 drivers/vhost/vhost.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 5b8821d00fe4..a17df1f4069a 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -334,7 +334,9 @@ static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq)
}
	spin_unlock(&vq->mmu_lock);
 
-   synchronize_rcu();
+   /* No need for synchronize_rcu() or kfree_rcu() since we are
+    * serialized with memory accessors (e.g. vq mutex held).
+    */
 
for (i = 0; i < VHOST_NUM_ADDRS; i++)
if (map[i])
-- 
2.18.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 4/6] vhost: reset invalidate_count in vhost_set_vring_num_addr()

2019-07-23 Thread Jason Wang
vhost_set_vring_num_addr() could be called in the middle of
invalidate_range_start() and invalidate_range_end(). If we don't reset
invalidate_count after unregistering the MMU notifier, the
invalidate_count will run out of sync (e.g. never reach zero). This will
in fact disable the fast accessor path. Fix this by resetting the count
to zero.

Reported-by: Michael S. Tsirkin 
Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
Signed-off-by: Jason Wang 
---
 drivers/vhost/vhost.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 03666b702498..89c9f08b5146 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2074,6 +2074,10 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d,
d->has_notifier = false;
}
 
+   /* reset invalidate_count in case we are in the middle of
+    * invalidate_start() and invalidate_end().
+    */
+   vq->invalidate_count = 0;
vhost_uninit_vq_maps(vq);
 #endif
 
-- 
2.18.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 3/6] vhost: fix vhost map leak

2019-07-23 Thread Jason Wang
We don't free the map during vhost_map_unprefetch(). This means it could
be leaked. Fix this by freeing the map.

Reported-by: Michael S. Tsirkin 
Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
Signed-off-by: Jason Wang 
---
 drivers/vhost/vhost.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 058191d5efad..03666b702498 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -303,9 +303,7 @@ static void vhost_vq_meta_reset(struct vhost_dev *d)
 static void vhost_map_unprefetch(struct vhost_map *map)
 {
kfree(map->pages);
-   map->pages = NULL;
-   map->npages = 0;
-   map->addr = NULL;
+   kfree(map);
 }
 
 static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq)
-- 
2.18.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 5/6] vhost: mark dirty pages during map uninit

2019-07-23 Thread Jason Wang
We don't mark dirty pages if the map was torn down outside the MMU
notifier. This will lead to untracked dirty pages. Fix this by marking
dirty pages during map uninit.

Reported-by: Michael S. Tsirkin 
Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
Signed-off-by: Jason Wang 
---
 drivers/vhost/vhost.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 89c9f08b5146..5b8821d00fe4 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -306,6 +306,18 @@ static void vhost_map_unprefetch(struct vhost_map *map)
kfree(map);
 }
 
+static void vhost_set_map_dirty(struct vhost_virtqueue *vq,
+   struct vhost_map *map, int index)
+{
+   struct vhost_uaddr *uaddr = &vq->uaddrs[index];
+   int i;
+
+   if (uaddr->write) {
+   for (i = 0; i < map->npages; i++)
+   set_page_dirty(map->pages[i]);
+   }
+}
+
 static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq)
 {
struct vhost_map *map[VHOST_NUM_ADDRS];
@@ -315,8 +327,10 @@ static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq)
for (i = 0; i < VHOST_NUM_ADDRS; i++) {
map[i] = rcu_dereference_protected(vq->maps[i],
	  lockdep_is_held(&vq->mmu_lock));
-   if (map[i])
+   if (map[i]) {
+   vhost_set_map_dirty(vq, map[i], i);
rcu_assign_pointer(vq->maps[i], NULL);
+   }
}
	spin_unlock(&vq->mmu_lock);
 
@@ -354,7 +368,6 @@ static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
 {
	struct vhost_uaddr *uaddr = &vq->uaddrs[index];
struct vhost_map *map;
-   int i;
 
if (!vhost_map_range_overlap(uaddr, start, end))
return;
@@ -365,10 +378,7 @@ static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
	map = rcu_dereference_protected(vq->maps[index],
					lockdep_is_held(&vq->mmu_lock));
if (map) {
-   if (uaddr->write) {
-   for (i = 0; i < map->npages; i++)
-   set_page_dirty(map->pages[i]);
-   }
+   vhost_set_map_dirty(vq, map, index);
rcu_assign_pointer(vq->maps[index], NULL);
}
	spin_unlock(&vq->mmu_lock);
-- 
2.18.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 2/6] vhost: validate MMU notifier registration

2019-07-23 Thread Jason Wang
The return value of mmu_notifier_register() is not checked in
vhost_vring_set_num_addr(). This will cause mm and the MMU notifier to
get out of sync, leading to a double free. To solve this, introduce a
boolean flag to track whether the MMU notifier is registered, and only
unregister when it is true.

Reported-and-tested-by: syzbot+e58112d71f77113dd...@syzkaller.appspotmail.com
Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
Signed-off-by: Jason Wang 
---
 drivers/vhost/vhost.c | 19 +++++++++++++++----
 drivers/vhost/vhost.h |  1 +
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 34c0d970bcbc..058191d5efad 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -630,6 +630,7 @@ void vhost_dev_init(struct vhost_dev *dev,
dev->iov_limit = iov_limit;
dev->weight = weight;
dev->byte_weight = byte_weight;
+   dev->has_notifier = false;
	init_llist_head(&dev->work_list);
	init_waitqueue_head(&dev->wait);
	INIT_LIST_HEAD(&dev->read_list);
@@ -731,6 +732,7 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
if (err)
goto err_mmu_notifier;
 #endif
+   dev->has_notifier = true;
 
return 0;
 
@@ -960,7 +962,11 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
}
if (dev->mm) {
 #if VHOST_ARCH_CAN_ACCEL_UACCESS
-   mmu_notifier_unregister(&dev->mmu_notifier, dev->mm);
+   if (dev->has_notifier) {
+   mmu_notifier_unregister(&dev->mmu_notifier,
+   dev->mm);
+   dev->has_notifier = false;
+   }
 #endif
mmput(dev->mm);
}
@@ -2065,8 +2071,10 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d,
	/* Unregister the MMU notifier so that the invalidation callback
	 * can access vq->uaddrs[] without holding a lock.
 */
-   if (d->mm)
+   if (d->has_notifier) {
		mmu_notifier_unregister(&d->mmu_notifier, d->mm);
+   d->has_notifier = false;
+   }
 
vhost_uninit_vq_maps(vq);
 #endif
@@ -2086,8 +2094,11 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d,
if (r == 0)
vhost_setup_vq_uaddr(vq);
 
-   if (d->mm)
-   mmu_notifier_register(&d->mmu_notifier, d->mm);
+   if (d->mm) {
+   r = mmu_notifier_register(&d->mmu_notifier, d->mm);
+   if (!r)
+   d->has_notifier = true;
+   }
 #endif
 
	mutex_unlock(&vq->mutex);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 819296332913..a62f56a4cf72 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -214,6 +214,7 @@ struct vhost_dev {
int iov_limit;
int weight;
int byte_weight;
+   bool has_notifier;
 };
 
 bool vhost_exceeds_weight(struct vhost_virtqueue *vq, int pkts, int total_len);
-- 
2.18.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 1/6] vhost: don't set uaddr for invalid address

2019-07-23 Thread Jason Wang
We should not set up uaddr for an invalid address; otherwise we may
try to pin or prefetch mappings of the wrong pages.

Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
Signed-off-by: Jason Wang 
---
 drivers/vhost/vhost.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index dc9301d31f12..34c0d970bcbc 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2083,7 +2083,8 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d,
}
 
 #if VHOST_ARCH_CAN_ACCEL_UACCESS
-   vhost_setup_vq_uaddr(vq);
+   if (r == 0)
+   vhost_setup_vq_uaddr(vq);
 
if (d->mm)
		mmu_notifier_register(&d->mmu_notifier, d->mm);
-- 
2.18.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 0/6] Fixes for meta data acceleration

2019-07-23 Thread Jason Wang
Hi all:

This series tries to fix several issues introduced by the metadata
acceleration series. Please review.

Jason Wang (6):
  vhost: don't set uaddr for invalid address
  vhost: validate MMU notifier registration
  vhost: fix vhost map leak
  vhost: reset invalidate_count in vhost_set_vring_num_addr()
  vhost: mark dirty pages during map uninit
  vhost: don't do synchronize_rcu() in vhost_uninit_vq_maps()

 drivers/vhost/vhost.c | 56 +++
 drivers/vhost/vhost.h |  1 +
 2 files changed, 42 insertions(+), 15 deletions(-)

-- 
2.18.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization