Re: [PATCH 03/12] block: bio_release_pages: use flags arg instead of bool
On Tue, Jul 23, 2019 at 09:25:09PM -0700, john.hubb...@gmail.com wrote: > From: John Hubbard > > In commit d241a95f3514 ("block: optionally mark pages dirty in > bio_release_pages"), new "bool mark_dirty" argument was added to > bio_release_pages. > > In upcoming work, another bool argument (to indicate that the pages came > from get_user_pages) is going to be added. That's one bool too many, > because it's not desirable have calls of the form: All pages releases by bio_release_pages should come from get_get_user_pages, so I don't really see the point here. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: Reminder: 3 open syzbot bugs in vhost subsystem
On 2019/7/24 上午10:38, Eric Biggers wrote: [This email was generated by a script. Let me know if you have any suggestions to make it better, or if you want it re-generated with the latest status.] Of the currently open syzbot reports against the upstream kernel, I've manually marked 3 of them as possibly being bugs in the vhost subsystem. I've listed these reports below, sorted by an algorithm that tries to list first the reports most likely to be still valid, important, and actionable. Of these 3 bugs, 2 were seen in mainline in the last week. Of these 3 bugs, 2 were bisected to commits from the following person: Jason Wang If you believe a bug is no longer valid, please close the syzbot report by sending a '#syz fix', '#syz dup', or '#syz invalid' command in reply to the original thread, as explained at https://goo.gl/tpsmEJ#status If you believe I misattributed a bug to the vhost subsystem, please let me know, and if possible forward the report to the correct people or mailing list. Here are the bugs: Title: KASAN: use-after-free Write in tlb_finish_mmu Last occurred: 5 days ago Reported: 4 days ago Branches: Mainline Dashboard link: https://syzkaller.appspot.com/bug?id=d57b94f89e48c85ef7d95acc208209ea4bdc10de Original thread: https://lkml.kernel.org/lkml/45e7a1058e024...@google.com/T/#u This bug has a syzkaller reproducer only. This bug was bisected to: commit 7f466032dc9e5a61217f22ea34b2df932786bbfc Author: Jason Wang Date: Fri May 24 08:12:18 2019 + vhost: access vq metadata through kernel virtual address No one has replied to the original thread for this bug yet. If you fix this bug, please add the following tag to the commit: Reported-by: syzbot+8267e9af795434ffa...@syzkaller.appspotmail.com If you send any email or patch for this bug, please reply to the original thread. For the git send-email command to use, or tips on how to reply if the thread isn't in your mailbox, see the "Reply instructions" at https://lkml.kernel.org/r/45e7a1058e024...@google.com Title: KASAN: use-after-free Read in finish_task_switch (2) Last occurred: 5 days ago Reported: 4 days ago Branches: Mainline Dashboard link: https://syzkaller.appspot.com/bug?id=9a98fcad6c8bd31f5c3afbdc6c75de9f082c0ffa Original thread: https://lkml.kernel.org/lkml/490679058e024...@google.com/T/#u This bug has a syzkaller reproducer only. This bug was bisected to: commit 7f466032dc9e5a61217f22ea34b2df932786bbfc Author: Jason Wang Date: Fri May 24 08:12:18 2019 + vhost: access vq metadata through kernel virtual address No one has replied to the original thread for this bug yet. Hi: We believe above two bugs are duplicated with the report "WARNING in __mmdrop". Can I just dup them with #syz dup "WARNING in __mmdrop" (If yes, just wonder how syzbot differ bugs, technically, several different bug can hit the same warning). If you fix this bug, please add the following tag to the commit: Reported-by: syzbot+7f067c796eee2acbc...@syzkaller.appspotmail.com If you send any email or patch for this bug, please reply to the original thread. For the git send-email command to use, or tips on how to reply if the thread isn't in your mailbox, see the "Reply instructions" at https://lkml.kernel.org/r/490679058e024...@google.com Title: memory leak in vhost_net_ioctl Last occurred: 22 days ago Reported: 48 days ago Branches: Mainline Dashboard link: https://syzkaller.appspot.com/bug?id=12ba349d7e26ccfe95317bc376e812ebbae2ee0f Original thread: https://lkml.kernel.org/lkml/188da1058a9c2...@google.com/T/#u This bug has a C reproducer. The original thread for this bug has received 4 replies; the last was 39 days ago. If you fix this bug, please add the following tag to the commit: Reported-by: syzbot+0789f0c7e45efd7bb...@syzkaller.appspotmail.com I do remember it can not be reproduced upstream, let me double check and close this one. Thanks If you send any email or patch for this bug, please consider replying to the original thread. For the git send-email command to use, or tips on how to reply if the thread isn't in your mailbox, see the "Reply instructions" at https://lkml.kernel.org/r/188da1058a9c2...@google.com ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 1/1] virtio/s390: fix race on airq_areas[]
The access to airq_areas was racy ever since the adapter interrupts got introduced to virtio-ccw, but since commit 39c7dcb15892 ("virtio/s390: make airq summary indicators DMA") this became an issue in practice as well. Namely before that commit the airq_info that got overwritten was still functional. After that commit however the two infos share a summary_indicator, which aggravates the situation. Which means auto-online mechanism occasionally hangs the boot with virtio_blk. Signed-off-by: Halil Pasic Reported-by: Marc Hartmayer Fixes: 96b14536d935 ("virtio-ccw: virtio-ccw adapter interrupt support.") --- * We need definitely this fixed for 5.3. For older stable kernels it is to be discussed. @Connie what do you think: do we need a cc stable? * I have a variant that does not need the extra mutex but uses cmpxchg(). Decided to post this one because that one is more complex. But if there is interest we can have a look at it as well. --- drivers/s390/virtio/virtio_ccw.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c index 1a55e5942d36..d97742662755 100644 --- a/drivers/s390/virtio/virtio_ccw.c +++ b/drivers/s390/virtio/virtio_ccw.c @@ -145,6 +145,8 @@ struct airq_info { struct airq_iv *aiv; }; static struct airq_info *airq_areas[MAX_AIRQ_AREAS]; +DEFINE_MUTEX(airq_areas_lock); + static u8 *summary_indicators; static inline u8 *get_summary_indicator(struct airq_info *info) @@ -265,9 +267,11 @@ static unsigned long get_airq_indicator(struct virtqueue *vqs[], int nvqs, unsigned long bit, flags; for (i = 0; i < MAX_AIRQ_AREAS && !indicator_addr; i++) { + mutex_lock(_areas_lock); if (!airq_areas[i]) airq_areas[i] = new_airq_info(i); info = airq_areas[i]; + mutex_unlock(_areas_lock); if (!info) return 0; write_lock_irqsave(>lock, flags); -- 2.17.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/2] virtio/virtio_ring: Fix the dma_max_mapping_size call
On Tue, Jul 23, 2019 at 05:38:30PM +0200, Christoph Hellwig wrote: > On Mon, Jul 22, 2019 at 11:33:35AM -0400, Michael S. Tsirkin wrote: > > On Mon, Jul 22, 2019 at 04:55:09PM +0200, Eric Auger wrote: > > > Do not call dma_max_mapping_size for devices that have no DMA > > > mask set, otherwise we can hit a NULL pointer dereference. > > > > > > This occurs when a virtio-blk-pci device is protected with > > > a virtual IOMMU. > > > > > > Fixes: e6d6dd6c875e ("virtio: Introduce virtio_max_dma_size()") > > > Signed-off-by: Eric Auger > > > Suggested-by: Christoph Hellwig > > > > Christoph, I wonder why did you suggest this? > > The connection between dma_mask and dma_max_mapping_size > > is far from obvious. The documentation doesn't exist. > > Do we really have to teach all users about this hack? > > Why not just make dma_max_mapping_size DTRT? > > Because we should not call into dma API functions for devices that > are not DMA capable. I'd rather call is_device_dma_capable then, better than poking at DMA internals. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/2] virtio/virtio_ring: Fix the dma_max_mapping_size call
On Mon, Jul 22, 2019 at 04:36:09PM +0100, Robin Murphy wrote: >> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c >> index c8be1c4f5b55..37c143971211 100644 >> --- a/drivers/virtio/virtio_ring.c >> +++ b/drivers/virtio/virtio_ring.c >> @@ -262,7 +262,7 @@ size_t virtio_max_dma_size(struct virtio_device *vdev) >> { >> size_t max_segment_size = SIZE_MAX; >> - if (vring_use_dma_api(vdev)) >> +if (vring_use_dma_api(vdev) && vdev->dev.dma_mask) > > Hmm, might it make sense to roll that check up into vring_use_dma_api() > itself? After all, if the device has no mask then it's likely that other > DMA API ops wouldn't really work as expected either. Makes sense to me. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/2] virtio/virtio_ring: Fix the dma_max_mapping_size call
On Mon, Jul 22, 2019 at 11:33:35AM -0400, Michael S. Tsirkin wrote: > On Mon, Jul 22, 2019 at 04:55:09PM +0200, Eric Auger wrote: > > Do not call dma_max_mapping_size for devices that have no DMA > > mask set, otherwise we can hit a NULL pointer dereference. > > > > This occurs when a virtio-blk-pci device is protected with > > a virtual IOMMU. > > > > Fixes: e6d6dd6c875e ("virtio: Introduce virtio_max_dma_size()") > > Signed-off-by: Eric Auger > > Suggested-by: Christoph Hellwig > > Christoph, I wonder why did you suggest this? > The connection between dma_mask and dma_max_mapping_size > is far from obvious. The documentation doesn't exist. > Do we really have to teach all users about this hack? > Why not just make dma_max_mapping_size DTRT? Because we should not call into dma API functions for devices that are not DMA capable. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2] dma-mapping: Use dma_get_mask in dma_addressing_limited
On Mon, Jul 22, 2019 at 06:56:49PM +0200, Auger Eric wrote: > Hi Christoph, > > On 7/22/19 6:51 PM, Eric Auger wrote: > > We currently have cases where the dma_addressing_limited() gets > > called with dma_mask unset. This causes a NULL pointer dereference. > > > > Use dma_get_mask() accessor to prevent the crash. > > > > Fixes: b866455423e0 ("dma-mapping: add a dma_addressing_limited helper") > > Signed-off-by: Eric Auger > > As a follow-up of my last email, here is a patch featuring > dma_get_mask(). But you don't have the WARN_ON_ONCE anymore, pointing > out suspect users. > > Feel free to pick up your preferred approach I can take this for now as we are past the merge window. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/6] vhost: validate MMU notifier registration
On 2019/7/23 下午5:17, Michael S. Tsirkin wrote: On Tue, Jul 23, 2019 at 03:57:14AM -0400, Jason Wang wrote: The return value of mmu_notifier_register() is not checked in vhost_vring_set_num_addr(). This will cause an out of sync between mm and MMU notifier thus a double free. To solve this, introduce a boolean flag to track whether MMU notifier is registered and only do unregistering when it was true. Reported-and-tested-by: syzbot+e58112d71f77113dd...@syzkaller.appspotmail.com Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address") Signed-off-by: Jason Wang Right. This fixes the bug. But it's not great that simple things like setting vq address put pressure on memory allocator. Also, if we get a single during processing notifier register will fail, disabling optimization permanently. Yes, but do we really care about this case. E.g we even fail for -ENOMEM for set owner. In fact, see below: --- drivers/vhost/vhost.c | 19 +++ drivers/vhost/vhost.h | 1 + 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 34c0d970bcbc..058191d5efad 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -630,6 +630,7 @@ void vhost_dev_init(struct vhost_dev *dev, dev->iov_limit = iov_limit; dev->weight = weight; dev->byte_weight = byte_weight; + dev->has_notifier = false; init_llist_head(>work_list); init_waitqueue_head(>wait); INIT_LIST_HEAD(>read_list); @@ -731,6 +732,7 @@ long vhost_dev_set_owner(struct vhost_dev *dev) if (err) goto err_mmu_notifier; #endif + dev->has_notifier = true; return 0; I just noticed that set owner now fails if we get a signal. Userspace could retry in theory but it does not: this is userspace abi breakage since it used to only fail on invalid input. Well, at least kthread_create() and vhost_dev_alloc_iovecs() will allocate memory. Thanks @@ -960,7 +962,11 @@ void vhost_dev_cleanup(struct vhost_dev *dev) } if (dev->mm) { #if VHOST_ARCH_CAN_ACCEL_UACCESS - mmu_notifier_unregister(>mmu_notifier, dev->mm); + if (dev->has_notifier) { + mmu_notifier_unregister(>mmu_notifier, + dev->mm); + dev->has_notifier = false; + } #endif mmput(dev->mm); } @@ -2065,8 +2071,10 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d, /* Unregister MMU notifer to allow invalidation callback * can access vq->uaddrs[] without holding a lock. */ - if (d->mm) + if (d->has_notifier) { mmu_notifier_unregister(>mmu_notifier, d->mm); + d->has_notifier = false; + } vhost_uninit_vq_maps(vq); #endif @@ -2086,8 +2094,11 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d, if (r == 0) vhost_setup_vq_uaddr(vq); - if (d->mm) - mmu_notifier_register(>mmu_notifier, d->mm); + if (d->mm) { + r = mmu_notifier_register(>mmu_notifier, d->mm); + if (!r) + d->has_notifier = true; + } #endif mutex_unlock(>mutex); diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 819296332913..a62f56a4cf72 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -214,6 +214,7 @@ struct vhost_dev { int iov_limit; int weight; int byte_weight; + bool has_notifier; }; bool vhost_exceeds_weight(struct vhost_virtqueue *vq, int pkts, int total_len); -- 2.18.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 4/6] vhost: reset invalidate_count in vhost_set_vring_num_addr()
On 2019/7/23 下午5:17, Michael S. Tsirkin wrote: On Tue, Jul 23, 2019 at 03:57:16AM -0400, Jason Wang wrote: The vhost_set_vring_num_addr() could be called in the middle of invalidate_range_start() and invalidate_range_end(). If we don't reset invalidate_count after the un-registering of MMU notifier, the invalidate_cont will run out of sync (e.g never reach zero). This will in fact disable the fast accessor path. Fixing by reset the count to zero. Reported-by: Michael S. Tsirkin Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address") Signed-off-by: Jason Wang --- drivers/vhost/vhost.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 03666b702498..89c9f08b5146 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -2074,6 +2074,10 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d, d->has_notifier = false; } + /* reset invalidate_count in case we are in the middle of +* invalidate_start() and invalidate_end(). +*/ + vq->invalidate_count = 0; I think that the code is ok but the comments are not very clear: - we are never in the middle since we just removed the notifier If I read the code correctly, mmu_notifier_unregister() can only guarantee to wait for the pending method to complete. So we can have: invalidate_start() mmu_notifier_unregister() invalidate_end() - the result is not just disabling optimization: if notifier becomes negative, then later we can think it's ok to map when it isn't since notifier is active. I don't get how it could be negative, the only possible thing is to have a positive value. Thanks vhost_uninit_vq_maps(vq); #endif -- 2.18.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 5/6] vhost: mark dirty pages during map uninit
On 2019/7/23 下午5:17, Michael S. Tsirkin wrote: On Tue, Jul 23, 2019 at 03:57:17AM -0400, Jason Wang wrote: We don't mark dirty pages if the map was teared down outside MMU notifier. This will lead untracked dirty pages. Fixing by marking dirty pages during map uninit. Reported-by: Michael S. Tsirkin Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address") Signed-off-by: Jason Wang --- drivers/vhost/vhost.c | 22 -- 1 file changed, 16 insertions(+), 6 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 89c9f08b5146..5b8821d00fe4 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -306,6 +306,18 @@ static void vhost_map_unprefetch(struct vhost_map *map) kfree(map); } +static void vhost_set_map_dirty(struct vhost_virtqueue *vq, + struct vhost_map *map, int index) +{ + struct vhost_uaddr *uaddr = >uaddrs[index]; + int i; + + if (uaddr->write) { + for (i = 0; i < map->npages; i++) + set_page_dirty(map->pages[i]); + } +} + static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq) { struct vhost_map *map[VHOST_NUM_ADDRS]; @@ -315,8 +327,10 @@ static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq) for (i = 0; i < VHOST_NUM_ADDRS; i++) { map[i] = rcu_dereference_protected(vq->maps[i], lockdep_is_held(>mmu_lock)); - if (map[i]) + if (map[i]) { + vhost_set_map_dirty(vq, map[i], i); rcu_assign_pointer(vq->maps[i], NULL); + } } spin_unlock(>mmu_lock); @@ -354,7 +368,6 @@ static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq, { struct vhost_uaddr *uaddr = >uaddrs[index]; struct vhost_map *map; - int i; if (!vhost_map_range_overlap(uaddr, start, end)) return; @@ -365,10 +378,7 @@ static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq, map = rcu_dereference_protected(vq->maps[index], lockdep_is_held(>mmu_lock)); if (map) { - if (uaddr->write) { - for (i = 0; i < map->npages; i++) - set_page_dirty(map->pages[i]); - } + vhost_set_map_dirty(vq, map, index); rcu_assign_pointer(vq->maps[index], NULL); } spin_unlock(>mmu_lock); OK and the reason it's safe is because the invalidate counter got incremented so we know page will not get mapped again. But we*do* need to wait for page not to be mapped. And if that means waiting for VQ processing to finish, then I worry that is a very log time. I'm not sure I get you here. If we don't have such map, we will fall back to normal uaccess helper. And in the memory accessor, the rcu critical section is pretty small. Thanks ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 6/6] vhost: don't do synchronize_rcu() in vhost_uninit_vq_maps()
On 2019/7/23 下午5:16, Michael S. Tsirkin wrote: On Tue, Jul 23, 2019 at 03:57:18AM -0400, Jason Wang wrote: There's no need for RCU synchronization in vhost_uninit_vq_maps() since we've already serialized with readers (memory accessors). This also avoid the possible userspace DOS through ioctl() because of the possible high latency caused by synchronize_rcu(). Reported-by: Michael S. Tsirkin Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address") Signed-off-by: Jason Wang I agree synchronize_rcu in both mmu notifiers and ioctl is a problem we must fix. --- drivers/vhost/vhost.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 5b8821d00fe4..a17df1f4069a 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -334,7 +334,9 @@ static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq) } spin_unlock(>mmu_lock); - synchronize_rcu(); + /* No need for synchronize_rcu() or kfree_rcu() since we are +* serialized with memory accessors (e.g vq mutex held). +*/ for (i = 0; i < VHOST_NUM_ADDRS; i++) if (map[i]) -- 2.18.1 .. however we can not RCU with no synchronization in sight. Sometimes there are hacks like using a lock/unlock pair instead of sync, but here no one bothers. specifically notifiers call reset vq maps which calls uninit vq maps which is not under any lock. Notifier did this: if (map) { if (uaddr->write) { for (i = 0; i < map->npages; i++) set_page_dirty(map->pages[i]); } rcu_assign_pointer(vq->maps[index], NULL); } spin_unlock(>mmu_lock); if (map) { synchronize_rcu(); vhost_map_unprefetch(map); } So it indeed have a synchronize_rcu() there. Thanks You will get use after free when map is then accessed. If you always have a lock then just take that lock and no need for RCU. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 2/6] vhost: validate MMU notifier registration
On Tue, Jul 23, 2019 at 03:57:14AM -0400, Jason Wang wrote: > The return value of mmu_notifier_register() is not checked in > vhost_vring_set_num_addr(). This will cause an out of sync between mm > and MMU notifier thus a double free. To solve this, introduce a > boolean flag to track whether MMU notifier is registered and only do > unregistering when it was true. > > Reported-and-tested-by: > syzbot+e58112d71f77113dd...@syzkaller.appspotmail.com > Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual > address") > Signed-off-by: Jason Wang Right. This fixes the bug. But it's not great that simple things like setting vq address put pressure on memory allocator. Also, if we get a single during processing notifier register will fail, disabling optimization permanently. In fact, see below: > --- > drivers/vhost/vhost.c | 19 +++ > drivers/vhost/vhost.h | 1 + > 2 files changed, 16 insertions(+), 4 deletions(-) > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > index 34c0d970bcbc..058191d5efad 100644 > --- a/drivers/vhost/vhost.c > +++ b/drivers/vhost/vhost.c > @@ -630,6 +630,7 @@ void vhost_dev_init(struct vhost_dev *dev, > dev->iov_limit = iov_limit; > dev->weight = weight; > dev->byte_weight = byte_weight; > + dev->has_notifier = false; > init_llist_head(>work_list); > init_waitqueue_head(>wait); > INIT_LIST_HEAD(>read_list); > @@ -731,6 +732,7 @@ long vhost_dev_set_owner(struct vhost_dev *dev) > if (err) > goto err_mmu_notifier; > #endif > + dev->has_notifier = true; > > return 0; > I just noticed that set owner now fails if we get a signal. Userspace could retry in theory but it does not: this is userspace abi breakage since it used to only fail on invalid input. > @@ -960,7 +962,11 @@ void vhost_dev_cleanup(struct vhost_dev *dev) > } > if (dev->mm) { > #if VHOST_ARCH_CAN_ACCEL_UACCESS > - mmu_notifier_unregister(>mmu_notifier, dev->mm); > + if (dev->has_notifier) { > + mmu_notifier_unregister(>mmu_notifier, > + dev->mm); > + dev->has_notifier = false; > + } > #endif > mmput(dev->mm); > } > @@ -2065,8 +2071,10 @@ static long vhost_vring_set_num_addr(struct vhost_dev > *d, > /* Unregister MMU notifer to allow invalidation callback >* can access vq->uaddrs[] without holding a lock. >*/ > - if (d->mm) > + if (d->has_notifier) { > mmu_notifier_unregister(>mmu_notifier, d->mm); > + d->has_notifier = false; > + } > > vhost_uninit_vq_maps(vq); > #endif > @@ -2086,8 +2094,11 @@ static long vhost_vring_set_num_addr(struct vhost_dev > *d, > if (r == 0) > vhost_setup_vq_uaddr(vq); > > - if (d->mm) > - mmu_notifier_register(>mmu_notifier, d->mm); > + if (d->mm) { > + r = mmu_notifier_register(>mmu_notifier, d->mm); > + if (!r) > + d->has_notifier = true; > + } > #endif > > mutex_unlock(>mutex); > diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h > index 819296332913..a62f56a4cf72 100644 > --- a/drivers/vhost/vhost.h > +++ b/drivers/vhost/vhost.h > @@ -214,6 +214,7 @@ struct vhost_dev { > int iov_limit; > int weight; > int byte_weight; > + bool has_notifier; > }; > > bool vhost_exceeds_weight(struct vhost_virtqueue *vq, int pkts, int > total_len); > -- > 2.18.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 4/6] vhost: reset invalidate_count in vhost_set_vring_num_addr()
On Tue, Jul 23, 2019 at 03:57:16AM -0400, Jason Wang wrote: > The vhost_set_vring_num_addr() could be called in the middle of > invalidate_range_start() and invalidate_range_end(). If we don't reset > invalidate_count after the un-registering of MMU notifier, the > invalidate_cont will run out of sync (e.g never reach zero). This will > in fact disable the fast accessor path. Fixing by reset the count to > zero. > > Reported-by: Michael S. Tsirkin > Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual > address") > Signed-off-by: Jason Wang > --- > drivers/vhost/vhost.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > index 03666b702498..89c9f08b5146 100644 > --- a/drivers/vhost/vhost.c > +++ b/drivers/vhost/vhost.c > @@ -2074,6 +2074,10 @@ static long vhost_vring_set_num_addr(struct vhost_dev > *d, > d->has_notifier = false; > } > > + /* reset invalidate_count in case we are in the middle of > + * invalidate_start() and invalidate_end(). > + */ > + vq->invalidate_count = 0; I think that the code is ok but the comments are not very clear: - we are never in the middle since we just removed the notifier - the result is not just disabling optimization: if notifier becomes negative, then later we can think it's ok to map when it isn't since notifier is active. > vhost_uninit_vq_maps(vq); > #endif > > -- > 2.18.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 5/6] vhost: mark dirty pages during map uninit
On Tue, Jul 23, 2019 at 03:57:17AM -0400, Jason Wang wrote: > We don't mark dirty pages if the map was teared down outside MMU > notifier. This will lead untracked dirty pages. Fixing by marking > dirty pages during map uninit. > > Reported-by: Michael S. Tsirkin > Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual > address") > Signed-off-by: Jason Wang > --- > drivers/vhost/vhost.c | 22 -- > 1 file changed, 16 insertions(+), 6 deletions(-) > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > index 89c9f08b5146..5b8821d00fe4 100644 > --- a/drivers/vhost/vhost.c > +++ b/drivers/vhost/vhost.c > @@ -306,6 +306,18 @@ static void vhost_map_unprefetch(struct vhost_map *map) > kfree(map); > } > > +static void vhost_set_map_dirty(struct vhost_virtqueue *vq, > + struct vhost_map *map, int index) > +{ > + struct vhost_uaddr *uaddr = >uaddrs[index]; > + int i; > + > + if (uaddr->write) { > + for (i = 0; i < map->npages; i++) > + set_page_dirty(map->pages[i]); > + } > +} > + > static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq) > { > struct vhost_map *map[VHOST_NUM_ADDRS]; > @@ -315,8 +327,10 @@ static void vhost_uninit_vq_maps(struct vhost_virtqueue > *vq) > for (i = 0; i < VHOST_NUM_ADDRS; i++) { > map[i] = rcu_dereference_protected(vq->maps[i], > lockdep_is_held(>mmu_lock)); > - if (map[i]) > + if (map[i]) { > + vhost_set_map_dirty(vq, map[i], i); > rcu_assign_pointer(vq->maps[i], NULL); > + } > } > spin_unlock(>mmu_lock); > > @@ -354,7 +368,6 @@ static void vhost_invalidate_vq_start(struct > vhost_virtqueue *vq, > { > struct vhost_uaddr *uaddr = >uaddrs[index]; > struct vhost_map *map; > - int i; > > if (!vhost_map_range_overlap(uaddr, start, end)) > return; > @@ -365,10 +378,7 @@ static void vhost_invalidate_vq_start(struct > vhost_virtqueue *vq, > map = rcu_dereference_protected(vq->maps[index], > lockdep_is_held(>mmu_lock)); > if (map) { > - if (uaddr->write) { > - for (i = 0; i < map->npages; i++) > - set_page_dirty(map->pages[i]); > - } > + vhost_set_map_dirty(vq, map, index); > rcu_assign_pointer(vq->maps[index], NULL); > } > spin_unlock(>mmu_lock); OK and the reason it's safe is because the invalidate counter got incremented so we know page will not get mapped again. But we *do* need to wait for page not to be mapped. And if that means waiting for VQ processing to finish, then I worry that is a very log time. > -- > 2.18.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 6/6] vhost: don't do synchronize_rcu() in vhost_uninit_vq_maps()
On Tue, Jul 23, 2019 at 03:57:18AM -0400, Jason Wang wrote: > There's no need for RCU synchronization in vhost_uninit_vq_maps() > since we've already serialized with readers (memory accessors). This > also avoid the possible userspace DOS through ioctl() because of the > possible high latency caused by synchronize_rcu(). > > Reported-by: Michael S. Tsirkin > Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual > address") > Signed-off-by: Jason Wang I agree synchronize_rcu in both mmu notifiers and ioctl is a problem we must fix. > --- > drivers/vhost/vhost.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > index 5b8821d00fe4..a17df1f4069a 100644 > --- a/drivers/vhost/vhost.c > +++ b/drivers/vhost/vhost.c > @@ -334,7 +334,9 @@ static void vhost_uninit_vq_maps(struct vhost_virtqueue > *vq) > } > spin_unlock(>mmu_lock); > > - synchronize_rcu(); > + /* No need for synchronize_rcu() or kfree_rcu() since we are > + * serialized with memory accessors (e.g vq mutex held). > + */ > > for (i = 0; i < VHOST_NUM_ADDRS; i++) > if (map[i]) > -- > 2.18.1 .. however we can not RCU with no synchronization in sight. Sometimes there are hacks like using a lock/unlock pair instead of sync, but here no one bothers. specifically notifiers call reset vq maps which calls uninit vq maps which is not under any lock. You will get use after free when map is then accessed. If you always have a lock then just take that lock and no need for RCU. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 6/6] vhost: don't do synchronize_rcu() in vhost_uninit_vq_maps()
There's no need for RCU synchronization in vhost_uninit_vq_maps() since we've already serialized with readers (memory accessors). This also avoid the possible userspace DOS through ioctl() because of the possible high latency caused by synchronize_rcu(). Reported-by: Michael S. Tsirkin Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address") Signed-off-by: Jason Wang --- drivers/vhost/vhost.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 5b8821d00fe4..a17df1f4069a 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -334,7 +334,9 @@ static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq) } spin_unlock(>mmu_lock); - synchronize_rcu(); + /* No need for synchronize_rcu() or kfree_rcu() since we are +* serialized with memory accessors (e.g vq mutex held). +*/ for (i = 0; i < VHOST_NUM_ADDRS; i++) if (map[i]) -- 2.18.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 4/6] vhost: reset invalidate_count in vhost_set_vring_num_addr()
The vhost_set_vring_num_addr() could be called in the middle of invalidate_range_start() and invalidate_range_end(). If we don't reset invalidate_count after the un-registering of MMU notifier, the invalidate_cont will run out of sync (e.g never reach zero). This will in fact disable the fast accessor path. Fixing by reset the count to zero. Reported-by: Michael S. Tsirkin Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address") Signed-off-by: Jason Wang --- drivers/vhost/vhost.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 03666b702498..89c9f08b5146 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -2074,6 +2074,10 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d, d->has_notifier = false; } + /* reset invalidate_count in case we are in the middle of +* invalidate_start() and invalidate_end(). +*/ + vq->invalidate_count = 0; vhost_uninit_vq_maps(vq); #endif -- 2.18.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 3/6] vhost: fix vhost map leak
We don't free map during vhost_map_unprefetch(). This means it could be leaked. Fixing by free the map. Reported-by: Michael S. Tsirkin Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address") Signed-off-by: Jason Wang --- drivers/vhost/vhost.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 058191d5efad..03666b702498 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -303,9 +303,7 @@ static void vhost_vq_meta_reset(struct vhost_dev *d) static void vhost_map_unprefetch(struct vhost_map *map) { kfree(map->pages); - map->pages = NULL; - map->npages = 0; - map->addr = NULL; + kfree(map); } static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq) -- 2.18.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 5/6] vhost: mark dirty pages during map uninit
We don't mark dirty pages if the map was teared down outside MMU notifier. This will lead untracked dirty pages. Fixing by marking dirty pages during map uninit. Reported-by: Michael S. Tsirkin Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address") Signed-off-by: Jason Wang --- drivers/vhost/vhost.c | 22 -- 1 file changed, 16 insertions(+), 6 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 89c9f08b5146..5b8821d00fe4 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -306,6 +306,18 @@ static void vhost_map_unprefetch(struct vhost_map *map) kfree(map); } +static void vhost_set_map_dirty(struct vhost_virtqueue *vq, + struct vhost_map *map, int index) +{ + struct vhost_uaddr *uaddr = >uaddrs[index]; + int i; + + if (uaddr->write) { + for (i = 0; i < map->npages; i++) + set_page_dirty(map->pages[i]); + } +} + static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq) { struct vhost_map *map[VHOST_NUM_ADDRS]; @@ -315,8 +327,10 @@ static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq) for (i = 0; i < VHOST_NUM_ADDRS; i++) { map[i] = rcu_dereference_protected(vq->maps[i], lockdep_is_held(>mmu_lock)); - if (map[i]) + if (map[i]) { + vhost_set_map_dirty(vq, map[i], i); rcu_assign_pointer(vq->maps[i], NULL); + } } spin_unlock(>mmu_lock); @@ -354,7 +368,6 @@ static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq, { struct vhost_uaddr *uaddr = >uaddrs[index]; struct vhost_map *map; - int i; if (!vhost_map_range_overlap(uaddr, start, end)) return; @@ -365,10 +378,7 @@ static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq, map = rcu_dereference_protected(vq->maps[index], lockdep_is_held(>mmu_lock)); if (map) { - if (uaddr->write) { - for (i = 0; i < map->npages; i++) - set_page_dirty(map->pages[i]); - } + vhost_set_map_dirty(vq, map, index); rcu_assign_pointer(vq->maps[index], NULL); } spin_unlock(>mmu_lock); -- 2.18.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 2/6] vhost: validate MMU notifier registration
The return value of mmu_notifier_register() is not checked in vhost_vring_set_num_addr(). This will cause an out of sync between mm and MMU notifier thus a double free. To solve this, introduce a boolean flag to track whether MMU notifier is registered and only do unregistering when it was true. Reported-and-tested-by: syzbot+e58112d71f77113dd...@syzkaller.appspotmail.com Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address") Signed-off-by: Jason Wang --- drivers/vhost/vhost.c | 19 +++ drivers/vhost/vhost.h | 1 + 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 34c0d970bcbc..058191d5efad 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -630,6 +630,7 @@ void vhost_dev_init(struct vhost_dev *dev, dev->iov_limit = iov_limit; dev->weight = weight; dev->byte_weight = byte_weight; + dev->has_notifier = false; init_llist_head(>work_list); init_waitqueue_head(>wait); INIT_LIST_HEAD(>read_list); @@ -731,6 +732,7 @@ long vhost_dev_set_owner(struct vhost_dev *dev) if (err) goto err_mmu_notifier; #endif + dev->has_notifier = true; return 0; @@ -960,7 +962,11 @@ void vhost_dev_cleanup(struct vhost_dev *dev) } if (dev->mm) { #if VHOST_ARCH_CAN_ACCEL_UACCESS - mmu_notifier_unregister(>mmu_notifier, dev->mm); + if (dev->has_notifier) { + mmu_notifier_unregister(>mmu_notifier, + dev->mm); + dev->has_notifier = false; + } #endif mmput(dev->mm); } @@ -2065,8 +2071,10 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d, /* Unregister MMU notifer to allow invalidation callback * can access vq->uaddrs[] without holding a lock. */ - if (d->mm) + if (d->has_notifier) { mmu_notifier_unregister(>mmu_notifier, d->mm); + d->has_notifier = false; + } vhost_uninit_vq_maps(vq); #endif @@ -2086,8 +2094,11 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d, if (r == 0) vhost_setup_vq_uaddr(vq); - if (d->mm) - mmu_notifier_register(>mmu_notifier, d->mm); + if (d->mm) { + r = mmu_notifier_register(>mmu_notifier, d->mm); + if (!r) + d->has_notifier = true; + } #endif mutex_unlock(>mutex); diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 819296332913..a62f56a4cf72 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -214,6 +214,7 @@ struct vhost_dev { int iov_limit; int weight; int byte_weight; + bool has_notifier; }; bool vhost_exceeds_weight(struct vhost_virtqueue *vq, int pkts, int total_len); -- 2.18.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 1/6] vhost: don't set uaddr for invalid address
We should not setup uaddr for the invalid address, otherwise we may try to pin or prefetch mapping of wrong pages. Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address") Signed-off-by: Jason Wang --- drivers/vhost/vhost.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index dc9301d31f12..34c0d970bcbc 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -2083,7 +2083,8 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d, } #if VHOST_ARCH_CAN_ACCEL_UACCESS - vhost_setup_vq_uaddr(vq); + if (r == 0) + vhost_setup_vq_uaddr(vq); if (d->mm) mmu_notifier_register(>mmu_notifier, d->mm); -- 2.18.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH 0/6] Fixes for meta data acceleration
Hi all: This series try to fix several issues introduced by meta data accelreation series. Please review. Jason Wang (6): vhost: don't set uaddr for invalid address vhost: validate MMU notifier registration vhost: fix vhost map leak vhost: reset invalidate_count in vhost_set_vring_num_addr() vhost: mark dirty pages during map uninit vhost: don't do synchronize_rcu() in vhost_uninit_vq_maps() drivers/vhost/vhost.c | 56 +++ drivers/vhost/vhost.h | 1 + 2 files changed, 42 insertions(+), 15 deletions(-) -- 2.18.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization