Re: [RFC PATCH V2 0/5] vhost: accelerate metadata access through vmap()
On Tue, Mar 12, 2019 at 10:59:09AM +0800, Jason Wang wrote:
>
> On 2019/3/12 2:14 AM, David Miller wrote:
> > From: "Michael S. Tsirkin"
> > Date: Mon, 11 Mar 2019 09:59:28 -0400
> >
> > > On Mon, Mar 11, 2019 at 03:13:17PM +0800, Jason Wang wrote:
> > > > On 2019/3/8 10:12 PM, Christoph Hellwig wrote:
> > > > > On Wed, Mar 06, 2019 at 02:18:07AM -0500, Jason Wang wrote:
> > > > > > This series tries to access virtqueue metadata through kernel
> > > > > > virtual address instead of copy_user() friends, since they had
> > > > > > too much overhead like checks, spec barriers or even hardware
> > > > > > feature toggling. This is done by setting up a kernel address
> > > > > > through vmap() and registering an MMU notifier for
> > > > > > invalidation.
> > > > > >
> > > > > > Test shows about 24% improvement on TX PPS. TCP_STREAM
> > > > > > doesn't see obvious improvement.
> > > > > How is this going to work for CPUs with virtually tagged caches?
> > > > Anything different that you worry about?
> > > If caches have virtual tags then the kernel and userspace views of
> > > memory might not be automatically in sync if they access memory
> > > through different virtual addresses. You need to do things like
> > > flush_cache_page, probably multiple times.
> > "flush_dcache_page()"
>
> I get this. Then I think the current set_bit_to_user() is suspicious;
> we probably miss a flush_dcache_page() there:
>
> static int set_bit_to_user(int nr, void __user *addr)
> {
> 	unsigned long log = (unsigned long)addr;
> 	struct page *page;
> 	void *base;
> 	int bit = nr + (log % PAGE_SIZE) * 8;
> 	int r;
>
> 	r = get_user_pages_fast(log, 1, 1, &page);
> 	if (r < 0)
> 		return r;
> 	BUG_ON(r != 1);
> 	base = kmap_atomic(page);
> 	set_bit(bit, base);
> 	kunmap_atomic(base);
> 	set_page_dirty_lock(page);
> 	put_page(page);
> 	return 0;
> }
>
> Thanks

I think you are right. The correct fix though is to re-implement it
using asm and handling the pagefault, not gup. Three atomic ops per bit
is way too expensive.

-- 
MST
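As a rough illustration of the direction (a sketch only, not the
asm-with-pagefault fix MST has in mind): the gup/kmap round trip can be
replaced with plain uaccess helpers, which fault the page in themselves
and leave cache coherency to the usual copy_user machinery. The
byte-based bit numbering below matches the log-bitmap layout only on
little-endian, so a real rework would need the same care as the
original:

	static int set_bit_to_user(int nr, void __user *addr)
	{
		u8 __user *p = (u8 __user *)addr + nr / 8;
		u8 val;

		/* get_user/put_user handle the page fault and SMAP etc. */
		if (get_user(val, p))
			return -EFAULT;
		val |= 1 << (nr % 8);
		if (put_user(val, p))
			return -EFAULT;
		return 0;
	}

Note the read-modify-write is not atomic against concurrent updates of
neighbouring bits in the same byte, which is one reason the suggestion
above is asm with pagefault handling rather than this.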
Re: [RFC PATCH V2 5/5] vhost: access vq metadata through kernel virtual address
On Tue, Mar 12, 2019 at 10:56:20AM +0800, Jason Wang wrote:
>
> On 2019/3/11 9:43 PM, Andrea Arcangeli wrote:
> > On Mon, Mar 11, 2019 at 08:48:37AM -0400, Michael S. Tsirkin wrote:
> > > Using copyXuser is better I guess.
> > It certainly would be faster there, but I don't think it's needed if
> > that would be the only use case left that justifies supporting two
> > different models. On small 32bit systems with little RAM, kmap won't
> > perform measurably differently on 32bit or 64bit systems. If the
> > 32bit host has a lot of RAM, it all gets slow anyway when accessing
> > RAM above the direct mapping compared to 64bit host kernels; it's
> > not just an issue for vhost + mmu notifier + kmap, and the best way
> > to optimize things is to run 64bit host kernels.
> >
> > Like Christoph pointed out, the main use case for retaining the
> > copy-user model would be CPUs with virtually indexed, not physically
> > tagged, data caches (they'll still suffer from the spectre-v1 fix,
> > although I'd rule out that they have to suffer the SMAP
> > slowdown/feature). Those may require some additional flushing beyond
> > what the current copy-user model requires.
> >
> > As a rule of thumb, any arch where copy_user_page doesn't define as
> > copy_page will require some additional cache flushing after the
> > kmap. Supposedly with vmap, the vmap layer should have taken care of
> > that (I didn't verify that yet).
>
> vmap_page_range()/free_unmap_vmap_area() will call
> flush_cache_vmap()/flush_cache_vunmap(). So the vmap layer should be
> ok.
>
> Thanks

You only unmap from the mmu notifier though. You don't do it after any
access.

> > There are some accessories like copy_to_user_page()/
> > copy_from_user_page() that could work and obviously define to raw
> > memcpy on x86 (the main con is they don't provide word-granular
> > access), and at least on sparc they're tailored to ptrace
> > assumptions, so we'd need to evaluate what happens if this is used
> > outside of ptrace context. kmap has generally been used to access
> > whole pages (i.e. copy_user_page), so ptrace may actually be the
> > only use case with subpage-granular access.
> >
> > #define copy_to_user_page(vma, page, vaddr, dst, src, len)	\
> > 	do {							\
> > 		flush_cache_page(vma, vaddr, page_to_pfn(page));\
> > 		memcpy(dst, src, len);				\
> > 		flush_ptrace_access(vma, page, vaddr, src, len, 0); \
> > 	} while (0)
> >
> > So I wouldn't rule out the need for a dual model, until we solve how
> > to run this stably on non-x86 arches with not-physically-tagged
> > caches.
> >
> > Thanks,
> > Andrea
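For context, the flushing obligation being discussed looks roughly like
this on an arch with aliasing D-caches (a sketch; flush_dcache_page()
compiles away on x86 but is required on e.g. sparc or parisc):

	static void write_user_page(struct page *page, unsigned int offset,
				    const void *src, size_t len)
	{
		void *kaddr = kmap_atomic(page);

		memcpy(kaddr + offset, src, len);
		kunmap_atomic(kaddr);
		/*
		 * Userspace maps this page at a different virtual
		 * address; flush the kernel alias so the user alias
		 * sees the new data.
		 */
		flush_dcache_page(page);
	}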
Re: [RFC PATCH V2 5/5] vhost: access vq metadata through kernel virtual address
On Tue, Mar 12, 2019 at 10:52:15AM +0800, Jason Wang wrote:
>
> On 2019/3/11 8:48 PM, Michael S. Tsirkin wrote:
> > On Mon, Mar 11, 2019 at 03:40:31PM +0800, Jason Wang wrote:
> > > On 2019/3/9 3:48 AM, Andrea Arcangeli wrote:
> > > > Hello Jason,
> > > >
> > > > On Fri, Mar 08, 2019 at 04:50:36PM +0800, Jason Wang wrote:
> > > > > Just to make sure I understand here. For boosting through huge
> > > > > TLB, do you mean we can do that in the future (e.g. by mapping
> > > > > more userspace pages to kernel) or it can be done by this
> > > > > series (only about three 4K pages were vmapped per virtqueue)?
> > > > When I answered about the advantages of mmu notifier and I
> > > > mentioned guaranteed 2m/gigapages where available, I overlooked
> > > > the detail that you were using vmap instead of kmap. So with vmap
> > > > you're actually doing the opposite: it slows down the access
> > > > because it will always use a 4k TLB even if QEMU runs on THP or
> > > > gigapages hugetlbfs.
> > > >
> > > > If there's just one page (or a few pages) in each vmap there's no
> > > > need of vmap; the linearity vmap provides doesn't pay off in such
> > > > a case.
> > > >
> > > > So likely there's further room for improvement here that you can
> > > > achieve in the current series by just dropping vmap/vunmap.
> > > >
> > > > You can just use kmap (or kmap_atomic if you're in a preemptible
> > > > section; it should work from bh/irq).
> > > >
> > > > In short, the mmu notifier to invalidate only sets a "struct page
> > > > * userringpage" pointer to NULL without calls to vunmap.
> > > >
> > > > In all cases immediately after gup_fast returns you can always
> > > > call put_page immediately (which explains why I'd like an option
> > > > to drop FOLL_GET from gup_fast to speed it up).
> > > >
> > > > Then you can check the sequence counter and the inc/dec counter
> > > > increased by _start/_end. That will tell you if the page you got
> > > > -- and immediately unpinned with put_page, or even freed -- is
> > > > guaranteed not to go away under you until the invalidate is
> > > > called.
> > > >
> > > > If the sequence counter and counter tell you that gup_fast raced
> > > > with any mmu notifier invalidate, you can just repeat gup_fast.
> > > > Otherwise you're done: the page cannot go away under you, and the
> > > > host virtual to host physical mapping cannot change either. And
> > > > the page is not pinned either. So you can just set "struct page *
> > > > userringpage = page" where "page" was the one set up by gup_fast.
> > > >
> > > > When later the invalidate runs, you can just call set_page_dirty
> > > > if gup_fast was called with "write = 1" and then you clear the
> > > > pointer: "userringpage = NULL".
> > > >
> > > > When you need to read/write to the memory,
> > > > kmap/kmap_atomic(userringpage) should work.
> > > Yes, I've considered kmap() from the start. The reason I don't do
> > > that is a large virtqueue may need more than one page, so the VA
> > > might not be contiguous. But this is probably not a big issue which
> > > just needs more tricks in the vhost memory accessors.
> > >
> > > > In short, because there's no hardware involvement here, the
> > > > established mapping is just the pointer to the page; there is no
> > > > need of setting up any pagetables or doing any TLB flushes
> > > > (except on 32bit archs if the page is above the direct mapping,
> > > > but that never happens on 64bit archs).
> > > I see. I believe we don't care much about the performance of 32bit
> > > archs (or we can just fall back to copy_to_user() friends).
> > Using copyXuser is better I guess.
>
> Ok.
>
> > > Using the direct mapping (I guess the kernel will always try
> > > hugepages for that?) should be better and we can even use it for
> > > the data transfer, not only for the metadata.
> > >
> > > Thanks
> > We can't really. The big issue is get_user_pages. Doing that on the
> > data path will be slower than copyXuser.
>
> I meant if we can find a way to avoid doing gup in the datapath. E.g.
> vhost maintains a range tree and adds or removes ranges through the
> MMU notifier. Then in the datapath, if we find the range, use the
> direct mapping, otherwise copy_to_user().
>
> Thanks

We can try. But I'm not sure there's any reason to think there's any
locality there.

> > Or maybe it won't with the amount of mitigations spread around. Go
> > ahead and try.
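The datapath idea Jason sketches could take roughly this shape (all
names here are hypothetical; the tree maintenance done from the MMU
notifier callbacks is not shown):

	struct vhost_map {		/* hypothetical tracked range */
		unsigned long uaddr;	/* userspace start address */
		size_t len;
		void *kva;		/* kernel alias of the range */
	};

	static int vhost_put(struct vhost_virtqueue *vq,
			     void __user *uaddr, const void *src,
			     size_t len)
	{
		/*
		 * Hypothetical lookup in a tree kept in sync by the MMU
		 * notifier; returns NULL when the range is not tracked.
		 */
		struct vhost_map *map = vhost_find_range(vq, uaddr, len);

		if (map) {
			memcpy(map->kva + ((unsigned long)uaddr - map->uaddr),
			       src, len);
			return 0;
		}
		return copy_to_user(uaddr, src, len) ? -EFAULT : 0;
	}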
Re: [RFC PATCH V2 0/5] vhost: accelerate metadata access through vmap()
On 2019/3/12 2:14 AM, David Miller wrote:
> From: "Michael S. Tsirkin"
> Date: Mon, 11 Mar 2019 09:59:28 -0400
>
> > On Mon, Mar 11, 2019 at 03:13:17PM +0800, Jason Wang wrote:
> > > On 2019/3/8 10:12 PM, Christoph Hellwig wrote:
> > > > On Wed, Mar 06, 2019 at 02:18:07AM -0500, Jason Wang wrote:
> > > > > This series tries to access virtqueue metadata through kernel
> > > > > virtual address instead of copy_user() friends, since they had
> > > > > too much overhead like checks, spec barriers or even hardware
> > > > > feature toggling. This is done by setting up a kernel address
> > > > > through vmap() and registering an MMU notifier for
> > > > > invalidation.
> > > > >
> > > > > Test shows about 24% improvement on TX PPS. TCP_STREAM doesn't
> > > > > see obvious improvement.
> > > > How is this going to work for CPUs with virtually tagged caches?
> > > Anything different that you worry about?
> > If caches have virtual tags then the kernel and userspace views of
> > memory might not be automatically in sync if they access memory
> > through different virtual addresses. You need to do things like
> > flush_cache_page, probably multiple times.
> "flush_dcache_page()"

I get this. Then I think the current set_bit_to_user() is suspicious; we
probably miss a flush_dcache_page() there:

static int set_bit_to_user(int nr, void __user *addr)
{
	unsigned long log = (unsigned long)addr;
	struct page *page;
	void *base;
	int bit = nr + (log % PAGE_SIZE) * 8;
	int r;

	r = get_user_pages_fast(log, 1, 1, &page);
	if (r < 0)
		return r;
	BUG_ON(r != 1);
	base = kmap_atomic(page);
	set_bit(bit, base);
	kunmap_atomic(base);
	set_page_dirty_lock(page);
	put_page(page);
	return 0;
}

Thanks
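If the suspicion is right, the missing flush would slot in right after
the kernel-side store, e.g. (a sketch of the function's tail only):

	base = kmap_atomic(page);
	set_bit(bit, base);
	kunmap_atomic(base);
	flush_dcache_page(page);	/* make the write visible to the user alias */
	set_page_dirty_lock(page);
	put_page(page);
	return 0;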
Re: [RFC PATCH V2 5/5] vhost: access vq metadata through kernel virtual address
On 2019/3/11 9:43 PM, Andrea Arcangeli wrote:
> On Mon, Mar 11, 2019 at 08:48:37AM -0400, Michael S. Tsirkin wrote:
> > Using copyXuser is better I guess.
> It certainly would be faster there, but I don't think it's needed if
> that would be the only use case left that justifies supporting two
> different models. On small 32bit systems with little RAM, kmap won't
> perform measurably differently on 32bit or 64bit systems. If the 32bit
> host has a lot of RAM, it all gets slow anyway when accessing RAM
> above the direct mapping compared to 64bit host kernels; it's not just
> an issue for vhost + mmu notifier + kmap, and the best way to optimize
> things is to run 64bit host kernels.
>
> Like Christoph pointed out, the main use case for retaining the
> copy-user model would be CPUs with virtually indexed, not physically
> tagged, data caches (they'll still suffer from the spectre-v1 fix,
> although I'd rule out that they have to suffer the SMAP
> slowdown/feature). Those may require some additional flushing beyond
> what the current copy-user model requires.
>
> As a rule of thumb, any arch where copy_user_page doesn't define as
> copy_page will require some additional cache flushing after the
> kmap. Supposedly with vmap, the vmap layer should have taken care of
> that (I didn't verify that yet).

vmap_page_range()/free_unmap_vmap_area() will call
flush_cache_vmap()/flush_cache_vunmap(). So the vmap layer should be ok.

Thanks

> There are some accessories like copy_to_user_page()/
> copy_from_user_page() that could work and obviously define to raw
> memcpy on x86 (the main con is they don't provide word-granular
> access), and at least on sparc they're tailored to ptrace assumptions,
> so we'd need to evaluate what happens if this is used outside of
> ptrace context. kmap has generally been used to access whole pages
> (i.e. copy_user_page), so ptrace may actually be the only use case
> with subpage-granular access.
>
> #define copy_to_user_page(vma, page, vaddr, dst, src, len)	\
> 	do {							\
> 		flush_cache_page(vma, vaddr, page_to_pfn(page));\
> 		memcpy(dst, src, len);				\
> 		flush_ptrace_access(vma, page, vaddr, src, len, 0); \
> 	} while (0)
>
> So I wouldn't rule out the need for a dual model, until we solve how
> to run this stably on non-x86 arches with not-physically-tagged
> caches.
>
> Thanks,
> Andrea
Re: [RFC PATCH V2 5/5] vhost: access vq metadata through kernel virtual address
On 2019/3/11 8:48 PM, Michael S. Tsirkin wrote:
> On Mon, Mar 11, 2019 at 03:40:31PM +0800, Jason Wang wrote:
> > On 2019/3/9 3:48 AM, Andrea Arcangeli wrote:
> > > Hello Jason,
> > >
> > > On Fri, Mar 08, 2019 at 04:50:36PM +0800, Jason Wang wrote:
> > > > Just to make sure I understand here. For boosting through huge
> > > > TLB, do you mean we can do that in the future (e.g. by mapping
> > > > more userspace pages to kernel) or it can be done by this series
> > > > (only about three 4K pages were vmapped per virtqueue)?
> > > When I answered about the advantages of mmu notifier and I
> > > mentioned guaranteed 2m/gigapages where available, I overlooked the
> > > detail that you were using vmap instead of kmap. So with vmap
> > > you're actually doing the opposite: it slows down the access
> > > because it will always use a 4k TLB even if QEMU runs on THP or
> > > gigapages hugetlbfs.
> > >
> > > If there's just one page (or a few pages) in each vmap there's no
> > > need of vmap; the linearity vmap provides doesn't pay off in such a
> > > case.
> > >
> > > So likely there's further room for improvement here that you can
> > > achieve in the current series by just dropping vmap/vunmap.
> > >
> > > You can just use kmap (or kmap_atomic if you're in a preemptible
> > > section; it should work from bh/irq).
> > >
> > > In short, the mmu notifier to invalidate only sets a "struct page *
> > > userringpage" pointer to NULL without calls to vunmap.
> > >
> > > In all cases immediately after gup_fast returns you can always call
> > > put_page immediately (which explains why I'd like an option to drop
> > > FOLL_GET from gup_fast to speed it up).
> > >
> > > Then you can check the sequence counter and the inc/dec counter
> > > increased by _start/_end. That will tell you if the page you got --
> > > and immediately unpinned with put_page, or even freed -- is
> > > guaranteed not to go away under you until the invalidate is called.
> > >
> > > If the sequence counter and counter tell you that gup_fast raced
> > > with any mmu notifier invalidate, you can just repeat gup_fast.
> > > Otherwise you're done: the page cannot go away under you, and the
> > > host virtual to host physical mapping cannot change either. And the
> > > page is not pinned either. So you can just set "struct page *
> > > userringpage = page" where "page" was the one set up by gup_fast.
> > >
> > > When later the invalidate runs, you can just call set_page_dirty if
> > > gup_fast was called with "write = 1" and then you clear the
> > > pointer: "userringpage = NULL".
> > >
> > > When you need to read/write to the memory,
> > > kmap/kmap_atomic(userringpage) should work.
> > Yes, I've considered kmap() from the start. The reason I don't do
> > that is a large virtqueue may need more than one page, so the VA
> > might not be contiguous. But this is probably not a big issue which
> > just needs more tricks in the vhost memory accessors.
> >
> > > In short, because there's no hardware involvement here, the
> > > established mapping is just the pointer to the page; there is no
> > > need of setting up any pagetables or doing any TLB flushes (except
> > > on 32bit archs if the page is above the direct mapping, but that
> > > never happens on 64bit archs).
> > I see. I believe we don't care much about the performance of 32bit
> > archs (or we can just fall back to copy_to_user() friends).
> Using copyXuser is better I guess.

Ok.

> > Using the direct mapping (I guess the kernel will always try
> > hugepages for that?) should be better and we can even use it for the
> > data transfer, not only for the metadata.
> >
> > Thanks
> We can't really. The big issue is get_user_pages. Doing that on the
> data path will be slower than copyXuser.

I meant if we can find a way to avoid doing gup in the datapath. E.g.
vhost maintains a range tree and adds or removes ranges through the MMU
notifier. Then in the datapath, if we find the range, use the direct
mapping, otherwise copy_to_user().

Thanks

> Or maybe it won't with the amount of mitigations spread around. Go
> ahead and try.
[PATCH 1/1] virtio_blk: replace 0 by HCTX_TYPE_DEFAULT to index blk_mq_tag_set->map
Use HCTX_TYPE_DEFAULT instead of 0 to avoid hardcoding.

Signed-off-by: Dongli Zhang
---
This is the only place to clean up under drivers/block/*. Another patch
set has been sent to scsi; with it, all of the below are cleaned up:
- block/*
- drivers/*

 drivers/block/virtio_blk.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 4bc083b..bed6035 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -691,7 +691,8 @@ static int virtblk_map_queues(struct blk_mq_tag_set *set)
 {
 	struct virtio_blk *vblk = set->driver_data;
 
-	return blk_mq_virtio_map_queues(&set->map[0], vblk->vdev, 0);
+	return blk_mq_virtio_map_queues(&set->map[HCTX_TYPE_DEFAULT],
+					vblk->vdev, 0);
 }
 
 #ifdef CONFIG_VIRTIO_BLK_SCSI
-- 
2.7.4
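For reference, HCTX_TYPE_DEFAULT is the first entry of the hctx_type
enum in include/linux/blk-mq.h (as of this era of the tree), which is
why the raw 0 happened to work:

	enum hctx_type {
		HCTX_TYPE_DEFAULT,	/* all I/O not otherwise accounted for */
		HCTX_TYPE_READ,		/* just for READ I/O */
		HCTX_TYPE_POLL,		/* polled I/O of any kind */

		HCTX_TYPE_MAX,
	};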
Re: [PATCH] VMCI: Use BIT() macro for bit definitions
On 3/11/19 3:09 PM, Vishnu DASA wrote:
> No functional changes, cleanup only.
>
> Reviewed-by: Adit Ranadive
> Reviewed-by: Jorgen Hansen
> Signed-off-by: Vishnu Dasa
> ---
>  include/linux/vmw_vmci_defs.h | 34 +-
>  1 file changed, 17 insertions(+), 17 deletions(-)
>

Now this header file needs to #include <linux/bits.h> or
<linux/bitops.h> for the BIT() macro. Do the users of this header file
build cleanly anyway? Even if so, we prefer not to depend on implicit or
accidental header file inclusions that may change in the future.

> diff --git a/include/linux/vmw_vmci_defs.h b/include/linux/vmw_vmci_defs.h
> index eaa1e762bf06..007e28053da8 100644
> --- a/include/linux/vmw_vmci_defs.h
> +++ b/include/linux/vmw_vmci_defs.h
> @@ -33,27 +33,27 @@
>  #define VMCI_MAX_DEVICES 1
>  
>  /* Status register bits. */
> -#define VMCI_STATUS_INT_ON	0x1
> +#define VMCI_STATUS_INT_ON	BIT(0)
>  
>  /* Control register bits. */
> -#define VMCI_CONTROL_RESET	0x1
> -#define VMCI_CONTROL_INT_ENABLE	0x2
> -#define VMCI_CONTROL_INT_DISABLE	0x4
> +#define VMCI_CONTROL_RESET	BIT(0)
> +#define VMCI_CONTROL_INT_ENABLE	BIT(1)
> +#define VMCI_CONTROL_INT_DISABLE	BIT(2)
>  
>  /* Capabilities register bits. */
> -#define VMCI_CAPS_HYPERCALL	0x1
> -#define VMCI_CAPS_GUESTCALL	0x2
> -#define VMCI_CAPS_DATAGRAM	0x4
> -#define VMCI_CAPS_NOTIFICATIONS	0x8
> -#define VMCI_CAPS_PPN64	0x10
> +#define VMCI_CAPS_HYPERCALL	BIT(0)
> +#define VMCI_CAPS_GUESTCALL	BIT(1)
> +#define VMCI_CAPS_DATAGRAM	BIT(2)
> +#define VMCI_CAPS_NOTIFICATIONS	BIT(3)
> +#define VMCI_CAPS_PPN64	BIT(4)
>  
>  /* Interrupt Cause register bits. */
> -#define VMCI_ICR_DATAGRAM	0x1
> -#define VMCI_ICR_NOTIFICATION	0x2
> +#define VMCI_ICR_DATAGRAM	BIT(0)
> +#define VMCI_ICR_NOTIFICATION	BIT(1)
>  
>  /* Interrupt Mask register bits. */
> -#define VMCI_IMR_DATAGRAM	0x1
> -#define VMCI_IMR_NOTIFICATION	0x2
> +#define VMCI_IMR_DATAGRAM	BIT(0)
> +#define VMCI_IMR_NOTIFICATION	BIT(1)
>  
>  /* Maximum MSI/MSI-X interrupt vectors in the device. */
>  #define VMCI_MAX_INTRS 2
> @@ -463,9 +463,9 @@ struct vmci_datagram {
>   * datagram callback is invoked in a delayed context (not interrupt context).
>   */
>  #define VMCI_FLAG_DG_NONE	0
> -#define VMCI_FLAG_WELLKNOWN_DG_HND	0x1
> -#define VMCI_FLAG_ANYCID_DG_HND	0x2
> -#define VMCI_FLAG_DG_DELAYED_CB	0x4
> +#define VMCI_FLAG_WELLKNOWN_DG_HND	BIT(0)
> +#define VMCI_FLAG_ANYCID_DG_HND	BIT(1)
> +#define VMCI_FLAG_DG_DELAYED_CB	BIT(2)
>  
>  /*
>   * Maximum supported size of a VMCI datagram for routable datagrams.
> @@ -694,7 +694,7 @@ struct vmci_qp_detach_msg {
>  };
>  
>  /* VMCI Doorbell API. */
> -#define VMCI_FLAG_DELAYED_CB	0x01
> +#define VMCI_FLAG_DELAYED_CB	BIT(0)
>  
>  typedef void (*vmci_callback) (void *client_data);
>  

~Randy
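The follow-up Randy is asking for would be a one-line addition to the
header's include block (a sketch; <linux/bitops.h> would also pull the
macro in):

	#include <linux/bits.h>		/* for BIT() */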
Re: [PATCH] drm/bochs: Fix NULL dereference on atomic_disable helper
On Mon, Mar 11, 2019 at 02:49:58PM -0300, Rodrigo Siqueira wrote:
> On 03/11, Gerd Hoffmann wrote:
> >   Hi,
> >
> > > > IIRC the drm code checks for the atomic_enable callback presence
> > > > to figure out whether it should take the atomic or legacy code
> > > > paths.
> > >
> > > It should check for drm_driver->mode_config.funcs.atomic_commit for
> > > that, see drm_drv_uses_atomic_modeset(). Anything else should be a
> > > bug.
> > >
> > > Or do you mean the fallback to the old crtc helper prepare/commit
> > > callbacks?
> >
> > Probably the latter. There was some reason why I've left in the empty
> > bochs_crtc_atomic_enable() callback ...
>
> Just to check before I start to work on this patch:
> the correct solution should be to make atomic_enable and atomic_disable
> optional, right? I should do that, and then check whether the bochs
> driver really needs bochs_crtc_atomic_enable after my change, right?

Yup. I just tried to remember why we haven't done this yet, but I think
that was a patch to make crtc->helper_funcs optional. And that doesn't
make sense imo, since if your crtc doesn't do anything then you don't
really have an atomic driver :-) And if there's ever a legit use case
for this, then that driver probably shouldn't use the atomic helpers ...

But making crtc_helper_funcs->atomic_enable/disable optional sounds like
a good idea.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
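Concretely, "optional" here would mean NULL checks at the helper call
sites, roughly like this (a sketch against the drm_atomic_helper.c of
this era; the surrounding commit machinery is omitted):

	/* in the commit path, for each crtc needing a modeset */
	if (funcs->atomic_enable)
		funcs->atomic_enable(crtc, old_crtc_state);

	/* and on the disable side */
	if (funcs->atomic_disable)
		funcs->atomic_disable(crtc, old_crtc_state);

With that in place, the empty bochs_crtc_atomic_enable()/_disable()
stubs could simply be dropped from the driver.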
Re: [RFC PATCH V2 0/5] vhost: accelerate metadata access through vmap()
From: "Michael S. Tsirkin" Date: Mon, 11 Mar 2019 09:59:28 -0400 > On Mon, Mar 11, 2019 at 03:13:17PM +0800, Jason Wang wrote: >> >> On 2019/3/8 下午10:12, Christoph Hellwig wrote: >> > On Wed, Mar 06, 2019 at 02:18:07AM -0500, Jason Wang wrote: >> > > This series tries to access virtqueue metadata through kernel virtual >> > > address instead of copy_user() friends since they had too much >> > > overheads like checks, spec barriers or even hardware feature >> > > toggling. This is done through setup kernel address through vmap() and >> > > resigter MMU notifier for invalidation. >> > > >> > > Test shows about 24% improvement on TX PPS. TCP_STREAM doesn't see >> > > obvious improvement. >> > How is this going to work for CPUs with virtually tagged caches? >> >> >> Anything different that you worry? > > If caches have virtual tags then kernel and userspace view of memory > might not be automatically in sync if they access memory > through different virtual addresses. You need to do things like > flush_cache_page, probably multiple times. "flush_dcache_page()" ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 0/5] Clean up TTM mmap offsets
On Mon, Mar 11, 2019 at 05:51:39PM +0100, Christian König wrote:
> On 11.03.19 at 17:39, Hans de Goede wrote:
> > Hi,
> >
> > On 07-02-19 09:59, Thomas Zimmermann wrote:
> > > Almost all TTM-based drivers use the same values for the mmap-able
> > > range of BO addresses. Each driver therefore duplicates the
> > > DRM_FILE_PAGE_OFFSET constant. OTOH, the mmap range's size is not
> > > configurable by drivers.
> > >
> > > This patch set replaces driver-specific configuration with a single
> > > setup. All code is located within TTM. TTM and GEM share the same
> > > range for mmap-able BOs.
> > >
> > > Thomas Zimmermann (5):
> > >   staging/vboxvideo: Use same BO mmap offset as other drivers
> > >   drm/ttm: Define a single DRM_FILE_PAGE_OFFSET constant
> > >   drm/ttm: Remove file_page_offset parameter from ttm_bo_device_init()
> > >   drm/ttm: Quick-test mmap offset in ttm_bo_mmap()
> > >   drm: Use the same mmap-range offset and size for GEM and TTM
> >
> > Note I'm about to push a patch series to drm-misc-next which moves
> > vboxvideo out of staging, and I see that this series has not landed
> > in drm-misc-next yet, so it will need to be rebased.
>
> Mhm, TTM is usually not pushed upstream through drm-misc-next, so that
> will certainly collide with the next TTM pull request.
>
> So can you wait with that, or should I make an exception and merge this
> change through drm-misc-next?

Other options:
- Get amdgpu added to drm-tip and linux-next so we get a heads-up about
  the conflict. That's usually good enough to avoid a broken merge.
- Do a topic branch, pull it into both trees.
- Really stuff ttm into a shared tree if it's meant to be shared
  infrastructure :-P

Waiting + rebasing is imo the worst option, and usually not needed.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Re: [PATCH 0/5] Clean up TTM mmap offsets
Hi,

On 11-03-19 17:51, Christian König wrote:
> On 11.03.19 at 17:39, Hans de Goede wrote:
> > Hi,
> >
> > On 07-02-19 09:59, Thomas Zimmermann wrote:
> > > Almost all TTM-based drivers use the same values for the mmap-able
> > > range of BO addresses. Each driver therefore duplicates the
> > > DRM_FILE_PAGE_OFFSET constant. OTOH, the mmap range's size is not
> > > configurable by drivers.
> > >
> > > This patch set replaces driver-specific configuration with a single
> > > setup. All code is located within TTM. TTM and GEM share the same
> > > range for mmap-able BOs.
> > >
> > > Thomas Zimmermann (5):
> > >   staging/vboxvideo: Use same BO mmap offset as other drivers
> > >   drm/ttm: Define a single DRM_FILE_PAGE_OFFSET constant
> > >   drm/ttm: Remove file_page_offset parameter from ttm_bo_device_init()
> > >   drm/ttm: Quick-test mmap offset in ttm_bo_mmap()
> > >   drm: Use the same mmap-range offset and size for GEM and TTM
> >
> > Note I'm about to push a patch series to drm-misc-next which moves
> > vboxvideo out of staging, and I see that this series has not landed
> > in drm-misc-next yet, so it will need to be rebased.
>
> Mhm, TTM is usually not pushed upstream through drm-misc-next, so that
> will certainly collide with the next TTM pull request.

Ugh, I didn't realize that this series would not be going through
drm-misc-next.

> So can you wait with that, or should I make an exception and merge this
> change through drm-misc-next?

I've already pushed it now :|  My mail was intended more as a heads-up
than because I expected an objection, sorry.

I see 2 possible solutions:

1) Merge drm-misc-next into the ttm tree (probably the cleanest)
2) Push your series through drm-misc-next

Regards,

Hans
Re: [PATCH 0/5] Clean up TTM mmap offsets
Hi,

On 07-02-19 09:59, Thomas Zimmermann wrote:
> Almost all TTM-based drivers use the same values for the mmap-able
> range of BO addresses. Each driver therefore duplicates the
> DRM_FILE_PAGE_OFFSET constant. OTOH, the mmap range's size is not
> configurable by drivers.
>
> This patch set replaces driver-specific configuration with a single
> setup. All code is located within TTM. TTM and GEM share the same
> range for mmap-able BOs.
>
> Thomas Zimmermann (5):
>   staging/vboxvideo: Use same BO mmap offset as other drivers
>   drm/ttm: Define a single DRM_FILE_PAGE_OFFSET constant
>   drm/ttm: Remove file_page_offset parameter from ttm_bo_device_init()
>   drm/ttm: Quick-test mmap offset in ttm_bo_mmap()
>   drm: Use the same mmap-range offset and size for GEM and TTM

Note I'm about to push a patch series to drm-misc-next which moves
vboxvideo out of staging, and I see that this series has not landed in
drm-misc-next yet, so it will need to be rebased.

Regards,

Hans
Re: [RFC PATCH V2 5/5] vhost: access vq metadata through kernel virtual address
On Thu 07-03-19 16:27:17, Andrea Arcangeli wrote:
> > driver that GUPed the page for hours/days/weeks/months ... obviously
> > the race window is big enough here. It affects many fs (ext4, xfs,
> > ...) in different ways. I think ext4 is the most obvious because of
> > the kernel log trace it leaves behind.
> >
> > Bottom line is for set_page_dirty to be safe you need the following:
> >   lock_page()
> >   page_mkwrite()
> >   set_pte_with_write()
> >   unlock_page()
>
> I also wondered why ext4 writepage doesn't recreate the bhs if they got
> dropped by the VM and page->private is 0. I mean, page->index and
> page->mapping are still there; that's enough info for writepage itself
> to take a slow path and call page_mkwrite to find where to write the
> page on disk.

There are two problems:

1) What to do with errors that page_mkwrite() can generate (ENOMEM,
ENOSPC, EIO). On a page fault you just propagate them to userspace; in
set_page_dirty() you have no chance, so you just silently lose data.

2) We need various locks to protect page_mkwrite(), and possibly do some
IO. set_page_dirty() is a rather uncertain context in which to acquire
locks or do IO...

								Honza
-- 
Jan Kara
SUSE Labs, CR
Re: [RFC PATCH V2 0/5] vhost: accelerate metadata access through vmap()
On Mon, Mar 11, 2019 at 03:13:17PM +0800, Jason Wang wrote:
>
> On 2019/3/8 10:12 PM, Christoph Hellwig wrote:
> > On Wed, Mar 06, 2019 at 02:18:07AM -0500, Jason Wang wrote:
> > > This series tries to access virtqueue metadata through kernel
> > > virtual address instead of copy_user() friends, since they had too
> > > much overhead like checks, spec barriers or even hardware feature
> > > toggling. This is done by setting up a kernel address through
> > > vmap() and registering an MMU notifier for invalidation.
> > >
> > > Test shows about 24% improvement on TX PPS. TCP_STREAM doesn't see
> > > obvious improvement.
> > How is this going to work for CPUs with virtually tagged caches?
>
> Anything different that you worry about?

If caches have virtual tags then the kernel and userspace views of
memory might not be automatically in sync if they access memory through
different virtual addresses. You need to do things like
flush_cache_page, probably multiple times.

> I can run a test, but do you know of any
> archs that use virtually tagged caches?

sparc, I believe.

> Thanks
Re: [RFC PATCH V2 5/5] vhost: access vq metadata through kernel virtual address
On Mon, Mar 11, 2019 at 08:48:37AM -0400, Michael S. Tsirkin wrote:
> Using copyXuser is better I guess.

It certainly would be faster there, but I don't think it's needed if
that would be the only use case left that justifies supporting two
different models. On small 32bit systems with little RAM, kmap won't
perform measurably differently on 32bit or 64bit systems. If the 32bit
host has a lot of RAM, it all gets slow anyway when accessing RAM above
the direct mapping compared to 64bit host kernels; it's not just an
issue for vhost + mmu notifier + kmap, and the best way to optimize
things is to run 64bit host kernels.

Like Christoph pointed out, the main use case for retaining the
copy-user model would be CPUs with virtually indexed, not physically
tagged, data caches (they'll still suffer from the spectre-v1 fix,
although I'd rule out that they have to suffer the SMAP
slowdown/feature). Those may require some additional flushing beyond
what the current copy-user model requires.

As a rule of thumb, any arch where copy_user_page doesn't define as
copy_page will require some additional cache flushing after the kmap.
Supposedly with vmap, the vmap layer should have taken care of that (I
didn't verify that yet).

There are some accessories like copy_to_user_page()/
copy_from_user_page() that could work and obviously define to raw
memcpy on x86 (the main con is they don't provide word-granular
access), and at least on sparc they're tailored to ptrace assumptions,
so we'd need to evaluate what happens if this is used outside of ptrace
context. kmap has generally been used to access whole pages (i.e.
copy_user_page), so ptrace may actually be the only use case with
subpage-granular access.

	#define copy_to_user_page(vma, page, vaddr, dst, src, len)	\
		do {							\
			flush_cache_page(vma, vaddr, page_to_pfn(page));\
			memcpy(dst, src, len);				\
			flush_ptrace_access(vma, page, vaddr, src, len, 0); \
		} while (0)

So I wouldn't rule out the need for a dual model, until we solve how to
run this stably on non-x86 arches with not-physically-tagged caches.

Thanks,
Andrea
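The rule of thumb is easy to check per arch. On x86, where D-caches are
physically indexed, copy_user_page is literally copy_page and no extra
flushing is needed; the definition there is (quoted from memory, from
arch/x86/include/asm/page.h):

	#define copy_user_page(to, from, vaddr, pg)	copy_page(to, from)

Arches with aliasing caches instead define it with explicit flushes of
the user and kernel aliases around the copy.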
Re: [PATCH] drm/bochs: Fix NULL dereference on atomic_disable helper
  Hi,

> > IIRC the drm code checks for the atomic_enable callback presence to
> > figure out whether it should take the atomic or legacy code paths.
>
> It should check for drm_driver->mode_config.funcs.atomic_commit for
> that, see drm_drv_uses_atomic_modeset(). Anything else should be a bug.
>
> Or do you mean the fallback to the old crtc helper prepare/commit
> callbacks?

Probably the latter. There was some reason why I've left in the empty
bochs_crtc_atomic_enable() callback ...

cheers,
  Gerd
Re: [PATCH] drm/bochs: Fix NULL dereference on atomic_disable helper
On Mon, Mar 11, 2019 at 02:07:16PM +0100, Gerd Hoffmann wrote:
>   Hi,
>
> > > > static void bochs_crtc_atomic_flush(struct drm_crtc *crtc,
> > > > 				    struct drm_crtc_state *old_crtc_state)
> > > > {
> > > > @@ -66,6 +71,7 @@ static const struct drm_crtc_funcs bochs_crtc_funcs = {
> > > >  static const struct drm_crtc_helper_funcs bochs_helper_funcs = {
> > > >  	.mode_set_nofb = bochs_crtc_mode_set_nofb,
> > > >  	.atomic_enable = bochs_crtc_atomic_enable,
> > > > +	.atomic_disable = bochs_crtc_atomic_disable,
> > >
> > > Shouldn't we make the callback optional instead of adding empty
> > > dummy functions to drivers?
> >
> > Hi Gerd,
> >
> > I agree, and I can work on this issue.
> > Just one question: should we make atomic_enable optional as well?
>
> IIRC the drm code checks for the atomic_enable callback presence to
> figure out whether it should take the atomic or legacy code paths.

It should check for drm_driver->mode_config.funcs.atomic_commit for
that, see drm_drv_uses_atomic_modeset(). Anything else should be a bug.

Or do you mean the fallback to the old crtc helper prepare/commit
callbacks? We'd need to make all of them optional ofc, with the atomic_
variants being preferred ofc.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Re: [PATCH] drm/bochs: Fix NULL dereference on atomic_disable helper
  Hi,

> > > static void bochs_crtc_atomic_flush(struct drm_crtc *crtc,
> > > 				    struct drm_crtc_state *old_crtc_state)
> > > {
> > > @@ -66,6 +71,7 @@ static const struct drm_crtc_funcs bochs_crtc_funcs = {
> > >  static const struct drm_crtc_helper_funcs bochs_helper_funcs = {
> > >  	.mode_set_nofb = bochs_crtc_mode_set_nofb,
> > >  	.atomic_enable = bochs_crtc_atomic_enable,
> > > +	.atomic_disable = bochs_crtc_atomic_disable,
> >
> > Shouldn't we make the callback optional instead of adding empty dummy
> > functions to drivers?
>
> Hi Gerd,
>
> I agree, and I can work on this issue.
> Just one question: should we make atomic_enable optional as well?

IIRC the drm code checks for the atomic_enable callback presence to
figure out whether it should take the atomic or legacy code paths.

So, I think that will not work.

cheers,
  Gerd
Re: [RFC PATCH V2 5/5] vhost: access vq metadata through kernel virtual address
On Mon, Mar 11, 2019 at 03:40:31PM +0800, Jason Wang wrote:
>
> On 2019/3/9 3:48 AM, Andrea Arcangeli wrote:
> > Hello Jason,
> >
> > On Fri, Mar 08, 2019 at 04:50:36PM +0800, Jason Wang wrote:
> > > Just to make sure I understand here. For boosting through huge TLB,
> > > do you mean we can do that in the future (e.g. by mapping more
> > > userspace pages to kernel) or it can be done by this series (only
> > > about three 4K pages were vmapped per virtqueue)?
> > When I answered about the advantages of mmu notifier and I mentioned
> > guaranteed 2m/gigapages where available, I overlooked the detail that
> > you were using vmap instead of kmap. So with vmap you're actually
> > doing the opposite: it slows down the access because it will always
> > use a 4k TLB even if QEMU runs on THP or gigapages hugetlbfs.
> >
> > If there's just one page (or a few pages) in each vmap there's no
> > need of vmap; the linearity vmap provides doesn't pay off in such a
> > case.
> >
> > So likely there's further room for improvement here that you can
> > achieve in the current series by just dropping vmap/vunmap.
> >
> > You can just use kmap (or kmap_atomic if you're in a preemptible
> > section; it should work from bh/irq).
> >
> > In short, the mmu notifier to invalidate only sets a "struct page *
> > userringpage" pointer to NULL without calls to vunmap.
> >
> > In all cases immediately after gup_fast returns you can always call
> > put_page immediately (which explains why I'd like an option to drop
> > FOLL_GET from gup_fast to speed it up).
> >
> > Then you can check the sequence counter and the inc/dec counter
> > increased by _start/_end. That will tell you if the page you got --
> > and immediately unpinned with put_page, or even freed -- is
> > guaranteed not to go away under you until the invalidate is called.
> >
> > If the sequence counter and counter tell you that gup_fast raced with
> > any mmu notifier invalidate, you can just repeat gup_fast. Otherwise
> > you're done: the page cannot go away under you, and the host virtual
> > to host physical mapping cannot change either. And the page is not
> > pinned either. So you can just set "struct page * userringpage =
> > page" where "page" was the one set up by gup_fast.
> >
> > When later the invalidate runs, you can just call set_page_dirty if
> > gup_fast was called with "write = 1" and then you clear the pointer:
> > "userringpage = NULL".
> >
> > When you need to read/write to the memory,
> > kmap/kmap_atomic(userringpage) should work.
>
> Yes, I've considered kmap() from the start. The reason I don't do that
> is a large virtqueue may need more than one page, so the VA might not
> be contiguous. But this is probably not a big issue which just needs
> more tricks in the vhost memory accessors.
>
> > In short, because there's no hardware involvement here, the
> > established mapping is just the pointer to the page; there is no need
> > of setting up any pagetables or doing any TLB flushes (except on
> > 32bit archs if the page is above the direct mapping, but that never
> > happens on 64bit archs).
>
> I see. I believe we don't care much about the performance of 32bit
> archs (or we can just fall back to copy_to_user() friends).

Using copyXuser is better I guess.

> Using the direct mapping (I guess the kernel will always try hugepages
> for that?) should be better and we can even use it for the data
> transfer, not only for the metadata.
>
> Thanks

We can't really. The big issue is get_user_pages. Doing that on the data
path will be slower than copyXuser. Or maybe it won't with the amount of
mitigations spread around. Go ahead and try.

> > Thanks,
> > Andrea
Re: [RFC PATCH V2 5/5] vhost: access vq metadata through kernel virtual address
On 2019/3/9 3:48 AM, Andrea Arcangeli wrote:
> Hello Jason,
>
> On Fri, Mar 08, 2019 at 04:50:36PM +0800, Jason Wang wrote:
> > Just to make sure I understand here. For boosting through huge TLB,
> > do you mean we can do that in the future (e.g. by mapping more
> > userspace pages to kernel) or it can be done by this series (only
> > about three 4K pages were vmapped per virtqueue)?
> When I answered about the advantages of mmu notifier and I mentioned
> guaranteed 2m/gigapages where available, I overlooked the detail that
> you were using vmap instead of kmap. So with vmap you're actually
> doing the opposite: it slows down the access because it will always
> use a 4k TLB even if QEMU runs on THP or gigapages hugetlbfs.
>
> If there's just one page (or a few pages) in each vmap there's no need
> of vmap; the linearity vmap provides doesn't pay off in such a case.
>
> So likely there's further room for improvement here that you can
> achieve in the current series by just dropping vmap/vunmap.
>
> You can just use kmap (or kmap_atomic if you're in a preemptible
> section; it should work from bh/irq).
>
> In short, the mmu notifier to invalidate only sets a "struct page *
> userringpage" pointer to NULL without calls to vunmap.
>
> In all cases immediately after gup_fast returns you can always call
> put_page immediately (which explains why I'd like an option to drop
> FOLL_GET from gup_fast to speed it up).
>
> Then you can check the sequence counter and the inc/dec counter
> increased by _start/_end. That will tell you if the page you got --
> and immediately unpinned with put_page, or even freed -- is
> guaranteed not to go away under you until the invalidate is called.
>
> If the sequence counter and counter tell you that gup_fast raced with
> any mmu notifier invalidate, you can just repeat gup_fast. Otherwise
> you're done: the page cannot go away under you, and the host virtual
> to host physical mapping cannot change either. And the page is not
> pinned either. So you can just set "struct page * userringpage = page"
> where "page" was the one set up by gup_fast.
>
> When later the invalidate runs, you can just call set_page_dirty if
> gup_fast was called with "write = 1" and then you clear the pointer:
> "userringpage = NULL".
>
> When you need to read/write to the memory,
> kmap/kmap_atomic(userringpage) should work.

Yes, I've considered kmap() from the start. The reason I don't do that
is a large virtqueue may need more than one page, so the VA might not be
contiguous. But this is probably not a big issue which just needs more
tricks in the vhost memory accessors.

> In short, because there's no hardware involvement here, the
> established mapping is just the pointer to the page; there is no need
> of setting up any pagetables or doing any TLB flushes (except on 32bit
> archs if the page is above the direct mapping, but that never happens
> on 64bit archs).

I see. I believe we don't care much about the performance of 32bit archs
(or we can just fall back to copy_to_user() friends). Using the direct
mapping (I guess the kernel will always try hugepages for that?) should
be better and we can even use it for the data transfer, not only for the
metadata.

Thanks

> Thanks,
> Andrea
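Put together, the retry loop Andrea describes would look something like
this (a sketch only; the map structure and its fields are hypothetical,
maintained from the notifier's _start/_end callbacks along the lines of
KVM's mmu_notifier_seq pattern):

	struct vq_ring_map {			/* hypothetical */
		unsigned long	uaddr;
		struct page	*userringpage;	/* NULLed by the invalidate */
		unsigned int	mmu_seq;	/* bumped in _end */
		int		notifier_count;	/* ++ in _start, -- in _end */
	};

	static int vq_map_ring_page(struct vq_ring_map *map)
	{
		struct page *page;
		unsigned int seq;
		int r;

	again:
		seq = READ_ONCE(map->mmu_seq);
		smp_rmb();

		r = get_user_pages_fast(map->uaddr, 1, 1, &page);
		if (r != 1)
			return r < 0 ? r : -EFAULT;
		put_page(page);		/* drop the pin right away */

		smp_rmb();
		if (READ_ONCE(map->notifier_count) ||
		    READ_ONCE(map->mmu_seq) != seq)
			goto again;	/* raced with an invalidate */

		map->userringpage = page;	/* kmap_atomic() to access */
		return 0;
	}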
Re: [RFC PATCH V2 5/5] vhost: access vq metadata through kernel virtual address
On 2019/3/9 3:11 AM, Andrea Arcangeli wrote:
> On Fri, Mar 08, 2019 at 05:13:26PM +0800, Jason Wang wrote:
> > Actually not wrapping around: the pages for the used ring were marked
> > as dirty after a round of virtqueue processing when we're sure vhost
> > wrote something there.
> Thanks for the clarification. So we need to convert it to
> set_page_dirty and move it to the mmu notifier invalidate, but only in
> those cases where gup_fast was called with write=1 (1 out of 3).
>
> If using ->invalidate_range, the page pin also must be removed
> immediately after get_user_pages returns (it's not ok to hold the pin
> in vmap until ->invalidate_range is called) to avoid false positive
> gup pin checks in things like KSM, or the pin must be released in
> invalidate_range_start (which is called before the pin checks).
>
> Here's why:
>
> 	/*
> 	 * Check that no O_DIRECT or similar I/O is in progress on the
> 	 * page
> 	 */
> 	if (page_mapcount(page) + 1 + swapped != page_count(page)) {
> 		set_pte_at(mm, pvmw.address, pvmw.pte, entry);
> 		goto out_unlock;
> 	}
> 	[..]
> 	set_pte_at_notify(mm, pvmw.address, pvmw.pte, entry);
> 	^^^ too late to release the pin here, the above already failed
>
> ->invalidate_range cannot be used with a mutex anyway, so you need to
> go back to invalidate_range_start/end anyway; it's just that the pin
> must be released in _start at the latest in that case.

Yes.

> My preference is generally to call gup_fast() followed by an immediate
> put_page(), because I always want to drop FOLL_GET from gup_fast as a
> whole to avoid 2 useless atomic ops per gup_fast.

Ok, will do this (if I still plan to use vmap() in the next version).

> I'll write more about vmap in answer to the other email.
>
> Thanks,
> Andrea

Thanks
Re: [RFC PATCH V2 5/5] vhost: access vq metadata through kernel virtual address
On 2019/3/8 10:58 PM, Jerome Glisse wrote:
> On Fri, Mar 08, 2019 at 04:50:36PM +0800, Jason Wang wrote:
> > On 2019/3/8 3:16 AM, Andrea Arcangeli wrote:
> > > On Thu, Mar 07, 2019 at 12:56:45PM -0500, Michael S. Tsirkin wrote:
> > > > On Thu, Mar 07, 2019 at 10:47:22AM -0500, Michael S. Tsirkin wrote:
> > > > > On Wed, Mar 06, 2019 at 02:18:12AM -0500, Jason Wang wrote:
> > > > > > +static const struct mmu_notifier_ops vhost_mmu_notifier_ops = {
> > > > > > +	.invalidate_range = vhost_invalidate_range,
> > > > > > +};
> > > > > > +
> > > > > >  void vhost_dev_init(struct vhost_dev *dev,
> > > > > > 		    struct vhost_virtqueue **vqs, int nvqs, int iov_limit)
> > > > > >  {
> > > > > I also wonder here: when a page is write-protected then it does
> > > > > not look like .invalidate_range is invoked. E.g. mm/ksm.c calls
> > > > > mmu_notifier_invalidate_range_start and
> > > > > mmu_notifier_invalidate_range_end but not
> > > > > mmu_notifier_invalidate_range. Similarly, rmap in
> > > > > page_mkclean_one will not call mmu_notifier_invalidate_range.
> > > > > If I'm right, vhost won't get notified when a page is
> > > > > write-protected since you didn't install start/end notifiers.
> > > > > Note that the end notifier can be called with the page locked,
> > > > > so it's not as straightforward as just adding a call. Writing
> > > > > into a write-protected page isn't a good idea.
> > > > >
> > > > > Note that documentation says:
> > > > > 	it is fine to delay the mmu_notifier_invalidate_range
> > > > > 	call to mmu_notifier_invalidate_range_end() outside the
> > > > > 	page table lock.
> > > > > implying it's called just later.
> > > > OK, I missed the fact that _end actually calls
> > > > mmu_notifier_invalidate_range internally. So that part is fine,
> > > > but the fact that you are trying to take the page lock under the
> > > > VQ mutex and take the same mutex within the notifier probably
> > > > means it's broken for ksm and rmap at least, since these call
> > > > invalidate with the lock taken.
> >
> > Yes, this lock inversion needs more thought.
> >
> > > > And generally, Andrea told me offline one can not take a mutex
> > > > under the notifier callback. I CC'd Andrea for why.
> > > Yes, the problem then is that ->invalidate_page is called under the
> > > PT lock, so it cannot take a mutex; you also cannot take the
> > > page_lock. It can at most take a spinlock or trylock_page.
> > >
> > > So it must switch back to the _start/_end methods unless you
> > > rewrite the locking.
> > >
> > > The difference with _start/_end is that ->invalidate_range avoids
> > > the _start callback basically, but to avoid the _start callback
> > > safely, it has to be called in between the ptep_clear_flush and the
> > > set_pte_at whenever the pfn changes, like during a COW. So it
> > > cannot be coalesced into a single TLB flush that invalidates all
> > > sptes in a range like we prefer for performance reasons, for
> > > example in KVM. It also cannot sleep. In short, ->invalidate_range
> > > must be really fast (it shouldn't require sending IPIs to all other
> > > CPUs like KVM may require during an invalidate_range_start) and it
> > > must not sleep, in order for it to be preferable to _start/_end.
> > >
> > > I.e. the invalidate of the secondary MMU that walks the linux
> > > pagetables in hardware (in the vhost case with GUP in software) has
> > > to happen while the linux pagetable is zero; otherwise a concurrent
> > > hardware pagetable lookup could re-instantiate a mapping to the old
> > > page in between the set_pte_at and the invalidate_range_end (which
> > > internally calls ->invalidate_range). Jerome documented it nicely
> > > in Documentation/vm/mmu_notifier.rst .
> >
> > Right, I've actually gone through this several times, but some
> > details were obviously missed by me.
> >
> > > Now you don't really walk the pagetable in hardware in vhost, but
> > > if you use gup_fast after usemm() it's similar.
> > >
> > > For vhost the invalidate would be really fast: there are no IPIs to
> > > deliver at all; the problem is just the mutex.
> >
> > Yes. A possible solution is to introduce a valid flag for the VA.
> > Vhost may only try to access the kernel VA when it is valid.
> > Invalidate_range_start() will clear this flag under the protection of
> > the vq mutex when it can block, and invalidate_range_end() can then
> > set it again. An issue is that blockable is always false for
> > range_end().
>
> Note that there can be multiple asynchronous concurrent
> invalidate_range callbacks. So a flag does not work, but a counter of
> the number of active invalidations would. See how KVM is doing it, for
> instance in kvm_main.c.
>
> The pattern for this kind of thing is:
>
>   my_invalidate_range_start(start, end) {
>       ...
>       if (mystruct_overlap(mystruct, start, end)) {
>           mystruct_lock();
>           mystruct->invalidate_count++;
>           ...
>           mystruct_unlock();
>       }
>   }
>
>   my_invalidate_range_end(start, end) {
>       ...
>       if (mystruct_overlap(mystruct, start, end)) {
>           mystruct_lock();
>           mystruct->invalidate_count--;
>           ...
>           mystruct_unlock();
>       }
>   }
>
>   my_access_va(mystruct) {
>   again:
>       wait_on(!mystruct->invalidate_count)
>       mystruct_lock();
>       if (mystruct->invalidate_count) {
>           mystruct_unlock();
>           goto again;
>       }
>       GUP();
>       ...
>       mystruct_unlock();
>   }
>
> Cheers,
> Jérôme

Yes, this should work.

Thanks
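One concrete rendering of Jerome's pattern, with a spinlock and a
waitqueue standing in for mystruct_lock()/wait_on() (a sketch; the
fields would need the obvious initialization):

	struct mystruct {
		spinlock_t lock;
		int invalidate_count;		/* active invalidations */
		wait_queue_head_t wq;
	};

	static void my_invalidate_range_start(struct mystruct *s)
	{
		spin_lock(&s->lock);
		s->invalidate_count++;
		spin_unlock(&s->lock);
	}

	static void my_invalidate_range_end(struct mystruct *s)
	{
		spin_lock(&s->lock);
		if (!--s->invalidate_count)
			wake_up(&s->wq);
		spin_unlock(&s->lock);
	}

	static void my_access_va(struct mystruct *s)
	{
	again:
		wait_event(s->wq, READ_ONCE(s->invalidate_count) == 0);
		spin_lock(&s->lock);
		if (s->invalidate_count) {	/* a new one started; retry */
			spin_unlock(&s->lock);
			goto again;
		}
		/* safe to gup/access the mapping here, under the lock */
		spin_unlock(&s->lock);
	}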
Re: [RFC PATCH V2 0/5] vhost: accelerate metadata access through vmap()
On 2019/3/8 10:12 PM, Christoph Hellwig wrote:
> On Wed, Mar 06, 2019 at 02:18:07AM -0500, Jason Wang wrote:
> > This series tries to access virtqueue metadata through kernel virtual
> > address instead of copy_user() friends, since they had too much
> > overhead like checks, spec barriers or even hardware feature
> > toggling. This is done by setting up a kernel address through vmap()
> > and registering an MMU notifier for invalidation.
> >
> > Test shows about 24% improvement on TX PPS. TCP_STREAM doesn't see
> > obvious improvement.
> How is this going to work for CPUs with virtually tagged caches?

Anything different that you worry about? I can run a test, but do you
know of any archs that use virtually tagged caches?

Thanks