RE: Introduce a new helper framework for buffer synchronization
> -Original Message- > From: daniel.vet...@ffwll.ch [mailto:daniel.vet...@ffwll.ch] On Behalf Of > Daniel Vetter > Sent: Wednesday, May 29, 2013 1:50 AM > To: Inki Dae > Cc: Rob Clark; Maarten Lankhorst; linux-fbdev; YoungJun Cho; Kyungmin Park; > myungjoo.ham; DRI mailing list; linux-arm-ker...@lists.infradead.org; > linux-me...@vger.kernel.org > Subject: Re: Introduce a new helper framework for buffer synchronization > > On Tue, May 28, 2013 at 4:50 PM, Inki Dae wrote: > > I think I already used reservation stuff any time in that way except > > ww-mutex. And I'm not sure that embedded system really needs ww-mutex. > If > > there is any case, > > could you tell me the case? I really need more advice and > understanding :) > > If you have only one driver, you can get away without ww_mutex. > drm/i915 does it, all buffer state is protected by dev->struct_mutex. > But as soon as you have multiple drivers sharing buffers with dma_buf > things will blow up. > > Yep, current prime is broken and can lead to deadlocks. > > In practice it doesn't (yet) matter since only the X server does the > sharing dance, and that one's single-threaded. Now you can claim that > since you have all buffers pinned in embedded gfx anyway, you don't > care. But both in desktop gfx and embedded gfx the real fun starts > once you put fences into the mix and link them up with buffers, then > every command submission risks that deadlock. Furthermore you can get > unlucky and construct a circle of fences waiting on one another (only > though if the fence signalling fires off the next batchbuffer > asynchronously). In our case, we have never actually hit a deadlock yet, but it is still possible to run into one when a process shares two buffers with another process, like below: Process A commits buffer A and waits for buffer B, while Process B commits buffer B and waits for buffer A. That is a deadlock, and it seems you are saying we can resolve the deadlock issue with ww-mutexes. It also seems that we could replace our block-wakeup mechanism with mutex locks for better performance. > > To prevent such deadlocks you _absolutely_ need to lock _all_ buffers > that take part in a command submission at once. To do that you either > need a global lock (ugh) or ww_mutexes. > > So ww_mutexes are the fundamental ingredient of all this, if you don't > see why you need them then everything piled on top is broken. I think > until you've understood why exactly we need ww_mutexes there's not > much point in discussing the finer issues of fences, reservation > objects and how to integrate it with dma_bufs exactly. > > I'll try to clarify the motivating example in the ww_mutex > documentation a bit, but I dunno how else I could explain this ... > I really don't want you to waste your time on me. I will try to apply ww-mutexes (v4) to the proposed framework to understand them better. Thanks for your advice.:) Inki Dae > Yours, Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
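For reference, the deadlock described above (each process holding one buffer and waiting for the other) is exactly what the wound/wait protocol resolves: the losing context backs off, drops every lock it holds, waits for the contended lock without holding anything, and retries. Below is a minimal sketch of the lock-all-buffers loop, loosely following the example in Documentation/ww-mutex-design.txt; struct buffer_object and its embedded ww_mutex are illustrative names only, not an existing driver structure.

#include <linux/ww_mutex.h>

static DEFINE_WW_CLASS(buf_ww_class);

struct buffer_object {
        struct ww_mutex lock;   /* ww_mutex_init(&obj->lock, &buf_ww_class) at creation */
        /* ... buffer state, attached fences, ... */
};

/*
 * Lock every buffer taking part in one submission at once; on -EDEADLK
 * drop everything, wait for the contended lock and retry.
 */
static int lock_buffers(struct buffer_object **bufs, int count,
                        struct ww_acquire_ctx *ctx)
{
        struct buffer_object *contended = NULL;
        int i, ret;

        ww_acquire_init(ctx, &buf_ww_class);
retry:
        for (i = 0; i < count; i++) {
                if (bufs[i] == contended) {
                        /* already taken by ww_mutex_lock_slow() below */
                        contended = NULL;
                        continue;
                }

                ret = ww_mutex_lock(&bufs[i]->lock, ctx);
                if (ret == -EDEADLK) {
                        struct buffer_object *busy = bufs[i];

                        /* we lost the race: back off completely ... */
                        while (i--)
                                ww_mutex_unlock(&bufs[i]->lock);
                        if (contended)
                                ww_mutex_unlock(&contended->lock);

                        /* ... wait for the winner while holding nothing, then retry */
                        ww_mutex_lock_slow(&busy->lock, ctx);
                        contended = busy;
                        goto retry;
                }
        }

        ww_acquire_done(ctx);
        return 0;
}

The caller keeps the locks until the submission's fences are attached, then unlocks everything and calls ww_acquire_fini() on the context. With this pattern the Process A / Process B scenario above degrades into one side backing off and waiting instead of both blocking forever.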
Re: Introduce a new helper framework for buffer synchronization
On Tue, May 28, 2013 at 4:50 PM, Inki Dae wrote: > I think I already used reservation stuff any time in that way except > ww-mutex. And I'm not sure that embedded system really needs ww-mutex. If > there is any case, > could you tell me the case? I really need more advice and understanding :) If you have only one driver, you can get away without ww_mutex. drm/i915 does it, all buffer state is protected by dev->struct_mutex. But as soon as you have multiple drivers sharing buffers with dma_buf things will blow up. Yep, current prime is broken and can lead to deadlocks. In practice it doesn't (yet) matter since only the X server does the sharing dance, and that one's single-threaded. Now you can claim that since you have all buffers pinned in embedded gfx anyway, you don't care. But both in desktop gfx and embedded gfx the real fun starts once you put fences into the mix and link them up with buffers, then every command submission risks that deadlock. Furthermore you can get unlucky and construct a circle of fences waiting on one another (only though if the fence signalling fires off the next batchbuffer asynchronously). To prevent such deadlocks you _absolutely_ need to lock _all_ buffers that take part in a command submission at once. To do that you either need a global lock (ugh) or ww_mutexes. So ww_mutexes are the fundamental ingredient of all this, if you don't see why you need them then everything piled on top is broken. I think until you've understood why exactly we need ww_mutexes there's not much point in discussing the finer issues of fences, reservation objects and how to integrate it with dma_bufs exactly. I'll try to clarify the motivating example in the ww_mutex documentation a bit, but I dunno how else I could explain this ... Yours, Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
RE: Introduce a new helper framework for buffer synchronization
> -Original Message- > From: linux-fbdev-ow...@vger.kernel.org [mailto:linux-fbdev- > ow...@vger.kernel.org] On Behalf Of Rob Clark > Sent: Tuesday, May 28, 2013 10:49 PM > To: Inki Dae > Cc: Maarten Lankhorst; Daniel Vetter; linux-fbdev; YoungJun Cho; Kyungmin > Park; myungjoo.ham; DRI mailing list; linux-arm-ker...@lists.infradead.org; > linux-me...@vger.kernel.org > Subject: Re: Introduce a new helper framework for buffer synchronization > > On Mon, May 27, 2013 at 11:56 PM, Inki Dae wrote: > > > > > >> -Original Message- > >> From: linux-fbdev-ow...@vger.kernel.org [mailto:linux-fbdev- > >> ow...@vger.kernel.org] On Behalf Of Rob Clark > >> Sent: Tuesday, May 28, 2013 12:48 AM > >> To: Inki Dae > >> Cc: Maarten Lankhorst; Daniel Vetter; linux-fbdev; YoungJun Cho; > Kyungmin > >> Park; myungjoo.ham; DRI mailing list; > > linux-arm-ker...@lists.infradead.org; > >> linux-me...@vger.kernel.org > >> Subject: Re: Introduce a new helper framework for buffer > synchronization > >> > >> On Mon, May 27, 2013 at 6:38 AM, Inki Dae wrote: > >> > Hi all, > >> > > >> > I have been removed previous branch and added new one with more > cleanup. > >> > This time, the fence helper doesn't include user side interfaces and > >> cache > >> > operation relevant codes anymore because not only we are not sure > that > >> > coupling those two things, synchronizing caches and buffer access > >> between > >> > CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side > is > >> a > >> > good idea yet but also existing codes for user side have problems > with > >> badly > >> > behaved or crashing userspace. So this could be more discussed later. > >> > > >> > The below is a new branch, > >> > > >> > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > >> exynos.git/?h=dma-f > >> > ence-helper > >> > > >> > And fence helper codes, > >> > > >> > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > >> exynos.git/commit/? > >> > h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005 > >> > > >> > And example codes for device driver, > >> > > >> > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > >> exynos.git/commit/? > >> > h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae > >> > > >> > I think the time is not yet ripe for RFC posting: maybe existing dma > >> fence > >> > and reservation need more review and addition work. So I'd glad for > >> somebody > >> > giving other opinions and advices in advance before RFC posting. > >> > >> thoughts from a *really* quick, pre-coffee, first look: > >> * any sort of helper to simplify single-buffer sort of use-cases (v4l) > >> probably wouldn't want to bake in assumption that seqno_fence is used. > >> * I guess g2d is probably not actually a simple use case, since I > >> expect you can submit blits involving multiple buffers :-P > > > > I don't think so. One and more buffers can be used: seqno_fence also has > > only one buffer. Actually, we have already applied this approach to most > > devices; multimedia, gpu and display controller. And this approach shows > > more performance; reduced power consumption against traditional way. And > g2d > > example is just to show you how to apply my approach to device driver. > > no, you need the ww-mutex / reservation stuff any time you have > multiple independent devices (or rings/contexts for hw that can > support multiple contexts) which can do operations with multiple > buffers. I think I already used reservation stuff any time in that way except ww-mutex. 
And I'm not sure that embedded system really needs ww-mutex. If there is any case, could you tell me the case? I really need more advice and understanding :) Thanks, Inki Dae > So you could conceivably hit this w/ gpu + g2d if multiple > buffers where shared between the two. vram migration and such > 'desktop stuff' might make the problem worse, but just because you > don't have vram doesn't mean you don't have a problem with multiple buffers. > > >> * otherwise, you probably don't want to depend on dmabuf, which is why > >> reservation/fence is split out the way it is.. you want to be able to > >> use a single reserva
RE: Introduce a new helper framework for buffer synchronization
Hi Daniel, Thank you so much. And so very useful.:) Sorry but could be give me more comments to the below my comments? There are still things making me confusing.:( > -Original Message- > From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel > Vetter > Sent: Tuesday, May 28, 2013 7:33 PM > To: Inki Dae > Cc: 'Rob Clark'; 'Maarten Lankhorst'; 'Daniel Vetter'; 'linux-fbdev'; > 'YoungJun Cho'; 'Kyungmin Park'; 'myungjoo.ham'; 'DRI mailing list'; > linux-arm-ker...@lists.infradead.org; linux-me...@vger.kernel.org > Subject: Re: Introduce a new helper framework for buffer synchronization > > On Tue, May 28, 2013 at 12:56:57PM +0900, Inki Dae wrote: > > > > > > > -Original Message- > > > From: linux-fbdev-ow...@vger.kernel.org [mailto:linux-fbdev- > > > ow...@vger.kernel.org] On Behalf Of Rob Clark > > > Sent: Tuesday, May 28, 2013 12:48 AM > > > To: Inki Dae > > > Cc: Maarten Lankhorst; Daniel Vetter; linux-fbdev; YoungJun Cho; > Kyungmin > > > Park; myungjoo.ham; DRI mailing list; > > linux-arm-ker...@lists.infradead.org; > > > linux-me...@vger.kernel.org > > > Subject: Re: Introduce a new helper framework for buffer > synchronization > > > > > > On Mon, May 27, 2013 at 6:38 AM, Inki Dae wrote: > > > > Hi all, > > > > > > > > I have been removed previous branch and added new one with more > cleanup. > > > > This time, the fence helper doesn't include user side interfaces and > > > cache > > > > operation relevant codes anymore because not only we are not sure > that > > > > coupling those two things, synchronizing caches and buffer access > > > between > > > > CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel > side is > > > a > > > > good idea yet but also existing codes for user side have problems > with > > > badly > > > > behaved or crashing userspace. So this could be more discussed later. > > > > > > > > The below is a new branch, > > > > > > > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > > > exynos.git/?h=dma-f > > > > ence-helper > > > > > > > > And fence helper codes, > > > > > > > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > > > exynos.git/commit/? > > > > h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005 > > > > > > > > And example codes for device driver, > > > > > > > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > > > exynos.git/commit/? > > > > h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae > > > > > > > > I think the time is not yet ripe for RFC posting: maybe existing dma > > > fence > > > > and reservation need more review and addition work. So I'd glad for > > > somebody > > > > giving other opinions and advices in advance before RFC posting. > > > > > > thoughts from a *really* quick, pre-coffee, first look: > > > * any sort of helper to simplify single-buffer sort of use-cases (v4l) > > > probably wouldn't want to bake in assumption that seqno_fence is used. > > > * I guess g2d is probably not actually a simple use case, since I > > > expect you can submit blits involving multiple buffers :-P > > > > I don't think so. One and more buffers can be used: seqno_fence also has > > only one buffer. Actually, we have already applied this approach to most > > devices; multimedia, gpu and display controller. And this approach shows > > more performance; reduced power consumption against traditional way. And > g2d > > example is just to show you how to apply my approach to device driver. 
> > Note that seqno_fence is an implementation pattern for a certain type of > direct hw->hw synchronization which uses a shared dma_buf to exchange the > sync cookie. I'm afraid I don't understand hw->hw synchronization. Does hw->hw synchronization mean that the device has a hardware feature which supports buffer synchronization internally? And what is the sync cookie? > The dma_buf attached to the seqno_fence has _nothing_ to do > with the dma_buf the fence actually coordinates access to. > > I think that confusing is a large reason for why Maarten&I don't > understand what you want to achieve with your fence helpers. Currently > they're us
Re: Introduce a new helper framework for buffer synchronization
On Mon, May 27, 2013 at 11:56 PM, Inki Dae wrote: > > >> -Original Message- >> From: linux-fbdev-ow...@vger.kernel.org [mailto:linux-fbdev- >> ow...@vger.kernel.org] On Behalf Of Rob Clark >> Sent: Tuesday, May 28, 2013 12:48 AM >> To: Inki Dae >> Cc: Maarten Lankhorst; Daniel Vetter; linux-fbdev; YoungJun Cho; Kyungmin >> Park; myungjoo.ham; DRI mailing list; > linux-arm-ker...@lists.infradead.org; >> linux-me...@vger.kernel.org >> Subject: Re: Introduce a new helper framework for buffer synchronization >> >> On Mon, May 27, 2013 at 6:38 AM, Inki Dae wrote: >> > Hi all, >> > >> > I have been removed previous branch and added new one with more cleanup. >> > This time, the fence helper doesn't include user side interfaces and >> cache >> > operation relevant codes anymore because not only we are not sure that >> > coupling those two things, synchronizing caches and buffer access >> between >> > CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is >> a >> > good idea yet but also existing codes for user side have problems with >> badly >> > behaved or crashing userspace. So this could be more discussed later. >> > >> > The below is a new branch, >> > >> > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- >> exynos.git/?h=dma-f >> > ence-helper >> > >> > And fence helper codes, >> > >> > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- >> exynos.git/commit/? >> > h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005 >> > >> > And example codes for device driver, >> > >> > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- >> exynos.git/commit/? >> > h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae >> > >> > I think the time is not yet ripe for RFC posting: maybe existing dma >> fence >> > and reservation need more review and addition work. So I'd glad for >> somebody >> > giving other opinions and advices in advance before RFC posting. >> >> thoughts from a *really* quick, pre-coffee, first look: >> * any sort of helper to simplify single-buffer sort of use-cases (v4l) >> probably wouldn't want to bake in assumption that seqno_fence is used. >> * I guess g2d is probably not actually a simple use case, since I >> expect you can submit blits involving multiple buffers :-P > > I don't think so. One and more buffers can be used: seqno_fence also has > only one buffer. Actually, we have already applied this approach to most > devices; multimedia, gpu and display controller. And this approach shows > more performance; reduced power consumption against traditional way. And g2d > example is just to show you how to apply my approach to device driver. no, you need the ww-mutex / reservation stuff any time you have multiple independent devices (or rings/contexts for hw that can support multiple contexts) which can do operations with multiple buffers. So you could conceivably hit this w/ gpu + g2d if multiple buffers where shared between the two. vram migration and such 'desktop stuff' might make the problem worse, but just because you don't have vram doesn't mean you don't have a problem with multiple buffers. >> * otherwise, you probably don't want to depend on dmabuf, which is why >> reservation/fence is split out the way it is.. you want to be able to >> use a single reservation/fence mechanism within your driver without >> having to care about which buffers are exported to dmabuf's and which >> are not. Creating a dmabuf for every GEM bo is too heavyweight. > > Right. But I think we should dealt with this separately. 
Actually, we are > trying to use reservation for gpu pipe line synchronization such as sgx sync > object and this approach is used without dmabuf. In order words, some device > can use only reservation for such pipe line synchronization and at the same > time, fence helper or similar thing with dmabuf for buffer synchronization. it is probably easier to approach from the reverse direction.. ie, get reservation/synchronization right first, and then dmabuf. (Well, that isn't really a problem because Maarten's reservation/fence patches support dmabuf from the beginning.) BR, -R >> >> I'm not entirely sure if reservation/fence could/should be made any >> simpler for multi-buffer users. Probably the best thing to do is just >> get reservation/fence rolled out in a few drivers and see if some >> common patterns emerge. >> >> BR, >> -R >> >> > >> > Thanks, >> > Inki Dae >> > >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: Introduce a new helper framework for buffer synchronization
On Tue, May 28, 2013 at 12:56:57PM +0900, Inki Dae wrote: > > > > -Original Message- > > From: linux-fbdev-ow...@vger.kernel.org [mailto:linux-fbdev- > > ow...@vger.kernel.org] On Behalf Of Rob Clark > > Sent: Tuesday, May 28, 2013 12:48 AM > > To: Inki Dae > > Cc: Maarten Lankhorst; Daniel Vetter; linux-fbdev; YoungJun Cho; Kyungmin > > Park; myungjoo.ham; DRI mailing list; > linux-arm-ker...@lists.infradead.org; > > linux-me...@vger.kernel.org > > Subject: Re: Introduce a new helper framework for buffer synchronization > > > > On Mon, May 27, 2013 at 6:38 AM, Inki Dae wrote: > > > Hi all, > > > > > > I have been removed previous branch and added new one with more cleanup. > > > This time, the fence helper doesn't include user side interfaces and > > cache > > > operation relevant codes anymore because not only we are not sure that > > > coupling those two things, synchronizing caches and buffer access > > between > > > CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is > > a > > > good idea yet but also existing codes for user side have problems with > > badly > > > behaved or crashing userspace. So this could be more discussed later. > > > > > > The below is a new branch, > > > > > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > > exynos.git/?h=dma-f > > > ence-helper > > > > > > And fence helper codes, > > > > > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > > exynos.git/commit/? > > > h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005 > > > > > > And example codes for device driver, > > > > > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > > exynos.git/commit/? > > > h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae > > > > > > I think the time is not yet ripe for RFC posting: maybe existing dma > > fence > > > and reservation need more review and addition work. So I'd glad for > > somebody > > > giving other opinions and advices in advance before RFC posting. > > > > thoughts from a *really* quick, pre-coffee, first look: > > * any sort of helper to simplify single-buffer sort of use-cases (v4l) > > probably wouldn't want to bake in assumption that seqno_fence is used. > > * I guess g2d is probably not actually a simple use case, since I > > expect you can submit blits involving multiple buffers :-P > > I don't think so. One and more buffers can be used: seqno_fence also has > only one buffer. Actually, we have already applied this approach to most > devices; multimedia, gpu and display controller. And this approach shows > more performance; reduced power consumption against traditional way. And g2d > example is just to show you how to apply my approach to device driver. Note that seqno_fence is an implementation pattern for a certain type of direct hw->hw synchronization which uses a shared dma_buf to exchange the sync cookie. The dma_buf attached to the seqno_fence has _nothing_ to do with the dma_buf the fence actually coordinates access to. I think that confusing is a large reason for why Maarten&I don't understand what you want to achieve with your fence helpers. Currently they're using the seqno_fence, but totally not in a way the seqno_fence was meant to be used. Note that with the current code there is only a pointer from dma_bufs to the fence. The fence itself has _no_ pointer to the dma_buf it syncs. 
This shouldn't be a problem since the fence fastpath for already signalled fences is completely barrier&lock free (it's just a load+bit-test), and fences are meant to be embedded into whatever dma tracking structure you already have, so no overhead there. The only ugly part is the fence refcounting, but I don't think we can drop that. Note that you completely reinvent this part of Maarten's fence patches by adding new r/w_complete completions to the reservation object, which completely replaces the fence stuff. Also note that a list of reservation entries is again meant to be used only when submitting the dma to the gpu. With your patches you seem to hang onto that list until dma completes. This has the ugly side-effect that you need to allocate these reservation entries with kmalloc, whereas if you just use them in the execbuf ioctl to construct the dma you can usually embed it into something else you need already anyway. At least i915 and ttm based drivers can work that way. Furthermore fences are specifically constructed as frankenstein-monsters between completion/waitqueues and c
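To illustrate the "embedded fence plus lock-free fastpath" point above, here is a rough sketch using the function names from Maarten's fence patches (fence_is_signaled()/fence_wait()); that API was still out of tree at this point, and foo_job is a made-up driver structure, so treat this purely as an illustration of the intended usage.

#include <linux/fence.h>        /* from the proposed fence patches, not mainline */

/*
 * Hypothetical per-submission tracking structure: the fence is embedded,
 * so no extra allocation is needed beyond what the driver keeps anyway.
 */
struct foo_job {
        struct fence fence;
        struct list_head node;
        /* ... ring, seqno, payload ... */
};

/* Make the next job wait for a previous one only if it is still running. */
static int foo_sync_to(struct fence *prev)
{
        if (!prev || fence_is_signaled(prev))
                return 0;       /* fastpath: a load plus a bit test, no locking */

        return fence_wait(prev, true);  /* interruptible CPU wait as the fallback */
}

Reference counting of prev (fence_get()/fence_put()) is left out for brevity; that is the part noted above that cannot really be dropped.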
Re: Introduce a new helper framework for buffer synchronization
Hey, Op 28-05-13 04:49, Inki Dae schreef: > >> -Original Message- >> From: Maarten Lankhorst [mailto:maarten.lankho...@canonical.com] >> Sent: Tuesday, May 28, 2013 12:23 AM >> To: Inki Dae >> Cc: 'Daniel Vetter'; 'Rob Clark'; 'linux-fbdev'; 'YoungJun Cho'; 'Kyungmin >> Park'; 'myungjoo.ham'; 'DRI mailing list'; linux-arm- >> ker...@lists.infradead.org; linux-me...@vger.kernel.org >> Subject: Re: Introduce a new helper framework for buffer synchronization >> >> Hey, >> >> Op 27-05-13 12:38, Inki Dae schreef: >>> Hi all, >>> >>> I have been removed previous branch and added new one with more cleanup. >>> This time, the fence helper doesn't include user side interfaces and >> cache >>> operation relevant codes anymore because not only we are not sure that >>> coupling those two things, synchronizing caches and buffer access >> between >>> CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is >> a >>> good idea yet but also existing codes for user side have problems with >> badly >>> behaved or crashing userspace. So this could be more discussed later. >>> >>> The below is a new branch, >>> >>> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- >> exynos.git/?h=dma-f >>> ence-helper >>> >>> And fence helper codes, >>> >>> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- >> exynos.git/commit/? >>> h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005 >>> >>> And example codes for device driver, >>> >>> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- >> exynos.git/commit/? >>> h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae >>> >>> I think the time is not yet ripe for RFC posting: maybe existing dma >> fence >>> and reservation need more review and addition work. So I'd glad for >> somebody >>> giving other opinions and advices in advance before RFC posting. >>> >> NAK. >> >> For examples for how to handle locking properly, see Documentation/ww- >> mutex-design.txt in my recent tree. >> I could list what I believe is wrong with your implementation, but real >> problem is that the approach you're taking is wrong. > I just removed ticket stubs to show my approach you guys as simple as > possible, and I just wanted to show that we could use buffer synchronization > mechanism without ticket stubs. The tickets have been removed in favor of a ww_context. Moving it in as a base primitive allows more locking abuse to be detected, and makes some other things easier too. > Question, WW-Mutexes could be used for all devices? I guess this has > dependence on x86 gpu: gpu has VRAM and it means different memory domain. > And could you tell my why shared fence should have only eight objects? I > think we could need more than eight objects for read access. Anyway I think > I don't surely understand yet so there might be my missing point. Yes, ww mutexes are not limited in any way to x86. They're a locking mechanism. When you acquired the ww mutexes for all buffer objects, all it does is say at that point in time you have exclusively acquired the locks of all bo's. After locking everything you can read the fence pointers safely, queue waits, and set a new fence pointer on all reservation_objects. You only need a single fence on all those objects, so 8 is plenty. Nonetheless this was a limitation of my earlier design, and I'll dynamically allocate fence_shared in the future. ~Maarten ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
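To make the "lock everything, then read and replace the fence pointers" flow concrete, here is a sketch that continues the lock_buffers() example shown earlier in the thread, now assuming buffer_object embeds a struct reservation_object whose ww_mutex takes the place of the bare lock. The field names (fence_excl, fence_shared[], fence_shared_count) follow Maarten's patch series of the time, new_fence is whatever fence the submission will signal, and all fence reference counting is omitted.

/* Called with every bufs[i]->resv.lock held via the ww_acquire_ctx. */
static int foo_submit_locked(struct buffer_object **bufs, int count,
                             struct fence *new_fence)
{
        int i, j, ret;

        for (i = 0; i < count; i++) {
                struct reservation_object *resv = &bufs[i]->resv;

                /*
                 * Wait for earlier users; a real driver would rather queue
                 * a hardware wait than block the CPU here.
                 */
                if (resv->fence_excl) {
                        ret = fence_wait(resv->fence_excl, true);
                        if (ret)
                                return ret;
                }
                for (j = 0; j < resv->fence_shared_count; j++) {
                        ret = fence_wait(resv->fence_shared[j], true);
                        if (ret)
                                return ret;
                }

                /*
                 * Publish the new fence.  One fence covers every buffer the
                 * submission touches, which is why a small fence_shared[]
                 * array goes a long way.
                 */
                resv->fence_excl = new_fence;
                resv->fence_shared_count = 0;
        }
        return 0;
}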
RE: Introduce a new helper framework for buffer synchronization
> -Original Message- > From: daniel.vet...@ffwll.ch [mailto:daniel.vet...@ffwll.ch] On Behalf Of > Daniel Vetter > Sent: Tuesday, May 28, 2013 1:02 AM > To: Rob Clark > Cc: Inki Dae; Maarten Lankhorst; linux-fbdev; YoungJun Cho; Kyungmin Park; > myungjoo.ham; DRI mailing list; linux-arm-ker...@lists.infradead.org; > linux-me...@vger.kernel.org > Subject: Re: Introduce a new helper framework for buffer synchronization > > On Mon, May 27, 2013 at 5:47 PM, Rob Clark wrote: > > On Mon, May 27, 2013 at 6:38 AM, Inki Dae wrote: > >> Hi all, > >> > >> I have been removed previous branch and added new one with more cleanup. > >> This time, the fence helper doesn't include user side interfaces and > cache > >> operation relevant codes anymore because not only we are not sure that > >> coupling those two things, synchronizing caches and buffer access > between > >> CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side > is a > >> good idea yet but also existing codes for user side have problems with > badly > >> behaved or crashing userspace. So this could be more discussed later. > >> > >> The below is a new branch, > >> > >> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > exynos.git/?h=dma-f > >> ence-helper > >> > >> And fence helper codes, > >> > >> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > exynos.git/commit/? > >> h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005 > >> > >> And example codes for device driver, > >> > >> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > exynos.git/commit/? > >> h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae > >> > >> I think the time is not yet ripe for RFC posting: maybe existing dma > fence > >> and reservation need more review and addition work. So I'd glad for > somebody > >> giving other opinions and advices in advance before RFC posting. > > > > thoughts from a *really* quick, pre-coffee, first look: > > * any sort of helper to simplify single-buffer sort of use-cases (v4l) > > probably wouldn't want to bake in assumption that seqno_fence is used. > > Yeah, which is why Maarten&I discussed ideas already for what needs to > be improved in the current dma-buf interface code to make this Just > Work. At least as long as a driver doesn't want to add new fences, > which would be especially useful for all kinds of gpu access. > > > * I guess g2d is probably not actually a simple use case, since I > > expect you can submit blits involving multiple buffers :-P > > Yeah, on a quick read the current fence helper code seems to be a bit > limited in scope. > > > * otherwise, you probably don't want to depend on dmabuf, which is why > > reservation/fence is split out the way it is.. you want to be able to > > use a single reservation/fence mechanism within your driver without > > having to care about which buffers are exported to dmabuf's and which > > are not. Creating a dmabuf for every GEM bo is too heavyweight. > > That's pretty much the reason that reservations are free-standing from > dma_bufs. The idea is to embed them into the gem/ttm/v4l buffer > object. Maarten also has some helpers to keep track of multi-buffer > ww_mutex locking and fence attaching in his reservation helpers, but I > think we should wait with those until we have drivers using them. > > For now I think the priority should be to get the basic stuff in and > ttm as the first user established. Then we can go nuts later on. 
> > > I'm not entirely sure if reservation/fence could/should be made any > > simpler for multi-buffer users. Probably the best thing to do is just > > get reservation/fence rolled out in a few drivers and see if some > > common patterns emerge. > > I think we can make the 1 buffer per dma op (i.e. 1:1 > dma_buf->reservation : fence mapping) work fairly simple in dma_buf > with maybe a dma_buf_attachment_start_dma/end_dma helpers. But there's > also still the open that currently there's no way to flush cpu caches > for dma access without unmapping the attachement (or resorting to That was what I tried adding user interfaces to dmabuf: coupling synchronizing caches and buffer access between CPU and CPU, CPU and DMA, and DMA and DMA with fences in kernel side. We need something to do between mapping and unmapping attachment. > trick which might not work), so we have a few gaping holes in the > interface already anyway. > -Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
RE: Introduce a new helper framework for buffer synchronization
> -Original Message- > From: linux-fbdev-ow...@vger.kernel.org [mailto:linux-fbdev- > ow...@vger.kernel.org] On Behalf Of Rob Clark > Sent: Tuesday, May 28, 2013 12:48 AM > To: Inki Dae > Cc: Maarten Lankhorst; Daniel Vetter; linux-fbdev; YoungJun Cho; Kyungmin > Park; myungjoo.ham; DRI mailing list; linux-arm-ker...@lists.infradead.org; > linux-me...@vger.kernel.org > Subject: Re: Introduce a new helper framework for buffer synchronization > > On Mon, May 27, 2013 at 6:38 AM, Inki Dae wrote: > > Hi all, > > > > I have been removed previous branch and added new one with more cleanup. > > This time, the fence helper doesn't include user side interfaces and > cache > > operation relevant codes anymore because not only we are not sure that > > coupling those two things, synchronizing caches and buffer access > between > > CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is > a > > good idea yet but also existing codes for user side have problems with > badly > > behaved or crashing userspace. So this could be more discussed later. > > > > The below is a new branch, > > > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > exynos.git/?h=dma-f > > ence-helper > > > > And fence helper codes, > > > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > exynos.git/commit/? > > h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005 > > > > And example codes for device driver, > > > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > exynos.git/commit/? > > h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae > > > > I think the time is not yet ripe for RFC posting: maybe existing dma > fence > > and reservation need more review and addition work. So I'd glad for > somebody > > giving other opinions and advices in advance before RFC posting. > > thoughts from a *really* quick, pre-coffee, first look: > * any sort of helper to simplify single-buffer sort of use-cases (v4l) > probably wouldn't want to bake in assumption that seqno_fence is used. > * I guess g2d is probably not actually a simple use case, since I > expect you can submit blits involving multiple buffers :-P I don't think so. One and more buffers can be used: seqno_fence also has only one buffer. Actually, we have already applied this approach to most devices; multimedia, gpu and display controller. And this approach shows more performance; reduced power consumption against traditional way. And g2d example is just to show you how to apply my approach to device driver. > * otherwise, you probably don't want to depend on dmabuf, which is why > reservation/fence is split out the way it is.. you want to be able to > use a single reservation/fence mechanism within your driver without > having to care about which buffers are exported to dmabuf's and which > are not. Creating a dmabuf for every GEM bo is too heavyweight. Right. But I think we should dealt with this separately. Actually, we are trying to use reservation for gpu pipe line synchronization such as sgx sync object and this approach is used without dmabuf. In order words, some device can use only reservation for such pipe line synchronization and at the same time, fence helper or similar thing with dmabuf for buffer synchronization. > > I'm not entirely sure if reservation/fence could/should be made any > simpler for multi-buffer users. Probably the best thing to do is just > get reservation/fence rolled out in a few drivers and see if some > common patterns emerge. 
> > BR, > -R > > > > > Thanks, > > Inki Dae > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
RE: Introduce a new helper framework for buffer synchronization
> -Original Message- > From: Maarten Lankhorst [mailto:maarten.lankho...@canonical.com] > Sent: Tuesday, May 28, 2013 12:23 AM > To: Inki Dae > Cc: 'Daniel Vetter'; 'Rob Clark'; 'linux-fbdev'; 'YoungJun Cho'; 'Kyungmin > Park'; 'myungjoo.ham'; 'DRI mailing list'; linux-arm- > ker...@lists.infradead.org; linux-me...@vger.kernel.org > Subject: Re: Introduce a new helper framework for buffer synchronization > > Hey, > > Op 27-05-13 12:38, Inki Dae schreef: > > Hi all, > > > > I have been removed previous branch and added new one with more cleanup. > > This time, the fence helper doesn't include user side interfaces and > cache > > operation relevant codes anymore because not only we are not sure that > > coupling those two things, synchronizing caches and buffer access > between > > CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is > a > > good idea yet but also existing codes for user side have problems with > badly > > behaved or crashing userspace. So this could be more discussed later. > > > > The below is a new branch, > > > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > exynos.git/?h=dma-f > > ence-helper > > > > And fence helper codes, > > > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > exynos.git/commit/? > > h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005 > > > > And example codes for device driver, > > > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > exynos.git/commit/? > > h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae > > > > I think the time is not yet ripe for RFC posting: maybe existing dma > fence > > and reservation need more review and addition work. So I'd glad for > somebody > > giving other opinions and advices in advance before RFC posting. > > > NAK. > > For examples for how to handle locking properly, see Documentation/ww- > mutex-design.txt in my recent tree. > I could list what I believe is wrong with your implementation, but real > problem is that the approach you're taking is wrong. I just removed the ticket stubs to show you my approach as simply as possible, and I wanted to show that we could use a buffer synchronization mechanism without ticket stubs. Question: can WW-Mutexes be used for all devices? I guess this has a dependence on x86 gpus: a gpu has VRAM, which means a different memory domain. And could you tell me why the shared fence should have only eight objects? I think we could need more than eight objects for read access. Anyway, I don't think I fully understand this yet, so I may be missing something. Thanks, Inki Dae > > ~Maarten ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: Introduce a new helper framework for buffer synchronization
On Mon, May 27, 2013 at 5:47 PM, Rob Clark wrote: > On Mon, May 27, 2013 at 6:38 AM, Inki Dae wrote: >> Hi all, >> >> I have been removed previous branch and added new one with more cleanup. >> This time, the fence helper doesn't include user side interfaces and cache >> operation relevant codes anymore because not only we are not sure that >> coupling those two things, synchronizing caches and buffer access between >> CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is a >> good idea yet but also existing codes for user side have problems with badly >> behaved or crashing userspace. So this could be more discussed later. >> >> The below is a new branch, >> >> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/?h=dma-f >> ence-helper >> >> And fence helper codes, >> >> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/? >> h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005 >> >> And example codes for device driver, >> >> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/? >> h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae >> >> I think the time is not yet ripe for RFC posting: maybe existing dma fence >> and reservation need more review and addition work. So I'd glad for somebody >> giving other opinions and advices in advance before RFC posting. > > thoughts from a *really* quick, pre-coffee, first look: > * any sort of helper to simplify single-buffer sort of use-cases (v4l) > probably wouldn't want to bake in assumption that seqno_fence is used. Yeah, which is why Maarten&I discussed ideas already for what needs to be improved in the current dma-buf interface code to make this Just Work. At least as long as a driver doesn't want to add new fences, which would be especially useful for all kinds of gpu access. > * I guess g2d is probably not actually a simple use case, since I > expect you can submit blits involving multiple buffers :-P Yeah, on a quick read the current fence helper code seems to be a bit limited in scope. > * otherwise, you probably don't want to depend on dmabuf, which is why > reservation/fence is split out the way it is.. you want to be able to > use a single reservation/fence mechanism within your driver without > having to care about which buffers are exported to dmabuf's and which > are not. Creating a dmabuf for every GEM bo is too heavyweight. That's pretty much the reason that reservations are free-standing from dma_bufs. The idea is to embed them into the gem/ttm/v4l buffer object. Maarten also has some helpers to keep track of multi-buffer ww_mutex locking and fence attaching in his reservation helpers, but I think we should wait with those until we have drivers using them. For now I think the priority should be to get the basic stuff in and ttm as the first user established. Then we can go nuts later on. > I'm not entirely sure if reservation/fence could/should be made any > simpler for multi-buffer users. Probably the best thing to do is just > get reservation/fence rolled out in a few drivers and see if some > common patterns emerge. I think we can make the 1 buffer per dma op (i.e. 1:1 dma_buf->reservation : fence mapping) work fairly simple in dma_buf with maybe a dma_buf_attachment_start_dma/end_dma helpers. 
But there's also still the open that currently there's no way to flush cpu caches for dma access without unmapping the attachement (or resorting to trick which might not work), so we have a few gaping holes in the interface already anyway. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: Introduce a new helper framework for buffer synchronization
On Mon, May 27, 2013 at 6:38 AM, Inki Dae wrote: > Hi all, > > I have been removed previous branch and added new one with more cleanup. > This time, the fence helper doesn't include user side interfaces and cache > operation relevant codes anymore because not only we are not sure that > coupling those two things, synchronizing caches and buffer access between > CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is a > good idea yet but also existing codes for user side have problems with badly > behaved or crashing userspace. So this could be more discussed later. > > The below is a new branch, > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/?h=dma-f > ence-helper > > And fence helper codes, > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/? > h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005 > > And example codes for device driver, > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/? > h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae > > I think the time is not yet ripe for RFC posting: maybe existing dma fence > and reservation need more review and addition work. So I'd glad for somebody > giving other opinions and advices in advance before RFC posting. thoughts from a *really* quick, pre-coffee, first look: * any sort of helper to simplify single-buffer sort of use-cases (v4l) probably wouldn't want to bake in assumption that seqno_fence is used. * I guess g2d is probably not actually a simple use case, since I expect you can submit blits involving multiple buffers :-P * otherwise, you probably don't want to depend on dmabuf, which is why reservation/fence is split out the way it is.. you want to be able to use a single reservation/fence mechanism within your driver without having to care about which buffers are exported to dmabuf's and which are not. Creating a dmabuf for every GEM bo is too heavyweight. I'm not entirely sure if reservation/fence could/should be made any simpler for multi-buffer users. Probably the best thing to do is just get reservation/fence rolled out in a few drivers and see if some common patterns emerge. BR, -R > > Thanks, > Inki Dae > ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
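As a rough illustration of "reservation embedded in the buffer object, dma-buf optional": the structure below is invented (foo_gem_object), and struct reservation_object / reservation_object_init() are taken from Maarten's out-of-tree series, so treat this as a sketch of the intended layering rather than working driver code.

#include <linux/slab.h>
#include <linux/reservation.h>  /* from Maarten's series, not mainline yet */
#include <drm/drmP.h>

/*
 * Hypothetical driver bo: the reservation lives in the object itself, so
 * buffers that are never exported still get the same locking and fencing,
 * and no per-bo dma_buf has to be created up front.
 */
struct foo_gem_object {
        struct drm_gem_object base;
        struct reservation_object resv;
        /* ... backing pages, iommu mapping, ... */
};

static struct foo_gem_object *foo_gem_create(struct drm_device *dev,
                                             size_t size)
{
        struct foo_gem_object *obj = kzalloc(sizeof(*obj), GFP_KERNEL);

        if (!obj)
                return NULL;

        if (drm_gem_object_init(dev, &obj->base, size)) {
                kfree(obj);
                return NULL;
        }
        reservation_object_init(&obj->resv);
        return obj;
}

The dma-buf export path would then simply reference &obj->resv instead of allocating a second reservation; how exactly that hook into dma_buf should look was still being worked out at the time.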
Re: Introduce a new helper framework for buffer synchronization
Hey, Op 27-05-13 12:38, Inki Dae schreef: > Hi all, > > I have been removed previous branch and added new one with more cleanup. > This time, the fence helper doesn't include user side interfaces and cache > operation relevant codes anymore because not only we are not sure that > coupling those two things, synchronizing caches and buffer access between > CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is a > good idea yet but also existing codes for user side have problems with badly > behaved or crashing userspace. So this could be more discussed later. > > The below is a new branch, > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/?h=dma-f > ence-helper > > And fence helper codes, > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/? > h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005 > > And example codes for device driver, > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/? > h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae > > I think the time is not yet ripe for RFC posting: maybe existing dma fence > and reservation need more review and addition work. So I'd glad for somebody > giving other opinions and advices in advance before RFC posting. > NAK. For examples for how to handle locking properly, see Documentation/ww-mutex-design.txt in my recent tree. I could list what I believe is wrong with your implementation, but real problem is that the approach you're taking is wrong. ~Maarten ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
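For completeness, the unlock side of the pattern described in Documentation/ww-mutex-design.txt, matching the lock_buffers() sketch near the top of this thread (same caveat: buffer_object is an illustrative name, not an existing structure):

/* Drop all buffer locks once the submission's fences are attached. */
static void unlock_buffers(struct buffer_object **bufs, int count,
                           struct ww_acquire_ctx *ctx)
{
        int i;

        for (i = 0; i < count; i++)
                ww_mutex_unlock(&bufs[i]->lock);

        ww_acquire_fini(ctx);   /* the acquire context is done for good */
}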
RE: Introduce a new helper framework for buffer synchronization
Hi all, I have removed the previous branch and added a new one with more cleanup. This time, the fence helper doesn't include user side interfaces and cache operation relevant code anymore, because not only are we not yet sure that coupling those two things (synchronizing caches and synchronizing buffer access between CPU and CPU, CPU and DMA, and DMA and DMA with fences) in the kernel is a good idea, but the existing user-side code also has problems with badly behaved or crashing userspace. So this could be discussed more later. The below is the new branch, https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/?h=dma-fence-helper And the fence helper code, https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/?h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005 And example code for a device driver, https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/?h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae I think the time is not yet ripe for RFC posting: maybe the existing dma fence and reservation code needs more review and additional work. So I'd be glad for somebody to give other opinions and advice in advance before an RFC posting. Thanks, Inki Dae ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
RE: Introduce a new helper framework for buffer synchronization
> -Original Message- > From: daniel.vet...@ffwll.ch [mailto:daniel.vet...@ffwll.ch] On Behalf Of > Daniel Vetter > Sent: Thursday, May 23, 2013 8:56 PM > To: Inki Dae > Cc: Rob Clark; linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; > YoungJun Cho; linux-arm-ker...@lists.infradead.org; linux- > me...@vger.kernel.org > Subject: Re: Introduce a new helper framework for buffer synchronization > > On Tue, May 21, 2013 at 11:22 AM, Inki Dae wrote: > >> -Original Message- > >> From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel > >> Vetter > >> Sent: Tuesday, May 21, 2013 4:45 PM > >> To: Inki Dae > >> Cc: 'Daniel Vetter'; 'Rob Clark'; 'linux-fbdev'; 'DRI mailing list'; > >> 'Kyungmin Park'; 'myungjoo.ham'; 'YoungJun Cho'; linux-arm- > >> ker...@lists.infradead.org; linux-me...@vger.kernel.org > >> Subject: Re: Introduce a new helper framework for buffer > synchronization > >> > >> On Tue, May 21, 2013 at 04:03:06PM +0900, Inki Dae wrote: > >> > > - Integration of fence syncing into dma_buf. Imo we should have a > >> > > per-attachment mode which decides whether map/unmap (and the new > >> sync) > >> > > should wait for fences or whether the driver takes care of > syncing > >> > > through the new fence framework itself (for direct hw sync). > >> > > >> > I think it's a good idea to have per-attachment mode for buffer sync. > >> But > >> > I'd like to say we make sure what is the purpose of map > >> > (dma_buf_map_attachment)first. This interface is used to get a sgt; > >> > containing pages to physical memory region, and map that region with > >> > device's iommu table. The important thing is that this interface is > >> called > >> > just one time when user wants to share an allocated buffer with dma. > But > >> cpu > >> > will try to access the buffer every time as long as it wants. > Therefore, > >> we > >> > need cache control every time cpu and dma access the shared buffer: > >> cache > >> > clean when cpu goes to dma and cache invalidate when dma goes to cpu. > >> That > >> > is why I added new interfaces, DMA_BUF_GET_FENCE and > DMA_BUF_PUT_FENCE, > >> to > >> > dma buf framework. Of course, Those are ugly so we need a better way: > I > >> just > >> > wanted to show you that in such way, we can synchronize the shared > >> buffer > >> > between cpu and dma. By any chance, is there any way that kernel can > be > >> > aware of when cpu accesses the shared buffer or is there any point I > >> didn't > >> > catch up? > >> > >> Well dma_buf_map/unmap_attachment should also flush/invalidate any > caches, > >> and with the current dma_buf spec those two functions are the _only_ > means > > > > I know that dma buf exporter should make sure that cache > clean/invalidate > > are done when dma_buf_map/unmap_attachement is called. For this, already > we > > do so. However, this function is called when dma buf import is requested > by > > user to map a dmabuf fd with dma. This means that > dma_buf_map_attachement() > > is called ONCE when user wants to share the dmabuf fd with dma. Actually, > in > > case of exynos drm, dma_map_sg_attrs(), performing cache operation > > internally, is called when dmabuf import is requested by user. > > > >> you have to do so. Which strictly means that if you interleave device > dma > >> and cpu acccess you need to unmap/map every time. > >> > >> Which is obviously horribly inefficient, but a known gap in the dma_buf > > > > Right, and also this has big overhead. > > > >> interface. 
Hence why I proposed to add dma_buf_sync_attachment similar > to > >> dma_sync_single_for_device: > >> > >> /** > >> * dma_buf_sync_sg_attachment - sync caches for dma access > >> * @attach: dma-buf attachment to sync > >> * @sgt: the sg table to sync (returned by dma_buf_map_attachement) > >> * @direction: dma direction to sync for > >> * > >> * This function synchronizes caches for device dma through the given > >> * dma-buf attachment when interleaving dma from different devices and > the > >> * cpu. Other device dma needs to be synced also by calls to this > >> * function (or
Re: Introduce a new helper framework for buffer synchronization
On Tue, May 21, 2013 at 11:22 AM, Inki Dae wrote: >> -Original Message- >> From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel >> Vetter >> Sent: Tuesday, May 21, 2013 4:45 PM >> To: Inki Dae >> Cc: 'Daniel Vetter'; 'Rob Clark'; 'linux-fbdev'; 'DRI mailing list'; >> 'Kyungmin Park'; 'myungjoo.ham'; 'YoungJun Cho'; linux-arm- >> ker...@lists.infradead.org; linux-me...@vger.kernel.org >> Subject: Re: Introduce a new helper framework for buffer synchronization >> >> On Tue, May 21, 2013 at 04:03:06PM +0900, Inki Dae wrote: >> > > - Integration of fence syncing into dma_buf. Imo we should have a >> > > per-attachment mode which decides whether map/unmap (and the new >> sync) >> > > should wait for fences or whether the driver takes care of syncing >> > > through the new fence framework itself (for direct hw sync). >> > >> > I think it's a good idea to have per-attachment mode for buffer sync. >> But >> > I'd like to say we make sure what is the purpose of map >> > (dma_buf_map_attachment)first. This interface is used to get a sgt; >> > containing pages to physical memory region, and map that region with >> > device's iommu table. The important thing is that this interface is >> called >> > just one time when user wants to share an allocated buffer with dma. But >> cpu >> > will try to access the buffer every time as long as it wants. Therefore, >> we >> > need cache control every time cpu and dma access the shared buffer: >> cache >> > clean when cpu goes to dma and cache invalidate when dma goes to cpu. >> That >> > is why I added new interfaces, DMA_BUF_GET_FENCE and DMA_BUF_PUT_FENCE, >> to >> > dma buf framework. Of course, Those are ugly so we need a better way: I >> just >> > wanted to show you that in such way, we can synchronize the shared >> buffer >> > between cpu and dma. By any chance, is there any way that kernel can be >> > aware of when cpu accesses the shared buffer or is there any point I >> didn't >> > catch up? >> >> Well dma_buf_map/unmap_attachment should also flush/invalidate any caches, >> and with the current dma_buf spec those two functions are the _only_ means > > I know that dma buf exporter should make sure that cache clean/invalidate > are done when dma_buf_map/unmap_attachement is called. For this, already we > do so. However, this function is called when dma buf import is requested by > user to map a dmabuf fd with dma. This means that dma_buf_map_attachement() > is called ONCE when user wants to share the dmabuf fd with dma. Actually, in > case of exynos drm, dma_map_sg_attrs(), performing cache operation > internally, is called when dmabuf import is requested by user. > >> you have to do so. Which strictly means that if you interleave device dma >> and cpu acccess you need to unmap/map every time. >> >> Which is obviously horribly inefficient, but a known gap in the dma_buf > > Right, and also this has big overhead. > >> interface. Hence why I proposed to add dma_buf_sync_attachment similar to >> dma_sync_single_for_device: >> >> /** >> * dma_buf_sync_sg_attachment - sync caches for dma access >> * @attach: dma-buf attachment to sync >> * @sgt: the sg table to sync (returned by dma_buf_map_attachement) >> * @direction: dma direction to sync for >> * >> * This function synchronizes caches for device dma through the given >> * dma-buf attachment when interleaving dma from different devices and the >> * cpu. 
Other device dma needs to be synced also by calls to this >> * function (or a pair of dma_buf_map/unmap_attachment calls), cpu access >> * needs to be synced with dma_buf_begin/end_cpu_access. >> */ >> void dma_buf_sync_sg_attachment(struct dma_buf_attachment *attach, >> struct sg_table *sgt, >> enum dma_data_direction direction) >> >> Note that "sync" here only means to synchronize caches, not wait for any >> outstanding fences. This is simply to be consistent with the established >> lingo of the dma api. How the dma-buf fences fit into this is imo a >> different topic, but my idea is that all the cache coherency barriers >> (i.e. dma_buf_map/unmap_attachment, dma_buf_sync_sg_attachment and >> dma_buf_begin/end_cpu_access) would automatically block for any attached >> fences (i.e. block for write fences when doing read-only a
RE: Introduce a new helper framework for buffer synchronization
> -Original Message- > From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel > Vetter > Sent: Tuesday, May 21, 2013 4:45 PM > To: Inki Dae > Cc: 'Daniel Vetter'; 'Rob Clark'; 'linux-fbdev'; 'DRI mailing list'; > 'Kyungmin Park'; 'myungjoo.ham'; 'YoungJun Cho'; linux-arm- > ker...@lists.infradead.org; linux-me...@vger.kernel.org > Subject: Re: Introduce a new helper framework for buffer synchronization > > On Tue, May 21, 2013 at 04:03:06PM +0900, Inki Dae wrote: > > > - Integration of fence syncing into dma_buf. Imo we should have a > > > per-attachment mode which decides whether map/unmap (and the new > sync) > > > should wait for fences or whether the driver takes care of syncing > > > through the new fence framework itself (for direct hw sync). > > > > I think it's a good idea to have per-attachment mode for buffer sync. > But > > I'd like to say we make sure what is the purpose of map > > (dma_buf_map_attachment)first. This interface is used to get a sgt; > > containing pages to physical memory region, and map that region with > > device's iommu table. The important thing is that this interface is > called > > just one time when user wants to share an allocated buffer with dma. But > cpu > > will try to access the buffer every time as long as it wants. Therefore, > we > > need cache control every time cpu and dma access the shared buffer: > cache > > clean when cpu goes to dma and cache invalidate when dma goes to cpu. > That > > is why I added new interfaces, DMA_BUF_GET_FENCE and DMA_BUF_PUT_FENCE, > to > > dma buf framework. Of course, Those are ugly so we need a better way: I > just > > wanted to show you that in such way, we can synchronize the shared > buffer > > between cpu and dma. By any chance, is there any way that kernel can be > > aware of when cpu accesses the shared buffer or is there any point I > didn't > > catch up? > > Well dma_buf_map/unmap_attachment should also flush/invalidate any caches, > and with the current dma_buf spec those two functions are the _only_ means I know that dma buf exporter should make sure that cache clean/invalidate are done when dma_buf_map/unmap_attachement is called. For this, already we do so. However, this function is called when dma buf import is requested by user to map a dmabuf fd with dma. This means that dma_buf_map_attachement() is called ONCE when user wants to share the dmabuf fd with dma. Actually, in case of exynos drm, dma_map_sg_attrs(), performing cache operation internally, is called when dmabuf import is requested by user. > you have to do so. Which strictly means that if you interleave device dma > and cpu acccess you need to unmap/map every time. > > Which is obviously horribly inefficient, but a known gap in the dma_buf Right, and also this has big overhead. > interface. Hence why I proposed to add dma_buf_sync_attachment similar to > dma_sync_single_for_device: > > /** > * dma_buf_sync_sg_attachment - sync caches for dma access > * @attach: dma-buf attachment to sync > * @sgt: the sg table to sync (returned by dma_buf_map_attachement) > * @direction: dma direction to sync for > * > * This function synchronizes caches for device dma through the given > * dma-buf attachment when interleaving dma from different devices and the > * cpu. Other device dma needs to be synced also by calls to this > * function (or a pair of dma_buf_map/unmap_attachment calls), cpu access > * needs to be synced with dma_buf_begin/end_cpu_access. 
> */ > void dma_buf_sync_sg_attachment(struct dma_buf_attachment *attach, > struct sg_table *sgt, > enum dma_data_direction direction) > > Note that "sync" here only means to synchronize caches, not wait for any > outstanding fences. This is simply to be consistent with the established > lingo of the dma api. How the dma-buf fences fit into this is imo a > different topic, but my idea is that all the cache coherency barriers > (i.e. dma_buf_map/unmap_attachment, dma_buf_sync_sg_attachment and > dma_buf_begin/end_cpu_access) would automatically block for any attached > fences (i.e. block for write fences when doing read-only access, block for > all fences otherwise). As I mentioned already, kernel can't aware of when cpu accesses a shared buffer: user can access a shared buffer after mmap anytime and the shared buffer should be synchronized between cpu and dma. Therefore, the above cache coherency barriers should be called every time cpu and dma tries to access a shared buffer, checking before and after of c
Re: Introduce a new helper framework for buffer synchronization
On Tue, May 21, 2013 at 04:03:06PM +0900, Inki Dae wrote: > > - Integration of fence syncing into dma_buf. Imo we should have a > > per-attachment mode which decides whether map/unmap (and the new sync) > > should wait for fences or whether the driver takes care of syncing > > through the new fence framework itself (for direct hw sync). > > I think it's a good idea to have per-attachment mode for buffer sync. But > I'd like to say we make sure what is the purpose of map > (dma_buf_map_attachment)first. This interface is used to get a sgt; > containing pages to physical memory region, and map that region with > device's iommu table. The important thing is that this interface is called > just one time when user wants to share an allocated buffer with dma. But cpu > will try to access the buffer every time as long as it wants. Therefore, we > need cache control every time cpu and dma access the shared buffer: cache > clean when cpu goes to dma and cache invalidate when dma goes to cpu. That > is why I added new interfaces, DMA_BUF_GET_FENCE and DMA_BUF_PUT_FENCE, to > dma buf framework. Of course, Those are ugly so we need a better way: I just > wanted to show you that in such way, we can synchronize the shared buffer > between cpu and dma. By any chance, is there any way that kernel can be > aware of when cpu accesses the shared buffer or is there any point I didn't > catch up? Well dma_buf_map/unmap_attachment should also flush/invalidate any caches, and with the current dma_buf spec those two functions are the _only_ means you have to do so. Which strictly means that if you interleave device dma and cpu acccess you need to unmap/map every time. Which is obviously horribly inefficient, but a known gap in the dma_buf interface. Hence why I proposed to add dma_buf_sync_attachment similar to dma_sync_single_for_device: /** * dma_buf_sync_sg_attachment - sync caches for dma access * @attach: dma-buf attachment to sync * @sgt: the sg table to sync (returned by dma_buf_map_attachement) * @direction: dma direction to sync for * * This function synchronizes caches for device dma through the given * dma-buf attachment when interleaving dma from different devices and the * cpu. Other device dma needs to be synced also by calls to this * function (or a pair of dma_buf_map/unmap_attachment calls), cpu access * needs to be synced with dma_buf_begin/end_cpu_access. */ void dma_buf_sync_sg_attachment(struct dma_buf_attachment *attach, struct sg_table *sgt, enum dma_data_direction direction) Note that "sync" here only means to synchronize caches, not wait for any outstanding fences. This is simply to be consistent with the established lingo of the dma api. How the dma-buf fences fit into this is imo a different topic, but my idea is that all the cache coherency barriers (i.e. dma_buf_map/unmap_attachment, dma_buf_sync_sg_attachment and dma_buf_begin/end_cpu_access) would automatically block for any attached fences (i.e. block for write fences when doing read-only access, block for all fences otherwise). Then we could do a new dma_buf_attach_flags interface for special cases (might also be useful for other things, similar to the recently added flags in the dma api for wc/no-kernel-mapping/...). A new flag like DMA_BUF_ATTACH_NO_AUTOFENCE would then mean that the driver takes care of all fencing for this attachment and the dma-buf functions should not do the automagic fence blocking. 
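To make the interface proposed above a bit more concrete, here is a minimal kernel-side sketch of an importing driver interleaving dma with cpu access. dma_buf_sync_sg_attachment is the helper proposed in this thread and does not exist in mainline, so that single call is an assumption; everything else uses the existing dma-buf attachment API.

	#include <linux/dma-buf.h>
	#include <linux/err.h>

	static int example_import_and_dma(struct device *dev, struct dma_buf *buf)
	{
		struct dma_buf_attachment *attach;
		struct sg_table *sgt;

		attach = dma_buf_attach(buf, dev);
		if (IS_ERR(attach))
			return PTR_ERR(attach);

		/* One-time mapping at import time; this also does the initial
		 * cache maintenance for the device. */
		sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
		if (IS_ERR(sgt)) {
			dma_buf_detach(buf, attach);
			return PTR_ERR(sgt);
		}

		/* ... first dma operation runs, then the cpu (or another
		 * device) touches the buffer ... */

		/* Before the next dma operation only the caches need syncing
		 * again.  Today that requires a full unmap/map cycle; with the
		 * helper proposed above it becomes a single call: */
		dma_buf_sync_sg_attachment(attach, sgt, DMA_BIDIRECTIONAL);

		/* ... queue the next dma operation ... */

		dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
		dma_buf_detach(buf, attach);
		return 0;
	}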
> In Linux kernel, especially embedded system, we had tried to implement > generic interfaces for buffer management; how to allocate a buffer and how > to share a buffer. As a result, now we have CMA (Contiguous Memory > Allocator) for buffer allocation, and we have DMA-BUF for buffer sharing > between cpu and dma. And then how to synchronize a buffer between cpu and > dma? I think now it's best time to discuss buffer synchronization mechanism, > and that is very natural. I think it's important to differentiate between the two meanings of sync: - synchronize caches (i.e. cpu cache flushing in most cases) - and synchronize in-flight dma with fences. Cheers, Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
RE: Introduce a new helper framework for buffer synchronization
> -Original Message- > From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel > Vetter > Sent: Tuesday, May 21, 2013 6:31 AM > To: Inki Dae > Cc: Rob Clark; linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; > YoungJun Cho; linux-arm-ker...@lists.infradead.org; linux- > me...@vger.kernel.org > Subject: Re: Introduce a new helper framework for buffer synchronization > > On Mon, May 20, 2013 at 11:13:04PM +0200, Daniel Vetter wrote: > > On Sat, May 18, 2013 at 03:47:43PM +0900, Inki Dae wrote: > > > 2013/5/15 Rob Clark > > > > > > > On Wed, May 15, 2013 at 1:19 AM, Inki Dae > wrote: > > > > > > > > > > > > > > >> -Original Message- > > > > >> From: Rob Clark [mailto:robdcl...@gmail.com] > > > > >> Sent: Tuesday, May 14, 2013 10:39 PM > > > > >> To: Inki Dae > > > > >> Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; > YoungJun > > > > >> Cho; linux-arm-ker...@lists.infradead.org; linux- > me...@vger.kernel.org > > > > >> Subject: Re: Introduce a new helper framework for buffer > synchronization > > > > >> > > > > >> On Mon, May 13, 2013 at 10:52 PM, Inki Dae > > > > wrote: > > > > >> >> well, for cache management, I think it is a better idea.. I > didn't > > > > >> >> really catch that this was the motivation from the initial > patch, but > > > > >> >> maybe I read it too quickly. But cache can be decoupled from > > > > >> >> synchronization, because CPU access is not asynchronous. For > > > > >> >> userspace/CPU access to buffer, you should: > > > > >> >> > > > > >> >> 1) wait for buffer > > > > >> >> 2) prepare-access > > > > >> >> 3) ... do whatever cpu access to buffer ... > > > > >> >> 4) finish-access > > > > >> >> 5) submit buffer for new dma-operation > > > > >> >> > > > > >> > > > > > >> > > > > > >> > For data flow from CPU to DMA device, > > > > >> > 1) wait for buffer > > > > >> > 2) prepare-access (dma_buf_begin_cpu_access) > > > > >> > 3) cpu access to buffer > > > > >> > > > > > >> > > > > > >> > For data flow from DMA device to CPU > > > > >> > 1) wait for buffer > > > > >> > > > > >> Right, but CPU access isn't asynchronous (from the point of view > of > > > > >> the CPU), so there isn't really any wait step at this point. And > if > > > > >> you do want the CPU to be able to signal a fence from userspace > for > > > > >> some reason, you probably what something file/fd based so the > > > > >> refcnting/cleanup when process dies doesn't leave some pending > DMA > > > > >> action wedged. But I don't really see the point of that > complexity > > > > >> when the CPU access isn't asynchronous in the first place. > > > > >> > > > > > > > > > > There was my missing comments, please see the below sequence. > > > > > > > > > > For data flow from CPU to DMA device and then from DMA device to > CPU, > > > > > 1) wait for buffer <- at user side - ioctl(fd, > DMA_BUF_GET_FENCE, ...) > > > > > - including prepare-access (dma_buf_begin_cpu_access) > > > > > 2) cpu access to buffer > > > > > 3) wait for buffer <- at device driver > > > > > - but CPU is already accessing the buffer so blocked. > > > > > 4) signal <- at user side - ioctl(fd, DMA_BUF_PUT_FENCE, ...) > > > > > 5) the thread, blocked at 3), is waked up by 4). > > > > > - and then finish-access (dma_buf_end_cpu_access) > > > > > > > > right, I understand you can have background threads, etc, in > > > > userspace. 
But there are already plenty of synchronization > primitives > > > > that can be used for cpu->cpu synchronization, either within the > same > > > > process or between multiple processes. For cpu access, even if it > is > > > > handled by background threads/processes, I think it is better to use the traditional pthreads or unix synchronization primitives. They have existed forever, they are well tested, and they work.
Re: Introduce a new helper framework for buffer synchronization
On Mon, May 20, 2013 at 11:13:04PM +0200, Daniel Vetter wrote: > On Sat, May 18, 2013 at 03:47:43PM +0900, Inki Dae wrote: > > 2013/5/15 Rob Clark > > > > > On Wed, May 15, 2013 at 1:19 AM, Inki Dae wrote: > > > > > > > > > > > >> -Original Message- > > > >> From: Rob Clark [mailto:robdcl...@gmail.com] > > > >> Sent: Tuesday, May 14, 2013 10:39 PM > > > >> To: Inki Dae > > > >> Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; > > > >> YoungJun > > > >> Cho; linux-arm-ker...@lists.infradead.org; linux-me...@vger.kernel.org > > > >> Subject: Re: Introduce a new helper framework for buffer > > > >> synchronization > > > >> > > > >> On Mon, May 13, 2013 at 10:52 PM, Inki Dae > > > wrote: > > > >> >> well, for cache management, I think it is a better idea.. I didn't > > > >> >> really catch that this was the motivation from the initial patch, > > > >> >> but > > > >> >> maybe I read it too quickly. But cache can be decoupled from > > > >> >> synchronization, because CPU access is not asynchronous. For > > > >> >> userspace/CPU access to buffer, you should: > > > >> >> > > > >> >> 1) wait for buffer > > > >> >> 2) prepare-access > > > >> >> 3) ... do whatever cpu access to buffer ... > > > >> >> 4) finish-access > > > >> >> 5) submit buffer for new dma-operation > > > >> >> > > > >> > > > > >> > > > > >> > For data flow from CPU to DMA device, > > > >> > 1) wait for buffer > > > >> > 2) prepare-access (dma_buf_begin_cpu_access) > > > >> > 3) cpu access to buffer > > > >> > > > > >> > > > > >> > For data flow from DMA device to CPU > > > >> > 1) wait for buffer > > > >> > > > >> Right, but CPU access isn't asynchronous (from the point of view of > > > >> the CPU), so there isn't really any wait step at this point. And if > > > >> you do want the CPU to be able to signal a fence from userspace for > > > >> some reason, you probably what something file/fd based so the > > > >> refcnting/cleanup when process dies doesn't leave some pending DMA > > > >> action wedged. But I don't really see the point of that complexity > > > >> when the CPU access isn't asynchronous in the first place. > > > >> > > > > > > > > There was my missing comments, please see the below sequence. > > > > > > > > For data flow from CPU to DMA device and then from DMA device to CPU, > > > > 1) wait for buffer <- at user side - ioctl(fd, DMA_BUF_GET_FENCE, ...) > > > > - including prepare-access (dma_buf_begin_cpu_access) > > > > 2) cpu access to buffer > > > > 3) wait for buffer <- at device driver > > > > - but CPU is already accessing the buffer so blocked. > > > > 4) signal <- at user side - ioctl(fd, DMA_BUF_PUT_FENCE, ...) > > > > 5) the thread, blocked at 3), is waked up by 4). > > > > - and then finish-access (dma_buf_end_cpu_access) > > > > > > right, I understand you can have background threads, etc, in > > > userspace. But there are already plenty of synchronization primitives > > > that can be used for cpu->cpu synchronization, either within the same > > > process or between multiple processes. For cpu access, even if it is > > > handled by background threads/processes, I think it is better to use > > > the traditional pthreads or unix synchronization primitives. They > > > have existed forever, they are well tested, and they work. 
> > > > > > So while it seems nice and orthogonal/clean to couple cache and > > > synchronization and handle dma->cpu and cpu->cpu and cpu->dma in the > > > same generic way, but I think in practice we have to make things more > > > complex than they otherwise need to be to do this. Otherwise I think > > > we'll be having problems with badly behaved or crashing userspace. > > > > > > > > Right, this is very important. I think it's not easy to solve this > > issue. And at least for ARM based embedded system, such feature is useful to us; coupling cache operation and buffer synchronization. I'd like to collect other opinions and advices to solve this issue.
Re: Introduce a new helper framework for buffer synchronization
On Sat, May 18, 2013 at 03:47:43PM +0900, Inki Dae wrote: > 2013/5/15 Rob Clark > > > On Wed, May 15, 2013 at 1:19 AM, Inki Dae wrote: > > > > > > > > >> -Original Message- > > >> From: Rob Clark [mailto:robdcl...@gmail.com] > > >> Sent: Tuesday, May 14, 2013 10:39 PM > > >> To: Inki Dae > > >> Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; YoungJun > > >> Cho; linux-arm-ker...@lists.infradead.org; linux-me...@vger.kernel.org > > >> Subject: Re: Introduce a new helper framework for buffer synchronization > > >> > > >> On Mon, May 13, 2013 at 10:52 PM, Inki Dae > > wrote: > > >> >> well, for cache management, I think it is a better idea.. I didn't > > >> >> really catch that this was the motivation from the initial patch, but > > >> >> maybe I read it too quickly. But cache can be decoupled from > > >> >> synchronization, because CPU access is not asynchronous. For > > >> >> userspace/CPU access to buffer, you should: > > >> >> > > >> >> 1) wait for buffer > > >> >> 2) prepare-access > > >> >> 3) ... do whatever cpu access to buffer ... > > >> >> 4) finish-access > > >> >> 5) submit buffer for new dma-operation > > >> >> > > >> > > > >> > > > >> > For data flow from CPU to DMA device, > > >> > 1) wait for buffer > > >> > 2) prepare-access (dma_buf_begin_cpu_access) > > >> > 3) cpu access to buffer > > >> > > > >> > > > >> > For data flow from DMA device to CPU > > >> > 1) wait for buffer > > >> > > >> Right, but CPU access isn't asynchronous (from the point of view of > > >> the CPU), so there isn't really any wait step at this point. And if > > >> you do want the CPU to be able to signal a fence from userspace for > > >> some reason, you probably what something file/fd based so the > > >> refcnting/cleanup when process dies doesn't leave some pending DMA > > >> action wedged. But I don't really see the point of that complexity > > >> when the CPU access isn't asynchronous in the first place. > > >> > > > > > > There was my missing comments, please see the below sequence. > > > > > > For data flow from CPU to DMA device and then from DMA device to CPU, > > > 1) wait for buffer <- at user side - ioctl(fd, DMA_BUF_GET_FENCE, ...) > > > - including prepare-access (dma_buf_begin_cpu_access) > > > 2) cpu access to buffer > > > 3) wait for buffer <- at device driver > > > - but CPU is already accessing the buffer so blocked. > > > 4) signal <- at user side - ioctl(fd, DMA_BUF_PUT_FENCE, ...) > > > 5) the thread, blocked at 3), is waked up by 4). > > > - and then finish-access (dma_buf_end_cpu_access) > > > > right, I understand you can have background threads, etc, in > > userspace. But there are already plenty of synchronization primitives > > that can be used for cpu->cpu synchronization, either within the same > > process or between multiple processes. For cpu access, even if it is > > handled by background threads/processes, I think it is better to use > > the traditional pthreads or unix synchronization primitives. They > > have existed forever, they are well tested, and they work. > > > > So while it seems nice and orthogonal/clean to couple cache and > > synchronization and handle dma->cpu and cpu->cpu and cpu->dma in the > > same generic way, but I think in practice we have to make things more > > complex than they otherwise need to be to do this. Otherwise I think > > we'll be having problems with badly behaved or crashing userspace. > > > > > Right, this is very important. I think it's not esay to solve this > issue. 
And at least for ARM based embedded system, such feature is useful > to us; coupling cache operation and buffer synchronization. I'd like to > collect other opinions and advices to solve this issue. Maybe we have a bit of a misunderstanding here. The kernel really should take care of sync and cache coherency, and I agree that with the current dma_buf code (and the proposed fences) that part is still a bit hazy. But the kernel should not allow userspace to block access to a buffer until userspace is done. It should only sync with any outstanding fences and flush buffers before that userspace access happens. Then the kernel would again flush caches on the next dma access (which hopefully is only started once userspace completed access). Atm this isn't possible in an efficient way since the dma_buf api only exposes map/unmap attachment and not a function to just sync an existing mapping like dma_sync_single_for_device. I guess we should add a dma_buf_sync_attachment interface for that - we already have range-based cpu sync support with the begin/end_cpu_access interfaces. Yours, Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: Introduce a new helper framework for buffer synchronization
Hi Daniel, 2013/5/17 Daniel Vetter > On Wed, May 15, 2013 at 4:06 PM, Rob Clark wrote: > > So while it seems nice and orthogonal/clean to couple cache and > > synchronization and handle dma->cpu and cpu->cpu and cpu->dma in the > > same generic way, but I think in practice we have to make things more > > complex than they otherwise need to be to do this. Otherwise I think > > we'll be having problems with badly behaved or crashing userspace. > > I haven't read through the entire thread careful but imo this is very > important. If we add a fence interface which allows userspace to block > dma this is a no-go. The only thing we need is to sync up with all > outstanding dma operations and flush caches for cpu access. If broken > userspace starts to issue new dma (or multiple thread stomp onto each > another) that's not a problem dma fences/syncpoints should try to > solve. I'm not sure that I understood your concerns, but it seems that you say we have to prohibit userspace from blocking dma. Could you please give me more detail on that? Unless userspace causes a critical problem, this approach is better than the traditional one, at least for ARM based embedded systems. I had already mentioned this before, see below: http://www.spinics.net/lists/dri-devel/msg38359.html If you agree with my opinion, I'd like to say we could try to solve this problem in the long term. If we prohibit such interfaces from being used without a solid reason, I think we would just be going through the motions: we would have to use the traditional way NECESSARILY. As previously stated, could you please tell me whether there are solid reasons we have to prohibit such user interfaces from being used, and whether there is really no way to solve that? Basically, I have designed and implemented it so that all kernel resources tied to a user fence are freed once the fence times out, so that one user cannot affect another anymore. However, I'm sure that there are things I didn't catch. As I already mentioned, the purpose of this post is to collect other opinions and advice on something better. Of course, we have to concentrate on solving the device-to-device sync issues first. Thanks, Inki Dae > This way we can concentrate on solving the (already > challenging) device-to-device sync issues without additional > complexities which cpu->cpu sync would impose. > -Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch > ___ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel > ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: Introduce a new helper framework for buffer synchronization
2013/5/15 Rob Clark > On Wed, May 15, 2013 at 1:19 AM, Inki Dae wrote: > > > > > >> -Original Message- > >> From: Rob Clark [mailto:robdcl...@gmail.com] > >> Sent: Tuesday, May 14, 2013 10:39 PM > >> To: Inki Dae > >> Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; YoungJun > >> Cho; linux-arm-ker...@lists.infradead.org; linux-me...@vger.kernel.org > >> Subject: Re: Introduce a new helper framework for buffer synchronization > >> > >> On Mon, May 13, 2013 at 10:52 PM, Inki Dae > wrote: > >> >> well, for cache management, I think it is a better idea.. I didn't > >> >> really catch that this was the motivation from the initial patch, but > >> >> maybe I read it too quickly. But cache can be decoupled from > >> >> synchronization, because CPU access is not asynchronous. For > >> >> userspace/CPU access to buffer, you should: > >> >> > >> >> 1) wait for buffer > >> >> 2) prepare-access > >> >> 3) ... do whatever cpu access to buffer ... > >> >> 4) finish-access > >> >> 5) submit buffer for new dma-operation > >> >> > >> > > >> > > >> > For data flow from CPU to DMA device, > >> > 1) wait for buffer > >> > 2) prepare-access (dma_buf_begin_cpu_access) > >> > 3) cpu access to buffer > >> > > >> > > >> > For data flow from DMA device to CPU > >> > 1) wait for buffer > >> > >> Right, but CPU access isn't asynchronous (from the point of view of > >> the CPU), so there isn't really any wait step at this point. And if > >> you do want the CPU to be able to signal a fence from userspace for > >> some reason, you probably what something file/fd based so the > >> refcnting/cleanup when process dies doesn't leave some pending DMA > >> action wedged. But I don't really see the point of that complexity > >> when the CPU access isn't asynchronous in the first place. > >> > > > > There was my missing comments, please see the below sequence. > > > > For data flow from CPU to DMA device and then from DMA device to CPU, > > 1) wait for buffer <- at user side - ioctl(fd, DMA_BUF_GET_FENCE, ...) > > - including prepare-access (dma_buf_begin_cpu_access) > > 2) cpu access to buffer > > 3) wait for buffer <- at device driver > > - but CPU is already accessing the buffer so blocked. > > 4) signal <- at user side - ioctl(fd, DMA_BUF_PUT_FENCE, ...) > > 5) the thread, blocked at 3), is waked up by 4). > > - and then finish-access (dma_buf_end_cpu_access) > > right, I understand you can have background threads, etc, in > userspace. But there are already plenty of synchronization primitives > that can be used for cpu->cpu synchronization, either within the same > process or between multiple processes. For cpu access, even if it is > handled by background threads/processes, I think it is better to use > the traditional pthreads or unix synchronization primitives. They > have existed forever, they are well tested, and they work. > > So while it seems nice and orthogonal/clean to couple cache and > synchronization and handle dma->cpu and cpu->cpu and cpu->dma in the > same generic way, but I think in practice we have to make things more > complex than they otherwise need to be to do this. Otherwise I think > we'll be having problems with badly behaved or crashing userspace. > > Right, this is very important. I think it's not esay to solve this issue. Aand at least for ARM based embedded system, such feature is useful to us; coupling cache operation and buffer synchronization. I'd like to collect other opinions and advices to solve this issue. 
Thanks, Inki Dae > BR, > -R > > > 6) dma access to buffer > > 7) wait for buffer <- at user side - ioctl(fd, DMA_BUF_GET_FENCE, ...) > > - but DMA is already accessing the buffer so blocked. > > 8) signal <- at device driver > > 9) the thread, blocked at 7), is waked up by 8) > > - and then prepare-access (dma_buf_begin_cpu_access) > > 10) cpu access to buffer > > > > Basically, 'wait for buffer' includes buffer synchronization, committing > > processing, and cache operation. The buffer synchronization means that a > > current thread should wait for other threads accessing a shared buffer > until > > the completion of their access. And the committing processing means that a current thread possesses the shared buffer, so any other thread trying to access it is blocked.
Re: Introduce a new helper framework for buffer synchronization
On Wed, May 15, 2013 at 4:06 PM, Rob Clark wrote: > So while it seems nice and orthogonal/clean to couple cache and > synchronization and handle dma->cpu and cpu->cpu and cpu->dma in the > same generic way, but I think in practice we have to make things more > complex than they otherwise need to be to do this. Otherwise I think > we'll be having problems with badly behaved or crashing userspace. I haven't read through the entire thread careful but imo this is very important. If we add a fence interface which allows userspace to block dma this is a no-go. The only thing we need is to sync up with all outstanding dma operations and flush caches for cpu access. If broken userspace starts to issue new dma (or multiple thread stomp onto each another) that's not a problem dma fences/syncpoints should try to solve. This way we can concentrate on solving the (already challenging) device-to-device sync issues without additional complexities which cpu->cpu sync would impose. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: Introduce a new helper framework for buffer synchronization
On Wed, May 15, 2013 at 1:19 AM, Inki Dae wrote: > > >> -Original Message- >> From: Rob Clark [mailto:robdcl...@gmail.com] >> Sent: Tuesday, May 14, 2013 10:39 PM >> To: Inki Dae >> Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; YoungJun >> Cho; linux-arm-ker...@lists.infradead.org; linux-me...@vger.kernel.org >> Subject: Re: Introduce a new helper framework for buffer synchronization >> >> On Mon, May 13, 2013 at 10:52 PM, Inki Dae wrote: >> >> well, for cache management, I think it is a better idea.. I didn't >> >> really catch that this was the motivation from the initial patch, but >> >> maybe I read it too quickly. But cache can be decoupled from >> >> synchronization, because CPU access is not asynchronous. For >> >> userspace/CPU access to buffer, you should: >> >> >> >> 1) wait for buffer >> >> 2) prepare-access >> >> 3) ... do whatever cpu access to buffer ... >> >> 4) finish-access >> >> 5) submit buffer for new dma-operation >> >> >> > >> > >> > For data flow from CPU to DMA device, >> > 1) wait for buffer >> > 2) prepare-access (dma_buf_begin_cpu_access) >> > 3) cpu access to buffer >> > >> > >> > For data flow from DMA device to CPU >> > 1) wait for buffer >> >> Right, but CPU access isn't asynchronous (from the point of view of >> the CPU), so there isn't really any wait step at this point. And if >> you do want the CPU to be able to signal a fence from userspace for >> some reason, you probably what something file/fd based so the >> refcnting/cleanup when process dies doesn't leave some pending DMA >> action wedged. But I don't really see the point of that complexity >> when the CPU access isn't asynchronous in the first place. >> > > There was my missing comments, please see the below sequence. > > For data flow from CPU to DMA device and then from DMA device to CPU, > 1) wait for buffer <- at user side - ioctl(fd, DMA_BUF_GET_FENCE, ...) > - including prepare-access (dma_buf_begin_cpu_access) > 2) cpu access to buffer > 3) wait for buffer <- at device driver > - but CPU is already accessing the buffer so blocked. > 4) signal <- at user side - ioctl(fd, DMA_BUF_PUT_FENCE, ...) > 5) the thread, blocked at 3), is waked up by 4). > - and then finish-access (dma_buf_end_cpu_access) right, I understand you can have background threads, etc, in userspace. But there are already plenty of synchronization primitives that can be used for cpu->cpu synchronization, either within the same process or between multiple processes. For cpu access, even if it is handled by background threads/processes, I think it is better to use the traditional pthreads or unix synchronization primitives. They have existed forever, they are well tested, and they work. So while it seems nice and orthogonal/clean to couple cache and synchronization and handle dma->cpu and cpu->cpu and cpu->dma in the same generic way, but I think in practice we have to make things more complex than they otherwise need to be to do this. Otherwise I think we'll be having problems with badly behaved or crashing userspace. BR, -R > 6) dma access to buffer > 7) wait for buffer <- at user side - ioctl(fd, DMA_BUF_GET_FENCE, ...) > - but DMA is already accessing the buffer so blocked. > 8) signal <- at device driver > 9) the thread, blocked at 7), is waked up by 8) > - and then prepare-access (dma_buf_begin_cpu_access) > 10 cpu access to buffer > > Basically, 'wait for buffer' includes buffer synchronization, committing > processing, and cache operation. 
The buffer synchronization means that a > current thread should wait for other threads accessing a shared buffer until > the completion of their access. And the committing processing means that a > current thread possesses the shared buffer so any trying to access the > shared buffer by another thread makes the thread to be blocked. However, as > I already mentioned before, it seems that these user interfaces are so ugly > yet. So we need better way. > > Give me more comments if there is my missing point :) > > Thanks, > Inki Dae > >> BR, >> -R >> >> >> > 2) finish-access (dma_buf_end _cpu_access) >> > 3) dma access to buffer >> > >> > 1) and 2) are coupled with one function: we have implemented >> > fence_helper_commit_reserve() for it. >> > >> > Cache control(cache clean or cache invalidate) is performed properly >> > checking previous access type and current access type. >> > And the below is actual codes for it, > ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
RE: Introduce a new helper framework for buffer synchronization
> -Original Message- > From: Rob Clark [mailto:robdcl...@gmail.com] > Sent: Tuesday, May 14, 2013 10:39 PM > To: Inki Dae > Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; YoungJun > Cho; linux-arm-ker...@lists.infradead.org; linux-me...@vger.kernel.org > Subject: Re: Introduce a new helper framework for buffer synchronization > > On Mon, May 13, 2013 at 10:52 PM, Inki Dae wrote: > >> well, for cache management, I think it is a better idea.. I didn't > >> really catch that this was the motivation from the initial patch, but > >> maybe I read it too quickly. But cache can be decoupled from > >> synchronization, because CPU access is not asynchronous. For > >> userspace/CPU access to buffer, you should: > >> > >> 1) wait for buffer > >> 2) prepare-access > >> 3) ... do whatever cpu access to buffer ... > >> 4) finish-access > >> 5) submit buffer for new dma-operation > >> > > > > > > For data flow from CPU to DMA device, > > 1) wait for buffer > > 2) prepare-access (dma_buf_begin_cpu_access) > > 3) cpu access to buffer > > > > > > For data flow from DMA device to CPU > > 1) wait for buffer > > Right, but CPU access isn't asynchronous (from the point of view of > the CPU), so there isn't really any wait step at this point. And if > you do want the CPU to be able to signal a fence from userspace for > some reason, you probably what something file/fd based so the > refcnting/cleanup when process dies doesn't leave some pending DMA > action wedged. But I don't really see the point of that complexity > when the CPU access isn't asynchronous in the first place. > There was my missing comments, please see the below sequence. For data flow from CPU to DMA device and then from DMA device to CPU, 1) wait for buffer <- at user side - ioctl(fd, DMA_BUF_GET_FENCE, ...) - including prepare-access (dma_buf_begin_cpu_access) 2) cpu access to buffer 3) wait for buffer <- at device driver - but CPU is already accessing the buffer so blocked. 4) signal <- at user side - ioctl(fd, DMA_BUF_PUT_FENCE, ...) 5) the thread, blocked at 3), is waked up by 4). - and then finish-access (dma_buf_end_cpu_access) 6) dma access to buffer 7) wait for buffer <- at user side - ioctl(fd, DMA_BUF_GET_FENCE, ...) - but DMA is already accessing the buffer so blocked. 8) signal <- at device driver 9) the thread, blocked at 7), is waked up by 8) - and then prepare-access (dma_buf_begin_cpu_access) 10 cpu access to buffer Basically, 'wait for buffer' includes buffer synchronization, committing processing, and cache operation. The buffer synchronization means that a current thread should wait for other threads accessing a shared buffer until the completion of their access. And the committing processing means that a current thread possesses the shared buffer so any trying to access the shared buffer by another thread makes the thread to be blocked. However, as I already mentioned before, it seems that these user interfaces are so ugly yet. So we need better way. Give me more comments if there is my missing point :) Thanks, Inki Dae > BR, > -R > > > > 2) finish-access (dma_buf_end _cpu_access) > > 3) dma access to buffer > > > > 1) and 2) are coupled with one function: we have implemented > > fence_helper_commit_reserve() for it. > > > > Cache control(cache clean or cache invalidate) is performed properly > > checking previous access type and current access type. > > And the below is actual codes for it, ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
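For illustration, the sequence above roughly translates into the userspace fragment below. DMA_BUF_GET_FENCE, DMA_BUF_PUT_FENCE and struct dma_buf_fence are the interfaces proposed in this RFC, not an upstream API, so the ioctl numbers and struct layout here are placeholders and assumptions.

	#include <stdint.h>
	#include <string.h>
	#include <sys/ioctl.h>

	/* Placeholder definitions mirroring the RFC; not a mainline uapi. */
	struct dma_buf_fence {
		uint32_t type;
		uint32_t reserved;
	};
	#define DMA_BUF_ACCESS_READ  1                                    /* assumed */
	#define DMA_BUF_ACCESS_WRITE 2                                    /* assumed */
	#define DMA_BUF_GET_FENCE _IOWR('d', 0x20, struct dma_buf_fence)  /* assumed */
	#define DMA_BUF_PUT_FENCE _IOWR('d', 0x21, struct dma_buf_fence)  /* assumed */

	/* Steps 1) and 2) above: wait for the buffer (blocking while dma owns
	 * it), write it from the cpu, then signal so a driver thread blocked
	 * in its own 'wait for buffer' can proceed. */
	static int cpu_write_shared_buffer(int dmabuf_fd, void *map, size_t len)
	{
		struct dma_buf_fence df;

		memset(&df, 0, sizeof(df));
		df.type = DMA_BUF_ACCESS_WRITE;

		if (ioctl(dmabuf_fd, DMA_BUF_GET_FENCE, &df) < 0)  /* wait + prepare */
			return -1;

		memset(map, 0xff, len);                            /* cpu access */

		return ioctl(dmabuf_fd, DMA_BUF_PUT_FENCE, &df);   /* signal */
	}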
Re: Introduce a new helper framework for buffer synchronization
On Mon, May 13, 2013 at 10:52 PM, Inki Dae wrote: >> well, for cache management, I think it is a better idea.. I didn't >> really catch that this was the motivation from the initial patch, but >> maybe I read it too quickly. But cache can be decoupled from >> synchronization, because CPU access is not asynchronous. For >> userspace/CPU access to buffer, you should: >> >> 1) wait for buffer >> 2) prepare-access >> 3) ... do whatever cpu access to buffer ... >> 4) finish-access >> 5) submit buffer for new dma-operation >> > > > For data flow from CPU to DMA device, > 1) wait for buffer > 2) prepare-access (dma_buf_begin_cpu_access) > 3) cpu access to buffer > > > For data flow from DMA device to CPU > 1) wait for buffer Right, but CPU access isn't asynchronous (from the point of view of the CPU), so there isn't really any wait step at this point. And if you do want the CPU to be able to signal a fence from userspace for some reason, you probably what something file/fd based so the refcnting/cleanup when process dies doesn't leave some pending DMA action wedged. But I don't really see the point of that complexity when the CPU access isn't asynchronous in the first place. BR, -R > 2) finish-access (dma_buf_end _cpu_access) > 3) dma access to buffer > > 1) and 2) are coupled with one function: we have implemented > fence_helper_commit_reserve() for it. > > Cache control(cache clean or cache invalidate) is performed properly > checking previous access type and current access type. > And the below is actual codes for it, ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
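By contrast, the flow Rob describes keeps cpu<->cpu synchronization in ordinary userspace primitives and leaves only the cache/dma handling to the kernel. A rough sketch, where gem_cpu_prep, gem_cpu_fini and gem_submit stand in for whatever driver-specific ioctl wrappers implement steps 2, 4 and 5 (they are assumptions here, not a generic dma-buf API):

	#include <pthread.h>
	#include <stdint.h>
	#include <string.h>

	/* Stand-ins for driver-specific ioctl wrappers; hypothetical. */
	int  gem_cpu_prep(int fd, uint32_t handle);  /* waits for dma, syncs caches */
	void gem_cpu_fini(int fd, uint32_t handle);  /* flushes cpu writes          */
	int  gem_submit(int fd, uint32_t handle);    /* queues the next dma work    */

	static pthread_mutex_t buf_lock = PTHREAD_MUTEX_INITIALIZER;

	static int update_and_submit(int fd, uint32_t handle, void *map, size_t len)
	{
		int ret;

		pthread_mutex_lock(&buf_lock);        /* 1) wait for buffer (cpu<->cpu) */
		ret = gem_cpu_prep(fd, handle);       /* 2) prepare-access              */
		if (!ret) {
			memset(map, 0, len);          /* 3) cpu access to buffer        */
			gem_cpu_fini(fd, handle);     /* 4) finish-access               */
			ret = gem_submit(fd, handle); /* 5) submit for new dma work     */
		}
		pthread_mutex_unlock(&buf_lock);
		return ret;
	}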
RE: Introduce a new helper framework for buffer synchronization
> -Original Message- > From: Rob Clark [mailto:robdcl...@gmail.com] > Sent: Tuesday, May 14, 2013 2:58 AM > To: Inki Dae > Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; YoungJun > Cho; linux-arm-ker...@lists.infradead.org; linux-me...@vger.kernel.org > Subject: Re: Introduce a new helper framework for buffer synchronization > > On Mon, May 13, 2013 at 1:18 PM, Inki Dae wrote: > > > > > > 2013/5/13 Rob Clark > >> > >> On Mon, May 13, 2013 at 8:21 AM, Inki Dae wrote: > >> > > >> >> In that case you still wouldn't give userspace control over the > fences. > >> >> I > >> >> don't see any way that can end well. > >> >> What if userspace never signals? What if userspace gets killed by > oom > >> >> killer. Who keeps track of that? > >> >> > >> > > >> > In all cases, all kernel resources to user fence will be released by > >> > kernel > >> > once the fence is timed out: never signaling and process killing by > oom > >> > killer makes the fence timed out. And if we use mmap mechanism you > >> > mentioned > >> > before, I think user resource could also be freed properly. > >> > >> > >> I tend to agree w/ Maarten here.. there is no good reason for > >> userspace to be *signaling* fences. The exception might be some blob > >> gpu drivers which don't have enough knowledge in the kernel to figure > >> out what to do. (In which case you can add driver private ioctls for > >> that.. still not the right thing to do but at least you don't make a > >> public API out of it.) > >> > > > > Please do not care whether those are generic or not. Let's see the > following > > three things. First, it's cache operation. As you know, ARM SoC has ACP > > (Accelerator Coherency Port) and can be connected to DMA engine or > similar > > devices. And this port is used for cache coherency between CPU cache and > DMA > > device. However, most devices on ARM based embedded systems don't use > the > > ACP port. So they need proper cache operation before and after of DMA or > CPU > > access in case of using cachable mapping. Actually, I see many Linux > based > > platforms call cache control interfaces directly for that. I think the > > reason, they do so, is that kernel isn't aware of when and how CPU > accessed > > memory. > > I think we had kicked around the idea of giving dmabuf's a > prepare/finish ioctl quite some time back. This is probably something > that should be at least a bit decoupled from fences. (Possibly > 'prepare' waits for dma access to complete, but not the other way > around.) > > And I did implement in omapdrm support for simulating coherency via > page fault-in / shoot-down.. It is one option that makes it > completely transparent to userspace, although there is some > performance const, so I suppose it depends a bit on your use-case. > > > And second, user process has to do so many things in case of using > shared > > memory with DMA device. User process should understand how DMA device is > > operated and when interfaces for controling the DMA device are called. > Such > > things would make user application so complicated. > > > > And third, it's performance optimization to multimedia and graphics > devices. > > As I mentioned already, we should consider sequential processing for > buffer > > sharing between CPU and DMA device. This means that CPU should stay with > > idle until DMA device is completed and vise versa. > > > > That is why I proposed such user interfaces. 
Of course, these interfaces > > might be so ugly yet: for this, Maarten pointed already out and I agree > with > > him. But there must be another better way. Aren't you think we need > similar > > thing? With such interfaces, cache control and buffer synchronization > can be > > performed in kernel level. Moreover, user applization doesn't need to > > consider DMA device controlling anymore. Therefore, one thread can > access a > > shared buffer and the other can control DMA device with the shared > buffer in > > parallel. We can really make the best use of CPU and DMA idle time. In > other > > words, we can really make the best use of multi tasking OS, Linux. > > > > So could you please tell me about that there is any reason we don't use > > public API for it? I think we can add and use public API if NECESSARY. > > we
Re: Introduce a new helper framework for buffer synchronization
Hi, On Monday 13 of May 2013 20:24:01 Inki Dae wrote: > > -Original Message- > > From: Maarten Lankhorst [mailto:maarten.lankho...@canonical.com] > > Sent: Monday, May 13, 2013 6:52 PM > > To: Inki Dae > > Cc: 'Rob Clark'; 'Daniel Vetter'; 'DRI mailing list'; linux-arm- > > ker...@lists.infradead.org; linux-me...@vger.kernel.org; > > 'linux-fbdev'; > > 'Kyungmin Park'; 'myungjoo.ham'; 'YoungJun Cho' > > Subject: Re: Introduce a new helper framework for buffer > > synchronization> > > Op 13-05-13 11:21, Inki Dae schreef: > > >> -Original Message- > > >> From: Maarten Lankhorst [mailto:maarten.lankho...@canonical.com] > > >> Sent: Monday, May 13, 2013 5:01 PM > > >> To: Inki Dae > > >> Cc: Rob Clark; Daniel Vetter; DRI mailing list; linux-arm- > > >> ker...@lists.infradead.org; linux-me...@vger.kernel.org; > > >> linux-fbdev; > > >> Kyungmin Park; myungjoo.ham; YoungJun Cho > > >> Subject: Re: Introduce a new helper framework for buffer > > > > synchronization > > > > >> Op 09-05-13 09:33, Inki Dae schreef: > > >>> Hi all, > > >>> > > >>> This post introduces a new helper framework based on dma fence. > > >>> And > > > > the > > > > >>> purpose of this post is to collect other opinions and advices > > >>> before > > > > RFC > > > > >>> posting. > > >>> > > >>> First of all, this helper framework, called fence helper, is in > > > > progress > > > > >>> yet so might not have enough comments in codes and also might need > > >>> to > > > > be > > > > >>> more cleaned up. Moreover, we might be missing some parts of the > > >>> dma > > >> > > >> fence. > > >> > > >>> However, I'd like to say that all things mentioned below has been > > > > tested > > > > >>> with Linux platform and worked well. > > >>> > > >>> > > >>> And tutorial for user process. > > >>> > > >>> just before cpu access > > >>> > > >>> struct dma_buf_fence *df; > > >>> > > >>> df->type = DMA_BUF_ACCESS_READ or > > DMA_BUF_ACCESS_WRITE; > > > >>> ioctl(fd, DMA_BUF_GET_FENCE, &df); > > >>> > > >>> after memset or memcpy > > >>> > > >>> ioctl(fd, DMA_BUF_PUT_FENCE, &df); > > >> > > >> NAK. > > >> > > >> Userspace doesn't need to trigger fences. It can do a buffer idle > > >> wait, > > >> and postpone submitting new commands until after it's done using > > >> the > > >> buffer. > > > > > > Hi Maarten, > > > > > > It seems that you say user should wait for a buffer like KDS does: > > > KDS > > > > uses > > > > > select() to postpone submitting new commands. But I think this way > > > > assumes > > > > > that every data flows a DMA device to a CPU. For example, a CPU > > > should > > > > keep > > > > > polling for the completion of a buffer access by a DMA device. This > > > > means > > > > > that the this way isn't considered for data flow to opposite case; > > > CPU > > > > to > > > > > DMA device. > > > > Not really. You do both things the same way. You first wait for the bo > > to be idle, this could be implemented by adding poll support to the > > dma-buf fd. > > Then you either do your read or write. Since userspace is supposed to > > be the one controlling the bo it should stay idle at that point. If > > you have another thread queueing > > the buffer againbefore your thread is done that's a bug in the > > application, > > > and can be solved with userspace locking primitives. No need for the > > kernel to get involved. > > Yes, that is how we have synchronized buffer between CPU and DMA device > until now without buffer synchronization mechanism. I thought that it's > best to make user not considering anything: user can acce
Re: Introduce a new helper framework for buffer synchronization
On Mon, May 13, 2013 at 1:18 PM, Inki Dae wrote: > > > 2013/5/13 Rob Clark >> >> On Mon, May 13, 2013 at 8:21 AM, Inki Dae wrote: >> > >> >> In that case you still wouldn't give userspace control over the fences. >> >> I >> >> don't see any way that can end well. >> >> What if userspace never signals? What if userspace gets killed by oom >> >> killer. Who keeps track of that? >> >> >> > >> > In all cases, all kernel resources to user fence will be released by >> > kernel >> > once the fence is timed out: never signaling and process killing by oom >> > killer makes the fence timed out. And if we use mmap mechanism you >> > mentioned >> > before, I think user resource could also be freed properly. >> >> >> I tend to agree w/ Maarten here.. there is no good reason for >> userspace to be *signaling* fences. The exception might be some blob >> gpu drivers which don't have enough knowledge in the kernel to figure >> out what to do. (In which case you can add driver private ioctls for >> that.. still not the right thing to do but at least you don't make a >> public API out of it.) >> > > Please do not care whether those are generic or not. Let's see the following > three things. First, it's cache operation. As you know, ARM SoC has ACP > (Accelerator Coherency Port) and can be connected to DMA engine or similar > devices. And this port is used for cache coherency between CPU cache and DMA > device. However, most devices on ARM based embedded systems don't use the > ACP port. So they need proper cache operation before and after of DMA or CPU > access in case of using cachable mapping. Actually, I see many Linux based > platforms call cache control interfaces directly for that. I think the > reason, they do so, is that kernel isn't aware of when and how CPU accessed > memory. I think we had kicked around the idea of giving dmabuf's a prepare/finish ioctl quite some time back. This is probably something that should be at least a bit decoupled from fences. (Possibly 'prepare' waits for dma access to complete, but not the other way around.) And I did implement in omapdrm support for simulating coherency via page fault-in / shoot-down.. It is one option that makes it completely transparent to userspace, although there is some performance const, so I suppose it depends a bit on your use-case. > And second, user process has to do so many things in case of using shared > memory with DMA device. User process should understand how DMA device is > operated and when interfaces for controling the DMA device are called. Such > things would make user application so complicated. > > And third, it's performance optimization to multimedia and graphics devices. > As I mentioned already, we should consider sequential processing for buffer > sharing between CPU and DMA device. This means that CPU should stay with > idle until DMA device is completed and vise versa. > > That is why I proposed such user interfaces. Of course, these interfaces > might be so ugly yet: for this, Maarten pointed already out and I agree with > him. But there must be another better way. Aren't you think we need similar > thing? With such interfaces, cache control and buffer synchronization can be > performed in kernel level. Moreover, user applization doesn't need to > consider DMA device controlling anymore. Therefore, one thread can access a > shared buffer and the other can control DMA device with the shared buffer in > parallel. We can really make the best use of CPU and DMA idle time. 
In other > words, we can really make the best use of multi tasking OS, Linux. > > So could you please tell me about that there is any reason we don't use > public API for it? I think we can add and use public API if NECESSARY. well, for cache management, I think it is a better idea.. I didn't really catch that this was the motivation from the initial patch, but maybe I read it too quickly. But cache can be decoupled from synchronization, because CPU access is not asynchronous. For userspace/CPU access to buffer, you should: 1) wait for buffer 2) prepare-access 3) ... do whatever cpu access to buffer ... 4) finish-access 5) submit buffer for new dma-operation I suppose you could combine the syscall for #1 and #2.. not sure if that is a good idea or not. But you don't need to. And there is never really any need for userspace to signal a fence. BR, -R > Thanks, > Inki Dae > >> >> BR, >> -R >> ___ >> dri-devel mailing list >> dri-devel@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/dri-devel > > ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
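The "prepare/finish ioctl" idea mentioned above could be wired up on top of the existing dma-buf cpu-access hooks. A minimal kernel-side sketch, assuming a hypothetical driver ioctl (handle lookup and copy_from_user plumbing omitted) and using the range-based begin/end_cpu_access signatures of the 3.x kernels; DMA_BIDIRECTIONAL is used as the conservative direction choice:

	#include <linux/dma-buf.h>

	static int example_cpu_prep(struct dma_buf *buf)
	{
		/* A real driver would first wait for its own outstanding dma on
		 * this buffer (or, with the fence framework, for attached
		 * fences). */

		/* Then sync caches for the whole buffer before userspace
		 * touches it. */
		return dma_buf_begin_cpu_access(buf, 0, buf->size, DMA_BIDIRECTIONAL);
	}

	static void example_cpu_fini(struct dma_buf *buf)
	{
		/* Flush cpu writes so the next dma operation sees them. */
		dma_buf_end_cpu_access(buf, 0, buf->size, DMA_BIDIRECTIONAL);
	}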
Re: Introduce a new helper framework for buffer synchronization
2013/5/13 Rob Clark > On Mon, May 13, 2013 at 8:21 AM, Inki Dae wrote: > > > >> In that case you still wouldn't give userspace control over the fences. > I > >> don't see any way that can end well. > >> What if userspace never signals? What if userspace gets killed by oom > >> killer. Who keeps track of that? > >> > > > > In all cases, all kernel resources to user fence will be released by > kernel > > once the fence is timed out: never signaling and process killing by oom > > killer makes the fence timed out. And if we use mmap mechanism you > mentioned > > before, I think user resource could also be freed properly. > > > I tend to agree w/ Maarten here.. there is no good reason for > userspace to be *signaling* fences. The exception might be some blob > gpu drivers which don't have enough knowledge in the kernel to figure > out what to do. (In which case you can add driver private ioctls for > that.. still not the right thing to do but at least you don't make a > public API out of it.) > > Please do not care whether those are generic or not. Let's see the following three things. First, it's cache operation. As you know, ARM SoC has ACP (Accelerator Coherency Port) and can be connected to DMA engine or similar devices. And this port is used for cache coherency between CPU cache and DMA device. However, most devices on ARM based embedded systems don't use the ACP port. So they need proper cache operation before and after of DMA or CPU access in case of using cachable mapping. Actually, I see many Linux based platforms call cache control interfaces directly for that. I think the reason, they do so, is that kernel isn't aware of when and how CPU accessed memory. And second, user process has to do so many things in case of using shared memory with DMA device. User process should understand how DMA device is operated and when interfaces for controling the DMA device are called. Such things would make user application so complicated. And third, it's performance optimization to multimedia and graphics devices. As I mentioned already, we should consider sequential processing for buffer sharing between CPU and DMA device. This means that CPU should stay with idle until DMA device is completed and vise versa. That is why I proposed such user interfaces. Of course, these interfaces might be so ugly yet: for this, Maarten pointed already out and I agree with him. But there must be another better way. Aren't you think we need similar thing? With such interfaces, cache control and buffer synchronization can be performed in kernel level. Moreover, user applization doesn't need to consider DMA device controlling anymore. Therefore, one thread can access a shared buffer and the other can control DMA device with the shared buffer in parallel. We can really make the best use of CPU and DMA idle time. In other words, we can really make the best use of multi tasking OS, Linux. So could you please tell me about that there is any reason we don't use public API for it? I think we can add and use public API if NECESSARY. Thanks, Inki Dae > BR, > -R > ___ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel > ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
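For readers less familiar with the ARM situation described above: on a SoC where the device is not IO-coherent (no ACP), each cpu<->dma handoff on a cacheable mapping needs explicit maintenance, which the streaming dma api already expresses. A small sketch; the surrounding driver structure and the bidirectional mapping are assumed, only the sync calls matter here:

	#include <linux/dma-mapping.h>

	/* sgl/nents are the scatterlist and entry count originally passed to
	 * dma_map_sg(dev, sgl, nents, DMA_BIDIRECTIONAL). */
	static void example_handoff(struct device *dev, struct scatterlist *sgl,
				    int nents)
	{
		/* cpu wrote the cacheable buffer, device will read it next:
		 * clean the cpu caches before starting dma. */
		dma_sync_sg_for_device(dev, sgl, nents, DMA_BIDIRECTIONAL);

		/* ... kick the dma operation and wait for it to complete ... */

		/* device wrote the buffer, cpu will read it next: invalidate
		 * stale cpu cache lines before the cpu looks at it. */
		dma_sync_sg_for_cpu(dev, sgl, nents, DMA_BIDIRECTIONAL);
	}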
Re: Introduce a new helper framework for buffer synchronization
On Mon, May 13, 2013 at 8:21 AM, Inki Dae wrote: > >> In that case you still wouldn't give userspace control over the fences. I >> don't see any way that can end well. >> What if userspace never signals? What if userspace gets killed by oom >> killer. Who keeps track of that? >> > > In all cases, all kernel resources to user fence will be released by kernel > once the fence is timed out: never signaling and process killing by oom > killer makes the fence timed out. And if we use mmap mechanism you mentioned > before, I think user resource could also be freed properly. I tend to agree w/ Maarten here.. there is no good reason for userspace to be *signaling* fences. The exception might be some blob gpu drivers which don't have enough knowledge in the kernel to figure out what to do. (In which case you can add driver private ioctls for that.. still not the right thing to do but at least you don't make a public API out of it.) BR, -R ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
RE: Introduce a new helper framework for buffer synchronization
> -Original Message- > From: linux-fbdev-ow...@vger.kernel.org [mailto:linux-fbdev- > ow...@vger.kernel.org] On Behalf Of Maarten Lankhorst > Sent: Monday, May 13, 2013 8:41 PM > To: Inki Dae > Cc: 'Rob Clark'; 'Daniel Vetter'; 'DRI mailing list'; linux-arm- > ker...@lists.infradead.org; linux-me...@vger.kernel.org; 'linux-fbdev'; > 'Kyungmin Park'; 'myungjoo.ham'; 'YoungJun Cho' > Subject: Re: Introduce a new helper framework for buffer synchronization > > Op 13-05-13 13:24, Inki Dae schreef: > >> and can be solved with userspace locking primitives. No need for the > >> kernel to get involved. > >> > > Yes, that is how we have synchronized buffer between CPU and DMA device > > until now without buffer synchronization mechanism. I thought that it's > best > > to make user not considering anything: user can access a buffer > regardless > > of any DMA device controlling and the buffer synchronization is > performed in > > kernel level. Moreover, I think we could optimize graphics and > multimedia > > hardware performance because hardware can do more works: one thread > accesses > > a shared buffer and the other controls DMA device with the shared buffer > in > > parallel. Thus, we could avoid sequential processing and that is my > > intention. Aren't you think about that we could improve hardware > utilization > > with such way or other? of course, there could be better way. > > > If you don't want to block the hardware the only option is to allocate a > temp bo and blit to/from it using the hardware. > OpenGL already does that when you want to read back, because otherwise the > entire pipeline can get stalled. > The overhead of command submission for that shouldn't be high, if it is > you should really try to optimize that first > before coming up with this crazy scheme. > I have considered all devices sharing buffer with CPU; multimedia, display controller and graphics devices (including GPU). And we could provide easy-to-use user interfaces for buffer synchronization. Of course, the proposed user interfaces may be so ugly yet but at least, I think we need something else for it. > In that case you still wouldn't give userspace control over the fences. I > don't see any way that can end well. > What if userspace never signals? What if userspace gets killed by oom > killer. Who keeps track of that? > In all cases, all kernel resources to user fence will be released by kernel once the fence is timed out: never signaling and process killing by oom killer makes the fence timed out. And if we use mmap mechanism you mentioned before, I think user resource could also be freed properly. Thanks, Inki Dae > ~Maarten > -- > To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: Introduce a new helper framework for buffer synchronization
Op 13-05-13 13:24, Inki Dae schreef: >> and can be solved with userspace locking primitives. No need for the >> kernel to get involved. >> > Yes, that is how we have synchronized buffer between CPU and DMA device > until now without buffer synchronization mechanism. I thought that it's best > to make user not considering anything: user can access a buffer regardless > of any DMA device controlling and the buffer synchronization is performed in > kernel level. Moreover, I think we could optimize graphics and multimedia > hardware performance because hardware can do more works: one thread accesses > a shared buffer and the other controls DMA device with the shared buffer in > parallel. Thus, we could avoid sequential processing and that is my > intention. Aren't you think about that we could improve hardware utilization > with such way or other? of course, there could be better way. > If you don't want to block the hardware the only option is to allocate a temp bo and blit to/from it using the hardware. OpenGL already does that when you want to read back, because otherwise the entire pipeline can get stalled. The overhead of command submission for that shouldn't be high, if it is you should really try to optimize that first before coming up with this crazy scheme. In that case you still wouldn't give userspace control over the fences. I don't see any way that can end well. What if userspace never signals? What if userspace gets killed by oom killer. Who keeps track of that? ~Maarten ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
RE: Introduce a new helper framework for buffer synchronization
> -----Original Message-----
> From: Maarten Lankhorst [mailto:maarten.lankho...@canonical.com]
> Sent: Monday, May 13, 2013 6:52 PM
> To: Inki Dae
> Cc: 'Rob Clark'; 'Daniel Vetter'; 'DRI mailing list'; linux-arm-ker...@lists.infradead.org; linux-me...@vger.kernel.org; 'linux-fbdev'; 'Kyungmin Park'; 'myungjoo.ham'; 'YoungJun Cho'
> Subject: Re: Introduce a new helper framework for buffer synchronization
>
> Op 13-05-13 11:21, Inki Dae schreef:
> >
> >> -----Original Message-----
> >> From: Maarten Lankhorst [mailto:maarten.lankho...@canonical.com]
> >> Sent: Monday, May 13, 2013 5:01 PM
> >> To: Inki Dae
> >> Cc: Rob Clark; Daniel Vetter; DRI mailing list; linux-arm-ker...@lists.infradead.org; linux-me...@vger.kernel.org; linux-fbdev; Kyungmin Park; myungjoo.ham; YoungJun Cho
> >> Subject: Re: Introduce a new helper framework for buffer synchronization
> >>
> >> Op 09-05-13 09:33, Inki Dae schreef:
> >>> Hi all,
> >>>
> >>> This post introduces a new helper framework based on dma fence. The
> >>> purpose of this post is to collect opinions and advice before an RFC
> >>> posting.
> >>>
> >>> First of all, this helper framework, called fence helper, is still in
> >>> progress, so it might not have enough comments in the code and might
> >>> need further cleanup. Moreover, we might be missing some parts of the
> >>> dma fence. However, I'd like to say that everything mentioned below has
> >>> been tested on a Linux platform and works well.
> >>>
> >>>
> >>> And a tutorial for the user process.
> >>> just before cpu access
> >>> struct dma_buf_fence *df;
> >>>
> >>> df->type = DMA_BUF_ACCESS_READ or DMA_BUF_ACCESS_WRITE;
> >>> ioctl(fd, DMA_BUF_GET_FENCE, &df);
> >>>
> >>> after memset or memcpy
> >>> ioctl(fd, DMA_BUF_PUT_FENCE, &df);
> >> NAK.
> >>
> >> Userspace doesn't need to trigger fences. It can do a buffer idle wait,
> >> and postpone submitting new commands until after it's done using the
> >> buffer.
> > Hi Maarten,
> >
> > It seems that you are saying the user should wait for a buffer like KDS
> > does: KDS uses select() to postpone submitting new commands. But I think
> > this approach assumes that all data flows from a DMA device to the CPU.
> > For example, the CPU has to keep polling for the completion of a buffer
> > access by a DMA device. This means the approach doesn't cover data flowing
> > in the opposite direction, from the CPU to a DMA device.
> Not really. You do both things the same way. You first wait for the bo to
> be idle; this could be implemented by adding poll support to the dma-buf fd.
> Then you either do your read or write. Since userspace is supposed to be
> the one controlling the bo, it should stay idle at that point. If you have
> another thread queueing the buffer again before your thread is done, that's
> a bug in the application, and can be solved with userspace locking
> primitives. No need for the kernel to get involved.
>

Yes, that is how we have synchronized buffers between the CPU and DMA devices until now, without a buffer synchronization mechanism. I thought it best that the user not have to consider anything: the user can access a buffer regardless of which DMA device is using it, and the buffer synchronization is performed at the kernel level. Moreover, I think we could improve graphics and multimedia hardware performance because the hardware can do more work: one thread accesses a shared buffer while another drives a DMA device with the same buffer in parallel. Thus, we could avoid sequential processing, and that is my intention. Don't you think we could improve hardware utilization this way, or in some other way? Of course, there could be a better way.

> >> Kernel space doesn't need the root hole you created by dereferencing a
> >> pointer passed from userspace.
> >> Your next exercise should be to write a security exploit from the api you
> >> created here. It's the only way to learn how to write safe code. Hint:
> >> df.ctx = mmap(..);
> >>
> > Also, I'm not yet sure about our approach, and that is why I posted. As you
> > mentioned, it seems that using mmap() is safer. But there is one issue that
> > confuses me.
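For context on the "userspace locking primitives" Maarten keeps pointing at: a plain pthread mutex shared by the CPU-access thread and the command-submission thread already gives the per-process ordering Inki describes, without kernel involvement. A sketch; the submit call is a placeholder for whatever driver ioctl queues the buffer, and the separate buffer-idle wait is left out here.

```c
#include <pthread.h>
#include <string.h>

/* Placeholder for the driver-specific submit ioctl. */
extern void queue_buffer_to_device(int fd, unsigned int handle);

static pthread_mutex_t buf_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread 1: CPU access to the shared buffer.  A buffer-idle wait (e.g. a
 * poll on the dma-buf fd, as discussed below) would come before this. */
void cpu_thread_fill(void *map, size_t size)
{
	pthread_mutex_lock(&buf_lock);
	memset(map, 0, size);			/* CPU writes the buffer */
	pthread_mutex_unlock(&buf_lock);
}

/* Thread 2: queues the same buffer to a DMA device. */
void submit_thread(int fd, unsigned int handle)
{
	pthread_mutex_lock(&buf_lock);
	queue_buffer_to_device(fd, handle);	/* device access starts here */
	pthread_mutex_unlock(&buf_lock);
}
```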
Re: Introduce a new helper framework for buffer synchronization
Op 13-05-13 11:21, Inki Dae schreef:
>
>> -----Original Message-----
>> From: Maarten Lankhorst [mailto:maarten.lankho...@canonical.com]
>> Sent: Monday, May 13, 2013 5:01 PM
>> To: Inki Dae
>> Cc: Rob Clark; Daniel Vetter; DRI mailing list; linux-arm-ker...@lists.infradead.org; linux-me...@vger.kernel.org; linux-fbdev; Kyungmin Park; myungjoo.ham; YoungJun Cho
>> Subject: Re: Introduce a new helper framework for buffer synchronization
>>
>> Op 09-05-13 09:33, Inki Dae schreef:
>>> Hi all,
>>>
>>> This post introduces a new helper framework based on dma fence. The
>>> purpose of this post is to collect opinions and advice before an RFC
>>> posting.
>>>
>>> First of all, this helper framework, called fence helper, is still in
>>> progress, so it might not have enough comments in the code and might
>>> need further cleanup. Moreover, we might be missing some parts of the
>>> dma fence. However, I'd like to say that everything mentioned below has
>>> been tested on a Linux platform and works well.
>>>
>>>
>>> And a tutorial for the user process.
>>> just before cpu access
>>> struct dma_buf_fence *df;
>>>
>>> df->type = DMA_BUF_ACCESS_READ or DMA_BUF_ACCESS_WRITE;
>>> ioctl(fd, DMA_BUF_GET_FENCE, &df);
>>>
>>> after memset or memcpy
>>> ioctl(fd, DMA_BUF_PUT_FENCE, &df);
>> NAK.
>>
>> Userspace doesn't need to trigger fences. It can do a buffer idle wait,
>> and postpone submitting new commands until after it's done using the
>> buffer.
> Hi Maarten,
>
> It seems that you are saying the user should wait for a buffer like KDS
> does: KDS uses select() to postpone submitting new commands. But I think
> this approach assumes that all data flows from a DMA device to the CPU.
> For example, the CPU has to keep polling for the completion of a buffer
> access by a DMA device. This means the approach doesn't cover data flowing
> in the opposite direction, from the CPU to a DMA device.
Not really. You do both things the same way. You first wait for the bo to be idle; this could be implemented by adding poll support to the dma-buf fd. Then you either do your read or write. Since userspace is supposed to be the one controlling the bo, it should stay idle at that point. If you have another thread queueing the buffer again before your thread is done, that's a bug in the application, and can be solved with userspace locking primitives. No need for the kernel to get involved.

>> Kernel space doesn't need the root hole you created by dereferencing a
>> pointer passed from userspace.
>> Your next exercise should be to write a security exploit from the api you
>> created here. It's the only way to learn how to write safe code. Hint:
>> df.ctx = mmap(..);
>>
> Also, I'm not yet sure about our approach, and that is why I posted. As you
> mentioned, it seems that using mmap() is safer. But there is one issue that
> confuses me. Regarding your hint, df.ctx = mmap(..), the issue is that
> dmabuf mmap can be used to map a dmabuf into user space. And the dmabuf is
> a physical memory region allocated by some allocator such as drm gem or ion.
>
> I might be missing something, so could you please give me more comments?
>
My point was that userspace could change df.ctx to some mmap'd memory, forcing the kernel to execute some code prepared by userspace.
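The "root hole" Maarten describes comes from the kernel trusting a pointer field (df.ctx in the hint) that userspace can freely rewrite, for example with the address of an mmap'd region it controls. A simplified illustration; the argument struct, the callback and the handle-based alternative are invented for the example and are not the actual proposed code.

```c
#include <linux/uaccess.h>
#include <linux/errno.h>

/* Invented UAPI struct shaped like the one in the tutorial above. */
struct dma_buf_fence_arg {
	unsigned int type;
	void *ctx;		/* BAD: a kernel-internal pointer visible to userspace */
};

void fence_ctx_signal(void *ctx);	/* hypothetical kernel-side helper */

static long dma_buf_get_fence_unsafe(void __user *uarg)
{
	struct dma_buf_fence_arg arg;

	if (copy_from_user(&arg, uarg, sizeof(arg)))
		return -EFAULT;

	/*
	 * BAD: arg.ctx is whatever userspace put there, e.g. df.ctx = mmap(..).
	 * The kernel now dereferences (or calls through) attacker-chosen memory.
	 */
	fence_ctx_signal(arg.ctx);
	return 0;
}

/*
 * Safer shape: the UAPI struct carries only plain data or a handle, and the
 * kernel resolves the handle in a table it owns, so userspace never hands
 * the kernel a raw pointer.
 */
struct dma_buf_fence_arg_safe {
	unsigned int type;
	unsigned int handle;
};
```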
RE: Introduce a new helper framework for buffer synchronization
> -----Original Message-----
> From: Maarten Lankhorst [mailto:maarten.lankho...@canonical.com]
> Sent: Monday, May 13, 2013 5:01 PM
> To: Inki Dae
> Cc: Rob Clark; Daniel Vetter; DRI mailing list; linux-arm-ker...@lists.infradead.org; linux-me...@vger.kernel.org; linux-fbdev; Kyungmin Park; myungjoo.ham; YoungJun Cho
> Subject: Re: Introduce a new helper framework for buffer synchronization
>
> Op 09-05-13 09:33, Inki Dae schreef:
> > Hi all,
> >
> > This post introduces a new helper framework based on dma fence. The
> > purpose of this post is to collect opinions and advice before an RFC
> > posting.
> >
> > First of all, this helper framework, called fence helper, is still in
> > progress, so it might not have enough comments in the code and might
> > need further cleanup. Moreover, we might be missing some parts of the
> > dma fence. However, I'd like to say that everything mentioned below has
> > been tested on a Linux platform and works well.
> >
> >
> > And a tutorial for the user process.
> > just before cpu access
> > struct dma_buf_fence *df;
> >
> > df->type = DMA_BUF_ACCESS_READ or DMA_BUF_ACCESS_WRITE;
> > ioctl(fd, DMA_BUF_GET_FENCE, &df);
> >
> > after memset or memcpy
> > ioctl(fd, DMA_BUF_PUT_FENCE, &df);
> NAK.
>
> Userspace doesn't need to trigger fences. It can do a buffer idle wait,
> and postpone submitting new commands until after it's done using the
> buffer.

Hi Maarten,

It seems that you are saying the user should wait for a buffer like KDS does: KDS uses select() to postpone submitting new commands. But I think this approach assumes that all data flows from a DMA device to the CPU. For example, the CPU has to keep polling for the completion of a buffer access by a DMA device. This means the approach doesn't cover data flowing in the opposite direction, from the CPU to a DMA device.

> Kernel space doesn't need the root hole you created by dereferencing a
> pointer passed from userspace.
> Your next exercise should be to write a security exploit from the api you
> created here. It's the only way to learn how to write safe code. Hint:
> df.ctx = mmap(..);
>

Also, I'm not yet sure about our approach, and that is why I posted. As you mentioned, it seems that using mmap() is safer. But there is one issue that confuses me. Regarding your hint, df.ctx = mmap(..), the issue is that dmabuf mmap can be used to map a dmabuf into user space. And the dmabuf is a physical memory region allocated by some allocator such as drm gem or ion.

I might be missing something, so could you please give me more comments?

Thanks,
Inki Dae

> ~Maarten
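The buffer-idle wait that both the KDS-style select() and Maarten's "add poll support to the dma-buf fd" suggestion boil down to would look roughly like this from userspace. At the time of this thread that poll support is only being proposed, so the event semantics (here POLLOUT meaning "no device access pending") are an assumption of the sketch.

```c
#include <poll.h>
#include <string.h>
#include <stddef.h>

/* Wait until the dma-buf is idle: returns >0 when idle, 0 on timeout,
 * <0 on error. */
static int wait_for_buffer_idle(int dmabuf_fd, int timeout_ms)
{
	struct pollfd pfd = {
		.fd     = dmabuf_fd,
		.events = POLLOUT,	/* assumed: "safe for a new writer" */
	};

	return poll(&pfd, 1, timeout_ms);
}

/* CPU-side write: wait for idle first, then touch the mapping; only after
 * that does the application queue new device work on the buffer. */
void cpu_fill(int dmabuf_fd, void *map, size_t size)
{
	if (wait_for_buffer_idle(dmabuf_fd, 1000) > 0)
		memset(map, 0, size);
}
```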
Re: Introduce a new helper framework for buffer synchronization
Op 09-05-13 09:33, Inki Dae schreef:
> Hi all,
>
> This post introduces a new helper framework based on dma fence. The
> purpose of this post is to collect opinions and advice before an RFC
> posting.
>
> First of all, this helper framework, called fence helper, is still in
> progress, so it might not have enough comments in the code and might
> need further cleanup. Moreover, we might be missing some parts of the
> dma fence. However, I'd like to say that everything mentioned below has
> been tested on a Linux platform and works well.
>
>
> And a tutorial for the user process.
> just before cpu access
> struct dma_buf_fence *df;
>
> df->type = DMA_BUF_ACCESS_READ or DMA_BUF_ACCESS_WRITE;
> ioctl(fd, DMA_BUF_GET_FENCE, &df);
>
> after memset or memcpy
> ioctl(fd, DMA_BUF_PUT_FENCE, &df);
NAK.

Userspace doesn't need to trigger fences. It can do a buffer idle wait, and postpone submitting new commands until after it's done using the buffer.

Kernel space doesn't need the root hole you created by dereferencing a pointer passed from userspace. Your next exercise should be to write a security exploit from the api you created here. It's the only way to learn how to write safe code. Hint:
df.ctx = mmap(..);

~Maarten