> -----Original Message-----
> From: Lucas Stach [mailto:l.stach at pengutronix.de]
> Sent: Thursday, June 20, 2013 7:11 PM
> To: Inki Dae
> Cc: 'Russell King - ARM Linux'; 'Inki Dae'; 'linux-fbdev'; 'YoungJun Cho';
> 'Kyungmin Park'; 'myungjoo.ham'; 'DRI mailing list';
> linux-arm-kernel at lists.infradead.org; linux-media at vger.kernel.org
> Subject: Re: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization
> framework
>
> Am Donnerstag, den 20.06.2013, 17:24 +0900 schrieb Inki Dae:
> [...]
> > > > In addition, please see the more detailed examples below.
> > > >
> > > > The conventional way (without dmabuf-sync) is:
> > > >
> > > > Task A
> > > > ----------------------------
> > > > 1. CPU accesses buf
> > > > 2. Send the buf to Task B
> > > > 3. Wait for the buf from Task B
> > > > 4. go to 1
> > > >
> > > > Task B
> > > > ---------------------------
> > > > 1. Wait for the buf from Task A
> > > > 2. qbuf the buf
> > > >     2.1 insert the buf to incoming queue
> > > > 3. stream on
> > > >     3.1 dma_map_sg if ready, and move the buf to ready queue
> > > >     3.2 get the buf from ready queue, and dma start.
> > > > 4. dqbuf
> > > >     4.1 dma_unmap_sg after dma operation completion
> > > >     4.2 move the buf to outgoing queue
> > > > 5. back the buf to Task A
> > > > 6. go to 1
> > > >
> > > > In case the two tasks share buffers and the data flow goes from
> > > > Task A to Task B, we would need an IPC operation to send and
> > > > receive buffers properly between those two tasks every time a CPU
> > > > or DMA access to the buffers is started or completed.
> > > >
> > > >
> > > > With dmabuf-sync:
> > > >
> > > > Task A
> > > > ----------------------------
> > > > 1. dma_buf_sync_lock <- syncpoint (call by user side)
> > > > 2. CPU accesses buf
> > > > 3. dma_buf_sync_unlock <- syncpoint (call by user side)
> > > > 4. Send the buf to Task B (just one time)
> > > > 5. go to 1
> > > >
> > > > Task B
> > > > ---------------------------
> > > > 1. Wait for the buf from Task A (just one time)
> > > > 2. qbuf the buf
> > > >     2.1 insert the buf to incoming queue
> > > > 3. stream on
> > > >     3.1 dma_buf_sync_lock <- syncpoint (call by kernel side)
> > > >     3.2 dma_map_sg if ready, and move the buf to ready queue
> > > >     3.3 get the buf from ready queue, and dma start.
> > > > 4. dqbuf
> > > >     4.1 dma_buf_sync_unlock <- syncpoint (call by kernel side)
> > > >     4.2 dma_unmap_sg after dma operation completion
> > > >     4.3 move the buf to outgoing queue
> > > > 5. go to 1
> > > >
> > > > On the other hand, with dmabuf-sync, as you can see in the above
> > > > example, we would need the IPC operation only one time. That way,
> > > > I think we could not only reduce the performance overhead but also
> > > > simplify the user application. Of course, this approach can be
> > > > used for all DMA device drivers such as DRM. I'm not a specialist
> > > > in the v4l2 world, so there may be a missing point.
> > >
> > > You already need some kind of IPC between the two tasks, as I suspect
> > > even in your example it wouldn't make much sense to queue the buffer
> > > over and over again in task B without task A writing anything to it.
> > > So task A has to signal task B that there is new data in the buffer
> > > to be processed.
> > >
> > > There is no need to share the buffer over and over again just to get
> > > the two processes to work together on the same thing. Just share the
> > > fd between both and then do out-of-band completion signaling, as you
> > > need this anyway.
> > > Without this you'll end up with unpredictable behavior. Just because
> > > sync allows you to access the buffer doesn't mean it's valid for
> > > your use-case. Without completion signaling you could easily end up
> > > overwriting your data from task A multiple times before task B even
> > > tries to lock the buffer for processing.
> > >
> > > So the valid flow is (and this already works with the current APIs):
> > >
> > > Task A                              Task B
> > > ------                              ------
> > > CPU access buffer
> > >   ----------completion signal--------->
> > >                                     qbuf (dragging buffer into
> > >                                     device domain, flush caches,
> > >                                     reserve buffer etc.)
> > >                                         |
> > >                                     wait for device operation to
> > >                                     complete
> > >                                         |
> > >                                     dqbuf (dragging buffer back
> > >                                     into CPU domain, invalidate
> > >                                     caches, unreserve)
> > >   <---------completion signal------------
> > > CPU access buffer
> > >
> >
> > Correct. In case the data flow goes from A to B, it needs some kind of
> > IPC between the two tasks every time, as you said. Then, without
> > dmabuf-sync, what do you think about the case where two tasks share
> > the same buffer, but both tasks access the buffer (buf1) for write,
> > and the data of the buffer (buf1) doesn't need to be shared?
>
> Sorry, I don't see the point you are trying to solve here. If you share
> a buffer and want its content to be clearly defined at every point in
> time, you have to synchronize the tasks working with the buffer, not
> just the buffer accesses themselves.
>
> The easiest way to do so is doing the sync through userspace with
> out-of-band IPC, like in the example above.
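For completeness, the fd sharing plus out-of-band completion signaling
that you describe could look roughly like the sketch below; the UNIX
domain socket and the one-byte messages are only illustrative, not part
of any existing or proposed ABI. The dma-buf fd is passed once with
SCM_RIGHTS, and a one-byte message is exchanged as the completion
signal for every frame:

/*
 * Rough sketch of the out-of-band approach: Task A passes the dma-buf
 * fd to Task B once over a UNIX domain socket (SCM_RIGHTS), then both
 * sides exchange one-byte "done" messages as per-frame completion
 * signals. Socket setup and error handling are omitted; the message
 * format is illustrative only.
 */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Task A: send the dma-buf fd once over an already connected socket. */
static int send_dmabuf_fd(int sock, int dmabuf_fd)
{
	char data = 'F';
	struct iovec iov = { .iov_base = &data, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} ctrl;
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
	};
	struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &dmabuf_fd, sizeof(int));

	return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}

/* Per-frame completion signal: a single byte in either direction. */
static int signal_done(int sock)
{
	char c = 'D';

	return write(sock, &c, 1) == 1 ? 0 : -1;
}

static int wait_done(int sock)
{
	char c;

	return read(sock, &c, 1) == 1 ? 0 : -1;
}

Task A would call signal_done() after its CPU write, and Task B would
call wait_done() before qbuf and signal_done() again after dqbuf,
matching the flow diagram above. Note that one such signal round trip
is needed for every buffer handed over.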
In my opinion, that is definitely not the easiest way. What I am trying
to do is to avoid using *the out-of-band IPC*. As I mentioned in the
document file, the conventional mechanism not only makes the user
application complicated (the user process needs to understand how the
device driver works) but may also incur performance overhead by using
the out-of-band IPC. My example above may not be convincing to you, but
there would be other cases that could use my approach efficiently.

> A more advanced way to achieve this would be using cross-device fences
> to avoid going through userspace for every syncpoint.
>

Ok, maybe there is something I missed. So, a question: what are the
cross-device fences? The dma fence? And how can we achieve the
synchronization mechanism without going through user space for every
syncpoint when the CPU and the DMA share the same buffer? Could you
explain it in as much detail as possible, like I did?

> >
> > With dmabuf-sync:
> >
> > Task A
> > ----------------------------
> > 1. dma_buf_sync_lock <- syncpoint (call by user side)
> > 2. CPU writes something to buf1
> > 3. dma_buf_sync_unlock <- syncpoint (call by user side)
> > 4. copy buf1 to buf2
>
> Random contents here? What's in the buffer, content from the CPU write,
> or from V4L2 device write?
>

Please presume that buf1 is physically non-contiguous memory and buf2
is physically contiguous memory; device A, which has no IOMMU, sees
buf2. We would need to copy buf1 to buf2 to send the contents of buf1
to device A, because the DMA of device A cannot access buf1 directly.
And the CPU and the V4L2 device don't share the contents of buf1; they
only share buf1 as storage. I have appended two rough sketches at the
end of this mail to make the flows above more concrete.

Thanks,
Inki Dae

> > 5. go to 1
> >
> >
> > Task B
> > ---------------------------
> > 1. dma_buf_sync_lock
> > 2. CPU writes something to buf3
> > 3. dma_buf_sync_unlock
> > 4. qbuf the buf3 (src) and buf1 (dst)
> >     4.1 insert buf3,1 to incoming queue
> >     4.2 dma_buf_sync_lock <- syncpoint (call by kernel side)
> > 5. stream on
> >     5.1 dma_map_sg if ready, and move the buf to ready queue
> >     5.2 get the buf from ready queue, and dma start.
> > 6. dqbuf
> >     6.1 dma_buf_sync_unlock <- syncpoint (call by kernel side)
> >     6.2 dma_unmap_sg after dma operation completion
> >     6.3 move the buf3,1 to outgoing queue
> > 7. go to 1
> >
>
> Regards,
> Lucas
>
> --
> Pengutronix e.K.                           | Lucas Stach                 |
> Industrial Linux Solutions                 | http://www.pengutronix.de/  |
> Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
> Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
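As mentioned above, here are the two sketches. The first one only shows
where the user-side syncpoints in Task A would sit around the CPU
access; dmabuf_sync_lock()/dmabuf_sync_unlock() are hypothetical
wrappers for whatever user-space entry point the framework ends up
exposing (an ioctl, fcntl, or similar), not an existing API:

/*
 * Hypothetical illustration only: dmabuf_sync_lock()/unlock() stand in
 * for the not-yet-finalized user-space interface of dmabuf-sync. The
 * point is merely that the syncpoints bracket the CPU access in Task A.
 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

int dmabuf_sync_lock(int dmabuf_fd);    /* hypothetical: blocks until DMA is done */
int dmabuf_sync_unlock(int dmabuf_fd);  /* hypothetical: lets DMA access proceed  */

static int task_a_fill(int dmabuf_fd, uint8_t *map, size_t size)
{
	if (dmabuf_sync_lock(dmabuf_fd))        /* step 1: syncpoint (user side) */
		return -1;

	memset(map, 0xff, size);                /* step 2: CPU writes to buf1 */

	return dmabuf_sync_unlock(dmabuf_fd);   /* step 3: syncpoint (user side) */
}

Task B's kernel-side syncpoints (steps 4.2 and 6.1 above) would be
taken by the driver around dma_map_sg/dma_unmap_sg, as in the quoted
example.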
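The second sketch shows how Task B's qbuf / stream on / dqbuf steps map
onto the existing V4L2 dma-buf import path (V4L2_MEMORY_DMABUF). A
single-planar output queue with one buffer is assumed; format
negotiation, payload size setup and error handling are omitted. "vfd"
is an already opened V4L2 device node and "dmabuf_fd" is the fd
received from Task A:

/*
 * Queue a shared dma-buf on a V4L2 output queue, start streaming and
 * wait for the DMA operation to finish (VIDIOC_DQBUF blocks unless the
 * device was opened with O_NONBLOCK).
 */
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

static int process_one_buffer(int vfd, int dmabuf_fd)
{
	struct v4l2_requestbuffers req = {
		.count = 1,
		.type = V4L2_BUF_TYPE_VIDEO_OUTPUT,
		.memory = V4L2_MEMORY_DMABUF,
	};
	struct v4l2_buffer buf;
	int type = V4L2_BUF_TYPE_VIDEO_OUTPUT;

	if (ioctl(vfd, VIDIOC_REQBUFS, &req) < 0)
		return -1;

	memset(&buf, 0, sizeof(buf));
	buf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
	buf.memory = V4L2_MEMORY_DMABUF;
	buf.index = 0;
	buf.m.fd = dmabuf_fd;                           /* qbuf the shared buffer */

	if (ioctl(vfd, VIDIOC_QBUF, &buf) < 0)
		return -1;

	if (ioctl(vfd, VIDIOC_STREAMON, &type) < 0)     /* stream on */
		return -1;

	if (ioctl(vfd, VIDIOC_DQBUF, &buf) < 0)         /* dqbuf */
		return -1;

	return 0;
}

VIDIOC_DQBUF returns once the DMA operation has completed, which
corresponds to steps 6.1-6.3 in the quoted example.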