On Fri, Mar 2, 2012 at 6:23 PM, Sakari Ailus <sakari.ailus at iki.fi> wrote:
> Hi Daniel,
>
> Thanks for the patch.
>
> On Thu, Mar 01, 2012 at 04:36:01PM +0100, Daniel Vetter wrote:
>> Signed-off-by: Daniel Vetter <daniel.vetter at ffwll.ch>
>> ---
>>  Documentation/dma-buf-sharing.txt |  102 +++++++++++++++++++++++++++++++++++-
>>  1 files changed, 99 insertions(+), 3 deletions(-)
>>
>> diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
>> index 225f96d..f12542b 100644
>> --- a/Documentation/dma-buf-sharing.txt
>> +++ b/Documentation/dma-buf-sharing.txt
>> @@ -32,8 +32,12 @@ The buffer-user
>>  *IMPORTANT*: [see https://lkml.org/lkml/2011/12/20/211 for more details]
>>  For this first version, A buffer shared using the dma_buf sharing API:
>>  - *may* be exported to user space using "mmap" *ONLY* by exporter, outside of
>> -   this framework.
>> -- may be used *ONLY* by importers that do not need CPU access to the buffer.
>> +  this framework.
>> +- with this new iteration of the dma-buf api cpu access from the kernel has been
>> +  enabled, see below for the details.
>> +
>> +dma-buf operations for device dma only
>> +--------------------------------------
>>
>>  The dma_buf buffer sharing API usage contains the following steps:
>>
>> @@ -219,7 +223,99 @@ NOTES:
>>      If the exporter chooses not to allow an attach() operation once a
>>      map_dma_buf() API has been called, it simply returns an error.
>>
>> -Miscellaneous notes:
>> +Kernel cpu access to a dma-buf buffer object
>> +--------------------------------------------
>> +
>> +The motivations for allowing cpu access from the kernel to a dma-buf object
>> +from the importer's side are:
>> +- fallback operations, e.g. if the device is connected to a usb bus and the
>> +  kernel needs to shuffle the data around first before sending it away.
>> +- full transparency for existing users on the importer side, i.e.
>> +  userspace should not notice the difference between a normal object from that
>> +  subsystem and an imported one backed by a dma-buf. This is really important
>> +  for drm opengl drivers that expect to still use all the existing
>> +  upload/download paths.
>> +
>> +Access to a dma_buf from the kernel context involves three steps:
>> +
>> +1. Prepare access, which invalidates any necessary caches and makes the object
>> +   available for cpu access.
>> +2. Access the object page-by-page with the dma_buf map apis.
>> +3. Finish access, which will flush any necessary cpu caches and free reserved
>> +   resources.
>
> Where should it be decided which operations are done to the buffer
> when it is passed to user space and back to kernel space?
>
> How about splitting these operations into those done the first time the
> buffer is passed to user space (mapping to kernel address space, for
> example) and those required every time the buffer is passed from kernel to
> user space and back (cache flushing)?
>
> I'm asking since any unnecessary time-consuming operations, especially ones
> as heavy as mapping the buffer, should be avoidable in subsystems dealing
> with streaming video, cameras etc., i.e. non-GPU users.

Well, this is really something for the buffer exporter to deal with,
since there is no way for an importer to create a userspace mmap'ing
of the buffer. A lot of these expensive operations go away if you
don't even create a userspace virtual mapping in the first place ;-)

BR,
-R

>
>> +1. Prepare access
>> +
>> +   Before an importer can access a dma_buf object with the cpu from the kernel
>> +   context, it needs to notify the exporter of the access that is about to
>> +   happen.
>> +
>> +   Interface:
>> +      int dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
>> +                                   size_t start, size_t len,
>> +                                   enum dma_data_direction direction)
>> +
>> +   This allows the exporter to ensure that the memory is actually available for
>> +   cpu access - the exporter might need to allocate or swap-in and pin the
>> +   backing storage. The exporter also needs to ensure that cpu access is
>> +   coherent for the given range and access direction. The range and access
>> +   direction can be used by the exporter to optimize the cache flushing, i.e.
>> +   access outside of the range or with a different direction (read instead of
>> +   write) might return stale or even bogus data (e.g. when the exporter needs to
>> +   copy the data to temporary storage).
>> +
>> +   This step might fail, e.g. in oom conditions.
>> +
>> +2. Accessing the buffer
>> +
>> +   To support dma_buf objects residing in highmem cpu access is page-based using
>> +   an api similar to kmap. Accessing a dma_buf is done in aligned chunks of
>> +   PAGE_SIZE size. Before accessing a chunk it needs to be mapped, which returns
>> +   a pointer in kernel virtual address space. Afterwards the chunk needs to be
>> +   unmapped again. There is no limit on how often a given chunk can be mapped
>> +   and unmapped, i.e. the importer does not need to call begin_cpu_access again
>> +   before mapping the same chunk again.
>> +
>> +   Interfaces:
>> +      void *dma_buf_kmap(struct dma_buf *, unsigned long);
>> +      void dma_buf_kunmap(struct dma_buf *, unsigned long, void *);
>> +
>> +   There are also atomic variants of these interfaces. Like for kmap they
>> +   facilitate non-blocking fast-paths. Neither the importer nor the exporter (in
>> +   the callback) is allowed to block when using these.
>> +
>> +   Interfaces:
>> +      void *dma_buf_kmap_atomic(struct dma_buf *, unsigned long);
>> +      void dma_buf_kunmap_atomic(struct dma_buf *, unsigned long, void *);
>> +
>> +   For importers all the restrictions of using kmap apply, like the limited
>> +   supply of kmap_atomic slots. Hence an importer shall only hold onto at most 2
>> +   atomic dma_buf kmaps at the same time (in any given process context).
>> +
>> +   dma_buf kmap calls outside of the range specified in begin_cpu_access are
>> +   undefined. If the range is not PAGE_SIZE aligned, kmap needs to succeed on
>> +   the partial chunks at the beginning and end but may return stale or bogus
>> +   data outside of the range (in these partial chunks).
>> +
>> +   Note that these calls need to always succeed. The exporter needs to complete
>> +   any preparations that might fail in begin_cpu_access.
>> +
>> +3. Finish access
>> +
>> +   When the importer is done accessing the range specified in begin_cpu_access,
>> +   it needs to announce this to the exporter (to facilitate cache flushing and
>> +   unpinning of any pinned resources). The result of any dma_buf kmap calls
>> +   after end_cpu_access is undefined.
>> +
>> +   Interface:
>> +      void dma_buf_end_cpu_access(struct dma_buf *dma_buf,
>> +                                  size_t start, size_t len,
>> +                                  enum dma_data_direction dir);
>> +
>> +
>> +Miscellaneous notes
>> +-------------------
>> +
>>  - Any exporters or users of the dma-buf buffer sharing framework must have
>>    a 'select DMA_SHARED_BUFFER' in their respective Kconfigs.
>
> Kind regards,
>
> --
> Sakari Ailus
> e-mail: sakari.ailus at iki.fi     jabber/XMPP/Gmail: sailus at retiisi.org.uk
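
For illustration only, the page-based access rule in the quoted patch (aligned
PAGE_SIZE chunks, with partial chunks at the beginning and end of an unaligned
range) can be worked through as plain arithmetic. The sketch below is
userspace C, not kernel code and not part of the patch: SKETCH_PAGE_SIZE,
struct sketch_page_span and sketch_span_for_range() are hypothetical names
made up here, and a 4 KiB page size is assumed. It only computes which chunk
indices an importer would have to map for the (start, len) range it passed to
begin_cpu_access, and where the valid bytes start and end inside the partial
chunks.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. In the kernel this would be PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical helper type for this sketch, not part of the dma-buf API. */
struct sketch_page_span {
	unsigned long first_page; /* index of the first chunk to map */
	unsigned long last_page;  /* index of the last chunk to map (inclusive) */
	size_t head_offset;       /* first valid byte within the first chunk */
	size_t tail_end;          /* one past the last valid byte within the last chunk */
};

/*
 * Compute the chunks covered by the byte range [start, start + len).
 * Bytes before head_offset in the first chunk and at or after tail_end in
 * the last chunk lie outside the announced range, so per the quoted text
 * they may hold stale or bogus data.
 * Assumes len > 0 and that start + len does not overflow size_t.
 */
static struct sketch_page_span sketch_span_for_range(size_t start, size_t len)
{
	struct sketch_page_span span;
	size_t last_byte;

	assert(len > 0);
	last_byte = start + len - 1;

	span.first_page = start / SKETCH_PAGE_SIZE;
	span.last_page = last_byte / SKETCH_PAGE_SIZE;
	span.head_offset = start % SKETCH_PAGE_SIZE;
	span.tail_end = last_byte % SKETCH_PAGE_SIZE + 1;
	return span;
}
```

With these assumptions, a range of start = 100 and len = 8192 touches three
chunks (indices 0 to 2) even though it is only two pages long, which is the
kind of partial-chunk case the patch text describes. An importer following the
quoted three steps would call begin_cpu_access once for the range, map and
unmap each chunk index in turn, and then call end_cpu_access.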