On 12/18/2015 08:50 PM, Tiago Vignatti wrote: > On 12/17/2015 07:58 PM, Thomas Hellstrom wrote: >> On 12/16/2015 11:25 PM, Tiago Vignatti wrote: >>> From: Daniel Vetter <daniel.vetter at ffwll.ch> >>> >>> The userspace might need some sort of cache coherency management >>> e.g. when CPU >>> and GPU domains are being accessed through dma-buf at the same time. To >>> circumvent this problem there are begin/end coherency markers, that >>> forward >>> directly to existing dma-buf device drivers vfunc hooks. Userspace >>> can make use >>> of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence >>> would be >>> used like following: >>> - mmap dma-buf fd >>> - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. >>> read/write >>> to mmap area 3. SYNC_END ioctl. This can be repeated as >>> often as you >>> want (with the new data being consumed by the GPU or say >>> scanout device) >>> - munmap once you don't need the buffer any more >>> >>> v2 (Tiago): Fix header file type names (u64 -> __u64) >>> v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the >>> begin/end >>> dma-buf functions. Check for overflows in start/length. >>> v4 (Tiago): use 2d regions for sync. >>> v5 (Tiago): forget about 2d regions (v4); use _IOW in >>> DMA_BUF_IOCTL_SYNC and >>> remove range information from struct dma_buf_sync. >>> v6 (Tiago): use __u64 structured padded flags instead enum. Adjust >>> documentation about the recommendation on using sync ioctls. >>> >>> Cc: Sumit Semwal <sumit.semwal at linaro.org> >>> Signed-off-by: Daniel Vetter <daniel.vetter at intel.com> >>> Signed-off-by: Tiago Vignatti <tiago.vignatti at intel.com> >>> --- >>> Documentation/dma-buf-sharing.txt | 22 +++++++++++++++++++- >>> drivers/dma-buf/dma-buf.c | 43 >>> +++++++++++++++++++++++++++++++++++++++ >>> include/uapi/linux/dma-buf.h | 38 >>> ++++++++++++++++++++++++++++++++++ >>> 3 files changed, 102 insertions(+), 1 deletion(-) >>> create mode 100644 include/uapi/linux/dma-buf.h >>> >>> diff --git a/Documentation/dma-buf-sharing.txt >>> b/Documentation/dma-buf-sharing.txt >>> index 4f4a84b..2ddd4b2 100644 >>> --- a/Documentation/dma-buf-sharing.txt >>> +++ b/Documentation/dma-buf-sharing.txt >>> @@ -350,7 +350,27 @@ Being able to mmap an export dma-buf buffer >>> object has 2 main use-cases: >>> handles, too). So it's beneficial to support this in a similar >>> fashion on >>> dma-buf to have a good transition path for existing Android >>> userspace. >>> >>> - No special interfaces, userspace simply calls mmap on the >>> dma-buf fd. >>> + No special interfaces, userspace simply calls mmap on the >>> dma-buf fd. Very >>> + important to note though is that, even if it is not mandatory, >>> the userspace >>> + is strongly recommended to always use the cache synchronization >>> ioctl >>> + (DMA_BUF_IOCTL_SYNC) discussed next. >>> + >>> + Some systems might need some sort of cache coherency management >>> e.g. when >>> + CPU and GPU domains are being accessed through dma-buf at the >>> same time. To >>> + circumvent this problem there are begin/end coherency markers, >>> that forward >>> + directly to existing dma-buf device drivers vfunc hooks. >>> Userspace can make >>> + use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The >>> sequence >>> + would be used like following: >>> + - mmap dma-buf fd >>> + - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. >>> read/write >>> + to mmap area 3. SYNC_END ioctl. This can be repeated as >>> often as you >>> + want (with the new data being consumed by the GPU or say >>> scanout device) >>> + - munmap once you don't need the buffer any more >>> + >>> + In principle systems with the memory cache shared by the GPU >>> and CPU may >>> + not need SYNC_START and SYNC_END but still, userspace is always >>> encouraged >>> + to use these ioctls before and after, respectively, when >>> accessing the >>> + mapped address. >>> >> >> I think the wording here is far too weak. If this is a generic >> user-space interface and syncing >> is required for >> a) Correctness: then syncing must be mandatory. >> b) Optimal performance then an implementation must generate expected >> results also in the absence of SYNC ioctls, but is allowed to rely on >> correct pairing of SYNC_START and SYNC_END to render correctly. > > Thomas, do you think the following write-up captures this? > > > - No special interfaces, userspace simply calls mmap on the dma-buf fd. > + No special interfaces, userspace simply calls mmap on the dma-buf > fd, making > + sure that the cache synchronization ioctl (DMA_BUF_IOCTL_SYNC) is > *always* > + used when the access happens. This is discussed next paragraphs. > + > + Some systems might need some sort of cache coherency management > e.g. when > + CPU and GPU domains are being accessed through dma-buf at the same > time. To > + circumvent this problem there are begin/end coherency markers, > that forward > + directly to existing dma-buf device drivers vfunc hooks. Userspace > can make > + use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The > sequence > + would be used like following: > + - mmap dma-buf fd > + - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. > read/write > + to mmap area 3. SYNC_END ioctl. This can be repeated as often > as you > + want (with the new data being consumed by the GPU or say > scanout device) > + - munmap once you don't need the buffer any more > + > + Therefore, for correctness and optimal performance, systems with > the memory > + cache shared by the GPU and CPU i.e. the "coherent" and also > "incoherent" > + systems are always required to use SYNC_START and SYNC_END before > and > + after, respectively, when accessing the mapped address. > > > Thank you, > > Tiago Yes, that sounds better,
Thanks, Thomas