Re: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-11-08 Thread Clark, Rob
On Thu, Nov 3, 2011 at 3:04 AM, Marek Szyprowski
m.szyprow...@samsung.com wrote:
 Hello,

 I'm sorry for the late reply, but after Kernel Summit/ELC I have some comments.

 On Friday, October 14, 2011 5:35 PM Daniel Vetter wrote:

 On Fri, Oct 14, 2011 at 12:00:58PM +0200, Tomasz Stanislawski wrote:
  +/**
  + * struct dma_buf_ops - operations possible on struct dma_buf
  + * @create: creates a struct dma_buf of a fixed size. Actual allocation
  + *            does not happen here.
 
  The 'create' ops is not present in dma_buf_ops.
 
  + * @attach: allows different devices to 'attach' themselves to the given
  + *            buffer. It might return -EBUSY to signal that backing storage
  + *            is already allocated and incompatible with the requirements
  + *            of the requesting device. [optional]
  + * @detach: detach a given device from this buffer. [optional]
  + * @get_scatterlist: returns list of scatter pages allocated, increases
  + *                     usecount of the buffer. Requires at least one attach
  + *                     to be called before. Returned sg list should already
  + *                     be mapped into _device_ address space.
 
  You must add a comment that this call 'may sleep'.
 
  I like the get_scatterlist idea. It allows the exporter to create a
  valid scatterlist for a client in an elegant way.
 
  I do not like this whole attachment idea. The problem is that
  currently there is no support in the DMA framework for allocation for
  multiple devices. As long as no such support exists, there is no
  generic way to handle attribute negotiations and buffer allocations
  that involve multiple devices. So the exporter drivers would have to
  implement more or less hacky solutions to handle memory requirements
  and choose the device that allocates the memory.
 
  Currently, AFAIK there is even no generic way for a driver to
  acquire its own DMA memory requirements.
 
  Therefore all logic hidden beneath 'attachment' is pointless. I
  think that support for attach/detach (and related stuff) should be
  postponed until support for multi-device allocation is added to DMA
  framework.

 Imo we clearly need this to make multi-device drivers with insane dma
 requirements work on arm. And rewriting the buffer handling in every
 participating subsystem twice isn't really a great plan. I envision that
 on platforms where we need this madness, the driver must call back into the
 dma subsystem to create a dma_buf. The dma subsystem should already be aware
 of all the requirements and hence should be able to handle them.

  I don't say the attachment list idea is wrong, but adding attachment
  stuff creates an illusion that the problem of multi-device allocations
  is somehow magically solved. We should not force the developers of
  exporter drivers to solve a problem that is not solvable yet.

 Well, this is why we need to create a decent support infrastructure for
 platforms (= arm madness) that need this, so that device drivers and
 subsystems don't need to invent that wheel on their own. Which, as you point
 out, they actually can't.

 The real question is whether it is possible to create any generic support
 infrastructure. I really doubt it. IMHO this is something that will be hacked
 up for each 'product release' and will never reach mainline...

  The other problem is the APIs. For example, the V4L2 subsystem
  assumes that memory is allocated after a successful VIDIOC_REQBUFS
  with the V4L2_MEMORY_MMAP memory type. Therefore attach would be
  automatically followed by get_scatterlist, blocking the possibility of
  any buffer migrations in the future.

 Well, pardon me for breaking the news, but v4l needs to rework its buffer
 handling. If you want to share buffers with a gpu driver, you _have_ to
 live with the fact that gpus do fully dynamic buffer management, meaning:
 - buffers get allocated and destroyed on the fly, meaning static reqbuf
   just went out the window (we obviously cache buffer objects and reuse
   them for performance, as long as the processing pipeline doesn't really
   change).
 - buffers get moved around in memory, meaning you either need full-blown
   sync-objects with a callback to drivers to tear down mappings on-demand,
   or every driver needs to guarantee to call put_scatterlist in a
   reasonably short time. The latter is probably the more natural thing for
   v4l devices.

 I'm really not convinced it is possible to go for completely dynamic
 buffer management, especially if we are implementing a proof-of-concept
 solution. Please notice the following facts:

 1. all v4l2 drivers do 'static' buffer management - memory is allocated
 on the REQBUF() call and then mapped permanently into both userspace
 and the dma (io) address space.

Is this strictly true if we are introducing a new 'enum v4l2_memory'
for dmabufs?  Shouldn't that give us some flexibility, especially if
the v4l2 device is only the importer, not the allocator, of the
memory?
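
(To make that concrete: a minimal userspace sketch of what such an
import-only path could look like.  V4L2_MEMORY_DMABUF and the .m.fd
member are assumptions for illustration here, not something the posted
patches define:)

    /* Hypothetical import-only flow: the buffer was allocated elsewhere
     * (e.g. by a gpu driver) and is only imported into v4l2 via its fd. */
    struct v4l2_requestbuffers req = {
            .count  = 4,
            .type   = V4L2_BUF_TYPE_VIDEO_CAPTURE,
            .memory = V4L2_MEMORY_DMABUF,   /* assumed new enum value */
    };
    ioctl(vfd, VIDIOC_REQBUFS, &req);       /* no allocation happens here */

    struct v4l2_buffer buf = {
            .index  = 0,
            .type   = V4L2_BUF_TYPE_VIDEO_CAPTURE,
            .memory = V4L2_MEMORY_DMABUF,
            .m.fd   = dmabuf_fd,            /* fd received from the exporter */
    };
    ioctl(vfd, VIDIOC_QBUF, &buf);          /* driver may attach/map lazily */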

and a couple 

RE: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-11-03 Thread Marek Szyprowski
Hello,

I'm sorry for the late reply, but after Kernel Summit/ELC I have some comments.

On Friday, October 14, 2011 5:35 PM Daniel Vetter wrote:

 On Fri, Oct 14, 2011 at 12:00:58PM +0200, Tomasz Stanislawski wrote:
  +/**
  + * struct dma_buf_ops - operations possible on struct dma_buf
  + * @create: creates a struct dma_buf of a fixed size. Actual allocation
  + *            does not happen here.
 
  The 'create' ops is not present in dma_buf_ops.
 
  + * @attach: allows different devices to 'attach' themselves to the given
  + *            buffer. It might return -EBUSY to signal that backing storage
  + *            is already allocated and incompatible with the requirements
  + *            of the requesting device. [optional]
  + * @detach: detach a given device from this buffer. [optional]
  + * @get_scatterlist: returns list of scatter pages allocated, increases
  + *                     usecount of the buffer. Requires at least one attach
  + *                     to be called before. Returned sg list should already
  + *                     be mapped into _device_ address space.
 
  You must add a comment that this call 'may sleep'.
 
  I like the get_scatterlist idea. It allows the exporter to create a
  valid scatterlist for a client in an elegant way.
 
  I do not like this whole attachment idea. The problem is that
  currently there is no support in the DMA framework for allocation for
  multiple devices. As long as no such support exists, there is no
  generic way to handle attribute negotiations and buffer allocations
  that involve multiple devices. So the exporter drivers would have to
  implement more or less hacky solutions to handle memory requirements
  and choose the device that allocates the memory.
 
  Currently, AFAIK there is even no generic way for a driver to
  acquire its own DMA memory requirements.
 
  Therefore all logic hidden beneath 'attachment' is pointless. I
  think that support for attach/detach (and related stuff) should be
  postponed until support for multi-device allocation is added to DMA
  framework.
 
 Imo we clearly need this to make multi-device drivers with insane dma
 requirements work on arm. And rewriting the buffer handling in every
 participating subsystem twice isn't really a great plan. I envision that
 on platforms where we need this madness, the driver must call back into the
 dma subsystem to create a dma_buf. The dma subsystem should already be aware
 of all the requirements and hence should be able to handle them.
 
  I don't say the attachment list idea is wrong, but adding attachment
  stuff creates an illusion that the problem of multi-device allocations
  is somehow magically solved. We should not force the developers of
  exporter drivers to solve a problem that is not solvable yet.
 
 Well, this is why we need to create a decent support infrastructure for
 platforms (= arm madness) that need this, so that device drivers and
 subsystems don't need to invent that wheel on their own. Which, as you point
 out, they actually can't.

The real question is whether it is possible to create any generic support
infrastructure. I really doubt it. IMHO this is something that will be hacked
up for each 'product release' and will never reach mainline...
 
  The other problem is the APIs. For example, the V4L2 subsystem
  assumes that memory is allocated after a successful VIDIOC_REQBUFS
  with the V4L2_MEMORY_MMAP memory type. Therefore attach would be
  automatically followed by get_scatterlist, blocking the possibility of
  any buffer migrations in the future.
 
 Well, pardon me for breaking the news, but v4l needs to rework its buffer
 handling. If you want to share buffers with a gpu driver, you _have_ to
 live with the fact that gpus do fully dynamic buffer management, meaning:
 - buffers get allocated and destroyed on the fly, meaning static reqbuf
   just went out the window (we obviously cache buffer objects and reuse
   them for performance, as long as the processing pipeline doesn't really
   change).
 - buffers get moved around in memory, meaning you either need full-blown
   sync-objects with a callback to drivers to tear down mappings on-demand,
   or every driver needs to guarantee to call put_scatterlist in a
   reasonably short time. The latter is probably the more natural thing for
   v4l devices.

I'm really not convinced it is possible to go for completely dynamic
buffer management, especially if we are implementing a proof-of-concept
solution. Please notice the following facts:

1. all v4l2 drivers do 'static' buffer management - memory is allocated
on the REQBUF() call and then mapped permanently into both userspace
and the dma (io) address space (see the sketch after this list).

2. the dma-mapping API is very limited in the area of dynamic buffer
management; it has clearly been designed for static buffer allocation
and mapping.

It looks like fully dynamic buffer management requires a complete change
of V4L2 API principles (V4L3?) and a completely new DMA API interface.
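
(For reference, the classic static V4L2_MEMORY_MMAP flow from point 1,
using only the existing V4L2 API - simplified, error handling omitted:)

    /* Classic V4L2_MEMORY_MMAP flow: the allocation happens at REQBUFS
     * time and the mapping then stays in place for the buffer's whole
     * lifetime. */
    struct v4l2_requestbuffers req = {
            .count  = 4,
            .type   = V4L2_BUF_TYPE_VIDEO_CAPTURE,
            .memory = V4L2_MEMORY_MMAP,
    };
    ioctl(vfd, VIDIOC_REQBUFS, &req);       /* memory is allocated here */

    struct v4l2_buffer buf = {
            .index  = 0,
            .type   = V4L2_BUF_TYPE_VIDEO_CAPTURE,
            .memory = V4L2_MEMORY_MMAP,
    };
    ioctl(vfd, VIDIOC_QUERYBUF, &buf);
    void *p = mmap(NULL, buf.length, PROT_READ | PROT_WRITE,
                   MAP_SHARED, vfd, buf.m.offset);  /* mapped permanently */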

Re: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-10-14 Thread Tomasz Stanislawski

Hi Mr. Sumit Semwal,
Thank you for taking care of the framework for buffer sharing.
Support for buffer sharing in V4L2, both exporting and importing, was
posted in the shrbuf proof-of-concept patch. It should be easy to port
it to dmabuf.


http://lists.linaro.org/pipermail/linaro-mm-sig/2011-August/000485.html

Please refer to the comments below:

On 10/11/2011 11:23 AM, Sumit Semwal wrote:

This is the first step in defining a dma buffer sharing mechanism.

A new buffer object dma_buf is added, with operations and API to allow easy
sharing of this buffer object across devices.

The framework allows:
- a new buffer-object to be created with fixed size.
- different devices to 'attach' themselves to this buffer, to facilitate
   backing storage negotiation, using the dma_buf_attach() API.
- association of a file pointer with each user-buffer and associated
allocator-defined operations on that buffer. This operation is called the
'export' operation.
- this exported buffer-object to be shared with the other entity by asking for
its 'file-descriptor (fd)', and sharing the fd across.
- a received fd to get the buffer object back, where it can be accessed using
the associated exporter-defined operations.
- the exporter and user to share the scatterlist using get_scatterlist and
put_scatterlist operations.

At least one 'attach()' call is required to be made prior to calling the
get_scatterlist() operation.
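
(For illustration, a rough sketch of that flow.  dma_buf_fd()/dma_buf_get()
are named after the fd-sharing step described above, so treat the exact
helper names as best-effort assumptions; my_priv/my_ops are made-up
exporter context:)

    /* Exporter: wrap the (not yet allocated) backing storage, get an fd. */
    struct dma_buf *dmabuf = dma_buf_export(my_priv, &my_ops, O_RDWR);
    int fd = dma_buf_fd(dmabuf);    /* share this fd with the other driver */

    /* Importer: resolve the received fd back to a dma_buf ... */
    struct dma_buf *shared = dma_buf_get(fd);
    /* ... and attach at least once before any get_scatterlist() call. */
    struct dma_buf_attachment *att = dma_buf_attach(shared, &my_device->dev);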

A couple of building blocks in get_scatterlist() are added to ease the
introduction of sync'ing across exporter and users, and late allocation
by the exporter.

An mmap() file operation is provided for the associated 'fd', as a wrapper
over the optional allocator-defined mmap(), to be used by devices that
might need one.

More details are there in the documentation patch.

This is based on design suggestions from many people at the mini-summits[1],
most notably from Arnd Bergmann a...@arndb.de, Rob Clark r...@ti.com and
Daniel Vetter dan...@ffwll.ch.

The implementation is inspired by the proof-of-concept patch-set from
Tomasz Stanislawski t.stanisl...@samsung.com, who demonstrated buffer
sharing between two v4l2 devices. [2]

[1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement
[2]: http://lwn.net/Articles/454389

Signed-off-by: Sumit Semwal sumit.sem...@linaro.org
Signed-off-by: Sumit Semwal sumit.sem...@ti.com
---
  drivers/base/Kconfig    |   10 ++
  drivers/base/Makefile   |    1 +
  drivers/base/dma-buf.c  |  242 +++
  include/linux/dma-buf.h |  162 +++
  4 files changed, 415 insertions(+), 0 deletions(-)
  create mode 100644 drivers/base/dma-buf.c
  create mode 100644 include/linux/dma-buf.h

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 21cf46f..07d8095 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -174,4 +174,14 @@ config SYS_HYPERVISOR

  source "drivers/base/regmap/Kconfig"

+config DMA_SHARED_BUFFER
+   bool "Buffer framework to be shared between drivers"
+   default n
+   depends on ANON_INODES
+   help
+ This option enables the framework for buffer-sharing between
+ multiple drivers. A buffer is associated with a file using driver
+ APIs extension; the file's descriptor can then be passed on to
+ another driver.
+
  endmenu
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 99a375a..d0df046 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_DEVTMPFS)  += devtmpfs.o
  obj-y += power/
  obj-$(CONFIG_HAS_DMA) += dma-mapping.o
  obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
+obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o
  obj-$(CONFIG_ISA) += isa.o
  obj-$(CONFIG_FW_LOADER)   += firmware_class.o
  obj-$(CONFIG_NUMA)    += node.o
diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
new file mode 100644
index 000..58c51a0
--- /dev/null
+++ b/drivers/base/dma-buf.c
@@ -0,0 +1,242 @@
+/*
+ * Framework for buffer objects that can be shared across devices/subsystems.
+ *
+ * Copyright(C) 2011 Linaro Limited. All rights reserved.
+ * Author: Sumit Semwal sumit.sem...@ti.com
+ *
+ * Many thanks to the linaro-mm-sig list, and especially
+ * Arnd Bergmann a...@arndb.de, Rob Clark r...@ti.com and
+ * Daniel Vetter dan...@ffwll.ch for their support in the creation and
+ * refining of this idea.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, 

Re: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-10-14 Thread Sumit Semwal
On 14 October 2011 15:30, Tomasz Stanislawski t.stanisl...@samsung.com wrote:
 Hi Mr. Sumit Semwal,
Hello Mr. Tomasz Stanislawski :),

 Thank you for taking care of the framework for buffer sharing.
 Support for buffer sharing in V4L2, both exporting and importing, was
 posted in the shrbuf proof-of-concept patch. It should be easy to port
 it to dmabuf.

 http://lists.linaro.org/pipermail/linaro-mm-sig/2011-August/000485.html
I should thank you for the wonderful proof-of-concept patch, and the
idea behind it! I am currently working on the V4L2 side patch for it,
and will send out the RFC soon.
Also, thanks for a good review and some pertinent points; replies inline.

 Please refer to the comments below:

 On 10/11/2011 11:23 AM, Sumit Semwal wrote:

snip
 +/**
 + * dma_buf_export - Creates a new dma_buf, and associates an anon file
 + * with this buffer, so it can be exported.
 + * Also connect the allocator specific data and ops to the buffer.
 + *
 + * @priv:      [in]    Attach private data of allocator to this buffer
 + * @ops:       [in]    Attach allocator-defined dma buf ops to the new
 buffer.
 + * @flags:     [in]    mode flags for the file.

 What is the purpose of these flags? The file is not visible to any process
 by any file system, is it?
These are the standard file mode flags, which can be used to do
file-level access-type control by the exporter; so, for example,
write-access can be denied for a buffer exported as a read-only
buffer.
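
(Hypothetically, an exporter wanting that would do something like the
following - my_ctx and my_dmabuf_ops are made-up names:)

    /* Export read-only: mode flags stored on the anon file can later be
     * used to refuse write mappings through the fd. */
    struct dma_buf *dmabuf = dma_buf_export(my_ctx, &my_dmabuf_ops, O_RDONLY);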

 + *
 + * Returns, on success, a newly created dma_buf object, which wraps the
 + * supplied private data and operations for dma_buf_ops. On failure to
 + * allocate the dma_buf object, it can return NULL.
 + *
 + */
 +struct dma_buf *dma_buf_export(void *priv, struct dma_buf_ops *ops,
 +                               int flags)
 +{
 +       struct dma_buf *dmabuf;
 +       struct file *file;
 +

 why is priv not allowed to be NULL?
priv will be used by the exporter to attach its own context to the dma
buf; I couldn't think of any case where it could be NULL.

 +       BUG_ON(!priv || !ops);
 +
snip
 +       BUG_ON(!dmabuf->file);
 +
 +       fput(dmabuf->file);
 +

 return is not needed
Right; will correct this.

 +       return;
 +}
 +EXPORT_SYMBOL(dma_buf_put);
 +
 +/**
 + * dma_buf_attach - Add the device to dma_buf's attachments list;
 optionally,
 + * calls attach() of dma_buf_ops to allow device-specific attach
 functionality
 + * @dmabuf:    [in]    buffer to attach device to.
 + * @dev:       [in]    device to be attached.
 + *
 + * Returns struct dma_buf_attachment * for this attachment; may return
 NULL.
 + *
 + */
 +struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
 +                                               struct device *dev)
 +{
 +       struct dma_buf_attachment *attach;
 +       int ret;
 +
 +       BUG_ON(!dmabuf || !dev);
 +
 +       mutex_lock(&dmabuf->lock);
 +

 There is no need to call kzalloc inside critical section protected by
 dmabuf-lock. The code would be simpler if the allocation is moved outside
 the section.
Yes, you're right; will correct it.

 +       attach = kzalloc(sizeof(struct dma_buf_attachment), GFP_KERNEL);
 +       if (attach == NULL)
 +               goto err_alloc;
 +
 +       attach->dev = dev;
 +       if (dmabuf->ops->attach) {
 +               ret = dmabuf->ops->attach(dmabuf, dev, attach);
 +               if (!ret)
 +                       goto err_attach;
 +       }
 +       list_add(&attach->node, &dmabuf->attachments);
 +
 +err_alloc:
 +       mutex_unlock(&dmabuf->lock);
 +       return attach;
 +err_attach:
 +       kfree(attach);
 +       mutex_unlock(&dmabuf->lock);
 +       return ERR_PTR(ret);
 +}
 +EXPORT_SYMBOL(dma_buf_attach);
 +
 +/**
 + * dma_buf_detach - Remove the given attachment from dmabuf's attachments
 list;
 + * optionally calls detach() of dma_buf_ops for device-specific detach
 + * @dmabuf:    [in]    buffer to detach from.
 + * @attach:    [in]    attachment to be detached; is free'd after this
 call.
 + *
 + */
 +void dma_buf_detach(struct dma_buf *dmabuf, struct dma_buf_attachment
 *attach)
 +{
 +       BUG_ON(!dmabuf || !attach);
 +
 +       mutex_lock(&dmabuf->lock);
 +       list_del(&attach->node);
 +       if (dmabuf->ops->detach)
 +               dmabuf->ops->detach(dmabuf, attach);
 +

 [as above]
Ok.

 +       kfree(attach);
 +       mutex_unlock(&dmabuf->lock);
 +       return;
 +}
 +EXPORT_SYMBOL(dma_buf_detach);
 diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
 new file mode 100644
 index 000..5bdf16a
 --- /dev/null
 +++ b/include/linux/dma-buf.h
 @@ -0,0 +1,162 @@
 +/*
 + * Header file for dma buffer sharing framework.
 + *
 + * Copyright(C) 2011 Linaro Limited. All rights reserved.
 + * Author: Sumit Semwal sumit.sem...@ti.com
 + *
 + * Many thanks to the linaro-mm-sig list, and especially
 + * Arnd Bergmann a...@arndb.de, Rob Clark r...@ti.com and
 + * Daniel Vetter dan...@ffwll.ch for their support in the creation and
 + * refining of this idea.
 + *
 + * This program 

Re: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-10-14 Thread Daniel Vetter
On Fri, Oct 14, 2011 at 12:00:58PM +0200, Tomasz Stanislawski wrote:
 +/**
 + * struct dma_buf_ops - operations possible on struct dma_buf
 + * @create: creates a struct dma_buf of a fixed size. Actual allocation
 + *  does not happen here.
 
 The 'create' ops is not present in dma_buf_ops.
 
 + * @attach: allows different devices to 'attach' themselves to the given
 + *  buffer. It might return -EBUSY to signal that backing storage
 + *  is already allocated and incompatible with the requirements
 + *  of the requesting device. [optional]
 + * @detach: detach a given device from this buffer. [optional]
 + * @get_scatterlist: returns list of scatter pages allocated, increases
 + *   usecount of the buffer. Requires at least one attach to be
 + *   called before. Returned sg list should already be mapped
 + *   into _device_ address space.
 
 You must add a comment that this call 'may sleep'.
 
 I like the get_scatterlist idea. It allows the exporter to create a
 valid scatterlist for a client in an elegant way.
 
 I do not like this whole attachment idea. The problem is that
 currently there is no support in the DMA framework for allocation for
 multiple devices. As long as no such support exists, there is no
 generic way to handle attribute negotiations and buffer allocations
 that involve multiple devices. So the exporter drivers would have to
 implement more or less hacky solutions to handle memory requirements
 and choose the device that allocates the memory.
 
 Currently, AFAIK there is even no generic way for a driver to
 acquire its own DMA memory requirements.
 
 Therefore all logic hidden beneath 'attachment' is pointless. I
 think that support for attach/detach (and related stuff) should be
 postponed until support for multi-device allocation is added to DMA
 framework.

Imo we clearly need this to make multi-device drivers with insane dma
requirements work on arm. And rewriting the buffer handling in every
participating subsystem twice isn't really a great plan. I envision that
on platforms where we need this madness, the driver must call back into the
dma subsystem to create a dma_buf. The dma subsystem should already be aware
of all the requirements and hence should be able to handle them.

 I don't say the attachment list idea is wrong, but adding attachment
 stuff creates an illusion that the problem of multi-device allocations
 is somehow magically solved. We should not force the developers of
 exporter drivers to solve a problem that is not solvable yet.

Well, this is why we need to create a decent support infrastructure for
platforms (= arm madness) that need this, so that device drivers and
subsystems don't need to invent that wheel on their own. Which, as you point
out, they actually can't.

 The other problem is the APIs. For example, the V4L2 subsystem
 assumes that memory is allocated after a successful VIDIOC_REQBUFS
 with the V4L2_MEMORY_MMAP memory type. Therefore attach would be
 automatically followed by get_scatterlist, blocking the possibility of
 any buffer migrations in the future.

Well, pardon me for breaking the news, but v4l needs to rework its buffer
handling. If you want to share buffers with a gpu driver, you _have_ to
live with the fact that gpus do fully dynamic buffer management, meaning:
- buffers get allocated and destroyed on the fly, meaning static reqbuf
  just went out the window (we obviously cache buffer objects and reuse
  them for performance, as long as the processing pipeline doesn't really
  change).
- buffers get moved around in memory, meaning you either need full-blown
  sync-objects with a callback to drivers to tear down mappings on-demand,
  or every driver needs to guarantee to call put_scatterlist in a
  reasonably short time. The latter is probably the more natural thing for
  v4l devices (see the sketch below).
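
(A minimal sketch of that second pattern for an importing driver - names
are illustrative, error handling is trimmed, and I'm calling through the
ops directly for brevity:)

    /* Per-operation mapping: hold the scatterlist only while the DMA
     * actually runs, so the exporter is free to migrate the buffer in
     * between operations. */
    static int my_dev_process_frame(struct dma_buf *dmabuf,
                                    struct dma_buf_attachment *att)
    {
            struct scatterlist *sgl;
            int nents;

            sgl = dmabuf->ops->get_scatterlist(att, DMA_BIDIRECTIONAL,
                                               &nents);
            if (IS_ERR_OR_NULL(sgl))
                    return -ENOMEM;

            /* ... kick off the device DMA using sgl/nents and wait ... */

            dmabuf->ops->put_scatterlist(att, sgl, nents);  /* promptly */
            return 0;
    }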

 The same situation happens if buffer sharing is added to framebuffer API.

You can fix that by using the gem/ttm infrastructure of drm (or whatever
the blob gpu drivers are using). Which is why I think fb should just die,
please.

 The buffer sharing mechanism is dedicated to improving cooperation
 between multiple APIs. Therefore the common denominator strategy
 should be applied, that is, buffer-creation == buffer-allocation.

No.

Really, there's just no way gpus will be moving back to static buffer
management. And I know, for many use-cases we could get away with a bunch
of static buffers (e.g. a video processing pipe). But in drm-land even
scanout-buffers can get moved around - currently only when they're not
being used, but strictly speaking nothing prevents us from copying the
scanout to a new location and issuing a pageflip, and so moving the
buffer around even while it's in use.

But let's look quickly at an OpenCL usecase, moving GBs of data per
second around between the cpu and a bunch of add-on gpus (or other special
purpose processing units). We'd also need to extend dma_buf with sync
objects to make this work well, but there's 

Re: [RFC 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-10-14 Thread Rob Clark
On Fri, Oct 14, 2011 at 5:00 AM, Tomasz Stanislawski
t.stanisl...@samsung.com wrote:
 + * @attach: allows different devices to 'attach' themselves to the given
 + *         buffer. It might return -EBUSY to signal that backing storage
 + *         is already allocated and incompatible with the requirements
 + *         of the requesting device. [optional]
 + * @detach: detach a given device from this buffer. [optional]
 + * @get_scatterlist: returns list of scatter pages allocated, increases
 + *                  usecount of the buffer. Requires at least one attach
 + *                  to be called before. Returned sg list should already
 + *                  be mapped into _device_ address space.

 You must add a comment that this call 'may sleep'.

 I like the get_scatterlist idea. It allows the exporter to create a valid
 scatterlist for a client in an elegant way.

 I do not like this whole attachment idea. The problem is that currently
 there is no support in the DMA framework for allocation for multiple devices.
 As long as no such support exists, there is no generic way to handle
 attribute negotiations and buffer allocations that involve multiple devices.
 So the exporter drivers would have to implement more or less hacky solutions
 to handle memory requirements and choose the device that allocates the memory.

 Currently, AFAIK there is even no generic way for a driver to acquire its
 own DMA memory requirements.

dev->dma_params (struct device_dma_parameters).. for example

it would need to be expanded a bit to have a way to say it needs to
be physically contiguous..
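
(for context, the existing struct and helpers look roughly like this; the
"contiguous" bit is the hypothetical extension, not existing kernel API:)

    /* include/linux/device.h (existing): */
    struct device_dma_parameters {
            unsigned int max_segment_size;
            unsigned long segment_boundary_mask;
            /* hypothetical extension discussed above:
             * bool requires_contiguous; */
    };

    /* drivers already advertise their limits via helpers like: */
    dma_set_max_seg_size(&pdev->dev, SZ_64K);
    dma_set_seg_boundary(&pdev->dev, 0xffffffff);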


 Therefore all logic hidden beneath 'attachment' is pointless. I think that
 support for attach/detach (and related stuff) should be postponed until
 support for multi-device allocation is added to DMA framework.

 I don't say the attachment list idea is wrong, but adding attachment stuff
 creates an illusion that the problem of multi-device allocations is somehow
 magically solved. We should not force the developers of exporter drivers to
 solve a problem that is not solvable yet.

 The other problem is the APIs. For example, the V4L2 subsystem assumes that
 memory is allocated after a successful VIDIOC_REQBUFS with the
 V4L2_MEMORY_MMAP memory type. Therefore attach would be automatically
 followed by get_scatterlist, blocking the possibility of any buffer
 migrations in the future.

But this problem really only applies if v4l is your buffer allocator.
I don't think a v4l limitation is a valid argument to remove the
attachment stuff.

 The same situation happens if buffer sharing is added to framebuffer API.

 The buffer sharing mechanism is dedicated to improving cooperation between
 multiple APIs. Therefore the common denominator strategy should be applied,
 that is, buffer-creation == buffer-allocation.

I think it would be sufficient if buffer creators that cannot defer
the allocation just take a worst-case approach and allocate physically
contiguous buffers.  No need to penalize other potential buffer
allocators.  This gives buffer creators with more flexibility the
option of deferring the allocation until they know whether the buffer
really needs to be contiguous (see the sketch below).
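
(a rough sketch of that deferred strategy on the exporter side; the
my_* helpers and the attachment-walking logic are made-up illustrations:)

    /* Exporter defers the real allocation until the first
     * get_scatterlist(), when the attached devices' constraints are
     * known. */
    static struct scatterlist *my_get_scatterlist(
                    struct dma_buf_attachment *att,
                    enum dma_data_direction dir, int *nents)
    {
            struct my_buffer *buf = my_buffer_from_attachment(att);

            if (!buf->backing_pages) {
                    /* walk the attachments seen so far and pick the
                     * strictest requirement, e.g. physically contiguous */
                    buf->need_contig = my_any_attachment_needs_contig(buf);
                    if (my_alloc_backing_storage(buf))
                            return ERR_PTR(-ENOMEM);
            }
            return my_map_for_device(buf, att, dir, nents);
    }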

 + * @put_scatterlist: decreases usecount of buffer, might deallocate
 + *                  scatter pages.
 + * @mmap: memory map this buffer - optional.
 + * @release: release this buffer; to be called after the last dma_buf_put.
 + * @sync_sg_for_cpu: sync the sg list for cpu.
 + * @sync_sg_for_device: sync the sg list for device.
 + */
 +struct dma_buf_ops {
 +       int (*attach)(struct dma_buf *, struct device *,
 +                       struct dma_buf_attachment *);
 +
 +       void (*detach)(struct dma_buf *, struct dma_buf_attachment *);
 +
 +       /* For {get,put}_scatterlist below, any specific buffer attributes
 +        * required should get added to device_dma_parameters accessible
 +        * via dev->dma_params.
 +        */
 +       struct scatterlist * (*get_scatterlist)(struct dma_buf_attachment *,
 +                                               enum dma_data_direction,
 +                                               int *nents);
 +       void (*put_scatterlist)(struct dma_buf_attachment *,
 +                               struct scatterlist *,
 +                               int nents);
 +       /* TODO: Add interruptible and interruptible_timeout versions */

 I don't agree that the interruptible and interruptible_timeout versions are
 needed. I think that get_scatterlist should always be interruptible. You can
 add a try_get_scatterlist callback that returns ERR_PTR(-EBUSY) if the call
 would be blocking (see the sketch below).
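
 (i.e. something along these lines - purely a sketch of the suggested
 semantics, not an existing callback:)

     /* Non-blocking variant: returns ERR_PTR(-EBUSY) instead of
      * sleeping when the buffer is busy. */
     struct scatterlist * (*try_get_scatterlist)(struct dma_buf_attachment *,
                                                 enum dma_data_direction,
                                                 int *nents);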

 +
 +       /* allow mmap optionally for devices that need it */
 +       int (*mmap)(struct dma_buf *, struct vm_area_struct *);

 The mmap is not needed for the initial version. It could be added at any time
 in the future. The dmabuf client should not be allowed to create a mapping of
 the dmabuf from the scatterlist.

fwiw, this wasn't