Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2012-01-09 Thread Dave Airlie
On Mon, Jan 9, 2012 at 8:10 AM, Daniel Vetter dan...@ffwll.ch wrote:
 On Mon, Jan 09, 2012 at 03:20:48PM +0900, InKi Dae wrote:
 I have tested the dmabuf-based DRM GEM module for Exynos and found one problem.
 You can refer to this test repository:
 http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/exynos-drm-dmabuf

 In this repository, I added some error-handling code for resource release
 on top of Dave's patch sets.

 Let's suppose we use dmabuf-based vb2 and DRM GEM with physically
 contiguous memory (no IOMMU) and try to share an allocated buffer
 between them (the V4L2 and DRM drivers).

 1. Request memory allocation through the DRM GEM interface.
 2. Request the DRM_SET_PRIME ioctl with the GEM handle to get an fd for the
 GEM object.
 - internally, the private GEM-based dmabuf module calls drm_buf_export()
 to register the allocated GEM object with an fd.
 3. Request QBUF with the fd (obtained in step 2) and the DMABUF memory type
 to queue the buffer to the V4L2-based device.
 - internally, the vb2 plug-in module looks up the buffer from the fd and
 then calls the dmabuf->ops->map_dmabuf() callback to get the sg table
 containing the physical memory info of the GEM object; that physical
 memory info is then copied into the vb2_xx_buf object.
 For the DMABUF support in V4L2 and the videobuf2 framework, you can refer
 to this repository:
 git://github.com/robclark/kernel-omap4.git drmplane-dmabuf
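
 As a rough illustration of steps 2 and 3 above: the userspace side would look
 something like the sketch below. The ioctl and field names here are assumptions
 taken from the patch sets referenced in this thread (DRM_SET_PRIME from Dave's
 DRM patches, V4L2_MEMORY_DMABUF and m.fd from the vb2 DMABUF tree), so the
 exact names, struct layouts and ioctl numbers may differ from what finally lands.

	#include <sys/ioctl.h>
	#include <linux/videodev2.h>
	/* the DRM headers from the patch set would provide DRM_IOCTL_SET_PRIME
	 * and struct drm_prime_handle; both names are assumed here */

	/* step 2: export an existing GEM handle as a dma-buf fd */
	struct drm_prime_handle prime = { .handle = gem_handle };

	ioctl(drm_fd, DRM_IOCTL_SET_PRIME, &prime);	/* fills prime.fd on success */

	/* step 3: queue that buffer to the V4L2 device by fd */
	struct v4l2_buffer b = {0};

	b.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
	b.memory = V4L2_MEMORY_DMABUF;		/* proposed memory type */
	b.index  = 0;
	b.m.fd   = prime.fd;
	ioctl(v4l2_fd, VIDIOC_QBUF, &b);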

 After that, if the V4L2 driver wants to release the vb2_xx_buf object and
 its memory region at the user's request, what should we do? The refcount
 on vb2_xx_buf is managed by the videobuf2 framework, so when the vb2_xx_buf
 object is released the videobuf2 framework doesn't know who else is using
 the physical memory region. The region gets freed, and when the DRM driver
 later tries to access it, or to release it as well, a problem occurs.

 To work around this, I added a get_shared_cnt() callback to dma-buf.h, but
 I'm not sure that is a good way; there may be a better one.
 If I'm missing anything, please let me know.

 The dma_buf object needs to hold a reference on the underlying
 (necessarily reference-counted) buffer object when the exporter creates
 the dma_buf handle. This reference should then get dropped in the
 exporter's dma_buf->ops->release() function, which is only called
 when the last reference to the dma_buf disappears.

 If this doesn't work like that currently, we have a bug, and exporting the
 reference count or something similar can't fix that.
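
A minimal exporter-side sketch of that rule, assuming a GEM-style
reference-counted object and the dma_buf_export()/ops interface from this RFC
(the function and field names below are illustrative, not taken from an actual
driver, and the dma_buf_export() arguments follow the RFC only roughly):

	/* export time: the new dma_buf holds its own reference on the gem object */
	struct dma_buf *my_gem_prime_export(struct drm_gem_object *obj, int flags)
	{
		drm_gem_object_reference(obj);	/* ref owned by the dma_buf */
		return dma_buf_export(obj, &my_dmabuf_ops, obj->size, flags);
	}

	/* ->release() is only called once the last dma_buf reference is gone */
	static void my_dmabuf_release(struct dma_buf *dmabuf)
	{
		struct drm_gem_object *obj = dmabuf->priv;

		drm_gem_object_unreference_unlocked(obj);	/* drop export-time ref */
	}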

 Yours, Daniel

 PS: Please cut down the original mail when replying, otherwise it's pretty
 hard to find your response ;-)

And also the importer needs to realise it doesn't own the pages in the
sg_table, and when it's freeing its backing memory it shouldn't free
those pages. So for GEM objects we have to keep track of whether we allocated
the pages ourselves or got them from a dma-buf.
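
A sketch of what that bookkeeping could look like on the GEM side; the struct,
the helpers and the detach sequence are illustrative only, and the attachment
calls follow the RFC's API roughly:

	struct my_gem_object {
		struct drm_gem_object base;
		struct page **pages;			/* only valid if !imported */
		struct sg_table *sgt;
		struct dma_buf_attachment *attach;	/* only set if imported */
		bool imported;				/* pages came in via dma-buf */
	};

	static void my_gem_free_object(struct drm_gem_object *gem)
	{
		struct my_gem_object *obj = container_of(gem, struct my_gem_object, base);

		if (obj->imported) {
			/* the exporter owns the pages: just unmap and detach */
			dma_buf_unmap_attachment(obj->attach, obj->sgt);
			dma_buf_detach(obj->attach->dmabuf, obj->attach);
		} else {
			/* we allocated the pages ourselves, so free them here */
			my_gem_put_pages(obj);		/* hypothetical helper */
		}
		drm_gem_object_release(gem);
		kfree(obj);
	}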

Dave.


Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2012-01-01 Thread Sakari Ailus
Hi Arnd,

On Tue, Dec 20, 2011 at 03:36:49PM +, Arnd Bergmann wrote:
 On Tuesday 20 December 2011, Sakari Ailus wrote:
  (I'm jumping into the discussion in the middle, and might miss something
  that has already been talked about. I still hope what I'm about to say is
  relevant. :-))
 
 It certainly is relevant.
 
  In subsystems such as V4L2 where drivers deal with such large buffers, the
  buffers stay mapped all the time. The user explicitly gives the control of
  the buffers to the driver and eventually gets them back. This is already
  part of those APIs, whether they're using dma_buf or not. The user could
  have, and often has, the same buffers mapped elsewhere.
 
 Do you normally use streaming (dma_{map,sync,unmap}_*) or consistent
 (dma_{alloc,free}_*) mappings for this then?

The OMAP 3 ISP driver I'm familiar with uses the OMAP 3 IOMMU / IOVMM API
which is to be replaced by dmabuf. I'm trying to understand how the dma
api / dma-buf should be used to achieve a superset of that functionality.

I think I'm interested in the DMA mapping API. I replied to Sumit's new
patchset, you're cc'd.

  When it comes to passing these buffers between different hardware devices,
  either V4L2 or not, the user might not want to perform extra cache flush
  when the buffer memory itself is not being touched by the CPU in the process
  at all. I'd consider it impossible for the driver to know how the user space
  intends to use the buffer.
 
 The easiest solution to this problem would be to only allow consistent mappings
 to be shared using the dma_buf mechanism. That means we never have to flush.

Do you mean the memory would be non-cacheable? Accessing memory w/o caching
is typically prohibitively expensive, so I don't think this could ever be
the primary means to do the above.

In some cases non-cacheable can perform better, taking into account the time
required for flushing the cache and the other consequences of that,
but I still think it's more of an exception than a rule.

 If you don't need the CPU to touch the buffer, that would not have any cost
 at all, we could even have no kernel mapping at all instead of an uncached
 mapping on ARM.

I think creating unused mappings should generally be avoided. Creating them
costs time and effort at creation time, and possibly also later in
cache-related operations.

  Flushing the cache is quite expensive: typically it's best to flush the
  whole data cache when one needs to flush buffers. The V4L2 DQBUF and QBUF
  IOCTLs already have flags to suggest special cache handling for buffers.
 
 [sidenote: whether it makes sense to flush individual cache lines or the entire
 cache is a decision best left to the architectures. On systems with larger
 caches than on ARM, e.g. 64MB instead of 512KB, you really want to keep
 the cache intact.]

That also depends on the buffer size and on what the rest of the system is
doing. I could imagine the buffer size, system memory data rate, CPU frequency,
cache line width and the properties of the cache all affecting how fast both
operations are.

It would probably be possible to perform a heuristic analysis of this at
system startup, similar to how the software RAID code picks between algorithm
implementations (e.g. whether to use MMX or SSE for sw raid).

Some additional complexity arises from the fact that on some ARM machines
one must know all the CPU MMU mappings pointing to a piece of physical
memory to properly flush them, AFAIR. Naturally a good alternative on such
systems is to perform a full dcache flush / clean.

Also, cache handling only affects systems without coherent caches. ARM CPUs
are finding their way into servers as well, but I'd guess it'll still take a
while before we have ARM CPUs with 64 MiB of cache.

Kind regards,

-- 
Sakari Ailus
e-mail: sakari.ai...@iki.fi jabber/XMPP/Gmail: sai...@retiisi.org.uk


Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2012-01-01 Thread Rob Clark
On Sun, Jan 1, 2012 at 2:53 PM, Sakari Ailus sakari.ai...@iki.fi wrote:
 Hi Arnd,

 On Tue, Dec 20, 2011 at 03:36:49PM +, Arnd Bergmann wrote:
 On Tuesday 20 December 2011, Sakari Ailus wrote:
  (I'm jumping into the discussion in the middle, and might miss something
  that has already been talked about. I still hope what I'm about to say is
  relevant. :-))

 It certainly is relevant.

  In subsystems such as V4L2 where drivers deal with such large buffers, the
  buffers stay mapped all the time. The user explicitly gives the control of
  the buffers to the driver and eventually gets them back. This is already
  part of those APIs, whether they're using dma_buf or not. The user could
  have, and often has, the same buffers mapped elsewhere.

 Do you normally use streaming (dma_{map,sync,unmap}_*) or consistent
 (dma_{alloc,free}_*) mappings for this then?

 The OMAP 3 ISP driver I'm familiar with uses the OMAP 3 IOMMU / IOVMM API
 which is to be replaced by dmabuf. I'm trying to understand how the dma
 api / dma-buf should be used to achieve a superset of that functionality.

 I think I'm interested in the DMA mapping API. I replied to Sumit's new
 patchset, you're cc'd.

  When it comes to passing these buffers between different hardware devices,
  either V4L2 or not, the user might not want to perform extra cache flush
  when the buffer memory itself is not being touched by the CPU in the process
  at all. I'd consider it impossible for the driver to know how the user space
  intends to use the buffer.

 The easiest solution to this problem would be to only allow consistent mappings
 to be shared using the dma_buf mechanism. That means we never have to flush.

 Do you mean the memory would be non-cacheable? Accessing memory w/o caching
 is typically prohibitively expensive, so I don't think this could ever be
 the primary means to do the above.

 In some cases non-cacheable can perform better, taking into account the time
 which is required for flushing the cache and the other consequences of that,
 but I still think it's more of an exception than a rule.

I think we decided to completely leave cpu virtual mappings out of the
API to start with, because (a) we can already get significant value
out of the API without this, and (b) it is not quite clear how to
handle virtual mappings in a way that can deal with all cases.  For
now, userspace virtual mappings must come from the exporting device,
and kernel virtual mappings (if needed by the importing device) are
not supported, although I think only a smaller subset of devices
would actually need a kernel virtual mapping.

This sidesteps the whole issue of cache handling, avoiding aliased
mappings, etc., and leaves CPU access synchronization and cache
handling to however the exporting device handles this today.

BR,
-R


Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-23 Thread Semwal, Sumit
On Wed, Dec 21, 2011 at 10:57 PM, Arnd Bergmann a...@arndb.de wrote:
 On Tuesday 20 December 2011, Daniel Vetter wrote:
  I'm thinking for a first version, we can get enough mileage out of it by saying:
  1) only exporter can mmap to userspace
  2) only importers that do not need CPU access to buffer..

Thanks Rob - and the exporter can do the mmap outside of dma-buf
usage, right? I mean, we don't need to provide an mmap op in dma_buf
and restrict it to the exporter, when the exporter has more 'control' of
the buffer anyway.

BR,
~Sumit.


Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-23 Thread Rob Clark
On Fri, Dec 23, 2011 at 4:00 AM, Semwal, Sumit sumit.sem...@ti.com wrote:
 On Wed, Dec 21, 2011 at 10:57 PM, Arnd Bergmann a...@arndb.de wrote:
 On Tuesday 20 December 2011, Daniel Vetter wrote:
  I'm thinking for a first version, we can get enough mileage out of it by saying:
  1) only exporter can mmap to userspace
  2) only importers that do not need CPU access to buffer..

 Thanks Rob - and the exporter can do the mmap outside of dma-buf
 usage, right?

Yes

 I mean, we don't need to provide an mmap op in dma_buf
 and restrict it to the exporter, when the exporter has more 'control' of
 the buffer anyway.

No, if it is only for the exporter, it really doesn't need to be in
dmabuf (i.e. the exporter already knows who it is).

BR,
-R


 BR,
 ~Sumit.


Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-21 Thread Arnd Bergmann
On Tuesday 20 December 2011, Daniel Vetter wrote:
  I'm thinking for a first version, we can get enough mileage out of it by saying:
  1) only exporter can mmap to userspace
  2) only importers that do not need CPU access to buffer..

Ok, that sounds possible. The alternative to this would be:

1) The exporter has to use dma_alloc_coherent() or dma_alloc_writecombine()
to allocate the buffer
2) Every user space mapping has to go through dma_mmap_coherent()
or dma_mmap_writecombine()

We can extend the rules later to allow either one after we have gained
some experience using it.
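
As a sketch of what those two rules would mean for an exporting driver, using
the dma_*_writecombine() helpers named in the rules above (my_drv and
my_drv_mmap are illustrative names; error handling omitted):

	/* rule 1: allocate the backing storage through the coherent DMA API */
	drv->vaddr = dma_alloc_writecombine(drv->dev, drv->size,
					    &drv->dma_handle, GFP_KERNEL);

	/* rule 2: every userspace mapping also goes through the DMA API */
	static int my_drv_mmap(struct file *file, struct vm_area_struct *vma)
	{
		struct my_drv *drv = file->private_data;

		return dma_mmap_writecombine(drv->dev, vma, drv->vaddr,
					     drv->dma_handle,
					     vma->vm_end - vma->vm_start);
	}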

  This way we can get dmabuf into the kernel, maybe even for 3.3.  I
  know there are a lot of interesting potential uses where this stripped
  down version is good enough.  It probably isn't the final version,
  maybe more features are added over time to deal with importers that
  need CPU access to buffer, sync object, etc.  But we have to start
  somewhere.
 
 I agree with Rob here - I think especially for the coherency discussion
 some actual users of dma_buf on a bunch of insane platforms (i915
 qualifies here too, because we do some cacheline flushing behind everyones
 back) would massively help in clarifying things.

Yes, agreed.

 It also sounds like that at least for proper userspace mmap support we'd
 need some dma api extensions on at least arm, and that might take a while
 ...

I think it's actually the opposite -- you'd need dma api extensions on
everything else other than arm, which already has dma_mmap_coherent()
and dma_mmap_writecombine() for this purpose.

Arnd


Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-21 Thread Daniel Vetter
On Wed, Dec 21, 2011 at 05:27:16PM +, Arnd Bergmann wrote:
 On Tuesday 20 December 2011, Daniel Vetter wrote:
  It also sounds like that at least for proper userspace mmap support we'd
  need some dma api extensions on at least arm, and that might take a while
  ...
 
 I think it's actually the opposite -- you'd need dma api extensions on
 everything else other than arm, which already has dma_mmap_coherent()
 and dma_mmap_writecombine() for this purpose.

Yeah, that's actually what I wanted to say, but failed at ... Another
thing is that at least for i915, _writecombine isn't actually what we want,
because:
- It won't work anyway, because i915 maps stuff cached and does the flushing
  itself, and x86 PAT doesn't support mixed mappings (kinda like arm).
- It isn't actually enough: there's another hidden buffer between the
  memory controller interface and the gpu that i915 manually flushes
  (because even a readback on a wc mapping doesn't flush things in there).

So I assume we'll have plenty of fun beating out a good api for cpu
access ;-)

Cheers, Daniel
-- 
Daniel Vetter
Mail: dan...@ffwll.ch
Mobile: +41 (0)79 365 57 48


Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-20 Thread Arnd Bergmann
On Tuesday 20 December 2011, Sakari Ailus wrote:
 (I'm jumping into the discussion in the middle, and might miss something
 that has already been talked about. I still hope what I'm about to say is
 relevant. :-))

It certainly is relevant.

 In subsystems such as V4L2 where drivers deal with such large buffers, the
 buffers stay mapped all the time. The user explicitly gives the control of
 the buffers to the driver and eventually gets them back. This is already
 part of those APIs, whether they're using dma_buf or not. The user could
 have, and often has, the same buffers mapped elsewhere.

Do you normally use streaming (dma_{map,sync,unmap}_*) or consistent
(dma_{alloc,free}_*) mappings for this then?

 When it comes to passing these buffers between different hardware devices,
 either V4L2 or not, the user might not want to perform extra cache flush
 when the buffer memory itself is not being touched by the CPU in the process
 at all. I'd consider it impossible for the driver to know how the user space
 intends to use the buffer.

The easiest solution to this problem would be to only allow consistent mappings
to be shared using the dma_buf mechanism. That means we never have to flush.
If you don't need the CPU to touch the buffer, that would not have any cost
at all, we could even have no kernel mapping at all instead of an uncached
mapping on ARM.

 Flushing the cache is quite expensive: typically it's best to flush the
 whole data cache when one needs to flush buffers. The V4L2 DQBUF and QBUF
 IOCTLs already have flags to suggest special cache handling for buffers.

[sidenote: whether it makes sense to flush individual cache lines or the entire
cache is a decision best left to the architectures. On systems with larger
caches than on ARM, e.g. 64MB instead of 512KB, you really want to keep
the cache intact.]

Arnd


Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-20 Thread Arnd Bergmann
On Monday 19 December 2011, Semwal, Sumit wrote:
 I didn't see a consensus on whether dma_buf should enforce some form
 of serialization within the API - so atleast for v1 of dma-buf, I
 propose to 'not' impose a restriction, and we can tackle it (add new
 ops or enforce as design?) whenever we see the first need of it - will
 that be ok? [I am bending towards the thought that it is a problem to
 solve at a bigger platform than dma_buf.]

The problem is generally understood for streaming mappings with a 
single device using it: if you have a long-running mapping, you have
to use dma_sync_*. This obviously falls apart if you have multiple
devices and no serialization between the accesses.

If you don't want serialization, that implies that we cannot use the
dma_sync_* API on the buffer, which in turn implies that
we cannot have streaming mappings. I think that's ok, but then
you have to bring back the mmap API on the buffer if you want to
allow any driver to provide an mmap function for a shared buffer.
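
For reference, the single-device pattern being described is roughly the
standard streaming DMA API sequence (device and sg table setup omitted):

	/* long-running streaming mapping of the buffer for one device */
	dma_map_sg(dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);

	/* hand the buffer to the CPU before any in-kernel access ... */
	dma_sync_sg_for_cpu(dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);
	/* ... CPU reads/writes the buffer here ... */

	/* ... and back to the device before it touches the buffer again */
	dma_sync_sg_for_device(dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);

	/* tear the mapping down when the buffer is no longer used */
	dma_unmap_sg(dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);

Those sync points only exist per device; with several importers mapping the
same buffer there is no single place to put them, which is the problem
described above.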

Arnd


Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-20 Thread Rob Clark
On Tue, Dec 20, 2011 at 9:41 AM, Arnd Bergmann a...@arndb.de wrote:
 On Monday 19 December 2011, Semwal, Sumit wrote:
 I didn't see a consensus on whether dma_buf should enforce some form
 of serialization within the API - so atleast for v1 of dma-buf, I
 propose to 'not' impose a restriction, and we can tackle it (add new
 ops or enforce as design?) whenever we see the first need of it - will
 that be ok? [I am bending towards the thought that it is a problem to
 solve at a bigger platform than dma_buf.]

 The problem is generally understood for streaming mappings with a
 single device using it: if you have a long-running mapping, you have
 to use dma_sync_*. This obviously falls apart if you have multiple
 devices and no serialization between the accesses.

 If you don't want serialization, that implies that we cannot use the
 dma_sync_* API on the buffer, which in turn implies that
 we cannot have streaming mappings. I think that's ok, but then
 you have to bring back the mmap API on the buffer if you want to
 allow any driver to provide an mmap function for a shared buffer.

I'm thinking for a first version, we can get enough mileage out of it by saying:
1) only exporter can mmap to userspace
2) only importers that do not need CPU access to buffer..

This way we can get dmabuf into the kernel, maybe even for 3.3.  I
know there are a lot of interesting potential uses where this stripped
down version is good enough.  It probably isn't the final version,
maybe more features are added over time to deal with importers that
need CPU access to buffer, sync object, etc.  But we have to start
somewhere.

BR,
-R

        Arnd


Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-20 Thread Daniel Vetter
On Tue, Dec 20, 2011 at 10:41:45AM -0600, Rob Clark wrote:
 On Tue, Dec 20, 2011 at 9:41 AM, Arnd Bergmann a...@arndb.de wrote:
  On Monday 19 December 2011, Semwal, Sumit wrote:
  I didn't see a consensus on whether dma_buf should enforce some form
  of serialization within the API - so atleast for v1 of dma-buf, I
  propose to 'not' impose a restriction, and we can tackle it (add new
  ops or enforce as design?) whenever we see the first need of it - will
  that be ok? [I am bending towards the thought that it is a problem to
  solve at a bigger platform than dma_buf.]
 
  The problem is generally understood for streaming mappings with a
  single device using it: if you have a long-running mapping, you have
  to use dma_sync_*. This obviously falls apart if you have multiple
  devices and no serialization between the accesses.
 
  If you don't want serialization, that implies that we cannot use the
  dma_sync_* API on the buffer, which in turn implies that
  we cannot have streaming mappings. I think that's ok, but then
  you have to bring back the mmap API on the buffer if you want to
  allow any driver to provide an mmap function for a shared buffer.
 
 I'm thinking for a first version, we can get enough mileage out of it by saying:
 1) only exporter can mmap to userspace
 2) only importers that do not need CPU access to buffer..
 
 This way we can get dmabuf into the kernel, maybe even for 3.3.  I
 know there are a lot of interesting potential uses where this stripped
 down version is good enough.  It probably isn't the final version,
 maybe more features are added over time to deal with importers that
 need CPU access to buffer, sync object, etc.  But we have to start
 somewhere.

I agree with Rob here - I think especially for the coherency discussion
some actual users of dma_buf on a bunch of insane platforms (i915
qualifies here too, because we do some cacheline flushing behind everyone's
back) would massively help in clarifying things.

It also sounds like that at least for proper userspace mmap support we'd
need some dma api extensions on at least arm, and that might take a while
...

Cheers, Daniel
-- 
Daniel Vetter
Mail: dan...@ffwll.ch
Mobile: +41 (0)79 365 57 48


Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-19 Thread Robert Morell
On Tue, Dec 13, 2011 at 07:10:02AM -0800, Arnd Bergmann wrote:
 On Monday 12 December 2011, Robert Morell wrote:
   
   Doing a buffer sharing with something that is not GPL is not fun, as, if any
   issue arises there, it would be impossible to discover if the problem is
   either at the closed-source driver or at the open source one. At the time I
   was using the Nvidia proprietary driver, it was very common to have
   unexplained issues likely caused by bad code in the buffer management code,
   causing X applications and extensions (like xv) to break.

   We should really make this EXPORT_SYMBOL_GPL(), in order to be able to later
   debug future shared-buffer issues, when needed.
  
  Sorry, I don't buy this argument.  Making these exports GPL-only is not
  likely to cause anybody to open-source their driver, but will rather
  just cause them to use yet more closed-source code that is even less
  debuggable than this would be, to those without access to the source.
 
 But at least the broken module then won't be interacting with other modules
 because it cannot share any buffers.

One of the goals of this project is to unify the fragmented space of the
ARM SoC memory managers so that each vendor doesn't implement their own,
and they can all be closer to mainline.

I fear that restricting the use of this buffer sharing mechanism to GPL
drivers only will prevent that goal from being achieved, if an SoC
driver has to interact with modules that use a non-GPL license.

As a hypothetical example, consider laptops that have multiple GPUs.
Today, these ship with onboard graphics (integrated to the CPU or
chipset) along with a discrete GPU, where in many cases only the onboard
graphics can actually display to the screen.  In order for anything
rendered by the discrete GPU to be displayed, it has to be copied to
memory available for the smaller onboard graphics to texture from or
display directly.  Obviously, that's best done by sharing dma buffers
rather than bouncing them through the GPU.  It's not much of a stretch
to imagine that we'll see such systems with a Tegra CPU/GPU plus a
discrete GPU in the future; in that case, we'd want to be able to share
memory between the discrete GPU and the Tegra system.  In that scenario,
if this interface is GPL-only, we'd be unable to adopt the dma_buffer
sharing mechanism for Tegra.

(This isn't too pie-in-the-sky, either; people are already combining
Tegra with discrete GPUs:
http://blogs.nvidia.com/2011/11/world%e2%80%99s-first-arm-based-supercomputer-to-launch-in-barcelona/
)

Thanks,
Robert


Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-13 Thread Hans Verkuil
(I've been away for the past two weeks, so I'm only now catching up)


On Thursday 08 December 2011 22:44:08 Daniel Vetter wrote:
 On Wed, Dec 7, 2011 at 14:40, Arnd Bergmann a...@arndb.de wrote:
  On Wednesday 07 December 2011, Semwal, Sumit wrote:
  Thanks for the excellent discussion - it indeed is very good learning
  for the relatively-inexperienced me :)
  
  So, for the purpose of dma-buf framework, could I summarize the
  following and rework accordingly?:
  1. remove mmap() dma_buf_op [and mmap fop], and introduce cpu_start(),
  cpu_finish() ops to bracket cpu accesses to the buffer. Also add
  DMABUF_CPU_START / DMABUF_CPU_FINI IOCTLs?
  
  I think we'd be better off for now without the extra ioctls and
  just document that a shared buffer must not be exported to user
  space using mmap at all, to avoid those problems. Serialization
  between GPU and CPU is on a higher level than the dma_buf framework
  IMHO.
 
 Agreed.
 
  2. remove sg_sync* ops for now (and we'll see if we need to add them
  later if needed)
  
  Just removing the sg_sync_* operations is not enough. We have to make
  the decision whether we want to allow
  a) only coherent mappings of the buffer into kernel memory (requiring
  an extension to the dma_map_ops on ARM to not flush caches at map/unmap
  time)
  b) not allowing any in-kernel mappings (same requirement on ARM, also
  limits the usefulness of the dma_buf if we cannot access it from the
  kernel or from user space)
  c) only allowing streaming mappings, even if those are non-coherent
  (requiring strict serialization between CPU (in-kernel) and dma users of
  the buffer)
 
 I think only allowing streaming access makes the most sense:
 - I don't see much (if any need) for the kernel to access a dma_buf -
 in all current usecases it just contains pixel data and no hw-specific
 things (like sg tables, command buffers, ..). At most I see the need
 for the kernel to access the buffer for dma bounce buffers, but that
 is internal to the dma subsystem (and hence does not need to be
 exposed).

There are a few situations where the kernel might actually access a dma_buf:

First of all there are some sensors that add meta data before the actual
pixel data, and a kernel driver might well want to read out that data and
process it. Secondly (and really very similar), video frames sent to/from
an FPGA can also contain meta data (Cisco does that on some of our products)
that the kernel may need to inspect.

I admit that these use-cases aren't very common, but they do exist.

 - Userspace can still access the contents through the exporting
 subsystem (e.g. use some gem mmap support). For efficiency reason gpu
 drivers are already messing around with cache coherency in a platform
 specific way (and hence violated the dma api a bit), so we could stuff
 the mmap coherency in there, too. When we later on extend dma_buf
 support so that other drivers than the gpu can export dma_bufs, we can
 then extend the official dma api with already a few drivers with
 use-patterns around.
 
 But I still think that the kernel must not be required to enforce
 correct access ordering for the reasons outlined in my other mail.

I agree with Daniel on this.

BTW, the V4L2 subsystem has a clear concept of passing buffer ownership: the
VIDIOC_QBUF and VIDIOC_DQBUF ioctls deal with that. Pretty much all V4L2 apps
request the buffers, then mmap them, then call QBUF to give ownership of
those buffers to the kernel. While the kernel owns those buffers, any access to
the mmapped memory leads to undefined results. Only after calling DQBUF can
userspace safely access that memory.
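
Schematically, that ownership handoff is the standard V4L2 streaming sequence
(capture case, buffer setup via VIDIOC_REQBUFS/mmap and error handling omitted):

	struct v4l2_buffer b = {0};

	b.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
	b.memory = V4L2_MEMORY_MMAP;
	b.index  = i;

	ioctl(fd, VIDIOC_QBUF, &b);	/* ownership passes to the kernel; touching
					 * the mmapped buffer now gives undefined
					 * results */

	ioctl(fd, VIDIOC_DQBUF, &b);	/* ownership returns to userspace; the
					 * mmapped data may now be accessed safely */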

Allowing mmap() on the dma_buf's fd would actually make things easier for 
V4L2. It's an elegant way of mapping the memory.

Regards,

Hans


Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-13 Thread Arnd Bergmann
On Monday 12 December 2011, Robert Morell wrote:
  
  Doing a buffer sharing with something that is not GPL is not fun, as, if any
  issue arises there, it would be impossible to discover if the problem is
  either at the closed-source driver or at the open source one. At the time I
  was using the Nvidia proprietary driver, it was very common to have
  unexplained issues likely caused by bad code in the buffer management code,
  causing X applications and extensions (like xv) to break.

  We should really make this EXPORT_SYMBOL_GPL(), in order to be able to later
  debug future shared-buffer issues, when needed.
 
 Sorry, I don't buy this argument.  Making these exports GPL-only is not
 likely to cause anybody to open-source their driver, but will rather
 just cause them to use yet more closed-source code that is even less
 debuggable than this would be, to those without access to the source.

But at least the broken module then won't be interacting with other modules
because it cannot share any buffers.

Arnd


Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-12 Thread Arnd Bergmann
On Saturday 10 December 2011, Daniel Vetter wrote:
 If userspace (through some driver calls)
 tries to do stupid things, it'll just get garbage. See
 Message-ID: 
 cakmk7uhexyn-v_8cmplnwsfy14ktmurzy8yrkr5xst2-2wd...@mail.gmail.com
 for my reasons why I think this is the right way to go forward. So in
 essence I'm really interested in the reasons why you want the kernel
 to enforce this (or I'm completely missing what's the contentious
 issue here).

This has nothing to do with user space mappings. Whatever user space does,
you get garbage if you don't invalidate cache lines that were introduced
through speculative prefetching before you access cache lines that were
DMA'd from a device.

Arnd




Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-12 Thread Robert Morell
On Sat, Dec 10, 2011 at 03:13:06AM -0800, Mauro Carvalho Chehab wrote:
 On 09-12-2011 20:50, Robert Morell wrote:
  On Mon, Dec 05, 2011 at 09:18:48AM -0800, Arnd Bergmann wrote:
  On Friday 02 December 2011, Sumit Semwal wrote:
  This is the first step in defining a dma buffer sharing mechanism.
 
  [...]
 
  + return dmabuf;
  +}
  +EXPORT_SYMBOL(dma_buf_export);
 
  I agree with Konrad, this should definitely be EXPORT_SYMBOL_GPL,
  because it's really a low-level function that I would expect
  to get used by in-kernel subsystems providing the feature to
  users and having back-end drivers, but it's not the kind of thing
  we want out-of-tree drivers to mess with.
 
  Is this really necessary?  If this is intended to be a
  lowest-common-denominator between many drivers to allow buffer sharing,
  it seems like it needs to be able to be usable by all drivers.
 
  If the interface is not accessible then I fear many drivers will be
  forced to continue to roll their own buffer sharing mechanisms (which is
  exactly what we're trying to avoid here, needless to say).
 
 Doing a buffer sharing with something that is not GPL is not fun, as, if any
 issue arises there, it would be impossible to discover if the problem is either
 at the closed-source driver or at the open source one. At the time I was using
 the Nvidia proprietary driver, it was very common to have unexplained issues
 likely caused by bad code in the buffer management code, causing X
 applications and extensions (like xv) to break.

 We should really make this EXPORT_SYMBOL_GPL(), in order to be able to later
 debug future shared-buffer issues, when needed.

Sorry, I don't buy this argument.  Making these exports GPL-only is not
likely to cause anybody to open-source their driver, but will rather
just cause them to use yet more closed-source code that is even less
debuggable than this would be, to those without access to the source.

Thanks,
Robert

 Regards,
 Mauro


Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-10 Thread Mauro Carvalho Chehab

On 09-12-2011 20:50, Robert Morell wrote:
 On Mon, Dec 05, 2011 at 09:18:48AM -0800, Arnd Bergmann wrote:
  On Friday 02 December 2011, Sumit Semwal wrote:
   This is the first step in defining a dma buffer sharing mechanism.

  [...]

   +   return dmabuf;
   +}
   +EXPORT_SYMBOL(dma_buf_export);

  I agree with Konrad, this should definitely be EXPORT_SYMBOL_GPL,
  because it's really a low-level function that I would expect
  to get used by in-kernel subsystems providing the feature to
  users and having back-end drivers, but it's not the kind of thing
  we want out-of-tree drivers to mess with.

 Is this really necessary?  If this is intended to be a
 lowest-common-denominator between many drivers to allow buffer sharing,
 it seems like it needs to be able to be usable by all drivers.

 If the interface is not accessible then I fear many drivers will be
 forced to continue to roll their own buffer sharing mechanisms (which is
 exactly what we're trying to avoid here, needless to say).

Doing a buffer sharing with something that is not GPL is not fun, as, if any
issue arises there, it would be impossible to discover if the problem is either
at the closed-source driver or at the open source one. At the time I was using
the Nvidia proprietary driver, it was very common to have unexplained issues
likely caused by bad code in the buffer management code, causing X
applications and extensions (like xv) to break.

We should really make this EXPORT_SYMBOL_GPL(), in order to be able to later
debug future shared-buffer issues, when needed.

Regards,
Mauro


Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-09 Thread Arnd Bergmann
On Thursday 08 December 2011, Daniel Vetter wrote:
  c) only allowing streaming mappings, even if those are non-coherent
  (requiring strict serialization between CPU (in-kernel) and dma users of
  the buffer)
 
 I think only allowing streaming access makes the most sense:
 - I don't see much (if any need) for the kernel to access a dma_buf -
 in all current usecases it just contains pixel data and no hw-specific
 things (like sg tables, command buffers, ..). At most I see the need
 for the kernel to access the buffer for dma bounce buffers, but that
 is internal to the dma subsystem (and hence does not need to be
 exposed).
 - Userspace can still access the contents through the exporting
 subsystem (e.g. use some gem mmap support). For efficiency reason gpu
 drivers are already messing around with cache coherency in a platform
 specific way (and hence violated the dma api a bit), so we could stuff
 the mmap coherency in there, too. When we later on extend dma_buf
 support so that other drivers than the gpu can export dma_bufs, we can
 then extend the official dma api with already a few drivers with
 use-patterns around.
 
 But I still think that the kernel must not be required to enforce
 correct access ordering for the reasons outlined in my other mail.

I still don't think that's possible. Please explain how you expect
to change the semantics of the streaming mapping API to allow multiple
mappers without having explicit serialization points that are visible
to all users. For simplicity, let's assume a cache coherent system
with bounce buffers where map() copies the buffer to a dma area
and unmap() copies it back to regular kernel memory. How does a driver
know if it can touch the buffer in memory or from DMA at any given
point in time? Note that this problem is the same as the cache coherency
problem but may be easier to grasp.

Arnd


Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-09 Thread Alan Cox
 I still don't think that's possible. Please explain how you expect
 to change the semantics of the streaming mapping API to allow multiple
 mappers without having explicit serialization points that are visible
 to all users. For simplicity, let's assume a cache coherent system

I would agree. It's not just about barriers but in many cases where you
have multiple mappings by hardware devices the actual hardware interface
is specific to the devices. Just take a look at the fencing in TTM and
the graphics drivers.

It's not something the low-level API can deal with; it requires high-level
knowledge.

Alan


Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-09 Thread Robert Morell
On Mon, Dec 05, 2011 at 09:18:48AM -0800, Arnd Bergmann wrote:
 On Friday 02 December 2011, Sumit Semwal wrote:
  This is the first step in defining a dma buffer sharing mechanism.
 
[...]
 
  +   return dmabuf;
  +}
  +EXPORT_SYMBOL(dma_buf_export);
 
 I agree with Konrad, this should definitely be EXPORT_SYMBOL_GPL,
 because it's really a low-level function that I would expect
 to get used by in-kernel subsystems providing the feature to
 users and having back-end drivers, but it's not the kind of thing
 we want out-of-tree drivers to mess with.

Is this really necessary?  If this is intended to be a
lowest-common-denominator between many drivers to allow buffer sharing,
it seems like it needs to be able to be usable by all drivers.

If the interface is not accessible then I fear many drivers will be
forced to continue to roll their own buffer sharing mechanisms (which is
exactly what we're trying to avoid here, needless to say).

- Robert


Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-09 Thread Daniel Vetter
On Fri, Dec 9, 2011 at 15:24, Alan Cox a...@lxorguk.ukuu.org.uk wrote:
 I still don't think that's possible. Please explain how you expect
 to change the semantics of the streaming mapping API to allow multiple
 mappers without having explicit serialization points that are visible
 to all users. For simplicity, let's assume a cache coherent system

I think I understand the cache flushing issues on arm (we're doing a
similar madness with manually flushing caches for i915 because the gpu
isn't coherent with the cpu and other dma devices). And I also agree
that you cannot make concurrent mappings work without adding something
on top of the current streaming api/dma-buf proposal. I just think
that it's not the kernel's business (well, at least not dma_buf's
business) to enforce that. If userspace (through some driver calls)
tries to do stupid things, it'll just get garbage. See
Message-ID: cakmk7uhexyn-v_8cmplnwsfy14ktmurzy8yrkr5xst2-2wd...@mail.gmail.com
for my reasons why I think this is the right way to go forward. So in
essence I'm really interested in the reasons why you want the kernel
to enforce this (or I'm completely missing what's the contentious
issue here).

 I would agree. It's not just about barriers but in many cases where you
 have multiple mappings by hardware devices the actual hardware interface
 is specific to the devices. Just take a look at the fencing in TTM and
 the graphics drivers.

 Its not something the low level API can deal with, it requires high level
 knowledge.

Yes, we might want to add some form of in-kernel sync objects on top
of dma_buf, but I'm not really convinced that this will buy us much
above simply synchronizing access in userspace with the existing ipc
tools. In my experience the expensive part of syncing two graphics
engines with the cpu is that we wake up the cpu and prevent it from
going into low-power states if we do this too often. Adding some more
overhead by going through userspace doesn't really make it much worse.
It's a completely different story if there's a hw facility to
synchronize engines without the cpu's involvement (like there is on
every multi-pipe gpu) and there sync objects make tons of sense. But
I'm not aware of that existing on SoCs between different IP blocks.

Cheers, Daniel
-- 
Daniel Vetter
daniel.vet...@ffwll.ch - +41 (0) 79 364 57 48 - http://blog.ffwll.ch


Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism

2011-12-08 Thread Daniel Vetter
On Wed, Dec 7, 2011 at 14:40, Arnd Bergmann a...@arndb.de wrote:
 On Wednesday 07 December 2011, Semwal, Sumit wrote:
 Thanks for the excellent discussion - it indeed is very good learning
 for the relatively-inexperienced me :)

 So, for the purpose of dma-buf framework, could I summarize the
 following and rework accordingly?:
 1. remove mmap() dma_buf_op [and mmap fop], and introduce cpu_start(),
 cpu_finish() ops to bracket cpu accesses to the buffer. Also add
 DMABUF_CPU_START / DMABUF_CPU_FINI IOCTLs?

 I think we'd be better off for now without the extra ioctls and
 just document that a shared buffer must not be exported to user
 space using mmap at all, to avoid those problems. Serialization
 between GPU and CPU is on a higher level than the dma_buf framework
 IMHO.

Agreed.

 2. remove sg_sync* ops for now (and we'll see if we need to add them
 later if needed)

 Just removing the sg_sync_* operations is not enough. We have to make
 the decision whether we want to allow
 a) only coherent mappings of the buffer into kernel memory (requiring
 an extension to the dma_map_ops on ARM to not flush caches at map/unmap
 time)
 b) not allowing any in-kernel mappings (same requirement on ARM, also
 limits the usefulness of the dma_buf if we cannot access it from the
 kernel or from user space)
 c) only allowing streaming mappings, even if those are non-coherent
 (requiring strict serialization between CPU (in-kernel) and dma users of
 the buffer)

I think only allowing streaming access makes the most sense:
- I don't see much (if any need) for the kernel to access a dma_buf -
in all current usecases it just contains pixel data and no hw-specific
things (like sg tables, command buffers, ..). At most I see the need
for the kernel to access the buffer for dma bounce buffers, but that
is internal to the dma subsystem (and hence does not need to be
exposed).
- Userspace can still access the contents through the exporting
subsystem (e.g. use some gem mmap support). For efficiency reason gpu
drivers are already messing around with cache coherency in a platform
specific way (and hence violated the dma api a bit), so we could stuff
the mmap coherency in there, too. When we later on extend dma_buf
support so that other drivers than the gpu can export dma_bufs, we can
then extend the official dma api with already a few drivers with
use-patterns around.

But I still think that the kernel must not be required to enforce
correct access ordering for the reasons outlined in my other mail.
-Daniel
-- 
Daniel Vetter
daniel.vet...@ffwll.ch - +41 (0) 79 364 57 48 - http://blog.ffwll.ch