Re: Use of pci_map_page in nouveau, radeon TTM.

2013-10-03 Thread Alex Ivanov
01.10.2013, 18:16, "Konrad Rzeszutek Wilk" :
> On Tue, Oct 01, 2013 at 12:16:16PM +0200, Thomas Hellstrom wrote:
>
>>  Jerome, Konrad
>>
>>  Forgive an ignorant question, but it appears like both Nouveau and
>>  Radeon may use pci_map_page() when populating TTMs on
>>  pages obtained using the ordinary (not DMA pool). These pages will,
>>  if I understand things correctly, not be pages allocated with
>>  DMA_ALLOC_COHERENT.
>
> Not always. That depends if the SWIOTLB buffer has been enabled.
> Which happens if you have Calgary IOMMU, AMD GART and if you
> run under Xen.
>
>>  From what I understand, at least for the corresponding
>>  dma_map_page() it's illegal for the CPU to access these pages
>>  without calling
>>  dma_sync_xx_for_cpu(). And before the device is allowed to access
>>  them again, you need to call dma_sync_xx_for_device().
>
> Correct.
>
>>  So mapping for PCI really invalidates the TTM interleaved CPU /
>>  device access model.
>
> Unless you use the TTM DMA one which allocates them from the
> coherent pool - in which case they are already mapped.
>
> Granted the part of using DMA export/import API is not finished
> (so moving from TTM pool to a V4L for example) and it will blow
> up with the right mix.
>
>>  Or did I miss something here?
>
> That is it. But for most of the use cases the drivers have been
> able to skirt this restriction b/c the pci_map_page/pci_unmap_page
> set up a DMA mapping that is static (until the pci_unmap_page) and
> on x86 the memory is coherent. So the map is good regardless
> of the PCI devices. Naturally if you have multiple IOMMUs per bridge
> this all falls apart :-(
>
> This all falls flat also with non-coherent memory and I believe
> that is what some of the PA-RISC folks are hitting their heads
> against.

Konrad,

As I already answered you, this is irrelevant to version 2.0
of the PA-RISC architecture, on which we run our ATI video options.

>
> And probably also on ARM once they start using these chipsets.
>
>>  Thanks,
>>  Thomas
>
> ___
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: Use of pci_map_page in nouveau, radeon TTM.

2013-10-01 Thread Konrad Rzeszutek Wilk
On Tue, Oct 01, 2013 at 12:16:16PM +0200, Thomas Hellstrom wrote:
> Jerome, Konrad
> 
> Forgive an ignorant question, but it appears like both Nouveau and
> Radeon may use pci_map_page() when populating TTMs on
> pages obtained using the ordinary (not DMA pool). These pages will,
> if I understand things correctly, not be pages allocated with
> DMA_ALLOC_COHERENT.

Not always. That depends if the SWIOTLB buffer has been enabled.
Which happens if you have Calgary IOMMU, AMD GART and if you
run under Xen.
> 
> From what I understand, at least for the corresponding
> dma_map_page() it's illegal for the CPU to access these pages
> without calling
> dma_sync_xx_for_cpu(). And before the device is allowed to access
> them again, you need to call dma_sync_xx_for_device().

Correct.

> So mapping for PCI really invalidates the TTM interleaved CPU /
> device access model.

Unless you use the TTM DMA one which allocates them from the
coherent pool - in which case they are already mapped.

Granted the part of using DMA export/import API is not finished
(so moving from TTM pool to a V4L for example) and it will blow
up with the right mix.
> 
> Or did I miss something here?

That is it. But for most of the use cases the drivers have been
able to skirt this restriction b/c the pci_map_page/pci_unmap_page
set up a DMA mapping that is static (until the pci_unmap_page) and
on x86 the memory is coherent. So the map is good regardless
of the PCI devices. Naturally if you have multiple IOMMUs per bridge
this all falls apart :-(

This all falls flat also with non-coherent memory and I believe
that is what some of the PA-RISC folks are hitting their heads
against.

And probably also on ARM once they start using these chipsets.
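To illustrate why a static mapping only holds up on cache-coherent x86, here is a toy userspace model of a non-coherent system. It is an assumption for illustration, not a description of any real ARM or PA-RISC cache: the CPU goes through a write-back cache, while the device reads and writes RAM directly and never snoops. The `nc_*` names are hypothetical; nc_sync_for_device() plays the role of a cache clean and nc_sync_for_cpu() that of a cache invalidate.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NC_BUF_SIZE 16

/* Toy non-coherent system: one cache "line" covering the whole buffer. */
struct nc_system {
	unsigned char ram[NC_BUF_SIZE];   /* what the device sees */
	unsigned char cache[NC_BUF_SIZE]; /* what the CPU sees while valid */
	bool cache_valid;
	bool cache_dirty;
};

/* Fill the CPU cache from RAM on first use. */
static inline void nc_fill(struct nc_system *s)
{
	if (!s->cache_valid) {
		memcpy(s->cache, s->ram, NC_BUF_SIZE);
		s->cache_valid = true;
	}
}

static inline void nc_cpu_write(struct nc_system *s, size_t i, unsigned char v)
{
	assert(i < NC_BUF_SIZE);
	nc_fill(s);
	s->cache[i] = v;	/* stays in the cache until cleaned */
	s->cache_dirty = true;
}

static inline unsigned char nc_cpu_read(struct nc_system *s, size_t i)
{
	assert(i < NC_BUF_SIZE);
	nc_fill(s);
	return s->cache[i];	/* may be stale if the device wrote RAM */
}

static inline unsigned char nc_device_read(const struct nc_system *s, size_t i)
{
	assert(i < NC_BUF_SIZE);
	return s->ram[i];	/* no snooping: dirty cache data is invisible */
}

static inline void nc_device_write(struct nc_system *s, size_t i, unsigned char v)
{
	assert(i < NC_BUF_SIZE);
	s->ram[i] = v;		/* the CPU cache is not updated */
}

/* Sync for device: write dirty cache contents back to RAM (clean). */
static inline void nc_sync_for_device(struct nc_system *s)
{
	if (s->cache_valid && s->cache_dirty)
		memcpy(s->ram, s->cache, NC_BUF_SIZE);
	s->cache_dirty = false;
}

/* Sync for CPU: drop the cached copy so the next read fetches RAM. */
static inline void nc_sync_for_cpu(struct nc_system *s)
{
	s->cache_valid = false;
	s->cache_dirty = false;
}
```

On a snooping, cache-coherent x86 system neither kind of stale access would occur, which is why skipping the syncs appears to work there.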

> 
> Thanks,
> Thomas


Re: Use of pci_map_page in nouveau, radeon TTM.

2013-10-01 Thread Lucas Stach
On Tuesday, 01.10.2013, 13:13 +0200, Thomas Hellstrom wrote:
> On 10/01/2013 12:34 PM, Lucas Stach wrote:
> > On Tuesday, 01.10.2013, 12:16 +0200, Thomas Hellstrom wrote:
> >> Jerome, Konrad
> >>
> >> Forgive an ignorant question, but it appears like both Nouveau and
> >> Radeon may use pci_map_page() when populating TTMs on
> >> pages obtained using the ordinary (not DMA pool). These pages will, if I
> >> understand things correctly, not be pages allocated with
> >> DMA_ALLOC_COHERENT.
> >>
> >> From what I understand, at least for the corresponding dma_map_page()
> >> it's illegal for the CPU to access these pages without calling
> >> dma_sync_xx_for_cpu(). And before the device is allowed to access them
> >> again, you need to call dma_sync_xx_for_device().
> >> So mapping for PCI really invalidates the TTM interleaved CPU / device
> >> access model.
> >>
> > That's right. The API says you need to sync for device or cpu, but on
> > x86 you can get away with not doing so, as on x86 the calls end up just
> > being WB buffer flushes.
> 
> OK, but what about the cases where the dma subsystem allocates a bounce 
> buffer?
> (Although I think the TTM page selection works around this situation).
> Perhaps at the very least this deserves a comment in the code...

Not doing the sync_for_* is always a violation of the dma-mapping
API and will rightfully fail on systems relying on those mechanisms to
do proper dma memory handling; bounce buffers are just one of those
cases.
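A bounce buffer makes the violation visible even on a coherent machine, because the device never sees the driver's page at all. The sketch below is a simplified userspace model in the spirit of SWIOTLB, assuming a bidirectional mapping; the `bb_*` names are hypothetical, and real swiotlb copying also depends on the DMA direction of the mapping.

```c
#include <assert.h>
#include <string.h>

#define BB_PAGE_BYTES 8

/* Toy bounce-buffered streaming mapping: the device only reaches the
 * bounce copy, so data moves between the driver's page and the bounce
 * copy only at map, sync and unmap time. */
struct bb_map {
	unsigned char *page;                  /* the driver's page (CPU view) */
	unsigned char bounce[BB_PAGE_BYTES];  /* what the device actually sees */
};

/* Map: remember the page and copy its contents into the bounce buffer. */
static inline void bb_map_page(struct bb_map *m, unsigned char *page)
{
	assert(page != NULL);
	m->page = page;
	memcpy(m->bounce, page, BB_PAGE_BYTES);
}

/* Sync for device: push CPU writes into the bounce buffer. */
static inline void bb_sync_for_device(struct bb_map *m)
{
	memcpy(m->bounce, m->page, BB_PAGE_BYTES);
}

/* Sync for CPU: pull device writes back into the driver's page. */
static inline void bb_sync_for_cpu(struct bb_map *m)
{
	memcpy(m->page, m->bounce, BB_PAGE_BYTES);
}

/* Unmap: copy out one last time and forget the page. */
static inline void bb_unmap_page(struct bb_map *m)
{
	bb_sync_for_cpu(m);
	m->page = NULL;
}
```

In this model a CPU write made after mapping reaches the device only through bb_sync_for_device(), and a device write reaches the CPU only through bb_sync_for_cpu() or unmap, regardless of cache coherency.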

> >
> > For ARM, or similar non-coherent arches you absolutely have to do the
> > syncs, or you'll end up with different contents in cache vs sysram. For
> > my nouveau on ARM work I introduced some simple helpers to do the right
> > thing. And it really isn't hard doing the syncs at the right points in
> > time, just sync for CPU when getting a cpu_prep ioctl and then sync for
> > device when validating a buffer for GPU use.
> 
> Yes, this will probably work for drivers where a buffer is either bound
> for CPU or for GPU; however, on drivers using user-space sub-allocation
> of buffers, or for partial updates of vertex buffers etc., that isn't
> sufficient. In that case one either has to use coherent memory or
> implement an elaborate scheme where we sync for device and kill
> user-space mappings on validation and sync for cpu in the cpu fault
> handler. Unfortunately the latter triggers a fence wait for the whole
> buffer, not just the part of the buffer we want to write to.
> >
Yeah, either you have to use dma coherent memory, or implement some
scheme where you only sync subregions of a buffer. Though having to call
a cpu_prepare_subbuffer ioctl might just kill all benefits you got from
using userspace suballocation. So using coherent mem for those buffers
seems like a safe bet.
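One way to limit syncing to subregions, sketched here under the assumption that the kernel learns which ranges the CPU wrote (for example through a hypothetical cpu_prepare_subbuffer-style ioctl), is to track a dirty interval and sync only that. The Linux DMA API does offer ranged variants, dma_sync_single_range_for_cpu() and dma_sync_single_range_for_device(); the `dirty_range_*` helpers below are hypothetical and only compute the offset and length that would be passed to them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Smallest [start, end) interval covering all CPU writes since the
 * last sync for device; start == end means nothing is dirty. */
struct dirty_range {
	size_t start;
	size_t end;
};

static inline void dirty_range_reset(struct dirty_range *r)
{
	r->start = 0;
	r->end = 0;
}

/* Record a CPU write of len bytes at offset off. */
static inline void dirty_range_add(struct dirty_range *r, size_t off, size_t len)
{
	if (len == 0)
		return;
	assert(off + len > off);	/* reject wrap-around */
	if (r->start == r->end) {	/* first write since the last sync */
		r->start = off;
		r->end = off + len;
		return;
	}
	if (off < r->start)
		r->start = off;
	if (off + len > r->end)
		r->end = off + len;
}

/* If anything is dirty, return the range to sync and mark it clean. */
static inline bool dirty_range_take(struct dirty_range *r, size_t *off, size_t *len)
{
	if (r->start == r->end)
		return false;
	*off = r->start;
	*len = r->end - r->start;
	dirty_range_reset(r);
	return true;
}
```

Merging everything into a single interval keeps the bookkeeping trivial at the cost of also syncing the gap between two distant writes; a list of intervals would avoid that at the price of more ioctl and sync overhead.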

I already implemented some of this in the nouveau nv50 Mesa driver which
uses userspace suballocation, but unfortunately I can't do any serious
performance measurements, as the system setup has other unrelated
bottlenecks.

Regards,
Lucas
-- 
Pengutronix e.K.   | Lucas Stach |
Industrial Linux Solutions | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |



Re: Use of pci_map_page in nouveau, radeon TTM.

2013-10-01 Thread Thomas Hellstrom

On 10/01/2013 12:34 PM, Lucas Stach wrote:
> On Tuesday, 01.10.2013, 12:16 +0200, Thomas Hellstrom wrote:
>> Jerome, Konrad
>>
>> Forgive an ignorant question, but it appears like both Nouveau and
>> Radeon may use pci_map_page() when populating TTMs on
>> pages obtained using the ordinary (not DMA pool). These pages will, if I
>> understand things correctly, not be pages allocated with
>> DMA_ALLOC_COHERENT.
>>
>> From what I understand, at least for the corresponding dma_map_page()
>> it's illegal for the CPU to access these pages without calling
>> dma_sync_xx_for_cpu(). And before the device is allowed to access them
>> again, you need to call dma_sync_xx_for_device().
>> So mapping for PCI really invalidates the TTM interleaved CPU / device
>> access model.
>>
> That's right. The API says you need to sync for device or cpu, but on
> x86 you can get away with not doing so, as on x86 the calls end up just
> being WB buffer flushes.

OK, but what about the cases where the dma subsystem allocates a bounce
buffer?
(Although I think the TTM page selection works around this situation).
Perhaps at the very least this deserves a comment in the code...

> For ARM, or similar non-coherent arches you absolutely have to do the
> syncs, or you'll end up with different contents in cache vs sysram. For
> my nouveau on ARM work I introduced some simple helpers to do the right
> thing. And it really isn't hard doing the syncs at the right points in
> time, just sync for CPU when getting a cpu_prep ioctl and then sync for
> device when validating a buffer for GPU use.

Yes, this will probably work for drivers where a buffer is either bound
for CPU or for GPU; however, on drivers using user-space sub-allocation
of buffers, or for partial updates of vertex buffers etc., that isn't
sufficient. In that case one either has to use coherent memory or
implement an elaborate scheme where we sync for device and kill
user-space mappings on validation and sync for cpu in the cpu fault
handler. Unfortunately the latter triggers a fence wait for the whole
buffer, not just the part of the buffer we want to write to.
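The fault-driven scheme just described can be reduced to a counter model that shows where its cost comes from: the fault handler only learns that the buffer was touched, not which range, so it has to wait for the fence covering the whole buffer. This is a hypothetical userspace sketch; all `fault_bo_*` names are illustrative, and zapping user-space PTEs and waiting on fences are represented by a flag and a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Counter model of "sync for device and kill user-space mappings on
 * validation, sync for CPU in the fault handler". */
struct fault_bo {
	bool user_mapped;	/* are user-space PTEs currently present? */
	int fence_waits;	/* whole-buffer waits, whatever range was touched */
	int syncs_for_cpu;
	int syncs_for_device;
};

/* Validation for GPU use: sync for device, then revoke CPU mappings. */
static inline void fault_bo_validate(struct fault_bo *bo)
{
	bo->syncs_for_device++;
	bo->user_mapped = false;	/* stands in for zapping user PTEs */
}

/* A CPU write to [off, off + len): only the first touch after a
 * validation faults, and that fault cannot exploit the range. */
static inline void fault_bo_cpu_write(struct fault_bo *bo, unsigned off, unsigned len)
{
	(void)off;
	(void)len;
	if (!bo->user_mapped) {
		bo->fence_waits++;	/* waits for all GPU work on the buffer */
		bo->syncs_for_cpu++;
		bo->user_mapped = true;	/* fault handler re-inserts the PTEs */
	}
}
```

A small partial update to a sub-allocated vertex buffer thus still pays a full-buffer fence wait after every validation, which is the drawback raised above.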


> Regards,
> Lucas


Regards,
Thomas


Re: Use of pci_map_page in nouveau, radeon TTM.

2013-10-01 Thread Lucas Stach
On Tuesday, 01.10.2013, 12:16 +0200, Thomas Hellstrom wrote:
> Jerome, Konrad
> 
> Forgive an ignorant question, but it appears like both Nouveau and 
> Radeon may use pci_map_page() when populating TTMs on
> pages obtained using the ordinary (not DMA pool). These pages will, if I 
> understand things correctly, not be pages allocated with
> DMA_ALLOC_COHERENT.
> 
> From what I understand, at least for the corresponding dma_map_page()
> it's illegal for the CPU to access these pages without calling
> dma_sync_xx_for_cpu(). And before the device is allowed to access them 
> again, you need to call dma_sync_xx_for_device().
> So mapping for PCI really invalidates the TTM interleaved CPU / device 
> access model.
> 
That's right. The API says you need to sync for device or cpu, but on
x86 you can get away with not doing so, as on x86 the calls end up just
being WB buffer flushes.

For ARM, or similar non-coherent arches you absolutely have to do the
syncs, or you'll end up with different contents in cache vs sysram. For
my nouveau on ARM work I introduced some simple helpers to do the right
thing. And it really isn't hard doing the syncs at the right points in
time, just sync for CPU when getting a cpu_prep ioctl and then sync for
device when validating a buffer for GPU use.
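The two sync points suggested above can be sketched as driver hooks. This is a hypothetical userspace model, not the helpers from the nouveau ARM work (which are not shown in the thread): the `hooked_bo_*` functions and the stub syncs are illustrative, and a real driver would call the kernel's dma_sync_* functions, after waiting for the GPU fence in its cpu_prep path.

```c
#include <assert.h>
#include <stdbool.h>

/* Buffer object state; after mapping, the device owns the buffer. */
struct hooked_bo {
	bool cpu_owned;		/* true between cpu_prep and the next validation */
	int syncs_for_cpu;
	int syncs_for_device;
};

/* Stubs standing in for the real sync calls; they only count. */
static inline void stub_sync_for_cpu(struct hooked_bo *bo)
{
	bo->syncs_for_cpu++;
}

static inline void stub_sync_for_device(struct hooked_bo *bo)
{
	bo->syncs_for_device++;
}

/* cpu_prep-style ioctl: the GPU is done, hand the buffer to the CPU. */
static inline void hooked_bo_cpu_prep(struct hooked_bo *bo)
{
	if (!bo->cpu_owned) {
		stub_sync_for_cpu(bo);
		bo->cpu_owned = true;
	}
}

/* Validation for a GPU command submission: hand the buffer back. */
static inline void hooked_bo_validate(struct hooked_bo *bo)
{
	if (bo->cpu_owned) {
		stub_sync_for_device(bo);
		bo->cpu_owned = false;
	}
}
```

Tracking ownership lets repeated cpu_prep calls or repeated validations skip redundant syncs; whether that optimization is worthwhile in a real driver is an assumption of this sketch.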

Regards,
Lucas


Use of pci_map_page in nouveau, radeon TTM.

2013-10-01 Thread Thomas Hellstrom

Jerome, Konrad

Forgive an ignorant question, but it appears like both Nouveau and 
Radeon may use pci_map_page() when populating TTMs on
pages obtained using the ordinary (not DMA pool). These pages will, if I
understand things correctly, not be pages allocated with
DMA_ALLOC_COHERENT.

From what I understand, at least for the corresponding dma_map_page() 
it's illegal for the CPU to access these pages without calling
dma_sync_xx_for_cpu(). And before the device is allowed to access them 
again, you need to call dma_sync_xx_for_device().
So mapping for PCI really invalidates the TTM interleaved CPU / device 
access model.
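The ownership rule described above can be modeled in a few lines. The following is a minimal userspace sketch, not kernel code: the `own_*` names are hypothetical and only mirror the contract of dma_map_page(), dma_sync_single_for_cpu(), dma_sync_single_for_device() and dma_unmap_page() as stated in this thread, namely that after mapping the device owns the buffer and each side must sync before touching it.

```c
#include <assert.h>
#include <stdbool.h>

/* Who may touch a streaming mapping right now (toy model). */
enum own_state {
	OWN_UNMAPPED,	/* not mapped: plain memory, the CPU may use it */
	OWN_CPU,	/* mapped, handed to the CPU by a sync for CPU */
	OWN_DEVICE,	/* mapped and owned by the device */
};

struct own_mapping {
	enum own_state state;
};

/* Like dma_map_page(): after mapping, the device owns the buffer. */
static inline void own_map(struct own_mapping *m)
{
	assert(m->state == OWN_UNMAPPED);
	m->state = OWN_DEVICE;
}

/* Like dma_sync_single_for_cpu(): the CPU may access the buffer afterwards. */
static inline void own_sync_for_cpu(struct own_mapping *m)
{
	assert(m->state != OWN_UNMAPPED);	/* syncing requires a mapping */
	m->state = OWN_CPU;
}

/* Like dma_sync_single_for_device(): hand the buffer back to the device. */
static inline void own_sync_for_device(struct own_mapping *m)
{
	assert(m->state != OWN_UNMAPPED);
	m->state = OWN_DEVICE;
}

/* Like dma_unmap_page(): ownership returns to the CPU for good. */
static inline void own_unmap(struct own_mapping *m)
{
	assert(m->state != OWN_UNMAPPED);
	m->state = OWN_UNMAPPED;
}

static inline bool own_cpu_may_access(const struct own_mapping *m)
{
	return m->state == OWN_CPU || m->state == OWN_UNMAPPED;
}

static inline bool own_device_may_access(const struct own_mapping *m)
{
	return m->state == OWN_DEVICE;
}
```

In these terms, the TTM interleaved model would need an own_sync_for_cpu() before every CPU access and an own_sync_for_device() before every GPU use, which is exactly what pages mapped with pci_map_page() and then accessed freely skip.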


Or did I miss something here?

Thanks,
Thomas