[RFC] Future TTM DMA direction

2012-01-25 Thread Thomas Hellstrom
OK, revisiting this again, please see inline below,


On 01/10/2012 06:46 PM, Jerome Glisse wrote:
> On Mon, Jan 09, 2012 at 11:11:06AM +0100, Daniel Vetter wrote:
>> On Mon, Jan 09, 2012 at 10:37:28AM +0100, Thomas Hellstrom wrote:
>>> Hi!
>>>
>>> When TTM was originally written, it was assumed that GPU apertures
>>> could address pages directly, and that the CPU could access those
>>> pages without explicit synchronization. The process of binding a
>>> page to a GPU translation table was a simple one-step operation, and
>>> we needed to worry about fragmentation in the GPU aperture only.
>>>
>>> Now that we "sort of" support DMA memory there are three things I
>>> think are missing:
>>>
>>> 1) We can't gracefully handle coherent DMA OOMs or coherent DMA
>>> (Including CMA) memory fragmentation leading to failed allocations.
>>> 2) We can't handle dynamic mapping of pages into and out of dma, and
>>> corresponding IOMMU space shortage or fragmentation, and CPU
>>> synchronization.
>>> 3) We have no straightforward way of moving pages between devices.
>>>
>>> I think a reasonable way to support this is to make binding to a
>>> non-fixed (system page based) TTM memory type a two-step binding
>>> process, so that a TTM placement consists of (DMA_TYPE, MEMORY_TYPE)
>>> instead of only (MEMORY_TYPE).
>>>
>>> In step 1) the bo is bound to a specific DMA type. These could be
>>> for example:
>>> (NONE, DYNAMIC, COHERENT, CMA),  device dependent types could be
>>> allowed as well.
>>> In this step, we perform dma_sync_for_device, or allocate
>>> dma-specific pages maintaining LRU lists so that if we receive a DMA
>>> memory allocation OOM, we can unbind bo:s bound to the same DMA
>>> type. Standard graphics cards would then, for example, use the NONE
>>> DMA type when run on bare metal or COHERENT when run on Xen. A
>>> "COHERENT" OOM condition would then lead to eviction of another bo.
>>> (Note that DMA eviction might involve data copies and be costly, but
>>> still better than failing).
>>> Binding with the DYNAMIC memory type would mean that CPU accesses
>>> are disallowed, and that user-space CPU page mappings might need to
>>> be killed, with a corresponding sync_for_cpu if they are faulted in
>>> again (perhaps on a page-by-page basis). Any attempt to bo_kmap() a
>>> bo page bound to DYNAMIC DMA mapping should trigger a BUG.
>>>
>>> In step 2) The bo is bound to the GPU in the same way it's done
>>> today. Evicting from DMA will of course also trigger an evict from
>>> GPU, but an evict from GPU will not trigger a DMA evict.
>>>
>>> Making a bo "anonymous" and thus moveable between devices would then
>>> mean binding it to the "NONE" DMA type.
>>>
>>> Comments, suggestions?
>> Well I think we need to solve outstanding issues in the dma_buf framework
>> first. Currently dma_buf isn't really up to par to handle coherency
>> between the cpu and devices and there's also not yet any way to handle dma
>> address space fragmentation/exhaustion.
>>
>> I fear that if you jump ahead with improving the ttm support alone we
>> might end up with something incompatible to the stuff dma_buf eventually
>> will grow, resulting in decent amounts of wasted efforts.
>>
>> Cc'ed a bunch of relevant lists to foster input from people.
>>
>> For a starter you seem to want much more low-level integration with the
>> dma api than existing users commonly need. E.g. if I understand things
>> correctly drivers just call dma_alloc_coherent and the platform/board code
>> then decides whether the device needs a contiguous allocation from cma or
>> whether something else is good, too (e.g. vmalloc for the cpu + iommu).
>> Another thing is that I think doing lru eviction in case of dma address
>> space exhaustion (or fragmentation) needs at least awareness of what's
>> going on in the upper layers. iommus are commonly shared between devices
>> and I presume that two ttm drivers sitting behind the same iommu and
>> fighting over its resources can lead to some hilarious outcomes.
>>
>> Cheers, Daniel
> I am with Daniel here; while I think the TTM API changes you propose are a
> good idea, I think most of the issues you are listing need to be addressed
> at a lower level. If TTM keeps doing its own thing for GPUs in its own little
> area we are going to end up in a DMA ghetto ;)
>
> DMA space exhaustion is something that is highly platform specific: on the
> x86 platform it's very unlikely to happen for AMD, Intel or NVidia GPUs,
> while on the ARM platform it's a more likely situation, at least on the
> current generation.

OK. You and Daniel have convinced me to leave OOM and fragmentation handling
out of TTM, but I still think TTM DMA placement might be a good thing to look
at when time allows.

>
> I believe that the DMA APIs for allocating memory are just too limited for
> the kind of devices and support we have. The association to a single device
> is too restrictive. I would rather see some new API to allocate DMA/IOMMU
> space, something more flexible and more in line with the dma-buf work.

[RFC] Future TTM DMA direction

2012-01-10 Thread Jerome Glisse
On Mon, Jan 09, 2012 at 11:11:06AM +0100, Daniel Vetter wrote:
> On Mon, Jan 09, 2012 at 10:37:28AM +0100, Thomas Hellstrom wrote:
> > Hi!
> > 
> > When TTM was originally written, it was assumed that GPU apertures
> > could address pages directly, and that the CPU could access those
> > pages without explicit synchronization. The process of binding a
> > page to a GPU translation table was a simple one-step operation, and
> > we needed to worry about fragmentation in the GPU aperture only.
> > 
> > Now that we "sort of" support DMA memory there are three things I
> > think are missing:
> > 
> > 1) We can't gracefully handle coherent DMA OOMs or coherent DMA
> > (Including CMA) memory fragmentation leading to failed allocations.
> > 2) We can't handle dynamic mapping of pages into and out of dma, and
> > corresponding IOMMU space shortage or fragmentation, and CPU
> > synchronization.
> > 3) We have no straightforward way of moving pages between devices.
> > 
> > I think a reasonable way to support this is to make binding to a
> > non-fixed (system page based) TTM memory type a two-step binding
> > process, so that a TTM placement consists of (DMA_TYPE, MEMORY_TYPE)
> > instead of only (MEMORY_TYPE).
> > 
> > In step 1) the bo is bound to a specific DMA type. These could be
> > for example:
> > (NONE, DYNAMIC, COHERENT, CMA),  device dependent types could be
> > allowed as well.
> > In this step, we perform dma_sync_for_device, or allocate
> > dma-specific pages maintaining LRU lists so that if we receive a DMA
> > memory allocation OOM, we can unbind bo:s bound to the same DMA
> > type. Standard graphics cards would then, for example, use the NONE
> > DMA type when run on bare metal or COHERENT when run on Xen. A
> > "COHERENT" OOM condition would then lead to eviction of another bo.
> > (Note that DMA eviction might involve data copies and be costly, but
> > still better than failing).
> > Binding with the DYNAMIC memory type would mean that CPU accesses
> > are disallowed, and that user-space CPU page mappings might need to
> > be killed, with a corresponding sync_for_cpu if they are faulted in
> > again (perhaps on a page-by-page basis). Any attempt to bo_kmap() a
> > bo page bound to DYNAMIC DMA mapping should trigger a BUG.
> > 
> > In step 2) The bo is bound to the GPU in the same way it's done
> > today. Evicting from DMA will of course also trigger an evict from
> > GPU, but an evict from GPU will not trigger a DMA evict.
> > 
> > Making a bo "anonymous" and thus moveable between devices would then
> > mean binding it to the "NONE" DMA type.
> > 
> > Comments, suggestions?
> 
> Well I think we need to solve outstanding issues in the dma_buf framework
> first. Currently dma_buf isn't really up to par to handle coherency
> between the cpu and devices and there's also not yet any way to handle dma
> address space fragmentation/exhaustion.
> 
> I fear that if you jump ahead with improving the ttm support alone we
> might end up with something incompatible to the stuff dma_buf eventually
> will grow, resulting in decent amounts of wasted efforts.
> 
> Cc'ed a bunch of relevant lists to foster input from people.
> 
> For a starter you seem to want much more low-level integration with the
> dma api than existing users commonly need. E.g. if I understand things
> correctly drivers just call dma_alloc_coherent and the platform/board code
> then decides whether the device needs a contiguous allocation from cma or
> whether something else is good, too (e.g. vmalloc for the cpu + iommu).
> Another thing is that I think doing lru eviction in case of dma address
> space exhaustion (or fragmentation) needs at least awareness of what's
> going on in the upper layers. iommus are commonly shared between devices
> and I presume that two ttm drivers sitting behind the same iommu and
> fighting over its resources can lead to some hilarious outcomes.
> 
> Cheers, Daniel

I am with Daniel here; while I think the TTM API changes you propose are a
good idea, I think most of the issues you are listing need to be addressed
at a lower level. If TTM keeps doing its own thing for GPUs in its own little
area we are going to end up in a DMA ghetto ;)

DMA space exhaustion is something that is highly platform specific: on the
x86 platform it's very unlikely to happen for AMD, Intel or NVidia GPUs,
while on the ARM platform it's a more likely situation, at least on the
current generation.

I believe that the DMA APIs for allocating memory are just too limited for
the kind of devices and support we have. The association to a single device
is too restrictive. I would rather see some new API to allocate DMA/IOMMU
space, something more flexible and more in line with the dma-buf work.

I believe every DMA allocation has a set of restrictions: the DMA mask of the
device, whether there is an IOMMU or not, the IOMMU's DMA mask if any, and
whether the IOMMU has a limited address space (note that recent x86 IOMMUs
don't have such a limit). In the end it's not only the device DMA mask that
matters but also the i
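
Something like the following is the kind of information such an allocation
API would have to take (rough sketch, invented names, nothing like this
exists today):

#include <linux/types.h>

/* Hypothetical descriptor of the restrictions a DMA allocation must meet. */
struct my_dma_constraints {
        u64  device_dma_mask;     /* addressing limit of the device itself */
        bool behind_iommu;        /* is there an IOMMU in the path at all? */
        u64  iommu_dma_mask;      /* addressing limit of that IOMMU, if any */
        u64  iommu_aperture_size; /* 0 = unlimited (recent x86 IOMMUs) */
        bool needs_contiguous;    /* CMA-style physically contiguous? */
};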

[RFC] Future TTM DMA direction

2012-01-10 Thread Daniel Vetter
Hi Thomas,

On Mon, Jan 09, 2012 at 12:01:28PM +0100, Thomas Hellstrom wrote:
> Thanks for your input. I think this is mostly orthogonal to dma_buf, and
> really a way to adapt TTM to be DMA-api aware. That's currently done
> within the TTM backends. CMA was merely included as an example that
> might not be relevant.
> 
> I haven't followed dma_buf that closely lately, but if it's growing from
> being just a way to share buffer objects between devices to something
> providing also low-level allocators with fragmentation prevention, there's
> definitely an overlap. However, at the dma_buf meeting in Budapest there
> seemed to be little or no interest in robust buffer allocation /
> fragmentation prevention, although I remember bringing it up to the point
> where I felt annoying :).

Well, I've shot at you quite a bit too, and I still think it's too much
for the first few iterations. But I also think we will need a cleverer
dma subsystem sooner or later (even if it's just around dma_buf) so that's
why I've dragged your rfc out of the drm corner ;-)

Cheers, Daniel
-- 
Daniel Vetter
Mail: daniel at ffwll.ch
Mobile: +41 (0)79 365 57 48


[RFC] Future TTM DMA direction

2012-01-10 Thread Konrad Rzeszutek Wilk
On Mon, Jan 09, 2012 at 10:37:28AM +0100, Thomas Hellstrom wrote:
> Hi!
> 
> When TTM was originally written, it was assumed that GPU apertures
> could address pages directly, and that the CPU could access those
> pages without explicit synchronization. The process of binding a
> page to a GPU translation table was a simple one-step operation, and
> we needed to worry about fragmentation in the GPU aperture only.
> 
> Now that we "sort of" support DMA memory there are three things I
> think are missing:
> 
> 1) We can't gracefully handle coherent DMA OOMs or coherent DMA
> (Including CMA) memory fragmentation leading to failed allocations.

However, most allocations are done in PAGE_SIZE chunks, so there isn't
any danger of contiguous allocation failures.

However, one way the storage and network drivers have solved this is with
the dmapool concept, which is pretty much what TTM DMA is based on - that
way we won't be hitting OOMs because we have allocated a pool at the start.
Well, OK, we can still hit OOMs if we want more DMA buffers than the IOMMU
can provide.
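
For reference, the dmapool pattern looks roughly like this (illustrative
sketch only, not the actual TTM DMA pool code; error handling trimmed, and
"my_dev" stands in for the driver's real struct device):

#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/dmapool.h>
#include <linux/gfp.h>
#include <linux/mm.h>

static struct dma_pool *my_page_pool;

/* Create the pool up front so later allocations are served from it
 * instead of hitting the DMA/IOMMU allocator in the hot path. */
static int my_pool_init(struct device *my_dev)
{
        my_page_pool = dma_pool_create("my-page-pool", my_dev,
                                       PAGE_SIZE, PAGE_SIZE, 0);
        return my_page_pool ? 0 : -ENOMEM;
}

/* Hand out one PAGE_SIZE chunk plus its bus address for the device. */
static void *my_pool_get(dma_addr_t *dma)
{
        return dma_pool_alloc(my_page_pool, GFP_KERNEL, dma);
}

static void my_pool_put(void *vaddr, dma_addr_t dma)
{
        dma_pool_free(my_page_pool, vaddr, dma);
}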

We could alleviate part of the problem by making the unbind/bind process
(and hence also the unpopulate/populate) happen more lazily, to reduce
the exhaustion problem?


> 2) We can't handle dynamic mapping of pages into and out of dma, and
> corresponding IOMMU space shortage or fragmentation, and CPU
> synchronization.

This and 1) seem to point to the same thing - a closer relationship
with the IOMMU/DMA code. I would think that this problem would not
just be with graphics, but also with storage, userspace drivers,
and network.

Seems that some form of feedback mechanism from IOMMU API might be useful?

> 3) We have no straightforward way of moving pages between devices.
> 
> I think a reasonable way to support this is to make binding to a
> non-fixed (system page based) TTM memory type a two-step binding
> process, so that a TTM placement consists of (DMA_TYPE, MEMORY_TYPE)
> instead of only (MEMORY_TYPE).
> 
> In step 1) the bo is bound to a specific DMA type. These could be
> for example:
> (NONE, DYNAMIC, COHERENT, CMA),  device dependent types could be
> allowed as well.
> In this step, we perform dma_sync_for_device, or allocate
> dma-specific pages maintaining LRU lists so that if we receive a DMA
> memory allocation OOM, we can unbind bo:s bound to the same DMA

The DMA API is quite stringent in wanting the DMA page allocated to be
associated with the BDF of the device. So the "same DMA type" would
need to be "same DMA type on the same PCI device."
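
In other words, every streaming mapping is created against one specific
struct device, so a dma_addr_t obtained for one device can't simply be
reused on another - roughly (sketch, function name invented):

#include <linux/dma-mapping.h>
#include <linux/mm.h>

/* Map a single system page for DMA by one specific device. The returned
 * bus address is only valid for that device's IOMMU domain / dma mask,
 * which is why "same DMA type" really means "same DMA type on the same
 * PCI device (BDF)". */
static dma_addr_t my_map_one_page(struct device *dev, struct page *page)
{
        dma_addr_t addr;

        addr = dma_map_page(dev, page, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
        if (dma_mapping_error(dev, addr))
                return 0; /* treated as failure in this sketch */
        return addr;
}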

> type. Standard graphics cards would then, for example, use the NONE
> DMA type when run on bare metal or COHERENT when run on Xen. A
> "COHERENT" OOM condition would then lead to eviction of another bo.
> (Note that DMA eviction might involve data copies and be costly, but
> still better than failing).

OK, that sounds right - we do have those buffers and we could re-use them,
though right now we throw away the 'tt_cached' ones instead of re-using them.

> Binding with the DYNAMIC memory type would mean that CPU accesses
> are disallowed, and that user-space CPU page mappings might need to
> be killed, with a corresponding sync_for_cpu if they are faulted in
> again (perhaps on a page-by-page basis). Any attempt to bo_kmap() a
> bo page bound to DYNAMIC DMA mapping should trigger a BUG.
> 
> In step 2) The bo is bound to the GPU in the same way it's done
> today. Evicting from DMA will of course also trigger an evict from
> GPU, but an evict from GPU will not trigger a DMA evict.
> 
> Making a bo "anonymous" and thus moveable between devices would then
> mean binding it to the "NONE" DMA type.

Which would be copied to a different device when needed by another GPU?

The "binding" process sounds like it would need the smarts to figure out
whether it can just attach the DMA page to the other pool or if it needs
to fetch a page from the other pool, copy the contents of the page, and
retire the old one in a pool for re-use? And probably some other logic
too.
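
Roughly the kind of decision logic I have in mind (pure sketch, every name
below is invented):

/* Hypothetical: move a bo's backing pages to the pool of another device. */
static int my_rebind_bo(struct my_bo *bo, struct my_dma_pool *dst)
{
        struct my_dma_pages *fresh;

        /* Same IOMMU domain and coherency rules: just attach the existing
         * DMA pages to the destination pool. */
        if (my_pools_compatible(bo->pool, dst))
                return my_attach_pages(dst, bo);

        /* Otherwise fetch pages from the destination pool, copy the
         * contents over, and retire the old pages for re-use. */
        fresh = my_fetch_pages(dst, bo->num_pages);
        if (!fresh)
                return -ENOMEM;
        my_copy_bo_contents(bo, fresh);
        my_retire_pages(bo->pool, bo->pages);
        bo->pages = fresh;
        bo->pool = dst;
        return 0;
}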

> 
> Comments, suggestions?
> 
> /Thomas


[RFC] Future TTM DMA direction

2012-01-09 Thread Thomas Hellstrom
On 01/09/2012 11:11 AM, Daniel Vetter wrote:
> On Mon, Jan 09, 2012 at 10:37:28AM +0100, Thomas Hellstrom wrote:
>
>> Hi!
>>
>> When TTM was originally written, it was assumed that GPU apertures
>> could address pages directly, and that the CPU could access those
>> pages without explicit synchronization. The process of binding a
>> page to a GPU translation table was a simple one-step operation, and
>> we needed to worry about fragmentation in the GPU aperture only.
>>
>> Now that we "sort of" support DMA memory there are three things I
>> think are missing:
>>
>> 1) We can't gracefully handle coherent DMA OOMs or coherent DMA
>> (Including CMA) memory fragmentation leading to failed allocations.
>> 2) We can't handle dynamic mapping of pages into and out of dma, and
>> corresponding IOMMU space shortage or fragmentation, and CPU
>> synchronization.
>> 3) We have no straightforward way of moving pages between devices.
>>
>> I think a reasonable way to support this is to make binding to a
>> non-fixed (system page based) TTM memory type a two-step binding
>> process, so that a TTM placement consists of (DMA_TYPE, MEMORY_TYPE)
>> instead of only (MEMORY_TYPE).
>>
>> In step 1) the bo is bound to a specific DMA type. These could be
>> for example:
>> (NONE, DYNAMIC, COHERENT, CMA),  device dependent types could be
>> allowed as well.
>> In this step, we perform dma_sync_for_device, or allocate
>> dma-specific pages maintaining LRU lists so that if we receive a DMA
>> memory allocation OOM, we can unbind bo:s bound to the same DMA
>> type. Standard graphics cards would then, for example, use the NONE
>> DMA type when run on bare metal or COHERENT when run on Xen. A
>> "COHERENT" OOM condition would then lead to eviction of another bo.
>> (Note that DMA eviction might involve data copies and be costly, but
>> still better than failing).
>> Binding with the DYNAMIC memory type would mean that CPU accesses
>> are disallowed, and that user-space CPU page mappings might need to
>> be killed, with a corresponding sync_for_cpu if they are faulted in
>> again (perhaps on a page-by-page basis). Any attempt to bo_kmap() a
>> bo page bound to DYNAMIC DMA mapping should trigger a BUG.
>>
>> In step 2) The bo is bound to the GPU in the same way it's done
>> today. Evicting from DMA will of course also trigger an evict from
>> GPU, but an evict from GPU will not trigger a DMA evict.
>>
>> Making a bo "anonymous" and thus moveable between devices would then
>> mean binding it to the "NONE" DMA type.
>>
>> Comments, suggestions?
>>  
> Well I think we need to solve outstanding issues in the dma_buf framework
> first. Currently dma_buf isn't really up to par to handle coherency
> between the cpu and devices and there's also not yet any way to handle dma
> address space fragmentation/exhaustion.
>
> I fear that if you jump ahead with improving the ttm support alone we
> might end up with something incompatible to the stuff dma_buf eventually
> will grow, resulting in decent amounts of wasted efforts.
>
> Cc'ed a bunch of relevant lists to foster input from people.
>

Daniel,

Thanks for your input. I think this is mostly orthogonal to dma_buf, and
really a way to adapt TTM to be DMA-api aware. That's currently done
within the TTM backends. CMA was merely included as an example that
might not be relevant.

I haven't followed dma_buf that closely lately, but if it's growing from
being just a way to share buffer objects between devices to something
providing also low-level allocators with fragmentation prevention, there's
definitely an overlap. However, at the dma_buf meeting in Budapest there
seemed to be little or no interest in robust buffer allocation /
fragmentation prevention, although I remember bringing it up to the point
where I felt annoying :).

> For a starter you seem to want much more low-level integration with the
> dma api than existing users commonly need. E.g. if I understand things
> correctly drivers just call dma_alloc_coherent and the platform/board code
> then decides whether the device needs a contiguous allocation from cma or
> whether something else is good, too (e.g. vmalloc for the cpu + iommu).
> Another thing is that I think doing lru eviction in case of dma address
> space exhaustion (or fragmentation) needs at least awareness of what's
> going on in the upper layers. iommus are commonly shared between devices
> and I presume that two ttm drivers sitting behind the same iommu and
> fighting over its resources can lead to some hilarious outcomes.
>

A good point, I didn't think of that.

For TTM drivers sharing the same IOMMU it's really possible to make such an
LRU global (assuming IOMMU identity is available to the TTM-aware drivers),
but unless fragmentation prevention the way we use it for graphics drivers
(allocate - submit - fence) ends up in the IOMMU space management code, it's
impossible to make this scheme system-wide.
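
Very roughly, I mean something like this (invented structure, one shared
eviction list per IOMMU domain):

#include <linux/iommu.h>
#include <linux/list.h>
#include <linux/mutex.h>

/* Hypothetical: one LRU of DMA-bound bos per IOMMU domain, shared by all
 * TTM-aware drivers behind that IOMMU, so that exhaustion of the shared
 * IOMMU space can evict across drivers. */
struct my_iommu_lru {
        struct iommu_domain *domain;  /* identity of the shared IOMMU */
        struct mutex         lock;
        struct list_head     dma_lru; /* bos in least-recently-used order */
};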

> Cheers, Daniel
>

Thanks,

/Thomas


[RFC] Future TTM DMA direction

2012-01-09 Thread Daniel Vetter
On Mon, Jan 09, 2012 at 10:37:28AM +0100, Thomas Hellstrom wrote:
> Hi!
> 
> When TTM was originally written, it was assumed that GPU apertures
> could address pages directly, and that the CPU could access those
> pages without explicit synchronization. The process of binding a
> page to a GPU translation table was a simple one-step operation, and
> we needed to worry about fragmentation in the GPU aperture only.
> 
> Now that we "sort of" support DMA memory there are three things I
> think are missing:
> 
> 1) We can't gracefully handle coherent DMA OOMs or coherent DMA
> (Including CMA) memory fragmentation leading to failed allocations.
> 2) We can't handle dynamic mapping of pages into and out of dma, and
> corresponding IOMMU space shortage or fragmentation, and CPU
> synchronization.
> 3) We have no straightforward way of moving pages between devices.
> 
> I think a reasonable way to support this is to make binding to a
> non-fixed (system page based) TTM memory type a two-step binding
> process, so that a TTM placement consists of (DMA_TYPE, MEMORY_TYPE)
> instead of only (MEMORY_TYPE).
> 
> In step 1) the bo is bound to a specific DMA type. These could be
> for example:
> (NONE, DYNAMIC, COHERENT, CMA),  device dependent types could be
> allowed as well.
> In this step, we perform dma_sync_for_device, or allocate
> dma-specific pages maintaining LRU lists so that if we receive a DMA
> memory allocation OOM, we can unbind bo:s bound to the same DMA
> type. Standard graphics cards would then, for example, use the NONE
> DMA type when run on bare metal or COHERENT when run on Xen. A
> "COHERENT" OOM condition would then lead to eviction of another bo.
> (Note that DMA eviction might involve data copies and be costly, but
> still better than failing).
> Binding with the DYNAMIC memory type would mean that CPU accesses
> are disallowed, and that user-space CPU page mappings might need to
> be killed, with a corresponding sync_for_cpu if they are faulted in
> again (perhaps on a page-by-page basis). Any attempt to bo_kmap() a
> bo page bound to DYNAMIC DMA mapping should trigger a BUG.
> 
> In step 2) The bo is bound to the GPU in the same way it's done
> today. Evicting from DMA will of course also trigger an evict from
> GPU, but an evict from GPU will not trigger a DMA evict.
> 
> Making a bo "anonymous" and thus moveable between devices would then
> mean binding it to the "NONE" DMA type.
> 
> Comments, suggestions?

Well I think we need to solve outstanding issues in the dma_buf framework
first. Currently dma_buf isn't really up to par to handle coherency
between the cpu and devices and there's also not yet any way to handle dma
address space fragmentation/exhaustion.

I fear that if you jump ahead with improving the ttm support alone we
might end up with something incompatible to the stuff dma_buf eventually
will grow, resulting in decent amounts of wasted efforts.

Cc'ed a bunch of relevant lists to foster input from people.

For a starter you seem to want much more low-level integration with the
dma api than existing users commonly need. E.g. if I understand things
correctly drivers just call dma_alloc_coherent and the platform/board code
then decides whether the device needs a contiguous allocation from cma or
whether something else is good, too (e.g. vmalloc for the cpu + iommu).
Another thing is that I think doing lru eviction in case of dma address
space exhaustion (or fragmentation) needs at least awareness of what's
going on in the upper layers. iommus are commonly shared between devices
and I presume that two ttm drivers sitting behind the same iommu and
fighting over its resources can lead to some hilarious outcomes.
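
For reference, the dma_alloc_coherent pattern referred to above is roughly
the following (illustrative sketch, error handling trimmed); everything
below dma_alloc_coherent - CMA, an IOMMU, plain pages - is the platform's
business:

#include <linux/dma-mapping.h>
#include <linux/gfp.h>

/* Ask for a coherent buffer; the platform/board code decides how to back
 * it (CMA, IOMMU remapping, plain pages, ...). */
static void *my_alloc_coherent(struct device *dev, size_t size,
                               dma_addr_t *dma)
{
        return dma_alloc_coherent(dev, size, dma, GFP_KERNEL);
}

static void my_free_coherent(struct device *dev, size_t size,
                             void *cpu_addr, dma_addr_t dma)
{
        dma_free_coherent(dev, size, cpu_addr, dma);
}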

Cheers, Daniel
-- 
Daniel Vetter
Mail: daniel at ffwll.ch
Mobile: +41 (0)79 365 57 48


[RFC] Future TTM DMA direction

2012-01-09 Thread Thomas Hellstrom
Hi!

When TTM was originally written, it was assumed that GPU apertures could 
address pages directly, and that the CPU could access those pages 
without explicit synchronization. The process of binding a page to a GPU 
translation table was a simple one-step operation, and we needed to 
worry about fragmentation in the GPU aperture only.

Now that we "sort of" support DMA memory there are three things I think 
are missing:

1) We can't gracefully handle coherent DMA OOMs or coherent DMA 
(Including CMA) memory fragmentation leading to failed allocations.
2) We can't handle dynamic mapping of pages into and out of dma, and 
corresponding IOMMU space shortage or fragmentation, and CPU 
synchronization.
3) We have no straightforward way of moving pages between devices.

I think a reasonable way to support this is to make binding to a 
non-fixed (system page based) TTM memory type a two-step binding 
process, so that a TTM placement consists of (DMA_TYPE, MEMORY_TYPE) 
instead of only (MEMORY_TYPE).
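
Purely as illustration, such a placement pair could look something like this
(names invented, not actual TTM code):

#include <linux/types.h>

/* Hypothetical two-level placement: the DMA binding is selected first,
 * the GPU memory type second, as described above. */
enum my_ttm_dma_type {
        MY_TTM_DMA_NONE,      /* plain system pages, no DMA binding */
        MY_TTM_DMA_DYNAMIC,   /* streaming mappings, CPU access disallowed */
        MY_TTM_DMA_COHERENT,  /* coherent (dma_alloc_coherent-style) pages */
        MY_TTM_DMA_CMA,       /* physically contiguous pages */
};

struct my_ttm_placement {
        enum my_ttm_dma_type dma_type;  /* step 1: DMA binding */
        u32                  mem_type;  /* step 2: memory type (TTM_PL_*), as today */
};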

In step 1) the bo is bound to a specific DMA type. These could be for 
example:
(NONE, DYNAMIC, COHERENT, CMA); device-dependent types could be
allowed as well.
In this step, we perform dma_sync_for_device, or allocate dma-specific 
pages maintaining LRU lists so that if we receive a DMA memory 
allocation OOM, we can unbind bo:s bound to the same DMA type. Standard 
graphics cards would then, for example, use the NONE DMA type when run 
on bare metal or COHERENT when run on Xen. A "COHERENT" OOM condition 
would then lead to eviction of another bo. (Note that DMA eviction might 
involve data copies and be costly, but still better than failing).
Binding with the DYNAMIC memory type would mean that CPU accesses are 
disallowed, and that user-space CPU page mappings might need to be 
killed, with a corresponding sync_for_cpu if they are faulted in again 
(perhaps on a page-by-page basis). Any attempt to bo_kmap() a bo page 
bound to DYNAMIC DMA mapping should trigger a BUG.
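
A minimal sketch of the step 1) allocate-or-evict loop (every function and
type here is invented; it only illustrates the LRU idea above):

/* Hypothetical: bind a bo to a DMA type; on OOM, evict the least recently
 * used bo bound to the same DMA type and retry. */
static int my_dma_bind(struct my_dma_pool *pool, struct my_bo *bo)
{
        int ret;

        for (;;) {
                ret = my_dma_alloc_pages(pool, bo);   /* or sync_for_device */
                if (ret != -ENOMEM)
                        return ret;

                /* Coherent/CMA/IOMMU space exhausted: unbind another bo of
                 * the same DMA type, in LRU order, then try again. */
                if (!my_dma_evict_lru_bo(pool))
                        return -ENOMEM;               /* nothing to evict */
        }
}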

In step 2) The bo is bound to the GPU in the same way it's done today. 
Evicting from DMA will of course also trigger an evict from GPU, but an 
evict from GPU will not trigger a DMA evict.

Making a bo "anonymous" and thus moveable between devices would then 
mean binding it to the "NONE" DMA type.

Comments, suggestions?

/Thomas








