Jerome Glisse wrote:
> On Thu, Jan 21, 2010 at 01:59:26PM +0100, Thomas Hellstrom wrote:
>
>> Jerome Glisse wrote:
>>
>>> On Thu, Jan 21, 2010 at 04:49:39AM +0100, Luca Barbieri wrote:
>>>
> We had to do a similar thing in the
> Poulsbo driver and it turned out that we could save a significant amount of
> CPU by using a delayed workqueue, collecting objects and destroying them
> periodically.
On Thu, Jan 21, 2010 at 04:14:39PM +0100, Luca Barbieri wrote:
> I'm not sure I understand your proposal correctly.
> It seems your proposal is similar to mine, replacing the term "fence
> nodes" with "ttm transactions", but I'm not sure if I understand it
> correctly.
>
> Here is some pseudocode
On Thu, Jan 21, 2010 at 01:59:26PM +0100, Thomas Hellstrom wrote:
> Jerome Glisse wrote:
> >On Thu, Jan 21, 2010 at 04:49:39AM +0100, Luca Barbieri wrote:
> >>>We had to do a similar thing in the
> >>>Poulsbo driver and it turned out that we could save a significant amount of
> >>>CPU by using a delayed workqueue, collecting objects and destroying them
> >>>periodically.
Luca Barbieri writes:
>> The card also keeps some internal FIFO caches, so it seems to me that
>> getting that safe from races and efficient at the same time could be a bit
>> non-trivial.
>
> Are they flushable?
> It seems this *might* do the job:
> if (!pfifo->cache_flush(dev))
> return;
> pfifo->reassign(dev, false);
> The card also keeps some internal FIFO caches, so it seems to me that
> getting that safe from races and efficient at the same time could be a bit
> non-trivial.
Are they flushable?
It seems this *might* do the job:
if (!pfifo->cache_flush(dev))
	return;
pfifo->reassign(dev, false);
pfifo->cache_flush(dev);
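For orientation, a hedged sketch of a full quiesce/restore sequence built around the same pfifo hooks; the cache_pull call and the re-enable steps are assumptions for illustration and have not been checked against the nouveau tree of that time:

if (!pfifo->cache_flush(dev))
	return;				/* caches could not be flushed, bail out */
pfifo->reassign(dev, false);		/* stop PFIFO context switching */
pfifo->cache_flush(dev);		/* flush again with switching disabled */
pfifo->cache_pull(dev, false);		/* assumed hook: pause the cache puller */

/* ... do whatever required the FIFO caches to be quiet ... */

pfifo->cache_pull(dev, true);		/* assumed hook: resume the puller */
pfifo->reassign(dev, true);		/* re-enable context switching */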
Luca Barbieri writes:
>>> If not, it could possibly be hacked around by reading from a DMA
>>> object at the address of the fence sequence number and then resizing
>>> the DMA object so that addresses from a certain point on would trigger
>>> a protection fault interrupt.
>>
>> I don't think you can safely modify a DMA object
On Thu, Jan 21, 2010 at 4:56 PM, Luca Barbieri wrote:
>> Doing it without software methods means you need to have a semaphore
>> that "exists" in the cpu domain and therefore cannot be used again
>> until the cpu is sure it's done. So that will probably require a
>> rotating queue of several semaphores and a fallback to cpu waiting.
> Doing it without software methods means you need to have a semaphore
> that "exists" in the cpu domain and therefore cannot be used again
> until the cpu is sure it's done. So that will probably require a
> rotating queue of several semaphores and a fallback to cpu waiting.
Why would it need a s
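To make the "rotating queue of several semaphores" idea concrete, here is a minimal sketch; struct gpu_sem, sem_is_signalled() and cpu_wait_for_sem() are made-up names for illustration and do not correspond to any existing nouveau or TTM API:

#include <linux/types.h>

#define NUM_SEMS 8

struct gpu_sem {
	u32  offset;		/* location of the semaphore word in a GPU-visible buffer */
	bool in_flight;		/* the GPU may still access it */
};

static struct gpu_sem sem_pool[NUM_SEMS];
static unsigned int sem_next;

bool sem_is_signalled(struct gpu_sem *sem);	/* assumed: has the GPU released it? */
void cpu_wait_for_sem(struct gpu_sem *sem);	/* assumed: CPU-side fallback wait */

/* Hand out semaphores round-robin; if the next one is still owned by the GPU,
 * fall back to waiting on the CPU before reusing it. */
static struct gpu_sem *sem_acquire(void)
{
	struct gpu_sem *sem = &sem_pool[sem_next];

	if (sem->in_flight && !sem_is_signalled(sem))
		cpu_wait_for_sem(sem);

	sem->in_flight = true;
	sem_next = (sem_next + 1) % NUM_SEMS;
	return sem;
}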
On Thu, Jan 21, 2010 at 3:44 PM, Francisco Jerez wrote:
> Luca Barbieri writes:
>
>>> Nvidia cards have a synchronization primitive that could be used to
>>> synchronize several FIFOs in hardware (AKA semaphores, see [1] for an
>>> example).
>>
>> Does this operate wholly on the GPU on all nVidia cards?
>> If not, it could possibly be hacked around by reading from a DMA
>> object at the address of the fence sequence number and then resizing
>> the DMA object so that addresses from a certain point on would trigger
>> a protection fault interrupt.
>
> I don't think you can safely modify a DMA object
I'm not sure I understand your proposal correctly.
It seems your proposal is similar to mine, replacing the term "fence
nodes" with "ttm transactions", but I'm not sure if I understand it
correctly.
Here is some pseudocode for an improved, simplified version of my proposal.
It is modified so that t
Luca Barbieri writes:
>> Nvidia cards have a synchronization primitive that could be used to
>> synchronize several FIFOs in hardware (AKA semaphores, see [1] for an
>> example).
>
> Does this operate wholly on the GPU on all nVidia cards?
>
> It seems that at least on some GPUs this will trigger
Luca Barbieri wrote:
>> At a first glance:
>>
>> 1) We probably *will* need a delayed-destroy workqueue to avoid wasting
>> memory that otherwise should be freed to the system. At the very least, the
>> delayed delete process should optionally be run by a system shrinker.
>>
> You are right.
> Nvidia cards have a synchronization primitive that could be used to
> synchronize several FIFOs in hardware (AKA semaphores, see [1] for an
> example).
Does this operate wholly on the GPU on all nVidia cards?
It seems that at least on some GPUs this will trigger "software
methods" that are basi
Luca Barbieri writes:
>> At a first glance:
>>
>> 1) We probably *will* need a delayed-destroy workqueue to avoid wasting
>> memory that otherwise should be freed to the system. At the very least, the
>> delayed delete process should optionally be run by a system shrinker.
> You are right. For VRAM we don't care since we are
> At a first glance:
>
> 1) We probably *will* need a delayed-destroy workqueue to avoid wasting
> memory that otherwise should be freed to the system. At the very least, the
> delayed delete process should optionally be run by a system shrinker.
You are right. For VRAM we don't care since we are
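A rough sketch of the "run the delayed delete process from a system shrinker" suggestion, using the shrinker interface as it looked around 2.6.33; ttm_drain_delayed_delete() and ttm_delayed_delete_count() are assumed helpers, not existing TTM functions:

#include <linux/mm.h>

static int ttm_dd_shrink(int nr_to_scan, gfp_t gfp_mask)
{
	if (nr_to_scan)
		ttm_drain_delayed_delete(nr_to_scan);	/* assumed: reap up to nr_to_scan bos */
	return ttm_delayed_delete_count();		/* assumed: objects still queued */
}

static struct shrinker ttm_dd_shrinker = {
	.shrink = ttm_dd_shrink,
	.seeks  = DEFAULT_SEEKS,
};

/* register_shrinker(&ttm_dd_shrinker) at init time,
 * unregister_shrinker(&ttm_dd_shrinker) at teardown. */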
Jerome Glisse wrote:
> On Thu, Jan 21, 2010 at 04:49:39AM +0100, Luca Barbieri wrote:
>
>>> We had to do a similar thing in the
>>> Poulsbo driver and it turned out that we could save a significant amount of
>>> CPU by using a delayed workqueue, collecting objects and destroying them
>>> periodically.
Luca Barbieri wrote:
>> We had to do a similar thing in the
>> Poulsbo driver and it turned out that we could save a significant amount of
>> CPU by using a delayed workqueue, collecting objects and destroying them
>> periodically.
>>
> Yes, indeed, we don't really care about a fence expiring
On Thu, Jan 21, 2010 at 04:49:39AM +0100, Luca Barbieri wrote:
> > We had to do a similar thing in the
> > Poulsbo driver and it turned out that we could save a significant amount of
> > CPU by using a delayed workqueue, collecting objects and destroying them
> > periodically.
> Yes, indeed, we don't really care about a fence expiring
> We had to do a similar thing in the
> Poulsbo driver and it turned out that we could save a significant amount of
> CPU by using a delayed workqueue, collecting objects and destroying them
> periodically.
Yes, indeed, we don't really care about a fence expiring unless we
want to use that buffer o
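For reference, a minimal sketch of the delayed-workqueue scheme described above, collecting objects and reaping them in batches; every name here (struct dead_bo, reap_list, dead_bo_destroy, ...) is illustrative rather than taken from the Poulsbo or TTM code:

#include <linux/workqueue.h>
#include <linux/list.h>
#include <linux/spinlock.h>

struct dead_bo {
	struct list_head link;
	/* ... whatever is needed to finish tearing the object down ... */
};

static LIST_HEAD(reap_list);
static DEFINE_SPINLOCK(reap_lock);
static struct delayed_work reap_work;	/* INIT_DELAYED_WORK(&reap_work, reap_fn) at init */

static void dead_bo_destroy(struct dead_bo *bo);	/* assumed per-object destructor */

static void reap_fn(struct work_struct *work)
{
	struct dead_bo *bo, *tmp;
	LIST_HEAD(batch);

	/* Grab everything queued so far in one go, destroy outside the lock. */
	spin_lock(&reap_lock);
	list_splice_init(&reap_list, &batch);
	spin_unlock(&reap_lock);

	list_for_each_entry_safe(bo, tmp, &batch, link)
		dead_bo_destroy(bo);

	/* Re-arm only if more objects were queued while we were reaping. */
	spin_lock(&reap_lock);
	if (!list_empty(&reap_list))
		schedule_delayed_work(&reap_work, HZ / 10);
	spin_unlock(&reap_lock);
}

Producers then just add their object to reap_list under reap_lock and call schedule_delayed_work(), so an individual fence expiring never pays the destruction cost directly.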
Luca Barbieri wrote:
> When designing this, we should also keep in mind that some drivers
> (e.g. nouveau) have multiple FIFO channels, and thus we would like a
> buffer to be referenced for reading by multiple channels at once (and
> be destroyed only when all fences are expired, obviously).
> Also, hardware may support on-GPU
Luca Barbieri wrote:
>> Also note that the delayed delete list is not in fence order but in
>> deletion-time order, which perhaps gives room for more optimizations.
>>
> You are right.
> I think then that ttm_bo_delayed_delete may still need to be changed,
> because it stops when ttm_bo_cleanup_refs returns -EBUSY, which happens
When designing this, we should also keep in mind that some drivers
(e.g. nouveau) have multiple FIFO channels, and thus we would like a
buffer to be referenced for reading by multiple channels at once (and
be destroyed only when all fences are expired, obviously).
Also, hardware may support on-GPU
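One way to express "referenced for reading by multiple channels at once" is a per-buffer list of read fences, one per channel, with destruction allowed only once all of them have expired. The structure and the fence_expired() check below are assumptions for illustration, not existing TTM or nouveau types:

#include <linux/list.h>
#include <linux/types.h>

/* One read reference taken by one FIFO channel on a buffer. */
struct bo_read_fence {
	struct list_head link;		/* linked into the buffer's read_fences list */
	void *fence;			/* driver-specific fence emitted on that channel */
};

bool fence_expired(void *fence);	/* assumed driver-provided check */

/* The buffer may only be destroyed once every channel's read fence expired. */
static bool bo_reads_done(struct list_head *read_fences)
{
	struct bo_read_fence *rf;

	list_for_each_entry(rf, read_fences, link)
		if (!fence_expired(rf->fence))
			return false;
	return true;
}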
> Also note that the delayed delete list is not in fence order but in
> deletion-time order, which perhaps gives room for more optimizations.
You are right.
I think then that ttm_bo_delayed_delete may still need to be changed,
because it stops when ttm_bo_cleanup_refs returns -EBUSY, which
happens
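In very simplified form, the change being suggested amounts to something like the following fragment (locking and reference handling elided; this is not the real ttm_bo_delayed_delete()):

struct ttm_buffer_object *bo, *tmp;

/* Walk the whole delayed-destroy list instead of stopping at the first busy
 * entry: the list is in deletion-time order, not fence order, so later
 * entries may already be ready to clean up. */
list_for_each_entry_safe(bo, tmp, &bdev->ddestroy, ddestroy) {
	if (ttm_bo_cleanup_refs(bo, remove_all) == -EBUSY)
		continue;	/* this fence has not expired yet, try the next */
}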
Luca Barbieri wrote:
> Yes it's fine. I sent your patch to Dave with an expanded commit
> comment for merging.
>
> Here is a possible redesign of the mechanism inspired by this issue.
> It seems that what we are racing against is buffer eviction, due to
> delayed deletion buffers being still kept on the LRU list.
Yes it's fine. I sent your patch to Dave with an expanded commit
comment for merging.
Here is a possible redesign of the mechanism inspired by this issue.
It seems that what we are racing against is buffer eviction, due to
delayed deletion buffers being still kept on the LRU list.
I'm wondering i
Resending this with Thomas Hellstrom's signoff for merging into 2.6.33
ttm_bo_delayed_delete has a race condition, because after we do:
kref_put(&nentry->list_kref, ttm_bo_release_list);
we are not holding the list lock and not holding any reference to
objects, and thus every bo in the list can be removed and freed at this point.
Thomas Hellstrom wrote:
Yes, it looks correct. Although it seems a little unintuitive to enter
the loop with the spinlock held, and exit it with the spinlock not held.
I've attached yet another patch to have that fixed. Could you take a
look at whether it seems OK with you and, in that case,
Thomas Hellstrom wrote:
Yes, it looks correct. Although it seems a little unintuitive to enter
the loop with the spinlock held, and exit it with the spinlock not held.
I've attached yet another patch to have that fixed. Could you take a
look at whether it seems OK with you and, in that case,
Luca Barbieri wrote:
>> Would nentry=list_first_entry(&entry->ddestroy, ) work?
>>
> Yes, it seems a bit less intuitive, but if that's the accepted
> practice, let's do that instead.
>
>
>> Here nentry may have been removed from the list by another process, which
>> would trigger the unnecessary call, mentioned above.
Luca,
Good catch.
Some comments inline:
Luca Barbieri wrote:
> +	entry = list_first_entry(&bdev->ddestroy,
> +				 struct ttm_buffer_object, ddestroy);
> +	kref_get(&entry->list_kref);
>
> -	if (next != &bdev->ddestroy) {
> -		nentry = list_entry
> Would nentry=list_first_entry(&entry->ddestroy, ) work?
Yes, it seems a bit less intuitive, but if that's the accepted
practice, let's do that instead.
> Here nentry may have been removed from the list by another process, which
> would trigger the unnecessary call, mentioned above.
You are
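For clarity, what the list_first_entry() suggestion expands to: list_first_entry(ptr, type, member) is simply list_entry((ptr)->next, type, member), so using the current entry's own list head as the anchor yields the element that follows it:

/* Next delayed-destroy entry after 'entry' (assuming it is still linked). */
nentry = list_first_entry(&entry->ddestroy,
			  struct ttm_buffer_object, ddestroy);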
ttm_bo_delayed_delete has a race condition, because after we do:
kref_put(&nentry->list_kref, ttm_bo_release_list);
we are not holding the list lock and not holding any reference to
objects, and thus every bo in the list can be removed and freed at
this point.
However, we then use the next pointer
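A simplified sketch of the pattern the fix is about (not the actual patch): pin each entry with a kref while the list lock is still held, so that dropping the lock, and later the reference, can never leave us following freed memory:

struct ttm_bo_global *glob = bdev->glob;
struct ttm_buffer_object *entry;
int ret;

spin_lock(&glob->lru_lock);
while (!list_empty(&bdev->ddestroy)) {
	entry = list_first_entry(&bdev->ddestroy,
				 struct ttm_buffer_object, ddestroy);
	kref_get(&entry->list_kref);	/* entry cannot be freed behind our back now */
	spin_unlock(&glob->lru_lock);

	ret = ttm_bo_cleanup_refs(entry, remove_all);
	kref_put(&entry->list_kref, ttm_bo_release_list);
	if (ret)
		return ret;		/* e.g. -EBUSY: fence not expired yet */

	spin_lock(&glob->lru_lock);
}
spin_unlock(&glob->lru_lock);
return 0;

(The real function walks to the next entry rather than restarting from the list head each time, but the reference handling is the point being illustrated.)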