> Also note that the delayed delete list is not in fence order but in
> deletion-time order, which perhaps gives room for more optimizations.

You are right. I think ttm_bo_delayed_delete may still need to be changed, because it stops as soon as ttm_bo_cleanup_refs returns -EBUSY, which happens when a fence has not been reached yet. This means that a buffer has to wait for all previously deleted buffers to become unused, even if it is unused itself. Is this acceptable?
What if we get rid of the delayed destroy list altogether, and instead append buffers to be deleted to their fence object, deleting them when the fence is signaled?

This also works more naturally, since the fence object can just keep a normal reference to the buffers it fences and unreference them on expiration. Then no special "delayed destruction" logic is needed, and it behaves as if the GPU itself were keeping a reference to the buffer, with fences acting as a proxy through which the CPU does that work for the GPU.

The delayed work then becomes not "periodically destroy buffers" but "periodically check whether fences have expired", naturally stopping at the first unexpired one. Drivers that support IRQs on fences could instead do the work in the interrupt handler/tasklet, avoiding the delay-in-jiffies magic number. This may need a NAPI-like interrupt mitigation middle layer for optimal results, though.

> But isn't an atomic cmpxchg about as costly as a spinlock?

I think it's cheaper on all architectures; otherwise it would be mostly pointless to have it, since you could emulate it with a spinlock.

_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau