> At a first glance: > > 1) We probably *will* need a delayed destroyed workqueue to avoid wasting > memory that otherwise should be freed to the system. At the very least, the > delayed delete process should optionally be run by a system shrinker. You are right. For VRAM we don't care since we are the only user, while for system backed memory some delayed destruction will be needed. The logical extension of the scheme would be for the Linux page allocator/swapper to check for TTM buffers to destroy when it would otherwise shrink caches, try to swap and/or wait on swap to happen. Not sure whether there are existing hooks for this or where exactly to hook this code.
> 2) Fences in TTM are currently not necessarily strictly ordered, and > sequence numbers are hidden from the bo code. This means, for a given FIFO, > fence sequence 3 may expire before fence sequence 2, depending on the usage > of the buffer. My definition of "channel" (I sometimes used FIFO incorrectly as a synonym of that) is exactly a set of fences that are strictly ordered. If the card has multiple HW engines, each is considered a different channel (so that a channel becomes a (fifo, engine) pair). We may need however to add the concept of a "sync domain" that would be a set of channels that support on-GPU synchronization against each other. This would model hardware where channels with the same FIFO can be synchronized together but those with different FIFOs don't, and also multi-core GPUs where synchronization might be available only inside each core and not across cores. To sum it up, a GPU consists of a set of sync domains, each consisting of a set of channels, each consisting of a sequence of fences, with the following rules: 1. Fences within the same channel expire in order 2. If channels A and B belong to the same sync domain, it's possible to emit a fence on A that is guaranteed to expire after an arbitrary fence of B Whether channels have the same FIFO or not is essentially a driver implementation detail, and what TTM cares about is if they are in the same sync domain. [I just made up "sync domain" here: is there a standard term?] This assumes that the "synchronizability" graph is a disjoint union of complete graphs. Is there any example where it is not so? Also, does this actually model correctly Poulsbo, or am I wrong? Note that we could use CPU mediation more than we currently do. For instance now Nouveau, to do inter-channel synchronization, simply waits on the fence with the CPU immediately synchronously, while it could instead queue the commands in software, and with an interrupt/delayed mechanism submit them to hardware once the fence to be waited for is expired. ------------------------------------------------------------------------------ Throughout its 18-year history, RSA Conference consistently attracts the world's best and brightest in the field, creating opportunities for Conference attendees to learn about information security's most important issues through interactions with peers, luminaries and emerging and established companies. http://p.sf.net/sfu/rsaconf-dev2dev -- _______________________________________________ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel