Luca Barbieri <l...@luca-barbieri.com> writes:

>> At a first glance:
>>
>> 1) We probably *will* need a delayed destroyed workqueue to avoid wasting
>> memory that otherwise should be freed to the system. At the very least, the
>> delayed delete process should optionally be run by a system shrinker.
> You are right. For VRAM we don't care since we are the only user,
> while for system backed memory some delayed destruction will be
> needed.
> The logical extension of the scheme would be for the Linux page
> allocator/swapper to check for TTM buffers to destroy when it would
> otherwise shrink caches, try to swap and/or wait on swap to happen.
> Not sure whether there are existing hooks for this or where exactly to
> hook this code.
>
>> 2) Fences in TTM are currently not necessarily strictly ordered, and
>> sequence numbers are hidden from the bo code. This means, for a given FIFO,
>> fence sequence 3 may expire before fence sequence 2, depending on the usage
>> of the buffer.
>
> My definition of "channel" (I sometimes used FIFO incorrectly as a
> synonym of that) is exactly a set of fences that are strictly ordered.
> If the card has multiple HW engines, each is considered a different
> channel (so that a channel becomes a (fifo, engine) pair).
>
> We may need however to add the concept of a "sync domain" that would
> be a set of channels that support on-GPU synchronization against each
> other.
> This would model hardware where channels with the same FIFO can be
> synchronized together but those with different FIFOs don't, and also
> multi-core GPUs where synchronization might be available only inside
> each core and not across cores.
>
> To sum it up, a GPU consists of a set of sync domains, each consisting
> of a set of channels, each consisting of a sequence of fences, with
> the following rules:
> 1. Fences within the same channel expire in order
> 2. If channels A and B belong to the same sync domain, it's possible
> to emit a fence on A that is guaranteed to expire after an arbitrary
> fence of B
>
> Whether channels have the same FIFO or not is essentially a driver
> implementation detail, and what TTM cares about is if they are in the
> same sync domain.
>
> [I just made up "sync domain" here: is there a standard term?]
>
> This assumes that the "synchronizability" graph is a disjoint union of
> complete graphs. Is there any example where it is not so?
> Also, does this actually model correctly Poulsbo, or am I wrong?
>
> Note that we could use CPU mediation more than we currently do.
> For instance now Nouveau, to do inter-channel synchronization, simply
> waits on the fence with the CPU immediately synchronously, while it
> could instead queue the commands in software, and with an
> interrupt/delayed mechanism submit them to hardware once the fence to
> be waited for is expired.
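
To make the proposed model concrete, here is a minimal sketch in C of the sync domain / channel / fence hierarchy and the two rules quoted above. Every name in it (sync_domain, channel, fence, fence_emit, fence_expired, sync_method_for) is made up for illustration and is not a TTM or Nouveau interface. It assumes sequence numbers do not wrap, and it assumes that commands on one channel execute in order, so waiting on a fence of your own channel needs no extra work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types: a set of channels that can synchronize on the GPU. */
struct sync_domain {
    int id;
};

/* A channel is a strictly ordered sequence of fences (rule 1). */
struct channel {
    const struct sync_domain *domain;
    uint32_t last_emitted; /* sequence number of the newest fence */
    uint32_t last_expired; /* highest sequence number known to have expired */
};

struct fence {
    const struct channel *chan;
    uint32_t seq;
};

/* How a waiter channel can be ordered after a given fence. */
enum sync_method {
    SYNC_NONE, /* nothing to do: already expired, or same channel */
    SYNC_GPU,  /* rule 2: same sync domain, order on the hardware */
    SYNC_CPU   /* different domains: CPU wait or deferred submission */
};

/* Emit the next fence on a channel; sequence numbers start at 1. */
static struct fence fence_emit(struct channel *c)
{
    assert(c != NULL);
    struct fence f = { c, ++c->last_emitted };
    return f;
}

/*
 * Rule 1: fences expire in order within a channel, so a single
 * per-channel counter is enough to test expiry (wraparound ignored).
 */
static bool fence_expired(const struct fence *f)
{
    return f->seq <= f->chan->last_expired;
}

/* Pick the cheapest way to make 'waiter' run after fence 'f'. */
static enum sync_method sync_method_for(const struct channel *waiter,
                                        const struct fence *f)
{
    if (fence_expired(f) || f->chan == waiter)
        return SYNC_NONE;
    if (f->chan->domain == waiter->domain)
        return SYNC_GPU;
    return SYNC_CPU;
}
```

Comparing domain pointers is only valid because the model assumes the "synchronizability" graph is a disjoint union of complete graphs. If some hardware breaks that assumption, the check would have to become a per-pair lookup instead. The SYNC_CPU case is where the software queue with interrupt-driven submission described above would plug in.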
Nvidia cards have a synchronization primitive that could be used to
synchronize several FIFOs in hardware (AKA semaphores, see [1] for an
example).

> _______________________________________________
> Nouveau mailing list
> Nouveau@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/nouveau

[1] http://lists.freedesktop.org/archives/nouveau/2009-December/004514.html