On Thu, Oct 30, 2014 at 04:07:05PM +0100, Thomas Hellstrom wrote:
> On 10/28/2014 12:10 PM, Daniel Vetter wrote:
> > On Tue, Oct 28, 2014 at 06:35:04PM +0900, Michel Dänzer wrote:
> >> From: Michel Dänzer <michel.daenzer at amd.com>
> >>
> >> DRM_MM_SEARCH_BEST gets the smallest hole which can fit the BO. That
> >> seems against the idea of TTM_PL_FLAG_TOPDOWN:
> >>
> >> * The smallest hole may be in the overall bottom of the area
> >> * If the hole isn't much larger than the BO, it doesn't make much
> >>   difference whether the BO is placed at the bottom or at the top of
> >>   the hole
> >>
> >> Signed-off-by: Michel Dänzer <michel.daenzer at amd.com>
> >
> > tbh I think SEARCH_BEST is pretty much always a bad idea - it rips
> > apart allocations from the same execbuf, and usually those get
> > recycled around the same time. Which means you'll just fragment your
> > mm even more if you try to find the best hole instead of just picking
> > one and then stuffing the entire execbuf into it. So imo we might as
> > well just kill it.
> >
> > Another one that I've advertised a bunch of times already is the scan
> > roster in drm_mm.c: currently ttm just evicts until there's a big
> > enough hole, which is fairly awful if you have quasi-segmented memory
> > like with top-down/bottom-up schemes and different ranges for
> > different units. With the roster you just walk the LRU and build up
> > potential holes until there's a suitable one, and then only evict
> > those buffers. Which means if you have a certain range of memory
> > under very high pressure (e.g. the 256M which UVD can use, or
> > whatever it is), then you won't thrash all the other VRAM too.
> >
> > Cheers, Daniel
>
> And I think I've commented each time why that is currently not
> possible, or at least not worthwhile, with TTM ;).
>
> Although the idea is great, even if you make a nice list of suitable
> buffers to evict, as soon as you drop the LRU spinlock, that list is
> invalid. Worse, while you start evicting your list, other processes
> may steal part of the hole you've created.
>
> At some point we should perhaps figure out how to protect each managed
> area with a separate mutex, but that would create trouble if, for
> example, we're evicting from VRAM to managed GART.
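[With the 2014-era drm_mm API, the change Michel is after amounts to
switching TTM's range manager away from DRM_MM_SEARCH_BEST when
TTM_PL_FLAG_TOPDOWN is set. A condensed sketch of the allocation path
in ttm_bo_man_get_node(), not the verbatim patch; the flag names are
the real drm_mm API, the function wrapper here is just for
illustration:

#include <drm/drm_mm.h>
#include <drm/ttm/ttm_placement.h>

/* Condensed sketch of the range manager allocation path.  @flags are
 * the TTM placement flags; @mm/@lock belong to the range manager. */
static int range_alloc(struct drm_mm *mm, spinlock_t *lock,
                       struct drm_mm_node *node, unsigned long num_pages,
                       unsigned page_alignment, unsigned long fpfn,
                       unsigned long lpfn, uint32_t flags)
{
        enum drm_mm_search_flags sflags = DRM_MM_SEARCH_BEST;
        enum drm_mm_allocator_flags aflags = DRM_MM_CREATE_DEFAULT;
        int ret;

        if (flags & TTM_PL_FLAG_TOPDOWN) {
                /* Walk holes from the top end of the range and place
                 * the node at the top of the hole it lands in, rather
                 * than hunting for the globally smallest fit. */
                sflags = DRM_MM_SEARCH_BELOW;
                aflags = DRM_MM_CREATE_TOP;
        }

        spin_lock(lock);
        ret = drm_mm_insert_node_in_range_generic(mm, node, num_pages,
                                                  page_alignment, 0,
                                                  fpfn, lpfn,
                                                  sflags, aflags);
        spin_unlock(lock);
        return ret;
}
]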
Yeah, you need to do the entire scanning/evicting under the LRU lock,
which means you need to jump over all the buffers you can't immediately
trylock. Which also means that you need a slowpath fallback that rips
out everything in case the buffer doesn't fit anywhere.

Wrt the LRU lock, you need to at least override the generic
placement/eviction code, since you need a bit more logic and
state-keeping for drm_mm than what's currently exposed in the
super-generic ttm managed area allocator (which apparently could also
manage an idr). But that shouldn't be more fuss than a new vfunc with
the current code as the default implementation.

For LRU lock recursion (due to VRAM->GTT eviction or similar) I guess
you either rework stuff to allow the entire eviction process to run
under the LRU lock, or you keep the drm_mm allocations on some separate
list until you've evicted everything (dropping the lock meanwhile),
then mass-free the entire list under the LRU lock and immediately do
your allocation.

I'm not sure whether the long hold times for the LRU lock would really
be a problem (they'll happen even if you drop it for the actual
eviction, since walking the LRU lists is costly when you jump over lots
of buffers that don't get you towards your goal), since evicting all
buffers is probably going to thrash the system much worse anyway. And
for the worst case, a regular reschedule on LRU lock contention, or
preemption, should prevent interactivity disasters for parallel cmd
submission.

So yeah, some details to take care of, but I don't think there's
anything insurmountable. It's just that I don't have the time (nor
justification) for it. And it will indeed be quite a bit of work to get
going.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
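[For illustration, the scan-roster walk described above could look
roughly like the sketch below, modeled on how i915_gem_evict_something()
used this API at the time. Only the drm_mm_* calls and the bo->lru /
bo->mem.mm_node fields are the real 2014-era API; ttm_bo_trylock(),
ttm_bo_unlock(), ttm_bo_evict_locked() and the scan_link member are
hypothetical stand-ins for whatever TTM integration would actually
grow:

#include <drm/drm_mm.h>
#include <drm/ttm/ttm_bo_api.h>
#include <linux/list.h>

/* Build up a hole with the drm_mm scan roster instead of evicting
 * blindly until something fits.  Called with the LRU lock held;
 * buffers that can't be trylocked are skipped. */
static int ttm_evict_with_roster(struct drm_mm *mm, struct list_head *lru,
                                 unsigned long npages, unsigned alignment)
{
        struct ttm_buffer_object *bo;
        LIST_HEAD(scan_list);
        LIST_HEAD(evict_list);
        bool found = false;

        drm_mm_init_scan(mm, npages, alignment, 0);

        /* Phase 1: feed LRU-ordered blocks into the roster until the
         * accumulated blocks would form a big enough hole. */
        list_for_each_entry(bo, lru, lru) {
                if (!ttm_bo_trylock(bo))
                        continue;
                list_add(&bo->scan_link, &scan_list);
                if (drm_mm_scan_add_block(bo->mem.mm_node)) {
                        found = true;
                        break;
                }
        }

        /* Phase 2: every scanned block must leave the roster again, in
         * reverse order, before any other drm_mm operation.  The blocks
         * for which drm_mm_scan_remove_block() returns true are exactly
         * the ones backing the hole. */
        while (!list_empty(&scan_list)) {
                bo = list_first_entry(&scan_list,
                                      struct ttm_buffer_object, scan_link);
                list_del(&bo->scan_link);
                if (drm_mm_scan_remove_block(bo->mem.mm_node) && found)
                        list_add(&bo->scan_link, &evict_list);
                else
                        ttm_bo_unlock(bo);
        }

        /* Phase 3: evict only the buffers backing the hole, leaving the
         * rest of the range untouched.  On -ENOSPC the caller falls
         * back to the rip-out-everything slowpath mentioned above. */
        while (!list_empty(&evict_list)) {
                bo = list_first_entry(&evict_list,
                                      struct ttm_buffer_object, scan_link);
                list_del(&bo->scan_link);
                ttm_bo_evict_locked(bo);
        }

        return found ? 0 : -ENOSPC;
}
]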