Re: [PATCH v4] drm/ttm: Clarify that the TTM_PL_SYSTEM is under TTMs control

Thomas Hellström Tue, 16 Nov 2021 00:34:04 -0800


On 11/16/21 09:20, Christian König wrote:

On 16.11.21 at 08:43, Thomas Hellström wrote:
On 11/16/21 08:19, Christian König wrote:
On 13.11.21 at 12:26, Thomas Hellström wrote:
Hi, Zack,

On 11/11/21 17:44, Zack Rusin wrote:
On Wed, 2021-11-10 at 09:50 -0500, Zack Rusin wrote:
TTM takes full control over TTM_PL_SYSTEM placed buffers. This makes
driver internal usage of TTM_PL_SYSTEM prone to errors because it
requires the drivers to manually handle all interactions between the
driver and TTM, which can swap out those buffers whenever it thinks
it's the right thing to do.

CPU buffers which need to be fenced and shared with accelerators should
be placed in driver specific placements that can explicitly handle
CPU/accelerator buffer fencing.
Currently, apart from things silently failing, nothing is enforcing
that requirement which means that it's easy for drivers and new
developers to get this wrong. To avoid the confusion we can document
this requirement and clarify the solution.

This came up during a discussion on dri-devel:
https://lore.kernel.org/dri-devel/232f45e9-8748-1243-09bf-56763e6668b3@amd.com
I took a slightly deeper look into this. I think we need to formalize this a bit more to understand pros and cons and what the restrictions are really all about. Anybody looking at the previous discussion will mostly see arguments similar to "this is stupid and difficult" and "it has always been this way" which are not really constructive.
First disregarding all accounting stuff, I think this all boils down to TTM_PL_SYSTEM having three distinct states:
1) POPULATED
2) LIMBO (Or whatever we want to call it. No pages present)
3) SWAPPED.
The ttm_bo_move_memcpy() helper understands these, and any standalone driver implementation of the move() callback _currently_ needs to understand these as well, unless using the ttm_bo_move_memcpy() helper.
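
A minimal sketch of the three states listed above and of the decision a standalone move() out of SYSTEM would have to make for each. All names here (sys_state, move_action, plan_move_from_system) are hypothetical and exist only for illustration; they are not part of TTM, and the mapping is a simplified reading of the argument in this thread, not of the ttm_bo_move_memcpy() source.

```c
#include <assert.h>

/* Hypothetical model of the three SYSTEM states; not TTM definitions. */
enum sys_state {
    SYS_POPULATED, /* pages are allocated and hold the contents */
    SYS_LIMBO,     /* no pages present, nothing to preserve */
    SYS_SWAPPED    /* contents currently live in swap storage */
};

/* Hypothetical actions a move implementation could choose between. */
enum move_action {
    MOVE_COPY,             /* copy the existing pages to the destination */
    MOVE_CLEAR,            /* nothing to copy, just clear the destination */
    MOVE_SWAPIN_THEN_COPY  /* bring the contents back first, then copy */
};

/* Decide how to move a BO out of SYSTEM based on its state. */
enum move_action plan_move_from_system(enum sys_state state)
{
    switch (state) {
    case SYS_POPULATED:
        return MOVE_COPY;
    case SYS_LIMBO:
        return MOVE_CLEAR;
    case SYS_SWAPPED:
        return MOVE_SWAPIN_THEN_COPY;
    }
    assert(0 && "unknown SYSTEM state");
    return MOVE_COPY;
}
```

The point of the sketch is only that the state is an input to the decision: a move() that does not know the state cannot pick between the three branches.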
Now using a bounce domain to proxy SYSTEM means that the driver can forget about the SWAPPED state; it's automatically handled by the move setup code. However, another pitfall is LIMBO, in that when you move from SYSTEM/LIMBO to your bounce domain, the BO will be populated. So any naive accelerated move() implementation creating a 1GB BO in fixed memory, like VRAM, will needlessly allocate and free 1GB of system memory in the process instead of just performing a clear operation. Looks like amdgpu suffers from this?
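
The cost difference described above can be illustrated with a small model. This is only a sketch under stated assumptions: move_cost, naive_move_to_vram and aware_move_to_vram are hypothetical names, and the model only counts temporary system memory; it does not call or mirror any TTM or amdgpu code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one move into fixed memory such as VRAM. */
struct move_cost {
    size_t sys_bytes_allocated; /* temporary system memory allocated */
    bool dst_cleared;           /* destination filled by a clear, not a copy */
};

/* Naive path: going through the bounce domain populates the BO first. */
struct move_cost naive_move_to_vram(size_t bo_size, bool has_pages)
{
    struct move_cost cost = { 0, false };

    assert(bo_size > 0);
    if (!has_pages)
        cost.sys_bytes_allocated = bo_size; /* allocated only to be freed */
    return cost;
}

/* State-aware path: a BO without pages has nothing to copy. */
struct move_cost aware_move_to_vram(size_t bo_size, bool has_pages)
{
    struct move_cost cost = { 0, false };

    assert(bo_size > 0);
    if (!has_pages)
        cost.dst_cleared = true; /* clear the destination instead */
    return cost;
}
```

For a 1GB BO with no pages, the naive model reports 1GB of temporary system memory while the state-aware model reports none and marks the destination as cleared.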
I think what is really needed is either
a) A TTM helper that helps move callback implementations resolve the issues populating system from LIMBO or SWAP, and then also formalize driver notification for swapping. At a minimum, I think the swap_notify() callback needs to be able to return a late error.
b) Make LIMBO and SWAPPED distinct memory regions. (I think I'd vote for this without looking into it in detail).
In both these cases, we should really make SYSTEM bindable by GPU, otherwise we'd just be trading one pitfall for another related one without really resolving the root problem.
As for fencing not being supported by SYSTEM, I'm not sure why we don't want this, because it would for example prohibit async ttm_move_memcpy(), and also, async unbinding of ttm_tt memory like MOB on vmwgfx. (I think it's still sync).
There might be an accounting issue related to this as well, but I guess Christian would need to chime in on this. If so, I think it needs to be well understood and documented (in TTM, not in AMD drivers).
I think the problem goes deeper than what has been mentioned here so far.
Having fences attached to BOs in the system domain is probably ok, but the key point is that the BOs in the system domain are under TTM's control and should not be touched by the driver.
What we have now is that TTM's internals like the allocation state of BOs in system memory (the populated, limbo, swapped you mentioned above) are leaking into the drivers and I think exactly that is the part which doesn't work reliably here. You can of course get that working, but that requires knowledge of the internal state which in my eyes was always illegal.
Well, I tend to agree to some extent, but then, as said above, even disregarding swap will cause trouble with the limbo state, because the driver's move callback would need knowledge of that to implement moves limbo -> vram efficiently.
Well my long term plan is to audit the code base once more and remove the limbo state from the SYSTEM domain.
E.g. instead of a SYSTEM BO without pages you allocate a BO without a resource in general which is now possible since bo->resource is a pointer.
This would still allow us to allocate "empty shell" BOs. But a validation of those BOs doesn't cause a move, but rather just allocates the resource for the first time.
The problem so far was just that we access bo->resource way too often without checking it.

So the driver would then at least need to be aware of these empty shell BOs without resource for their move callbacks? (Again thinking of the move from empty shell -> VRAM).
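
A minimal sketch of the empty-shell idea, assuming a BO whose resource pointer may be NULL. The types sketch_resource and sketch_bo and the function plan_validate are hypothetical stand-ins for illustration; they are not the real struct ttm_resource, struct ttm_buffer_object, or any TTM validation entry point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real TTM structures. */
struct sketch_resource {
    int mem_type; /* placement the resource currently lives in */
};

struct sketch_bo {
    struct sketch_resource *resource; /* NULL for an empty shell BO */
};

enum validate_action {
    VALIDATE_FIRST_ALLOC, /* no resource yet: allocate, nothing to move */
    VALIDATE_MOVE,        /* resource exists elsewhere: a move is needed */
    VALIDATE_NOTHING      /* already in the wanted placement */
};

/* Check bo->resource before dereferencing it, then pick an action. */
enum validate_action plan_validate(const struct sketch_bo *bo,
                                   int wanted_mem_type)
{
    assert(bo != NULL);
    if (bo->resource == NULL)
        return VALIDATE_FIRST_ALLOC;
    if (bo->resource->mem_type != wanted_mem_type)
        return VALIDATE_MOVE;
    return VALIDATE_NOTHING;
}
```

In this model the empty shell never reaches a move path at all, so only code that dereferences the resource pointer has to handle the NULL case.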


Thanks,

/Thomas
