nouveau: new VM_BIND uapi interfaces

Danilo Krummrich Fri, 27 Jan 2023 13:09:50 -0800

On 1/27/23 16:17, Christian König wrote:

On 27.01.23 at 15:44, Danilo Krummrich wrote:
[SNIP]
What you want is one component for tracking the VA allocations (drm_mm based) and a different component/interface for tracking the VA mappings (probably rb tree based).
That's what the GPUVA manager is doing. There are gpuva_regions which correspond to VA allocations and gpuvas which represent the mappings. Both are tracked separately (currently both with a separate drm_mm, though). However, the GPUVA manager needs to take regions into account when dealing with mappings to make sure the GPUVA manager doesn't propose that drivers merge across region boundaries. Speaking from userspace PoV, the kernel wouldn't merge mappings from different VKBuffer objects even if they're virtually and physically contiguous.
Those are two completely different things and shouldn't be handled in a single component.
They are different things, but they're related in a way that for handling the mappings (in particular merging and sparse) the GPUVA manager needs to know the VA allocation (or region) boundaries.
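To make the region-boundary rule above concrete, here is a minimal sketch, not taken from the patch series. The names `Region`, `Mapping`, `region_of` and `can_merge` are hypothetical, and the buffer object is reduced to an integer handle. It only illustrates the stated constraint: two mappings are merge candidates when they are virtually and physically contiguous and lie in the same region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A VA space allocation made by userspace (hypothetical model)."""
    start: int
    end: int  # exclusive


@dataclass(frozen=True)
class Mapping:
    """A memory backed mapping (hypothetical model)."""
    va: int
    size: int
    bo: int      # stand-in for a buffer object handle
    offset: int  # offset into the buffer object


def region_of(regions, va):
    """Return the region containing va, or None if there is none."""
    for region in regions:
        if region.start <= va < region.end:
            return region
    return None


def can_merge(regions, a, b):
    """True if b directly follows a and both live in the same region."""
    region_a = region_of(regions, a.va)
    region_b = region_of(regions, b.va)
    virt_contiguous = a.va + a.size == b.va
    phys_contiguous = a.bo == b.bo and a.offset + a.size == b.offset
    same_region = region_a is not None and region_a == region_b
    return virt_contiguous and phys_contiguous and same_region
```

With two adjacent regions, a pair of contiguous mappings inside one region would be reported as mergeable, while a contiguous pair straddling the boundary would not.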
I have the feeling there might be a misunderstanding. Userspace is in charge of actually allocating a portion of VA space and managing it. The GPUVA manager just needs to know about those VA space allocations and hence keeps track of them.
The GPUVA manager is not meant to be an allocator in the sense of finding and providing a hole for a given request.
Maybe the non-ideal choice of using drm_mm was implying something else.
Uff, well long story short that doesn't even remotely match the requirements. This way the GPUVA manager won't be usable for a whole bunch of use cases.
What we have are mappings which say X needs to point to Y with this and hw dependent flags.
The whole idea of having ranges is not going to fly. Neither with AMD GPUs and I strongly think not with Intel's XA either.

A range in the sense of the GPUVA manager simply represents a VA space allocation (which in case of Nouveau is taken in userspace). Userspace allocates the portion of VA space and lets the kernel know about it. The current implementation needs that for the named reasons. So, I think there is no reason why this would work with one GPU, but not with another. It's just part of the design choice of the manager.

And I'm absolutely happy to discuss the details of the manager implementation though.

We should probably talk about the design of the GPUVA manager once more when this should be applicable to all GPU drivers.
That's what I try to figure out with this RFC, how to make it applicable for all GPU drivers, so I'm happy to discuss this. :-)
Yeah, that was a really good idea :) That proposal here is really far away from the actual requirements.

And those are the ones I'm looking for. Do you mind sharing the requirements for amdgpu in particular?

For sparse residency the kernel also needs to know the region boundaries to make sure that it keeps sparse mappings around.
What?
When userspace creates a new VKBuffer with the VK_BUFFER_CREATE_SPARSE_BINDING_BIT the kernel may need to create sparse mappings in order to ensure that using this buffer without any memory backed mappings doesn't fault the GPU.
Currently, the implementation does this the following way:
1. Userspace creates a new VKBuffer and hence allocates a portion of the VA space for it. It calls into the kernel indicating the new VA space region and the fact that the region is sparse.
2. The kernel picks up the region and stores it in the GPUVA manager, the driver creates the corresponding sparse mappings / page table entries.
3. Userspace might ask the driver to create a couple of memory backed mappings for this particular VA region. The GPUVA manager stores the mapping parameters, the driver creates the corresponding page table entries.
4. Userspace might ask to unmap all the memory backed mappings from this particular VA region. The GPUVA manager removes the mapping parameters, the driver cleans up the corresponding page table entries. However, the driver also needs to re-create the sparse mappings, since it's a sparse buffer, hence it needs to know the boundaries of the region it needs to create the sparse mappings in.
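The four steps above can be sketched as a toy model. This is an illustration under stated assumptions, not the GPUVA manager's actual interface: the class `GpuVaManager`, its methods, and the tuple-based "page table operations" it returns are all hypothetical. The point it shows is that the unmap path consults the stored region to decide that sparse entries must be restored.

```python
class GpuVaManager:
    """Toy model: tracks regions and mappings and returns the page table
    operations a driver would have to perform (all names hypothetical)."""

    def __init__(self):
        self.regions = []    # list of (start, end, sparse)
        self.mappings = {}   # va -> (size, backing)

    def add_region(self, start, end, sparse):
        # Steps 1 and 2: userspace announces the region, the driver
        # creates sparse entries for it if the region is sparse.
        self.regions.append((start, end, sparse))
        return [("sparse", start, end)] if sparse else []

    def map(self, va, size, backing):
        # Step 3: store the mapping parameters, driver writes the PTEs.
        self.mappings[va] = (size, backing)
        return [("map", va, va + size, backing)]

    def unmap(self, va):
        # Step 4: drop the mapping; inside a sparse region the driver
        # must re-create sparse entries for the unmapped range.
        size, _backing = self.mappings.pop(va)
        ops = [("unmap", va, va + size)]
        for start, end, sparse in self.regions:
            if sparse and start <= va and va + size <= end:
                ops.append(("sparse", va, va + size))
        return ops
```

In this simplified model the sparse entries are restored only for the range that was unmapped; a real driver may need the full region boundaries, for example to pick a suitable page size.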
Again, this is not how things are working. First of all the kernel absolutely should *NOT* know about those regions.
What we have inside the kernel is the information what happens if an address X is accessed. On AMD HW this can be:
1. Route to the PCIe bus because the mapped BO is stored in system memory.
2. Route to the internal MC because the mapped BO is stored in local memory.
3. Route to other GPUs in the same hive.
4. Route to some doorbell to kick off other work.
...
x. Ignore write, return 0 on reads (this is what is used for sparse mappings).
x+1. Trigger a recoverable page fault. This is used for things like SVA.
x+2. Trigger a non-recoverable page fault. This is used for things like unmapped regions where access is illegal.
All of this plus some hw specific caching flags.
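One way to picture "what happens if an address X is accessed" is a per-page entry holding a target kind plus flags. The sketch below is purely illustrative: the `Target` enum, its member names and the `read` helper are assumptions made for this example and do not reflect amdgpu's PTE encoding. Only the outcomes named in the list above are modeled; the elided entries are left out.

```python
from enum import Enum, auto


class Target(Enum):
    """Outcomes named in the list above (member names are hypothetical)."""
    SYSTEM_MEMORY = auto()      # routed to the PCIe bus
    LOCAL_MEMORY = auto()       # routed to the internal MC
    PEER_GPU = auto()           # routed to another GPU in the hive
    DOORBELL = auto()           # routed to a doorbell
    SPARSE = auto()             # writes ignored, reads return 0
    RECOVERABLE_FAULT = auto()  # e.g. SVA
    FATAL_FAULT = auto()        # illegal access


def read(page_table, page):
    """Model a read of one page; page_table maps page -> (Target, flags)."""
    target, _flags = page_table.get(page, (Target.FATAL_FAULT, 0))
    if target is Target.SPARSE:
        return 0
    if target in (Target.RECOVERABLE_FAULT, Target.FATAL_FAULT):
        raise MemoryError(target.name)
    return f"routed to {target.name}"
```

Pages without an entry are treated as a non-recoverable fault here, which is a choice of this sketch.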
When Vulkan allocates a sparse VKBuffer what should happen is the following:
1. The Vulkan driver somehow figures out a VA region A..B for the buffer. This can be in userspace (libdrm_amdgpu) or kernel (drm_mm), but essentially is currently driver specific.


Right, for Nouveau we have this in userspace as well.

2. The kernel gets a request to map the VA range A..B as sparse, meaning that it updates the page tables from A..B with the sparse setting.
3. User space asks kernel to map a couple of memory backings at location A+1, A+10, A+15 etc....
4. The VKBuffer is de-allocated, userspace asks kernel to update region A..B to not map anything (usually triggers a non-recoverable fault).
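The sequence just described can be modeled without any region tracking in the kernel, using nothing but range updates on a page table. This is a minimal sketch under that assumption; `PageTable`, `set_range`, `clear_range` and `lookup` are hypothetical names, pages are plain integers, and states are plain strings.

```python
class PageTable:
    """Toy page table: every request is just a range update."""

    def __init__(self):
        self.entries = {}  # page -> state

    def set_range(self, start, end, state):
        """Point every page in [start, end) at the given state."""
        for page in range(start, end):
            self.entries[page] = state

    def clear_range(self, start, end):
        """Make [start, end) map nothing at all."""
        for page in range(start, end):
            self.entries.pop(page, None)

    def lookup(self, page):
        """Unmapped pages fault in this model."""
        return self.entries.get(page, "fault")


# Steps 2 to 4 with A = 100 and B = 200 (arbitrary example values).
A, B = 100, 200
pt = PageTable()
pt.set_range(A, B, "sparse")            # step 2
for page in (A + 1, A + 10, A + 15):    # step 3
    pt.set_range(page, page + 1, "bo")
pt.clear_range(A, B)                    # step 4
```

In this model, restoring the sparse setting after an individual backing is unmapped would have to be another explicit range update requested by userspace; that is an assumption of the sketch, not something the text above states.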


Until here this seems to be identical to what I'm doing.

It'd be interesting to know how amdgpu handles everything that potentially happens between your 3) and 4). More specifically, how are the page tables changed when memory backed mappings are mapped on a sparse range? What happens when the memory backed mappings are unmapped, but the VKBuffer isn't de-allocated, and hence sparse mappings need to be re-deployed?

Let's assume the sparse VKBuffer (and hence the VA space allocation) is pretty large. In Nouveau the corresponding PTEs would have a rather huge page size to cover this. Now, if small memory backed mappings are mapped to this huge sparse buffer, in Nouveau we'd allocate a new PT with a corresponding smaller page size overlaying the sparse mappings PTEs.
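As a rough picture of that overlay, consider a two-granularity lookup where small-page entries take precedence over the large sparse entry covering the same address. This is only a sketch: the page sizes `BIG` and `SMALL`, the class `VASpace` and its methods are assumptions for illustration and say nothing about how Nouveau's MMU code is actually structured.

```python
BIG = 64 * 1024    # assumed large page size
SMALL = 4 * 1024   # assumed small page size


class VASpace:
    """Toy model of small-page PTs overlaying large sparse PTEs."""

    def __init__(self):
        self.big = {}    # big-page index -> state
        self.small = {}  # big-page index -> {small-page index: state}

    def map_sparse(self, start, end):
        """Cover [start, end) with large sparse entries."""
        for index in range(start // BIG, end // BIG):
            self.big[index] = "sparse"

    def map_small(self, va, backing):
        """Add a small memory backed mapping on top of the large entry."""
        big_index, small_index = va // BIG, (va % BIG) // SMALL
        self.small.setdefault(big_index, {})[small_index] = backing

    def lookup(self, va):
        """Small entries win; otherwise fall back to the large entry."""
        big_index, small_index = va // BIG, (va % BIG) // SMALL
        overlay = self.small.get(big_index, {})
        if small_index in overlay:
            return overlay[small_index]
        return self.big.get(big_index, "fault")
```

Removing a small entry from the overlay makes the lookup fall back to the sparse entry again, which is one way to read "re-deploying" the sparse mapping in this model.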


What would this look like in amdgpu?

When you want to unify this between hw drivers I strongly suggest to completely start from scratch once more.
First of all don't think about those mappings as VMAs, that won't work because VMAs are usually something large. Think of this as individual PTEs controlled by the application, similar to how COW mappings and struct pages are handled inside the kernel.

Why do you consider tracking single PTEs superior to tracking VMAs? All the properties for a page you mentioned above should be equal for the entirety of pages of a whole (memory backed) mapping, shouldn't they?

Then I would start with the VA allocation manager. You could probably base that on drm_mm. We handle it differently in amdgpu currently, but I think this is something we could change.

It was not my intention to come up with an actual allocator for the VA space in the sense of actually finding a free and fitting hole in the VA space.

For Nouveau (and XE, I think) we have this in userspace and from what you've written previously I thought the same applies for amdgpu?

Then come up with something close to the amdgpu VM system. I'm pretty sure that should work for Nouveau and Intel XA as well. In other words you just have a bunch of very very small structures which represent mappings and a larger structure which combines all mappings of a specific type, e.g. all mappings of a BO or all sparse mappings etc...
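The shape being proposed, many tiny mapping records grouped under a larger structure per type, might look roughly like the following. This is a guess at the structure from the description alone and is not modeled on amdgpu's actual data types; `SmallMapping`, `VM`, and the group keys are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SmallMapping:
    """One very small mapping record (hypothetical fields)."""
    va: int
    size: int
    flags: int


class VM:
    """Groups small mappings by type, e.g. per BO or 'sparse'."""

    def __init__(self):
        self.groups = defaultdict(list)  # group key -> list of SmallMapping

    def add(self, key, mapping):
        self.groups[key].append(mapping)

    def drop_group(self, key):
        """Remove and return every mapping of one type at once."""
        return self.groups.pop(key, [])
```

Grouping by key makes operations such as "tear down all mappings of this BO" a single lookup, while each record stays small.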

Considering what you wrote above I assume that small structures / mappings in this paragraph refer to PTEs.

Immediately, I don't really see how this fine grained resolution of single PTEs would help implementing this in Nouveau. Actually, I think it would even complicate the handling of PTs, but I would need to think about this a bit more.

Merging of regions is actually not mandatory. We don't do it in amdgpu and can live with the additional mappings pretty well. But I think this can differ between drivers.
Regards,
Christian.

