On 11/10/23 09:50, Christian König wrote:
Am 09.11.23 um 19:34 schrieb Danilo Krummrich:
On 11/9/23 17:03, Christian König wrote:
Am 09.11.23 um 16:50 schrieb Thomas Hellström:
[SNIP]

Did we get any resolution on this?

FWIW, my take on this is that it would be possible to get GPUVM to work both
with and without internal refcounting; if with, the driver needs a VM close to
resolve cyclic references, if without, that's not necessary. If GPUVM is allowed
to refcount mappings and vm_bos, that comes with a slight performance drop,
but as Danilo pointed out, the VM lifetime problem when iterating over a vm_bo's
mappings becomes much easier and the code thus becomes easier to maintain going
forward. That convinced me it's a good thing.

I strongly believe you guys stumbled over one of the core problems with the VM 
here and I think that reference counting is the right answer to solving this.

The big question is what is reference counted and in which direction the
dependency points; e.g. we have VM, BO, BO_VM and Mapping objects here.

These patches suggest a counted Mapping -> VM reference, and I'm pretty sure that
isn't a good idea. What we should really have instead is a BO -> VM or
BO_VM -> VM reference. In other words, each BO which is part of the VM keeps a
reference to the VM.

We have both. Please see the subsequent patch introducing VM_BO structures for 
that.

As I explained, mappings (struct drm_gpuva) keep a pointer to the VM they're
mapped in, and besides that it doesn't make sense to free a VM that still
contains mappings; the reference count ensures that. This simply ensures memory
safety.
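The counted Mapping -> VM reference being discussed could be sketched roughly like this. This is a minimal userspace model, not the real GPUVM code; the names (vm, mapping, vm_get, vm_put) and the plain-int refcount are illustrative only, where the kernel would use kref/refcount_t:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified model of a counted Mapping -> VM reference.
 * A real implementation would use kref/refcount_t, not a plain int. */

struct vm {
    int refcount;       /* stand-in for a kref */
    int freed;          /* set by the stand-in destructor below */
};

struct mapping {
    struct vm *vm;      /* each mapping pins its VM */
};

static void vm_get(struct vm *vm) { vm->refcount++; }

static void vm_put(struct vm *vm)
{
    if (--vm->refcount == 0)
        vm->freed = 1;  /* stand-in for the real VM destructor */
}

/* Creating a mapping takes a VM reference; destroying it drops the
 * reference, so the VM can never be freed while mappings still point
 * at it. */
static struct mapping *mapping_create(struct vm *vm)
{
    struct mapping *m = malloc(sizeof(*m));
    m->vm = vm;
    vm_get(vm);
    return m;
}

static void mapping_destroy(struct mapping *m)
{
    vm_put(m->vm);
    free(m);
}
```

The point of the pattern is exactly the memory-safety argument above: the last mapping's put, not the creator's, is what allows the VM to go away.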


BTW: At least in amdgpu we can have BOs which (temporarily) don't have any
mappings, but are still considered part of the VM.

That should be possible.



Another issue Christian brought up is that something intended to be embeddable
(a base class) shouldn't really have its own refcount. I think that's a valid
point. If you at some point need to derive from multiple such structs, each
having its own refcount, things will start to get weird. One way to resolve
that would be to have the driver's subclass provide get() and put() ops, and
export a destructor for the base class, rather than have the base class
provide the refcount and a destructor op.
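The alternative described here, where the base class carries no refcount of its own and the driver subclass owns it, might look something like this. All names (base_ops, drv_vm, etc.) are hypothetical; this is a sketch of the idea, not the real interface:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical embeddable base class: no refcount of its own, only
 * driver-provided get()/put() ops plus an exported destructor. */

struct base_ops {
    void (*get)(void *priv);
    void (*put)(void *priv);
};

struct base {
    const struct base_ops *ops;
    void *priv;
};

/* Exported base-class destructor; the driver calls this once its own
 * refcount drops to zero. */
static void base_destroy(struct base *b)
{
    b->ops = NULL;
}

/* Driver subclass embedding the base and owning the refcount. */
struct drv_vm {
    struct base base;
    int refcount;       /* a real driver would use kref */
    int destroyed;
};

static void drv_get(void *priv)
{
    ((struct drv_vm *)priv)->refcount++;
}

static void drv_put(void *priv)
{
    struct drv_vm *vm = priv;

    if (--vm->refcount == 0) {
        base_destroy(&vm->base);    /* base cleans up its own state */
        vm->destroyed = 1;          /* then the driver frees the rest */
    }
}
```

With this split, deriving from several such base classes no longer multiplies refcounts, since only the outermost driver object is counted.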

GPUVM simply follows the same pattern we have with drm_gem_objects, and I think
it makes sense. Why would we want to embed two struct drm_gpuvm in a single
driver structure?

Because you need one drm_gpuvm structure for each application using the driver? 
Or am I missing something?

Right, *one*, but not more than one. Wasn't that the concern? Maybe I 
misunderstood something. :)


As far as I can see a driver would want to embed that into your fpriv structure 
which is allocated during drm_driver.open callback.



Well, I have never seen stuff like that in the kernel. Might be that this 
works, but I would rather not try if avoidable.


That would also make it possible for the driver to decide the context for the
put() call: if the driver needs to be able to call put() from irq / atomic
context but the base class's destructor doesn't allow atomic context, the
driver can push freeing out to a work item if needed.
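Pushing the final free out of atomic context could be sketched as below. In the kernel this would be a workqueue; this userspace model just records the pending object in a list drained by a worker function. Names and structure are illustrative assumptions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical deferred-free pattern: the last put() never frees
 * inline (safe from irq/atomic context); it queues the object and a
 * worker running later in process context does the actual teardown. */

struct obj {
    int refcount;
    int freed;
    struct obj *next_pending;   /* queued-for-free list linkage */
};

static struct obj *pending;     /* stand-in for a work item's data */

/* Safe to call from "atomic" context: only queues, never frees. */
static void obj_put(struct obj *o)
{
    if (--o->refcount == 0) {
        o->next_pending = pending;
        pending = o;
    }
}

/* Runs later, like a work item handler in process context. */
static void free_worker(void)
{
    while (pending) {
        struct obj *o = pending;

        pending = o->next_pending;
        o->freed = 1;           /* stand-in for the real destructor */
    }
}
```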

Finally, the refcount overflow Christian pointed out: limiting the number of
mappings sounds like a reasonable remedy to me.
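The suggested remedy, refusing new mapping references past a cap so the counter can never wrap, could look like this. The cap value, names, and the bool-returning get are all assumptions for illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capped reference take: once the cap is reached, the
 * get fails and the caller must fail the map request, so the counter
 * can never overflow/wrap. Tiny cap chosen purely for demonstration. */

#define VM_MAPPING_MAX 4

struct vm_ref {
    unsigned int mappings;
};

static bool vm_mapping_get(struct vm_ref *vm)
{
    if (vm->mappings >= VM_MAPPING_MAX)
        return false;       /* caller must reject the new mapping */
    vm->mappings++;
    return true;
}
```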

Well that depends, I would rather avoid having a dependency for mappings.

Taking the CPU VM handling as an example: as far as I know, vm_area_structs
don't grab a reference to their mm_struct either. Instead they get
automatically destroyed when the mm_struct is destroyed.

Certainly, that would be possible. However, thinking about it, this might
invite huge trouble.

First of all, we'd still need to reference count a GPUVM and take a reference
for each VM_BO, as we do already. Now, instead of simply increasing the
reference count for each mapping as well, we'd need a *mandatory* driver
callback that is called when the GPUVM reference count drops to zero. Maybe
something like vm_destroy().

The reason is that GPUVM can't just remove all mappings from the tree, nor can
it free them by itself, since drivers might use them for tracking their
allocated page tables and/or other stuff.
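Such a mandatory vm_destroy() callback might be wired up as in the sketch below. This is a hypothetical userspace model; the names (gvm, gvm_ops, etc.) are made up, and the comment on gvm_put() is exactly the locking concern: the callback runs in whatever context the final put happens in:

```c
#include <assert.h>
#include <stddef.h>

struct gvm;

/* Hypothetical mandatory driver callback: GPUVM cannot tear down the
 * driver's mapping bookkeeping itself, so the driver must. */
struct gvm_ops {
    void (*vm_destroy)(struct gvm *vm);
};

struct gvm {
    int refcount;       /* a real implementation would use kref */
    const struct gvm_ops *ops;
    int live_mappings;  /* stand-in for the driver's mapping state */
};

static void gvm_put(struct gvm *vm)
{
    if (--vm->refcount == 0)
        vm->ops->vm_destroy(vm);  /* may run with caller's locks held */
}

/* Example driver callback clearing its mapping bookkeeping. */
static void drv_vm_destroy(struct gvm *vm)
{
    vm->live_mappings = 0;
}
```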

Now, let's think about the scope this callback might be called from. When a
VM_BO is destroyed, the driver might hold a couple of locks (for Xe it would
be the VM's shared dma-resv lock and potentially the corresponding object's
dma-resv lock, if they're not the same already). If destroying this VM_BO
leads to the VM being destroyed, the driver's vm_destroy() callback would be
called with those locks held as well.

I feel like doing this opens the door to locking hell entirely. I think we
should really avoid that.

That's a really good point, but I fear exactly that's the use case.

I would expect that VM_BO structures are added in the drm_gem_object_funcs.open 
callback and freed in drm_gem_object_funcs.close.

Since it is perfectly legal for userspace to close a BO while there are still
mappings (it can trivially be that the app is killed), I would expect that the
drm_gem_object_funcs.close handling is something like asking drm_gpuvm to
destroy the VM_BO and getting back the mappings which should be cleared from
the page table in return.

Yes, you can do that. You can ask GPUVM to create a set of operations for you
to shut down all mappings of a given VM_BO. But until this is done the VM_BO
structure is kept alive. Semantically, it's exactly the same though.
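The close-time flow both sides describe, destroying a VM_BO while handing back the mappings the driver still has to clear from the page tables, could be modeled like this. Everything here is a heavily simplified assumption, not the actual GPUVM ops interface:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the close-time handover: tearing down a VM_BO
 * yields the list of unmap operations the driver must still apply. */

struct unmap_op {
    unsigned long addr;         /* VA of the mapping to clear */
    struct unmap_op *next;
};

struct vm_bo {
    struct unmap_op *mappings;  /* mappings of this BO in the VM */
};

/* Detach and hand back the mappings; the VM_BO itself stays alive
 * until the driver has processed the returned operations. */
static struct unmap_op *vm_bo_unmap_ops(struct vm_bo *vbo)
{
    struct unmap_op *ops = vbo->mappings;

    vbo->mappings = NULL;
    return ops;
}
```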


In amdgpu we even go a step further and the VM structure keeps track of all the
mappings of deleted VM_BOs, so that higher levels can query those and clear
them later on.

Background is that drm_gem_object_funcs.close can't fail, but it can perfectly
well be that the app is killed because of an OOM situation, and we can't do
page table updates at that moment because of this.



Which makes sense in that case because when the mm_struct is gone the 
vm_area_struct doesn't make sense any more either.

What we clearly need is a reference to prevent the VM, or at least the shared
resv, from going away too early.

Yeah, that was a good hint and we've covered that.


Regards,
Christian.


But I think all of this is fixable as follow-ups if needed, unless I'm missing 
something crucial.

Fully agree, I think at this point we should go ahead and land this series.

Yeah, agree, this is not UAPI so it's not set in stone. Feel free to add my
acked-by as well if you want.

Only keep in mind that when you give drivers some functionality in a common 
component they usually expect to keep that functionality.

For example, changing the dma_resv object to make sure that drivers can't
cause use-after-free errors any more was an extremely annoying experience,
since every user of those interfaces had to change at once.

Regards,
Christian.



Just my 2 cents.

/Thomas
