Hi Thomas,

> that the access() handler took a shortcut when the new locking order 
> was  established
There is no new locking order, the access handler is just for debugging 
and ignoring the correct locking order between mmap_sem and bo_reserve.

That this is throwing a lockdep warning is perfectly possible. We should 
probably move that to a trylock instead.

> bo_reserve()
> copy_to_user() / copy_from_user()
> bo_unreserve() 
That one is illegal for a completely different reason.

The address accessed by copy_to_user()/copy_from_user() could be a BO 
itself, so to resolve this we could end up locking a BO twice.

Adding a might_lock() to the beginning of ttm_bo_vm_fault as you 
suggested doesn't work either, because at this point the mmap_sem is 
still locked.

So lockdep would complain about the incorrect bo_reserve and mmap_sem order.

Christian.

Am 13.10.2018 um 21:04 schrieb Thomas Hellstrom:
> Hi, Christian,
>
> On 10/13/2018 07:36 PM, Christian König wrote:
>> Hi Thomas,
>>
>>> bo_reserve()
>>> copy_to_user() / copy_from_user()
>>> bo_unreserve() 
>>
>> That pattern is illegal for a number of reasons and the mmap_sem is 
>> only one of it.
>>
>> So the locking order must always be mmap_sem->bo_reservation. See the 
>> userptr implementation in amdgpu as well.
>>
>> Christian.
>
> I'm not arguing against that, and since vmwgfx doesn't use that 
> pattern, the locking order doesn't really matter to me since it's even 
> possible to make the TTM fault() handler more well-behaved if we were 
> to fix the locking order to mmap_sem->bo_reserve.
>
> My concern is, since the _opposite_ locking order is (admittedly 
> vaguely) documented in the fault handler, that the access() handler 
> took a shortcut when the new locking order was established possibly 
> without auditing of the other TTM drivers for locking inversion: For 
> example it looks from a quick glance like 
> nouveau_gem_pushbuf_reloc_apply() calls copy_from_user() with bo's 
> reserved (which IIRC was the typical use-case at the time this was 
> last lifted). And lockdep won't trip unless the access() callback is 
> actually called.
>
> My point is if AMD wants to enforce this locking order, then IMHO the 
> other drivers need to be audited and corrected if they are assuming 
> the locking order documented in fault(). A good way to catch such 
> drivers would be to add that might_lock().
>
> Thanks,
> Thomas
>
>
>>
>> Am 12.10.2018 um 16:52 schrieb Thomas Hellstrom:
>>> Hi, Felix,
>>>
>>> It looks like there is a locking inversion in ttm_bo_vm_access() 
>>> where we take a sleeping bo_reserve() while holding mmap_sem().
>>>
>>> Previously we've been assuming the other way around or at least 
>>> undefined allowing for drivers to do
>>>
>>> bo_reserve()
>>> copy_to_user() / copy_from_user()
>>> bo_unreserve()
>>>
>>> I'm not sure the latter pattern is used in any drivers, though, and 
>>> I guess there are ways around it. So it might make sense to fix the 
>>> locking order at this point. In that case, perhaps one should add a
>>>
>>> might_lock(&bo->resv->lock.base);
>>>
>>> at the start of the TTM fault handler to trip lockdep on locking 
>>> order violations in situations where the access() callback isn't 
>>> commonly used...
>>>
>>> /Thomas
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
>

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Reply via email to