On Thu, Mar 15, 2018 at 10:56 AM, Christian König
<ckoenig.leichtzumer...@gmail.com> wrote:
> Am 15.03.2018 um 10:20 schrieb Daniel Vetter:
>>
>> On Tue, Mar 13, 2018 at 06:20:07PM +0100, Christian König wrote:
>> [SNIP]
>> Take a look at the DOT graphs for atomic I've done a while ago. I think we
>> could make a formidable competition for who's doing the worst diagrams :-)
>
>
> Thanks, going to give that a try.
>
>> [SNIP]
>> amdgpu: Expects that you never hold any of the heavywheight locks while
>> waiting for a fence (since gpu resets will need them).
>>
>> i915: Happily blocks on fences while holding all kinds of locks, expects
>> gpu reset to be able to recover even in this case.
>
>
> In this case I can comfort you, the looks amdgpu needs to grab during GPU
> reset are the reservation lock of the VM page tables. I have strong doubt
> that i915 will ever hold those.

Ah good, means that very likely there's at least no huge fundamental
design issue that we run into.

> Could be that we run into problems because Thread A hold lock 1 tries to
> take lock 2, then i915 holds 2 and our reset path needs 1.

Yeah that might happen, but lockdep will catch those, and generally
those cases can be fixed with slight reordering or re-annotating of
the code to avoid upsetting lockdep. As long as we don't have a
full-on functional dependency (which is what I've feared).

>> [SNIP]
>>>
>>> Yes, except for fallback paths and bootup self tests we simply never wait
>>> for fences while holding locks.
>>
>> That's not what I meant with "are you sure". Did you enable the
>> cross-release stuff (after patching the bunch of leftover core kernel
>> issues still present), annotate dma_fence with the cross-release stuff,
>> run a bunch of multi-driver (amdgpu vs i915) dma-buf sharing tests and
>> weep?
>
>
> Ok, what exactly do you mean with cross-release checking?

Current lockdep doesn't spot deadlocks like the below:

thread A: holds mutex, waiting for completion.

thread B: acquires mutex before it will ever signal the completion A
is waiting for

->deadlock

cross-release lockdep support can catch these through new fancy
annotations. Similar waiter/signaller annotations exists for waiting
on workers and anything else, and it would be a perfect fit for
waiter/signaller code around dma_fence.

lwn has you covered a usual: https://lwn.net/Articles/709849/

Cheers, Daniel

>> I didn't do the full thing yet, but just within i915 we've found tons of
>> small little deadlocks we never really considered thanks to cross release,
>> and that wasn't even including the dma_fence annotation. Luckily nothing
>> that needed a full-on driver redesign.
>>
>> I guess I need to ping core kernel maintainers about cross-release again.
>> I'd much prefer if we could validate ->invalidate_mapping and the
>> locking/fence dependency issues using that, instead of me having to read
>> and understand all the drivers.
>
> [SNIP]
>>
>> I fear that with the ->invalidate_mapping callback (which inverts the
>> control flow between importer and exporter) and tying dma_fences into all
>> this it will be a _lot_ worse. And I'm definitely too stupid to understand
>> all the dependency chains without the aid of lockdep and a full test suite
>> (we have a bunch of amdgpu/i915 dma-buf tests in igt btw).
>
>
> Yes, that is also something I worry about.
>
> Regards,
> Christian.



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Reply via email to