Re: [Mesa-dev] [RFC] Fix attempt for Mesa + X-Server 1.20 + modesetting-ddx hangs on KDE5.

2018-05-09 Thread Adam Jackson
On Fri, 2018-05-04 at 15:45 +0200, Mario Kleiner wrote:

> The real problem, if i understand it correctly, is the way the life-time
> of dri3_drawables and loader_dri3_drawables is managed atm. by Mesa's
> bindContext() functions. Whenever glXMakeCurrent() etc. are called to
> assign new/different GLXDrawables to the same context (ie. one context
> reused for drawing into many different drawables, as opposed to using
> one dedicated context for each drawable), we destroy the underlying
> DRIDrawables/dri3_drawables_loader_dri3_drawables and they lose all
> state wrt. pending bufferswaps, msc, sbc, ust.

That's utterly, utterly, utterly broken.

> Therefore one of these patches is either a good enough fix for the KDE
> hang problems atm. or a diagnosis of the problem as a starting point for
> brighter people to deal with the root cause ;-)

I'll see what I can come up with. I'm not sure there's a great fix for
this that doesn't involve a few more roundtrips at MakeCurrent time,
since we can lose drawables asynchronously, but such is life.

- ajax
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC] Fix attempt for Mesa + X-Server 1.20 + modesetting-ddx hangs on KDE5.

2018-05-08 Thread Michel Dänzer
On 2018-05-08 06:41 PM, Adam Jackson wrote:
> On Fri, 2018-05-04 at 15:45 +0200, Mario Kleiner wrote:
> 
>> The real problem, if i understand it correctly, is the way the life-time
>> of dri3_drawables and loader_dri3_drawables is managed atm. by Mesa's
>> bindContext() functions. Whenever glXMakeCurrent() etc. are called to
>> assign new/different GLXDrawables to the same context (ie. one context
>> reused for drawing into many different drawables, as opposed to using
>> one dedicated context for each drawable), we destroy the underlying
>> DRIDrawables/dri3_drawables_loader_dri3_drawables and they lose all
>> state wrt. pending bufferswaps, msc, sbc, ust.
> 
> That's utterly, utterly, utterly broken.
> 
>> Therefore one of these patches is either a good enough fix for the KDE
>> hang problems atm. or a diagnosis of the problem as a starting point for
>> brighter people to deal with the root cause ;-)
> 
> I'll see what I can come up with. I'm not sure there's a great fix for
> this that doesn't involve a few more roundtrips at MakeCurrent time,
> since we can lose drawables asynchronously, but such is life.

I had an idea, at least for SBC:

In dri3_destroy_drawable, store the drawable's send_sbc value in a hash
table (keyed on the XID) in struct dri3_screen. Then in
dri3_create_drawable, if there's an entry for the drawable's XID in the
hash table, initialize send_sbc and recv_sbc to that.
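
A rough sketch of that idea, not actual Mesa code: send_sbc is stashed
per XID when a drawable is destroyed and restored when a drawable for
the same XID is created again. All toy_* names, the fixed-size
open-addressing table and its slot count are made up for illustration;
a real patch would presumably use Mesa's own hash table and the real
dri3 structs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SBC_CACHE_SLOTS 64 /* hypothetical size, never grows or evicts */

struct sbc_entry {
    uint32_t xid;      /* X11 drawable ID; 0 marks an empty slot */
    uint64_t send_sbc; /* last swap count sent for that drawable */
};

/* Hypothetical stand-in for the per-screen state holding the table. */
struct toy_screen {
    struct sbc_entry cache[SBC_CACHE_SLOTS];
};

/* Hypothetical stand-in for the per-drawable state that gets destroyed. */
struct toy_drawable {
    uint32_t xid;
    uint64_t send_sbc;
    uint64_t recv_sbc;
};

/* Linear probing: returns the slot already holding xid, else the first
 * empty slot on its probe path, else NULL if the table is full. */
static struct sbc_entry *find_slot(struct toy_screen *scr, uint32_t xid)
{
    assert(xid != 0); /* 0 is reserved as the empty-slot marker */
    for (unsigned i = 0; i < SBC_CACHE_SLOTS; i++) {
        struct sbc_entry *e = &scr->cache[(xid + i) % SBC_CACHE_SLOTS];
        if (e->xid == xid || e->xid == 0)
            return e;
    }
    return NULL;
}

/* On destroy: remember the drawable's send_sbc under its XID. */
static void toy_destroy_drawable(struct toy_screen *scr,
                                 const struct toy_drawable *d)
{
    struct sbc_entry *e = find_slot(scr, d->xid);
    if (e) {
        e->xid = d->xid;
        e->send_sbc = d->send_sbc;
    }
}

/* On create: start both counters from the remembered value, or from 0
 * if this XID has never been seen. */
static void toy_create_drawable(struct toy_screen *scr,
                                struct toy_drawable *d, uint32_t xid)
{
    struct sbc_entry *e = find_slot(scr, xid);
    uint64_t sbc = (e && e->xid == xid) ? e->send_sbc : 0;

    d->xid = xid;
    d->send_sbc = sbc;
    d->recv_sbc = sbc;
}
```

With a zero-initialized toy_screen, creating a drawable, bumping its
send_sbc, destroying it and creating it again for the same XID yields
the old send_sbc in both counters instead of 0.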

If nobody beats me to it, I'll try this tomorrow.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [Mesa-dev] [RFC] Fix attempt for Mesa + X-Server 1.20 + modesetting-ddx hangs on KDE5.

2018-05-06 Thread Mario Kleiner
On Sun, May 6, 2018 at 1:51 PM, Tobias Klausmann wrote:
> Hi,
>
> fyi: there is another bugreport #106372 [1], where i bisected the problem in
> the xserver and found a problematic commit, with code which can easily be
> reverted (patch in the bugreport), maybe you could check if that fixes the
> issue as well!

Hi Tobias,

thanks for the info. Yes, that's consistent with the Mesa bug and
explains why it apparently happens only with the 1.20 modesetting-ddx,
or infrequently enough on other ddx'en that nobody made the connection.

1. Mesa feeds far too large targetMsc's (way in the future, >> 2^32)
into the PresentPixmap request, due to the Mesa bug.

2. Other ddx'en truncate the far too large targetMsc back to < 2^32 when
using the old drmWaitVblank ioctl to queue a vblank event, and thanks to
32-bit integer truncation, most or all of the damage is undone. The
result may be no glitch at all, a hang of only a few frames, or only
very infrequent long hangs, depending on the exact timing of client vs.
server execution, what and how much drawing plasmashell does, etc.

3. modesetting-ddx directly queues the too large targetMsc via the new
drmCrtcQueueSequence ioctl if running on Linux 4.15 or later, and the
kernel dutifully waits forever -> Hang.
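
To put numbers on points 2 and 3, here is a small sketch of the
arithmetic only. The helper names and the example values are invented
for illustration and are not taken from any ddx or from the kernel; the
32-bit helper just models "only the low 32 bits of the target survive".

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 32-bit vblank interface: the upper 32 bits of the target
 * are simply dropped. */
static uint32_t truncate_target_msc(uint64_t target_msc)
{
    return (uint32_t)target_msc;
}

/* Frames to wait when target and current count are compared in 32-bit
 * wraparound arithmetic, as in the truncating case (point 2). */
static uint32_t frames_to_wait_32(uint64_t current_msc, uint64_t target_msc)
{
    return truncate_target_msc(target_msc) - (uint32_t)current_msc;
}

/* Frames to wait when the full 64-bit target is honored (point 3). */
static uint64_t frames_to_wait_64(uint64_t current_msc, uint64_t target_msc)
{
    assert(target_msc >= current_msc);
    return target_msc - current_msc;
}
```

Assuming a current count of 1000 and a bogus target of (5 << 32) + 1002,
the truncating path waits 2 frames, while the 64-bit path waits
21474836482 frames, which at 60 Hz is more than a decade and looks like
a permanent hang.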

I think in Michel's debug patch, only applying the #if 0 for the
ms_queue_vblank() function should be enough for the ddx to work around
the Mesa bug. Fixing client bugs in the server is probably not a good
idea though, given that we know it is a Mesa bug.

I think i found - and hopefully fixed - three other bugs in the
modesetting-ddx vblank handling, but they would only help for other
issues, not this specific one.

thanks,
-mario

>
> PS: I looked into bugzilla last weekend where i bisected this issue and did
> not recheck when opening the actual bugreport (sorry for that)
>
> [1] https://bugs.freedesktop.org/show_bug.cgi?id=106372
>
> Greetings,
>
> Tobias
>
>
>
> On 5/4/18 3:45 PM, Mario Kleiner wrote:
>>
>> Two patches, solving the same problem in two different ways, the 1st
>> one ready to go, the 2nd one would need the debug statements removed.
>>
>> Only apply one of those for testing, the 2nd one will be useless with
>> the 1st one applied, but demonstrates the problem.
>>
>> So X-Server 1.20 RC + modesetting-ddx with DRI3/Present hangs at least
>> KDE-5's plasmashell and makes KDE-5 unusable with that setup.
>>
>> As KDE's plasmashell uses QT-5's QtQuick OpenGL based rendering api's
>> to render scene-graphs, this bug might affect other QT applications
>> as well.
>>
>> This fix works, but it points to some problems in modesetting-ddx's
>> current vblank handling, because other ddx'en seem to be mostly
>> unaffected by this Mesa bug.
>>
>> The problem is that neither of these two fixes is a proper final
>> solution, but better than nothing. It leaves the OML_sync_control
>> extensions glXWaitForSbcOML(), glXWaitForMscOML() calls and the
>> SGI_video_sync glXWaitVideoSyncSGI() functions broken for some
>> use patterns.
>>
>> The real problem, if i understand it correctly, is the way the life-time
>> of dri3_drawables and loader_dri3_drawables is managed atm. by Mesa's
>> bindContext() functions. Whenever glXMakeCurrent() etc. are called to
>> assign new/different GLXDrawables to the same context (ie. one context
>> reused for drawing into many different drawables, as opposed to using
>> one dedicated context for each drawable), we destroy the underlying
>> DRIDrawables/dri3_drawables_loader_dri3_drawables and they lose all
>> state wrt. pending bufferswaps, msc, sbc, ust.
>>
>> Nothing in the specs says that clients should expect to lose such
>> state on a GLXDrawable d1 whenever they reassign drawables other than
>> d1 to a GL context. A sequence like...
>>
>> 1. glXMakeCurrent(dpy, drawable1, context);
>> 2. draw draw draw
>> 3. glXSwapBuffers(dpy, drawable1);
>> 4. glXMakeCurrent(dpy, drawable2, context); // drawable1 loses all state!
>> 5. glXWaitForSbcOML(dpy, drawable1, ...);
>>
>> ... would probably cause a hang of the client in glXWaitForSbcOML, as
>> the function requires information stored in the "original" drawable1
>> up to step 3, but lost in step 4 due to dri3_drawable destruction.
>>
>> Patch 1 has a potentially large performance impact when switching
>> drawables on a given context, due to the enforced wait on swap completion,
>> but might save OML clients which do waits for sbc,msc on a separate thread,
>> whereas patch 2 doesn't have a performance impact, but doesn't even
>> partially solve trouble with OML_sync_control.
>>
>> However, i'm totally out of time atm. and probably not the right person
>> to think about a better solution, and by dumb luck, my own application
>> doesn't recycle the same context for different drawables, but uses a
>> dedicated context for each drawable, so it dodges this bullet.
>>
>> Therefore one of these patches is either a good enough fix for the KDE
>> hang problems atm. or a diagnosis of the 

Re: [Mesa-dev] [RFC] Fix attempt for Mesa + X-Server 1.20 + modesetting-ddx hangs on KDE5.

2018-05-06 Thread Tobias Klausmann

Hi,

fyi: there is another bugreport #106372 [1], where i bisected the 
problem in the xserver and found a problematic commit, with code which 
can easily be reverted (patch in the bugreport), maybe you could check 
if that fixes the issue as well!


PS: I looked into bugzilla last weekend where i bisected this issue and 
did not recheck when opening the actual bugreport (sorry for that)


[1] https://bugs.freedesktop.org/show_bug.cgi?id=106372

Greetings,

Tobias


On 5/4/18 3:45 PM, Mario Kleiner wrote:

Two patches, solving the same problem in two different ways, the 1st
one ready to go, the 2nd one would need the debug statements removed.

Only apply one of those for testing, the 2nd one will be useless with
the 1st one applied, but demonstrates the problem.

So X-Server 1.20 RC + modesetting-ddx with DRI3/Present hangs at least
KDE-5's plasmashell and makes KDE-5 unusable with that setup.

As KDE's plasmashell uses QT-5's QtQuick OpenGL based rendering api's
to render scene-graphs, this bug might affect other QT applications
as well.

This fix works, but it points to some problems in modesetting-ddx's
current vblank handling, because other ddx'en seem to be mostly
unaffected by this Mesa bug.

The problem is that neither of these two fixes is a proper final
solution, but better than nothing. It leaves the OML_sync_control
extensions glXWaitForSbcOML(), glXWaitForMscOML() calls and the
SGI_video_sync glXWaitVideoSyncSGI() functions broken for some
use patterns.

The real problem, if i understand it correctly, is the way the life-time
of dri3_drawables and loader_dri3_drawables is managed atm. by Mesa's
bindContext() functions. Whenever glXMakeCurrent() etc. are called to
assign new/different GLXDrawables to the same context (ie. one context
reused for drawing into many different drawables, as opposed to using
one dedicated context for each drawable), we destroy the underlying
DRIDrawables/dri3_drawables_loader_dri3_drawables and they lose all
state wrt. pending bufferswaps, msc, sbc, ust.

Nothing in the specs says that clients should expect to lose such
state on a GLXDrawable d1 whenever they reassign drawables other than
d1 to a GL context. A sequence like...

1. glXMakeCurrent(dpy, drawable1, context);
2. draw draw draw
3. glXSwapBuffers(dpy, drawable1);
4. glXMakeCurrent(dpy, drawable2, context); // drawable1 loses all state!
5. glXWaitForSbcOML(dpy, drawable1, ...);

... would probably cause a hang of the client in glXWaitForSbcOML, as
the function requires information stored in the "original" drawable1
up to step 3, but lost in step 4 due to dri3_drawable destruction.
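
The sequence can be modelled with a toy simulation. This is only a
sketch of the bookkeeping, not Mesa's code: the toy_* names are
hypothetical, and it assumes that a wait blocks whenever the requested
swap count is not reached and no swap is pending in the drawable's
current state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-drawable swap bookkeeping. */
struct toy_drawable {
    uint64_t send_sbc; /* swaps requested so far */
    uint64_t recv_sbc; /* swaps confirmed complete so far */
};

/* Step 3: request a swap and return the SBC it will complete at. */
static uint64_t toy_swap_buffers(struct toy_drawable *d)
{
    return ++d->send_sbc;
}

/* Step 4: rebinding the context destroys the state, and a later lookup
 * recreates it from scratch. */
static void toy_recreate(struct toy_drawable *d)
{
    d->send_sbc = 0;
    d->recv_sbc = 0;
}

/* Step 5: consume completion events until target_sbc is reached.
 * Returns false where a real client would block forever, i.e. the
 * target is not reached and no swap is pending any more. */
static bool toy_wait_for_sbc(struct toy_drawable *d, uint64_t target_sbc)
{
    while (d->recv_sbc < target_sbc) {
        if (d->recv_sbc == d->send_sbc)
            return false; /* nothing pending: the wait cannot finish */
        d->recv_sbc++;    /* one swap-completion event arrives */
    }
    return true;
}
```

In this model, waiting for the SBC returned by toy_swap_buffers()
succeeds on the original state, but fails once toy_recreate() has run in
between, because the recreated state no longer knows about the pending
swap.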

Patch 1 has a potentially large performance impact when switching
drawables on a given context, due to the enforced wait on swap completion,
but might save OML clients which do waits for sbc,msc on a separate thread,
whereas patch 2 doesn't have a performance impact, but doesn't even
partially solve trouble with OML_sync_control.

However, i'm totally out of time atm. and probably not the right person
to think about a better solution, and by dumb luck, my own application
doesn't recycle the same context for different drawables, but uses a
dedicated context for each drawable, so it dodges this bullet.

Therefore one of these patches is either a good enough fix for the KDE
hang problems atm. or a diagnosis of the problem as a starting point for
brighter people to deal with the root cause ;-)

Thanks,
-mario
