Re: [Mesa-dev] [RFC] Fix attempt for Mesa + X-Server 1.20 + modesetting-ddx hangs on KDE5.
On Fri, 2018-05-04 at 15:45 +0200, Mario Kleiner wrote:

> The real problem, if i understand it correctly, is the way the life-time
> of dri3_drawables and loader_dri3_drawables is managed atm. by Mesa's
> bindContext() functions. Whenever glXMakeCurrent() etc. are called to
> assign new/different GLXDrawables to the same context (ie. one context
> reused for drawing into many different drawables, as opposed to using
> one dedicated context for each drawable), we destroy the underlying
> DRIDrawables/dri3_drawables_loader_dri3_drawables and they lose all
> state wrt. pending bufferswaps, msc, sbc, ust.

That's utterly, utterly, utterly broken.

> Therefore one of these patches is either a good enough fix for the KDE
> hang problems atm. or a diagnosis of the problem as a starting point for
> brighter people to deal with the root cause ;-)

I'll see what I can come up with. I'm not sure there's a great fix for
this that doesn't involve a few more roundtrips at MakeCurrent time,
since we can lose drawables asynchronously, but such is life.

- ajax
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] Fix attempt for Mesa + X-Server 1.20 + modesetting-ddx hangs on KDE5.
On 2018-05-08 06:41 PM, Adam Jackson wrote:
> On Fri, 2018-05-04 at 15:45 +0200, Mario Kleiner wrote:
>
>> The real problem, if i understand it correctly, is the way the life-time
>> of dri3_drawables and loader_dri3_drawables is managed atm. by Mesa's
>> bindContext() functions. Whenever glXMakeCurrent() etc. are called to
>> assign new/different GLXDrawables to the same context (ie. one context
>> reused for drawing into many different drawables, as opposed to using
>> one dedicated context for each drawable), we destroy the underlying
>> DRIDrawables/dri3_drawables_loader_dri3_drawables and they lose all
>> state wrt. pending bufferswaps, msc, sbc, ust.
>
> That's utterly, utterly, utterly broken.
>
>> Therefore one of these patches is either a good enough fix for the KDE
>> hang problems atm. or a diagnosis of the problem as a starting point for
>> brighter people to deal with the root cause ;-)
>
> I'll see what I can come up with. I'm not sure there's a great fix for
> this that doesn't involve a few more roundtrips at MakeCurrent time,
> since we can lose drawables asynchronously, but such is life.

I had an idea, at least for SBC:

In dri3_destroy_drawable, store the drawable's send_sbc value in a hash
table (keyed on the XID) in struct dri3_screen. Then in
dri3_create_drawable, if there's an entry for the drawable's XID in the
hash table, initialize send_sbc and recv_sbc to that.

If nobody beats me to it, I'll try this tomorrow.

--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
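The save-and-restore idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Mesa code: `toy_screen`, `toy_drawable`, `sbc_slot` and the fixed-size probing table are all hypothetical stand-ins for struct dri3_screen, the dri3 drawable and whatever hash table implementation would really be used.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types; none of these are Mesa's real structures. */
#define SBC_TABLE_SLOTS 64 /* toy capacity, must be a power of two */

struct sbc_entry {
    uint32_t xid;      /* X drawable ID, used as the key */
    uint64_t send_sbc; /* swap count saved when the drawable was destroyed */
    bool used;
};

struct toy_screen {
    struct sbc_entry saved[SBC_TABLE_SLOTS];
};

struct toy_drawable {
    uint32_t xid;
    uint64_t send_sbc;
    uint64_t recv_sbc;
};

/* Linear probing: returns the slot holding xid, else the first free slot,
 * else NULL when the table is full. Entries are never deleted in this toy. */
static struct sbc_entry *
sbc_slot(struct toy_screen *scr, uint32_t xid)
{
    for (size_t i = 0; i < SBC_TABLE_SLOTS; i++) {
        struct sbc_entry *e = &scr->saved[(xid + i) & (SBC_TABLE_SLOTS - 1)];
        if (!e->used || e->xid == xid)
            return e;
    }
    return NULL;
}

/* On destruction, remember the drawable's swap count under its XID. */
static void
toy_destroy_drawable(struct toy_screen *scr, const struct toy_drawable *d)
{
    struct sbc_entry *e = sbc_slot(scr, d->xid);
    if (e) {
        e->xid = d->xid;
        e->send_sbc = d->send_sbc;
        e->used = true;
    }
}

/* On (re)creation, resume from the saved count if one exists, else from 0. */
static void
toy_create_drawable(struct toy_screen *scr, struct toy_drawable *d,
                    uint32_t xid)
{
    struct sbc_entry *e = sbc_slot(scr, xid);
    uint64_t sbc = (e && e->used) ? e->send_sbc : 0;

    d->xid = xid;
    d->send_sbc = sbc;
    d->recv_sbc = sbc;
}
```

With this toy, a drawable destroyed at send_sbc 7 and recreated for the same XID starts again at 7 rather than 0. As the message says, this only covers SBC; pending swaps, msc and ust are not addressed by the sketch.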
Re: [Mesa-dev] [RFC] Fix attempt for Mesa + X-Server 1.20 + modesetting-ddx hangs on KDE5.
On Sun, May 6, 2018 at 1:51 PM, Tobias Klausmann wrote:
> Hi,
>
> fyi: there is another bugreport #106372 [1], where i bisected the problem in
> the xserver and found a problematic commit, with code which can easily be
> reverted (patch in the bugreport), maybe you could check if that fixes the
> issue as well!

Hi Tobias,

thanks for the info. Yes, that's consistent with the Mesa bug and why it
apparently happens only with 1.20 modesetting-ddx - or infrequently
enough on other ddx'en that nobody made the connection.

1. Mesa feeds way too large (way in the future) >> 2^32 targetMsc's into
   the PresentPixmap request, due to the Mesa bug.

2. Other ddx'en truncate the way too large targetMsc back to < 2^32 when
   using the old drmWaitVblank ioctl to queue a vblank event, and due to
   the magic of integer 32 bit truncation, most or all of the damage is
   undone. Maybe no glitch, or only a hang of a few frames duration, or
   only very infrequent long hangs, depending on the exact timing of
   client vs. server execution, what and how much drawing plasmashell
   does, etc.

3. modesetting-ddx directly queues the too large targetMsc via the new
   drmCrtcQueueSequence ioctl if running on Linux 4.15 or later, and the
   kernel dutifully waits forever -> Hang.

I think in Michel's debug patch, only applying the #if 0 for the
ms_queue_vblank() function should be enough for the ddx to work around
the Mesa bug. Fixing client bugs in the server is probably not a good
idea though, given that we know it is a Mesa bug.

I think i found - and hopefully fixed - three other bugs in the
modesetting-ddx vblank handling, but they would only help for other
issues, not this specific one.
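The difference between points 2 and 3 above can be illustrated with a small arithmetic sketch. It is not ddx or kernel code: the helper names are hypothetical, and the bogus target value is an assumption chosen so that its low 32 bits sit close to the current count, which is the case where truncation happens to undo the damage.

```c
#include <stdint.h>

/* Hypothetical helper: what a 32-bit vblank path effectively keeps of a
 * 64-bit target count. */
static uint32_t
truncate_msc(uint64_t target_msc)
{
    return (uint32_t)target_msc;
}

/* Frames to wait when the full 64-bit target is honoured. */
static uint64_t
frames_to_wait_64(uint64_t current_msc, uint64_t target_msc)
{
    return target_msc > current_msc ? target_msc - current_msc : 0;
}

/* Frames to wait after both counts are truncated to 32 bits. The
 * subtraction wraps modulo 2^32; a result in the upper half of the range
 * is treated as "target already passed". */
static uint32_t
frames_to_wait_32(uint64_t current_msc, uint64_t target_msc)
{
    uint32_t delta = truncate_msc(target_msc) - truncate_msc(current_msc);

    return delta < 0x80000000u ? delta : 0;
}
```

For example, with a current count of 1000 and an assumed bogus target of (5ULL << 32) + 1002, the truncated path waits 2 frames, while the 64-bit path would wait 21474836482 frames, which is roughly 11 years at 60 Hz and indistinguishable from a hang.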
thanks,
-mario

>
> PS: I looked into bugzilla last weekend where i bisected this issue and did
> not recheck when opening the actual bugreport (sorry for that)
>
> [1] https://bugs.freedesktop.org/show_bug.cgi?id=106372
>
> Greetings,
>
> Tobias
>
>
>
> On 5/4/18 3:45 PM, Mario Kleiner wrote:
>>
>> Two patches, solving the same problem in two different ways, the 1st
>> one ready to go, the 2nd one would need the debug statements removed.
>>
>> Only apply one of those for testing, the 2nd one will be useless with
>> the 1st one applied, but demonstrates the problem.
>>
>> So X-Server 1.20 RC + modesetting-ddx with DRI3/Present hangs at least
>> KDE-5's plasmashell and makes KDE-5 unusable with that setup.
>>
>> As KDE's plasmashell uses QT-5's QtQuick OpenGL based rendering api's
>> to render scene-graphs, this bug might affect other QT applications
>> as well.
>>
>> This fix works, but it points to some problems in modesetting-ddx's
>> current vblank handling, because other ddx'en seem to be mostly
>> unaffected by this Mesa bug.
>>
>> The problem is that neither of these two fixes is a proper final
>> solution, but better than nothing. It leaves the OML_sync_control
>> extensions glXWaitForSbcOML(), glXWaitForMscOML() calls and the
>> SGI_video_sync glXWaitVideoSyncSGI() functions broken for some
>> use patterns.
>>
>> The real problem, if i understand it correctly, is the way the life-time
>> of dri3_drawables and loader_dri3_drawables is managed atm. by Mesa's
>> bindContext() functions. Whenever glXMakeCurrent() etc. are called to
>> assign new/different GLXDrawables to the same context (ie. one context
>> reused for drawing into many different drawables, as opposed to using
>> one dedicated context for each drawable), we destroy the underlying
>> DRIDrawables/dri3_drawables_loader_dri3_drawables and they lose all
>> state wrt. pending bufferswaps, msc, sbc, ust.
>>
>> Nothing in the specs says that clients should expect to lose such
>> state on a GLXDrawable d1 whenever they reassign drawables other than
>> d1 to a GL context. A sequence like...
>>
>> 1. glXMakeCurrent(context, drawable1);
>> 2. draw draw draw
>> 3. glXSwapBuffers(context, drawable1);
>> 4. glXMakeCurrent(context, drawable2); // drawable 1 loses all state!
>> 5. glXWaitForSbcOML(dpy, drawable1, ...);
>>
>> ... would probably cause a hang of the client in glXWaitForSbcOML, as
>> the function requires information stored in the "original" drawable1
>> up to step 3, but lost in step 4 due to dri3_drawable destruction.
>>
>> Patch 1 has a potentially large performance impact when switching
>> drawables on a given context, due to the enforced wait on swap completion,
>> but might save OML clients which do waits for sbc,msc on a separate thread,
>> whereas patch 2 doesn't have a performance impact, but doesn't even
>> partially solve trouble with OML_sync_control.
>>
>> However, i'm totally out of time atm. and probably not the right person
>> to think about a better solution, and by dumb luck, my own application
>> doesn't recycle the same context for different drawables, but uses a
>> dedicated context for each drawable, so it dodges this bullet.
>>
>> Therefore one of these patches is either a good enough fix for the KDE
>> hang problems atm. or a diagnosis of the
Re: [Mesa-dev] [RFC] Fix attempt for Mesa + X-Server 1.20 + modesetting-ddx hangs on KDE5.
Hi,

fyi: there is another bugreport #106372 [1], where i bisected the problem in
the xserver and found a problematic commit, with code which can easily be
reverted (patch in the bugreport), maybe you could check if that fixes the
issue as well!

PS: I looked into bugzilla last weekend where i bisected this issue and did
not recheck when opening the actual bugreport (sorry for that)

[1] https://bugs.freedesktop.org/show_bug.cgi?id=106372

Greetings,

Tobias

On 5/4/18 3:45 PM, Mario Kleiner wrote:

Two patches, solving the same problem in two different ways, the 1st
one ready to go, the 2nd one would need the debug statements removed.

Only apply one of those for testing, the 2nd one will be useless with
the 1st one applied, but demonstrates the problem.

So X-Server 1.20 RC + modesetting-ddx with DRI3/Present hangs at least
KDE-5's plasmashell and makes KDE-5 unusable with that setup.

As KDE's plasmashell uses QT-5's QtQuick OpenGL based rendering api's
to render scene-graphs, this bug might affect other QT applications
as well.

This fix works, but it points to some problems in modesetting-ddx's
current vblank handling, because other ddx'en seem to be mostly
unaffected by this Mesa bug.

The problem is that neither of these two fixes is a proper final
solution, but better than nothing. It leaves the OML_sync_control
extensions glXWaitForSbcOML(), glXWaitForMscOML() calls and the
SGI_video_sync glXWaitVideoSyncSGI() functions broken for some
use patterns.

The real problem, if i understand it correctly, is the way the life-time
of dri3_drawables and loader_dri3_drawables is managed atm. by Mesa's
bindContext() functions. Whenever glXMakeCurrent() etc. are called to
assign new/different GLXDrawables to the same context (ie. one context
reused for drawing into many different drawables, as opposed to using
one dedicated context for each drawable), we destroy the underlying
DRIDrawables/dri3_drawables_loader_dri3_drawables and they lose all
state wrt. pending bufferswaps, msc, sbc, ust.

Nothing in the specs says that clients should expect to lose such
state on a GLXDrawable d1 whenever they reassign drawables other than
d1 to a GL context. A sequence like...

1. glXMakeCurrent(context, drawable1);
2. draw draw draw
3. glXSwapBuffers(context, drawable1);
4. glXMakeCurrent(context, drawable2); // drawable 1 loses all state!
5. glXWaitForSbcOML(dpy, drawable1, ...);

... would probably cause a hang of the client in glXWaitForSbcOML, as
the function requires information stored in the "original" drawable1
up to step 3, but lost in step 4 due to dri3_drawable destruction.

Patch 1 has a potentially large performance impact when switching
drawables on a given context, due to the enforced wait on swap completion,
but might save OML clients which do waits for sbc,msc on a separate thread,
whereas patch 2 doesn't have a performance impact, but doesn't even
partially solve trouble with OML_sync_control.

However, i'm totally out of time atm. and probably not the right person
to think about a better solution, and by dumb luck, my own application
doesn't recycle the same context for different drawables, but uses a
dedicated context for each drawable, so it dodges this bullet.

Therefore one of these patches is either a good enough fix for the KDE
hang problems atm. or a diagnosis of the problem as a starting point for
brighter people to deal with the root cause ;-)

Thanks,
-mario
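The five-step sequence in the original message can be mimicked with a toy model. This is a sketch under loud assumptions: `model_drawable` and the `model_*` functions are hypothetical and only imitate the bookkeeping the message describes (swap counts plus pending swaps); they are not the GLX or Mesa API, and the wait returns false where a real client would presumably block.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-drawable state, loosely following the message. */
struct model_drawable {
    uint64_t send_sbc; /* swaps issued so far */
    uint64_t recv_sbc; /* swaps known to have completed */
    unsigned pending;  /* swaps whose completion is still expected */
};

/* Step 3: issue a swap and return the sbc it will complete as. */
static uint64_t
model_swap(struct model_drawable *d)
{
    d->pending++;
    return ++d->send_sbc;
}

/* Step 4: the drawable is destroyed and recreated, so all state is lost. */
static void
model_recreate(struct model_drawable *d)
{
    memset(d, 0, sizeof(*d));
}

/* Step 5: wait until recv_sbc reaches target_sbc. Returns false when no
 * completion can ever arrive, i.e. where a real client would hang. */
static bool
model_wait_for_sbc(struct model_drawable *d, uint64_t target_sbc)
{
    while (d->recv_sbc < target_sbc) {
        if (d->pending == 0)
            return false;
        d->pending--; /* consume one completion event */
        d->recv_sbc++;
    }
    return true;
}
```

In this model, waiting for the sbc returned by `model_swap` succeeds as long as the drawable state survives, and fails once `model_recreate` has run in between, because the recreated drawable no longer knows about the pending swap.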