Re: [Dri-devel] Context teardown
Good advice! It looks like a double free in __driUtilUpdateDrawableInfo when freeing pdp->pClipRects. But at that point not only has pClipRects been freed, so has pdp! This happens (in the original code) with __driGarbageCollectDrawables being called before the driver's DestroyContext and the subsequent lock trying to operate on a drawable that doesn't exist. However, the context still has a copy of the drawable pointer, which now points to freed memory. So it looks like the first patch eliminates the problem in this case.

However, I wonder if the driver's DestroyBuffer should set the context's drawable pointer to NULL, since that's called by driDestroyDrawable right before the drawable is freed. The problem there is we'd then need to fix the lock function to handle the case where there is no drawable.

--Leif

On Wed, 5 Feb 2003, Brian Paul wrote:

> Keith Whitwell wrote:
>> Leif Delgass wrote:
>>> On Wed, 5 Feb 2003, Keith Whitwell wrote:
>>>> Ian Romanick wrote:
>>>>> Keith Whitwell wrote:
>>>>>> The other bug report I've had is triggered in similar circumstances, but goes into an infinite loop inside DRI_VALIDATE_DRAWABLE_INFO(), as a magic stamp value never gets updated because the X protocol message never succeeds -- but it doesn't segfault.
>>>>>>
>>>>>> I've got a patch that solves (I hope) that problem, but I'm not sure working around this is a good idea as it seems to result from maybe a double free somewhere...
>>>>>
>>>>> Yes. The light-05 test in viewperf shows this bug on r200. If you want to send me your patch, I can try it out.
>>>>
>>>> There are now two patches, one from Egbert Eich (who reported the problem). I haven't had time to look at his as it changes some deep, dark, dri stuff that I wasn't ever involved with, but looks sane nonetheless. His avoids the error reply from the X server, whereas mine copes with it once it arrives. I'm not sure either will help texobj which seems to be a malloc/free bug.
>>>>
>>>> I'm attaching both. I actually think applying *both* is the way to go.
>>>
>>> The reordering in driDestroyDrawable fixes the X error with texobj for me. I never got a segfault running texobj outside of gdb. I do remember seeing one once while debugging, but I can't recall how I got there and can't reproduce it. Where did you see the malloc problem?
>>
>> The segfault you report is inside malloc, but called from the X error handler. As the 2nd patch removes the error, you never get to malloc, but my guess is something is still screwy there. However, as you say, you only see this in gdb, so I don't know what that means...
>
> Someone could probably track down the malloc problem pretty quickly with ElectricFence or with libc's built-in memory debugger. From 'man malloc':
>
> Recent versions of Linux libc (later than 5.4.23) and GNU libc (2.x) include a malloc implementation which is tunable via environment variables. When MALLOC_CHECK_ is set, a special (less efficient) implementation is used which is designed to be tolerant against simple errors, such as double calls of free() with the same argument, or overruns of a single byte (off-by-one bugs). Not all such errors can be protected against, however, and memory leaks can result. If MALLOC_CHECK_ is set to 0, any detected heap corruption is silently ignored; if set to 1, a diagnostic is printed on stderr; if set to 2, abort() is called immediately. This can be useful because otherwise a crash may happen much later, and the true cause for the problem is then very hard to track down.
>
> -Brian

-- 
Leif Delgass
http://www.retinalburn.net

---
This SF.NET email is sponsored by:
SourceForge Enterprise Edition + IBM + LinuxWorld = Something 2 See!
http://www.vasoftware.com
___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel
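Leif's suggestion above (clear the context's drawable pointer when the drawable is freed, and make the lock path cope with a missing drawable) can be sketched roughly as follows. This is a minimal illustration only: `Drawable`, `Context`, `destroy_drawable` and `get_lock_checked` are hypothetical stand-ins, not the real DRI structures or driver entry points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the real DRI types. */
typedef struct Drawable {
    int num_clip_rects;
} Drawable;

typedef struct Context {
    Drawable *drawable;  /* NULL once the drawable has been destroyed */
    int validations;     /* how often drawable info was validated */
} Context;

/* Free the drawable and clear the context's copy of the pointer, so no
 * later code path can dereference freed memory through it. */
void destroy_drawable(Context *ctx)
{
    assert(ctx != NULL);
    free(ctx->drawable);
    ctx->drawable = NULL;
}

/* Lock path that tolerates a missing drawable.
 * Returns 1 if drawable info was validated, 0 if validation was skipped. */
int get_lock_checked(Context *ctx)
{
    assert(ctx != NULL);
    if (ctx->drawable == NULL)
        return 0;        /* nothing to validate: the window is gone */
    ctx->validations++;  /* stands in for the real validation work */
    return 1;
}
```

The point of the sketch is only the ordering: once the pointer is cleared at destruction time, taking the lock after the window is gone becomes a checked no-op rather than a use of freed memory.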
Re: [Dri-devel] Context teardown
Steven Paul Lilly wrote:
> Will all this stuff about context teardown fix the problem I'm having with my glut based program that always gives me
>
> X Error of failed request: BadValue (integer parameter out of range for operation)
>   Major opcode of failed request: 144 (XFree86-DRI)
>   Minor opcode of failed request: 9 ()
>   Value in failed request: 0x101
>   Serial number of failed request: 73
>   Current serial number in output stream: 73
>
> when I exit? I'm using X compiled from the dri trunk about a week ago. I don't see this when direct rendering is disabled.

Quite possibly -- try applying in particular the second (inline) patch from my earlier post.

Keith
Re: [Dri-devel] Context teardown
Will all this stuff about context teardown fix the problem I'm having with my glut based program that always gives me

X Error of failed request: BadValue (integer parameter out of range for operation)
  Major opcode of failed request: 144 (XFree86-DRI)
  Minor opcode of failed request: 9 ()
  Value in failed request: 0x101
  Serial number of failed request: 73
  Current serial number in output stream: 73

when I exit? I'm using X compiled from the dri trunk about a week ago. I don't see this when direct rendering is disabled.
Re: [Dri-devel] Context teardown
Keith Whitwell wrote:
> Leif Delgass wrote:
>> On Wed, 5 Feb 2003, Keith Whitwell wrote:
>>> Ian Romanick wrote:
>>>> Keith Whitwell wrote:
>>>>> The other bug report I've had is triggered in similar circumstances, but goes into an infinite loop inside DRI_VALIDATE_DRAWABLE_INFO(), as a magic stamp value never gets updated because the X protocol message never succeeds -- but it doesn't segfault.
>>>>>
>>>>> I've got a patch that solves (I hope) that problem, but I'm not sure working around this is a good idea as it seems to result from maybe a double free somewhere...
>>>>
>>>> Yes. The light-05 test in viewperf shows this bug on r200. If you want to send me your patch, I can try it out.
>>>
>>> There are now two patches, one from Egbert Eich (who reported the problem). I haven't had time to look at his as it changes some deep, dark, dri stuff that I wasn't ever involved with, but looks sane nonetheless. His avoids the error reply from the X server, whereas mine copes with it once it arrives. I'm not sure either will help texobj which seems to be a malloc/free bug.
>>>
>>> I'm attaching both. I actually think applying *both* is the way to go.
>>
>> The reordering in driDestroyDrawable fixes the X error with texobj for me. I never got a segfault running texobj outside of gdb. I do remember seeing one once while debugging, but I can't recall how I got there and can't reproduce it. Where did you see the malloc problem?
>
> The segfault you report is inside malloc, but called from the X error handler. As the 2nd patch removes the error, you never get to malloc, but my guess is something is still screwy there. However, as you say, you only see this in gdb, so I don't know what that means...

Someone could probably track down the malloc problem pretty quickly with ElectricFence or with libc's built-in memory debugger. From 'man malloc':

    Recent versions of Linux libc (later than 5.4.23) and GNU libc (2.x) include a malloc implementation which is tunable via environment variables. When MALLOC_CHECK_ is set, a special (less efficient) implementation is used which is designed to be tolerant against simple errors, such as double calls of free() with the same argument, or overruns of a single byte (off-by-one bugs). Not all such errors can be protected against, however, and memory leaks can result. If MALLOC_CHECK_ is set to 0, any detected heap corruption is silently ignored; if set to 1, a diagnostic is printed on stderr; if set to 2, abort() is called immediately. This can be useful because otherwise a crash may happen much later, and the true cause for the problem is then very hard to track down.

-Brian
Re: [Dri-devel] Context teardown
Leif Delgass wrote:
> On Wed, 5 Feb 2003, Keith Whitwell wrote:
>> Ian Romanick wrote:
>>> Keith Whitwell wrote:
>>>> The other bug report I've had is triggered in similar circumstances, but goes into an infinite loop inside DRI_VALIDATE_DRAWABLE_INFO(), as a magic stamp value never gets updated because the X protocol message never succeeds -- but it doesn't segfault.
>>>>
>>>> I've got a patch that solves (I hope) that problem, but I'm not sure working around this is a good idea as it seems to result from maybe a double free somewhere...
>>>
>>> Yes. The light-05 test in viewperf shows this bug on r200. If you want to send me your patch, I can try it out.
>>
>> There are now two patches, one from Egbert Eich (who reported the problem). I haven't had time to look at his as it changes some deep, dark, dri stuff that I wasn't ever involved with, but looks sane nonetheless. His avoids the error reply from the X server, whereas mine copes with it once it arrives. I'm not sure either will help texobj which seems to be a malloc/free bug.
>>
>> I'm attaching both. I actually think applying *both* is the way to go.
>
> The reordering in driDestroyDrawable fixes the X error with texobj for me. I never got a segfault running texobj outside of gdb. I do remember seeing one once while debugging, but I can't recall how I got there and can't reproduce it. Where did you see the malloc problem?

The segfault you report is inside malloc, but called from the X error handler. As the 2nd patch removes the error, you never get to malloc, but my guess is something is still screwy there. However, as you say, you only see this in gdb, so I don't know what that means...

Anyway, it sounds like it's worthwhile committing these.

Keith
Re: [Dri-devel] Context teardown
On Wed, 5 Feb 2003, Keith Whitwell wrote:

> Ian Romanick wrote:
>> Keith Whitwell wrote:
>>> The other bug report I've had is triggered in similar circumstances, but goes into an infinite loop inside DRI_VALIDATE_DRAWABLE_INFO(), as a magic stamp value never gets updated because the X protocol message never succeeds -- but it doesn't segfault.
>>>
>>> I've got a patch that solves (I hope) that problem, but I'm not sure working around this is a good idea as it seems to result from maybe a double free somewhere...
>>
>> Yes. The light-05 test in viewperf shows this bug on r200. If you want to send me your patch, I can try it out.
>
> There are now two patches, one from Egbert Eich (who reported the problem). I haven't had time to look at his as it changes some deep, dark, dri stuff that I wasn't ever involved with, but looks sane nonetheless. His avoids the error reply from the X server, whereas mine copes with it once it arrives. I'm not sure either will help texobj which seems to be a malloc/free bug.
>
> I'm attaching both. I actually think applying *both* is the way to go.

The reordering in driDestroyDrawable fixes the X error with texobj for me. I never got a segfault running texobj outside of gdb. I do remember seeing one once while debugging, but I can't recall how I got there and can't reproduce it. Where did you see the malloc problem?

--Leif
Re: [Dri-devel] Context teardown
> There are now two patches, one from Egbert Eich (who reported the problem). I haven't had time to look at his as it changes some deep, dark, dri stuff that I wasn't ever involved with, but looks sane nonetheless. His avoids the error reply from the X server, whereas mine copes with it once it arrives. I'm not sure either will help texobj which seems to be a malloc/free bug.
>
> I'm attaching both. I actually think applying *both* is the way to go.

I didn't yet understand what this should buy me with a Radeon7500, but at least I have the impression that these patches don't do any harm to me when running my beloved FlightGear,

Martin.
-- 
Unix _IS_ user friendly - it's just selective about who its friends are !
--
Re: [Dri-devel] Context teardown
Keith Whitwell wrote:
> Ian Romanick wrote:
>> Keith Whitwell wrote:
>>> The other bug report I've had is triggered in similar circumstances, but goes into an infinite loop inside DRI_VALIDATE_DRAWABLE_INFO(), as a magic stamp value never gets updated because the X protocol message never succeeds -- but it doesn't segfault.
>>>
>>> I've got a patch that solves (I hope) that problem, but I'm not sure working around this is a good idea as it seems to result from maybe a double free somewhere...
>>
>> Yes. The light-05 test in viewperf shows this bug on r200. If you want to send me your patch, I can try it out.
>
> There are now two patches, one from Egbert Eich (who reported the problem). I haven't had time to look at his as it changes some deep, dark, dri stuff that I wasn't ever involved with, but looks sane nonetheless. His avoids the error reply from the X server, whereas mine copes with it once it arrives. I'm not sure either will help texobj which seems to be a malloc/free bug.
>
> I'm attaching both. I actually think applying *both* is the way to go.

Both those patches look reasonable. Even better, they make the light-05 problems go away. :) ugs-01 still tanks, but that seems to be related to the displaylist memory usage bug mentioned earlier.
Re: [Dri-devel] Context teardown
Ian Romanick wrote:
> Keith Whitwell wrote:
>> The other bug report I've had is triggered in similar circumstances, but goes into an infinite loop inside DRI_VALIDATE_DRAWABLE_INFO(), as a magic stamp value never gets updated because the X protocol message never succeeds -- but it doesn't segfault.
>>
>> I've got a patch that solves (I hope) that problem, but I'm not sure working around this is a good idea as it seems to result from maybe a double free somewhere...
>
> Yes. The light-05 test in viewperf shows this bug on r200. If you want to send me your patch, I can try it out.

There are now two patches, one from Egbert Eich (who reported the problem). I haven't had time to look at his as it changes some deep, dark, dri stuff that I wasn't ever involved with, but looks sane nonetheless. His avoids the error reply from the X server, whereas mine copes with it once it arrives. I'm not sure either will help texobj which seems to be a malloc/free bug.

I'm attaching both. I actually think applying *both* is the way to go.

Keith

--- dri_util.c 2002/11/25 14:26:55 1.1.1.3
+++ dri_util.c 2003/02/05 10:17:40
@@ -758,8 +762,8 @@
             psp->fullscreen = NULL;
         }
     }
-    __driGarbageCollectDrawables(pcp->driScreenPriv->drawHash);
     (*pcp->driScreenPriv->DriverAPI.DestroyContext)(pcp);
+    __driGarbageCollectDrawables(pcp->driScreenPriv->drawHash);
     (void)XF86DRIDestroyContext(dpy, scrn, pcp->contextID);
     Xfree(pcp);
 }

Warning: Remote host denied X11 forwarding.
Index: dri_util.c
===
RCS file: /cvsroot/dri/xc/xc/lib/GL/dri/dri_util.c,v
retrieving revision 1.6
diff -u -r1.6 dri_util.c
--- dri_util.c 28 Nov 2002 19:16:45 - 1.6
+++ dri_util.c 4 Feb 2003 23:06:41 -
@@ -618,16 +618,20 @@
             &pdp->numBackClipRects,
             &pdp->pBackClipRects )) {
+        /* Error -- eg the window may have been destroyed. Keep going
+         * with no cliprects.
+         */
+        pdp->pStamp = &pdp->lastStamp; /* prevent endless loop */
         pdp->numClipRects = 0;
         pdp->pClipRects = NULL;
         pdp->numBackClipRects = 0;
         pdp->pBackClipRects = 0;
-        /* ERROR!!! */
     }
+    else
+        pdp->pStamp = &(psp->pSAREA->drawableTable[pdp->index].stamp);
     DRM_SPINLOCK(&psp->pSAREA->drawable_lock, psp->drawLockID);
-    pdp->pStamp = &(psp->pSAREA->drawableTable[pdp->index].stamp);
 }
 /*/
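To see why pointing pStamp at lastStamp stops the endless loop, here is a deliberately simplified model of a "retry until the stamp matches" loop. It is a sketch under assumptions: only the field names `pStamp` and `lastStamp` come from the patch; the `Drawable` struct, `update_failed`, `validate`, and the iteration cap are made up for illustration and are not the real DRI_VALIDATE_DRAWABLE_INFO macro.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model; only pStamp and lastStamp mirror names in the patch. */
typedef struct {
    unsigned *pStamp;    /* where the current stamp is read from */
    unsigned lastStamp;  /* stamp recorded at the last successful update */
} Drawable;

/* Simulates an update whose X request fails, so lastStamp is never
 * refreshed. With apply_fix set, it does what the patch does on the
 * error path: aim pStamp at lastStamp so the two compare equal. */
void update_failed(Drawable *d, int apply_fix)
{
    if (apply_fix)
        d->pStamp = &d->lastStamp;
}

/* Retry until the stamp matches. max_iters is a safety cap added for
 * this sketch so the unfixed case terminates. Returns the number of
 * iterations performed. */
int validate(Drawable *d, int apply_fix, int max_iters)
{
    int iters = 0;
    assert(d != NULL && d->pStamp != NULL);
    while (*d->pStamp != d->lastStamp && iters < max_iters) {
        update_failed(d, apply_fix);
        iters++;
    }
    return iters;
}
```

Without the fix, the loop in this model only stops because of the cap; with it, the loop exits after a single failed update.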
Re: [Dri-devel] Context teardown
Keith Whitwell wrote:
> The other bug report I've had is triggered in similar circumstances, but goes into an infinite loop inside DRI_VALIDATE_DRAWABLE_INFO(), as a magic stamp value never gets updated because the X protocol message never succeeds -- but it doesn't segfault.
>
> I've got a patch that solves (I hope) that problem, but I'm not sure working around this is a good idea as it seems to result from maybe a double free somewhere...

Yes. The light-05 test in viewperf shows this bug on r200. If you want to send me your patch, I can try it out.
Re: [Dri-devel] Context teardown
Leif Delgass wrote:
> On Tue, 4 Feb 2003, Keith Whitwell wrote:
>>> Yes, I ran into this too when the DMA buffer is flushed in radeonDestroyContext. I had tracked it down to the DRI_VALIDATE_DRAWABLE macro in the lock function, so that makes sense. Where is the drawable destroyed?
>>
>> That's the one. I haven't looked at it deeply yet (which app did you see this with?). I assume it gets destroyed in a previous call to XDestroyWindow(), which the dri doesn't know anything about.
>>
>> Keith
>
> glutDestroyWindow does indeed call XDestroyWindow before glXDestroyContext. I noticed that glxgears does it the other way around, and doesn't produce the X error. As a workaround, I tried adding this to radeonGetLock and I didn't get any errors:
>
>     if ( dPriv->refcount > 0 )
>         DRI_VALIDATE_DRAWABLE_INFO( sPriv, dPriv );

I think the segfault is more disturbing -- it happens in a malloc, so it looks like the memory manager is getting screwed up somewhere along the line.

The other bug report I've had is triggered in similar circumstances, but goes into an infinite loop inside DRI_VALIDATE_DRAWABLE_INFO(), as a magic stamp value never gets updated because the X protocol message never succeeds -- but it doesn't segfault.

I've got a patch that solves (I hope) that problem, but I'm not sure working around this is a good idea as it seems to result from maybe a double free somewhere...

Keith
Re: [Dri-devel] Context teardown
On Tue, 4 Feb 2003, Keith Whitwell wrote:

>> Yes, I ran into this too when the DMA buffer is flushed in radeonDestroyContext. I had tracked it down to the DRI_VALIDATE_DRAWABLE macro in the lock function, so that makes sense. Where is the drawable destroyed?
>
> That's the one. I haven't looked at it deeply yet (which app did you see this with?). I assume it gets destroyed in a previous call to XDestroyWindow(), which the dri doesn't know anything about.
>
> Keith

glutDestroyWindow does indeed call XDestroyWindow before glXDestroyContext. I noticed that glxgears does it the other way around, and doesn't produce the X error. As a workaround, I tried adding this to radeonGetLock and I didn't get any errors:

    if ( dPriv->refcount > 0 )
        DRI_VALIDATE_DRAWABLE_INFO( sPriv, dPriv );

Maybe we could just return at that point if refcount < 1, rather than going on to deal with cliprects and texture aging. Is that needed before the command buffer gets flushed? Do you think this could be a valid fix? I'm not sure exactly what refcount is counting, but from my quick test with glxgears and texobj it appears to be 1 when the window still exists and zero when it doesn't. However, I don't know if we can rely on this being up to date.

Interesting also is that glut doesn't seem to call XCloseDisplay, as radeonDestroyScreen doesn't get called. glxgears calls XCloseDisplay and that seems to call radeonDestroyScreen (it doesn't happen in XDestroyWindow).

-- 
Leif Delgass
http://www.retinalburn.net
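The "just return if refcount < 1" variant floated above might look something like the sketch below. Everything here except the `refcount` field name is an assumption for illustration: `DrawablePriv`, `LockStats` and `get_lock_refcounted` are hypothetical, and the counters merely stand in for the validation, cliprect and texture-aging work done in the real lock function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the refcount field name is from the thread. */
typedef struct {
    int refcount;  /* observed as 1 while the window exists, 0 afterwards */
} DrawablePriv;

typedef struct {
    int validations;       /* stands in for DRI_VALIDATE_DRAWABLE_INFO */
    int cliprect_updates;  /* stands in for cliprect/texture-age handling */
} LockStats;

/* Early-return variant: when the drawable is no longer referenced, skip
 * both validation and the drawable-dependent work that follows it. */
void get_lock_refcounted(const DrawablePriv *dPriv, LockStats *stats)
{
    assert(dPriv != NULL && stats != NULL);
    if (dPriv->refcount < 1)
        return;
    stats->validations++;
    stats->cliprect_updates++;
}
```

As the message itself notes, this only helps if refcount is reliably up to date at the time the lock is taken, which the thread leaves open.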
Re: [Dri-devel] Context teardown
Yes, it seems to segfault while trying to deal with the error returned from the X server. It's correct that there should be an error, but it's not clear why the segfault occurs...

Keith

Leif Delgass wrote:
> It was with the texobj Mesa demo, which appears to call glDeleteTextures and then glutDestroyWindow on ESC. I haven't looked at the implementation of glutDestroyWindow yet.
>
> On Tue, 4 Feb 2003, Keith Whitwell wrote:
>>> Yes, I ran into this too when the DMA buffer is flushed in radeonDestroyContext. I had tracked it down to the DRI_VALIDATE_DRAWABLE macro in the lock function, so that makes sense. Where is the drawable destroyed?
>>
>> That's the one. I haven't looked at it deeply yet (which app did you see this with?). I assume it gets destroyed in a previous call to XDestroyWindow(), which the dri doesn't know anything about.
>>
>> Keith
Re: [Dri-devel] Context teardown
It was with the texobj Mesa demo, which appears to call glDeleteTextures and then glutDestroyWindow on ESC. I haven't looked at the implementation of glutDestroyWindow yet.

On Tue, 4 Feb 2003, Keith Whitwell wrote:

>> Yes, I ran into this too when the DMA buffer is flushed in radeonDestroyContext. I had tracked it down to the DRI_VALIDATE_DRAWABLE macro in the lock function, so that makes sense. Where is the drawable destroyed?
>
> That's the one. I haven't looked at it deeply yet (which app did you see this with?). I assume it gets destroyed in a previous call to XDestroyWindow(), which the dri doesn't know anything about.
>
> Keith

-- 
Leif Delgass
http://www.retinalburn.net
Re: [Dri-devel] Context teardown
> Yes, I ran into this too when the DMA buffer is flushed in radeonDestroyContext. I had tracked it down to the DRI_VALIDATE_DRAWABLE macro in the lock function, so that makes sense. Where is the drawable destroyed?

That's the one. I haven't looked at it deeply yet (which app did you see this with?). I assume it gets destroyed in a previous call to XDestroyWindow(), which the dri doesn't know anything about.

Keith
Re: [Dri-devel] Context teardown
Leif Delgass wrote:

I investigated why radeonDestroyContext wasn't being called for many of the Mesa demos. It turns out that only a few of the demos actually destroy the window and/or context before exit()-ing on a key press event. So if a glut app doesn't call glutDestroyWindow() or a glx app doesn't call glXDestroyContext and XDestroyWindow/XCloseDisplay then the Mesa client driver's DestroyContext/DestroyScreen never get called. This is also the case if the app is killed by a signal. So I guess we can't assume that these functions will be called, meaning that trying to clean up state in the SAREA (e.g. global texture regions) or flushing remaining buffers from those functions can't necessarily be relied on.

The kernel modules have hooks for cleaning things up on client exit -- the 'release' method of the file descriptor.

Also, it appears that the DRM lock is _not_ held on entering the driver's DestroyContext. I don't think this is really a problem for the current drivers, but some of my assumptions were wrong so I thought I'd point this out in case anyone else was operating under the same assumptions. ;)

Yes, I've got a report of a similar problem where the driver tries to grab a lock after destroying the drawable. Strictly this is allowable, but our lock-grabbing function does some stuff that depends on having a drawable handy.

Keith
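The point about cleanup on client exit can be illustrated with a small user-space analogy: resources are reclaimed by a release hook tied to the descriptor going away, not by a destroy call the client may never make. This is only a model under assumptions. `Client`, `client_open`, `client_take_region`, `client_release` and `regions_in_use` are hypothetical names, and none of this is actual DRM kernel code.

```c
#include <assert.h>

#define MAX_CLIENTS 4

/* Hypothetical per-client bookkeeping; not a real DRM structure. */
typedef struct {
    int open;          /* nonzero while the client holds the device open */
    int regions_held;  /* e.g. shared texture regions owned by the client */
} Client;

static Client clients[MAX_CLIENTS];

/* Open a client slot; returns its index, or -1 if the table is full. */
int client_open(void)
{
    for (int i = 0; i < MAX_CLIENTS; i++) {
        if (!clients[i].open) {
            clients[i].open = 1;
            clients[i].regions_held = 0;
            return i;
        }
    }
    return -1;
}

/* The client acquires one shared resource. */
void client_take_region(int id)
{
    assert(id >= 0 && id < MAX_CLIENTS && clients[id].open);
    clients[id].regions_held++;
}

/* Release hook: in this model it runs whenever the descriptor is closed,
 * including when the client exits without any explicit teardown. */
void client_release(int id)
{
    assert(id >= 0 && id < MAX_CLIENTS);
    clients[id].regions_held = 0;
    clients[id].open = 0;
}

/* Total resources still held across all clients. */
int regions_in_use(void)
{
    int total = 0;
    for (int i = 0; i < MAX_CLIENTS; i++)
        total += clients[i].regions_held;
    return total;
}
```

In the model, a client that takes regions and then simply disappears still leaves nothing behind, because reclamation does not depend on the client-side DestroyContext or DestroyScreen path ever running.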