Re: [Pharo-dev] [Moose-dev] Re: Too frequent crashes :-(
Thanks Peter for your email.

Alexandre

> On Dec 18, 2016, at 10:10 PM, Peter Uhnak wrote:
>
> Hi Alex,
>
> I certainly understand your frustration. I felt it too on Windows, to the
> point where I stopped using Pharo for a couple of weeks out of rage, and then
> spent at least 40 hours in total digging into the VM until I added a usable
> workaround for Windows. Not to mention how frustrating the fixing itself was.
>
> Obviously on Mac there are still some unresolved pathways, but it will not
> magically fix itself.
>
> The reason for the crash (bad object pinning and moving of canvas memory) has
> been known for some time now, so more crash.dmps do not give any more insight.
>
> If this is to be resolved, then one of two things has to happen:
>
> A) Someone who really understands VM/image memory management / GC / pinning
> fixes the issue.
>
> B) Someone with a Mac (which I don't have) digs around the BitBlt code (or
> wherever it was) and adds a similar workaround.
>
> Considering the issue emerged more than a year ago (Spur switch), I don't
> think (A) is going to happen any time soon, so I guess the only chance is to
> get your elbows greasy and fix it yourself (B) (or you make one of your
> students suffer :)).
>
> I haven't had a single Roassal/BitBlt related crash on Windows since my fix
> was added, so there should be a way to add a workaround for Mac too.
>
> (There are of course crashes related to FT, but the story is arguably the
> same.)
>
> TL;DR: I don't think it's on the todo list of anyone who actually understands
> this, so the only way to get it fixed is to do it yourself or to find a
> specific person who would be willing to dig into it.
>
> Peter

--
_,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
Alexandre Bergel  http://www.bergel.eu
^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.
Re: [Pharo-dev] [Moose-dev] Re: Too frequent crashes :-(
Hi Peter,

I can understand your frustration. I would really like to see what I can do to help, but I do not know, except showing your mail to the VM guys.
Thanks for having sent it!

Stef

On Mon, Dec 19, 2016 at 2:03 PM, Peter Uhnak wrote:

> On Mon, Dec 19, 2016 at 08:12:29AM +0800, Ben Coman wrote:
> > Can you point to where you added your workaround?
>
> The fix is a single line, because I hate myself.
>
>     interpreterProxy failed ifTrue:[^nil].
>
> https://github.com/pharo-project/pharo-vm/commit/9bf66cf656b176d988e1b0ba74fc37da467e6192
>
> To give you more info:
>
> The problem is that the memory of canvas forms is not properly pinned, so
> the form can be moved during garbage collection; if the canvas form is being
> updated while it is moved, you end up accessing the wrong memory -> crash.
>
> My fix returns prematurely if an error occurs and throws PrimitiveFailed in
> the image before any wrong memory is accessed. On the Roassal side the
> PrimitiveFailed is caught and a paint cycle is skipped; this is good enough,
> as it results only in an occasional flicker that immediately fixes itself
> instead of crashing the image.
>
> It seems that on Mac there are also other places in the BitBlt code where
> the surface is being accessed without a check.
>
> Also be careful not to be misled by the crash dump stack. It took me quite
> a while to figure out that GrafPort is already operating on wrong data, so
> it's not GrafPort's fault, but BitBlt's; of course both should possibly be
> investigated with respect to the Mac crash.
>
> Final note: personally I found it much easier to debug and manipulate the
> resulting C code (and recompile just that) than to modify the Slang code,
> regenerate the source code and recompile it all (but again, I don't know
> what the proper way to work with the VM code is).
>
> I used this script to trigger the crash:
> https://gist.github.com/peteruhnak/024650ed2594301558df4da913549b54
>
> As the crash depends on memory consumption and a "proper" garbage collection
> cycle, it wasn't the easiest to reproduce; however, the script above usually
> managed to crash it. Having a more reliable way would be nice, but simply
> triggering a GC (or even a full GC) wasn't enough, because the memory wasn't
> in the "right" state.
>
> Peter
Re: [Pharo-dev] [Moose-dev] Re: Too frequent crashes :-(
On Mon, Dec 19, 2016 at 08:12:29AM +0800, Ben Coman wrote:
> Can you point to where you added your workaround?

The fix is a single line, because I hate myself.

    interpreterProxy failed ifTrue:[^nil].

https://github.com/pharo-project/pharo-vm/commit/9bf66cf656b176d988e1b0ba74fc37da467e6192

To give you more info:

The problem is that the memory of canvas forms is not properly pinned, so the form can be moved during garbage collection; if the canvas form is being updated while it is moved, you end up accessing the wrong memory -> crash.

My fix returns prematurely if an error occurs and throws PrimitiveFailed in the image before any wrong memory is accessed. On the Roassal side the PrimitiveFailed is caught and a paint cycle is skipped; this is good enough, as it results only in an occasional flicker that immediately fixes itself instead of crashing the image.

It seems that on Mac there are also other places in the BitBlt code where the surface is being accessed without a check.

Also be careful not to be misled by the crash dump stack. It took me quite a while to figure out that GrafPort is already operating on wrong data, so it's not GrafPort's fault, but BitBlt's; of course both should possibly be investigated with respect to the Mac crash.

Final note: personally I found it much easier to debug and manipulate the resulting C code (and recompile just that) than to modify the Slang code, regenerate the source code and recompile it all (but again, I don't know what the proper way to work with the VM code is).

I used this script to trigger the crash:
https://gist.github.com/peteruhnak/024650ed2594301558df4da913549b54

As the crash depends on memory consumption and a "proper" garbage collection cycle, it wasn't the easiest to reproduce; however, the script above usually managed to crash it. Having a more reliable way would be nice, but simply triggering a GC (or even a full GC) wasn't enough, because the memory wasn't in the "right" state.

Peter
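For readers who want to see the shape of the workaround described above, here is a minimal C sketch of the pattern: check the failure flag right after the step that can fail, bail out before touching any memory, and let the caller skip that paint cycle. This is not the actual BitBlt plugin code. `Proxy`, `Surface`, `load_surface_bits`, `fill_primitive` and `paint_frames` are hypothetical names invented for the illustration, and a NULL pointer merely stands in for "the form's memory could not be safely resolved".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the VM's interpreterProxy: only a failure flag. */
typedef struct {
    bool failed;
} Proxy;

/* Hypothetical surface. bits == NULL models memory that cannot be used safely. */
typedef struct {
    unsigned char *bits;
    size_t size;
} Surface;

/* Models a lookup that can fail: it sets the flag instead of handing out memory. */
unsigned char *load_surface_bits(Proxy *proxy, const Surface *surface) {
    if (surface->bits == NULL) {
        proxy->failed = true;
        return NULL;
    }
    return surface->bits;
}

/* Models the primitive. Returns true on success, false if it bailed out early. */
bool fill_primitive(Proxy *proxy, Surface *surface, unsigned char value) {
    assert(proxy != NULL && surface != NULL);
    unsigned char *bits = load_surface_bits(proxy, surface);
    /* Same idea as: interpreterProxy failed ifTrue:[^nil]. */
    if (proxy->failed) {
        return false; /* nothing has been written yet */
    }
    for (size_t i = 0; i < surface->size; i++) {
        bits[i] = value;
    }
    return true;
}

/* Models the image side: a failed primitive just means this cycle is skipped.
   Returns the number of frames actually painted. */
int paint_frames(Surface *frames[], int count, unsigned char value) {
    int painted = 0;
    for (int i = 0; i < count; i++) {
        Proxy proxy = { false }; /* each call starts with a clean flag */
        if (fill_primitive(&proxy, frames[i], value)) {
            painted++;
        }
        /* else: skip this cycle; a later cycle repaints */
    }
    return painted;
}
```

Calling `paint_frames` on a mix of usable and unusable surfaces paints only the usable ones and keeps going, which corresponds to the occasional flicker instead of a crash.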
Re: [Pharo-dev] [Moose-dev] Re: Too frequent crashes :-(
Hi Alex,

I certainly understand your frustration. I felt it too on Windows, to the point where I stopped using Pharo for a couple of weeks out of rage, and then spent at least 40 hours in total digging into the VM until I added a usable workaround for Windows. Not to mention how frustrating the fixing itself was.

Obviously on Mac there are still some unresolved pathways, but it will not magically fix itself.

The reason for the crash (bad object pinning and moving of canvas memory) has been known for some time now, so more crash.dmps do not give any more insight.

If this is to be resolved, then one of two things has to happen:

A) Someone who really understands VM/image memory management / GC / pinning fixes the issue.

B) Someone with a Mac (which I don't have) digs around the BitBlt code (or wherever it was) and adds a similar workaround.

Considering the issue emerged more than a year ago (Spur switch), I don't think (A) is going to happen any time soon, so I guess the only chance is to get your elbows greasy and fix it yourself (B) (or you make one of your students suffer :)).

I haven't had a single Roassal/BitBlt related crash on Windows since my fix was added, so there should be a way to add a workaround for Mac too.

(There are of course crashes related to FT, but the story is arguably the same.)

TL;DR: I don't think it's on the todo list of anyone who actually understands this, so the only way to get it fixed is to do it yourself or to find a specific person who would be willing to dig into it.

Peter