Bug#986176: openuniverse runs with crippled GUI, then crashes.
On 5/26/21 8:01 PM, Bernhard Übelacker wrote:

>> Testing in a VM with a more reasonable 6GB apparently does not provoke
>> the crash.
>
> I fear the issue might also be specific to the graphics library
> because the crash happens in nouveau_dri.so.
> Therefore a VM might not show this issue.

I defer to your expertise. I'm not really familiar with how the graphics libraries and graphics drivers work with the software, so you're way more likely to be right.

>> ... and openuniverse seems to expand to fill available space.
>
> That would be a memory leak I guess.
> Then the backtrace would be really not that interesting.

??? I can't reproduce that now. It still crashes after a while but it doesn't expand to fill available space. Did something else get fixed that might affect it?

>> ... but checking screenshots of it online I see many UI elements that
>> simply are not present when I start it.
>
> I guess the gui needs a libglui, which is not "yet"
> packaged for debian (see #801858).

I think that's a showstopper bug for OpenUniverse. It renders the program useless, even if it weren't crashing. I'm surprised to find it outside of the "Sid" distribution if it doesn't have that.

> But while writing this email, I got my hands on a nouveau capable laptop.
> There I found openuniverse also crashing if I leave it some time alone,
> at the very exact instruction [1].

Oh good. I mean, not good that it's crashing, but good that the crash can be reliably reproduced outside of my peculiar configuration. If other people are seeing it too, it means I haven't done something that puts my machine into a failure-prone configuration that I'll never figure out.

> I tried to record with rr, but this forces the driver to software mode,
> therefore the issue then does not show up.

Hum, that's interesting. Now I need to go read man pages.

Thank you for looking into it.

Bear
Bug#986176: openuniverse runs with crippled GUI, then crashes.
Hello Ray,

> Warning, a coredump from this system would be immense. Or, well anyway
> pretty darn large.

systemd-coredump should limit the core to 2G. And as a first target, the journal output might have a backtrace from which one could start looking.

Maybe running openuniverse with a memory limit produces the same error in dmesg?

    systemd-run --user --scope -p MemoryMax=2G openuniverse

It would also be possible to tell the kernel to just use a certain amount of RAM by adding e.g. "mem=2G" to the kernel parameters. But this would require a reboot of the system.

> Testing in a VM with a more reasonable 6GB apparently does not provoke
> the crash.

I fear the issue might also be specific to the graphics library because the crash happens in nouveau_dri.so. Therefore a VM might not show this issue.

> ... and openuniverse seems to expand to fill available space.

That would be a memory leak I guess. Then the backtrace would be really not that interesting.

> ... but checking screenshots of it online I see many UI elements that
> simply are not present when I start it.

I guess the gui needs a libglui, which is not "yet" packaged for debian (see #801858).

If the issue might be related to the usage of multiple threads, the risk that the issue gets triggered might be lowered by running openuniverse just on a single CPU core:

    taskset 0x0001 openuniverse

##

But while writing this email, I got my hands on a nouveau capable laptop. There I found openuniverse also crashing if I leave it some time alone, at the very exact instruction [1].

I could not see an excessive memory usage - htop shows 0.7% usage of 7.66G. So I can't currently see a connection between the available RAM size and this issue.

I tried to record with rr, but this forces the driver to software mode, therefore the issue then does not show up.

Also running with valgrind does not crash nor show something obvious.

Kind regards,
Bernhard


[1]
(gdb) bt
#0  0x7fc3fc635d63 in create_cache_trans (st=0x556dd8391f80) at ../src/mesa/state_tracker/st_cb_bitmap.c:402
#1  accum_bitmap (bitmap=0x7fc3ff07fcf1 "", unpack=0x7fc3f4201ad8, height=14, width=7, y=441, x=0, ctx=0x7fc3f41cf010) at ../src/mesa/state_tracker/st_cb_bitmap.c:516
#2  st_Bitmap (ctx=0x7fc3f41cf010, x=0, y=441, width=7, height=14, unpack=0x7fc3f4201ad8, bitmap=0x7fc3ff07fcf1 "") at ../src/mesa/state_tracker/st_cb_bitmap.c:621
#3  0x7fc3fc8c167e in _mesa_Bitmap (width=7, height=14, xorig=, yorig=3, xmove=7, ymove=0, bitmap=0x7fc3ff07fcf1 "") at ../src/mesa/main/drawpix.c:357
#4  0x7fc3ff066830 in glutBitmapCharacter (fontID=0x556dd6aba740 , character=) at freeglut_font.c:122
#5  0x556dd6aa09ec in glutprintstring (x=, y=, z=, string=) at font.cpp:76
#6  glutprintstring (string=0x7fff4ffb0400 "Body distance from Sun (Km): 151595991.59", z=0, y=, x=0) at font.cpp:67
#7  printstring (x=x@entry=0, y=, z=z@entry=0, string=string@entry=0x7fff4ffb0400 "Body distance from Sun (Km): 151595991.59") at font.cpp:86
#8  0x556dd6a95150 in OnScreenInfo () at info.cpp:211
#9  0x556dd6a9f028 in Display () at ou.cpp:517
#10 0x7fc3ff06ed83 in fghRedrawWindow (window=0x556dd82bad20) at freeglut_main.c:231
#11 fghcbDisplayWindow (window=0x556dd82bad20, enumerator=0x7fff4ffb0570) at freeglut_main.c:248
#12 0x7fc3ff072619 in fgEnumWindows (enumCallback=enumCallback@entry=0x7fc3ff06ed10 , enumerator=enumerator@entry=0x7fff4ffb0570) at freeglut_structure.c:396
#13 0x7fc3ff06f2fb in fghDisplayAll () at freeglut_main.c:271
#14 glutMainLoopEvent () at freeglut_main.c:1523
#15 0x7fc3ff06fc0b in glutMainLoop () at freeglut_main.c:1571
#16 0x556dd6a85c3d in main (argc=, argv=0x7fff4ffb08a8) at ou.cpp:572
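
A rough way to cross-check the "very exact instruction" observation: the sketch below parses the kernel segfault line from the original report in this thread, computes where the instruction pointer sits inside the reported nouveau_dri.so mapping, and compares its in-page offset with frame #0 of backtrace [1]. The helper names (`parse_segfault`, `same_page_offset`) are made up for illustration, and 4 KiB pages are an assumption. Matching page offsets are only consistent with the same instruction, not proof of it, and the computed offset is relative to the mapping the kernel reports, which need not equal the file offset.

```python
import re

# Kernel log line copied from the original report in this thread.
SEGFAULT_LINE = (
    "openuniverse[242638]: segfault at 20 ip 7f86f454ad63 sp 7ffefd7050a0 "
    "error 4 in nouveau_dri.so[7f86f4517000+d46000]"
)

# All numeric fields on this line are printed by the kernel in hexadecimal.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

PAGE_MASK = 0xFFF  # assumption: 4 KiB pages


def parse_segfault(line):
    """Hypothetical helper: split a segfault line into integer fields."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        raise ValueError("not a kernel segfault line")
    info = {key: int(match.group(key), 16)
            for key in ("addr", "ip", "sp", "error", "base", "size")}
    info["lib"] = match.group("lib")
    # Position of the faulting instruction inside the reported mapping.
    info["offset"] = info["ip"] - info["base"]
    return info


def same_page_offset(a, b):
    """True if two addresses share the same offset within a page."""
    return (a & PAGE_MASK) == (b & PAGE_MASK)


info = parse_segfault(SEGFAULT_LINE)
LAPTOP_FRAME0 = 0x7FC3FC635D63  # frame #0 of backtrace [1]

print(hex(info["offset"]))                           # 0x33d63
print(same_page_offset(info["ip"], LAPTOP_FRAME0))   # True
```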
Bug#986176: openuniverse runs with crippled GUI, then crashes.
On Wed, 14 Apr 2021 14:04:33 + Ray Dillinger wrote:

>
> Warning, a coredump from this system would be immense. Or, well anyway
> pretty darn large. The machine has over 64G of RAM memory installed and
> openuniverse seems to expand to fill available space. I could make a VM
> with artificially small memory to produce a more manageable coredump,
> but I wonder whether a VM environment would tickle the spot that
> provokes this bug.

Testing in a VM with a more reasonable 6GB apparently does not provoke the crash. It doesn't fix the interface issues, but it doesn't outright crash.

But, in light of that fact, the clues seen so far point in one direction, and if I'm right about it the backtrace probably wouldn't even be relevant to finding the problem.

Consider the facts:

  I have a system with an unusual amount of memory.
  I see Openuniverse expand to fill available memory and then crash.
  The crash happens at an instruction to allocate memory.
  A virtual machine with a less-unusual amount of memory doesn't provoke this crash.

Admittedly not very much to go on but what do these clues add up to?

I have not even looked at the source code of openuniverse, but this is pretty clearly a memory management bug, and I have a fairly solid theory/guess as to what kind.

Managing memory in big chunks can provoke flawed applications to fail in at least three ways they don't fail when managing memory in smaller chunks:

First, by extending the time between deallocations and allocations (giving other applications time to allocate and spoil memory availability, provoking a crash on the next allocation).

Second, by provoking the allocation of proportional size buffers while deallocating on criteria not sufficient to ensure that such a large buffer remains available, again provoking a crash on the next allocation.

Third, by some static structure that keeps track of pointers to allocated memory having a finite limit that is exceeded - resulting in a buffer with an overwritten or unrecorded pointer, provoking a memory leak.

Although this theory may be incorrect, these are at the very least the first "obvious" places to look.

Bear
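
Whether openuniverse really "expands to fill available memory" can be checked with numbers rather than by eye. The following is a minimal sketch, assuming a Linux system where `/proc/<pid>/status` exposes a `VmRSS:` line in kB; the function names and the 1 MiB growth threshold are illustrative choices, not anything taken from openuniverse or from the tools mentioned in this thread.

```python
def parse_vmrss_kib(status_text):
    """Extract the resident set size (kB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like: "VmRSS:      58312 kB"
            return int(line.split()[1])
    return None  # e.g. kernel threads have no VmRSS line


def read_vmrss_kib(pid):
    """Read the current resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vmrss_kib(status_file.read())


def is_steadily_growing(samples, min_growth_kib=1024):
    """Heuristic: never shrinks and grows by at least min_growth_kib overall.

    The threshold is an arbitrary illustrative value.
    """
    if len(samples) < 2:
        return False
    never_shrinks = all(later >= earlier
                        for earlier, later in zip(samples, samples[1:]))
    return never_shrinks and samples[-1] - samples[0] >= min_growth_kib
```

To use it, one would call `read_vmrss_kib(pid)` for the openuniverse process every few seconds (for example in a loop with `time.sleep`), collect the values in a list, and pass that list to `is_steadily_growing`. A flat series up to the crash would argue against the leak theory.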
Bug#986176: openuniverse runs with crippled GUI, then crashes.
On Wed, 14 Apr 2021 11:59:43 +0200 Bernhard Übelacker wrote:

> Hello Ray,
> from the "Code:" line you supplied I think the segfault happens
> in create_cache_trans at ../src/mesa/state_tracker/st_cb_bitmap.c:402.
>
> https://sources.debian.org/src/mesa/20.3.5-1/src/mesa/state_tracker/st_cb_bitmap.c/#L402
>
>
> But I guess this information is not enough for the maintainer
> to find out what inputs are causing the segfault in this function.
>
> Maybe you could install systemd-coredump and deliver the
> output of 'journalctl --no-pager' following the last segfault line,
> that appears in dmesg too.
>
> More details are in this link: https://wiki.debian.org/HowToGetABacktrace

Warning, a coredump from this system would be immense. Or, well anyway pretty darn large. The machine has over 64G of RAM memory installed and openuniverse seems to expand to fill available space. I could make a VM with artificially small memory to produce a more manageable coredump, but I wonder whether a VM environment would tickle the spot that provokes this bug.

Bear
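
For readers wondering how a "Code:" line can lead to a source location: the kernel dumps the bytes around the instruction pointer and marks the byte at the instruction pointer with angle brackets. The sketch below parses the "Code:" line from the original report, finds the index of the marked byte, and builds a gdb `find /b` command for searching those bytes in a loaded library. The helper names are hypothetical, and the start and end addresses are simply copied as they appear in the gdb session in this thread, so they may be incomplete. As far as I can tell, the marked bytes `8b 50 20` decode on x86-64 as a 32-bit load from `[rax+0x20]`, which would be consistent with "segfault at 20" if that register held a null pointer.

```python
# "Code:" bytes copied from the kernel log in the original report.
CODE_LINE = (
    "48 48 89 c7 b9 02 00 00 00 ff 90 08 03 00 00 4c 8b 54 24 10 be ff 00 00 "
    "00 48 89 c7 49 89 82 70 12 00 00 49 8b 82 60 12 00 00 <8b> 50 20 c1 e2 "
    "05 e8 52 c9 fc ff 4c 8b 54 24 10 48 89 ea 4c 89 fe"
)


def parse_code_line(code):
    """Return (raw bytes, index of the byte marked with <..>)."""
    values = []
    fault_index = None
    for index, token in enumerate(code.split()):
        if token.startswith("<") and token.endswith(">"):
            fault_index = index      # byte at the instruction pointer
            token = token[1:-1]
        values.append(int(token, 16))
    return bytes(values), fault_index


def gdb_find_command(start, end, data):
    """Build a gdb 'find /b' command that searches for the given bytes."""
    pattern = ", ".join(f"0x{byte:02x}" for byte in data)
    return f"find /b {start}, {end}, {pattern}"


code, fault_index = parse_code_line(CODE_LINE)
print(fault_index)   # 42
print(len(code))     # 64

# Address range as printed by 'info share' in the session in this thread.
print(gdb_find_command("0x767c3160", "0x7750504e", code))
```

The index 42 is why the breakpoint in the session is set at the address of the match plus 42: the match is where the dumped bytes start, and the faulting instruction begins 42 bytes later.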
Bug#986176: openuniverse runs with crippled GUI, then crashes.
Hello Ray,

from the "Code:" line you supplied I think the segfault happens in create_cache_trans at ../src/mesa/state_tracker/st_cb_bitmap.c:402.

https://sources.debian.org/src/mesa/20.3.5-1/src/mesa/state_tracker/st_cb_bitmap.c/#L402

But I guess this information is not enough for the maintainer to find out what inputs are causing the segfault in this function.

Maybe you could install systemd-coredump and deliver the output of 'journalctl --no-pager' following the last segfault line, that appears in dmesg too.

More details are in this link: https://wiki.debian.org/HowToGetABacktrace

Kind regards,
Bernhard


https://wiki.debian.org/InterpretingKernelOutputAtProcessCrash

From submitter:
[406058.660546] openuniverse[242638]: segfault at 20 ip 7f86f454ad63 sp 7ffefd7050a0 error 4 in nouveau_dri.so[7f86f4517000+d46000]
[406058.660565] Code: 48 48 89 c7 b9 02 00 00 00 ff 90 08 03 00 00 4c 8b 54 24 10 be ff 00 00 00 48 89 c7 49 89 82 70 12 00 00 49 8b 82 60 12 00 00 <8b> 50 20 c1 e2 05 e8 52 c9 fc ff 4c 8b 54 24 10 48 89 ea 4c 89 fe

"error 4" == 0b100
    bit 0 = 0: no page found
    bit 1 = 0: read access
    bit 2 = 1: user-mode access

echo -n "find /b ..., ..., 0x" && \
echo "48 48 89 c7 b9 02 00 00 00 ff 90 08 03 00 00 4c 8b 54 24 10 be ff 00 00 00 48 89 c7 49 89 82 70 12 00 00 49 8b 82 60 12 00 00 <8b> 50 20 c1 e2 05 e8 52 c9 fc ff 4c 8b 54 24 10 48 89 ea 4c 89 fe" \
 | sed 's/[<>]//g' | sed 's/ /, 0x/g'

find /b ..., ..., 0x48, 0x48, 0x89, 0xc7, 0xb9, 0x02, 0x00, 0x00, 0x00, 0xff, 0x90, 0x08, 0x03, 0x00, 0x00, 0x4c, 0x8b, 0x54, 0x24, 0x10, 0xbe, 0xff, 0x00, 0x00, 0x00, 0x48, 0x89, 0xc7, 0x49, 0x89, 0x82, 0x70, 0x12, 0x00, 0x00, 0x49, 0x8b, 0x82, 0x60, 0x12, 0x00, 0x00, 0x8b, 0x50, 0x20, 0xc1, 0xe2, 0x05, 0xe8, 0x52, 0xc9, 0xfc, 0xff, 0x4c, 0x8b, 0x54, 0x24, 0x10, 0x48, 0x89, 0xea, 0x4c, 0x89, 0xfe


# single-use Bullseye/testing amd64 qemu VM 2021-04-14

echo "set enable-bracketed-paste off" >> /etc/inputrc; bash
apt update

# to speedup testing
mv /etc/manpath.config /etc/manpath.config.renamed
apt install libeatmydata1
export LD_PRELOAD=/usr/lib/$(uname -m)-linux-gnu/libeatmydata.so

apt dist-upgrade
apt install gdb libgl1-mesa-dri \
    coreutils-dbgsym libgl1-mesa-dri-dbgsym

.

gdb -q
set width 0
set pagination off
file /bin/ls
tb main
run
call dlopen("/usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so",0x102)
info share
find /b 0x767c3160, 0x7750504e, 0x48, 0x48, 0x89, 0xc7, 0xb9, 0x02, 0x00, 0x00, 0x00, 0xff, 0x90, 0x08, 0x03, 0x00, 0x00, 0x4c, 0x8b, 0x54, 0x24, 0x10, 0xbe, 0xff, 0x00, 0x00, 0x00, 0x48, 0x89, 0xc7, 0x49, 0x89, 0x82, 0x70, 0x12, 0x00, 0x00, 0x49, 0x8b, 0x82, 0x60, 0x12, 0x00, 0x00, 0x8b, 0x50, 0x20, 0xc1, 0xe2, 0x05, 0xe8, 0x52, 0xc9, 0xfc, 0xff, 0x4c, 0x8b, 0x54, 0x24, 0x10, 0x48, 0x89, 0xea, 0x4c, 0x89, 0xfe
b * (0x767f3d39 + 42)


benutzer@debian:~$ gdb -q
(gdb) set width 0
(gdb) set pagination off
(gdb) file /bin/ls
Reading symbols from /bin/ls...
Reading symbols from /usr/lib/debug/.build-id/64/61a544c35b9dc1d172d1a1c09043e487326966.debug...
(gdb) tb main
Temporary breakpoint 1 at 0x4760: file src/ls.c, line 1622.
(gdb) run
Starting program: /usr/bin/ls
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Temporary breakpoint 1, main (argc=1, argv=0x7fffe628) at src/ls.c:1622
1622    src/ls.c: No such file or directory.
(gdb) call dlopen("/usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so",0x102)
$1 = (void *) 0x5557a980
(gdb) find /b ..., ..., 0x48, 0x48, 0x89, 0xc7, 0xb9, 0x02, 0x00, 0x00, 0x00, 0xff, 0x90, 0x08, 0x03, 0x00, 0x00, 0x4c, 0x8b, 0x54, 0x24, 0x10, 0xbe, 0xff, 0x00, 0x00, 0x00, 0x48, 0x89, 0xc7, 0x49, 0x89, 0x82, 0x70, 0x12, 0x00, 0x00, 0x49, 0x8b, 0x82, 0x60, 0x12, 0x00, 0x00, 0x8b, 0x50, 0x20, 0xc1, 0xe2, 0x05, 0xe8, 0x52, 0xc9, 0xfc, 0xff, 0x4c, 0x8b, 0x54, 0x24, 0x10, 0x48, 0x89, 0xea, 0x4c, 0x89, 0xfe
A syntax error in expression, near `..., ..., 0x48, 0x48, 0x89, 0xc7, 0xb9, 0x02, 0x00, 0x00, 0x00, 0xff, 0x90, 0x08, 0x03, 0x00, 0x00, 0x4c, 0x8b, 0x54, 0x24, 0x10, 0xbe, 0xff, 0x00, 0x00, 0x00, 0x48, 0x89, 0xc7, 0x49, 0x89, 0x82, 0x70, 0x12, 0x00, 0x00, 0x49, 0x8b, 0x82, 0x60, 0x12, 0x00, 0x00, 0x8b, 0x50, 0x20, 0xc1, 0xe2, 0x05, 0xe8, 0x52, 0xc9, 0xfc, 0xff, 0x4c, 0x8b, 0x54, 0x24, 0x10, 0x48, 0x89, 0xea, 0x4c, 0x89, 0xfe'.
(gdb) info share
From        To          Syms Read   Shared Object Library
...
0x767c3160  0x7750504e  Yes         /usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so
...
(*): Shared library is missing debugging information.
(gdb) find /b 0x767c3160, 0x7750504e, 0x48, 0x48, 0x89, 0xc7, 0xb9, 0x02, 0x00, 0x00, 0x00, 0xff, 0x90, 0x08, 0x03, 0x00, 0x00, 0x4c, 0x8b, 0x54, 0x24, 0x10, 0xbe, 0xff, 0x00, 0x00, 0x00, 0x48, 0x89, 0xc7, 0x49, 0x89, 0x82, 0x70, 0x12, 0x00, 0x00, 0x49, 0x8b, 0x82, 0x60, 0x12, 0x00, 0x00, 0x8b, 0x50, 0x20, 0xc1, 0xe2, 0x05, 0xe8, 0x52, 0xc9, 0xfc,
Bug#986176: openuniverse runs with crippled GUI, then crashes.
Package: openuniverse
Version: 1.0beta3.1+dfsg-6.1

When I started openuniverse, it put up a window with no menu items and no other control elements. It responded to '?' or 'H' keystrokes by putting up a short list of keystroke shortcuts - presumably corresponding to nonexistent menu options. These keystroke shortcuts seemed to work, but within a few minutes openuniverse crashed.

I started it a few more times trying for a while to figure out what I did that made it crash, but it seemed random. Finally I started it and went looking online for any discussion of the problem. It crashed after no more than 5 minutes, before I had even turned away from the browser and tried to do anything with it. So I'm pretty sure it's not something I did.

In dmesg it says:

[406058.660546] openuniverse[242638]: segfault at 20 ip 7f86f454ad63 sp 7ffefd7050a0 error 4 in nouveau_dri.so[7f86f4517000+d46000]
[406058.660565] Code: 48 48 89 c7 b9 02 00 00 00 ff 90 08 03 00 00 4c 8b 54 24 10 be ff 00 00 00 48 89 c7 49 89 82 70 12 00 00 49 8b 82 60 12 00 00 <8b> 50 20 c1 e2 05 e8 52 c9 fc ff 4c 8b 54 24 10 48 89 ea 4c 89 fe

Which appears to implicate a conflict with nouveau. I have an nvidia 1050TI video card but I have not downloaded drivers from nvidia's site for it. OpenUniverse documentation strongly suggests the proprietary drivers I am not using.

I am not familiar with openuniverse, but checking screenshots of it online I see many UI elements that simply are not present when I start it. It's even missing a basic icon for a launcher shortcut.

Checking dependencies I see that it conflicts with openuniverse-common (<= 1.0beta3.1-3). I have installed version 1.0beta3.1+dfsg-6.1. That looks to me like it should not have installed with the current version of openuniverse-common, but these version numbers are inconsistent in format so I'm not certain.

Checking dependencies I also see that it requires libjpeg62-turbo >= 1.3.1 and my installed version is 1:2.0.6-4. Again it looks to me like it shouldn't have installed with this version, but because of the inconsistency in version number format I'm not sure.

Finally I see in its dependencies that it suggests package 'celestia' which has no installation candidate in the Testing/Bullseye release. This is very sad. I like Celestia. I miss it ever since Jessie. I have sometimes gone out and gotten the .deb from their site and installed it - but not yet this time. I tried openuniverse first looking for an adequate in-distro replacement.

This is a fresh install of Bullseye, made using 'grml-debootstrap' less than a week ago. I have absolutely no software installed on this machine that is not downloaded from the 'Bullseye' archive.

Packages openuniverse depends on:
    openuniverse-common: Installed version is 1.0beta3.1+dfsg-6.1
    freeglut3 >= 2.8.1: Installed version is 2.8.1-6
    libc6 >= 2.14: Installed version is 2.31-10
    libgcc-s1 >= 3.0: Installed version is 10.2.1-6
    libglu1-mesa | libglu1: Installed version is libglu1-mesa
    libjpeg62-turbo >= 1.3.1: Installed version is 1:2.0.6-4
    libplib1: Installed version is 1.8.5-8
    libstdc++6 >= 5: Installed version is 10.2.1-6

Hope this helps!

Ray "Bear" Dillinger