Bug#986176: openuniverse runs with crippled GUI, then crashes.

2021-05-27 Thread Ray Dillinger


On 5/26/21 8:01 PM, Bernhard Übelacker wrote:
>> Testing in a VM with a more reasonable 6GB apparently does not provoke
>> the crash.
>
> I fear the issue might also be specific to the graphics library
> because the crash happens in nouveau_dri.so.
> Therefore a VM might not show this issue.
>
I defer to your expertise.  I'm not really familiar with how the
graphics libraries and graphics drivers work with the software, so
you're way more likely to be right.

>
>> ... and openuniverse seems to expand to fill available space.
>
> That would be a memory leak I guess.
> Then the backtrace would be really not that interesting.
>
??? I can't reproduce that now.  It still crashes after a while but it
doesn't expand to fill available space.  Did something else get fixed
that might affect it?


>> ... but checking screenshots of it online I see many UI elements that
>> simply are not present when I start it.
>
> I guess the gui needs a libglui, which is not "yet"
> packaged for debian (see #801858).
>
I think that's a showstopper bug for OpenUniverse.  It renders the
program useless, even if it weren't crashing. I'm surprised to find it
outside of the "Sid" distribution if it doesn't have that.

>
> But while writing this email, I got my hands on a nouveau capable laptop.
> There I found openuniverse also crashing if I leave it some time alone,
> at the very exact instruction [1].


Oh good.  I mean, not good that it's crashing, but good that the crash
can be reliably reproduced outside of my peculiar configuration.  If
other people are seeing it too, it means I haven't done something that
puts my machine into a failure-prone configuration that I'll never
figure out.

> I tried to record with rr, but this forces the driver to software mode,
> therefore the issue then does not show up.


Hum, that's interesting.  Now I need to go read man pages.  Thank you
for looking into it.


                Bear



Bug#986176: openuniverse runs with crippled GUI, then crashes.

2021-05-26 Thread Bernhard Übelacker

Hello Ray,



> Warning, a coredump from this system would be immense.  Or, well anyway
> pretty darn large.


systemd-coredump should limit the core to 2G.
And as a first target, the journal output might have a backtrace
from which one could start looking.

Maybe running openuniverse with a memory limit produces the same error in dmesg?

systemd-run --user --scope -p MemoryMax=2G openuniverse

It would also be possible to tell the kernel to just use a certain
amount of RAM by adding e.g. "mem=2G" to the kernel parameters.
But this would require a reboot of the system.




> Testing in a VM with a more reasonable 6GB apparently does not provoke
> the crash.


I fear the issue might also be specific to the graphics library
because the crash happens in nouveau_dri.so.
Therefore a VM might not show this issue.




> ... and openuniverse seems to expand to fill available space.


That would be a memory leak, I guess.
In that case the backtrace would not be very interesting.




> ... but checking screenshots of it online I see many UI elements that
> simply are not present when I start it.


I guess the GUI needs libglui, which is not yet
packaged for Debian (see #801858).



If the issue is related to the use of multiple threads,
the risk of triggering it might be lowered by running
openuniverse on just a single CPU core:

taskset 0x0001 openuniverse
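
For reference, the 0x0001 argument to taskset is a CPU affinity bitmask in which bit N selects CPU N, so 0x0001 pins the process to CPU 0. A minimal Python sketch of how such a mask decodes (the helper name mask_to_cpus is made up for illustration and is not part of taskset):

```python
def mask_to_cpus(mask):
    """Return the sorted list of CPU indices selected by an affinity bitmask."""
    return [bit for bit in range(mask.bit_length()) if mask >> bit & 1]


# The mask from the taskset call above, plus a two-CPU mask for comparison.
print(mask_to_cpus(0x0001))
print(mask_to_cpus(0x0005))
```

On Linux the same restriction can be applied from inside Python with os.sched_setaffinity, which takes a set of CPU indices rather than a mask.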


##


While writing this email, I got my hands on a nouveau-capable laptop.
There openuniverse also crashes if I leave it alone for some time,
at exactly the same instruction [1].

I could not see any excessive memory usage: htop shows 0.7% usage of 7.66G.
So I can't currently see a connection between the available RAM size and this 
issue.
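
To check for a leak over a longer run than a glance at htop allows, the resident set size can be sampled from /proc. The following is only a sketch, assuming a Linux /proc filesystem; the function names and the sample text are made up for illustration:

```python
import re


def vmrss_kib(status_text):
    """Extract the VmRSS value (in KiB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current resident set size of a running process (Linux only)."""
    with open("/proc/%d/status" % pid) as handle:
        return vmrss_kib(handle.read())


# Made-up sample shaped like a real status file, so the parser can be
# tried without a running openuniverse process.
SAMPLE = "Name:\topenuniverse\nVmPeak:\t  812345 kB\nVmRSS:\t   54980 kB\nThreads:\t3\n"
print(vmrss_kib(SAMPLE))
```

Calling read_rss_kib with the PID of openuniverse every few seconds until the crash would show whether memory use actually grows beforehand.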

I tried to record with rr, but this forces the driver to software mode,
therefore the issue then does not show up.
Running with valgrind also neither crashes nor shows anything obvious.

Kind regards,
Bernhard



[1]
(gdb) bt
#0  0x7fc3fc635d63 in create_cache_trans (st=0x556dd8391f80) at ../src/mesa/state_tracker/st_cb_bitmap.c:402
#1  accum_bitmap (bitmap=0x7fc3ff07fcf1  "", unpack=0x7fc3f4201ad8, height=14, width=7, y=441, x=0, ctx=0x7fc3f41cf010) at ../src/mesa/state_tracker/st_cb_bitmap.c:516
#2  st_Bitmap (ctx=0x7fc3f41cf010, x=0, y=441, width=7, height=14, unpack=0x7fc3f4201ad8, bitmap=0x7fc3ff07fcf1  "") at ../src/mesa/state_tracker/st_cb_bitmap.c:621
#3  0x7fc3fc8c167e in _mesa_Bitmap (width=7, height=14, xorig=, yorig=3, xmove=7, ymove=0, bitmap=0x7fc3ff07fcf1  "") at ../src/mesa/main/drawpix.c:357
#4  0x7fc3ff066830 in glutBitmapCharacter (fontID=0x556dd6aba740 , character=) at freeglut_font.c:122
#5  0x556dd6aa09ec in glutprintstring (x=, y=, z=, string=) at font.cpp:76
#6  glutprintstring (string=0x7fff4ffb0400 "Body distance from Sun (Km): 151595991.59", z=0, y=, x=0) at font.cpp:67
#7  printstring (x=x@entry=0, y=, z=z@entry=0, string=string@entry=0x7fff4ffb0400 "Body distance from Sun (Km): 151595991.59") at font.cpp:86
#8  0x556dd6a95150 in OnScreenInfo () at info.cpp:211
#9  0x556dd6a9f028 in Display () at ou.cpp:517
#10 0x7fc3ff06ed83 in fghRedrawWindow (window=0x556dd82bad20) at freeglut_main.c:231
#11 fghcbDisplayWindow (window=0x556dd82bad20, enumerator=0x7fff4ffb0570) at freeglut_main.c:248
#12 0x7fc3ff072619 in fgEnumWindows (enumCallback=enumCallback@entry=0x7fc3ff06ed10 , enumerator=enumerator@entry=0x7fff4ffb0570) at freeglut_structure.c:396
#13 0x7fc3ff06f2fb in fghDisplayAll () at freeglut_main.c:271
#14 glutMainLoopEvent () at freeglut_main.c:1523
#15 0x7fc3ff06fc0b in glutMainLoop () at freeglut_main.c:1571
#16 0x556dd6a85c3d in main (argc=, argv=0x7fff4ffb08a8) at ou.cpp:572



Bug#986176: openuniverse runs with crippled GUI, then crashes.

2021-04-19 Thread Ray Dillinger
On Wed, 14 Apr 2021 14:04:33 + Ray Dillinger  wrote:

>
> Warning, a coredump from this system would be immense.  Or, well anyway
> pretty darn large.  The machine has over 64G of RAM installed and
> openuniverse seems to expand to fill available space. I could make a VM
> with artificially small memory to produce a more manageable coredump,
> but I wonder whether a VM environment would tickle the spot that
> provokes this bug.


Testing in a VM with a more reasonable 6GB apparently does not provoke
the crash.  It doesn't fix the interface issues, but it doesn't outright
crash.

But, in light of that fact, the clues seen so far point in one
direction, and if I'm right about it the backtrace probably wouldn't
even be relevant to finding the problem. 


Consider the facts:

I have a system with an unusual amount of memory.  I see Openuniverse
expand to fill available memory and then crash.  The crash happens at an
instruction to allocate memory.  A virtual machine with a less-unusual
amount of memory doesn't provoke this crash.


Admittedly that is not very much to go on, but what do these clues add up to?

I have not even looked at the source code of openuniverse, but this is
pretty clearly a memory management bug, and I have a fairly solid
theory/guess as to what kind.  Managing memory in big chunks can provoke
flawed applications to fail in at least three ways they don't fail when
managing memory in smaller chunks:


First, by extending the time between deallocations and allocations
(giving other applications time to allocate and spoil memory
availability, provoking a crash on the next allocation).

Second, by provoking the allocation of proportional size buffers while
deallocating on criteria not sufficient to ensure that such a large
buffer remains available, again provoking a crash on the next allocation.

Third, by some static structure that keeps track of pointers to
allocated memory having a finite limit that is exceeded - resulting in a
buffer with an overwritten or unrecorded pointer, provoking a memory leak.


Although this theory may be incorrect, these are at the very least the
first "obvious" places to look.


Bear



Bug#986176: openuniverse runs with crippled GUI, then crashes.

2021-04-14 Thread Ray Dillinger
On Wed, 14 Apr 2021 11:59:43 +0200 Bernhard Übelacker wrote:
> Hello Ray,
> from the "Code:" line you supplied I think the segfault happens
> in create_cache_trans at ../src/mesa/state_tracker/st_cb_bitmap.c:402.
>
>
https://sources.debian.org/src/mesa/20.3.5-1/src/mesa/state_tracker/st_cb_bitmap.c/#L402
>
>
> But I guess this information is not enough for the maintainer
> to find out what inputs cause the segfault in this function.
>
> Maybe you could install systemd-coredump and deliver the
> output of 'journalctl --no-pager' following the last segfault line,
> that appears in dmesg too.
>
> More details are in this link: https://wiki.debian.org/HowToGetABacktrace


Warning, a coredump from this system would be immense.  Or, well anyway
pretty darn large.  The machine has over 64G of RAM installed and
openuniverse seems to expand to fill available space. I could make a VM
with artificially small memory to produce a more manageable coredump,
but I wonder whether a VM environment would tickle the spot that
provokes this bug.


Bear



Bug#986176: openuniverse runs with crippled GUI, then crashes.

2021-04-14 Thread Bernhard Übelacker

Hello Ray,
from the "Code:" line you supplied I think the segfault happens
in create_cache_trans at ../src/mesa/state_tracker/st_cb_bitmap.c:402.

https://sources.debian.org/src/mesa/20.3.5-1/src/mesa/state_tracker/st_cb_bitmap.c/#L402


But I guess this information is not enough for the maintainer
to find out what inputs cause the segfault in this function.

Maybe you could install systemd-coredump and deliver the
output of 'journalctl --no-pager' following the last segfault line,
that appears in dmesg too.

More details are in this link: https://wiki.debian.org/HowToGetABacktrace

Kind regards,
Bernhard



https://wiki.debian.org/InterpretingKernelOutputAtProcessCrash

From submitter:
[406058.660546] openuniverse[242638]: segfault at 20 ip 7f86f454ad63 sp 7ffefd7050a0 error 4 in nouveau_dri.so[7f86f4517000+d46000]
[406058.660565] Code: 48 48 89 c7 b9 02 00 00 00 ff 90 08 03 00 00 4c 8b 54 24 10 be ff 00 00 00 48 89 c7 49 89 82 70 12 00 00 49 8b 82 60 12 00 00 <8b> 50 20 c1 e2 05 e8 52 c9 fc ff 4c 8b 54 24 10 48 89 ea 4c 89 fe

"error 4" == 0b100
0: no page found
0: read access
1: user-mode access
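
The decoding above can be sketched in Python. This is an illustration only: the regular expression and the names parse_segfault and SEGFAULT_RE are assumptions made here, not an existing tool, and only the three low bits of the x86 page-fault error code are decoded. The "offset" value is the faulting instruction pointer relative to the start of the mapping that the kernel printed in brackets.

```python
import re

# Pattern for the kernel's x86 "segfault at ..." message (all numbers are hex).
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Parse one dmesg segfault line into a dict, or return None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)
    return {
        "fault_address": int(m.group("addr"), 16),
        "library": m.group("lib"),
        # Instruction pointer relative to the start of the printed mapping.
        "offset": int(m.group("ip"), 16) - int(m.group("base"), 16),
        "cause": "protection violation" if err & 1 else "no page found",
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
    }


LINE = ("[406058.660546] openuniverse[242638]: segfault at 20 ip 7f86f454ad63 "
        "sp 7ffefd7050a0 error 4 in nouveau_dri.so[7f86f4517000+d46000]")
info = parse_segfault(LINE)
print(hex(info["offset"]), info["cause"], info["access"], info["mode"])
```

A user-mode read of the unmapped address 0x20 would be consistent with reading a field at offset 0x20 through a NULL pointer, though the dmesg line alone does not prove that.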

echo -n "find /b ..., ..., 0x" && \
echo "48 48 89 c7 b9 02 00 00 00 ff 90 08 03 00 00 4c 8b 54 24 10 be ff 00 00 00 48 89 c7 49 89 82 70 12 00 00 49 8b 82 60 12 00 00 <8b> 50 20 c1 e2 05 e8 52 c9 fc ff 4c 8b 54 24 10 48 89 ea 4c 89 fe" \
 | sed 's/[<>]//g' | sed 's/ /, 0x/g'

find /b ..., ..., 0x48, 0x48, 0x89, 0xc7, 0xb9, 0x02, 0x00, 0x00, 0x00, 0xff, 
0x90, 0x08, 0x03, 0x00, 0x00, 0x4c, 0x8b, 0x54, 0x24, 0x10, 0xbe, 0xff, 0x00, 
0x00, 0x00, 0x48, 0x89, 0xc7, 0x49, 0x89, 0x82, 0x70, 0x12, 0x00, 0x00, 0x49, 
0x8b, 0x82, 0x60, 0x12, 0x00, 0x00, 0x8b, 0x50, 0x20, 0xc1, 0xe2, 0x05, 0xe8, 
0x52, 0xc9, 0xfc, 0xff, 0x4c, 0x8b, 0x54, 0x24, 0x10, 0x48, 0x89, 0xea, 0x4c, 
0x89, 0xfe
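
The same conversion can be sketched in Python, which also reports how many bytes precede the byte marked with <...>, that is, the faulting instruction. The function names here are made up for illustration; the "..." placeholders stand for the start and end addresses, which have to be taken from "info share" in gdb.

```python
CODE = ("48 48 89 c7 b9 02 00 00 00 ff 90 08 03 00 00 4c 8b 54 24 10 be ff 00 00 "
        "00 48 89 c7 49 89 82 70 12 00 00 49 8b 82 60 12 00 00 <8b> 50 20 c1 e2 05 "
        "e8 52 c9 fc ff 4c 8b 54 24 10 48 89 ea 4c 89 fe")


def parse_code_line(code):
    """Return (byte values, index of the byte the kernel marked with <...>)."""
    tokens = code.split()
    fault_index = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return [int(tok.strip("<>"), 16) for tok in tokens], fault_index


def gdb_find_command(data, start="...", end="..."):
    """Build a gdb 'find /b' command that searches for the byte sequence."""
    pattern = ", ".join("0x%02x" % value for value in data)
    return "find /b %s, %s, %s" % (start, end, pattern)


data, fault_index = parse_code_line(CODE)
print(gdb_find_command(data))
print("bytes before the faulting instruction:", fault_index)
```

Once gdb reports where the pattern starts, adding fault_index to that address gives the address of the faulting instruction inside the library.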






# single-use Bullseye/testing amd64 qemu VM 2021-04-14

echo "set enable-bracketed-paste off" >> /etc/inputrc; bash

apt update

# to speedup testing
mv /etc/manpath.config /etc/manpath.config.renamed
apt install libeatmydata1
export LD_PRELOAD=/usr/lib/$(uname -m)-linux-gnu/libeatmydata.so

apt dist-upgrade
apt install gdb libgl1-mesa-dri \
coreutils-dbgsym libgl1-mesa-dri-dbgsym
.




gdb -q
set width 0
set pagination off
file /bin/ls
tb main
run
call dlopen("/usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so",0x102)
info share
find /b 0x767c3160, 0x7750504e, 0x48, 0x48, 0x89, 0xc7, 0xb9, 
0x02, 0x00, 0x00, 0x00, 0xff, 0x90, 0x08, 0x03, 0x00, 0x00, 0x4c, 0x8b, 0x54, 
0x24, 0x10, 0xbe, 0xff, 0x00, 0x00, 0x00, 0x48, 0x89, 0xc7, 0x49, 0x89, 0x82, 
0x70, 0x12, 0x00, 0x00, 0x49, 0x8b, 0x82, 0x60, 0x12, 0x00, 0x00, 0x8b, 0x50, 
0x20, 0xc1, 0xe2, 0x05, 0xe8, 0x52, 0xc9, 0xfc, 0xff, 0x4c, 0x8b, 0x54, 0x24, 
0x10, 0x48, 0x89, 0xea, 0x4c, 0x89, 0xfe
b * (0x767f3d39 + 42)




benutzer@debian:~$ gdb -q
(gdb) set width 0
(gdb) set pagination off
(gdb) file /bin/ls
Reading symbols from /bin/ls...
Reading symbols from 
/usr/lib/debug/.build-id/64/61a544c35b9dc1d172d1a1c09043e487326966.debug...
(gdb) tb main
Temporary breakpoint 1 at 0x4760: file src/ls.c, line 1622.
(gdb) run
Starting program: /usr/bin/ls 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Temporary breakpoint 1, main (argc=1, argv=0x7fffe628) at src/ls.c:1622
1622	src/ls.c: No such file or directory.
(gdb) call dlopen("/usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so",0x102)
$1 = (void *) 0x5557a980
(gdb) find /b ..., ..., 0x48, 0x48, 0x89, 0xc7, 0xb9, 0x02, 0x00, 0x00, 0x00, 
0xff, 0x90, 0x08, 0x03, 0x00, 0x00, 0x4c, 0x8b, 0x54, 0x24, 0x10, 0xbe, 0xff, 
0x00, 0x00, 0x00, 0x48, 0x89, 0xc7, 0x49, 0x89, 0x82, 0x70, 0x12, 0x00, 0x00, 
0x49, 0x8b, 0x82, 0x60, 0x12, 0x00, 0x00, 0x8b, 0x50, 0x20, 0xc1, 0xe2, 0x05, 
0xe8, 0x52, 0xc9, 0xfc, 0xff, 0x4c, 0x8b, 0x54, 0x24, 0x10, 0x48, 0x89, 0xea, 
0x4c, 0x89, 0xfe
A syntax error in expression, near `..., ..., 0x48, 0x48, 0x89, 0xc7, 0xb9, 
0x02, 0x00, 0x00, 0x00, 0xff, 0x90, 0x08, 0x03, 0x00, 0x00, 0x4c, 0x8b, 0x54, 
0x24, 0x10, 0xbe, 0xff, 0x00, 0x00, 0x00, 0x48, 0x89, 0xc7, 0x49, 0x89, 0x82, 
0x70, 0x12, 0x00, 0x00, 0x49, 0x8b, 0x82, 0x60, 0x12, 0x00, 0x00, 0x8b, 0x50, 
0x20, 0xc1, 0xe2, 0x05, 0xe8, 0x52, 0xc9, 0xfc, 0xff, 0x4c, 0x8b, 0x54, 0x24, 
0x10, 0x48, 0x89, 0xea, 0x4c, 0x89, 0xfe'.
(gdb) info share
From        To          Syms Read   Shared Object Library
...
0x767c3160  0x7750504e  Yes         /usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so
...
(*): Shared library is missing debugging information.
(gdb) find /b 0x767c3160, 0x7750504e, 0x48, 0x48, 0x89, 0xc7, 
0xb9, 0x02, 0x00, 0x00, 0x00, 0xff, 0x90, 0x08, 0x03, 0x00, 0x00, 0x4c, 0x8b, 
0x54, 0x24, 0x10, 0xbe, 0xff, 0x00, 0x00, 0x00, 0x48, 0x89, 0xc7, 0x49, 0x89, 
0x82, 0x70, 0x12, 0x00, 0x00, 0x49, 0x8b, 0x82, 0x60, 0x12, 0x00, 0x00, 0x8b, 
0x50, 0x20, 0xc1, 0xe2, 0x05, 0xe8, 0x52, 0xc9, 0xfc, 

Bug#986176: openuniverse runs with crippled GUI, then crashes.

2021-03-30 Thread Ray Dillinger
Package: openuniverse

version: 1.0beta3.1+dfsg-6.1

When I started openuniverse, it put up a window with no menu items and
no other control elements.  It responded to '?' or 'H' keystrokes by
putting up a short list of keystroke shortcuts - presumably
corresponding to nonexistent menu options.  These keystroke shortcuts
seemed to work, but within a few minutes openuniverse crashed.  I
started it a few more times trying for a while to figure out what I did
that made it crash, but it seemed random.  Finally I started it and went
looking online for any discussion of the problem.  It crashed after no
more than 5 minutes, before I had even turned away from the browser and
tried to do anything with it.  So I'm pretty sure it's not something I did.

In dmesg it says:

[406058.660546] openuniverse[242638]: segfault at 20 ip 7f86f454ad63
sp 7ffefd7050a0 error 4 in nouveau_dri.so[7f86f4517000+d46000]
[406058.660565] Code: 48 48 89 c7 b9 02 00 00 00 ff 90 08 03 00 00 4c 8b
54 24 10 be ff 00 00 00 48 89 c7 49 89 82 70 12 00 00 49 8b 82 60 12 00
00 <8b> 50 20 c1 e2 05 e8 52 c9 fc ff 4c 8b 54 24 10 48 89 ea 4c 89 fe

Which appears to implicate a conflict with nouveau.  I have an nvidia
1050TI video card but I have not downloaded drivers from nvidia's site
for it.  OpenUniverse documentation strongly suggests the proprietary
drivers I am not using.

I am not familiar with openuniverse, but checking screenshots of it
online I see many UI elements that simply are not present when I start
it.  It's even missing a basic icon for a launcher shortcut.

Checking dependencies I see that it conflicts with
openuniverse-common(<=1.0beta3.1-3).  I have installed version
1.0beta3.1+dfsg-6.1.  That looks to me like it should not have installed
with the current version of openuniverse-common, but these version
numbers are inconsistent in format so I'm not certain. 

Checking dependencies I also see that it requires libjpeg62-turbo >=
1.3.1 and my installed version is 1:2.0.6-4.  Again it looks to me like
it shouldn't have installed with this version, but because of the
inconsistency in version number format I'm not sure.
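
On the version-format question: Debian version strings have the form [epoch:]upstream[-revision] and are compared epoch first, then upstream version, then revision. Below is a simplified Python sketch of that ordering as I understand it from the Debian Policy Manual. It is not dpkg's actual code, and the function names are made up for illustration.

```python
import re

DIGITS = "0123456789"


def _weight(ch):
    """Sort weight of one character in a non-digit run ('' means end of string)."""
    if ch == "~":
        return -1
    if ch == "":
        return 0
    return ord(ch) if ch.isalpha() else ord(ch) + 256


def _compare_part(a, b):
    """Compare two upstream-version or revision strings; returns -1, 0 or 1."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit runs: '~' sorts first, then end of string, letters, the rest.
        while (i < len(a) and a[i] not in DIGITS) or (j < len(b) and b[j] not in DIGITS):
            ca = a[i] if i < len(a) and a[i] not in DIGITS else ""
            cb = b[j] if j < len(b) and b[j] not in DIGITS else ""
            if _weight(ca) != _weight(cb):
                return -1 if _weight(ca) < _weight(cb) else 1
            i += 1
            j += 1
        # Digit runs are compared numerically; an empty run counts as 0.
        da = re.match(r"[0-9]*", a[i:]).group()
        db = re.match(r"[0-9]*", b[j:]).group()
        if int(da or 0) != int(db or 0):
            return -1 if int(da or 0) < int(db or 0) else 1
        i += len(da)
        j += len(db)
    return 0


def _split(version):
    """Split '[epoch:]upstream[-revision]' into (epoch, upstream, revision)."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a, b):
    """Return -1, 0 or 1 if version a sorts before, equal to or after b."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return _compare_part(ua, ub) or _compare_part(ra, rb)


print(compare_versions("1.0beta3.1+dfsg-6.1", "1.0beta3.1-3"))
print(compare_versions("1:2.0.6-4", "1.3.1"))
```

Under these rules 1.0beta3.1+dfsg-6.1 sorts after 1.0beta3.1-3, so the conflict with versions <= 1.0beta3.1-3 does not apply, and 1:2.0.6-4 satisfies >= 1.3.1 because of its epoch. The authoritative check is dpkg itself, for example: dpkg --compare-versions 1:2.0.6-4 ge 1.3.1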

Finally I see in its dependencies that it suggests package 'celestia'
which has no installation candidate in the Testing/Bullseye release. 
This is very sad.  I like Celestia.  I miss it ever since Jessie.  I
have sometimes gone out and gotten the .deb from their site and
installed it - but not yet this time.  I tried openuniverse first
looking for an adequate in-distro replacement.

This is a fresh install of Bullseye, made using 'grml-debootstrap' less
than a week ago.  I have absolutely no software installed on this
machine that is not downloaded from the 'Bullseye' archive.


Packages openuniverse depends on:

openuniverse-common:  Installed version is 1.0beta3.1+dfsg-6.1

freeglut3 >= 2.8.1:   Installed version is 2.8.1-6

libc6 >= 2.14:   Installed version is 2.31-10

libgcc-s1 >= 3.0:  Installed version is 10.2.1-6

libglu1-mesa | libglu1: Installed version is libglu1-mesa

libjpeg62-turbo >= 1.3.1: installed version is 1:2.0.6-4

libplib1: Installed version is 1.8.5-8

libstdc++6 >= 5 : installed version is 10.2.1-6


Hope this helps!


                    Ray "Bear" Dillinger