Hi,
I'm experiencing crashes in a DFB application (running on an ARM Linux
2.6 machine, with fbdev) when the OS suspends and then resumes. Do
you have any idea what could be causing this and how to fix it? As
you'll see, we couldn't get very far debugging it...
What happens: after suspending the machine
("echo mem >/sys/power/state") and then resuming it, the DFB
application is gone, with the following log messages (this was
without debug mode enabled):
(!) [ 1076: 19.429] --> Caught signal 4 (at 0xbe826a2c, illegal trap) <--
(!) FUSION_PROPERTY_LEASE --> Connection timed out
(!) [ 1076: 19.521] --> Caught signal 11 (at (nil), invalid address) <--
Killed
We initially suspected that this was caused by the clock being bumped
at wake-up, with the fusion module interpreting the jump as a time-out
and causing fusion_property_lease() to fail. I still think this could
be a problem, seeing that there's no suspend/resume related code in
linux-fusion, but unfortunately it's more complicated than that.
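To illustrate the suspected mechanism, here is a minimal user-space
sketch. It is not linux-fusion code, and I have not checked how the
fusion module actually measures its time-outs; clock_ms() and
lease_timed_out() are hypothetical names. The point is only that a
time-out computed from a clock that jumps forward on resume expires
immediately, whereas on Linux CLOCK_MONOTONIC does not advance while
the machine is suspended.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: current reading of the given clock, in milliseconds. */
static int64_t clock_ms(clockid_t id)
{
    struct timespec ts;
    int rc = clock_gettime(id, &ts);

    assert(rc == 0);
    (void)rc;
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Hypothetical check: true once timeout_ms have elapsed since start_ms.
 * If now_ms comes from a clock that was bumped at wake-up, the
 * difference includes the jump and the lease appears to have timed out. */
static bool lease_timed_out(int64_t start_ms, int64_t now_ms, int64_t timeout_ms)
{
    assert(timeout_ms >= 0);
    return now_ms - start_ms >= timeout_ms;
}
```

A caller would take start_ms and now_ms from clock_ms(CLOCK_MONOTONIC)
rather than clock_ms(CLOCK_REALTIME) to keep the wake-up jump out of
the comparison.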
Notice the SIGILL (=4) signal above. This happens *before* the system
is suspended, unfortunately without any useful backtrace:
(gdb) c
Continuing.
Breakpoint 2 at 0x4067f150:
file /home/zeitlin/src/rea/src/DirectFB/src/core/core.c, line 633.
Pending breakpoint "dfb_core_suspend" resolved

Program terminated with signal SIGKILL, Killed.
The program no longer exists.
Also, there's the following output in debug build when suspending:
(!) [VT Switcher 9.060] ( 1359) *** Assumption [core != NULL] failed ***
[/home/zeitlin/src/rea/src/DirectFB/src/core/core.c:633 in
dfb_core_suspend()]
(!) [ 1338: 48.076] --> Caught signal 4 (at 0xbede4954, illegal trap) <--
(!) [Main Thread 48.081] ( 1338) *** Assertion [(thread)->magic ==
D_MAGIC("DirectThread")] failed ***
[/home/zeitlin/src/rea/src/DirectFB/lib/direct/thread.c:294 in
direct_thread_cancel()]
The assumption fails because vt_thread() calls the function with a
NULL core, so there's clearly a bug here, but I wonder whether the bug
is the check itself or vt_thread() passing NULL...
Breakpoint 2, dfb_core_suspend (core=0x0)
at /home/zeitlin/src/rea/src/DirectFB/src/core/core.c:633
633 D_ASSUME( core != NULL );
Current language: auto; currently c
(gdb) bt
#0 dfb_core_suspend (core=0x0)
at /home/zeitlin/src/rea/src/DirectFB/src/core/core.c:633
#1 0x4074d400 in vt_thread (thread=0x1f918, arg=0x0)
at /home/zeitlin/src/rea/src/DirectFB/systems/fbdev/vt.c:392
#2 0x40727994 in direct_thread_main (arg=0x1f918)
at /home/zeitlin/src/rea/src/DirectFB/lib/direct/thread.c:490
#3 0x404acae0 in pthread_detach ()
from /home/zeitlin/rea/debug/lib/libpthread.so.0
#4 0x00000000 in ?? ()
(The app receives SIG41 (SIG_SWITCH_FROM) shortly before this, so the
call is made in reaction to that signal; of course, this doesn't
happen with the no-vt-switching option.)
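For what it's worth, the SIGSEGV backtrace further down shows the same
call from vt.c:392 with core=0x1f470 at core.c:647, which suggests that
dfb_core_suspend() replaces a NULL argument with some default core
after the assumption check. A minimal sketch of that pattern, using
stand-in names (DemoCore, demo_default_core and demo_core_suspend are
hypothetical, not DirectFB identifiers):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the real core structure; illustration only. */
typedef struct {
    int suspended;
} DemoCore;

/* Stand-in for a process-wide core pointer (an assumption about the design). */
static DemoCore *demo_default_core = NULL;

/* Returns 0 on success, -1 if no core is available at all. */
static int demo_core_suspend(DemoCore *core)
{
    /* Fall back to the default core instead of dereferencing NULL. */
    if (core == NULL)
        core = demo_default_core;

    if (core == NULL)
        return -1;

    core->suspended = 1;
    return 0;
}
```

If the real function works like this, passing NULL from vt_thread() is
harmless and the assumption is only noise in debug builds; if it does
not, the caller has to pass a valid core.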
A bit later, the app gets SIGSEGV (see the very first log output in
this mail):
Program received signal SIGSEGV, Segmentation fault.
[Switching to thread 1497]
0x406bd5e4 in dfb_wm_set_active (stack=0x20119d00, active=false)
at /home/zeitlin/src/rea/src/DirectFB/src/core/wm.c:435
435 D_ASSERT( wm_local->funcs->SetActive != NULL );
(gdb) bt
#0 0x406bd5e4 in dfb_wm_set_active (stack=0x20119d00, active=false)
at /home/zeitlin/src/rea/src/DirectFB/src/core/wm.c:435
#1 0x40696fb8 in dfb_layer_context_deactivate (context=0x20019000)
at /home/zeitlin/src/rea/src/DirectFB/src/core/layer_context.c:299
#2 0x4069b630 in dfb_layer_suspend (layer=0x1f9f8)
at /home/zeitlin/src/rea/src/DirectFB/src/core/layer_control.c:106
#3 0x406a1a90 in dfb_layers_suspend (core=0x1f470)
at /home/zeitlin/src/rea/src/DirectFB/src/core/layers.c:334
#4 0x4067f2cc in dfb_core_suspend (core=0x1f470)
at /home/zeitlin/src/rea/src/DirectFB/src/core/core.c:647
#5 0x4074d400 in vt_thread (thread=0x1f918, arg=0x0)
at /home/zeitlin/src/rea/src/DirectFB/systems/fbdev/vt.c:392
#6 0x40727994 in direct_thread_main (arg=0x1f918)
at /home/zeitlin/src/rea/src/DirectFB/lib/direct/thread.c:490
Here, wm_local points to invalid memory (hence the crash).
Unfortunately, using no-vt-switching doesn't help much: we don't get
the assumption failure above, but the app still receives SIGILL (on
resume this time), with a useless gdb backtrace.
Any ideas where I should look for the bug? Are there any known
problems with suspend/resume?
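Since gdb gives nothing usable, one thing to try is recording the
fault details from inside the process. The following is a minimal
sketch using plain POSIX sigaction(); fault_handler() and
install_fault_handler() are hypothetical names and nothing here is
DirectFB API. The "Caught signal" lines in the log suggest that a
handler is already installed, so the order of installation would
matter.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Last fault seen by the handler, for inspection from a debugger or a test. */
static volatile sig_atomic_t last_signo = 0;
static volatile sig_atomic_t last_code = 0;
static void * volatile last_addr = NULL;

static void fault_handler(int signo, siginfo_t *info, void *ucontext)
{
    static const char msg[] = "fault handler: signal caught\n";
    ssize_t n;

    (void)ucontext;
    last_signo = signo;
    last_code = info->si_code;
    last_addr = info->si_addr;   /* faulting address for SIGILL/SIGSEGV */

    /* write() is async-signal-safe; printf() is not. */
    n = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)n;

    /* si_code > 0 means the kernel raised the signal for a real fault.
     * Returning would re-run the faulting instruction, so exit instead. */
    if (info->si_code > 0)
        _exit(128 + signo);
}

/* Installs the handler for one signal; returns 0 on success, -1 on error. */
static int install_fault_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = fault_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

Calling install_fault_handler(SIGILL) early in main() and setting a gdb
breakpoint on fault_handler would at least expose si_code and si_addr
at the moment of the trap.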
Thanks,
Vaclav
--
PGP key: 0x465264C9, available from http://pgp.mit.edu/