Hi,

I'm experiencing crashes in a DFB application (running on an ARM Linux 
2.6 machine, with fbdev) when the OS suspends and then resumes. Do 
you have any idea what could be causing this and how to fix it? As 
you'll see, we couldn't get very far debugging it...

What happens: after suspending the machine
("echo mem >/sys/power/state") and then resuming it, the DFB 
application is gone, with the following log messages (this was 
without debug mode enabled):

(!) [ 1076:   19.429] --> Caught signal 4 (at 0xbe826a2c, illegal trap) <--
(!) FUSION_PROPERTY_LEASE    --> Connection timed out
(!) [ 1076:   19.521] --> Caught signal 11 (at (nil), invalid address) <--
Killed

We initially suspected it had something to do with the clock being 
bumped at wake-up, with the fusion module interpreting this as a 
time-out and causing fusion_property_lease() to fail. I still think 
this could be a problem, seeing that there's no suspend/resume related 
code in linux-fusion, but unfortunately it's more complicated than that:

Notice the SIGILL (=4) signal above. This happens *before* the system 
is suspended, unfortunately without any useful backtrace:

(gdb) c
Continuing.
Breakpoint 2 at 0x4067f150: file /home/zeitlin/src/rea/src/DirectFB/src/core/core.c, line 633.
Pending breakpoint "dfb_core_suspend" resolved

Program terminated with signal SIGKILL, Killed.
The program no longer exists.

Also, a debug build prints the following when suspending:

(!) [VT Switcher       9.060] ( 1359) *** Assumption [core != NULL] failed *** [/home/zeitlin/src/rea/src/DirectFB/src/core/core.c:633 in dfb_core_suspend()]
(!) [ 1338:   48.076] --> Caught signal 4 (at 0xbede4954, illegal trap) <--
(!) [Main Thread      48.081] ( 1338) *** Assertion [(thread)->magic == D_MAGIC("DirectThread")] failed *** [/home/zeitlin/src/rea/src/DirectFB/lib/direct/thread.c:294 in direct_thread_cancel()]

The assumption fails because vt_thread() calls the function with a NULL 
core, so there's clearly a bug here, but I wonder whether the bug is 
the check itself or vt_thread() passing NULL...

Breakpoint 2, dfb_core_suspend (core=0x0) 
at /home/zeitlin/src/rea/src/DirectFB/src/core/core.c:633
633          D_ASSUME( core != NULL );
Current language:  auto; currently c
(gdb) bt
#0  dfb_core_suspend (core=0x0) 
at /home/zeitlin/src/rea/src/DirectFB/src/core/core.c:633
#1  0x4074d400 in vt_thread (thread=0x1f918, arg=0x0) 
at /home/zeitlin/src/rea/src/DirectFB/systems/fbdev/vt.c:392
#2  0x40727994 in direct_thread_main (arg=0x1f918) 
at /home/zeitlin/src/rea/src/DirectFB/lib/direct/thread.c:490
#3  0x404acae0 in pthread_detach () 
from /home/zeitlin/rea/debug/lib/libpthread.so.0
#4  0x00000000 in ?? ()

(The app receives SIG41 (SIG_SWITCH_FROM) shortly before this, so the 
call is made in reaction to that signal. Of course, this doesn't 
happen with the no-vt-switching option.)

A bit later, the app gets SIGSEGV (see the very first log output in 
this mail):

Program received signal SIGSEGV, Segmentation fault.
[Switching to thread 1497]
0x406bd5e4 in dfb_wm_set_active (stack=0x20119d00, active=false) 
at /home/zeitlin/src/rea/src/DirectFB/src/core/wm.c:435
435          D_ASSERT( wm_local->funcs->SetActive != NULL );
(gdb) bt
#0 0x406bd5e4 in dfb_wm_set_active (stack=0x20119d00, active=false) 
at /home/zeitlin/src/rea/src/DirectFB/src/core/wm.c:435
#1  0x40696fb8 in dfb_layer_context_deactivate (context=0x20019000) 
at /home/zeitlin/src/rea/src/DirectFB/src/core/layer_context.c:299
#2  0x4069b630 in dfb_layer_suspend (layer=0x1f9f8) 
at /home/zeitlin/src/rea/src/DirectFB/src/core/layer_control.c:106
#3  0x406a1a90 in dfb_layers_suspend (core=0x1f470) 
at /home/zeitlin/src/rea/src/DirectFB/src/core/layers.c:334 
#4  0x4067f2cc in dfb_core_suspend (core=0x1f470) 
at /home/zeitlin/src/rea/src/DirectFB/src/core/core.c:647
#5  0x4074d400 in vt_thread (thread=0x1f918, arg=0x0) 
at /home/zeitlin/src/rea/src/DirectFB/systems/fbdev/vt.c:392
#6  0x40727994 in direct_thread_main (arg=0x1f918) 
at /home/zeitlin/src/rea/src/DirectFB/lib/direct/thread.c:490

Here, wm_local points to invalid memory (hence the crash).

Unfortunately, using no-vt-switching doesn't help much: we don't get 
the assumption failure above, but the app still receives SIGILL (on 
resume this time), again with a useless gdb backtrace.

Any ideas where I should look for the bug? Are there any known 
problems with suspend/resume?
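
To make the clock-jump suspicion from the start of this mail concrete, 
here is a minimal sketch of the failure mode I have in mind. It is only 
an illustration: elapsed_ms() and timed_out() are hypothetical helpers, 
not linux-fusion code, and I don't know which time source the fusion 
module really uses. The idea is that a time-out measured against the 
wall clock can appear to expire as soon as the clock is set forward on 
resume, while the same check against a clock that did not jump does not.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper: milliseconds elapsed between two timestamps. */
static long long elapsed_ms(const struct timespec *start,
                            const struct timespec *now)
{
    return (long long)(now->tv_sec - start->tv_sec) * 1000LL
         + (now->tv_nsec - start->tv_nsec) / 1000000LL;
}

/* Hypothetical helper: true once timeout_ms has passed since start. */
static bool timed_out(const struct timespec *start,
                      const struct timespec *now,
                      long long timeout_ms)
{
    return elapsed_ms(start, now) >= timeout_ms;
}
```

If both timestamps come from clock_gettime(CLOCK_REALTIME, ...), a 
wall clock that jumps by an hour across suspend makes a one-second 
time-out look expired immediately. Taking them from CLOCK_MONOTONIC 
instead should avoid that, assuming that clock is not advanced by the 
time spent suspended.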

Thanks,
Vaclav

-- 
PGP key: 0x465264C9, available from http://pgp.mit.edu/
