Hi,

I did some hacking with qi and the fedora crash utility to get memory
dumps from several of the 2.6.39 suspend bugs. I wrote the steps down at

http://lindi.iki.fi/lindi/openmoko/crash/README

but I'll include a copy in this mail too:

1) Boot linux with mem=64M so that only half of the memory is used.

2) Patch the qi bootloader with

http://lindi.iki.fi/lindi/openmoko/crash/crashdump.diff

so that qi copies the first half of the memory to the second half of
the memory very early in boot, before it has touched memory in any
way. This allows us to get a clean memory dump just after a watchdog
reset.

3) Patch the crash tool with

http://lindi.iki.fi/lindi/openmoko/crash/crash-arm1.diff

4) Compile some helper tools:

http://lindi.iki.fi/lindi/openmoko/crash/crashdump-erase.c
http://lindi.iki.fi/lindi/openmoko/crash/crashdump-read.c

5) $ modprobe s3c2410_wdt

6) $ apt-get install watchdog

7) Create a kdump from the live system:

$ crash -e emacs /usr/lib/debug/boot/vmlinux-2.6.39-gta02-gta02
crash> extend extensions/snap.so
./extensions/snap.so: shared object loaded
crash> snap live.kdump
live.kdump: [100%]
-rw-r--r-- 1 root root 67109380 Jan 25 10:53 live.kdump
crash> exit

8) Prepare the crashdump area:

$ crashdump-erase C

9) Start a suspend/resume stress-test:

while true; do
  om screen power 1
  rtcwake -s 10 -m no
  echo mem > /sys/power/state
done

10) Wait for the watchdog to reboot the system. (If it gets stuck then
you probably hit a bug that occurred when the watchdog had not yet been
started after resume. You can simulate a watchdog reset with the debug
board or by very briefly (< 1 second) removing the battery so that
memory is not lost.)

11) Create a kdump from the memory dump. We'll reuse the headers from
the live dump:

$ (dd if=live.kdump bs=516 count=1 2> /dev/null; crashdump-read) > suspend.kdump

12) Analyze the dump:

$ crash -e emacs suspend.kdump /usr/lib/debug/boot/vmlinux-2.6.39-gta02-gta02
WARNING: Couldn't retrieve crash_notes
please wait... 
(determining panic task)
crash: invalid kernel virtual address: 0 type: "fill_thread_info"
crash: invalid task address: c3b53580

KERNEL: /usr/lib/debug/boot/vmlinux-2.6.39-gta02-gta02
DUMPFILE: suspend.kdump
CPUS: 1
DATE: Tue Jan 24 20:28:10 2012
UPTIME: (cannot calculate: unknown HZ value)
LOAD AVERAGE: 1.60, 1.22, 0.60
TASKS: 66
NODENAME: ginger
RELEASE: 2.6.39-gta02-gta02
VERSION: #1 Mon Oct 31 20:02:07 UTC 2011
MACHINE: armv4tl (unknown Mhz)
MEMORY: 64 MB
PANIC: ""
PID: 0
COMMAND: "swapper"
TASK: c036e4a8 [THREAD_INFO: c036a000]
CPU: 0
STATE: TASK_RUNNING
WARNING: panic task not found

crash> foreach bt
PID: 0 TASK: c036e4a8 CPU: 0 COMMAND: "swapper"
#0 [<c0295e40>] (schedule) from [<c00279f0>]
#1 [<c00279f0>] (cpu_idle) from [<c00088f0>]
#2 [<c00088f0>] (start_kernel) from [<30008038>]

PID: 1 TASK: c3817d60 CPU: 0 COMMAND: "init"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00288c8>]
#2 [<c00288c8>] (do_signal) from [<c0028bd8>]
#3 [<c0028bd8>] (do_notify_resume) from [<c0026a94>]

PID: 2 TASK: c3817ac0 CPU: 0 COMMAND: "kthreadd"
#0 [<c0295e40>] (schedule) from [<c0054098>]
#1 [<c0054098>] (kthreadd) from [<c00272e8>]

PID: 3 TASK: c3817820 CPU: 0 COMMAND: "ksoftirqd/0"
#0 [<c0295e40>] (schedule) from [<c00416b4>]
#1 [<c00416b4>] (run_ksoftirqd) from [<c0054138>]
#2 [<c0054138>] (kthread) from [<c00272e8>]

PID: 6 TASK: c3817040 CPU: 0 COMMAND: "rcu_kthread"
#0 [<c0295e40>] (schedule) from [<c006bae4>]
#1 [<c006bae4>] (rcu_kthread) from [<c0054138>]
#2 [<c0054138>] (kthread) from [<c00272e8>]

PID: 7 TASK: c382cd60 CPU: 0 COMMAND: "khelper"
#0 [<c0295e40>] (schedule) from [<c00503a4>]
#1 [<c00503a4>] (rescuer_thread) from [<c0054138>]
#2 [<c0054138>] (kthread) from [<c00272e8>]

PID: 8 TASK: c382cac0 CPU: 0 COMMAND: "sync_supers"
#0 [<c0295e40>] (schedule) from [<c0081668>]
#1 [<c0081668>] (bdi_sync_supers) from [<c0054138>]
#2 [<c0054138>] (kthread) from [<c00272e8>]

PID: 9 TASK: c382c820 CPU: 0 COMMAND: "bdi-default"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00820f8>]
#2 [<c00820f8>] (bdi_forker_thread) from [<c0054138>]
#3 [<c0054138>] (kthread) from [<c00272e8>]

PID: 10 TASK: c382c580 CPU: 0 COMMAND: "kblockd"
#0 [<c0295e40>] (schedule) from [<c00503a4>]
#1 [<c00503a4>] (rescuer_thread) from [<c0054138>]
#2 [<c0054138>] (kthread) from [<c00272e8>]

PID: 11 TASK: c382c2e0 CPU: 0 COMMAND: "irq/53-pcf50633"
#0 [<c0295e40>] (schedule) from [<c0068c6c>]
#1 [<c0068c6c>] (irq_thread) from [<c0054138>]
#2 [<c0054138>] (kthread) from [<c00272e8>]

PID: 12 TASK: c382c040 CPU: 0 COMMAND: "kswapd0"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c007c150>]
#2 [<c007c150>] (kswapd) from [<c0054138>]
#3 [<c0054138>] (kthread) from [<c00272e8>]

PID: 13 TASK: c38b5d60 CPU: 0 COMMAND: "fsnotify_mark"
#0 [<c0295e40>] (schedule) from [<c00c89b8>]
#1 [<c00c89b8>] (fsnotify_mark_destroy) from [<c0054138>]
#2 [<c0054138>] (kthread) from [<c00272e8>]

PID: 14 TASK: c38b5ac0 CPU: 0 COMMAND: "crypto"
#0 [<c0295e40>] (schedule) from [<c00503a4>]
#1 [<c00503a4>] (rescuer_thread) from [<c0054138>]
#2 [<c0054138>] (kthread) from [<c00272e8>]

PID: 21 TASK: c3918d60 CPU: 0 COMMAND: "kworker/u:1"
#0 [<c0295e40>] (schedule) from [<c0050690>]
#1 [<c0050690>] (worker_thread) from [<c0054138>]
#2 [<c0054138>] (kthread) from [<c00272e8>]

PID: 24 TASK: c38b5040 CPU: 0 COMMAND: "mtdblock0"
#0 [<c0295e40>] (schedule) from [<c01be2e8>]
#1 [<c01be2e8>] (mtd_blktrans_thread) from [<c0054138>]
#2 [<c0054138>] (kthread) from [<c00272e8>]

PID: 25 TASK: c38b52e0 CPU: 0 COMMAND: "kworker/0:1"
#0 [<c0295e40>] (schedule) from [<c0050690>]
#1 [<c0050690>] (worker_thread) from [<c0054138>]
#2 [<c0054138>] (kthread) from [<c00272e8>]

PID: 26 TASK: c38b5580 CPU: 0 COMMAND: "mtdblock1"
#0 [<c0295e40>] (schedule) from [<c01be2e8>]
#1 [<c01be2e8>] (mtd_blktrans_thread) from [<c0054138>]
#2 [<c0054138>] (kthread) from [<c00272e8>]

PID: 27 TASK: c38b5820 CPU: 0 COMMAND: "mtdblock2"
#0 [<c0295e40>] (schedule) from [<c01be2e8>]
#1 [<c01be2e8>] (mtd_blktrans_thread) from [<c0054138>]
#2 [<c0054138>] (kthread) from [<c00272e8>]

PID: 28 TASK: c3918ac0 CPU: 0 COMMAND: "mtdblock3"
#0 [<c0295e40>] (schedule) from [<c01be2e8>]
#1 [<c01be2e8>] (mtd_blktrans_thread) from [<c0054138>]
#2 [<c0054138>] (kthread) from [<c00272e8>]

PID: 29 TASK: c3918820 CPU: 0 COMMAND: "mtdblock4"
#0 [<c0295e40>] (schedule) from [<c01be2e8>]
#1 [<c01be2e8>] (mtd_blktrans_thread) from [<c0054138>]
#2 [<c0054138>] (kthread) from [<c00272e8>]

PID: 30 TASK: c3918580 CPU: 0 COMMAND: "mtdblock5"
#0 [<c0295e40>] (schedule) from [<c01be2e8>]
#1 [<c01be2e8>] (mtd_blktrans_thread) from [<c0054138>]
#2 [<c0054138>] (kthread) from [<c00272e8>]

PID: 31 TASK: c39182e0 CPU: 0 COMMAND: "mtdblock6"
#0 [<c0295e40>] (schedule) from [<c01be2e8>]
#1 [<c01be2e8>] (mtd_blktrans_thread) from [<c0054138>]
#2 [<c0054138>] (kthread) from [<c00272e8>]

PID: 32 TASK: c3918040 CPU: 0 COMMAND: "spi_gpio.2"
#0 [<c0295e40>] (schedule) from [<c00503a4>]
#1 [<c00503a4>] (rescuer_thread) from [<c0054138>]
#2 [<c0054138>] (kthread) from [<c00272e8>]

PID: 35 TASK: c398e820 CPU: 0 COMMAND: "irq/132-glamo-m"
#0 [<c0295e40>] (schedule) from [<c0068c6c>]
#1 [<c0068c6c>] (irq_thread) from [<c0054138>]
#2 [<c0054138>] (kthread) from [<c00272e8>]

PID: 37 TASK: c398e2e0 CPU: 0 COMMAND: "mmcqd/1"
#0 [<c0295e40>] (schedule) from [<c01fa9a4>]
#1 [<c01fa9a4>] (mmc_queue_thread) from [<c0054138>]
#2 [<c0054138>] (kthread) from [<c00272e8>]

PID: 38 TASK: c398e040 CPU: 0 COMMAND: "kjournald"
#0 [<c0295e40>] (schedule) from [<c01057fc>]
#1 [<c01057fc>] (kjournald) from [<c0054138>]
#2 [<c0054138>] (kthread) from [<c00272e8>]

PID: 91 TASK: c3a30d60 CPU: 0 COMMAND: "udevd"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00288c8>]
#2 [<c00288c8>] (do_signal) from [<c0028bd8>]
#3 [<c0028bd8>] (do_notify_resume) from [<c0026a94>]

PID: 125 TASK: c3b0d580 CPU: 0 COMMAND: "khubd"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<bf0044cc>]
#2 [<bf0044cc>] ($a [usbcore]) from [<c0054138>]
#3 [<c0054138>] (kthread) from [<c00272e8>]

PID: 364 TASK: c2c36ac0 CPU: 0 COMMAND: "kworker/u:2"
#0 [<c0295e40>] (schedule) from [<c0050690>]
#1 [<c0050690>] (worker_thread) from [<c0054138>]
#2 [<c0054138>] (kthread) from [<c00272e8>]

PID: 375 TASK: c3abb040 CPU: 0 COMMAND: "udevd"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00288c8>]
#2 [<c00288c8>] (do_signal) from [<c0028bd8>]
#3 [<c0028bd8>] (do_notify_resume) from [<c0026a94>]

PID: 597 TASK: c3abb2e0 CPU: 0 COMMAND: "rsyslogd"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00288c8>]
#2 [<c00288c8>] (do_signal) from [<c0028bd8>]
#3 [<c0028bd8>] (do_notify_resume) from [<c0026a94>]

PID: 598 TASK: c3a0d580 CPU: 0 COMMAND: "rs:main Q:Reg"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00288c8>]
#2 [<c00288c8>] (do_signal) from [<c0028bd8>]
#3 [<c0028bd8>] (do_notify_resume) from [<c0026a94>]

PID: 599 TASK: c3a30040 CPU: 0 COMMAND: "rsyslogd"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00288c8>]
#2 [<c00288c8>] (do_signal) from [<c0028bd8>]
#3 [<c0028bd8>] (do_notify_resume) from [<c0026a94>]

PID: 600 TASK: c2c36d60 CPU: 0 COMMAND: "rsyslogd"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00288c8>]
#2 [<c00288c8>] (do_signal) from [<c0028bd8>]
#3 [<c0028bd8>] (do_notify_resume) from [<c0026a94>]

PID: 620 TASK: c3a0dac0 CPU: 0 COMMAND: "dbus-daemon"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00288c8>]
#2 [<c00288c8>] (do_signal) from [<c0028bd8>]
#3 [<c0028bd8>] (do_notify_resume) from [<c0026a94>]

PID: 633 TASK: c2ccf040 CPU: 0 COMMAND: "dropbear"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00288c8>]
#2 [<c00288c8>] (do_signal) from [<c0028bd8>]
#3 [<c0028bd8>] (do_notify_resume) from [<c0026a94>]

PID: 692 TASK: c3b0dd60 CPU: 0 COMMAND: "watchdog"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00288c8>]
#2 [<c00288c8>] (do_signal) from [<c0028bd8>]
#3 [<c0028bd8>] (do_notify_resume) from [<c0026a94>]

PID: 796 TASK: c2c2dd60 CPU: 0 COMMAND: "answering-machi"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00288c8>]
#2 [<c00288c8>] (do_signal) from [<c0028bd8>]
#3 [<c0028bd8>] (do_notify_resume) from [<c0026a94>]

PID: 802 TASK: c3a0d820 CPU: 0 COMMAND: "xdm"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00288c8>]
#2 [<c00288c8>] (do_signal) from [<c0028bd8>]
#3 [<c0028bd8>] (do_notify_resume) from [<c0026a94>]

PID: 809 TASK: c2c36580 CPU: 0 COMMAND: "Xorg"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00288c8>]
#2 [<c00288c8>] (do_signal) from [<c0028bd8>]
#3 [<c0028bd8>] (do_notify_resume) from [<c0026a94>]

PID: 817 TASK: c2c2d2e0 CPU: 0 COMMAND: "xdm"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00288c8>]
#2 [<c00288c8>] (do_signal) from [<c0028bd8>]
#3 [<c0028bd8>] (do_notify_resume) from [<c0026a94>]

PID: 834 TASK: c3a0d040 CPU: 0 COMMAND: "xvkbd"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00288c8>]
#2 [<c00288c8>] (do_signal) from [<c0028bd8>]
#3 [<c0028bd8>] (do_notify_resume) from [<c0026a94>]

PID: 1170 TASK: c398e580 CPU: 0 COMMAND: "dropbear"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00288c8>]
#2 [<c00288c8>] (do_signal) from [<c0028bd8>]
#3 [<c0028bd8>] (do_notify_resume) from [<c0026a94>]

PID: 1171 TASK: c2ccf2e0 CPU: 0 COMMAND: "flush-179:0"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00b83b0>]
#2 [<c00b83b0>] (bdi_writeback_thread) from [<c0054138>]
#3 [<c0054138>] (kthread) from [<c00272e8>]

PID: 1172 TASK: c2ccfac0 CPU: 0 COMMAND: "bash"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00288c8>]
#2 [<c00288c8>] (do_signal) from [<c0028bd8>]
#3 [<c0028bd8>] (do_notify_resume) from [<c0026a94>]

PID: 1180 TASK: c2ccf820 CPU: 0 COMMAND: "screen"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00288c8>]
#2 [<c00288c8>] (do_signal) from [<c0028bd8>]
#3 [<c0028bd8>] (do_notify_resume) from [<c0026a94>]

PID: 1181 TASK: c3a302e0 CPU: 0 COMMAND: "screen"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00288c8>]
#2 [<c00288c8>] (do_signal) from [<c0028bd8>]
#3 [<c0028bd8>] (do_notify_resume) from [<c0026a94>]

PID: 1182 TASK: c3a30ac0 CPU: 0 COMMAND: "bash"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00288c8>]
#2 [<c00288c8>] (do_signal) from [<c0028bd8>]
#3 [<c0028bd8>] (do_notify_resume) from [<c0026a94>]

PID: 1253 TASK: c3b0d2e0 CPU: 0 COMMAND: "kworker/0:0"
#0 [<c0295e40>] (schedule) from [<c0050690>]
#1 [<c0050690>] (worker_thread) from [<c0054138>]
#2 [<c0054138>] (kthread) from [<c00272e8>]

PID: 1258 TASK: c3b0dac0 CPU: 0 COMMAND: "udevd"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00288c8>]
#2 [<c00288c8>] (do_signal) from [<c0028bd8>]
#3 [<c0028bd8>] (do_notify_resume) from [<c0026a94>]

PID: 1259 TASK: c2c2dac0 CPU: 0 COMMAND: "bash"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00288c8>]
#2 [<c00288c8>] (do_signal) from [<c0028bd8>]
#3 [<c0028bd8>] (do_notify_resume) from [<c0026a94>]

PID: 1333 TASK: c2c2d820 CPU: 0 COMMAND: "stap"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00288c8>]
#2 [<c00288c8>] (do_signal) from [<c0028bd8>]
#3 [<c0028bd8>] (do_notify_resume) from [<c0026a94>]

PID: 1560 TASK: c2c2d580 CPU: 0 COMMAND: "stapio"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00288c8>]
#2 [<c00288c8>] (do_signal) from [<c0028bd8>]
#3 [<c0028bd8>] (do_notify_resume) from [<c0026a94>]

PID: 1561 TASK: c2c2d040 CPU: 0 COMMAND: "stapio"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00288c8>]
#2 [<c00288c8>] (do_signal) from [<c0028bd8>]
#3 [<c0028bd8>] (do_notify_resume) from [<c0026a94>]

PID: 1562 TASK: c2ccf580 CPU: 0 COMMAND: "stapio"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1gta02_bat_get_voltage) from [<c002ee00>]
#5 [<c002ee00>] (gta02_bat_get_capacity) from [<c01ee3e8>]
#6 [<c01ee3e8>] (platform_bat_get_property) from [<c01edd64>]
#7 [<c01edd64>] (power_supply_show_property) from [<c01edf80>]
#8 [<c01edf80>] (power_supply_uevent) from [<c01a95d0>]
#9 [<c01a95d0>] (dev_uevent) from [<c0162654>]
#10 [<c0162654>] (kobject_uevent_env) from [<c004ff68>]
#11 [<c004ff68>] (process_one_work) from [<c00505b8>]
#12 [<c00505b8>] (worker_thread) from [<c0054138>]
#13 [<c0054138>] (kthread) from [<c00272e8>]

PID: 1597 TASK: c398ed60 CPU: 0 COMMAND: "kworker/0:3"
#0 [<c0295e40>] (schedule) from [<c0050690>]
#1 [<c0050690>] (worker_thread) from [<c0054138>]
#2 [<c0054138>] (kthread) from [<c00272e8>]

PID: 1598 TASK: c398eac0 CPU: 0 COMMAND: "kworker/0:4"
#0 [<c0295e40>] (schedule) from [<c0050690>]
#1 [<c0050690>] (worker_thread) from [<c0054138>]
#2 [<c0054138>] (kthread) from [<c00272e8>]

PID: 1599 TASK: c2ccfd60 CPU: 0 COMMAND: "udevd"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00288c8>]
#2 [<c00288c8>] (do_signal) from [<c0028bd8>]
#3 [<c0028bd8>] (do_notify_resume) from [<c0026a94>]

PID: 1600 TASK: c3817580 CPU: 0 COMMAND: "udevd"
#0 [<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00288c8>]
#2 [<c00288c8>] (do_signal) from [<c0028bd8>]
#3 [<c0028bd8>] (do_notify_resume) from [<c0026a94>]

PID: 1606 TASK: c3abb580 CPU: 0 COMMAND: "sleep"
#0
[<c0295e40>] (schedule) from [<c005b020>]
#1 [<c005b020>] (refrigerator) from [<c00288c8>]
#2 [<c00288c8>] (do_signal) from [<c0028bd8>]
#3 [<c0028bd8>] (do_notify_resume) from [<c0026a94>]

From the above we can see that pcf50633_adc_sync_read got stuck
before we had suspended. To verify this theory I disabled the call
from gta02_bat_get_voltage to pcf50633_adc_sync_read with systemtap.
With that call disabled the kernel no longer hangs in this way:

13) cat > disable-gta02_bat_get_voltage.stp <<EOF
#!/usr/bin/stap -g

function _begin () %{
  /* Address of branch inside gta02_bat_get_voltage to pcf50633_adc_sync_read */
  __u32 *branch = (void*)0xc002eda4;
  _stp_printf("*branch = %x\n", *branch);
  if (*branch == 0xeb062230) {
    *branch = 0xe1a00000; /* nop */
  }
%}

function _end () %{
  /* Address of branch inside gta02_bat_get_voltage to pcf50633_adc_sync_read */
  __u32 *branch = (void*)0xc002eda4;
  _stp_printf("*branch = %x\n", *branch);
  if (*branch == 0xe1a00000) {
    *branch = 0xeb062230; /* bl */
  }
%}

probe begin {
  _begin();
}

probe end {
  _end();
}
EOF

14) chmod a+x disable-gta02_bat_get_voltage.stp

15) ./disable-gta02_bat_get_voltage.stp

Unfortunately there are still other, less common suspend bugs that
occur from time to time.

-Timo
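
A note on the two hard-coded values in the systemtap script of step 13:
the address 0xc002eda4 and the instruction word 0xeb062230 are specific
to this one kernel build, so it can be worth decoding the branch before
patching a different kernel. The sketch below is not part of the
original procedure. It assumes the word is a plain ARM (A32) B/BL
instruction, whose target is the instruction address plus 8 plus the
sign-extended 24-bit immediate times 4. The function name bl_target is
made up for this illustration.

```shell
#!/bin/bash
# Decode the branch target of an ARM (A32) B/BL instruction.
# bl_target is a hypothetical helper name, not part of any existing tool.
# usage: bl_target <instruction address> <instruction word>
bl_target() {
    local addr=$(( $1 )) insn=$(( $2 ))

    # Bits 27..25 are 0b101 for B and BL; refuse anything else.
    if (( ((insn >> 25) & 0x7) != 0x5 )); then
        echo "not a B/BL instruction" >&2
        return 1
    fi

    # The low 24 bits hold a signed word offset.
    local imm24=$(( insn & 0xffffff ))
    if (( imm24 & 0x800000 )); then
        imm24=$(( imm24 - 0x1000000 ))
    fi

    # On ARM the PC reads as the instruction address plus 8.
    printf '0x%08x\n' $(( (addr + 8 + imm24 * 4) & 0xffffffff ))
}

# The values used in disable-gta02_bat_get_voltage.stp
bl_target 0xc002eda4 0xeb062230
```

The printed address can then be looked up in the symbol table, for
example with the sym command of crash. If the comment in the script is
right it should resolve to pcf50633_adc_sync_read. The replacement word
0xe1a00000 is "mov r0, r0", the usual ARM no-op.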

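
A note on the bs=516 in step 11: the size of live.kdump in step 7 is
67109380 bytes, which is exactly 64 MiB plus 516 bytes, so the dd
command copies everything that precedes the memory image in the live
dump. The sketch below redoes that arithmetic and performs the same
splice from files. It is only an illustration and assumes that a snap
dump is a header followed by the raw memory image. The function names
and the idea of saving the crashdump-read output to a file first are
assumptions made for this example.

```shell
#!/bin/bash
# Size of the header that precedes the raw memory image in a dump file.
# header_size and splice_kdump are hypothetical helper names.
# usage: header_size <dump size in bytes> <memory size in MiB>
header_size() {
    echo $(( $1 - $2 * 1024 * 1024 ))
}

# Join the first <hdr> bytes of a template dump with a raw memory image.
# usage: splice_kdump <template dump> <raw image> <output> [hdr bytes]
splice_kdump() {
    local template=$1 raw=$2 out=$3 hdr=${4:-516}
    { head -c "$hdr" "$template" && cat "$raw"; } > "$out"
}

# live.kdump from step 7 is 67109380 bytes, kernel booted with mem=64M
header_size 67109380 64
```

With the output of crashdump-read saved to a file such as raw.bin, the
call "splice_kdump live.kdump raw.bin suspend.kdump" should produce the
same file as the one-liner in step 11.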