Hi, 

We use Debian on a number of machines in our storage infrastructure, and we
have recently been seeing a number of "hangs". We primarily notice this by seeing
nfsd processes locking up, followed by the hung task killer going wild. We finally
managed to get a trace last night; it's pasted below:

We did not see this crash under the 2.6.39 backport; however, that kernel
spontaneously rebooted at ~200 days of uptime (about 3/4 of our infrastructure
rebooted within a few weeks; it was not a good time for our ops teams).

I would be grateful if anybody who can help me narrow this down would jump in,
whether with requests for further info or with advice.


[11309697.466397] ------------[ cut here ]------------ 
[11309697.466556] WARNING: at /build/buildd-linux_3.2.23-1~bpo60+2-amd64-oLufer/linux-3.2.23/fs/jbd2/journal.c:507 __jbd2_log_start_commit+0x7e/0x8c [jbd2]()
[11309697.466660] Hardware name: X8DT6
[11309697.466728] JBD2: bad log_start_commit: 2205591757 2205591757 14613566 0
[11309697.466798] Modules linked in: netconsole autofs4 8021q garp bridge stp nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding tcp_htcp ext4 jbd2 crc16 configfs loop ohci_hcd tpm_tis tpm i7core_edac i2c_i801 snd_pcm snd_timer snd soundcore edac_core i2c_core ioatdma tpm_bios snd_page_alloc coretemp crc32c_intel psmouse pcspkr joydev evdev acpi_cpufreq mperf processor serio_raw button thermal_sys ext3 jbd mbcache usbhid hid sd_mod ses enclosure crc_t10dif uhci_hcd ahci libahci libata igb ehci_hcd e1000e usbcore dca megaraid_sas usb_common scsi_mod [last unloaded: netconsole]
[11309697.470190] Pid: 62, comm: kswapd0 Not tainted 3.2.0-0.bpo.3-amd64 #1
[11309697.470261] Call Trace:
[11309697.470329] [<ffffffff810498a8>] ? warn_slowpath_common+0x78/0x8c
[11309697.470399] [<ffffffff8104995a>] ? warn_slowpath_fmt+0x45/0x4a
[11309697.470471] [<ffffffffa01cabad>] ? __jbd2_log_start_commit+0x7e/0x8c [jbd2]
[11309697.470558] [<ffffffffa01cac83>] ? jbd2_log_start_commit+0x21/0x2f [jbd2]
[11309697.470634] [<ffffffffa02dee7a>] ? ext4_evict_inode+0x86/0x2d1 [ext4]
[11309697.470707] [<ffffffff81119626>] ? evict+0x9a/0x14e
[11309697.470775] [<ffffffff811198b4>] ? dispose_list+0x35/0x3f
[11309697.470844] [<ffffffff81119b87>] ? prune_icache_sb+0x2c9/0x2d8
[11309697.470915] [<ffffffff811081b0>] ? prune_super+0xd6/0x147
[11309697.470987] [<ffffffff810cb9e2>] ? shrink_slab+0x1a3/0x266
[11309697.471056] [<ffffffff810cd937>] ? balance_pgdat+0x335/0x625
[11309697.471126] [<ffffffff810cdf31>] ? kswapd+0x30a/0x325
[11309697.471196] [<ffffffff81063815>] ? wake_up_bit+0x20/0x20
[11309697.471265] [<ffffffff810cdc27>] ? balance_pgdat+0x625/0x625
[11309697.471334] [<ffffffff810cdc27>] ? balance_pgdat+0x625/0x625
[11309697.471403] [<ffffffff810633d9>] ? kthread+0x7a/0x82
[11309697.471472] [<ffffffff8136d3f4>] ? kernel_thread_helper+0x4/0x10
[11309697.471543] [<ffffffff8106335f>] ? kthread_worker_fn+0x147/0x147
[11309697.471613] [<ffffffff8136d3f0>] ? gs_change+0x13/0x13
[11309697.471680] ---[ end trace 56d2be5ea52d0917 ]---





-- 
George Barnett

gbarn...@atlassian.com


