Hi all,

some of my RHEL 6 systems hit a kernel panic from time to time. Two of them are large HP machines (DL585 G7) with 48 cores and 128 GB of RAM; one is an older Primergy RX300 S2 (4 cores, 8 GB). Some other systems (also Primergys) run fine all the time...
Some other facts:
- all filesystems are ext4
- NFSv4 enabled
- 3 bonding devices, each with 2 physical devices
- 2 of the bonding devices are configured for jumbo frames (MTU=9000)

Here's the console log from one of the HPs:

------------[ cut here ]------------
kernel BUG at fs/inode.c:1333!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu47/cache/index2/shared_cpu_map
CPU 4
Modules linked in: iptable_filter ip_tables nfs fscache fuse nfsd nfs_acl auth_rpcgss exportfs autofs4 ipmi_devintf ipmi_si ipmi_msghandler lockd sunrpc bonding ipv6 dm_mirror dm_region_hash dm_log uinput power_meter hwmon bnx2 amd64_edac_mod edac_core edac_mce_amd i2c_piix4 sg hpilo nx_nic(U) ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic pata_atiixp ahci hpsa(U) radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mod [last unloaded: freq_table]

Modules linked in: iptable_filter ip_tables nfs fscache fuse nfsd nfs_acl auth_rpcgss exportfs autofs4 ipmi_devintf ipmi_si ipmi_msghandler lockd sunrpc bonding ipv6 dm_mirror dm_region_hash dm_log uinput power_meter hwmon bnx2 amd64_edac_mod edac_core edac_mce_amd i2c_piix4 sg hpilo nx_nic(U) ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic pata_atiixp ahci hpsa(U) radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mod [last unloaded: freq_table]
Pid: 3393, comm: lockd Tainted: G W ---------------- 2.6.32-71.14.1.el6.x86_64 #1 ProLiant DL585 G7
RIP: 0010:[<ffffffff81186bf9>] [<ffffffff81186bf9>] iput+0x69/0x70
RSP: 0018:ffff88082b86fce0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8802fc8616c8 RCX: 000000000000c60e
RDX: ffff88202e13a901 RSI: ffffffffa0341de0 RDI: ffff8802fc8616c8
RBP: ffff88082b86fcf0 R08: 000000000002ac45 R09: 0000000000000000
R10: 000000000000000f R11: 0000000000000000 R12: ffff880227b49c00
R13: ffffffffa034e060 R14: ffff88202e13a940 R15: 00000000fffffff5
FS: 00007fac6a0247c0(0000) GS:ffff88002c240000(0000) knlGS:00000000f77916c0
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007fac6a048000 CR3: 0000000c2da36000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
Process lockd (pid: 3393, threadinfo ffff88082b86e000, task ffff88082d9ab4e0)
Stack:
 ffff88082b86fd40 ffff8802fc861680 ffff88082b86fd10 ffffffff813fdbf1
<0> ffff880227b49c00 ffff880227b49c00 ffff88082b86fd30 ffffffffa03351f8
<0> ffff88082b86fd30 ffff880227b49c10 ffff88082b86fd60 ffffffffa0341e2c
Call Trace:
 [<ffffffff813fdbf1>] sock_release+0x71/0x90
 [<ffffffffa03351f8>] svc_sock_free+0x48/0x70 [sunrpc]
 [<ffffffffa0341e2c>] svc_xprt_free+0x4c/0x70 [sunrpc]
 [<ffffffffa0341de0>] ? svc_xprt_free+0x0/0x70 [sunrpc]
 [<ffffffff8125cb97>] kref_put+0x37/0x70
 [<ffffffffa0340f29>] svc_xprt_put+0x19/0x20 [sunrpc]
 [<ffffffffa0341191>] svc_xprt_release+0xc1/0xe0 [sunrpc]
 [<ffffffffa03415bd>] svc_recv+0x2ed/0x830 [sunrpc]
 [<ffffffff8105c530>] ? default_wake_function+0x0/0x20
 [<ffffffffa02f6291>] lockd+0xc1/0x230 [lockd]
 [<ffffffffa02f61d0>] ? lockd+0x0/0x230 [lockd]
 [<ffffffff81091a76>] kthread+0x96/0xa0
 [<ffffffff810141ca>] child_rip+0xa/0x20
 [<ffffffff810919e0>] ? kthread+0x0/0xa0
 [<ffffffff810141c0>] ? child_rip+0x0/0x20
Code: 38 48 c7 c0 f0 7c 18 81 48 85 d2 74 12 48 8b 42 20 48 c7 c2 f0 7c 18 81 48 85 c0 48 0f 44 c2 48 89 df ff d0 48 83 c4 08 5b c9 c3 <0f> 0b eb fe 0f 1f 00 55 48 89 e5 41 55 41 54 53 48 83 ec 08 0f
RIP [<ffffffff81186bf9>] iput+0x69/0x70
 RSP <ffff88082b86fce0>
Mounting proc filesystem
Mounting sysfs filesystem
Creating /dev
Creating initial device nodes
Free memory/Total memory (free %): 456164 / 495584 ( 92.0457 )
Loading jbd2.ko module
Loading mbcache.ko module
Loading ext4.ko module
Loading crc-t10dif.ko module
Loading sd_mod.ko module
Loading ata_generic.ko module
Loading exportfs.ko module
Loading autofs4.ko module
Loading ipmi_msghandler.ko module
Loading sunrpc.ko module
Loading ipv6.ko module
Loading uinput.ko module
Loading hwmon.ko module
Loading bnx2.ko module
Loading edac_core.ko module
Loading edac_mce_amd.ko module
Loading sg.ko module
Loading hpilo.ko module
Loading nx_nic.ko module
Loading cdrom.ko module
Loading pata_acpi.ko module
Loading pata_atiixp.ko module
Loading ahci.ko module
Loading hpsa.ko module
hpsa 0000:03:00.0: controller message 03:00 timed out
hpsa 0000:03:00.0: controller message 03:00 timed out
hpsa 0000:03:00.0: controller message 03:00 timed out
hpsa 0000:44:00.0: controller message 03:00 timed out
hpsa 0000:44:00.0: controller message 03:00 timed out
hpsa 0000:44:00.0: controller message 03:00 timed out
Loading i2c-core.ko module
Loading dm-mod.ko module
Loading nfs_acl.ko module
Loading auth_rpcgss.ko module
Loading ipmi_devintf.ko module
Loading ipmi_si.ko module
Loading lockd.ko module
Loadingpower_meter ACPI000D:00: Ignoring unsafe software power cap!
 bonding.ko module
Loading dm-log.ko module
Loading power_meter.ko module
Loading amd64_edac_mod.ko module
Loading i2c-piix4.ko module
Loading sr_mod.ko module
Loading drm.ko module
Loading i2c-algo-bit.ko module
Loading nfsd.ko module
Loading dm-region-hash.ko module
Loading ttm.ko module
Loading drm_kms_helper.ko module
Loading dm-mirror.ko module
Loading radeon.ko module
Waiting for required block device discovery
Waiting for 8 sdd-like device(s)...Found
Creating Block Devices
Creating block device loop0
Creating block device loop1
Creating block device loop2
Creating block device loop3
Creating block device loop4
Creating block device loop5
Creating block device loop6
Creating block device loop7
Creating block device ram0
Creating block device ram1
Creating block device ram10
Creating block device ram11
Creating block device ram12
Creating block device ram13
Creating block device ram14
Creating block device ram15
Creating block device ram2
Creating block device ram3
Creating block device ram4
Creating block device ram5
Creating block device ram6
Creating block device ram7
Creating block device ram8
Creating block device ram9
Creating block device sda
Creating block device sdb
Creating block device sdc
Creating block device sdd
Creating block device sr0
mdadm: No arrays found in config file or automatically
Free memory/Total memory (free %): 432796 / 495584 ( 87.3305 )
Saving to the local filesystem /dev/sdd1
e2fsck 1.41.12 (17-May-2010)
Homes: recovering journal
Homes: clean, 9073003/164782080 files, 387571383/659105347 blocks
Free memory/Total memory (free %): 427248 / 495584 ( 86.211 )
Copying data : [ 2 %]
Copying data : [100 %]
Saving core complete
Restarting system.

The backtrace from the crash dump utility shows:

GNU gdb (GDB) 7.0
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

      KERNEL: /usr/lib/debug/lib/modules/2.6.32-71.14.1.el6.x86_64/vmlinux
    DUMPFILE: ./vmcore [PARTIAL DUMP]
        CPUS: 48
        DATE: Wed Feb 9 09:30:52 2011
      UPTIME: 14 days, 13:57:19
LOAD AVERAGE: 3.65, 3.39, 3.25
       TASKS: 1663
    NODENAME: hydra.sie.siemens.at
     RELEASE: 2.6.32-71.14.1.el6.x86_64
     VERSION: #1 SMP Wed Jan 5 17:01:01 EST 2011
     MACHINE: x86_64 (2095 Mhz)
      MEMORY: 128 GB
       PANIC: "kernel BUG at fs/inode.c:1333!"
         PID: 3393
     COMMAND: "lockd"
        TASK: ffff88082d9ab4e0 [THREAD_INFO: ffff88082b86e000]
         CPU: 4
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 3393 TASK: ffff88082d9ab4e0 CPU: 4 COMMAND: "lockd"
 #0 [ffff88082b86f9a0] machine_kexec at ffffffff8103695b
 #1 [ffff88082b86fa00] crash_kexec at ffffffff810b9068
 #2 [ffff88082b86fad0] oops_end at ffffffff814cc6e0
 #3 [ffff88082b86fb00] die at ffffffff8101733b
 #4 [ffff88082b86fb30] do_trap at ffffffff814cbfb4
 #5 [ffff88082b86fb90] do_invalid_op at ffffffff81014ee5
 #6 [ffff88082b86fc30] invalid_op at ffffffff81013f5b
    [exception RIP: iput+105]
    RIP: ffffffff81186bf9 RSP: ffff88082b86fce0 RFLAGS: 00010246
    RAX: 0000000000000000 RBX: ffff8802fc8616c8 RCX: 000000000000c60e
    RDX: ffff88202e13a901 RSI: ffffffffa0341de0 RDI: ffff8802fc8616c8
    RBP: ffff88082b86fcf0 R8: 000000000002ac45 R9: 0000000000000000
    R10: 000000000000000f R11: 0000000000000000 R12: ffff880227b49c00
    R13: ffffffffa034e060 R14: ffff88202e13a940 R15: 00000000fffffff5
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
 #7 [ffff88082b86fcf8] sock_release at ffffffff813fdbf1
 #8 [ffff88082b86fd18] svc_sock_free at ffffffffa03351f8
 #9 [ffff88082b86fd38] svc_xprt_free at ffffffffa0341e2c
#10 [ffff88082b86fd68] kref_put at ffffffff8125cb97
#11 [ffff88082b86fd88] svc_xprt_put at ffffffffa0340f29
#12 [ffff88082b86fd98] svc_xprt_release at ffffffffa0341191
#13 [ffff88082b86fdc8] svc_recv at ffffffffa03415bd
#14 [ffff88082b86fe58] lockd at ffffffffa02f6291
#15 [ffff88082b86fee8] kthread at ffffffff81091a76
#16 [ffff88082b86ff48] kernel_thread at ffffffff810141ca
crash>

Any ideas or hints? What else can I do to find the cause of these panics, and how can I solve it?

Thanks a lot,
Christian
