There's a similar bug about this same crash in the 358 kernel. It's a different bug. I rolled back to the previous kernel for now; Red Hat should have a fix soon.
--larry

On 04/17/2013 03:02 PM, 5hosting Team wrote:
> Hey guys,
>
> We run a 40 node webcluster (only apache, php processes) and the nodes
> keep on crashing with a kernel panic. For me it looks like the rename
> of a file/directory aint working. I found someone posting the same a
> few days ago and it should be fixed in kernel 2.6.32-358.2.1.el6, but
> that’s the kernel we’re running. And we just used fsck yesterday night
> to check for problems with the file system. So something doesn’t seem
> right.
>
> Here are 3 crashlogs from 3 different nodes:
>
> Apr 17 20:20:16 001 kernel: Modules linked in: gfs2 dlm configfs sg sd_mod crc_t10dif ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp ahci video output e1000e dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache auth_rpcgss nfs_acl sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: scsi_wait_scan]
> Apr 17 20:20:16 001 kernel:
> Apr 17 20:20:16 001 kernel: Pid: 2915, comm: php-cgi Not tainted 2.6.32-358.2.1.el6.x86_64 #1 Supermicro X9SCL/X9SCM/X9SCL/X9SCM
> Apr 17 20:20:16 001 kernel: RIP: 0010:[<ffffffffa04266ff>] [<ffffffffa04266ff>] gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]
> Apr 17 20:20:16 001 kernel: RSP: 0018:ffff880417e8ba58 EFLAGS: 00010283
> Apr 17 20:20:16 001 kernel: RAX: ffff8804185a3da8 RBX: 0000000000000003 RCX: 000000000db41094
> Apr 17 20:20:16 001 kernel: RDX: 000000000db41094 RSI: 000000000db21756 RDI: ffff8804187ef440
> Apr 17 20:20:16 001 kernel: RBP: ffff880417e8bb18 R08: 0000000000000000 R09: 0000000000000000
> Apr 17 20:20:16 001 kernel: R10: 0000000000001000 R11: 0000000000000000 R12: ffff8804187ef000
> Apr 17 20:20:16 001 kernel: R13: 0000000000000000 R14: ffff88041519c3e0 R15: ffff880417e8bb78
> Apr 17 20:20:16 001 kernel: FS: 00007f07791ff7c0(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
> Apr 17 20:20:16 001 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Apr 17 20:20:16 001 kernel: CR2: 0000000000000060 CR3: 0000000411e94000 CR4: 00000000001407f0
> Apr 17 20:20:16 001 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Apr 17 20:20:16 001 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Apr 17 20:20:16 001 kernel: Process php-cgi (pid: 2915, threadinfo ffff880417e8a000, task ffff8804125cc040)
> Apr 17 20:20:16 001 kernel: Stack:
> Apr 17 20:20:16 001 kernel: ffff8804163dbab0 ffffffffa04012d0 ffff880417e8ba78 ffff8804163dd0c0
> Apr 17 20:20:16 001 kernel: <d> ffff8804163dbab0 00000115a04012d0 ffff880417e8ba98 ffff8804187ef000
> Apr 17 20:20:16 001 kernel: <d> ffff880417e8baf8 ffffffffa0402931 ffff8804185a3da8 0000000000000000
> Apr 17 20:20:16 001 kernel: Call Trace:
> Apr 17 20:20:16 001 kernel: [<ffffffffa04012d0>] ? gfs2_dirent_find_space+0x0/0x50 [gfs2]
> Apr 17 20:20:16 001 kernel: [<ffffffffa0402931>] ? gfs2_dirent_search+0x191/0x1a0 [gfs2]
> Apr 17 20:20:16 001 kernel: [<ffffffffa041e041>] gfs2_rename+0x6b1/0x8c0 [gfs2]
> Apr 17 20:20:16 001 kernel: [<ffffffffa041dab8>] ? gfs2_rename+0x128/0x8c0 [gfs2]
> Apr 17 20:20:16 001 kernel: [<ffffffffa041dad6>] ? gfs2_rename+0x146/0x8c0 [gfs2]
> Apr 17 20:20:16 001 kernel: [<ffffffffa041dafc>] ? gfs2_rename+0x16c/0x8c0 [gfs2]
> Apr 17 20:20:16 001 kernel: [<ffffffffa040c44f>] ? gfs2_glock_put+0x3f/0x180 [gfs2]
> Apr 17 20:20:16 001 kernel: [<ffffffffa040c8a3>] ? gfs2_holder_uninit+0x23/0x40 [gfs2]
> Apr 17 20:20:16 001 kernel: [<ffffffffa040da5e>] ? gfs2_glock_dq_uninit+0x1e/0x30 [gfs2]
> Apr 17 20:20:16 001 kernel: [<ffffffffa041c9dc>] ? gfs2_permission+0x9c/0x100 [gfs2]
> Apr 17 20:20:16 001 kernel: [<ffffffffa041da65>] ? gfs2_rename+0xd5/0x8c0 [gfs2]
> Apr 17 20:20:16 001 kernel: [<ffffffff8118ffdb>] vfs_rename+0x3ab/0x440
> Apr 17 20:20:16 001 kernel: [<ffffffff81191d0a>] sys_renameat+0x1da/0x240
> Apr 17 20:20:16 001 kernel: [<ffffffff81277495>] ? _atomic_dec_and_lock+0x55/0x80
> Apr 17 20:20:16 001 kernel: [<ffffffff81186874>] ? cp_new_stat+0xe4/0x100
> Apr 17 20:20:16 001 kernel: [<ffffffff81186c46>] ? sys_newstat+0x36/0x50
> Apr 17 20:20:16 001 kernel: [<ffffffff810dc8f7>] ? audit_syscall_entry+0x1d7/0x200
> Apr 17 20:20:16 001 kernel: [<ffffffff81191d8b>] sys_rename+0x1b/0x20
> Apr 17 20:20:16 001 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
> Apr 17 20:20:16 001 kernel: Code: 0f 84 c1 fc ff ff e9 41 fb ff ff 48 8b 4d a0 48 8b b1 10 03 00 00 48 8b bd 78 ff ff ff ba 01 00 00 00 e8 75 d6 ff ff 48 89 45 90 <49> 89 45 60 c7 45 9c 01 00 00 00 48 8b 45 90 e9 01 fb ff ff 48
> Apr 17 20:20:16 001 kernel: RIP [<ffffffffa04266ff>] gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]
> Apr 17 20:20:16 001 kernel: RSP <ffff880417e8ba58>
> Apr 17 20:20:16 001 kernel: CR2: 0000000000000060
> Apr 17 20:20:16 001 kernel: ---[ end trace 0647d0d2004566f6 ]---
>
> Apr 17 20:21:00 002 kernel: Modules linked in: gfs2 dlm configfs sg sd_mod crc_t10dif ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp ahci video output e1000e dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache auth_rpcgss nfs_acl sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: scsi_wait_scan]
> Apr 17 20:21:00 002 kernel:
> Apr 17 20:21:00 002 kernel: Pid: 2839, comm: php-cgi Not tainted 2.6.32-358.2.1.el6.x86_64 #1 Supermicro X9SCL/X9SCM/X9SCL/X9SCM
> Apr 17 20:21:00 002 kernel: RIP: 0010:[<ffffffffa04266ff>] [<ffffffffa04266ff>] gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]
> Apr 17 20:21:00 002 kernel: RSP: 0000:ffff8803f518ba58 EFLAGS: 00010283
> Apr 17 20:21:00 002 kernel: RAX: ffff88041447bda8 RBX: 0000000000000003 RCX: 000000000db41094
> Apr 17 20:21:00 002 kernel: RDX: 000000000db41094 RSI: 000000000db21756 RDI: ffff880414cd5440
> Apr 17 20:21:00 002 kernel: RBP: ffff8803f518bb18 R08: 0000000000000000 R09: 0000000000000000
> Apr 17 20:21:00 002 kernel: R10: 0000000000001000 R11: 0000000000000000 R12: ffff880414cd5000
> Apr 17 20:21:00 002 kernel: R13: 0000000000000000 R14: ffff8803f9e918c0 R15: ffff8803f518bb78
> Apr 17 20:21:00 002 kernel: FS: 00007f6a7e8a27c0(0000) GS:ffff8800282c0000(0000) knlGS:0000000000000000
> Apr 17 20:21:00 002 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Apr 17 20:21:00 002 kernel: CR2: 0000000000000060 CR3: 00000003f6313000 CR4: 00000000001407e0
> Apr 17 20:21:00 002 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Apr 17 20:21:00 002 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Apr 17 20:21:00 002 kernel: Process php-cgi (pid: 2839, threadinfo ffff8803f518a000, task ffff8803f5189540)
> Apr 17 20:21:00 002 kernel: Stack:
> Apr 17 20:21:00 002 kernel: ffff880411bfbcb0 ffffffffa04012d0 ffff8803f518ba78 ffff88041518eb60
> Apr 17 20:21:00 002 kernel: <d> ffff880411bfbcb0 00000115a04012d0 ffff8803f518ba98 ffff880414cd5000
> Apr 17 20:21:00 002 kernel: <d> ffff8803f518baf8 ffffffffa0402931 ffff88041447bda8 0000000000000000
> Apr 17 20:21:00 002 kernel: Call Trace:
> Apr 17 20:21:00 002 kernel: [<ffffffffa04012d0>] ? gfs2_dirent_find_space+0x0/0x50 [gfs2]
> Apr 17 20:21:00 002 kernel: [<ffffffffa0402931>] ? gfs2_dirent_search+0x191/0x1a0 [gfs2]
> Apr 17 20:21:00 002 kernel: [<ffffffffa041e041>] gfs2_rename+0x6b1/0x8c0 [gfs2]
> Apr 17 20:21:00 002 kernel: [<ffffffffa041dab8>] ? gfs2_rename+0x128/0x8c0 [gfs2]
> Apr 17 20:21:00 002 kernel: [<ffffffffa041dad6>] ? gfs2_rename+0x146/0x8c0 [gfs2]
> Apr 17 20:21:00 002 kernel: [<ffffffffa041dafc>] ? gfs2_rename+0x16c/0x8c0 [gfs2]
> Apr 17 20:21:00 002 kernel: [<ffffffffa040c44f>] ? gfs2_glock_put+0x3f/0x180 [gfs2]
> Apr 17 20:21:00 002 kernel: [<ffffffffa040c8a3>] ? gfs2_holder_uninit+0x23/0x40 [gfs2]
> Apr 17 20:21:00 002 kernel: [<ffffffffa040da5e>] ? gfs2_glock_dq_uninit+0x1e/0x30 [gfs2]
> Apr 17 20:21:00 002 kernel: [<ffffffffa041c9dc>] ? gfs2_permission+0x9c/0x100 [gfs2]
> Apr 17 20:21:00 002 kernel: [<ffffffffa041da65>] ? gfs2_rename+0xd5/0x8c0 [gfs2]
> Apr 17 20:21:00 002 kernel: [<ffffffff8118ffdb>] vfs_rename+0x3ab/0x440
> Apr 17 20:21:00 002 kernel: [<ffffffff81191d0a>] sys_renameat+0x1da/0x240
> Apr 17 20:21:00 002 kernel: [<ffffffff81277495>] ? _atomic_dec_and_lock+0x55/0x80
> Apr 17 20:21:00 002 kernel: [<ffffffff81186874>] ? cp_new_stat+0xe4/0x100
> Apr 17 20:21:00 002 kernel: [<ffffffff81186c46>] ? sys_newstat+0x36/0x50
> Apr 17 20:21:00 002 kernel: [<ffffffff810dc8f7>] ? audit_syscall_entry+0x1d7/0x200
> Apr 17 20:21:00 002 kernel: [<ffffffff81191d8b>] sys_rename+0x1b/0x20
> Apr 17 20:21:00 002 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
> Apr 17 20:21:00 002 kernel: Code: 0f 84 c1 fc ff ff e9 41 fb ff ff 48 8b 4d a0 48 8b b1 10 03 00 00 48 8b bd 78 ff ff ff ba 01 00 00 00 e8 75 d6 ff ff 48 89 45 90 <49> 89 45 60 c7 45 9c 01 00 00 00 48 8b 45 90 e9 01 fb ff ff 48
> Apr 17 20:21:00 002 kernel: RIP [<ffffffffa04266ff>] gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]
> Apr 17 20:21:00 002 kernel: RSP <ffff8803f518ba58>
> Apr 17 20:21:00 002 kernel: CR2: 0000000000000060
> Apr 17 20:21:00 002 kernel: ---[ end trace 1425fd0e2954015a ]---
>
> Apr 17 20:12:49 003 kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
> Apr 17 20:12:49 003 kernel: IP: [<ffffffffa04236ff>] gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]
> Apr 17 20:12:49 003 kernel: PGD 3d96fc067 PUD 3d2c0a067 PMD 0
> Apr 17 20:12:49 003 kernel: Oops: 0002 [#1] SMP
> Apr 17 20:12:49 003 kernel: last sysfs file: /sys/kernel/dlm/b1/control
> Apr 17 20:12:49 003 kernel: CPU 1
> Apr 17 20:12:49 003 kernel: Modules linked in: gfs2 dlm configfs sg sd_mod crc_t10dif ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp ahci video output e1000e dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache auth_rpcgss nfs_acl sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: scsi_wait_scan]
> Apr 17 20:12:49 003 kernel:
> Apr 17 20:12:49 003 kernel: Pid: 3386, comm: php-cgi Not tainted 2.6.32-358.2.1.el6.x86_64 #1 Supermicro X9SCL/X9SCM/X9SCL/X9SCM
> Apr 17 20:12:49 003 kernel: RIP: 0010:[<ffffffffa04236ff>] [<ffffffffa04236ff>] gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]
> Apr 17 20:12:49 003 kernel: RSP: 0018:ffff8803d1c27a58 EFLAGS: 00010283
> Apr 17 20:12:49 003 kernel: RAX: ffff880416771da8 RBX: 0000000000000003 RCX: 000000000db41094
> Apr 17 20:12:49 003 kernel: RDX: 000000000db41094 RSI: 000000000db21756 RDI: ffff88041277b440
> Apr 17 20:12:49 003 kernel: RBP: ffff8803d1c27b18 R08: 0000000000000000 R09: 0000000000000000
> Apr 17 20:12:49 003 kernel: R10: 0000000000001000 R11: 0000000000000000 R12: ffff88041277b000
> Apr 17 20:12:49 003 kernel: R13: 0000000000000000 R14: ffff8803a9c181c0 R15: ffff8803d1c27b78
> Apr 17 20:12:49 003 kernel: FS: 00007fd494d017c0(0000) GS:ffff880028240000(0000) knlGS:0000000000000000
> Apr 17 20:12:49 003 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Apr 17 20:12:49 003 kernel: CR2: 0000000000000060 CR3: 00000003d170c000 CR4: 00000000001407e0
> Apr 17 20:12:49 003 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Apr 17 20:12:49 003 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Apr 17 20:12:49 003 kernel: Process php-cgi (pid: 3386, threadinfo ffff8803d1c26000, task ffff8803d1450080)
> Apr 17 20:12:49 003 kernel: Stack:
> Apr 17 20:12:49 003 kernel: ffff8803a9c2e270 ffffffffa03fe2d0 ffff8803d1c27a78 ffff8803cee73800
> Apr 17 20:12:49 003 kernel: <d> ffff8803a9c2e270 00000115a03fe2d0 ffff8803d1c27a98 ffff88041277b000
> Apr 17 20:12:49 003 kernel: <d> ffff8803d1c27af8 ffffffffa03ff931 ffff880416771da8 0000000000000000
> Apr 17 20:12:49 003 kernel: Call Trace:
> Apr 17 20:12:49 003 kernel: [<ffffffffa03fe2d0>] ? gfs2_dirent_find_space+0x0/0x50 [gfs2]
> Apr 17 20:12:49 003 kernel: [<ffffffffa03ff931>] ? gfs2_dirent_search+0x191/0x1a0 [gfs2]
> Apr 17 20:12:49 003 kernel: [<ffffffffa041b041>] gfs2_rename+0x6b1/0x8c0 [gfs2]
> Apr 17 20:12:49 003 kernel: [<ffffffffa041aab8>] ? gfs2_rename+0x128/0x8c0 [gfs2]
> Apr 17 20:12:49 003 kernel: [<ffffffffa041aad6>] ? gfs2_rename+0x146/0x8c0 [gfs2]
> Apr 17 20:12:49 003 kernel: [<ffffffffa041aafc>] ? gfs2_rename+0x16c/0x8c0 [gfs2]
> Apr 17 20:12:49 003 kernel: [<ffffffffa040944f>] ? gfs2_glock_put+0x3f/0x180 [gfs2]
> Apr 17 20:12:49 003 kernel: [<ffffffffa04098a3>] ? gfs2_holder_uninit+0x23/0x40 [gfs2]
> Apr 17 20:12:49 003 kernel: [<ffffffffa040aa5e>] ? gfs2_glock_dq_uninit+0x1e/0x30 [gfs2]
> Apr 17 20:12:49 003 kernel: [<ffffffffa04199dc>] ? gfs2_permission+0x9c/0x100 [gfs2]
> Apr 17 20:12:49 003 kernel: [<ffffffffa041aa65>] ? gfs2_rename+0xd5/0x8c0 [gfs2]
> Apr 17 20:12:49 003 kernel: [<ffffffff8118ffdb>] vfs_rename+0x3ab/0x440
> Apr 17 20:12:49 003 kernel: [<ffffffff81191d0a>] sys_renameat+0x1da/0x240
> Apr 17 20:12:49 003 kernel: [<ffffffff81277495>] ? _atomic_dec_and_lock+0x55/0x80
> Apr 17 20:12:49 003 kernel: [<ffffffff81186874>] ? cp_new_stat+0xe4/0x100
> Apr 17 20:12:49 003 kernel: [<ffffffff81186c46>] ? sys_newstat+0x36/0x50
> Apr 17 20:12:49 003 kernel: [<ffffffff810dc8f7>] ? audit_syscall_entry+0x1d7/0x200
> Apr 17 20:12:49 003 kernel: [<ffffffff81191d8b>] sys_rename+0x1b/0x20
> Apr 17 20:12:49 003 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
> Apr 17 20:12:49 003 kernel: Code: 0f 84 c1 fc ff ff e9 41 fb ff ff 48 8b 4d a0 48 8b b1 10 03 00 00 48 8b bd 78 ff ff ff ba 01 00 00 00 e8 75 d6 ff ff 48 89 45 90 <49> 89 45 60 c7 45 9c 01 00 00 00 48 8b 45 90 e9 01 fb ff ff 48
> Apr 17 20:12:49 003 kernel: RIP [<ffffffffa04236ff>] gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]
> Apr 17 20:12:49 003 kernel: RSP <ffff8803d1c27a58>
> Apr 17 20:12:49 003 kernel: CR2: 0000000000000060
> Apr 17 20:12:49 003 kernel: ---[ end trace 06b117dc4fff0890 ]---
>
> The call trace looks for me kinda the same on all nodes and after we
> rebooted ALL 40 nodes, the “bug” seems to be gone and the system is
> running fine right now. (it’s running 20 minutes now without
> rebooting, before that we had a reboot every half minute)
>
> Do you know anything about that – how can we fix it?
>
> It’s a webcluster and such crashes aren’t good. It should be online
> 24/7 but right now it doesn’t look that good.
>
> Thanks in advance, Jürgen

--
Laurence Schuler (Larry) [email protected]
Systems Support ADNET Systems, Inc
Scientific Visualization Studio http://svs.gsfc.nasa.gov
NASA/Goddard Space Flight Center, Code 606.4 phone: 1-301-286-1799
Greenbelt, MD 20771 fax: 1-301-286-1634

Note: I am not a government employee and have no authority to obligate
any federal, state or local government to perform any action or payment.

--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster
