Regards,
YB Tan Sri Dato' Sri Adli a.k.a Dell
my.linkedin.com/pub/yb-tan-sri-dato-sri-adli-a-k-a-dell/44/64b/464/
H/p number: (017) 362 3661
________________________________
From: Scooter Morris <[email protected]>
To: linux clustering <[email protected]>
Sent: Thursday, April 18, 2013 3:34 AM
Subject: Re: [Linux-cluster] GFS2 crashes - sys_rename
There is a fix for that. Request a patched kernel for Bugzilla bug #92299
from your Red Hat support folks. We had the same problem, and the patched kernel
resolved it.
-- scooter
On 04/17/2013 12:02 PM, 5hosting Team wrote:
>Hey guys,
>
>We run a 40-node web cluster (only Apache and PHP processes), and the nodes keep
>crashing with a kernel panic. To me it looks like renaming a file or
>directory isn’t working. I found someone posting the same problem a few days ago,
>and it was supposed to be fixed in kernel 2.6.32-358.2.1.el6, but that’s the
>kernel we’re running. We also ran fsck last night to check the file system for
>problems. So something doesn’t seem right.
>
>Here are three crash logs from three different nodes:
>Apr 17 20:20:16 001 kernel: Modules linked in: gfs2 dlm configfs sg sd_mod
>crc_t10dif ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp
>iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state
>nf_conntrack ip6table_filter ip6_tables serio_raw i2c_i801 i2c_core iTCO_wdt
>iTCO_vendor_support shpchp ahci video output e1000e dm_mirror dm_region_hash
>dm_log dm_mod nfs lockd fscache auth_rpcgss nfs_acl sunrpc be2iscsi bnx2i cnic
>uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx
>iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: scsi_wait_scan]
>Apr 17 20:20:16 001 kernel:
>Apr 17 20:20:16 001 kernel: Pid: 2915, comm: php-cgi Not tainted
>2.6.32-358.2.1.el6.x86_64 #1 Supermicro X9SCL/X9SCM/X9SCL/X9SCM
>Apr 17 20:20:16 001 kernel: RIP: 0010:[<ffffffffa04266ff>]
>[<ffffffffa04266ff>] gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]
>Apr 17 20:20:16 001 kernel: RSP: 0018:ffff880417e8ba58 EFLAGS: 00010283
>Apr 17 20:20:16 001 kernel: RAX: ffff8804185a3da8 RBX: 0000000000000003 RCX:
>000000000db41094
>Apr 17 20:20:16 001 kernel: RDX: 000000000db41094 RSI: 000000000db21756 RDI:
>ffff8804187ef440
>Apr 17 20:20:16 001 kernel: RBP: ffff880417e8bb18 R08: 0000000000000000 R09:
>0000000000000000
>Apr 17 20:20:16 001 kernel: R10: 0000000000001000 R11: 0000000000000000 R12:
>ffff8804187ef000
>Apr 17 20:20:16 001 kernel: R13: 0000000000000000 R14: ffff88041519c3e0 R15:
>ffff880417e8bb78
>Apr 17 20:20:16 001 kernel: FS: 00007f07791ff7c0(0000)
>GS:ffff880028200000(0000) knlGS:0000000000000000
>Apr 17 20:20:16 001 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>Apr 17 20:20:16 001 kernel: CR2: 0000000000000060 CR3: 0000000411e94000 CR4:
>00000000001407f0
>Apr 17 20:20:16 001 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>0000000000000000
>Apr 17 20:20:16 001 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
>0000000000000400
>Apr 17 20:20:16 001 kernel: Process php-cgi (pid: 2915, threadinfo
>ffff880417e8a000, task ffff8804125cc040)
>Apr 17 20:20:16 001 kernel: Stack:
>Apr 17 20:20:16 001 kernel: ffff8804163dbab0 ffffffffa04012d0 ffff880417e8ba78
>ffff8804163dd0c0
>Apr 17 20:20:16 001 kernel: <d> ffff8804163dbab0 00000115a04012d0
>ffff880417e8ba98 ffff8804187ef000
>Apr 17 20:20:16 001 kernel: <d> ffff880417e8baf8 ffffffffa0402931
>ffff8804185a3da8 0000000000000000
>Apr 17 20:20:16 001 kernel: Call Trace:
>Apr 17 20:20:16 001 kernel: [<ffffffffa04012d0>] ?
>gfs2_dirent_find_space+0x0/0x50 [gfs2]
>Apr 17 20:20:16 001 kernel: [<ffffffffa0402931>] ?
>gfs2_dirent_search+0x191/0x1a0 [gfs2]
>Apr 17 20:20:16 001 kernel: [<ffffffffa041e041>] gfs2_rename+0x6b1/0x8c0 [gfs2]
>Apr 17 20:20:16 001 kernel: [<ffffffffa041dab8>] ? gfs2_rename+0x128/0x8c0
>[gfs2]
>Apr 17 20:20:16 001 kernel: [<ffffffffa041dad6>] ? gfs2_rename+0x146/0x8c0
>[gfs2]
>Apr 17 20:20:16 001 kernel: [<ffffffffa041dafc>] ? gfs2_rename+0x16c/0x8c0
>[gfs2]
>Apr 17 20:20:16 001 kernel: [<ffffffffa040c44f>] ? gfs2_glock_put+0x3f/0x180
>[gfs2]
>Apr 17 20:20:16 001 kernel: [<ffffffffa040c8a3>] ?
>gfs2_holder_uninit+0x23/0x40 [gfs2]
>Apr 17 20:20:16 001 kernel: [<ffffffffa040da5e>] ?
>gfs2_glock_dq_uninit+0x1e/0x30 [gfs2]
>Apr 17 20:20:16 001 kernel: [<ffffffffa041c9dc>] ? gfs2_permission+0x9c/0x100
>[gfs2]
>Apr 17 20:20:16 001 kernel: [<ffffffffa041da65>] ? gfs2_rename+0xd5/0x8c0
>[gfs2]
>Apr 17 20:20:16 001 kernel: [<ffffffff8118ffdb>] vfs_rename+0x3ab/0x440
>Apr 17 20:20:16 001 kernel: [<ffffffff81191d0a>] sys_renameat+0x1da/0x240
>Apr 17 20:20:16 001 kernel: [<ffffffff81277495>] ?
>_atomic_dec_and_lock+0x55/0x80
>Apr 17 20:20:16 001 kernel: [<ffffffff81186874>] ? cp_new_stat+0xe4/0x100
>Apr 17 20:20:16 001 kernel: [<ffffffff81186c46>] ? sys_newstat+0x36/0x50
>Apr 17 20:20:16 001 kernel: [<ffffffff810dc8f7>] ?
>audit_syscall_entry+0x1d7/0x200
>Apr 17 20:20:16 001 kernel: [<ffffffff81191d8b>] sys_rename+0x1b/0x20
>Apr 17 20:20:16 001 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
>Apr 17 20:20:16 001 kernel: Code: 0f 84 c1 fc ff ff e9 41 fb ff ff 48 8b 4d a0
>48 8b b1 10 03 00 00 48 8b bd 78 ff ff ff ba 01 00 00 00 e8 75 d6 ff ff 48 89
>45 90 <49> 89 45 60 c7 45 9c 01 00 00 00 48 8b 45 90 e9 01 fb ff ff 48
>Apr 17 20:20:16 001 kernel: RIP [<ffffffffa04266ff>]
>gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]
>Apr 17 20:20:16 001 kernel: RSP <ffff880417e8ba58>
>Apr 17 20:20:16 001 kernel: CR2: 0000000000000060
>Apr 17 20:20:16 001 kernel: ---[ end trace 0647d0d2004566f6 ]---
>
>
>Apr 17 20:21:00 002 kernel: Modules linked in: gfs2 dlm configfs sg sd_mod
>crc_t10dif ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp
>iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state
>nf_conntrack ip6table_filter ip6_tables serio_raw i2c_i801 i2c_core iTCO_wdt
>iTCO_vendor_support shpchp ahci video output e1000e dm_mirror dm_region_hash
>dm_log dm_mod nfs lockd fscache auth_rpcgss nfs_acl sunrpc be2iscsi bnx2i cnic
>uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx
>iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: scsi_wait_scan]
>Apr 17 20:21:00 002 kernel:
>Apr 17 20:21:00 002 kernel: Pid: 2839, comm: php-cgi Not tainted
>2.6.32-358.2.1.el6.x86_64 #1 Supermicro X9SCL/X9SCM/X9SCL/X9SCM
>Apr 17 20:21:00 002 kernel: RIP: 0010:[<ffffffffa04266ff>]
>[<ffffffffa04266ff>] gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]
>Apr 17 20:21:00 002 kernel: RSP: 0000:ffff8803f518ba58 EFLAGS: 00010283
>Apr 17 20:21:00 002 kernel: RAX: ffff88041447bda8 RBX: 0000000000000003 RCX:
>000000000db41094
>Apr 17 20:21:00 002 kernel: RDX: 000000000db41094 RSI: 000000000db21756 RDI:
>ffff880414cd5440
>Apr 17 20:21:00 002 kernel: RBP: ffff8803f518bb18 R08: 0000000000000000 R09:
>0000000000000000
>Apr 17 20:21:00 002 kernel: R10: 0000000000001000 R11: 0000000000000000 R12:
>ffff880414cd5000
>Apr 17 20:21:00 002 kernel: R13: 0000000000000000 R14: ffff8803f9e918c0 R15:
>ffff8803f518bb78
>Apr 17 20:21:00 002 kernel: FS: 00007f6a7e8a27c0(0000)
>GS:ffff8800282c0000(0000) knlGS:0000000000000000
>Apr 17 20:21:00 002 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>Apr 17 20:21:00 002 kernel: CR2: 0000000000000060 CR3: 00000003f6313000 CR4:
>00000000001407e0
>Apr 17 20:21:00 002 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>0000000000000000
>Apr 17 20:21:00 002 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
>0000000000000400
>Apr 17 20:21:00 002 kernel: Process php-cgi (pid: 2839, threadinfo
>ffff8803f518a000, task ffff8803f5189540)
>Apr 17 20:21:00 002 kernel: Stack:
>Apr 17 20:21:00 002 kernel: ffff880411bfbcb0 ffffffffa04012d0 ffff8803f518ba78
>ffff88041518eb60
>Apr 17 20:21:00 002 kernel: <d> ffff880411bfbcb0 00000115a04012d0
>ffff8803f518ba98 ffff880414cd5000
>Apr 17 20:21:00 002 kernel: <d> ffff8803f518baf8 ffffffffa0402931
>ffff88041447bda8 0000000000000000
>Apr 17 20:21:00 002 kernel: Call Trace:
>Apr 17 20:21:00 002 kernel: [<ffffffffa04012d0>] ?
>gfs2_dirent_find_space+0x0/0x50 [gfs2]
>Apr 17 20:21:00 002 kernel: [<ffffffffa0402931>] ?
>gfs2_dirent_search+0x191/0x1a0 [gfs2]
>Apr 17 20:21:00 002 kernel: [<ffffffffa041e041>] gfs2_rename+0x6b1/0x8c0 [gfs2]
>Apr 17 20:21:00 002 kernel: [<ffffffffa041dab8>] ? gfs2_rename+0x128/0x8c0
>[gfs2]
>Apr 17 20:21:00 002 kernel: [<ffffffffa041dad6>] ? gfs2_rename+0x146/0x8c0
>[gfs2]
>Apr 17 20:21:00 002 kernel: [<ffffffffa041dafc>] ? gfs2_rename+0x16c/0x8c0
>[gfs2]
>Apr 17 20:21:00 002 kernel: [<ffffffffa040c44f>] ? gfs2_glock_put+0x3f/0x180
>[gfs2]
>Apr 17 20:21:00 002 kernel: [<ffffffffa040c8a3>] ?
>gfs2_holder_uninit+0x23/0x40 [gfs2]
>Apr 17 20:21:00 002 kernel: [<ffffffffa040da5e>] ?
>gfs2_glock_dq_uninit+0x1e/0x30 [gfs2]
>Apr 17 20:21:00 002 kernel: [<ffffffffa041c9dc>] ? gfs2_permission+0x9c/0x100
>[gfs2]
>Apr 17 20:21:00 002 kernel: [<ffffffffa041da65>] ? gfs2_rename+0xd5/0x8c0
>[gfs2]
>Apr 17 20:21:00 002 kernel: [<ffffffff8118ffdb>] vfs_rename+0x3ab/0x440
>Apr 17 20:21:00 002 kernel: [<ffffffff81191d0a>] sys_renameat+0x1da/0x240
>Apr 17 20:21:00 002 kernel: [<ffffffff81277495>] ?
>_atomic_dec_and_lock+0x55/0x80
>Apr 17 20:21:00 002 kernel: [<ffffffff81186874>] ? cp_new_stat+0xe4/0x100
>Apr 17 20:21:00 002 kernel: [<ffffffff81186c46>] ? sys_newstat+0x36/0x50
>Apr 17 20:21:00 002 kernel: [<ffffffff810dc8f7>] ?
>audit_syscall_entry+0x1d7/0x200
>Apr 17 20:21:00 002 kernel: [<ffffffff81191d8b>] sys_rename+0x1b/0x20
>Apr 17 20:21:00 002 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
>Apr 17 20:21:00 002 kernel: Code: 0f 84 c1 fc ff ff e9 41 fb ff ff 48 8b 4d a0
>48 8b b1 10 03 00 00 48 8b bd 78 ff ff ff ba 01 00 00 00 e8 75 d6 ff ff 48 89
>45 90 <49> 89 45 60 c7 45 9c 01 00 00 00 48 8b 45 90 e9 01 fb ff ff 48
>Apr 17 20:21:00 002 kernel: RIP [<ffffffffa04266ff>]
>gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]
>Apr 17 20:21:00 002 kernel: RSP <ffff8803f518ba58>
>Apr 17 20:21:00 002 kernel: CR2: 0000000000000060
>Apr 17 20:21:00 002 kernel: ---[ end trace 1425fd0e2954015a ]---
>
>
>Apr 17 20:12:49 003 kernel: BUG: unable to handle kernel NULL pointer
>dereference at 0000000000000060
>Apr 17 20:12:49 003 kernel: IP: [<ffffffffa04236ff>]
>gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]
>Apr 17 20:12:49 003 kernel: PGD 3d96fc067 PUD 3d2c0a067 PMD 0
>Apr 17 20:12:49 003 kernel: Oops: 0002 [#1] SMP
>Apr 17 20:12:49 003 kernel: last sysfs file: /sys/kernel/dlm/b1/control
>Apr 17 20:12:49 003 kernel: CPU 1
>Apr 17 20:12:49 003 kernel: Modules linked in: gfs2 dlm configfs sg sd_mod
>crc_t10dif ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp
>iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state
>nf_conntrack ip6table_filter ip6_tables serio_raw i2c_i801 i2c_core iTCO_wdt
>iTCO_vendor_support shpchp ahci video output e1000e dm_mirror dm_region_hash
>dm_log dm_mod nfs lockd fscache auth_rpcgss nfs_acl sunrpc be2iscsi bnx2i cnic
>uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx
>iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: scsi_wait_scan]
>Apr 17 20:12:49 003 kernel:
>Apr 17 20:12:49 003 kernel: Pid: 3386, comm: php-cgi Not tainted
>2.6.32-358.2.1.el6.x86_64 #1 Supermicro X9SCL/X9SCM/X9SCL/X9SCM
>Apr 17 20:12:49 003 kernel: RIP: 0010:[<ffffffffa04236ff>]
>[<ffffffffa04236ff>] gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]
>Apr 17 20:12:49 003 kernel: RSP: 0018:ffff8803d1c27a58 EFLAGS: 00010283
>Apr 17 20:12:49 003 kernel: RAX: ffff880416771da8 RBX: 0000000000000003 RCX:
>000000000db41094
>Apr 17 20:12:49 003 kernel: RDX: 000000000db41094 RSI: 000000000db21756 RDI:
>ffff88041277b440
>Apr 17 20:12:49 003 kernel: RBP: ffff8803d1c27b18 R08: 0000000000000000 R09:
>0000000000000000
>Apr 17 20:12:49 003 kernel: R10: 0000000000001000 R11: 0000000000000000 R12:
>ffff88041277b000
>Apr 17 20:12:49 003 kernel: R13: 0000000000000000 R14: ffff8803a9c181c0 R15:
>ffff8803d1c27b78
>Apr 17 20:12:49 003 kernel: FS: 00007fd494d017c0(0000)
>GS:ffff880028240000(0000) knlGS:0000000000000000
>Apr 17 20:12:49 003 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>Apr 17 20:12:49 003 kernel: CR2: 0000000000000060 CR3: 00000003d170c000 CR4:
>00000000001407e0
>Apr 17 20:12:49 003 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>0000000000000000
>Apr 17 20:12:49 003 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
>0000000000000400
>Apr 17 20:12:49 003 kernel: Process php-cgi (pid: 3386, threadinfo
>ffff8803d1c26000, task ffff8803d1450080)
>Apr 17 20:12:49 003 kernel: Stack:
>Apr 17 20:12:49 003 kernel: ffff8803a9c2e270 ffffffffa03fe2d0 ffff8803d1c27a78
>ffff8803cee73800
>Apr 17 20:12:49 003 kernel: <d> ffff8803a9c2e270 00000115a03fe2d0
>ffff8803d1c27a98 ffff88041277b000
>Apr 17 20:12:49 003 kernel: <d> ffff8803d1c27af8 ffffffffa03ff931
>ffff880416771da8 0000000000000000
>Apr 17 20:12:49 003 kernel: Call Trace:
>Apr 17 20:12:49 003 kernel: [<ffffffffa03fe2d0>] ?
>gfs2_dirent_find_space+0x0/0x50 [gfs2]
>Apr 17 20:12:49 003 kernel: [<ffffffffa03ff931>] ?
>gfs2_dirent_search+0x191/0x1a0 [gfs2]
>Apr 17 20:12:49 003 kernel: [<ffffffffa041b041>] gfs2_rename+0x6b1/0x8c0 [gfs2]
>Apr 17 20:12:49 003 kernel: [<ffffffffa041aab8>] ? gfs2_rename+0x128/0x8c0
>[gfs2]
>Apr 17 20:12:49 003 kernel: [<ffffffffa041aad6>] ? gfs2_rename+0x146/0x8c0
>[gfs2]
>Apr 17 20:12:49 003 kernel: [<ffffffffa041aafc>] ? gfs2_rename+0x16c/0x8c0
>[gfs2]
>Apr 17 20:12:49 003 kernel: [<ffffffffa040944f>] ? gfs2_glock_put+0x3f/0x180
>[gfs2]
>Apr 17 20:12:49 003 kernel: [<ffffffffa04098a3>] ?
>gfs2_holder_uninit+0x23/0x40 [gfs2]
>Apr 17 20:12:49 003 kernel: [<ffffffffa040aa5e>] ?
>gfs2_glock_dq_uninit+0x1e/0x30 [gfs2]
>Apr 17 20:12:49 003 kernel: [<ffffffffa04199dc>] ? gfs2_permission+0x9c/0x100
>[gfs2]
>Apr 17 20:12:49 003 kernel: [<ffffffffa041aa65>] ? gfs2_rename+0xd5/0x8c0
>[gfs2]
>Apr 17 20:12:49 003 kernel: [<ffffffff8118ffdb>] vfs_rename+0x3ab/0x440
>Apr 17 20:12:49 003 kernel: [<ffffffff81191d0a>] sys_renameat+0x1da/0x240
>Apr 17 20:12:49 003 kernel: [<ffffffff81277495>] ?
>_atomic_dec_and_lock+0x55/0x80
>Apr 17 20:12:49 003 kernel: [<ffffffff81186874>] ? cp_new_stat+0xe4/0x100
>Apr 17 20:12:49 003 kernel: [<ffffffff81186c46>] ? sys_newstat+0x36/0x50
>Apr 17 20:12:49 003 kernel: [<ffffffff810dc8f7>] ?
>audit_syscall_entry+0x1d7/0x200
>Apr 17 20:12:49 003 kernel: [<ffffffff81191d8b>] sys_rename+0x1b/0x20
>Apr 17 20:12:49 003 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
>Apr 17 20:12:49 003 kernel: Code: 0f 84 c1 fc ff ff e9 41 fb ff ff 48 8b 4d a0
>48 8b b1 10 03 00 00 48 8b bd 78 ff ff ff ba 01 00 00 00 e8 75 d6 ff ff 48 89
>45 90 <49> 89 45 60 c7 45 9c 01 00 00 00 48 8b 45 90 e9 01 fb ff ff 48
>Apr 17 20:12:49 003 kernel: RIP [<ffffffffa04236ff>]
>gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]
>Apr 17 20:12:49 003 kernel: RSP <ffff8803d1c27a58>
>Apr 17 20:12:49 003 kernel: CR2: 0000000000000060
>Apr 17 20:12:49 003 kernel: ---[ end trace 06b117dc4fff0890 ]---
>
>
>
>The call trace looks pretty much the same to me on all nodes. After we rebooted
>ALL 40 nodes, the “bug” seems to be gone and the system is running fine right
>now. (It has been running for 20 minutes without a reboot; before that we had
>a reboot every half minute.)
>
>Do you know anything about that – how can we fix it?
>It’s a web cluster and such crashes aren’t good. It should be online 24/7, but
>right now it doesn’t look that good.
>
>Thanks in advance, Jürgen
>
>
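For anyone who wants to check that several nodes are hitting the same fault, here is a minimal sketch that pulls the faulting site out of the final "RIP [<addr>] symbol+offset/size" line of each oops and groups node names by it. The helper names (unwrap, fault_sites) and the regular expression are my own assumptions, not part of any kernel or Red Hat tool; the sample lines are copied from the three logs quoted above, including the mail quoting and line wrapping.

```python
import re
from collections import defaultdict

# Hypothetical pattern: matches the closing "RIP [<addr>] symbol+offset/size"
# line of an oops, capturing the node name from the syslog prefix.
RIP_RE = re.compile(
    r"\S+ +\d+ [\d:]+ (?P<node>\S+) kernel: RIP +\[<[0-9a-f]+>\] +"
    r"(?P<site>\w+\+0x[0-9a-f]+/0x[0-9a-f]+)"
)


def unwrap(text):
    """Strip mail quote markers and join wrapped lines into one string."""
    lines = [line.lstrip(">").strip() for line in text.splitlines()]
    return " ".join(line for line in lines if line)


def fault_sites(text):
    """Map each faulting site to the set of node names that reported it."""
    sites = defaultdict(set)
    for match in RIP_RE.finditer(unwrap(text)):
        sites[match.group("site")].add(match.group("node"))
    return dict(sites)


# Sample input: the final RIP lines from the three quoted crash logs.
SAMPLE = """\
>Apr 17 20:20:16 001 kernel: RIP [<ffffffffa04266ff>]
>gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]
>Apr 17 20:21:00 002 kernel: RIP [<ffffffffa04266ff>]
>gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]
>Apr 17 20:12:49 003 kernel: RIP [<ffffffffa04236ff>]
>gfs2_inplace_reserve+0x54f/0x7e0 [gfs2]
"""

if __name__ == "__main__":
    for site, nodes in fault_sites(SAMPLE).items():
        print(site, sorted(nodes))
```

On the sample, all three nodes map to the single site gfs2_inplace_reserve+0x54f/0x7e0. The absolute addresses differ on node 003 only because the module was loaded at a different base; the symbol and offset are what matter when comparing.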
--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster