Hi, I have seen a similar problem. Can you confirm whether you have the same symptoms?
The performance suddenly drops from normal (in our cluster, about 30 MB/s, reading and writing) to a trickle (about 200 KB/s), reading only, for a while, and then, as suddenly as it started, it returns to the normal read/write performance, cycling randomly. When the "read only" phase hits one node, the other shows only heartbeat activity (about 2 I/Os every 2 seconds) until the first returns to normal, and vice versa.

The servers run an e-mail stack (IMAP/POP/SMTP -- Maildir format) with more than 20,000 users, so they are constantly creating, removing and moving files.

Dumping the processes in D state while a server is stuck in the "constant few KB/s, read only" state gives:

node#0:
  10739 D imapd        ocfs2_lookup_lock_orphan_dir
  11658 D imapd        ocfs2_reserve_suballoc_bits
  12326 D imapd        ocfs2_lookup_lock_orphan_dir
  12330 D pop3d        lock_rename
  12351 D imapd        ocfs2_lookup_lock_orphan_dir
  12357 D imapd        ocfs2_lookup_lock_orphan_dir
  12359 D imapd        unlinkat
  12381 D imapd        ocfs2_lookup_lock_orphan_dir
  12498 D deliverquota ocfs2_wait_for_mask
  12710 D pop3d        ocfs2_reserve_suballoc_bits
  12712 D imapd        unlinkat
  12726 D imapd        ocfs2_reserve_suballoc_bits
  12730 D imapd        unlinkat
  12736 D imapd        ocfs2_reserve_suballoc_bits
  12738 D imapd        unlinkat
  12749 D pop3d        lock_rename
  12891 D pop3d        ocfs2_reserve_suballoc_bits
  12971 D pop3d        mutex_fastpath_lock_retval
  12985 D pop3d        lock_rename
  13006 D deliverquota ocfs2_reserve_suballoc_bits
  13061 D pop3d        lock_rename
  13117 D pop3d        lock_rename
  [-- suppressed --] 100+ processes in D state

node#1:
  24428 D deliverquota ocfs2_wait_for_mask
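For reference, this is roughly how we collect that dump; a minimal sketch, and any equivalent ps invocation works (the wchan column width is arbitrary):

  # list tasks in uninterruptible sleep (D) together with the kernel symbol they wait in
  ps -eo pid,stat,comm,wchan:32 | awk '$2 ~ /^D/'

  # alternatively, if sysrq is enabled, dump the kernel stacks of all blocked tasks to dmesg
  echo w > /proc/sysrq-trigger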
Some stack traces from the blocked processes:

Call Trace:
 [<ffffffff81437e31>] __mutex_lock_common+0x12f/0x1a1
 [<ffffffff81437ef2>] __mutex_lock_slowpath+0x19/0x1b
 [<ffffffff81437f5b>] mutex_lock+0x23/0x3a
 [<ffffffffa065ba1f>] ocfs2_lookup_lock_orphan_dir+0xb8/0x18a [ocfs2]
 [<ffffffffa065c7d5>] ocfs2_prepare_orphan_dir+0x3f/0x229 [ocfs2]
 [<ffffffffa0660bab>] ocfs2_unlink+0x523/0xa81 [ocfs2]
 [<ffffffff810425b3>] ? need_resched+0x23/0x2d
 [<ffffffff810425cb>] ? should_resched+0xe/0x2f
 [<ffffffff810425cb>] ? should_resched+0xe/0x2f
 [<ffffffff8116324d>] ? dquot_initialize+0x126/0x13d
 [<ffffffff810425b3>] ? need_resched+0x23/0x2d
 [<ffffffff81122c0c>] vfs_unlink+0x82/0xd1
 [<ffffffff81124bcc>] do_unlinkat+0xc6/0x178
 [<ffffffff8112186b>] ? path_put+0x22/0x27
 [<ffffffff810a7d03>] ? audit_syscall_entry+0x103/0x12f
 [<ffffffff81124c94>] sys_unlink+0x16/0x18
 [<ffffffff81011db2>] system_call_fastpath+0x16/0x1b

Call Trace:
 [<ffffffff81437e31>] __mutex_lock_common+0x12f/0x1a1
 [<ffffffffa0633682>] ? ocfs2_match+0x2c/0x3a [ocfs2]
 [<ffffffff81437ef2>] __mutex_lock_slowpath+0x19/0x1b
 [<ffffffff81437f5b>] mutex_lock+0x23/0x3a
 [<ffffffffa0676a82>] ocfs2_reserve_suballoc_bits+0x11a/0x499 [ocfs2]
 [<ffffffffa0678b4c>] ocfs2_reserve_new_inode+0x134/0x37a [ocfs2]
 [<ffffffffa065d409>] ocfs2_mknod+0x2d4/0xf26 [ocfs2]
 [<ffffffffa063d02c>] ? ocfs2_should_refresh_lock_res+0x8f/0x1ad [ocfs2]
 [<ffffffffa0653cf6>] ? ocfs2_wait_for_recovery+0x1a/0x8f [ocfs2]
 [<ffffffff81437f4e>] ? mutex_lock+0x16/0x3a
 [<ffffffffa065e0fd>] ocfs2_create+0xa2/0x10a [ocfs2]
 [<ffffffff8112268f>] vfs_create+0x7e/0x9d
 [<ffffffff81125794>] do_filp_open+0x302/0x92d
 [<ffffffff810425cb>] ? should_resched+0xe/0x2f
 [<ffffffff81437731>] ? _cond_resched+0xe/0x22
 [<ffffffff81238109>] ? might_fault+0xe/0x10
 [<ffffffff812381f3>] ? __strncpy_from_user+0x20/0x4a
 [<ffffffff81114bc8>] do_sys_open+0x62/0x109
 [<ffffffff81114ca2>] sys_open+0x20/0x22
 [<ffffffff81011db2>] system_call_fastpath+0x16/0x1b

Checking Bugzilla, these two bugs seem to show similar behavior:
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1281
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1300

In the mailing list archive, this thread also shows similar behavior:
http://www.mail-archive.com/ocfs2-users@oss.oracle.com/msg02509.html

The cluster consists of two Dell PE 1950 servers with 8 GB of RAM, attached via 2 Gbit FC to a Dell EMC AX/100 storage array. The network between them runs at 1 Gbit. They run CentOS 5.7, OCFS2 1.6.4 and UEK 2.6.32-100.0.19.el5.

Tests so far:
* We changed the data mount option from ordered to writeback -- no success;
* We added the mount option localalloc=16 -- no success;
* We turned off group and user quota support -- no success;
* We rebooted the servers (to test with everything fresh) -- no success;
* We mounted the filesystem on only one node -- success.

The problem does not show up when the filesystem is mounted on only one node, so we are currently working around it by exporting the filesystem via NFS, which leads me to conclude that the problem is inside the cluster stack. We have checked logs, debug output and traces trying to pinpoint the problem, but with no success.

In our case, node#0 carries a heavier I/O load than node#1; could that trigger something? The filesystem is about 94% full (751G of 803G).

Regards,
Sérgio

On Wed, 7 Dec 2011 12:02:33 +0100, Eduardo Diaz - Gmail <ediaz...@gmail.com> wrote:

> I ran the same test as you...
>
> and... error too :-(
>
> [75120.532071] INFO: task o2quot/0:3714 blocked for more than 120 seconds.
> [75120.532091] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [75120.532108] o2quot/0    D f5939dc8     0  3714      2 0x00000000
> [75120.532113]  f6bc4c80 00000046 c1441e20 f5939dc8 f6bc4e3c c1441e20 c1441e20 f9bbdbf2
> [75120.532121]  f6bc4e3c c3a08e20 00000000 c33d1f43 0000442f f9a7c716 f6bc4c80 f6bc4c80
> [75120.532128]  eedd5b7c f6bc4e3c f6bd6414 f6bd643c 00000004 00000000 011cd9ce f6bd6630
> [75120.532136] Call Trace:
> [75120.532154]  [<f9bbdbf2>] ? ocfs2_metadata_cache_io_unlock+0x11/0x12 [ocfs2]
> [75120.532163]  [<f9a7c716>] ? start_this_handle+0x2fb/0x37e [jbd2]
> [75120.532171]  [<c127f4d7>] ? __mutex_lock_common+0xe8/0x13b
> [75120.532176]  [<c127f539>] ? __mutex_lock_slowpath+0xf/0x11
> [75120.532180]  [<c127f5ca>] ? mutex_lock+0x17/0x24
> [75120.532184]  [<c127f5ca>] ? mutex_lock+0x17/0x24
> [75120.532198]  [<f9bc3e91>] ? ocfs2_sync_dquot_helper+0x166/0x2c5 [ocfs2]
> [75120.532204]  [<c10ec21d>] ? dquot_scan_active+0x63/0xab
> [75120.532217]  [<f9bc3d2b>] ? ocfs2_sync_dquot_helper+0x0/0x2c5 [ocfs2]
> [75120.532231]  [<f9bc325b>] ? qsync_work_fn+0x23/0x3b [ocfs2]
> [75120.532236]  [<c1047917>] ? worker_thread+0x141/0x1bd
> [75120.532249]  [<f9bc3238>] ? qsync_work_fn+0x0/0x3b [ocfs2]
> [75120.532254]  [<c104a65a>] ? autoremove_wake_function+0x0/0x2d
> [75120.532258]  [<c10477d6>] ? worker_thread+0x0/0x1bd
> [75120.532262]  [<c104a428>] ? kthread+0x61/0x66
> [75120.532266]  [<c104a3c7>] ? kthread+0x0/0x66
> [75120.532271]  [<c1008d87>] ? kernel_thread_helper+0x7/0x10
> [75120.532276] INFO: task jbd2/drbd0-21:3723 blocked for more than 120 seconds.
>
> On Wed, Dec 7, 2011 at 11:42 AM, Eduardo Diaz - Gmail
> <ediaz...@gmail.com> wrote:
> > Can you run fsck.ocfs2 to see whether the filesystem is broken?
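A read-only consistency check is a low-risk way to follow that suggestion; a rough sketch, assuming a placeholder device path (substitute the real OCFS2 volume) and ideally with the volume unmounted on all nodes:

  # -f forces a full check; -n answers "no" to every question, so nothing is modified
  fsck.ocfs2 -fn /dev/mapper/emc_lun

  # repair pass (-y answers "yes"); run only once the volume is unmounted everywhere
  fsck.ocfs2 -fy /dev/mapper/emc_lun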
> >
> > On Wed, Dec 7, 2011 at 11:30 AM, Marek Krolikowski
> > <ad...@wset.edu.pl> wrote:
> >> Hello
> >> I use sys-fs/ocfs2-tools-1.6.4 and created the filesystem with all
> >> features: mkfs.ocfs2 -N 2 -L MAIL --fs-feature-level=max-features /dev/dm-0
> >> and after this got a kernel panic :(
> >>
> >> -----Original message----- From: Eduardo Diaz - Gmail
> >> Sent: Wednesday, December 07, 2011 11:08 AM
> >> To: Marek Krolikowski
> >> Cc: ocfs2-users@oss.oracle.com
> >> Subject: Re: [Ocfs2-users] ocfs2 - Kernel panic on many write/read
> >> from both servers
> >>
> >> Try using another filesystem, for example xfs, and run a full test.
> >>
> >> Try creating a one-node cluster and filesystem and run tests
> >> (the file index could be the problem).
> >>
> >> Run fsck on the filesystem.
> >>
> >> Try upgrading ocfs2 to the latest version and using the max features;
> >> do you only have two nodes?
> >>
> >> What I would do: make a backup, create a new filesystem with only the
> >> features that you need, and run mkfs for the cluster with only the
> >> number of nodes that you will actually use.
> >>
> >> Restore the data.
> >>
> >> Run extensive tests for a week before putting it into production :-).
> >>
> >> On Tue, Dec 6, 2011 at 2:04 PM, Marek Krolikowski
> >> <ad...@wset.edu.pl> wrote:
> >>>
> >>> Hey m8,
> >>> like I said, I am not an expert either, but when I use ext3,
> >>> write/read works with no problem.
> >>>
> >>> -----Original message----- From: Eduardo Diaz - Gmail
> >>> Sent: Tuesday, December 06, 2011 3:06 AM
> >>> To: Marek Królikowski
> >>> Cc: ocfs2-users@oss.oracle.com
> >>> Subject: Re: [Ocfs2-users] ocfs2 - Kernel panic on many write/read
> >>> from both servers
> >>>
> >>> I am not an expert, but you may have a problem in your EMC system
> >>> (multipath setup) or drivers.
> >>>
> >>> Did you test this before putting it into production, or test this NAS
> >>> with another filesystem, xfs for example?
> >>>
> >>> As I read it, the "hung_task_timeout_secs" message means some task waited
> >>> more than 120 seconds; it may be a problem with your EMC, fibre or cabling...
> >>>
> >>> On 2011/12/4, Marek Królikowski <ad...@wset.edu.pl> wrote:
> >>>>
> >>>> I ran write/read tests against ocfs2 from both servers all night,
> >>>> something like this:
> >>>> On the MAIL1 server:
> >>>> #!/bin/bash
> >>>> while true
> >>>> do
> >>>>     rm -rf /mnt/EMC/MAIL1
> >>>>     mkdir /mnt/EMC/MAIL1
> >>>>     cp -r /usr /mnt/EMC/MAIL1
> >>>>     rm -rf /mnt/EMC/MAIL1
> >>>> done;
> >>>> On the MAIL2 server:
> >>>> #!/bin/bash
> >>>> while true
> >>>> do
> >>>>     rm -rf /mnt/EMC/MAIL2
> >>>>     mkdir /mnt/EMC/MAIL2
> >>>>     cp -r /usr /mnt/EMC/MAIL2
> >>>>     rm -rf /mnt/EMC/MAIL2
> >>>> done;
> >>>>
> >>>> Today I checked the logs and saw:
> >>>> o2dlm: Node 1 joins domain EAC7942B71964050AE2046D3F0CDD7B2
> >>>> o2dlm: Nodes in domain EAC7942B71964050AE2046D3F0CDD7B2: 0 1
> >>>> (rm,26136,0):ocfs2_unlink:953 ERROR: status = -2
> >>>> (touch,26137,0):ocfs2_check_dir_for_entry:2120 ERROR: status = -17
> >>>> (touch,26137,0):ocfs2_mknod:461 ERROR: status = -17
> >>>> (touch,26137,0):ocfs2_create:631 ERROR: status = -17
> >>>> (rm,26142,0):ocfs2_unlink:953 ERROR: status = -2
> >>>> INFO: task kworker/u:2:20246 blocked for more than 120 seconds.
> >>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>>> kworker/u:2     D ffff88107f4525c0     0 20246      2 0x00000000
> >>>>  ffff880b730b57d0 0000000000000046 ffff8810201297d0 00000000000125c0
> >>>>  ffff880f5a399fd8 00000000000125c0 00000000000125c0 00000000000125c0
> >>>>  ffff880f5a398000 00000000000125c0 ffff880f5a399fd8 00000000000125c0
> >>>> Call Trace:
> >>>>  [<ffffffff81481b71>] ? __mutex_lock_slowpath+0xd1/0x140
> >>>>  [<ffffffff814818d3>] ? mutex_lock+0x23/0x40
> >>>>  [<ffffffffa0937d95>] ? ocfs2_wipe_inode+0x105/0x690 [ocfs2]
> >>>>  [<ffffffffa0935cfb>] ? ocfs2_query_inode_wipe.clone.9+0xcb/0x370 [ocfs2]
> >>>>  [<ffffffffa09385a4>] ? ocfs2_delete_inode+0x284/0x3f0 [ocfs2]
> >>>>  [<ffffffffa0919a10>] ? ocfs2_dentry_attach_lock+0x5a0/0x5a0 [ocfs2]
> >>>>  [<ffffffffa093872e>] ? ocfs2_evict_inode+0x1e/0x50 [ocfs2]
> >>>>  [<ffffffff81145900>] ? evict+0x70/0x140
> >>>>  [<ffffffffa0919322>] ? __ocfs2_drop_dl_inodes.clone.2+0x32/0x60 [ocfs2]
> >>>>  [<ffffffffa0919a39>] ? ocfs2_drop_dl_inodes+0x29/0x90 [ocfs2]
> >>>>  [<ffffffff8106e56f>] ? process_one_work+0x11f/0x440
> >>>>  [<ffffffff8106f279>] ? worker_thread+0x159/0x330
> >>>>  [<ffffffff8106f120>] ? manage_workers.clone.21+0x120/0x120
> >>>>  [<ffffffff8106f120>] ? manage_workers.clone.21+0x120/0x120
> >>>>  [<ffffffff81073fa6>] ? kthread+0x96/0xa0
> >>>>  [<ffffffff8148bb24>] ? kernel_thread_helper+0x4/0x10
> >>>>  [<ffffffff81073f10>] ? kthread_worker_fn+0x1a0/0x1a0
> >>>>  [<ffffffff8148bb20>] ? gs_change+0x13/0x13
> >>>> INFO: task rm:5192 blocked for more than 120 seconds.
> >>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >>>> rm              D ffff88107f2725c0     0  5192  16338 0x00000000
> >>>>  ffff881014ccb040 0000000000000082 ffff8810206b8040 00000000000125c0
> >>>>  ffff8804d7697fd8 00000000000125c0 00000000000125c0 00000000000125c0
> >>>>  ffff8804d7696000 00000000000125c0 ffff8804d7697fd8 00000000000125c0
> >>>> Call Trace:
> >>>>  [<ffffffff8148148d>] ? schedule_timeout+0x1ed/0x2e0
> >>>>  [<ffffffffa0886162>] ? dlmconvert_master+0xe2/0x190 [ocfs2_dlm]
> >>>>  [<ffffffffa08878bf>] ? dlmlock+0x7f/0xb70 [ocfs2_dlm]
> >>>>  [<ffffffff81480e0a>] ? wait_for_common+0x13a/0x190
> >>>>  [<ffffffff8104bc50>] ? try_to_wake_up+0x280/0x280
> >>>>  [<ffffffffa0928a38>] ? __ocfs2_cluster_lock.clone.21+0x1d8/0x6b0 [ocfs2]
> >>>>  [<ffffffffa0928fcc>] ? ocfs2_inode_lock_full_nested+0xbc/0x490 [ocfs2]
> >>>>  [<ffffffffa0943c1b>] ? ocfs2_lookup_lock_orphan_dir+0x6b/0x1b0 [ocfs2]
> >>>>  [<ffffffffa09454ba>] ? ocfs2_prepare_orphan_dir+0x4a/0x280 [ocfs2]
> >>>>  [<ffffffffa094616f>] ? ocfs2_unlink+0x6ef/0xb90 [ocfs2]
> >>>>  [<ffffffff811b35a9>] ? may_link.clone.22+0xd9/0x170
> >>>>  [<ffffffff8113aa58>] ? vfs_unlink+0x98/0x100
> >>>>  [<ffffffff8113ac41>] ? do_unlinkat+0x181/0x1b0
> >>>>  [<ffffffff8113e7cd>] ? vfs_readdir+0x9d/0xe0
> >>>>  [<ffffffff811653d8>] ? fsnotify_find_inode_mark+0x28/0x40
> >>>>  [<ffffffff81166324>] ? dnotify_flush+0x54/0x110
> >>>>  [<ffffffff8112b07f>] ? filp_close+0x5f/0x90
> >>>>  [<ffffffff8148aa12>] ? system_call_fastpath+0x16/0x1b
> >>>> [-- suppressed: the identical "task kworker/u:2:20246 blocked" and "task rm:5192 blocked" reports, with the same call traces as above, repeat several more times --]
--
Sérgio Surkamp | Network Administrator
ser...@gruposinternet.com.br
*Grupos Internet S.A.*
R. Lauro Linhares, 2123 Torre B - Sala 201
Trindade - Florianópolis - SC
+55 48 3234-4109
http://www.gruposinternet.com.br

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users