Hello and happy new year!
I enabled quota and got an oops on both servers, and I can't log in: the console freezes after I enter the correct login and password. I did sysrq t, s, b and this is what I got:
https://wizja2.tktelekom.pl/ocfs2/2012.01.03-3.1.6/
Is there anything else you need?
Cheers!

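The hung-task reports quoted further down in this thread have a regular shape (an "INFO: task NAME:PID blocked for more than N seconds" header followed by stack frames), so a saved messages file can be summarized quickly. The following is only a rough sketch, assuming Python 3 is available; the function name `summarize_hung_tasks` is hypothetical and not part of any ocfs2 tool. Frames marked with "?" are addresses found on the stack and may not belong to the real call chain, so treat the result as a hint only.

```python
import re

# Header printed by the kernel hung-task detector, as seen in the quoted logs.
HUNG_RE = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# One stack frame such as "[<ffffffff81181eb6>] ? dqget+0x246/0x3a0".
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] \??\s*(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize_hung_tasks(log_text):
    """Return one dict per hung-task report: task name, pid, timeout, frames."""
    reports = []
    for line in log_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            # A new report starts; later frames are attached to it.
            reports.append({
                "task": header["name"],
                "pid": int(header["pid"]),
                "seconds": int(header["secs"]),
                "frames": [],
            })
            continue
        frame = FRAME_RE.search(line)
        if frame and reports:
            reports[-1]["frames"].append(frame["func"])
    return reports
```

Calling `summarize_hung_tasks(open("messages").read())` on a saved log (the file name is just an example) gives a list that can be scanned for tasks whose traces pass through quota functions such as `dqget` or `ocfs2_lock_global_qf`.
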
-----Original Message----- From: srinivas eeda
Sent: Friday, December 23, 2011 10:52 PM
To: Marek Królikowski
Cc: ocfs2-users@oss.oracle.com
Subject: Re: [Ocfs2-users] ocfs2 - Kernel panic on many write/read from both

Please press sysrq key and t to dump kernel stacks on both nodes and please email me the messages files.

On 12/23/2011 1:19 PM, Marek Królikowski wrote:
> Hello
> I get an oops on TEST-MAIL2:
>
> INFO: task ocfs2dc:15430 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> ocfs2dc D ffff88107f232c40 0 15430 2 0x00000000
> ffff881014889080 0000000000000046 ffff881000000000 ffff88102060c080
> 0000000000012c40 ffff88101eefbfd8 0000000000012c40 ffff88101eefa010
> ffff88101eefbfd8 0000000000012c40 0000000000000001 00000001130a4380
> Call Trace:
> [<ffffffff8148db41>] ? __mutex_lock_slowpath+0xd1/0x140
> [<ffffffff8148da53>] ? mutex_lock+0x23/0x40
> [<ffffffff81181eb6>] ? dqget+0x246/0x3a0
> [<ffffffff81182281>] ? __dquot_initialize+0x121/0x210
> [<ffffffff8114c90d>] ? d_kill+0x9d/0x100
> [<ffffffffa0a601c3>] ? ocfs2_find_local_alias+0x23/0x100 [ocfs2]
> [<ffffffffa0a7fca8>] ? ocfs2_delete_inode+0x98/0x3e0 [ocfs2]
> [<ffffffffa0a7106c>] ? ocfs2_unblock_lock+0x10c/0x770 [ocfs2]
> [<ffffffffa0a80969>] ? ocfs2_evict_inode+0x19/0x40 [ocfs2]
> [<ffffffff8114e9cc>] ? evict+0x8c/0x170
> [<ffffffffa0a5fccd>] ? ocfs2_dentry_lock_put+0x5d/0x90 [ocfs2]
> [<ffffffffa0a7177a>] ? ocfs2_process_blocked_lock+0xaa/0x280 [ocfs2]
> [<ffffffff8107beb2>] ? prepare_to_wait+0x82/0x90
> [<ffffffff8107bceb>] ? finish_wait+0x4b/0xa0
> [<ffffffffa0a71aa0>] ? ocfs2_downconvert_thread+0x150/0x270 [ocfs2]
> [<ffffffff8107bb60>] ? wake_up_bit+0x40/0x40
> [<ffffffffa0a71950>] ? ocfs2_process_blocked_lock+0x280/0x280 [ocfs2]
> [<ffffffffa0a71950>] ? ocfs2_process_blocked_lock+0x280/0x280 [ocfs2]
> [<ffffffff8107b686>] ? kthread+0x96/0xa0
> [<ffffffff81498a74>] ? kernel_thread_helper+0x4/0x10
> [<ffffffff8107b5f0>] ? kthread_worker_fn+0x190/0x190
> [<ffffffff81498a70>] ? gs_change+0x13/0x13
> INFO: task kworker/0:1:30806 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/0:1 D ffff88107f212c40 0 30806 2 0x00000000
> ffff8810152f4080 0000000000000046 0000000000000000 ffffffff81a0d020
> 0000000000012c40 ffff880c28a57fd8 0000000000012c40 ffff880c28a56010
> ffff880c28a57fd8 0000000000012c40 ffff880c28a57a08 00000001152f4080
> Call Trace:
> [<ffffffff8148d45d>] ? schedule_timeout+0x1ed/0x2d0
> [<ffffffffa05244c0>] ? __jbd2_journal_file_buffer+0xd0/0x230 [jbd2]
> [<ffffffff8148ce5c>] ? wait_for_common+0x12c/0x1a0
> [<ffffffff81052230>] ? try_to_wake_up+0x280/0x280
> [<ffffffff81085e21>] ? ktime_get+0x61/0xf0
> [<ffffffffa0a6e850>] ? __ocfs2_cluster_lock+0x1f0/0x780 [ocfs2]
> [<ffffffff81046fa7>] ? find_busiest_group+0x1f7/0xb00
> [<ffffffffa0a73a56>] ? ocfs2_inode_lock_full_nested+0x126/0x540 [ocfs2]
> [<ffffffffa0ad4da9>] ? ocfs2_lock_global_qf+0x29/0xd0 [ocfs2]
> [<ffffffffa0ad4da9>] ? ocfs2_lock_global_qf+0x29/0xd0 [ocfs2]
> [<ffffffffa0ad71df>] ? ocfs2_sync_dquot_helper+0xbf/0x330 [ocfs2]
> [<ffffffffa0ad7120>] ? ocfs2_acquire_dquot+0x390/0x390 [ocfs2]
> [<ffffffff81181c3a>] ? dquot_scan_active+0xda/0x110
> [<ffffffffa0ad4ca0>] ? ocfs2_global_is_id+0x60/0x60 [ocfs2]
> [<ffffffffa0ad4cc1>] ? qsync_work_fn+0x21/0x40 [ocfs2]
> [<ffffffff810753f3>] ? process_one_work+0x123/0x450
> [<ffffffff8107690b>] ? worker_thread+0x15b/0x370
> [<ffffffff810767b0>] ? manage_workers+0x110/0x110
> [<ffffffff810767b0>] ? manage_workers+0x110/0x110
> [<ffffffff8107b686>] ? kthread+0x96/0xa0
> [<ffffffff81498a74>] ? kernel_thread_helper+0x4/0x10
> [<ffffffff8107b5f0>] ? kthread_worker_fn+0x190/0x190
> [<ffffffff81498a70>] ? gs_change+0x13/0x13
>
> And I can't log in to TEST-MAIL1: after I give the login and password the console
> shows the last-login message but I don't get bash - the console doesn't answer...
> but there is no OOPS or anything like that on screen.
> I have not restarted either server; tell me what to do now.
> Thanks
>
>
> -----Original Message----- From: srinivas eeda
> Sent: Thursday, December 22, 2011 9:12 PM
> To: Marek Królikowski
> Cc: ocfs2-users@oss.oracle.com
> Subject: Re: [Ocfs2-users] ocfs2 - Kernel panic on many write/read from both
>
> We need to know what happened to node 2. Was the node rebooted because
> of a network timeout or kernel panic? Can you please configure
> netconsole, serial console and rerun the test?
>
> On 12/22/2011 8:08 AM, Marek Królikowski wrote:
>> Hello
>> After 24 hours I see TEST-MAIL2 rebooted (possibly a kernel panic), but
>> TEST-MAIL1 has this in dmesg:
>> TEST-MAIL1 ~ #dmesg
>> [cut]
>> o2net: accepted connection from node TEST-MAIL2 (num 1) at 172.17.1.252:7777
>> o2dlm: Node 1 joins domain B24C4493BBC74FEAA3371E2534BB3611
>> o2dlm: Nodes in domain B24C4493BBC74FEAA3371E2534BB3611: 0 1
>> o2net: connection to node TEST-MAIL2 (num 1) at 172.17.1.252:7777 has been idle for 60.0 seconds, shutting it down.
>> (swapper,0,0):o2net_idle_timer:1562 Here are some times that might help debug the situation: (Timer: 33127732045, Now 33187808090, DataReady 33127732039, Advance 33127732051-33127732051, Key 0xebb9cd47, Func 506, FuncTime 33127732045-33127732048)
>> o2net: no longer connected to node TEST-MAIL2 (num 1) at 172.17.1.252:7777
>> (du,5099,12):dlm_do_master_request:1324 ERROR: link to 1 went down!
>> (du,5099,12):dlm_get_lock_resource:907 ERROR: status = -112
>> (dlm_thread,14321,1):dlm_send_proxy_ast_msg:484 ERROR: B24C4493BBC74FEAA3371E2534BB3611: res M000000000000000000000cf023ef70, error -112 send AST to node 1
>> (dlm_thread,14321,1):dlm_flush_asts:605 ERROR: status = -112
>> (dlm_thread,14321,1):dlm_send_proxy_ast_msg:484 ERROR: B24C4493BBC74FEAA3371E2534BB3611: res P000000000000000000000000000000, error -107 send AST to node 1
>> (dlm_thread,14321,1):dlm_flush_asts:605 ERROR: status = -107
>> (kworker/u:3,5071,0):o2net_connect_expired:1724 ERROR: no connection established with node 1 after 60.0 seconds, giving up and returning errors.
>> (o2hb-B24C4493BB,14310,0):o2dlm_eviction_cb:267 o2dlm has evicted node 1 from group B24C4493BBC74FEAA3371E2534BB3611
>> (ocfs2rec,5504,6):dlm_get_lock_resource:834 B24C4493BBC74FEAA3371E2534BB3611:M0000000000000000000015f023ef70: at least one node (1) to recover before lock mastery can begin
>> (ocfs2rec,5504,6):dlm_get_lock_resource:888 B24C4493BBC74FEAA3371E2534BB3611:M0000000000000000000015f023ef70: at least one node (1) to recover before lock mastery can begin
>> (du,5099,12):dlm_restart_lock_mastery:1213 ERROR: node down! 1
>> (du,5099,12):dlm_wait_for_lock_mastery:1030 ERROR: status = -11
>> (du,5099,12):dlm_get_lock_resource:888 B24C4493BBC74FEAA3371E2534BB3611:N000000000020924f: at least one node (1) to recover before lock mastery can begin
>> (dlm_reco_thread,14322,0):dlm_get_lock_resource:834 B24C4493BBC74FEAA3371E2534BB3611:$RECOVERY: at least one node (1) to recover before lock mastery can begin
>> (dlm_reco_thread,14322,0):dlm_get_lock_resource:868 B24C4493BBC74FEAA3371E2534BB3611: recovery map is not empty, but must master $RECOVERY lock now
>> (dlm_reco_thread,14322,0):dlm_do_recovery:523 (14322) Node 0 is the Recovery Master for the Dead Node 1 for Domain B24C4493BBC74FEAA3371E2534BB3611
>> (ocfs2rec,5504,6):ocfs2_replay_journal:1549 Recovering node 1 from slot 1 on device (253,0)
>> (ocfs2rec,5504,6):ocfs2_begin_quota_recovery:407 Beginning quota recovery in slot 1
>> (kworker/u:0,2909,0):ocfs2_finish_quota_recovery:599 Finishing quota recovery in slot 1
>>
>> And I tried to run these commands:
>> debugfs.ocfs2 -l ENTRY EXIT DLM_GLUE QUOTA INODE DISK_ALLOC EXTENT_MAP allow
>> debugfs.ocfs2: Unable to write log mask "ENTRY": No such file or directory
>> debugfs.ocfs2 -l ENTRY EXIT DLM_GLUE QUOTA INODE DISK_ALLOC EXTENT_MAP off
>> debugfs.ocfs2: Unable to write log mask "ENTRY": No such file or directory
>>
>> But they do not work....
>>
>>
>> -----Original Message----- From: Srinivas Eeda
>> Sent: Wednesday, December 21, 2011 8:43 PM
>> To: Marek Królikowski
>> Cc: ocfs2-users@oss.oracle.com
>> Subject: Re: [Ocfs2-users] ocfs2 - Kernel panic on many write/read from both
>>
>> Those numbers look good. Basically with the fixes backed out and another
>> fix I gave, you are not seeing that many orphans hanging around and
>> hence not seeing the process stuck kernel stacks. You can run the test
>> longer or if you are satisfied, please enable quotas and re-run the test
>> with the modified kernel.
>> You might see a deadlock which needs to be
>> fixed (I was not able to reproduce this yet). If the system hangs, please
>> capture the following and provide me the output
>>
>> 1. echo t > /proc/sysrq-trigger
>> 2. debugfs.ocfs2 -l ENTRY EXIT DLM_GLUE QUOTA INODE DISK_ALLOC EXTENT_MAP allow
>> 3. wait for 10 minutes
>> 4. debugfs.ocfs2 -l ENTRY EXIT DLM_GLUE QUOTA INODE DISK_ALLOC EXTENT_MAP off
>> 5. echo t > /proc/sysrq-trigger
>>
>
_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users
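
The five capture steps quoted above could be scripted so that both nodes run exactly the same sequence. What follows is only a sketch under assumptions: Python 3 on the node, run as root, and the helper names `build_capture_plan` and `run_plan` are hypothetical. The mask list is a parameter because the thread shows this kernel rejecting the "ENTRY" mask, so a shorter list may be needed. By default nothing is executed; the plan is only printed.

```python
import subprocess
import time

SYSRQ_TRIGGER = "/proc/sysrq-trigger"  # path taken from the quoted steps
# Mask names copied from the quoted command; the thread shows "ENTRY" being
# rejected on this kernel, so pass a shorter list if needed.
DEFAULT_MASKS = ["ENTRY", "EXIT", "DLM_GLUE", "QUOTA", "INODE",
                 "DISK_ALLOC", "EXTENT_MAP"]


def build_capture_plan(masks=DEFAULT_MASKS, wait_seconds=600):
    """Return the capture steps as (kind, payload) tuples, in order."""
    log_cmd = ["debugfs.ocfs2", "-l", *masks]
    return [
        ("sysrq", "t"),                # 1. dump task stacks
        ("run", log_cmd + ["allow"]),  # 2. enable the log masks
        ("sleep", wait_seconds),       # 3. wait (10 minutes by default)
        ("run", log_cmd + ["off"]),    # 4. disable the log masks
        ("sysrq", "t"),                # 5. dump task stacks again
    ]


def run_plan(plan, dry_run=True):
    """Execute the plan; with dry_run=True only print what would happen."""
    for kind, payload in plan:
        if dry_run:
            print(kind, payload)
        elif kind == "sysrq":
            with open(SYSRQ_TRIGGER, "w") as trigger:
                trigger.write(payload)
        elif kind == "run":
            subprocess.run(payload, check=True)
        elif kind == "sleep":
            time.sleep(payload)


if __name__ == "__main__":
    run_plan(build_capture_plan())
```

To actually capture, call `run_plan(build_capture_plan(), dry_run=False)` as root; with `check=True` the script stops at the first `debugfs.ocfs2` failure instead of waiting ten minutes for logs that were never enabled.
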