On 13/12/10 20:49, Sunil Mushran wrote:
On 12/12/2010 11:58 PM, frank wrote:
After that, all operations on the nodes froze; we could not even log in.

Node 0 kept logging messages of this kind until its "messages" log went silent at 10:49:

Dec  4 10:49:34 heraclito kernel: (sendmail,19074,6):ocfs2_inode_lock_full:2121 ERROR: status = -22
Dec  4 10:49:34 heraclito kernel: (sendmail,19074,6):_ocfs2_statfs:1266 ERROR: status = -22
Dec  4 10:49:34 heraclito kernel: (sendmail,19074,6):dlm_send_remote_convert_request:393 ERROR: dlm status = DLM_IVLOCKID
Dec  4 10:49:34 heraclito kernel: (sendmail,19074,6):dlmconvert_remote:327 ERROR: dlm status = DLM_IVLOCKID
Dec  4 10:49:34 heraclito kernel: (sendmail,19074,6):ocfs2_cluster_lock:1258 ERROR: DLM error DLM_IVLOCKID while calling dlmlock on resource M000000000000000000000b6f931666: bad lockid

Node 0 is trying to upconvert the lock level.
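For reference, the live DLM state for this resource can be dumped from userspace while the hang is in progress, assuming an ocfs2-tools build with the dlm_locks command and a kernel with dlm debugging support (debugfs mounted under /sys/kernel/debug); the lock name below is the resource from the logs:

# debugfs.ocfs2 -R "dlm_locks M000000000000000000000b6f931666" /dev/mapper/mpath2

The fs-side view of the same locking state is available with fs_locks:

# debugfs.ocfs2 -R "fs_locks" /dev/mapper/mpath2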

Node 1 kept logging messages of this kind until its "messages" log went silent at 10:00:

Dec  4 10:00:20 parmenides kernel: (o2net,10545,14):dlm_convert_lock_handler:489 ERROR: did not find lock to convert on grant queue! cookie=0:6
Dec  4 10:00:20 parmenides kernel: lockres: M000000000000000000000b6f931666, owner=1, state=0
Dec  4 10:00:20 parmenides kernel:   last used: 0, refcnt: 4, on purge list: no
Dec  4 10:00:20 parmenides kernel:   on dirty list: no, on reco list: no, migrating pending: no
Dec  4 10:00:20 parmenides kernel:   inflight locks: 0, asts reserved: 0
Dec  4 10:00:20 parmenides kernel:   refmap nodes: [ 0 ], inflight=0
Dec  4 10:00:20 parmenides kernel:   granted queue:
Dec  4 10:00:20 parmenides kernel:   type=5, conv=-1, node=1, cookie=1:6, ref=2, ast=(empty=y,pend=n), bast=(empty=y,pend=n), pending=(conv=n,lock=n,cancel=n,unlock=n)
Dec  4 10:00:20 parmenides kernel:   converting queue:
Dec  4 10:00:20 parmenides kernel:   type=0, conv=3, node=0, cookie=0:6, ref=2, ast=(empty=y,pend=n), bast=(empty=y,pend=n), pending=(conv=n,lock=n,cancel=n,unlock=n)
Dec  4 10:00:20 parmenides kernel:   blocked queue:

Node 1 does not find that lock in the granted queue because that lock is in the
converting queue. Do you have the very first error message on both nodes
relating to this resource?
Here they are:

Node 0:
Dec  4 09:15:06 heraclito kernel: o2net: connection to node parmenides (num 1) at 192.168.1.2:7777 has been idle for 30.0 seconds, shutting it down.
Dec  4 09:15:06 heraclito kernel: (swapper,0,7):o2net_idle_timer:1503 here are some times that might help debug the situation: (tmr 1291450476.228826 now 1291450506.229456 dr 1291450476.228760 adv 1291450476.228842:1291450476.228843 func (de6e01eb:500) 1291450476.228827:1291450476.228829)
Dec  4 09:15:06 heraclito kernel: o2net: no longer connected to node parmenides (num 1) at 192.168.1.2:7777
Dec  4 09:15:06 heraclito kernel: (vzlist,22622,7):dlm_send_remote_convert_request:395 ERROR: status = -112
Dec  4 09:15:06 heraclito kernel: (snmpd,16452,10):dlm_send_remote_convert_request:395 ERROR: status = -112
Dec  4 09:15:06 heraclito kernel: (snmpd,16452,10):dlm_wait_for_node_death:370 0D3E49EB1F614A3EAEC0E2A74A34AFFF: waiting 5000ms for notification of death of node 1
Dec  4 09:15:06 heraclito kernel: (httpd,4615,10):dlm_do_master_request:1334 ERROR: link to 1 went down!
Dec  4 09:15:06 heraclito kernel: (httpd,4615,10):dlm_get_lock_resource:917 ERROR: status = -112
Dec  4 09:15:06 heraclito kernel: (python,20750,10):dlm_do_master_request:1334 ERROR: link to 1 went down!
Dec  4 09:15:06 heraclito kernel: (python,20750,10):dlm_get_lock_resource:917 ERROR: status = -112
Dec  4 09:15:06 heraclito kernel: (vzlist,22622,7):dlm_wait_for_node_death:370 0D3E49EB1F614A3EAEC0E2A74A34AFFF: waiting 5000ms for notification of death of node 1
Dec  4 09:15:06 heraclito kernel: o2net: accepted connection from node parmenides (num 1) at 192.168.1.2:7777
Dec  4 09:15:11 heraclito kernel: (snmpd,16452,5):dlm_send_remote_convert_request:393 ERROR: dlm status = DLM_IVLOCKID
Dec  4 09:15:11 heraclito kernel: (snmpd,16452,5):dlmconvert_remote:327 ERROR: dlm status = DLM_IVLOCKID
Dec  4 09:15:11 heraclito kernel: (snmpd,16452,5):ocfs2_cluster_lock:1258 ERROR: DLM error DLM_IVLOCKID while calling dlmlock on resource M000000000000000000000b6f931666: bad lockid
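The trigger here is the o2net idle timeout: the link sat idle for 30 seconds, o2net tore the connection down while the convert was in flight, and after the reconnect the two DLMs no longer agreed on the state of the lock (node 1's trace below shows the same disconnect from its side). If this network is prone to 30-second stalls, the timeout is tunable; the paths below assume an O2CB cluster named "ocfs2", so substitute the real cluster name:

# cat /sys/kernel/config/cluster/ocfs2/idle_timeout_ms
# grep O2CB_IDLE_TIMEOUT_MS /etc/sysconfig/o2cb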

Node 1:
Dec  4 09:15:06 parmenides kernel: o2net: connection to node heraclito (num 0) at 192.168.1.3:7777 has been idle for 30.0 seconds, shutting it down.
Dec  4 09:15:06 parmenides kernel: (swapper,0,9):o2net_idle_timer:1503 here are some times that might help debug the situation: (tmr 1291450476.231519 now 1291450506.232462 dr 1291450476.231506 adv 1291450476.231522:1291450476.231522 func (de6e01eb:505) 1291450475.650496:1291450475.650501)
Dec  4 09:15:06 parmenides kernel: o2net: no longer connected to node heraclito (num 0) at 192.168.1.3:7777
Dec  4 09:15:06 parmenides kernel: (snmpd,12342,11):dlm_do_master_request:1334 ERROR: link to 0 went down!
Dec  4 09:15:06 parmenides kernel: (minilogd,12700,0):dlm_wait_for_lock_mastery:1117 ERROR: status = -112
Dec  4 09:15:06 parmenides kernel: (smbd,25555,12):dlm_do_master_request:1334 ERROR: link to 0 went down!
Dec  4 09:15:06 parmenides kernel: (python,12439,9):dlm_do_master_request:1334 ERROR: link to 0 went down!
Dec  4 09:15:06 parmenides kernel: (python,12439,9):dlm_get_lock_resource:917 ERROR: status = -112
Dec  4 09:15:06 parmenides kernel: (smbd,25555,12):dlm_get_lock_resource:917 ERROR: status = -112
Dec  4 09:15:06 parmenides kernel: (minilogd,12700,0):dlm_do_master_request:1334 ERROR: link to 0 went down!
Dec  4 09:15:06 parmenides kernel: (minilogd,12700,0):dlm_get_lock_resource:917 ERROR: status = -107
Dec  4 09:15:06 parmenides kernel: (dlm_thread,10627,4):dlm_drop_lockres_ref:2211 ERROR: status = -112
Dec  4 09:15:06 parmenides kernel: (dlm_thread,10627,4):dlm_purge_lockres:206 ERROR: status = -112
Dec  4 09:15:06 parmenides kernel: o2net: connected to node heraclito (num 0) at 192.168.1.3:7777
Dec  4 09:15:06 parmenides kernel: (snmpd,12342,11):dlm_get_lock_resource:917 ERROR: status = -112
Dec  4 09:15:11 parmenides kernel: (o2net,10545,6):dlm_convert_lock_handler:489 ERROR: did not find lock to convert on grant queue! cookie=0:6
Dec  4 09:15:11 parmenides kernel: lockres: M000000000000000000000b6f931666, owner=1, state=0
Dec  4 09:15:11 parmenides kernel:   last used: 0, refcnt: 4, on purge list: no
Dec  4 09:15:11 parmenides kernel:   on dirty list: no, on reco list: no, migrating pending: no
Dec  4 09:15:11 parmenides kernel:   inflight locks: 0, asts reserved: 0
Dec  4 09:15:11 parmenides kernel:   refmap nodes: [ 0 ], inflight=0
Dec  4 09:15:11 parmenides kernel:   granted queue:
Dec  4 09:15:11 parmenides kernel:   type=5, conv=-1, node=1, cookie=1:6, ref=2, ast=(empty=y,pend=n), bast=(empty=y,pend=n), pending=(conv=n,lock=n,cancel=n,unlock=n)
Dec  4 09:15:11 parmenides kernel:   converting queue:
Dec  4 09:15:11 parmenides kernel:   type=0, conv=3, node=0, cookie=0:6, ref=2, ast=(empty=y,pend=n), bast=(empty=y,pend=n), pending=(conv=n,lock=n,cancel=n,unlock=n)
Dec  4 09:15:11 parmenides kernel:   blocked queue:


Also, this is definitely a system object. Can you list the system directory?
# debugfs.ocfs2 -R "ls -l //" /dev/sdX

# debugfs.ocfs2 -R "ls -l //" /dev/mapper/mpath2
        6   drwxr-xr-x   4  0  0  3896           19-Oct-2010 08:42  .
        6   drwxr-xr-x   4  0  0  3896           19-Oct-2010 08:42  ..
        7   -rw-r--r--   1  0  0  0              19-Oct-2010 08:42  bad_blocks
        8   -rw-r--r--   1  0  0  831488         19-Oct-2010 08:42  global_inode_alloc
        9   -rw-r--r--   1  0  0  4096           19-Oct-2010 08:47  slot_map
        10  -rw-r--r--   1  0  0  1048576        19-Oct-2010 08:42  heartbeat
        11  -rw-r--r--   1  0  0  2199023255552  19-Oct-2010 08:42  global_bitmap
        12  drwxr-xr-x   2  0  0  12288          14-Dec-2010 08:58  orphan_dir:0000
        13  drwxr-xr-x   2  0  0  16384          14-Dec-2010 08:50  orphan_dir:0001
        14  -rw-r--r--   1  0  0  1103101952     19-Oct-2010 08:42  extent_alloc:0000
        15  -rw-r--r--   1  0  0  1103101952     19-Oct-2010 08:42  extent_alloc:0001
        16  -rw-r--r--   1  0  0  14109638656    19-Oct-2010 08:42  inode_alloc:0000
        17  -rw-r--r--   1  0  0  6673137664     19-Oct-2010 08:42  inode_alloc:0001
        18  -rw-r--r--   1  0  0  268435456      19-Oct-2010 08:46  journal:0000
        19  -rw-r--r--   1  0  0  268435456      19-Oct-2010 08:47  journal:0001
        20  -rw-r--r--   1  0  0  0              19-Oct-2010 08:42  local_alloc:0000
        21  -rw-r--r--   1  0  0  0              19-Oct-2010 08:42  local_alloc:0001
        22  -rw-r--r--   1  0  0  0              19-Oct-2010 08:42  truncate_log:0000
        23  -rw-r--r--   1  0  0  0              19-Oct-2010 08:42  truncate_log:0001
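If I am decoding the lock name correctly, this listing also shows which system object it is. Assuming the standard dlmglue naming (one type character, six pad zeros, a 16-digit hex inode block number, an 8-digit hex generation), M000000000000000000000b6f931666 splits as:

  M     000000  000000000000000b     6f931666
  type  pad     inode block 0xb = 11 generation

Block 11 is //global_bitmap in the listing above, which also fits the _ocfs2_statfs errors on node 0, since statfs has to take the global bitmap lock. To double-check the mapping (assuming this debugfs.ocfs2 build accepts <blkno> filespecs):

# debugfs.ocfs2 -R "stat <11>" /dev/mapper/mpath2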

Thanks once more for your help.
Regards.

Frank


