On 12/12/2010 11:58 PM, frank wrote:
After that, all node operations froze; we could not log in either.

Node 0 kept logging this kind of message until it stopped writing to "messages" at 10:49:

Dec  4 10:49:34 heraclito kernel: (sendmail,19074,6):ocfs2_inode_lock_full:2121 ERROR: status = -22
Dec  4 10:49:34 heraclito kernel: (sendmail,19074,6):_ocfs2_statfs:1266 ERROR: status = -22
Dec  4 10:49:34 heraclito kernel: (sendmail,19074,6):dlm_send_remote_convert_request:393 ERROR: dlm status = DLM_IVLOCKID
Dec  4 10:49:34 heraclito kernel: (sendmail,19074,6):dlmconvert_remote:327 ERROR: dlm status = DLM_IVLOCKID
Dec  4 10:49:34 heraclito kernel: (sendmail,19074,6):ocfs2_cluster_lock:1258 ERROR: DLM error DLM_IVLOCKID while calling dlmlock on resource M000000000000000000000b6f931666: bad lockid

Node 0 is trying to upconvert the lock level.
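
For anyone following along: an "upconvert" means node 0 already holds the lock at some level and is asking the lock master to raise it to a stronger one (say PR -> EX) instead of dropping and retaking it. The master answered DLM_IVLOCKID ("bad lockid"), and that is what surfaces as status = -22 (-EINVAL) in ocfs2_inode_lock_full/_ocfs2_statfs. Here is a toy, compilable sketch of that flow; the names, enum values and the stubbed reply are invented for illustration, this is not the o2dlm code:

/* Toy model of an "upconvert" and how the error surfaces -- this is
 * illustrative only, not the ocfs2/o2dlm kernel code. */
#include <errno.h>
#include <stdio.h>

/* Lock levels, weakest to strongest (usual DLM mode ordering). */
enum lock_level { LVL_NL = 0, LVL_PR = 3, LVL_EX = 5 };

/* Replies the lock master can send; names/values invented for the sketch. */
enum dlm_reply { REPLY_OK = 0, REPLY_BAD_LOCKID = 1 };

/* Stand-in for asking the lock master to convert the lock identified by
 * 'cookie' to 'new_level'.  Here the master simply refuses with
 * "bad lockid", the way node 1 did in the logs. */
static enum dlm_reply send_convert_request(unsigned long long cookie,
                                           enum lock_level new_level)
{
    (void)cookie;
    (void)new_level;
    return REPLY_BAD_LOCKID;
}

/* Ask for a stronger level on a lock we already hold and translate the
 * DLM reply into an errno, which is what the fs layer logs. */
static int upconvert(unsigned long long cookie,
                     enum lock_level held, enum lock_level wanted)
{
    if (wanted <= held)
        return 0;                       /* already strong enough */

    if (send_convert_request(cookie, wanted) != REPLY_OK) {
        fprintf(stderr, "convert rejected: bad lockid\n");
        return -EINVAL;                 /* shows up as "status = -22" */
    }
    return 0;
}

int main(void)
{
    /* e.g. hold PR, want EX */
    printf("status = %d\n", upconvert(0x6ULL, LVL_PR, LVL_EX));
    return 0;
}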

Node 1 kept logging this kind of message until it stopped writing to "messages" at 10:00:

Dec  4 10:00:20 parmenides kernel: (o2net,10545,14):dlm_convert_lock_handler:489 ERROR: did not find lock to convert on grant queue! cookie=0:6
Dec  4 10:00:20 parmenides kernel: lockres: M000000000000000000000b6f931666, owner=1, state=0
Dec  4 10:00:20 parmenides kernel:   last used: 0, refcnt: 4, on purge list: no
Dec  4 10:00:20 parmenides kernel:   on dirty list: no, on reco list: no, migrating pending: no
Dec  4 10:00:20 parmenides kernel:   inflight locks: 0, asts reserved: 0
Dec  4 10:00:20 parmenides kernel:   refmap nodes: [ 0 ], inflight=0
Dec  4 10:00:20 parmenides kernel:   granted queue:
Dec  4 10:00:20 parmenides kernel:     type=5, conv=-1, node=1, cookie=1:6, ref=2, ast=(empty=y,pend=n), bast=(empty=y,pend=n), pending=(conv=n,lock=n,cancel=n,unlock=n)
Dec  4 10:00:20 parmenides kernel:   converting queue:
Dec  4 10:00:20 parmenides kernel:     type=0, conv=3, node=0, cookie=0:6, ref=2, ast=(empty=y,pend=n), bast=(empty=y,pend=n), pending=(conv=n,lock=n,cancel=n,unlock=n)
Dec  4 10:00:20 parmenides kernel:   blocked queue:

Node 1 does not find that lock in the granted queue because that lock is in the
converting queue. Do you have the very first error message on both nodes
relating to this resource?
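
To make that mismatch concrete, here is a toy, standalone model of the master-side lookup (not the o2dlm source; the struct layout and names are invented): the convert handler searches only the granted queue for the (node, cookie) pair, so a lock that is already sitting on the converting queue is never found and the requester just gets DLM_IVLOCKID back:

/* Toy model of the master-side convert lookup -- illustrative only,
 * not the o2dlm source; struct layout and names are invented. */
#include <stdio.h>
#include <string.h>

struct lock {
    int node;                 /* node that owns this lock */
    const char *cookie;       /* e.g. "0:6" as printed in the dump */
    int type;                 /* current level */
    int convert_type;         /* requested level */
};

struct lockres {
    struct lock granted[4];
    int n_granted;
    struct lock converting[4];
    int n_converting;
};

/* The convert handler only walks the granted queue, which is what the
 * "did not find lock to convert on grant queue!" message is about. */
static struct lock *find_on_granted(struct lockres *res,
                                    int node, const char *cookie)
{
    for (int i = 0; i < res->n_granted; i++)
        if (res->granted[i].node == node &&
            strcmp(res->granted[i].cookie, cookie) == 0)
            return &res->granted[i];
    return NULL;              /* caller replies DLM_IVLOCKID */
}

int main(void)
{
    /* State as in the node 1 dump: node 1 holds the lock at level 5 (EX)
     * on the granted queue; node 0's lock (cookie 0:6) is already on the
     * converting queue, asking to go from 0 to 3 (NL -> PR). */
    struct lockres res = {
        .granted      = { { .node = 1, .cookie = "1:6", .type = 5 } },
        .n_granted    = 1,
        .converting   = { { .node = 0, .cookie = "0:6",
                            .type = 0, .convert_type = 3 } },
        .n_converting = 1,
    };

    if (!find_on_granted(&res, 0, "0:6"))
        printf("ERROR: did not find lock to convert on grant queue! "
               "cookie=0:6\n");
    return 0;
}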

Also, this is definitely a system object. Can you list the system directory?
# debugfs.ocfs2 -R "ls -l //" /dev/sdX


We rebooted both nodes at 13:03 and recovered services as usual, with no further problems.

