On 13/12/10 20:49, Sunil Mushran wrote:
On 12/12/2010 11:58 PM, frank wrote:
After that, all node operations froze; we could not even log in.
Node 0 kept logging messages like these until it stopped logging at 10:49:
Dec 4 10:49:34 heraclito kernel:
(sendmail,19074,6):ocfs2_inode_lock_full:2121 ERROR: status = -22
Dec 4 10:49:34 heraclito kernel:
(sendmail,19074,6):_ocfs2_statfs:1266 ERROR: status = -22
Dec 4 10:49:34 heraclito kernel:
(sendmail,19074,6):dlm_send_remote_convert_request:393 ERROR: dlm
status = DLM_IVLOCKID
Dec 4 10:49:34 heraclito kernel:
(sendmail,19074,6):dlmconvert_remote:327 ERROR: dlm status = DLM_IVLOCKID
Dec 4 10:49:34 heraclito kernel:
(sendmail,19074,6):ocfs2_cluster_lock:1258 ERROR: DLM error
DLM_IVLOCKID while calling dlmlock on resource M000000000000000000000b6f931666: bad lockid
Node 0 is trying to upconvert the lock level.
Node 1 kept logging messages like these until it stopped logging at 10:00:
Dec 4 10:00:20 parmenides kernel:
(o2net,10545,14):dlm_convert_lock_handler:489 ERROR: did not find
lock to convert on grant queue! cookie=0:6
Dec 4 10:00:20 parmenides kernel: lockres:
M000000000000000000000b6f931666, owner=1, state=0
Dec 4 10:00:20 parmenides kernel: last used: 0, refcnt: 4, on
purge list: no
Dec 4 10:00:20 parmenides kernel: on dirty list: no, on reco list:
no, migrating pending: no
Dec 4 10:00:20 parmenides kernel: inflight locks: 0, asts reserved: 0
Dec 4 10:00:20 parmenides kernel: refmap nodes: [ 0 ], inflight=0
Dec 4 10:00:20 parmenides kernel: granted queue:
Dec 4 10:00:20 parmenides kernel: type=5, conv=-1, node=1,
cookie=1:6, ref=2, ast=(empty=y,pend=n), bast=(empty=y,pend=n),
pending=(conv=n,lock=n,cancel=n,unlock=n)
Dec 4 10:00:20 parmenides kernel: converting queue:
Dec 4 10:00:20 parmenides kernel: type=0, conv=3, node=0,
cookie=0:6, ref=2, ast=(empty=y,pend=n), bast=(empty=y,pend=n),
pending=(conv=n,lock=n,cancel=n,unlock=n)
Dec 4 10:00:20 parmenides kernel: blocked queue:
Node 1 does not find that lock in the granted queue because that lock
is in the
converting queue. Do you have the very first error message on both nodes
relating to this resource?
Here they are:
Node 0:
Dec 4 09:15:06 heraclito kernel: o2net: connection to node parmenides
(num 1) at 192.168.1.2:7777 has been idle for 30.0 seconds, shutting it
down.
Dec 4 09:15:06 heraclito kernel: (swapper,0,7):o2net_idle_timer:1503
here are some times that might help debug the situation: (tmr
1291450476.228826
now 1291450506.229456 dr 1291450476.228760 adv
1291450476.228842:1291450476.228843 func (de6e01eb:500)
1291450476.228827:1291450476.228829)
Dec 4 09:15:06 heraclito kernel: o2net: no longer connected to node
parmenides (num 1) at 192.168.1.2:7777
Dec 4 09:15:06 heraclito kernel:
(vzlist,22622,7):dlm_send_remote_convert_request:395 ERROR: status = -112
Dec 4 09:15:06 heraclito kernel:
(snmpd,16452,10):dlm_send_remote_convert_request:395 ERROR: status = -112
Dec 4 09:15:06 heraclito kernel:
(snmpd,16452,10):dlm_wait_for_node_death:370
0D3E49EB1F614A3EAEC0E2A74A34AFFF: waiting 5000ms for notification of death of node 1
Dec 4 09:15:06 heraclito kernel:
(httpd,4615,10):dlm_do_master_request:1334 ERROR: link to 1 went down!
Dec 4 09:15:06 heraclito kernel:
(httpd,4615,10):dlm_get_lock_resource:917 ERROR: status = -112
Dec 4 09:15:06 heraclito kernel:
(python,20750,10):dlm_do_master_request:1334 ERROR: link to 1 went down!
Dec 4 09:15:06 heraclito kernel:
(python,20750,10):dlm_get_lock_resource:917 ERROR: status = -112
Dec 4 09:15:06 heraclito kernel:
(vzlist,22622,7):dlm_wait_for_node_death:370
0D3E49EB1F614A3EAEC0E2A74A34AFFF: waiting 5000ms for notification of death of node 1
Dec 4 09:15:06 heraclito kernel: o2net: accepted connection from node
parmenides (num 1) at 192.168.1.2:7777
Dec 4 09:15:11 heraclito kernel:
(snmpd,16452,5):dlm_send_remote_convert_request:393 ERROR: dlm status =
DLM_IVLOCKID
Dec 4 09:15:11 heraclito kernel: (snmpd,16452,5):dlmconvert_remote:327
ERROR: dlm status = DLM_IVLOCKID
Dec 4 09:15:11 heraclito kernel:
(snmpd,16452,5):ocfs2_cluster_lock:1258 ERROR: DLM error DLM_IVLOCKID
while calling dlmlock on resource M000000000000000000000b6f931666: bad lockid
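For reference, the negative "status" values in these lines are negated Linux errno codes, and the o2net_idle_timer line encodes the last-activity and current timestamps. A quick sketch to decode both (errno names and strings assume Linux):

```python
import errno
import os

# The repeated "status = -NNN" values are negated Linux errno codes.
for status in (-112, -107, -22):
    code = -status
    print(code, errno.errorcode[code], os.strerror(code))
# 112 EHOSTDOWN, 107 ENOTCONN, 22 EINVAL (Linux-specific values)

# The o2net_idle_timer line reports 'tmr' (last activity) and 'now';
# their difference matches the 30-second idle timeout that triggered
# the disconnect.
tmr, now = 1291450476.228826, 1291450506.229456
print(f"idle for {now - tmr:.1f}s")  # idle for 30.0s
```

So -112 (EHOSTDOWN) and -107 (ENOTCONN) are just the network drop surfacing through the DLM paths, and -22 (EINVAL) is the later bad-lockid failure.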
Node 1:
Dec 4 09:15:06 parmenides kernel: o2net: connection to node heraclito
(num 0) at 192.168.1.3:7777 has been idle for 30.0 seconds, shutting it
down.
Dec 4 09:15:06 parmenides kernel: (swapper,0,9):o2net_idle_timer:1503
here are some times that might help debug the situation: (tmr
1291450476.231519
now 1291450506.232462 dr 1291450476.231506 adv
1291450476.231522:1291450476.231522 func (de6e01eb:505)
1291450475.650496:1291450475.650501)
Dec 4 09:15:06 parmenides kernel: o2net: no longer connected to node
heraclito (num 0) at 192.168.1.3:7777
Dec 4 09:15:06 parmenides kernel:
(snmpd,12342,11):dlm_do_master_request:1334 ERROR: link to 0 went down!
Dec 4 09:15:06 parmenides kernel:
(minilogd,12700,0):dlm_wait_for_lock_mastery:1117 ERROR: status = -112
Dec 4 09:15:06 parmenides kernel:
(smbd,25555,12):dlm_do_master_request:1334 ERROR: link to 0 went down!
Dec 4 09:15:06 parmenides kernel:
(python,12439,9):dlm_do_master_request:1334 ERROR: link to 0 went down!
Dec 4 09:15:06 parmenides kernel:
(python,12439,9):dlm_get_lock_resource:917 ERROR: status = -112
Dec 4 09:15:06 parmenides kernel:
(smbd,25555,12):dlm_get_lock_resource:917 ERROR: status = -112
Dec 4 09:15:06 parmenides kernel:
(minilogd,12700,0):dlm_do_master_request:1334 ERROR: link to 0 went down!
Dec 4 09:15:06 parmenides kernel:
(minilogd,12700,0):dlm_get_lock_resource:917 ERROR: status = -107
Dec 4 09:15:06 parmenides kernel:
(dlm_thread,10627,4):dlm_drop_lockres_ref:2211 ERROR: status = -112
Dec 4 09:15:06 parmenides kernel:
(dlm_thread,10627,4):dlm_purge_lockres:206 ERROR: status = -112
Dec 4 09:15:06 parmenides kernel: o2net: connected to node heraclito
(num 0) at 192.168.1.3:7777
Dec 4 09:15:06 parmenides kernel:
(snmpd,12342,11):dlm_get_lock_resource:917 ERROR: status = -112
Dec 4 09:15:11 parmenides kernel:
(o2net,10545,6):dlm_convert_lock_handler:489 ERROR: did not find lock to
convert on grant queue! cookie=0:6
Dec 4 09:15:11 parmenides kernel: lockres:
M000000000000000000000b6f931666, owner=1, state=0
Dec 4 09:15:11 parmenides kernel: last used: 0, refcnt: 4, on purge
list: no
Dec 4 09:15:11 parmenides kernel: on dirty list: no, on reco list:
no, migrating pending: no
Dec 4 09:15:11 parmenides kernel: inflight locks: 0, asts reserved: 0
Dec 4 09:15:11 parmenides kernel: refmap nodes: [ 0 ], inflight=0
Dec 4 09:15:11 parmenides kernel: granted queue:
Dec 4 09:15:11 parmenides kernel: type=5, conv=-1, node=1,
cookie=1:6, ref=2, ast=(empty=y,pend=n), bast=(empty=y,pend=n),
pending=(conv=n,lock=n,cancel=n,unlock=n)
Dec 4 09:15:11 parmenides kernel: converting queue:
Dec 4 09:15:11 parmenides kernel: type=0, conv=3, node=0,
cookie=0:6, ref=2, ast=(empty=y,pend=n), bast=(empty=y,pend=n),
pending=(conv=n,lock=n,cancel=n,unlock=n)
Dec 4 09:15:11 parmenides kernel: blocked queue:
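The "did not find lock to convert on grant queue" failure above can be sketched in miniature: when the convert request is resent after the reconnect, the master searches only the granted queue for the caller's cookie, but the first (lost-reply) request has already moved that lock to the converting queue, so the lookup fails and DLM_IVLOCKID comes back. A minimal model (hypothetical data structures, not the kernel code):

```python
# State reconstructed from the lockres dump above: node 1 holds the
# lock on the granted queue (cookie=1:6), while node 0's convert
# request (cookie=0:6) already sits on the converting queue.
granted = [{"cookie": "1:6", "type": 5}]
converting = [{"cookie": "0:6", "type": 0, "convert_to": 3}]

def convert_lock_handler(cookie):
    # Like dlm_convert_lock_handler, only the granted queue is searched.
    for lock in granted:
        if lock["cookie"] == cookie:
            return "OK"
    return "DLM_IVLOCKID"  # "did not find lock to convert on grant queue!"

print(convert_lock_handler("0:6"))  # DLM_IVLOCKID
```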
Also, this is definitely a system object. Can you list the system
directory?
# debugfs.ocfs2 -R "ls -l //" /dev/sdX
# debugfs.ocfs2 -R "ls -l //" /dev/mapper/mpath2
6   drwxr-xr-x   4  0  0  3896           19-Oct-2010 08:42  .
6   drwxr-xr-x   4  0  0  3896           19-Oct-2010 08:42  ..
7   -rw-r--r--   1  0  0  0              19-Oct-2010 08:42  bad_blocks
8   -rw-r--r--   1  0  0  831488         19-Oct-2010 08:42  global_inode_alloc
9   -rw-r--r--   1  0  0  4096           19-Oct-2010 08:47  slot_map
10  -rw-r--r--   1  0  0  1048576        19-Oct-2010 08:42  heartbeat
11  -rw-r--r--   1  0  0  2199023255552  19-Oct-2010 08:42  global_bitmap
12  drwxr-xr-x   2  0  0  12288          14-Dec-2010 08:58  orphan_dir:0000
13  drwxr-xr-x   2  0  0  16384          14-Dec-2010 08:50  orphan_dir:0001
14  -rw-r--r--   1  0  0  1103101952     19-Oct-2010 08:42  extent_alloc:0000
15  -rw-r--r--   1  0  0  1103101952     19-Oct-2010 08:42  extent_alloc:0001
16  -rw-r--r--   1  0  0  14109638656    19-Oct-2010 08:42  inode_alloc:0000
17  -rw-r--r--   1  0  0  6673137664     19-Oct-2010 08:42  inode_alloc:0001
18  -rw-r--r--   1  0  0  268435456      19-Oct-2010 08:46  journal:0000
19  -rw-r--r--   1  0  0  268435456      19-Oct-2010 08:47  journal:0001
20  -rw-r--r--   1  0  0  0              19-Oct-2010 08:42  local_alloc:0000
21  -rw-r--r--   1  0  0  0              19-Oct-2010 08:42  local_alloc:0001
22  -rw-r--r--   1  0  0  0              19-Oct-2010 08:42  truncate_log:0000
23  -rw-r--r--   1  0  0  0              19-Oct-2010 08:42  truncate_log:0001
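As a cross-check that this lockres really is a system object: OCFS2 metadata lock names appear to follow the layout "%c%s%016llx%08x" (type character, six-zero pad, 16-hex-digit block number, 8-hex-digit generation). That layout is an assumption taken from reading the ocfs2 sources, so treat this as a sketch:

```python
# Hedged sketch: split an OCFS2 lock name into its assumed fields
# (type char, 6-zero pad, 16-hex block number, 8-hex generation).
def decode_lockname(name):
    assert len(name) == 1 + 6 + 16 + 8, "unexpected lock-name length"
    return name[0], int(name[7:23], 16), int(name[23:31], 16)

kind, blkno, gen = decode_lockname("M000000000000000000000b6f931666")
print(kind, blkno, hex(gen))  # M 11 0x6f931666
```

Block 11 matches inode 11 (global_bitmap) in the listing above, which would explain why _ocfs2_statfs is among the callers hitting the bad lockid.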
Thanks once more for your help.
Regards.
Frank
_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users