We had a similar issue using SLES 9 and a CX300.
We upgraded to the latest ocfs2 version and changed
O2CB_HEARTBEAT_THRESHOLD in the /etc/sysconfig/o2cb file (on both nodes)
to the following:
# O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
O2CB_HEARTBEAT_THRESHOLD=61
It seemed to sort the issue out for us, but could be a totally different
issue! ;-)
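For anyone picking a value: a rough sketch of how the threshold maps to a timeout, assuming the relationship given in the OCFS2 FAQ (a node is considered dead after (threshold - 1) * 2 seconds, with a 2-second heartbeat interval). The function names are made up for illustration, so check the FAQ for your version before relying on the numbers.

```python
# Assumption (from the OCFS2 FAQ, not from this thread): the disk heartbeat
# runs every 2 seconds and a node is declared dead after
# (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds without a heartbeat.
HEARTBEAT_INTERVAL_SECONDS = 2


def heartbeat_timeout_seconds(threshold: int) -> int:
    """Seconds before a silent node is considered dead (hypothetical helper)."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECONDS


def threshold_for_timeout(timeout_seconds: int) -> int:
    """Threshold value needed to tolerate timeout_seconds (hypothetical helper)."""
    return timeout_seconds // HEARTBEAT_INTERVAL_SECONDS + 1


if __name__ == "__main__":
    # The value we set above.
    print(heartbeat_timeout_seconds(61))
    # Going the other way: which threshold gives a 120-second timeout?
    print(threshold_for_timeout(120))
```

Under that assumption, 61 gives the storage path about two minutes to recover (for example during a SAN failover) before the node is fenced.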
Mark Maiden
Systems Administrator
Globoforce, Ltd
6 Beckett Way Parkwest
Dublin 12
Ireland
t: +353 1 625 8812
f: +353 1 625 8880
e: [EMAIL PROTECTED]
www.globoforce.com
Andy Phillips wrote:
Hi,
I've got _exactly_ the same problem. I've not had the time to dive
through the source code and check it. We're on ES4.3 and ocfs-1.2.3.
For us the problem (same trace as below) was not that repeatable, and
was possibly related to the I/O pattern.
What seems to happen is that the underlying "network services" of
ocfs2 (o2net) believes that no packets are being sent. The TCP socket is
surrounded by wrapper functions, one of which records when the last packet
was received. It's this that decides the socket is dead and then closes
it. Meanwhile, the upper layers (which are actually sending data
regularly) find the carpet yanked out from underneath them, and decide
to halt the cluster to protect the data.
Highly annoying. I expect it will be some signed 32-bit integer
wrapping somewhere....
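To make the idea concrete, here is a minimal sketch of the kind of idle check I am describing. It is not the o2net code; the class and method names are invented, and the 10-second figure is simply taken from the log message quoted below.

```python
class IdleTracker:
    """Toy model of a receive-side idle timer (hypothetical, not o2net)."""

    def __init__(self, idle_timeout: float = 10.0) -> None:
        self.idle_timeout = idle_timeout
        self.last_received = None  # time of the last packet seen, if any

    def packet_received(self, now: float) -> None:
        # Only received packets reset the timer in this model.
        self.last_received = now

    def packet_sent(self, now: float) -> None:
        # Deliberately does nothing: sending data does not count as activity,
        # which is why a busy sender can still be declared idle.
        pass

    def should_shut_down(self, now: float) -> bool:
        if self.last_received is None:
            return False
        return now - self.last_received >= self.idle_timeout


if __name__ == "__main__":
    tracker = IdleTracker()
    tracker.packet_received(now=100.0)
    tracker.packet_sent(now=108.0)
    print(tracker.should_shut_down(now=105.0))  # False
    print(tracker.should_shut_down(now=110.0))  # True
```

If the "last received" timestamp is ever wrong (a wrapped counter, say), a check like this would fire even though traffic is flowing.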
Andy
On Mon, 2006-09-18 at 11:14 +0100, Andrew Brunton wrote:
Hi,
We have two Dell 1850s in a cluster; both machines are running Red Hat
Enterprise Linux 4 AS, update 2.
The boxes are connected to a Dell EMC CX300 using Emulex HBAs.
The cluster is running an Oracle 10gR2 Standard Edition RAC.
We are using ocfs2 to store files generated by our application and not
to store anything to do with the database.
We’ve been having a few problems where the servers appear to hang and
have to be shut down (using the power button) and then started up again.
This seems to be happening every weekend and I don’t really understand
what’s happening, or how to fix it.
I’ve included an extract from /var/log/messages in the hope someone can shed
some light on the matter.
Kind regards
Andrew
Sep 17 22:06:04 argon2 kernel: (0,0):o2net_idle_timer:1310 connection
to node argon1.crewe.ukfuels.co.uk (num 0) at 10.1.1.110:7777 has been
idle for 10 seconds, shutting it down.
Sep 17 22:06:04 argon2 kernel: (0,0):o2net_idle_timer:1321 here are
some times that might help debug the situation: (tmr 1158527154.993223
now 1158527164.993090 dr 1158527154.993213 adv
1158527154.993227:1158527154.993228 func (101e0528:505)
1158527153.796194:1158527153.796200)
Sep 17 22:06:04 argon2 kernel: (3854,0):o2net_set_nn_state:411 no
longer connected to node argon1.crewe.ukfuels.co.uk (num 0) at
10.1.1.110:7777
Sep 17 22:06:04 argon2 kernel:
(73,3):dlm_send_remote_unlock_request:350 ERROR: status = -112
Sep 17 22:06:04 argon2 kernel:
(73,3):dlm_send_remote_unlock_request:350 ERROR: status = -107
Sep 17 22:06:05 argon2 last message repeated 185 times
Sep 17 22:06:05 argon2 kernel:
(26144,1):dlm_send_remote_unlock_request:350 ERROR: status = -107
Sep 17 22:06:05 argon2 last message repeated 154 times
Sep 17 22:06:05 argon2 kernel:
(25274,2):dlm_send_remote_unlock_request:350 ERROR: status = -107
Sep 17 22:06:05 argon2 last message repeated 123 times
Sep 17 22:06:05 argon2 kernel:
(73,3):dlm_send_remote_unlock_request:350 ERROR: status = -107
Sep 17 22:06:05 argon2 last message repeated 472 times
Sep 17 22:06:05 argon2 kernel:
(73,1):dlm_send_remote_unlock_request:350 ERROR: status = -107
Sep 17 22:06:08 argon2 last message repeated 3239 times
Sep 17 22:06:08 argon2 kernel:
(73,3):dlm_send_remote_unlock_request:350 ERROR: status = -107
Sep 17 22:06:08 argon2 last message repeated 118 times
Sep 17 22:06:08 argon2 kernel:
(73,1):dlm_send_remote_unlock_request:350 ERROR: status = -107
Sep 18 08:40:32 argon2 syslogd 1.4.1: restart.
Sep 18 08:40:32 argon2 syslog: syslogd startup succeeded
Sep 18 08:40:32 argon2 kernel: klogd 1.4.1, log source = /proc/kmsg
started.
Sep 18 08:40:32 argon2 kernel: Bootdata ok (command line is ro
root=LABEL=/ apic rhgb quiet)
Sep 18 08:40:32 argon2 kernel: Linux version 2.6.9-22.0.1.ELsmp
([EMAIL PROTECTED]) (gcc version 3.4.4 20050721
(Red Hat 3.4.4-2)) #1 SMP
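For reference, the timestamps in the o2net_idle_timer line can be checked with a short script. This is only a sketch: it assumes "tmr" is when the idle timer was last armed and "dr" is when data was last ready on the socket, which is my reading of the field names rather than something stated in the log. The errno names are the standard Linux values for 107 and 112.

```python
import re
from decimal import Decimal

# The debug text from the o2net_idle_timer:1321 message, joined onto one line.
LOG = (
    "(tmr 1158527154.993223 now 1158527164.993090 dr 1158527154.993213 adv "
    "1158527154.993227:1158527154.993228 func (101e0528:505) "
    "1158527153.796194:1158527153.796200)"
)

# Linux errno values seen in the dlm_send_remote_unlock_request errors.
LINUX_ERRNO = {107: "ENOTCONN", 112: "EHOSTDOWN"}


def parse_times(text: str) -> dict:
    """Extract the tmr, now and dr timestamps as exact decimals."""
    match = re.search(r"tmr (\d+\.\d+) now (\d+\.\d+) dr (\d+\.\d+)", text)
    if match is None:
        raise ValueError("no o2net timing fields found")
    tmr, now, dr = (Decimal(group) for group in match.groups())
    return {"tmr": tmr, "now": now, "dr": dr}


times = parse_times(LOG)
idle_for = times["now"] - times["tmr"]     # time since the timer was armed
since_data = times["now"] - times["dr"]    # time since data was last ready

if __name__ == "__main__":
    print(idle_for)     # 9.999867
    print(since_data)   # 9.999877
    for status in (-112, -107):
        print(status, LINUX_ERRNO[-status])
```

Both gaps come out just under 10 seconds, which matches the "idle for 10 seconds" message; the -107 errors that follow are what you would expect once the socket has been closed.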
Andrew Brunton
Senior Application Developer
UK Fuels Limited
Tel +44 (0)1270 655636
Fax +44 (0)1270 655700
[EMAIL PROTECTED]
_______________________________________________
Ocfs2-users mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-users