Hi, Florian
I compared it with my HA config, and I can almost say that your Heartbeat
configuration should work, but something is wrong with DRBD. See this:
crmd[17381]: 2008/03/05_11:44:34 ERROR: process_lrm_event: LRM operation DRBD_AFD:1_promote_0 (17) Timed Out (timeout=20000ms)
drbd[18348]: 2008/03/05_11:44:34 DEBUG: r0 notify: post for stop - counts: active 0 - starting 1 - stopping 1
drbd[18348]: 2008/03/05_11:44:34 DEBUG: r0: Calling drbdadm -c /etc/drbd.conf state r0
drbd[18348]: 2008/03/05_11:44:44 DEBUG: r0: Exit code 0
drbd[18348]: 2008/03/05_11:44:44 DEBUG: r0: Command output: Child process does not terminate! Exiting. No response from the DRBD driver! Is the module loaded? Unknown/TOO_LARGE
drbd[18348]: 2008/03/05_11:44:44 DEBUG: r0: Calling drbdadm -c /etc/drbd.conf cstate r0
lrmd[17378]: 2008/03/05_11:44:54 WARN: DRBD_AFD:1:notify process (PID 18348) timed out (try 1). Killing with signal SIGTERM (15).
lrmd[17378]: 2008/03/05_11:44:54 WARN: operation notify[18] on ocf::drbd::DRBD_AFD:1 for client 17381, its parameters:
CRM_meta_role=[Master] CRM_meta_notify_stop_resource=[DRBD_AFD:0 ]
CRM_meta_notify_operation=[stop]
CRM_meta_notify_start_resource=[DRBD_AFD:1 ]
CRM_meta_notify_stop_uname=[noderz ]
CRM_meta_notify_promote_resource=[DRBD_AFD:1 ] drbd_resource=[r0]
CRM_meta_notify_master_uname=[noderz ]
CRM_meta_notify_demote_uname=[noderz ] CRM_meta_master_max=[1]
CRM_meta_notify_master_resource=[DRBD_AFD:0 ] CRM_meta_timeout=[20000]
CRM_meta_s: pid [18348] timed out
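The timed-out operation and its timeout can be picked out of a log like the one above with a few lines of Python. This is only a minimal sketch, assuming the crmd message has the wording shown here; the function name and the regular expression are illustrative and not part of Heartbeat.

```python
import re

# Matches crmd's report of an LRM operation that hit its timeout,
# using the wording from the log excerpt above.
TIMEOUT_RE = re.compile(
    r"LRM operation (?P<key>\S+) \((?P<call>\d+)\) Timed Out "
    r"\(timeout=(?P<ms>\d+)ms\)"
)


def timed_out_operations(log_text):
    """Yield (operation key, call id, timeout in seconds) for each match."""
    # Mail archives wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    for m in TIMEOUT_RE.finditer(flat):
        yield m.group("key"), int(m.group("call")), int(m.group("ms")) / 1000


# The crmd line from the excerpt, wrapped the way the mail wrapped it.
sample = """crmd[17381]: 2008/03/05_11:44:34 ERROR: process_lrm_event: LRM
operation DRBD_AFD:1_promote_0 (17) Timed Out (timeout=20000ms)"""

for key, call, seconds in timed_out_operations(sample):
    print(f"{key} (call {call}) timed out after {seconds:.0f}s")
```

Here the promote of DRBD_AFD:1 ran into the 20 second limit, which fits the drbdadm calls further down that each take about 10 seconds before giving up.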
Something goes wrong when HA runs the drbdadm command: it hangs.
Judging from your drbd.conf, I think you may be using DRBD 8.x rather
than 7.x, am I right? I must say that for your case the more stable
DRBD 7.x is enough: you don't need a two-primary DRBD setup.
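To answer both questions from the log ("Is the module loaded?") and from above (8.x or 7.x?), one can look at /proc/drbd on the node. A minimal sketch in Python, assuming /proc/drbd exists only while the module is loaded and starts with a "version:" line; the helper names are made up for illustration.

```python
import re
from pathlib import Path
from typing import Optional, Tuple


def parse_drbd_version(proc_text: str) -> Optional[Tuple[int, ...]]:
    """Return the version as a tuple of ints, or None if no version line."""
    # Assumption: the file contains a line such as "version: 8.0.11 (...)".
    m = re.search(r"^version:\s*(\d+(?:\.\d+)+)", proc_text, re.MULTILINE)
    if m is None:
        return None
    return tuple(int(part) for part in m.group(1).split("."))


def drbd_status(proc_path: str = "/proc/drbd") -> str:
    """Describe whether the DRBD module looks loaded and which version it is."""
    path = Path(proc_path)
    if not path.exists():
        return "module not loaded (no /proc/drbd)"
    version = parse_drbd_version(path.read_text())
    if version is None:
        return "module loaded, version line not recognised"
    return "DRBD " + ".".join(str(part) for part in version)


if __name__ == "__main__":
    print(drbd_status())
```

Note that the releases usually called "7.x" report themselves as 0.7.x, so a leading 0 in the result means the older series.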
Regards,
Chun Tian (binghe)
Hi,
thanks for your reply.
The attachments were already included in the first mail to the list,
but here they are again :)
Thanks
Florian
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
ha.org] On behalf of Chun Tian (binghe)
Sent: Monday, 10 March 2008 13:26
To: General Linux-HA mailing list
Subject: Re: AW: [Linux-HA] Switchover problem with DRBD
Hi, there
If you're using HA 2.x, maybe you should show some parts of your
cib.xml. I run DRBD HA clusters myself, and I think the key is in
the cib.xml.
Regards,
Chun Tian (binghe)
Can't anybody give me a hint why promoting the DRBD instance
fails? :(
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
ha.org] On behalf of Schmidt, Florian
Sent: Wednesday, 5 March 2008 15:22
To: General Linux-HA mailing list; [EMAIL PROTECTED]
Subject: [Linux-HA] Switchover problem with DRBD
Hi everybody,
While testing my 2-node cluster I got strange behaviour when stopping
heartbeat on my primary node. I don't know if it is caused by heartbeat
or DRBD or both, so I am posting this to both lists.
Starting with this:
============
Last updated: Wed Mar 5 15:01:10 2008
Current DC: noderz (91d062c3-ad0a-4c24-b759-acada7f19101)
2 Nodes configured.
3 Resources configured.
============
Node: noderz (91d062c3-ad0a-4c24-b759-acada7f19101): online
Node: nodekrz (44425bd9-2cba-4d6a-ac62-82a8bb81a23d): online
Master/Slave Set: DRBD
DRBD_AFD:0 (heartbeat::ocf:drbd): Master noderz
DRBD_AFD:1 (heartbeat::ocf:drbd): Started nodekrz
Resource Group: Group1
Filesystem (heartbeat::ocf:Filesystem): Started noderz
AFD (lsb:afdha): Started noderz
Cluster_IP (heartbeat::ocf:IPaddr): Started noderz
I ran /etc/init.d/heartbeat stop on the primary node (noderz) and
expected this:
============
Last updated: Wed Mar 5 15:01:10 2008
Current DC: nodekrz (44425bd9-2cba-4d6a-ac62-82a8bb81a23d)
2 Nodes configured.
3 Resources configured.
============
Node: noderz (91d062c3-ad0a-4c24-b759-acada7f19101): OFFLINE
Node: nodekrz (44425bd9-2cba-4d6a-ac62-82a8bb81a23d): online
Master/Slave Set: DRBD
DRBD_AFD:0 (heartbeat::ocf:drbd): stopped
DRBD_AFD:1 (heartbeat::ocf:drbd): Master nodekrz
Resource Group: Group1
Filesystem (heartbeat::ocf:Filesystem): Started nodekrz
AFD (lsb:afdha): Started nodekrz
Cluster_IP (heartbeat::ocf:IPaddr): Started nodekrz
But I got this:
============
Last updated: Wed Mar 5 14:52:06 2008
Current DC: nodekrz (44425bd9-2cba-4d6a-ac62-82a8bb81a23d)
2 Nodes configured.
3 Resources configured.
============
Node: noderz (91d062c3-ad0a-4c24-b759-acada7f19101): OFFLINE
Node: nodekrz (44425bd9-2cba-4d6a-ac62-82a8bb81a23d): online
Master/Slave Set: DRBD
DRBD_AFD:0 (heartbeat::ocf:drbd): Stopped
DRBD_AFD:1 (heartbeat::ocf:drbd): Started nodekrz
Failed actions:
DRBD_AFD:1_promote_0 (node=nodekrz, call=17, rc=-2): Timed Out
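The failed action in the status output above can also be extracted mechanically, for example to compare several test runs. A minimal sketch in Python, assuming the crm_mon text has the layout shown here (a "Failed actions:" line followed by one entry per line); the function name and the regular expression are illustrative and not part of Heartbeat.

```python
import re
from typing import Dict, List

# One entry under "Failed actions:", e.g.
#   DRBD_AFD:1_promote_0 (node=nodekrz, call=17, rc=-2): Timed Out
FAILED_RE = re.compile(
    r"^\s*(?P<resource>\S+)_(?P<op>[a-z]+)_(?P<interval>\d+) "
    r"\(node=(?P<node>[^,]+), call=(?P<call>\d+), rc=(?P<rc>-?\d+)\): "
    r"(?P<status>.+?)\s*$"
)


def failed_actions(crm_mon_text: str) -> List[Dict[str, str]]:
    """Return one dict per entry listed after the 'Failed actions:' line."""
    failures = []
    in_section = False
    for line in crm_mon_text.splitlines():
        if line.strip() == "Failed actions:":
            in_section = True
            continue
        if in_section:
            m = FAILED_RE.match(line)
            if m:
                failures.append(m.groupdict())
    return failures


# The relevant part of the status output quoted above.
sample = """Master/Slave Set: DRBD
DRBD_AFD:0 (heartbeat::ocf:drbd): Stopped
DRBD_AFD:1 (heartbeat::ocf:drbd): Started nodekrz
Failed actions:
DRBD_AFD:1_promote_0 (node=nodekrz, call=17, rc=-2): Timed Out
"""

for f in failed_actions(sample):
    print(f"{f['op']} of {f['resource']} on {f['node']}: "
          f"{f['status']} (rc={f['rc']})")
```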
I have attached the /var/log/ha-debug of the node, the output of
cibadmin -Q, my ha.cf and my drbd.conf (if needed).
It would be nice if someone could give me a hint why the switchover
fails.
Thanks a lot for any help.
Florian
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
<ha.cf><drbd.conf><ha-debug><cib.xml>