Hi Florian,

I compared it with my own HA configuration. As far as I can tell, your Heartbeat configuration should work, but something is wrong with DRBD. See this:

crmd[17381]: 2008/03/05_11:44:34 ERROR: process_lrm_event: LRM operation DRBD_AFD:1_promote_0 (17) Timed Out (timeout=20000ms)
drbd[18348]: 2008/03/05_11:44:34 DEBUG: r0 notify: post for stop - counts: active 0 - starting 1 - stopping 1
drbd[18348]: 2008/03/05_11:44:34 DEBUG: r0: Calling drbdadm -c /etc/drbd.conf state r0
drbd[18348]: 2008/03/05_11:44:44 DEBUG: r0: Exit code 0
drbd[18348]: 2008/03/05_11:44:44 DEBUG: r0: Command output: Child process does not terminate! Exiting. No response from the DRBD driver! Is the module loaded? Unknown/TOO_LARGE
drbd[18348]: 2008/03/05_11:44:44 DEBUG: r0: Calling drbdadm -c /etc/drbd.conf cstate r0
lrmd[17378]: 2008/03/05_11:44:54 WARN: DRBD_AFD:1:notify process (PID 18348) timed out (try 1). Killing with signal SIGTERM (15).
lrmd[17378]: 2008/03/05_11:44:54 WARN: operation notify[18] on ocf::drbd::DRBD_AFD:1 for client 17381, its parameters: CRM_meta_role=[Master] CRM_meta_notify_stop_resource=[DRBD_AFD:0 ] CRM_meta_notify_operation=[stop] CRM_meta_notify_start_resource=[DRBD_AFD:1 ] CRM_meta_notify_stop_uname=[noderz ] CRM_meta_notify_promote_resource=[DRBD_AFD:1 ] drbd_resource=[r0] CRM_meta_notify_master_uname=[noderz ] CRM_meta_notify_demote_uname=[noderz ] CRM_meta_master_max=[1] CRM_meta_notify_master_resource=[DRBD_AFD:0 ] CRM_meta_timeout=[20000] CRM_meta_s: pid [18348] timed out

Something goes wrong when Heartbeat runs the drbdadm command: it hangs. Judging from your drbd.conf, I think you are using DRBD 8.x rather than 7.x, am I right? For your case I would say the more stable DRBD 7.x is enough, since you never want both DRBD nodes to be primary.
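
The version question and the "Is the module loaded?" message can both be checked by hand on the node. The following is only a minimal sketch: it assumes that /proc/drbd exists while the module is loaded and that its first line starts with "version:". The helper name drbd_version and its optional file argument are made up for illustration.

```shell
#!/bin/sh
# Print the DRBD version reported by the kernel module.
# drbd_version is a hypothetical helper, not part of DRBD or Heartbeat.
# The optional argument lets you point it at a saved copy of /proc/drbd.
drbd_version() {
    status_file="${1:-/proc/drbd}"
    if [ ! -r "$status_file" ]; then
        # A missing status file usually means the module is not loaded.
        echo "cannot read $status_file: is the drbd module loaded?" >&2
        return 1
    fi
    # Assumption: the first line looks like "version: <number> (...)".
    sed -n 's/^version: *\([0-9][0-9.]*\).*/\1/p' "$status_file" | head -n 1
}
```

If drbd_version prints a version, running the same command as in the log (drbdadm -c /etc/drbd.conf state r0) from a shell should show whether it also hangs outside of Heartbeat.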

Regards,

Chun Tian (binghe)

Hi,

thanks for your reply.

The attachments were already included in the first mail to the list, but here they are again :)

Thanks
Florian

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
ha.org] On behalf of Chun Tian (binghe)
Sent: Monday, 10 March 2008 13:26
To: General Linux-HA mailing list
Subject: Re: AW: [Linux-HA] Switchover problem with DRBD

Hi there,

If you're using HA 2.x, maybe you should show some parts of your
cib.xml. I run DRBD HA clusters myself, and I think the key is in
the cib.xml.

Regards,

Chun Tian (binghe)

Isn't anybody able to give a hint why promoting the DRBD-instance
fails? :(

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
ha.org] On behalf of Schmidt, Florian
Sent: Wednesday, 5 March 2008 15:22
To: General Linux-HA mailing list; [EMAIL PROTECTED]
Subject: [Linux-HA] Switchover problem with DRBD

Hi everybody,

While testing my 2-node cluster I got strange behaviour when stopping
heartbeat on my primary node. I don't know if it is caused by heartbeat
or DRBD or both, so I am posting this to both lists.

Starting with this:

============
Last updated: Wed Mar  5 15:01:10 2008
Current DC: noderz (91d062c3-ad0a-4c24-b759-acada7f19101)
2 Nodes configured.
3 Resources configured.
============

Node: noderz (91d062c3-ad0a-4c24-b759-acada7f19101): online
Node: nodekrz (44425bd9-2cba-4d6a-ac62-82a8bb81a23d): online

Master/Slave Set: DRBD
  DRBD_AFD:0  (heartbeat::ocf:drbd):  Master noderz
  DRBD_AFD:1  (heartbeat::ocf:drbd):  Started nodekrz
Resource Group: Group1
  Filesystem  (heartbeat::ocf:Filesystem):    Started noderz
  AFD (lsb:afdha):    Started noderz
Cluster_IP      (heartbeat::ocf:IPaddr):        Started noderz



I ran /etc/init.d/heartbeat stop on the primary node (noderz) and
expected this:

============
Last updated: Wed Mar  5 15:01:10 2008
Current DC: nodekrz (44425bd9-2cba-4d6a-ac62-82a8bb81a23d)
2 Nodes configured.
3 Resources configured.
============

Node: noderz (91d062c3-ad0a-4c24-b759-acada7f19101): OFFLINE
Node: nodekrz (44425bd9-2cba-4d6a-ac62-82a8bb81a23d): online

Master/Slave Set: DRBD
  DRBD_AFD:0  (heartbeat::ocf:drbd):  stopped
  DRBD_AFD:1  (heartbeat::ocf:drbd):  Master nodekrz
Resource Group: Group1
  Filesystem  (heartbeat::ocf:Filesystem):    Started nodekrz
  AFD (lsb:afdha):    Started nodekrz
Cluster_IP      (heartbeat::ocf:IPaddr):        Started nodekrz


But I got this:
============
Last updated: Wed Mar  5 14:52:06 2008
Current DC: nodekrz (44425bd9-2cba-4d6a-ac62-82a8bb81a23d)
2 Nodes configured.
3 Resources configured.
============

Node: noderz (91d062c3-ad0a-4c24-b759-acada7f19101): OFFLINE
Node: nodekrz (44425bd9-2cba-4d6a-ac62-82a8bb81a23d): online

Master/Slave Set: DRBD
  DRBD_AFD:0  (heartbeat::ocf:drbd):  Stopped
  DRBD_AFD:1  (heartbeat::ocf:drbd):  Started nodekrz

Failed actions:
  DRBD_AFD:1_promote_0 (node=nodekrz, call=17, rc=-2): Timed Out


I have attached the /var/log/ha-debug of the node, the output of
cibadmin -Q, my ha.cf and my drbd.conf (if needed).
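
For anyone reading along with the ha-debug log, a quick way to list the operations that timed out might look like the sketch below. It assumes the log uses the "LRM operation ... Timed Out (timeout=...ms)" wording that crmd writes in the excerpt quoted in this thread. The function name timed_out_ops is made up for illustration.

```shell
#!/bin/sh
# List LRM operations that timed out, one per line, as
# "<operation> <timeout in ms>".
# timed_out_ops is a hypothetical helper; pass it the path to the log.
timed_out_ops() {
    grep 'LRM operation .* Timed Out' "$1" |
    sed 's/.*LRM operation \([^ ]*\) .*Timed Out (timeout=\([0-9]*\)ms).*/\1 \2/'
}

# Example call (path taken from this mail):
# timed_out_ops /var/log/ha-debug
```

For the crmd line quoted in this thread it prints "DRBD_AFD:1_promote_0 20000", that is, the promote operation on the surviving node with its 20000 ms timeout.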

It would be nice if someone could give me a hint why the switchover
fails.

Thanks a lot for any help.
Florian
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

<ha.cf><drbd.conf><ha-debug><cib.xml>