[Pacemaker] Problem with colocation

2011-07-22 Thread Taneli Leppä

Hello,

I'm having a problem with colocation (namely that services end up on
different nodes):

Online: [ cluster1.intra cluster2.intra ]
OFFLINE: [ cluster3.intra ]

Sphinx_IP   (ocf::heartbeat:IPaddr2):   Started cluster1.intra
Sphinx  (lsb:sphinx):   Started cluster2.intra

As requested on IRC, I've attached my cibadmin log.
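For comparison, here is a minimal sketch of constraints that would keep the two resources on the same node, written as crm shell commands. The constraint IDs (`sphinx-with-ip`, `ip-before-sphinx`) are made up for illustration, and the sketch assumes that Sphinx should follow Sphinx_IP; the actual configuration may need something different.

```shell
# Hypothetical constraint IDs; resource names are taken from the status output above.
# Mandatory colocation: place Sphinx on whichever node runs Sphinx_IP.
crm configure colocation sphinx-with-ip inf: Sphinx Sphinx_IP
# Ordering: bring the IP address up before starting the service.
crm configure order ip-before-sphinx inf: Sphinx_IP Sphinx
```

In a colocation constraint the first resource is placed relative to the second, so swapping the two names changes which resource decides the node.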

--
  Taneli Leppä   | CISSP, RHCE, ZCE, CMDEV
  Crasman Co Ltd | tan...@crasman.fi

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] Sending message via cpg FAILED: (rc=12) Doesn't exist

2011-07-22 Thread Proskurin Kirill

Hello all.


pacemaker-1.1.5
corosync-1.4.0

There are 4 nodes in the cluster: 3 online, 1 not.
In the logs:

Jul 22 11:50:23 my106.example.com crmd: [28030]: info: 
pcmk_quorum_notification: Membership 0: quorum retained (0)
Jul 22 11:50:23 my106.example.com crmd: [28030]: info: do_started: 
Delaying start, no membership data (0010)
Jul 22 11:50:23 my106.example.com crmd: [28030]: info: 
config_query_callback: Shutdown escalation occurs after: 120ms
Jul 22 11:50:23 my106.example.com crmd: [28030]: info: 
config_query_callback: Checking for expired actions every 90ms
Jul 22 11:50:23 my106.example.com crmd: [28030]: info: do_started: 
Delaying start, no membership data (0010)
Jul 22 11:50:27 my106.example.com attrd: [28028]: info: cib_connect: 
Connected to the CIB after 1 signon attempts
Jul 22 11:50:27 my106.example.com attrd: [28028]: info: cib_connect: 
Sending full refresh
Jul 22 11:52:18 corosync [TOTEM ] A processor joined or left the 
membership and a new membership was formed.
Jul 22 11:52:18 corosync [CPG   ] chosen downlist: sender r(0) 
ip(10.3.1.107) ; members(old:4 left:1)
Jul 22 11:52:18 corosync [MAIN  ] Completed service synchronization, 
ready to provide service.
Jul 22 11:52:19 my106.example.com pacemakerd: [28021]: ERROR: 
send_cpg_message: Sending message via cpg FAILED: (rc=12) Doesn't exist
Jul 22 11:52:19 my106.example.com pacemakerd: [28021]: ERROR: 
send_cpg_message: Sending message via cpg FAILED: (rc=12) Doesn't exist
Jul 22 11:52:19 my106.example.com pacemakerd: [28021]: ERROR: 
send_cpg_message: Sending message via cpg FAILED: (rc=12) Doesn't exist




DC:

Jul 22 11:50:07 corosync [TOTEM ] Retransmit List: e4 e5 e7 e8 ea eb ed ee
Jul 22 11:50:07 corosync [TOTEM ] Retransmit List: e4 e5 e7 e8 ea eb ed ee
Jul 22 11:50:07 my107.example.com pacemakerd: [22388]: info:
update_node_processes: Node my106.example.com now has process list:
0002 (was 0012)
Jul 22 11:50:07 my107.example.com attrd: [22397]: info: crm_update_peer:
Node my106.example.com: id=0 state=unknown addr=(null) votes=0 born=0
seen=0 proc=0002 (new)
Jul 22 11:50:07 my107.example.com cib: [22395]: info: crm_update_peer:
Node my106.example.com: id=0 state=unknown addr=(null) votes=0 born=0
seen=0 proc=0002 (new)
Jul 22 11:50:07 my107.example.com stonith-ng: [22394]: info:
crm_update_peer: Node my106.example.com: id=0 state=unknown addr=(null)
votes=0 born=0 seen=0 proc=0002 (new)
Jul 22 11:50:07 my107.example.com crmd: [22399]: info: crm_update_peer:
Node my106.example.com: id=0 state=unknown addr=(null) votes=0 born=0
seen=0 proc=0002 (new)
Jul 22 11:50:07 corosync [TOTEM ] Retransmit List: e4 e5 e7 e8 ea eb ed ee
Jul 22 11:50:07 corosync [TOTEM ] Retransmit List: e4 e5 e7 e8 ea eb ed ee


Is there a problem?

--
Best regards,
Proskurin Kirill



[Pacemaker] Cluster type is: corosync

2011-07-22 Thread Proskurin Kirill

Hello again!

I hope I'm not flooding the list too much, but I have another problem.

I installed the same RPMs of corosync, openais, pacemaker, and cluster_glue
on all nodes. I checked twice.


But when I start some of them, they can't connect to the cluster and stay
offline. In the logs I can see that they see the other nodes and that
connectivity is OK. But I found a difference:


Online nodes in the cluster have:
[root@mysender39 ~]# grep 'Cluster type is' /var/log/corosync.log
Jul 22 20:38:58 mysender39.mail.ru stonith-ng: [3499]: info: 
get_cluster_type: Cluster type is: 'openais'.
Jul 22 20:38:58 mysender39.mail.ru attrd: [3502]: info: 
get_cluster_type: Cluster type is: 'openais'.
Jul 22 20:38:58 mysender39.mail.ru cib: [3500]: info: get_cluster_type: 
Cluster type is: 'openais'.
Jul 22 20:38:59 mysender39.mail.ru crmd: [3504]: info: get_cluster_type: 
Cluster type is: 'openais'.


Offline nodes have:
[root@mysender2 ~]# grep 'Cluster type is' /var/log/corosync.log
Jul 22 13:39:17 mysender2.mail.ru stonith-ng: [9028]: info: 
get_cluster_type: Cluster type is: 'corosync'.
Jul 22 13:39:17 mysender2.mail.ru attrd: [9031]: info: get_cluster_type: 
Cluster type is: 'corosync'.
Jul 22 13:39:17 mysender2.mail.ru cib: [9029]: info: get_cluster_type: 
Cluster type is: 'corosync'.
Jul 22 13:39:18 mysender2.mail.ru crmd: [9033]: info: get_cluster_type: 
Cluster type is: 'corosync'.


What's wrong and how can I fix it?
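The comparison above can be repeated on every node with a small helper. This is only a sketch: `cluster_types` is a hypothetical function name, and it assumes the log lines have the same "Cluster type is: '...'" format as the excerpts above.

```shell
# Print the distinct cluster types reported in a Pacemaker log read from stdin.
# Hypothetical helper; assumes the "Cluster type is: '<type>'" format shown above.
cluster_types() {
    grep -o "Cluster type is: '[^']*'" | cut -d"'" -f2 | sort -u
}

# Example (run on each node and compare the results):
#   cluster_types < /var/log/corosync.log
```

A node that prints a different type than the others, or more than one type, is the one whose stack configuration is worth comparing against a working node.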

--
Best regards,
Proskurin Kirill



Re: [Pacemaker] Sending message via cpg FAILED: (rc=12) Doesn't exist

2011-07-22 Thread Steven Dake
On 07/22/2011 01:15 AM, Proskurin Kirill wrote:
> Hello all.
> 
> 
> pacemaker-1.1.5
> corosync-1.4.0
> 
> 4 nodes in cluster. 3 online 1 not.
> In logs:
> 
> Jul 22 11:50:23 my106.example.com crmd: [28030]: info:
> pcmk_quorum_notification: Membership 0: quorum retained (0)
> Jul 22 11:50:23 my106.example.com crmd: [28030]: info: do_started:
> Delaying start, no membership data (0010)
> Jul 22 11:50:23 my106.example.com crmd: [28030]: info:
> config_query_callback: Shutdown escalation occurs after: 120ms
> Jul 22 11:50:23 my106.example.com crmd: [28030]: info:
> config_query_callback: Checking for expired actions every 90ms
> Jul 22 11:50:23 my106.example.com crmd: [28030]: info: do_started:
> Delaying start, no membership data (0010)
> Jul 22 11:50:27 my106.example.com attrd: [28028]: info: cib_connect:
> Connected to the CIB after 1 signon attempts
> Jul 22 11:50:27 my106.example.com attrd: [28028]: info: cib_connect:
> Sending full refresh
> Jul 22 11:52:18 corosync [TOTEM ] A processor joined or left the
> membership and a new membership was formed.
> Jul 22 11:52:18 corosync [CPG   ] chosen downlist: sender r(0)
> ip(10.3.1.107) ; members(old:4 left:1)
> Jul 22 11:52:18 corosync [MAIN  ] Completed service synchronization,
> ready to provide service.
> Jul 22 11:52:19 my106.example.com pacemakerd: [28021]: ERROR:
> send_cpg_message: Sending message via cpg FAILED: (rc=12) Doesn't exist
> Jul 22 11:52:19 my106.example.com pacemakerd: [28021]: ERROR:
> send_cpg_message: Sending message via cpg FAILED: (rc=12) Doesn't exist
> Jul 22 11:52:19 my106.example.com pacemakerd: [28021]: ERROR:
> send_cpg_message: Sending message via cpg FAILED: (rc=12) Doesn't exist
> 
> 
> 
> DC:
> 
> Jul 22 11:50:07 corosync [TOTEM ] Retransmit List: e4 e5 e7 e8 ea eb ed ee
> Jul 22 11:50:07 corosync [TOTEM ] Retransmit List: e4 e5 e7 e8 ea eb ed ee
> Jul 22 11:50:07 my107.example.com pacemakerd: [22388]: info:
> update_node_processes: Node my106.example.com now has process list:
> 0002 (was 0012)
> Jul 22 11:50:07 my107.example.com attrd: [22397]: info: crm_update_peer:
> Node my106.example.com: id=0 state=unknown addr=(null) votes=0 born=0
> seen=0 proc=0002 (new)
> Jul 22 11:50:07 my107.example.com cib: [22395]: info: crm_update_peer:
> Node my106.example.com: id=0 state=unknown addr=(null) votes=0 born=0
> seen=0 proc=0002 (new)
> Jul 22 11:50:07 my107.example.com stonith-ng: [22394]: info:
> crm_update_peer: Node my106.example.com: id=0 state=unknown addr=(null)
> votes=0 born=0 seen=0 proc=0002 (new)
> Jul 22 11:50:07 my107.example.com crmd: [22399]: info: crm_update_peer:
> Node my106.example.com: id=0 state=unknown addr=(null) votes=0 born=0
> seen=0 proc=0002 (new)
> Jul 22 11:50:07 corosync [TOTEM ] Retransmit List: e4 e5 e7 e8 ea eb ed ee
> Jul 22 11:50:07 corosync [TOTEM ] Retransmit List: e4 e5 e7 e8 ea eb ed ee
> 
> 
> Is there a problem?
> 

Does your retransmit list continually display e4 e5 etc for rest of
cluster lifetime, or is this short lived?
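
One way to answer this from the logs is to count how often each distinct retransmit list appears and when it was first and last seen. The following is a sketch only: `retransmit_counts` is a hypothetical helper name, and it assumes syslog-style lines like the ones quoted above, with a 15-character timestamp at the start.

```shell
# Summarise corosync "Retransmit List" messages read from stdin.
# Output columns (tab-separated): count, first seen, last seen, list contents.
# Hypothetical helper; assumes lines start with a "Mmm dd hh:mm:ss" timestamp.
retransmit_counts() {
    awk -F'Retransmit List: ' '
    /Retransmit List: / {
        ts = substr($1, 1, 15)
        if (!($2 in n)) first[$2] = ts
        last[$2] = ts
        n[$2]++
    }
    END {
        for (l in n) printf "%d\t%s\t%s\t%s\n", n[l], first[l], last[l], l
    }' | sort -rn
}

# Example:
#   retransmit_counts < /var/log/corosync.log
```

If the same list keeps a high count and its last-seen time keeps advancing, the retransmits are persistent rather than short lived.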





Re: [Pacemaker] Sending message via cpg FAILED: (rc=12) Doesn't exist

2011-07-22 Thread Proskurin Kirill

On 22.07.2011 20:30, Steven Dake wrote:
> On 07/22/2011 01:15 AM, Proskurin Kirill wrote:
> > Hello all.
> >
> > pacemaker-1.1.5
> > corosync-1.4.0
> > [...]
> > Jul 22 11:50:07 corosync [TOTEM ] Retransmit List: e4 e5 e7 e8 ea eb ed ee
> > Jul 22 11:50:07 corosync [TOTEM ] Retransmit List: e4 e5 e7 e8 ea eb ed ee
> >
> > Is there a problem?
>
> Does your retransmit list continually display e4 e5 etc for rest of
> cluster lifetime, or is this short lived?

Yes, it displays this continually.




[Pacemaker] Unable to configure Pacemaker with cibadmin

2011-07-22 Thread Kelly Wong
Hello,

I am trying to update the configuration of my cluster through the cibadmin
command, but the command always fails:

cibadmin --replace --scope resources --xml-file r.xml
Call cib_replace failed (-41): Remote node did not respond


I was able to replace the initial blank configuration, but updating it
doesn't seem to work.  The cluster is functioning and running some of the
resources.  Some of them are down, but I don't think that should make a
difference:


Last updated: Fri Jul 22 18:33:03 2011
Stack: openais
Current DC: poc-tst-rh4 - partition with quorum
Version: 1.0.9-89bd754939df5150de7cd76835f98fe90851b677
2 Nodes configured, 2 expected votes
3 Resources configured.


Online: [ poc-tst-rh4 poc-tst-rh4-2 ]

 Resource Group: mysql
     fs_mysql   (ocf::heartbeat:Filesystem):    Started poc-tst-rh4
     mysqld     (ocf::heartbeat:mysql): Stopped
 Master/Slave Set: ms_drbd_mysql
     Masters: [ poc-tst-rh4 ]
     Slaves: [ poc-tst-rh4-2 ]
 Clone Set: pingclone
     Started: [ poc-tst-rh4-2 poc-tst-rh4 ]

Failed actions:
    mysqld_start_0 (node=poc-tst-rh4, call=26, rc=5, status=complete): not installed
    fs_mysql_start_0 (node=poc-tst-rh4-2, call=31, rc=5, status=complete): not installed
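
As a side note on reading the failed actions: rc=5 is the standard OCF exit code OCF_ERR_INSTALLED ("not installed"). The sketch below decodes such lines; the function names `ocf_rc_name` and `explain_failed_actions` are hypothetical, and the parser assumes the crm_mon line format shown above. It says nothing about why the cibadmin call itself fails.

```shell
# Map an OCF resource-agent exit code to its standard name.
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        8) echo "OCF_RUNNING_MASTER" ;;
        9) echo "OCF_FAILED_MASTER" ;;
        *) echo "unknown" ;;
    esac
}

# Read crm_mon "Failed actions" lines on stdin and print one decoded line each.
# Lines that do not match the expected format (e.g. wrapped text) are skipped.
explain_failed_actions() {
    sed -n 's/^ *\([^ ]*\) (node=\([^,]*\), call=[0-9]*, rc=\([0-9]*\),.*/\1 \2 \3/p' |
    while read -r op node rc; do
        echo "$op on $node: rc=$rc ($(ocf_rc_name "$rc"))"
    done
}

# Example:
#   crm_mon -1 | explain_failed_actions
```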

If I try to use the crm command line, it rejects any configuration changes I
make:
crm configure edit
ERROR: could not replace mysql
INFO: offending xml: 

What could be causing the configuration to fail?

Thank you for any assistance,
Kelly Wong