[Pacemaker] pcs equivalent of crm configure erase

2013-04-14 Thread Andreas Mock
Hi all,

 

can someone tell me what the pcs equivalent to
crm configure erase is?

Is there a pcs cheat sheet showing the common tasks?
Or any documentation?

 

Best regards

Andreas

 



[Pacemaker] Disable startup fencing with cman

2013-04-14 Thread Andreas Mock
Hi all,

 

in a two-node cluster (RHEL6.x, cman, pacemaker), when I start up the very
first node, this node will try to fence the other node if it can't see it.
This can happen during maintenance. How do I avoid this startup fencing
temporarily when I know that the other node is down?

 

Best regards

Andreas

 



Re: [Pacemaker] iscsi target mounting readonly on client

2013-04-14 Thread Joseph-Andre Guaragna
Thanks for the information. I'll give it a try on Monday and give you feedback.



2013/4/12 Felix Zachlod fz.li...@sis-gmbh.info:
 Hello Joseph!

 -----Original Message-----
 From: Joseph-Andre Guaragna [mailto:joseph-an...@rdmo.com]
 Sent: Friday, 12 April 2013 17:19
 To: pacemaker@oss.clusterlabs.org
 Subject: [Pacemaker] iscsi target mounting readonly on client

 You have to make two things absolutely sure:

 1. Data that has been acknowledged by your iSCSI target to your initiator has
 hit the device and not only the page cache!

 If you run your target in fileio mode you have to use write-through, because
 with write-back neither you nor your cluster manager can ever tell whether the
 writes have completed before switching the DRBD states.
 That will only perform well if you have a decent RAID card with BBWC! BUT
 YOU MUST RUN WRITE-THROUGH or blockio (which is write-through too);
 running write-back in such a constellation IS NOT SAFE, and you risk
 SERIOUS DATA CORRUPTION when switching targets.

 2. On your initiator side, try to raise the /sys/block/sd*/device/timeout
 value. That is the time the block device will wait for a command to complete
 before handing an I/O error to the upper layer, which will most probably
 lead to your filesystem remounting r/o (see the sketch after this message).

 3. This is just a side note: do not use iet. We were running a production
 target with iet for about 2 years, and it caused us horrible problems.
 Consider scst or lio (I personally do not have any experience with lio, but
 scst has been running in our production environment for years now without any
 problems).

 regards
 Felix
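(A minimal illustration of the timeout tunable from point 2 above; the device name and the 180-second value are just examples, not taken from the original mail.)

    # raise the SCSI command timeout for one device
    echo 180 > /sys/block/sdb/device/timeout
    # or for every sd* device currently present
    for t in /sys/block/sd*/device/timeout; do echo 180 > "$t"; done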




[Pacemaker] Cleanup over secondary node

2013-04-14 Thread Daniel Bareiro

Hi all!

I'm testing Pacemaker+Corosync cluster with KVM virtual machines. When
restarting a node, I got the following status:

# crm status

Last updated: Sun Apr 14 11:50:00 2013
Last change: Sun Apr 14 11:49:54 2013
Stack: openais
Current DC: daedalus - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
8 Resources configured.


Online: [ atlantis daedalus ]

 Resource Group: servicios
 fs_drbd_servicios  (ocf::heartbeat:Filesystem):Started daedalus
 clusterIP  (ocf::heartbeat:IPaddr2):   Started daedalus
 Mysql  (ocf::heartbeat:mysql): Started daedalus
 Apache (ocf::heartbeat:apache):Started daedalus
 Pure-FTPd  (ocf::heartbeat:Pure-FTPd): Started daedalus
 Asterisk   (ocf::heartbeat:asterisk):  Started daedalus
 Master/Slave Set: drbd_serviciosClone [drbd_servicios]
 Masters: [ daedalus ]
 Slaves: [ atlantis ]

Failed actions:
Asterisk_monitor_0 (node=atlantis, call=12, rc=5, status=complete): not 
installed


The problem is that if I do a cleanup of the Asterisk resource on the
secondary, it has no effect. It seems that Pacemaker needs to have
access to the resource's config file. But this is not available,
because it lives on the DRBD device that is only accessible on the
primary:

Apr 14 11:58:06 atlantis cib: [1136]: info: apply_xml_diff: Digest mis-match: 
expected f6e4778e0ca9d8d681ba86acb83a6086, calculated 
ad03ff3e0622f60c78e8e1ece055bd63
Apr 14 11:58:06 atlantis cib: [1136]: notice: cib_process_diff: Diff 0.825.3 -> 
0.825.4 not applied to 0.825.3: Failed application of an update diff
Apr 14 11:58:06 atlantis cib: [1136]: info: cib_server_process_diff: Requesting 
re-sync from peer
Apr 14 11:58:06 atlantis crmd: [1141]: info: delete_resource: Removing resource 
Asterisk for 3141_crm_resource (internal) on atlantis
Apr 14 11:58:06 atlantis crmd: [1141]: info: notify_deleted: Notifying 
3141_crm_resource on atlantis that Asterisk was deleted
Apr 14 11:58:06 atlantis crmd: [1141]: WARN: decode_transition_key: Bad UUID 
(crm-resource-3141) in sscanf result (3) for 0:0:crm-resource-3141
Apr 14 11:58:06 atlantis crmd: [1141]: info: ais_dispatch_message: Membership 
1616: quorum retained
Apr 14 11:58:06 atlantis lrmd: [1138]: info: rsc:Asterisk probe[13] (pid 3144)
Apr 14 11:58:06 atlantis asterisk[3144]: ERROR: Config 
/etc/asterisk/asterisk.conf doesn't exist
Apr 14 11:58:06 atlantis lrmd: [1138]: info: operation monitor[13] on Asterisk 
for client 1141: pid 3144 exited with return code 5
Apr 14 11:58:06 atlantis crmd: [1141]: info: process_lrm_event: LRM operation 
Asterisk_monitor_0 (call=13, rc=5, cib-update=40, confirmed=true) not installed


Is there any way to remedy this situation?


Thanks in advance for your reply.


Regards,
Daniel
-- 
Ing. Daniel Bareiro - GNU/Linux registered user #188.598
Proudly running Debian GNU/Linux with uptime:
11:46:23 up 49 days, 19:53, 12 users,  load average: 0.00, 0.01, 0.00




Re: [Pacemaker] Disable startup fencing with cman

2013-04-14 Thread Pavlos Parissis
On 14/04/2013 10:47 πμ, Andreas Mock wrote:
 Hi all,
 
  
 
 in a two node cluster (RHEL6.x, cman, pacemaker)
 
 when I startup the very first node,
 
 this node will try to fence the other node if it can't see it.
 
 This can be true in case of maintenance. How do I avoid
 
 this startup fencing temporarily when I know that the
 
 other node is down?

Have you tried putting the node in standby? I don't know if it will work; just
sharing my idea here.


 
  
 
 Best regards
 
 Andreas
 
  
 
 
 
 






Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

2013-04-14 Thread Pavlos Parissis
On 12/04/2013 09:37 μμ, Pavlos Parissis wrote:
 Hoi,
 
 As I wrote to another post[1] I failed to upgrade to 1.1.8 for a 2 node
 cluster.
 
 Before the upgrade process both nodes are using CentOS 6.3, corosync
 1.4.1-7 and pacemaker-1.1.7.
 
 I followed the rolling upgrade process, so I stopped pacemaker and then
 corosync on node1 and upgraded to CentOS 6.4. The OS upgrade upgrades
 also pacemaker to 1.1.8-7 and corosync to 1.4.1-15.
 The upgrade of rpms went smoothly as I knew about the crmsh issue so I
 made sure I had crmsh rpm on my repos.
 
 Corosync started without any problems and both nodes could see each
 other[2]. But for some reason node2 failed to receive a reply on join
 offer from node1 and node1 never joined the cluster. Node1 formed a new
 cluster as it never got an reply from node2, so I ended up with a
 split-brain situation.
 
 Logs of node1 can be found here
 https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node1.log
 and of node2 here
 https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node2.log


Doing a Disconnect & Reattach upgrade of both nodes at the same time
gives me a working 1.1.8 cluster. Any attempt to make a 1.1.8 node
join a cluster with a 1.1.7 node failed.

Cheers,
Pavlos






Re: [Pacemaker] 1.1.8 not compatible with 1.1.7?

2013-04-14 Thread Andrew Beekhof

On 15/04/2013, at 7:31 AM, Pavlos Parissis pavlos.paris...@gmail.com wrote:

 On 12/04/2013 09:37 μμ, Pavlos Parissis wrote:
 Hoi,
 
 As I wrote to another post[1] I failed to upgrade to 1.1.8 for a 2 node
 cluster.
 
 Before the upgrade process both nodes are using CentOS 6.3, corosync
 1.4.1-7 and pacemaker-1.1.7.
 
 I followed the rolling upgrade process, so I stopped pacemaker and then
 corosync on node1 and upgraded to CentOS 6.4. The OS upgrade upgrades
 also pacemaker to 1.1.8-7 and corosync to 1.4.1-15.
 The upgrade of rpms went smoothly as I knew about the crmsh issue so I
 made sure I had crmsh rpm on my repos.
 
 Corosync started without any problems and both nodes could see each
 other[2]. But for some reason node2 failed to receive a reply on join
 offer from node1 and node1 never joined the cluster. Node1 formed a new
 cluster as it never got an reply from node2, so I ended up with a
 split-brain situation.
 
 Logs of node1 can be found here
 https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node1.log
 and of node2 here
 https://dl.dropboxusercontent.com/u/1773878/pacemaker-issue/node2.log
 
 
 Doing a Disconnect & Reattach upgrade of both nodes at the same time
 brings me a working 1.1.8 cluster. Any attempt to make a 1.1.8 node to
 join a cluster with a 1.1.7 failed.

There wasn't enough detail in the logs to suggest a solution, but if you add 
the following to /etc/sysconfig/pacemaker and re-test, it might shed some 
additional light on the problem.

export PCMK_trace_functions=ais_dispatch_message

Certainly there was no intention to make them incompatible.

 
 Cheers,
 Pavlos
 
 


Re: [Pacemaker] Disable startup fencing with cman

2013-04-14 Thread Andrew Beekhof

On 14/04/2013, at 6:47 PM, Andreas Mock andreas.m...@web.de wrote:

 Hi all,
  
 in a two node cluster (RHEL6.x, cman, pacemaker)
 when I startup the very first node,
 this node will try to fence the other node if it can't see it.
 This can be true in case of maintenance. How do I avoid
 this startup fencing temporarily when I know that the
 other node is down?

Set the target-role for your fencing device(s) to Stopped and use stonith_admin 
--confirm
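(A sketch of those two steps. 'fence-node2' and 'node2' are placeholder names, and the long option spellings may differ slightly between versions.)

    # stop the fencing device so it is not fired during startup
    crm_resource --resource fence-node2 --meta --set-parameter target-role --parameter-value Stopped
    # then tell the cluster the absent node is already down
    stonith_admin --confirm node2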


Re: [Pacemaker] racing crm commands... last write wins?

2013-04-14 Thread Andrew Beekhof

On 12/04/2013, at 11:35 PM, Brian J. Murrell br...@interlinx.bc.ca wrote:

 On 13-04-10 07:02 PM, Andrew Beekhof wrote:
 
 On 11/04/2013, at 6:33 AM, Brian J. Murrell 
 brian-squohqy54cvwr29bmmi...@public.gmane.org wrote:
 
 Does crm_resource suffer from this problem
 
 no
 
 Excellent.
 
 I was unable to find any comprehensive documentation on just how to
 implement a pacemaker configuration solely with crm_resource and the
 manpage for it doesn't seem to indicate any way to create resources, for
 example.

Right, creation (and any other modifications of the config) is via cibadmin.
However that involves dealing with XML which most people have an aversion to, 
hence the common use of pcs and crmsh.
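(For illustration, roughly what resource creation via cibadmin looks like; the resource name, IP address and file name here are made up.)

    cat > test-ip.xml <<'EOF'
    <primitive id="test-ip" class="ocf" provider="heartbeat" type="IPaddr2">
      <instance_attributes id="test-ip-attrs">
        <nvpair id="test-ip-attrs-ip" name="ip" value="192.168.122.100"/>
      </instance_attributes>
    </primitive>
    EOF
    cibadmin -o resources -C -x test-ip.xml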

 
 Is it typical that when you don't want to use crm (or pcs) and want
 to rely on the crm_* group of commands, that you do so in conjunction
 with cibadmin for things like creating resources, etc.?

Yes.

  It seems so,
 but I just want to make sure there is not something I have not uncovered
 yet.
 
 Cheers,
 b.
 
 


Re: [Pacemaker] pcs equivalent of crm configure erase

2013-04-14 Thread Andrew Beekhof

On 14/04/2013, at 5:52 PM, Andreas Mock andreas.m...@web.de wrote:

 Hi all,
  
 can someone tell me what the pcs equivalent to
 crm configure erase is?
  
 Is there a pcs cheat sheet showing the common tasks?
 Or a documentation?

pcs help should be reasonably informative, but I don't see anything equivalent.
Chris?
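(Not a pcs answer, just a hedged aside: the closest low-level approximation I know of is cibadmin's erase operation. It is blunter than 'crm configure erase', since it wipes the entire CIB contents rather than only resources and constraints, so take a backup first.)

    cibadmin --query > cib-backup.xml
    cibadmin --erase --force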

  
 Best regards
 Andreas
  


Re: [Pacemaker] Different value on cluster-infrastructure between 2 nodes

2013-04-14 Thread Andrew Beekhof

On 12/04/2013, at 11:10 PM, Pavlos Parissis pavlos.paris...@gmail.com wrote:

 Hi
 
 I am doing a rolling upgrade of pacemaker from CentOS 6.3 to 6.4, and
 when the 1st node is upgraded and gets version 1.1.8 it doesn't join the
 cluster, so I ended up with 2 clusters.
 
 In the logs of node1 I see
 cluster-infrastructure value="classic openais (with plugin)"
 
 but node2(still in centos6.3 and pacemaker 1.1.7) it has
 cluster-infrastructure=openais

The string changed but they mean the same thing.

 
 I also see different dc-version between nodes.

Because both nodes are their own DC for some reason.

 
 Does anyone know if these could be the reason why node1 does not join the
 cluster and decides to make its own cluster?

No. It's the side-effect, not the cause.

 
 corosync communication looks fine
 
 Printing ring status.
 Local node ID 484162314
 RING ID 0
id  = 10.187.219.28
status  = ring 0 active with no faults
 RING ID 1
id  = 192.168.1.2
status  = ring 1 active with no faults
 
 
 Cheers,
 Pavlos
 
 


Re: [Pacemaker] attrd waits one second before doing update

2013-04-14 Thread Andrew Beekhof

On 12/04/2013, at 5:45 PM, Rainer Brestan rainer.bres...@gmx.net wrote:

 OK, and where is the difference between 1.1.8 and 1.1.7?

Prior to 1.1.8 the local node flushed its value immediately, which caused the 
CIB to be updated too soon (compared to the other nodes).
Since the whole point of attrd is to try and have them arrive at the same time, 
we changed this to be more consistent.

 I am currently testing this on a one-node cluster, so attrd waits for the
 message to come back from itself.
 Surely that can't take one second, or is attrd waiting this long anyway to be sure to
 get it back from all nodes?

There is no additional delay; the local node flushes its value as soon as the 
message comes back to itself (and therefore to all other nodes too).

 Rainer
  
 Sent: Friday, 12 April 2013 at 02:03
 From: Andrew Beekhof and...@beekhof.net
 To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
 Subject: Re: [Pacemaker] attrd waits one second before doing update
 
 On 12/04/2013, at 7:17 AM, Rainer Brestan rainer.bres...@gmx.net wrote:
 
  In pacemaker 1.1.7-6 with corosync 1.4.1-7 update of attributes works 
  almost online.
  Used with SysInfo resource agent and manual commands like attrd_updater -U 
  4 -n test.
 
  In the logfile there is one line
  attrd[...] notice: attrd_trigger_update: Sending flush up to all hosts for: 
  ...
  and a few milliseconds later
  attrd[...] notice: attrd_perform_update: Sent update ...
  with the same content.
 
  After upgrade to version 1.1.8-6 there is always nearly exact one second 
  between trigger and perform.
  2013-04-11T22:51:55.389+02:00 int2node2 attrd[28370] notice: notice: 
  attrd_trigger_update: Sending flush op to all hosts for: text (81)
  2013-04-11T22:51:56.397+02:00 int2node2 attrd[28370] notice: notice: 
  attrd_perform_update: Sent update 5814: text=81
 
  And what I found out when having several updates running: they have a single
  queue. All attrd_updater processes wait for the next one to be finished, so
  there can't be more than one update per second any more.
 
  Has this something to do with
  attrd: Have single-shot clients wait for an ack before disconnecting
  stated in the Changelog for 1.1.8 ?
 
 No, nothing at all.
 
 
  If yes, is it intended to have a single queue ?
 
 More like unavoidable, since we need to talk to the other nodes and messages 
 between them are ordered.
 
  And is this 1 second fixed?
  Where does this 1 second come from? I don't think that it takes one second
  to get the ack.
 
 When the timer expires, attrd sends a cluster message to all nodes (including 
 itself) telling them to update the CIB with their current value.
 The delay comes from waiting for the cluster message we sent to arrive back 
 again before sending our own updates, this helps ensure all the updates 
 arrive in the CIB at almost the same time.
 
 
  This can run into heavy delays (and therefore timeouts) for the monitor
  functions of RAs performing attribute updates.
 
  Rainer


Re: [Pacemaker] RHEL6.x dependency between 2-node-settings for cman and quorum settings in pacemaker

2013-04-14 Thread Andrew Beekhof

On 12/04/2013, at 4:58 PM, Andreas Mock andreas.m...@web.de wrote:

 Hi all,
 
 another question arose while reading the documentation concerning
 2-node clusters under RHEL6.x with CMAN and pacemaker.
 
 a) In the quick start guide one of the things you set is
 CMAN_QUORUM_TIMEOUT=0 in /etc/sysconfig/cman to get one
 node of the cluster up without waiting for quorum. (Correct
 me if my understanding is wrong)
 
 b) There is a special setting in cluster.conf:
    <cman two_node="1" expected_votes="1">
    </cman>
 which allows one node to gain quorum in a two node cluster
 (Please also correct me here if my understanding is wrong)
 
 c) And there is a pacemaker setting
 no-quorum-policy which is mostly set to 'ignore' in all startup
 tutorials.
 
 My question: I would like to understand how these settings
 influence each other and/or are dependent.

a) allows 'service cman start' to complete (and therefore allows 'service 
pacemaker start' to begin) before quorum has arrived.
b) is a possible alternative to a) but I've never tested it because it is 
superseded by c) and in fact makes c) meaningless since the cluster always has 
quorum.

a+c is preferred for consistency with clusters of more than 2 nodes.
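(A quick sketch of the a+c combination described above. The cman line comes from the quick start guide quoted earlier; the property command is plain crm shell syntax, or 'pcs property set no-quorum-policy=ignore' with pcs.)

    # a) in /etc/sysconfig/cman on both nodes:
    CMAN_QUORUM_TIMEOUT=0

    # c) as a pacemaker cluster property:
    crm configure property no-quorum-policy=ignore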

 
 As much insight as possible appreciated. ;-)
 
 Best regards
 Andreas
 
 
 


Re: [Pacemaker] Cleanup over secondary node

2013-04-14 Thread Andrew Beekhof

On 15/04/2013, at 1:01 AM, Daniel Bareiro daniel-lis...@gmx.net wrote:

 
 Hi all!
 
 I'm testing Pacemaker+Corosync cluster with KVM virtual machines. When
 restarting a node, I got the following status:
 
 # crm status
 
 Last updated: Sun Apr 14 11:50:00 2013
 Last change: Sun Apr 14 11:49:54 2013
 Stack: openais
 Current DC: daedalus - partition with quorum
 Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
 2 Nodes configured, 2 expected votes
 8 Resources configured.
 
 
 Online: [ atlantis daedalus ]
 
 Resource Group: servicios
 fs_drbd_servicios  (ocf::heartbeat:Filesystem):Started daedalus
 clusterIP  (ocf::heartbeat:IPaddr2):   Started daedalus
 Mysql  (ocf::heartbeat:mysql): Started daedalus
 Apache (ocf::heartbeat:apache):Started daedalus
 Pure-FTPd  (ocf::heartbeat:Pure-FTPd): Started daedalus
 Asterisk   (ocf::heartbeat:asterisk):  Started daedalus
 Master/Slave Set: drbd_serviciosClone [drbd_servicios]
 Masters: [ daedalus ]
 Slaves: [ atlantis ]
 
 Failed actions:
Asterisk_monitor_0 (node=atlantis, call=12, rc=5, status=complete): not 
 installed
 
 
 The problem is that if I do a cleanup of the Asterisk resource in the
 secondary, this has no effect. It seems to be Paceemaker needs to have
 access to the config file to the resource.

Not Pacemaker, the resource agent.
Pacemaker runs a non-recurring monitor operation to see what state the service 
is in; it seems the asterisk agent needs that config file.

I'd suggest changing the agent so that if the asterisk process is not running, 
the agent returns 7 (not running) before trying to access the config file.
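(A rough sketch of that idea. This is not the actual ocf:heartbeat:asterisk code, just an illustration of the ordering; the function name and the process check are assumptions.)

    asterisk_monitor() {
        # Report "not running" (OCF_NOT_RUNNING = 7) before touching anything
        # that lives on the DRBD-backed filesystem.
        if ! pgrep -x asterisk >/dev/null 2>&1; then
            return 7
        fi
        # Only a running instance gets the checks that need the config file.
        if [ ! -r /etc/asterisk/asterisk.conf ]; then
            return 5    # OCF_ERR_INSTALLED ("not installed")
        fi
        return 0        # OCF_SUCCESS
    }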

 But this is not available,
 because it is mounted on the DRBD device that is accessible in the
 primary:
 
 Apr 14 11:58:06 atlantis cib: [1136]: info: apply_xml_diff: Digest mis-match: 
 expected f6e4778e0ca9d8d681ba86acb83a6086, calculated 
 ad03ff3e0622f60c78e8e1ece055bd63
 Apr 14 11:58:06 atlantis cib: [1136]: notice: cib_process_diff: Diff 0.825.3 
 - 0.825.4 not applied to 0.825.3: Failed application of an update diff
 Apr 14 11:58:06 atlantis cib: [1136]: info: cib_server_process_diff: 
 Requesting re-sync from peer
 Apr 14 11:58:06 atlantis crmd: [1141]: info: delete_resource: Removing 
 resource Asterisk for 3141_crm_resource (internal) on atlantis
 Apr 14 11:58:06 atlantis crmd: [1141]: info: notify_deleted: Notifying 
 3141_crm_resource on atlantis that Asterisk was deleted
 Apr 14 11:58:06 atlantis crmd: [1141]: WARN: decode_transition_key: Bad UUID 
 (crm-resource-3141) in sscanf result (3) for 0:0:crm-resource-3141
 Apr 14 11:58:06 atlantis crmd: [1141]: info: ais_dispatch_message: Membership 
 1616: quorum retained
 Apr 14 11:58:06 atlantis lrmd: [1138]: info: rsc:Asterisk probe[13] (pid 3144)
 Apr 14 11:58:06 atlantis asterisk[3144]: ERROR: Config 
 /etc/asterisk/asterisk.conf doesn't exist
 Apr 14 11:58:06 atlantis lrmd: [1138]: info: operation monitor[13] on 
 Asterisk for client 1141: pid 3144 exited with return code 5
 Apr 14 11:58:06 atlantis crmd: [1141]: info: process_lrm_event: LRM operation 
 Asterisk_monitor_0 (call=13, rc=5, cib-update=40, confirmed=true) not 
 installed
 
 
 Is there any way to remedy this situation?
 
 
 Thanks in advance for your reply.
 
 
 Regards,
 Daniel
 -- 
 Ing. Daniel Bareiro - GNU/Linux registered user #188.598
 Proudly running Debian GNU/Linux with uptime:
 11:46:23 up 49 days, 19:53, 12 users,  load average: 0.00, 0.01, 0.00


Re: [Pacemaker] How to display interface link status in corosync

2013-04-14 Thread Yuichi SEINO
Hi,

2013/4/8 Andrew Beekhof and...@beekhof.net:
 I'm not 100% sure what the best approach is here.

 Traditionally this is done with resource agents (ie. ClusterMon or ping) 
 which update attrd.
 We could potentially build it into attrd directly, but then we'd need to 
 think about how to turn it on/off.

 I think I'd lean towards a new agent+daemon or a new daemon launched by 
 ClusterMon.
I will check whether I can implement this function with a new agent+daemon.
I have a question: I am not sure how to launch a daemon from ClusterMon.
Do you mean using crm_mon -E?

Sincerely,
Yuichi


 On 04/04/2013, at 8:59 PM, Yuichi SEINO seino.clust...@gmail.com wrote:

 Hi All,

  I want to display the interface link status from corosync, so I think that
  I will add this function to pacemakerd.
  I am going to display this status under Node Attributes in crm_mon.
  When the link state changes, corosync can run a callback function;
  when that happens, we update the attributes. This function needs to
  start after attrd has started, and pacemakerd's mainloop starts after
  its sub-processes have started, so I think that is the best timing.

 I show the expected crm_mon.

 # crm_mon -fArc1
 Last updated: Thu Apr  4 08:08:08 2013
 Last change: Wed Apr  3 04:15:48 2013 via crmd on coro-n2
 Stack: corosync
 Current DC: coro-n1 (168427526) - partition with quorum
 Version: 1.1.9-c791037
 2 Nodes configured, unknown expected votes
 2 Resources configured.


 Online: [ coro-n1 coro-n2 ]

 Full list of resources:

 Clone Set: OFclone [openstack-fencing]
 Started: [ coro-n1 coro-n2 ]

 Node Attributes:
 * Node coro-n1:
+ ringnumber(0)   : 10.10.0.6 is FAULTY
+ ringnumber(1)   : 10.20.0.6 is UP
 * Node coro-n2:
+ ringnumber(0)   : 10.10.0.7 is FAULTY
+ ringnumber(1)   : 10.20.0.7 is UP

 Migration summary:
 * Node coro-n2:
 * Node coro-n1:

 Tickets:


 Sincerely,
 Yuichi

