Re: [Pacemaker] Patch for bugzilla 2541: Shell should warn if parameter uniqueness is violated

2011-03-04 Thread Holger Teutsch
On Thu, 2011-03-03 at 10:55 +0100, Florian Haas wrote:
> On 2011-03-03 10:43, Holger Teutsch wrote:
> > Hi,
> > I submit a patch for
> > "bugzilla 2541: Shell should warn if parameter uniqueness is violated"
> > for discussion.
> 
> I'll leave it to Dejan to review the code, but I love the functionality.
> Thanks a lot for tackling this. My only suggestion for an improvement is
> to make the warning message a bit more terse, as in:
> 
> WARNING: Resources ip1a, ip1b violate uniqueness for parameter "ip":
> "1.2.3.4"
> 

Florian,
I see your point. Although my formatting allows for an unlimited number
of collisions ( 8-) ), in real life we will only have 2 or 3. I will change
this together with Dejan's hints.

> Cheers,
> Florian
> 





Re: [Pacemaker] Cluster proxy pacemaker-corosync in a BladeCenter

2011-03-04 Thread victor
I have already read Tim Serong's page. It's good.

If I don't start the cluster at boot time and I set stonith-action to poweroff
instead of reboot, it works when a service goes down or a node does not respond,
but if the connection between the nodes goes down, they also power each other off.
The problem is when the two nodes are alive but there is no communication between them.
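
In case it is useful, this is roughly how I set that property from the crm
shell (a minimal sketch; the rest of the configuration is unchanged):

# fence by powering nodes off instead of rebooting them
crm configure property stonith-action=poweroff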






Re: [Pacemaker] Failing back a multi-state resource eg. DRBD

2011-03-04 Thread David McCurley
Do you want to move all the resources back or just that one resource?

I'm still learning, but one simple way I move all resources back from nodeb to 
nodea is like this:

# on nodeb
sudo crm node standby
# now services migrate to nodea
# still on nodeb
sudo crm node online

This may be a naive way to do it but it works for now :)

There is also a "crm resource migrate" command to migrate individual resources.
For that, see here:

http://www.clusterlabs.org/doc/crm_cli.html
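
For example, a minimal sketch (using the resource and node names from your
config below; "migrate" adds a temporary location constraint, so remove it
again with "unmigrate" once the resource has moved):

# prefer drbd01.test for ms-drbd0
sudo crm resource migrate ms-drbd0 drbd01.test
# after it has moved, drop the temporary constraint again
sudo crm resource unmigrate ms-drbd0

(Note that for a master/slave resource this influences where the instances
are allowed to run; whether it also moves the master role depends on your
version, so a role-based location constraint may be needed instead.)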


- Original Message -
> From: "Dominic Malolepszy" 
> To: pacemaker@oss.clusterlabs.org
> Sent: Thursday, March 3, 2011 12:18:51 AM
> Subject: [Pacemaker] Failing back a multi-state resource eg. DRBD
> Hi,
> 
> I'm trying to simulate various failure scenarios and figure out what to do
> to correct the problem. I have a DRBD cluster as defined below; if the
> primary fails (i.e. drbd01.test is power-cycled), the secondary
> (drbd02.test) takes over successfully, so DRBD:Master now runs on
> drbd02.test. When node drbd01.test comes back up, DRBD:Master remains on
> drbd02.test (i.e. due to resource stickiness) and drbd01.test simply
> becomes DRBD:Slave; this is what I want.
> 
> Now what command(s) would I need to run to move the master back to
> drbd01.test and make drbd02.test the new slave? The name of the
> multi-state resource is ms-drbd0; below is the config I am currently
> running.
> 
> 
> node drbd01.test \
> attributes standby="off"
> node drbd02.test \
> attributes standby="off"
> primitive drbd0 ocf:linbit:drbd \
> params drbd_resource="drbd0" \
> op monitor interval="60s" \
> op start interval="0" timeout="240s" \
> op promote interval="0" timeout="90s" start-delay="3s" \
> op demote interval="0" timeout="90s" start-delay="3s" \
> op notify interval="0" timeout="90s" \
> op stop interval="0" timeout="100s" \
> op monitor interval="10s" role="Master" timeout="20s" start-delay="5s" \
> op monitor interval="20s" role="Slave" timeout="20s" start-delay="5s"
> primitive fs0 ocf:heartbeat:Filesystem \
> params directory="/var/lib/pgsql/9.0/data" device="/dev/drbd0" fstype="ext3" \
> op start interval="0" timeout="60s" start-delay="1s" \
> op stop interval="0"
> primitive ip ocf:heartbeat:IPaddr \
> params ip="192.168.1.50" cidr_netmask="24" \
> op monitor interval="10s"
> primitive pgsql0 ocf:heartbeat:pgsql \
> params pgctl="/usr/pgsql-9.0/bin/pg_ctl" \
> params psql="/usr/pgsql-9.0/bin/psql" \
> params pgdata="/var/lib/pgsql/9.0/data" \
> op monitor interval="30s" timeout="30s" \
> op start interval="0" timeout="120s" start_delay="1s" \
> op stop interval="0" timeout="120s"
> primitive ping_gateway ocf:pacemaker:ping \
> params host_list="192.168.1.1" multiplier="1000" \
> op monitor interval="10s" timeout="60s" \
> op start interval="0" timeout="60s" \
> op stop interval="0" timeout="20s"
> ms ms-drbd0 drbd0 \
> meta master-max="1" master-node-max="1" notify="true" \
> clone-node-max="1" clone-max="2"
> clone connectivity_check ping_gateway \
> meta globally-unique="false"
> location master-connected-node ms-drbd0 \
> rule $id="master-connected-node-rule" $role="master" -inf: not_defined \
> pingd or pingd lte 0
> location primary_location ip 50: drbd01.test
> colocation fs0-with-drbd0 inf: fs0 ms-drbd0:Master
> colocation ip-with-pgsql0 inf: ip pgsql0
> colocation pgsql0-with-fs0 inf: pgsql0 fs0
> order fs0-after-drbd0 inf: ms-drbd0:promote fs0:start
> order ip-after-pgsql0 inf: pgsql0 ip
> order pgsql0-after-fs0 inf: fs0:start pgsql0
> property $id="cib-bootstrap-options" \
> cluster-infrastructure="openais" \
> expected-quorum-votes="2" \
> stonith-enabled="false" \
> no-quorum-policy="ignore" \
> dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3"
> rsc_defaults $id="rsc-options" \
> resource-stickiness="100"
> 
> 
> Cheers,
> Dominic.



[Pacemaker] LSB smb

2011-03-04 Thread Juergen Hartmann
Hello everyone,
I am trying to create a group. The first resources in the group start fine, but
only the smb resource does not start and I have no clue why. If someone could
point me to the problem it would be very helpful ...

primitive SG.appvg ocf:heartbeat:LVM params volgrpname="appvg" exclusive="yes" \
op monitor depth="0" timeout="30" interval="10"
primitive SG.fs-home ocf:heartbeat:Filesystem params \
device="/dev/appvg/homelvol" directory="/home/users" fstype="ext3"
primitive SG.fs-opsp ocf:heartbeat:Filesystem params \
device="/dev/appvg/opslvol" directory="/opspapp" fstype="ext3"
primitive SG.ip ocf:heartbeat:IPaddr2 params nic="eth0" ip="144.145.1.165" \
cidr_netmask="21" op monitor interval="30s"
primitive SG.mailto ocf:heartbeat:MailTo params email=x...@bla.com \
subject=SG_switched op monitor interval=120s timeout=60s
primitive SG.ClusterMonitor ocf:pacemaker:ClusterMon params \
htmlfile="/opspapp/cluster-monitor.html" params \
pidfile="/var/run/rlb-cluster-monitor.pid" op start interval="0" timeout="90s" \
op stop interval="0" timeout="100s" meta target-role="Started"
primitive SG.samba lsb:smb

group SG SG.appvg SG.fs-home SG.fs-opsp SG.ip SG.mailto SG.ClusterMonitor \
SG.samba

location SG.loc SG inf: server1

ERROR:
SG.samba_monitor_0 (node=server1, call=112, rc=6, status=complete): not configured
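
In case it helps, I can run the usual LSB compliance checks against the init
script by hand (a sketch, assuming the script lives at /etc/init.d/smb on
server1):

/etc/init.d/smb start  ; echo $?   # should print 0
/etc/init.d/smb status ; echo $?   # should print 0 while running
/etc/init.d/smb stop   ; echo $?   # should print 0
/etc/init.d/smb status ; echo $?   # should print 3 when stopped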

Thanks in advance !

Juergen




Re: [Pacemaker] stonith in pacemaker clarification

2011-03-04 Thread Pentarh Udi
This is the log. I called "crm node fence node4"; this restarted node4, then
shut down node3, and then the cluster lost quorum and all information about
running resources (crm status printed no resources configured, although there
are 6 groups configured).

-

Mar 04 12:36:40 node1 crmd: [2717]: info: abort_transition_graph:
te_update_diff:146 - Triggered transition abort (complete=1,
tag=transient_attributes, id=node4, magic=NA, cib=0.633.101) : Transient
attribute: update
Mar 04 12:36:40 node1 crmd: [2717]: info: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
origin=abort_transition_graph ]
Mar 04 12:36:40 node1 crmd: [2717]: info: do_state_transition: All 4 cluster
nodes are eligible to run resources.
Mar 04 12:36:40 node1 crmd: [2717]: info: do_pe_invoke: Query 1938:
Requesting the current CIB: S_POLICY_ENGINE
Mar 04 12:36:40 node1 crmd: [2717]: info: do_pe_invoke_callback: Invoking
the PE: query=1938, ref=pe_calc-dc-1299260200-1188, seq=400, quorate=1
Mar 04 12:36:40 node1 pengine: [2716]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Mar 04 12:36:40 node1 pengine: [2716]: info: unpack_config: Node scores:
'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Mar 04 12:36:40 node1 pengine: [2716]: WARN: pe_fence_node: Node node4 will
be fenced because termination was requested
Mar 04 12:36:40 node1 pengine: [2716]: WARN: determine_online_status: Node
node4 is unclean
Mar 04 12:36:40 node1 pengine: [2716]: info: determine_online_status: Node
node2 is online

Mar 04 12:36:40 node1 pengine: [2716]: WARN: stage6: Scheduling Node node4
for STONITH
Mar 04 12:36:40 node1 pengine: [2716]: info: native_stop_constraints:
st-node3_stop_0 is implicit after node4 is fenced

Mar 04 12:36:40 node1 pengine: [2716]: notice: LogActions: Leave resource
st-node4 (Started node3)
Mar 04 12:36:40 node1 pengine: [2716]: notice: LogActions: Move resource
st-node3  (Started node4 -> node2)
Mar 04 12:36:40 node1 crmd: [2717]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Mar 04 12:36:40 node1 crmd: [2717]: info: unpack_graph: Unpacked transition
471: 6 actions in 6 synapses
Mar 04 12:36:40 node1 crmd: [2717]: info: do_te_invoke: Processing graph 471
(ref=pe_calc-dc-1299260200-1188) derived from /var/lib/pengine/pe-warn-7.bz2
Mar 04 12:36:40 node1 crmd: [2717]: info: te_pseudo_action: Pseudo action 87
fired and confirmed
Mar 04 12:36:40 node1 crmd: [2717]: info: te_rsc_command: Initiating action
88: start st-node3_start_0 on node2
Mar 04 12:36:40 node1 pengine: [2716]: WARN: process_pe_message: Transition
471: WARNINGs found during PE processing. PEngine Input stored in:
/var/lib/pengine/pe-warn-7.bz2
Mar 04 12:36:40 node1 pengine: [2716]: info: process_pe_message:
Configuration WARNINGs found during PE processing.  Please run "crm_verify
-L" to identify issues.
Mar 04 12:36:40 node1 crmd: [2717]: info: match_graph_event: Action
st-node3_start_0 (88) confirmed on node2 (rc=0)
Mar 04 12:36:40 node1 crmd: [2717]: info: te_pseudo_action: Pseudo action 89
fired and confirmed
Mar 04 12:36:40 node1 crmd: [2717]: info: te_fence_node: Executing reboot
fencing operation (91) on node4 (timeout=6)
Mar 04 12:36:40 node1 stonithd: [2712]: info: client tengine [pid: 2717]
requests a STONITHoperation RESET on node node4
...
Mar 04 12:36:40 node1 stonithd: [2712]: info: we can't manage node4,
broadcast request to other nodes
Mar 04 12:36:40 node1 stonithd: [2712]: info: Broadcasting the message
succeeded: require others to stonith node node4.
Mar 04 12:36:43 node1 cib: [2713]: info: ais_dispatch: Membership 404:
quorum retained
Mar 04 12:36:43 node1 cib: [2713]: info: crm_update_peer: Node node4:
id=67152064 state=lost (new) addr=r(0) ip(192.168.0.4)  votes=1 born=388
seen=400 proc=00013312
Mar 04 12:36:43 node1 crmd: [2717]: info: ais_dispatch: Membership 404:
quorum retained
Mar 04 12:36:43 node1 crmd: [2717]: info: ais_status_callback: status: node4
is now lost (was member)
Mar 04 12:36:43 node1 crmd: [2717]: info: crm_update_peer: Node node4:
id=67152064 state=lost (new) addr=r(0) ip(192.168.0.4)  votes=1 born=388
seen=400 proc=00013312
Mar 04 12:36:43 node1 crmd: [2717]: info: erase_node_from_join: Removed node
node4 from join calculations: welcomed=0 itegrated=0 finalized=0 confirmed=1
Mar 04 12:36:43 node1 cib: [2713]: info: cib_process_request: Operation
complete: op cib_modify for section nodes (origin=local/crmd/1939,
version=0.633.102): ok (rc=0)
Mar 04 12:36:43 node1 crmd: [2717]: info: crm_ais_dispatch: Setting expected
votes to 4
Mar 04 12:36:43 node1 cib: [2713]: info: cib_process_request: Operation
complete: op cib_modify for section crm_config (origin=local/crmd/1942,
version=0.633.103): ok (rc=0)
Mar 04 12:36:46 node1 stonithd: [2712]: info: Succeeded to STONITH the node
node4: optype=RESET. whodoit