[ClusterLabs] Change permissions from command line

2016-10-26 Thread Auer, Jens
Hi,

Is it possible to change user permissions from the command line? I am currently 
changing them via the web interface, but I have to write a manual, and just 
pasting commands is easier than showing screenshots. I know pcs can edit ACL 
settings, but I don't know how to change these permissions.

I found the permissions in /var/lib/pcsd/pcs_settings.conf. Is it ok to edit 
this file when the cluster is stopped and then just restart the cluster?
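For reference, the relevant block in that file on my system looks roughly like the 
following; it is a plain JSON file, but the exact field names may differ between pcsd 
versions, so treat this only as an illustration:

# illustrative only - the exact layout depends on the pcsd version
cat /var/lib/pcsd/pcs_settings.conf
{
  "format_version": 2,
  "data_version": 5,
  "clusters": [
    { "name": "MDA1PFP", "nodes": ["MDA1PFP-PCS01", "MDA1PFP-PCS02"] }
  ],
  "permissions": {
    "local_cluster": [
      { "type": "group", "name": "haclient", "allow": ["grant", "read", "write"] }
    ]
  }
}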

Best wishes,
  Jens

--
Jens Auer | CGI | Software-Engineer
CGI (Germany) GmbH & Co. KG
Rheinstraße 95 | 64295 Darmstadt | Germany
T: +49 6151 36860 154
jens.a...@cgi.com
Unsere Pflichtangaben gemäß § 35a GmbHG / §§ 161, 125a HGB finden Sie unter 
de.cgi.com/pflichtangaben.

CONFIDENTIALITY NOTICE: Proprietary/Confidential information belonging to CGI 
Group Inc. and its affiliates may be contained in this message. If you are not 
a recipient indicated or intended in this message (or responsible for delivery 
of this message to such person), or you think for any reason that this message 
may have been addressed to you in error, you may not use or copy or deliver 
this message to anyone else. In such case, you should destroy this message and 
are asked to notify the sender by reply e-mail.
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] kind=Optional order constraint not working at startup

2016-09-23 Thread Auer, Jens
Hi,

> But if A can tolerate outage of B, why does it matter whether A is started 
> before or
> after B? By the same logic it should be able to reconnect once B is up? At 
> least that
> is what I'd expect.
In our case, B is the filesystem resource that stores the configuration file 
for resource A. 
Resource A is a cloned resource that is started on both servers in our cluster. 
On the active
node, A should read the config file from the shared filesystem; on the passive 
node it
reads a default file. After that, the config file is not read again, so 
the shared filesystem can
go down and come back up without disturbing resource A.

After moving the filesystem to the passive node during a failover, the process 
updates itself by re-reading the
configuration from the ini file that is now available there. This requires the shared 
filesystem to be started on that node,
but I don't want to restart the process, for internal reasons.

I could start the processes before the shared filesystem is started and then 
always re-sync. However, this
would confuse the users because they don't expect it to happen.

In the end we will probably not go with cloned resources and will just start them 
cleanly after the shared filesystem
has been started on a node. This is much simpler and solves the ordering problem 
here. It should also be possible
to put everything into a group, since the resources are co-located anyway.

Cheers,
  Jens
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] kind=Optional order constraint not working at startup

2016-09-22 Thread Auer, Jens
Hi,

> >> shared_fs has to wait for the DRBD promotion, but the other resources
> >> have no such limitation, so they are free to start before shared_fs.
> > Isn't there an implicit limitation by the ordering constraint? I have
> > drbd_promote < shared_fs < snmpAgent-clone, and I would expect this to be a
> transitive relationship.
> 
> Yes, but shared fs < snmpAgent-Clone is optional, so snmpAgent-Clone is free 
> to
> start without it.
I was probably confused by the description in the manual, which says:
"Optional - Only applies if both resources are starting and/or stopping" 
(from the Red Hat HA documentation). I assumed this means that, for example,
when all resources are started together at cluster start-up, the constraint 
holds. 

> > What is the meaning of "transition"? Is there any way I can force resource 
> > actions
> into transitions?
> 
> A transition is simply the cluster's response to the current cluster state, 
> as directed
> by the configuration. The easiest way to think of it is as the "steps" as 
> described
> above.
> 
> If the configuration says a service should be running, but the service is not 
> currently
> running, then the cluster will schedule a start action (if possible 
> considering
> constraints, etc.). All such actions that may be scheduled together at one 
> time is a
> "transition".
> 
> You can't really control transitions; you can only control the configuration, 
> and
> transitions result from configuration+state.
> 
> The only way to force actions to take place in a certain order is to use 
> mandatory
> constraints.
> 
> The problem here is that you want the constraint to be mandatory only at 
> "start-
> up". But there really is no such thing. Consider the case where the cluster 
> stays up,
> and for whatever maintenance purpose, you stop all the resources, then start 
> them
> again later. Is that the same as start-up or not? What if you restart all but 
> one
> resource?
I think start-up is just a special case of what is really a dependency for 
starting a resource. 
My current understanding is that a mandatory constraint means "if you 
start/stop resource A, then you 
have to start/stop resource B". An optional constraint says that the 
ordering only holds when
you start/stop the two resources together in a single transition. What I want to 
express is more like
a dependency: "don't start resource A before resource B has been started at all; 
later state changes of resource B 
should not impact resource A". I realize this is kind of odd, but if A can 
tolerate outages of its dependency B,
e.g. by reconnecting, it makes sense. In principle this is what an optional 
constraint does, just not restricted
to a single transition.
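To make the difference concrete, in pcs syntax the two flavours of the same ordering 
look like this (kind=Mandatory is the default and is only spelled out here for clarity):

# always enforced: snmpAgent-clone may only start after shared_fs is active
pcs constraint order start shared_fs then snmpAgent-clone kind=Mandatory
# only honoured when both resources start/stop in the same transition
pcs constraint order start shared_fs then snmpAgent-clone kind=Optional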

> I can imagine one possible (but convoluted) way to do something like this, 
> using
> node attributes and rules:
> 
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-
> single/Pacemaker_Explained/index.html#idm140521751827232
> 
> With a rule, you can specify a location constraint that applies, not to a 
> particular
> node, but to any node with a particular value of a particular node attribute.
> 
> You would need a custom resource agent that sets a node attribute. Let's say 
> it
> takes three parameters, the node attribute name, the value to set when 
> starting (or
> do nothing), and the value to set when stopping (or do nothing). (That might
> actually be a good idea for a new ocf:pacemaker:
> agent.)
> 
> You'd have an instance of this resource grouped with shared-fs, that would 
> set the
> attribute to some magic value when started (say, "1").
> You'd have another instance grouped with snmpAgent-clone that would set it
> differently when stopped ("0"). Then, you'd have a location constraint for
> snmpAgent-clone with a rule that says it is only allowed on nodes with the 
> attribute
> set to "1".
> 
> With that, snmpAgent-clone would be unable to start until shared-fs had 
> started at
> least once. shared-fs could stop without affecting snmpAgent-clone. If 
> snmpAgent-
> clone stopped, it would reset, so it would require shared-fs again.
> 
> I haven't thought through all possible scenarios, but I think it would give 
> the
> behavior you want.
That sounds interesting... I think we will explore a solution that can accept 
restarting our resources.
We only used the cloned resource set because we want our processes up and 
running to
minimize the outage during a failover. Currently, the second server is a 
passive backup
that has everything up and running, ready to take over. After the filesystem 
switches, it resyncs 
and is then ready to go. We can probably accept the additional delay of 
starting the resources
from scratch, but we have to explore this.
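For anyone reading this later, a rough pcs sketch of Ken's attribute-and-rule idea; the 
ocf:local:attribute agent and its parameters below are hypothetical and would have to be 
written as a custom agent, as described above:

# hypothetical custom agent that sets a node attribute on start (and leaves it on stop)
pcs resource create fs-started-flag ocf:local:attribute \
    name=shared_fs_started start_value=1 stop_value=keep
# keep the flag with the shared filesystem so the attribute is set where/when it starts
pcs resource group add shared_fs_group shared_fs fs-started-flag
# only allow the clone on nodes where the filesystem has been started at least once
pcs constraint location snmpAgent-clone rule score=-INFINITY \
    not_defined shared_fs_started or shared_fs_started ne 1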

Thanks,

  Jens


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

Re: [ClusterLabs] kind=Optional order constraint not working at startup

2016-09-21 Thread Auer, Jens


From: Ken Gaillot [kgail...@redhat.com]
Sent: Wednesday, 21 September 2016 16:30
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] kind=Optional order constraint not working at startup

On 09/21/2016 09:00 AM, Auer, Jens wrote:
> Hi,
>
> could this be issue 5039 (http://bugs.clusterlabs.org/show_bug.cgi?id=5039)? 
> It sounds similar.

Correct -- "Optional" means honor the constraint only if both resources
are starting *in the same transition*.

shared_fs has to wait for the DRBD promotion, but the other resources
have no such limitation, so they are free to start before shared_fs.

The problem is "... only impacts the startup procedure". Pacemaker
doesn't distinguish start-up from any other state of the cluster. Nodes
(and entire partitions of nodes) can come and go at any time, and any or
all resources can be stopped and started again at any time, so
"start-up" is not really as meaningful as it sounds.

Maybe try an optional constraint of the other resources on the DRBD
promotion. That would make it more likely that all the resources end up
starting in the same transition.
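In pcs terms that suggestion might look roughly like this (untested sketch, using the
resource names from the original post):

pcs constraint order promote drbd1_sync then start snmpAgent-clone kind=Optional
pcs constraint order promote drbd1_sync then start supervisor-clone kind=Optional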

> Cheers,
>   Jens
>
> --
> Jens Auer | CGI | Software-Engineer
> CGI (Germany) GmbH & Co. KG
> Rheinstraße 95 | 64295 Darmstadt | Germany
> T: +49 6151 36860 154
> jens.a...@cgi.com
> Unsere Pflichtangaben gemäß § 35a GmbHG / §§ 161, 125a HGB finden Sie unter 
> de.cgi.com/pflichtangaben.
>
>
> 
> From: Auer, Jens [jens.a...@cgi.com]
> Sent: Wednesday, 21 September 2016 15:10
> To: users@clusterlabs.org
> Subject: [ClusterLabs] kind=Optional order constraint not working at startup
>
> Hi,
>
> in my cluster setup I have a couple of resources, some of which I need to 
> start in a specific order. Basically, I have two cloned resources that should 
> start after mounting a DRBD filesystem on all nodes, plus one resource that 
> starts after the clone sets. It is important that this only affects the 
> startup procedure. Once the system is running, stopping or starting one of the 
> clone resources should not impact the other resources' state. From reading 
> the manual, this should be what an order constraint with kind=Optional 
> implements. However, when I start the cluster, the filesystem is started after 
> the other resources, ignoring the ordering constraint.
>
> My cluster configuration:
> pcs cluster setup --name MDA1PFP MDA1PFP-PCS01,MDA1PFP-S01 
> MDA1PFP-PCS02,MDA1PFP-S02
> pcs cluster start --all
> sleep 5
> crm_attribute --type nodes --node MDA1PFP-PCS01 --name ServerRole --update 
> PRIME
> crm_attribute --type nodes --node MDA1PFP-PCS02 --name ServerRole --update 
> BACKUP
> pcs property set stonith-enabled=false
> pcs resource defaults resource-stickiness=100
>
> rm -f mda; pcs cluster cib mda
> pcs -f mda property set no-quorum-policy=ignore
>
> pcs -f mda resource create mda-ip ocf:heartbeat:IPaddr2 ip=192.168.120.20 
> cidr_netmask=24 nic=bond0 op monitor interval=1s
> pcs -f mda constraint location mda-ip prefers MDA1PFP-PCS01=50
> pcs -f mda resource create ping ocf:pacemaker:ping dampen=5s multiplier=1000 
> host_list=pf-pep-dev-1  params timeout=1 attempts=3  op monitor interval=1 
> --clone
> pcs -f mda constraint location mda-ip rule score=-INFINITY pingd lt 1 or 
> not_defined pingd
>
> pcs -f mda resource create ACTIVE ocf:heartbeat:dummy
> pcs -f mda constraint colocation add ACTIVE with mda-ip score=INFINITY
>
> pcs -f mda resource create drbd1 ocf:linbit:drbd drbd_resource=shared_fs op 
> monitor interval=60s
> pcs -f mda resource master drbd1_sync drbd1 master-max=1 master-node-max=1 
> clone-max=2 clone-node-max=1 notify=true
> pcs -f mda constraint colocation add master drbd1_sync with mda-ip 
> score=INFINITY
>
> pcs -f mda resource create shared_fs Filesystem device="/dev/drbd1" 
> directory=/shared_fs fstype="xfs"
> pcs -f mda constraint order promote drbd1_sync then start shared_fs
> pcs -f mda constraint colocation add shared_fs with master drbd1_sync 
> score=INFINITY
>
> pcs -f mda resource create supervisor ocf:pfpep:supervisor params 
> config="/shared_fs/pfpep.ini" --clone
> pcs -f mda resource create snmpAgent ocf:pfpep:snmpAgent params 
> config="/shared_fs/pfpep.ini" --clone
> pcs -f mda resource create clusterSwitchNotification ocf:pfpep:clusterSwitch 
> params config="/shared_fs/pfpep.ini"
>
> pcs -f mda constraint order start shared_fs then snmpAgent-clo

Re: [ClusterLabs] kind=Optional order constraint not working at startup

2016-09-21 Thread Auer, Jens
Hi,

could this be issue 5039 (http://bugs.clusterlabs.org/show_bug.cgi?id=5039)? It 
sounds similar.

Cheers,
  Jens

--
Jens Auer | CGI | Software-Engineer
CGI (Germany) GmbH & Co. KG
Rheinstraße 95 | 64295 Darmstadt | Germany
T: +49 6151 36860 154
jens.a...@cgi.com
Unsere Pflichtangaben gemäß § 35a GmbHG / §§ 161, 125a HGB finden Sie unter 
de.cgi.com/pflichtangaben.



From: Auer, Jens [jens.a...@cgi.com]
Sent: Wednesday, 21 September 2016 15:10
To: users@clusterlabs.org
Subject: [ClusterLabs] kind=Optional order constraint not working at startup

Hi,

in my cluster setup I have a couple of resources, some of which I need to start 
in a specific order. Basically, I have two cloned resources that should start 
after mounting a DRBD filesystem on all nodes, plus one resource that starts 
after the clone sets. It is important that this only affects the startup 
procedure. Once the system is running, stopping or starting one of the clone 
resources should not impact the other resources' state. From reading the 
manual, this should be what an order constraint with kind=Optional implements. 
However, when I start the cluster, the filesystem is started after the other 
resources, ignoring the ordering constraint.

My cluster configuration:
pcs cluster setup --name MDA1PFP MDA1PFP-PCS01,MDA1PFP-S01 
MDA1PFP-PCS02,MDA1PFP-S02
pcs cluster start --all
sleep 5
crm_attribute --type nodes --node MDA1PFP-PCS01 --name ServerRole --update PRIME
crm_attribute --type nodes --node MDA1PFP-PCS02 --name ServerRole --update 
BACKUP
pcs property set stonith-enabled=false
pcs resource defaults resource-stickiness=100

rm -f mda; pcs cluster cib mda
pcs -f mda property set no-quorum-policy=ignore

pcs -f mda resource create mda-ip ocf:heartbeat:IPaddr2 ip=192.168.120.20 
cidr_netmask=24 nic=bond0 op monitor interval=1s
pcs -f mda constraint location mda-ip prefers MDA1PFP-PCS01=50
pcs -f mda resource create ping ocf:pacemaker:ping dampen=5s multiplier=1000 
host_list=pf-pep-dev-1  params timeout=1 attempts=3  op monitor interval=1 
--clone
pcs -f mda constraint location mda-ip rule score=-INFINITY pingd lt 1 or 
not_defined pingd

pcs -f mda resource create ACTIVE ocf:heartbeat:dummy
pcs -f mda constraint colocation add ACTIVE with mda-ip score=INFINITY

pcs -f mda resource create drbd1 ocf:linbit:drbd drbd_resource=shared_fs op 
monitor interval=60s
pcs -f mda resource master drbd1_sync drbd1 master-max=1 master-node-max=1 
clone-max=2 clone-node-max=1 notify=true
pcs -f mda constraint colocation add master drbd1_sync with mda-ip 
score=INFINITY

pcs -f mda resource create shared_fs Filesystem device="/dev/drbd1" 
directory=/shared_fs fstype="xfs"
pcs -f mda constraint order promote drbd1_sync then start shared_fs
pcs -f mda constraint colocation add shared_fs with master drbd1_sync 
score=INFINITY

pcs -f mda resource create supervisor ocf:pfpep:supervisor params 
config="/shared_fs/pfpep.ini" --clone
pcs -f mda resource create snmpAgent ocf:pfpep:snmpAgent params 
config="/shared_fs/pfpep.ini" --clone
pcs -f mda resource create clusterSwitchNotification ocf:pfpep:clusterSwitch 
params config="/shared_fs/pfpep.ini"

pcs -f mda constraint order start shared_fs then snmpAgent-clone  kind=Optional
pcs -f mda constraint order start shared_fs then supervisor-clone kind=Optional
pcs -f mda constraint order start snmpAgent-clone then supervisor-clone 
kind=Optional
pcs -f mda constraint order start supervisor-clone then 
clusterSwitchNotification kind=Optional
pcs -f mda constraint colocation add clusterSwitchNotification with shared_fs 
score=INFINITY

pcs cluster cib-push mda

The order of resource startup in the log file is:
Sep 21 13:01:21 MDA1PFP-S01 crmd[2760]:  notice: Operation snmpAgent_start_0: 
ok (node=MDA1PFP-PCS01, call=40, rc=0, cib-update=82, confirmed=true)
Sep 21 13:01:21 MDA1PFP-S01 crmd[2760]:  notice: Operation drbd1_start_0: ok 
(node=MDA1PFP-PCS01, call=39, rc=0, cib-update=83, confirmed=true)
Sep 21 13:01:23 MDA1PFP-S01 crmd[2760]:  notice: Operation ping_start_0: ok 
(node=MDA1PFP-PCS01, call=38, rc=0, cib-update=85, confirmed=true)
Sep 21 13:01:23 MDA1PFP-S01 crmd[2760]:  notice: Operation supervisor_start_0: 
ok (node=MDA1PFP-PCS01, call=45, rc=0, cib-update=88, confirmed=true)
Sep 21 13:01:28 MDA1PFP-S01 crmd[2760]:  notice: Operation ACTIVE_start_0: ok 
(node=MDA1PFP-PCS01, call=48, rc=0, cib-update=94, confirmed=true)
Sep 21 13:01:28 MDA1PFP-S01 crmd[2760]:  notice: Operation mda-ip_start_0: ok 
(node=MDA1PFP-PCS01, call=47, rc=0, cib-update=96, confirmed=true)
Sep 21 13:01:28 MDA1PFP-S01 crmd[2760]:  notice: Operation 
clusterSwitchNotification_start_0: ok (node=MDA1PFP-PCS01, call=50, rc=0, 
cib-update=98, confirmed=true)
Sep 21 13:01:28 MDA1PFP-S01 crmd[2760]:  notice: Operation shared_fs_start_0: 
ok (node=MDA1PFP-PCS01, call=57, rc=0, cib-update=101, confirmed=true)

Why is the shared file system started after the other resources?

[ClusterLabs] kind=Optional order constraint not working at startup

2016-09-21 Thread Auer, Jens
Hi,

in my cluster setup I have a couple of resources, some of which I need to start 
in a specific order. Basically, I have two cloned resources that should start 
after mounting a DRBD filesystem on all nodes, plus one resource that starts 
after the clone sets. It is important that this only affects the startup 
procedure. Once the system is running, stopping or starting one of the clone 
resources should not impact the other resources' state. From reading the 
manual, this should be what an order constraint with kind=Optional implements. 
However, when I start the cluster, the filesystem is started after the other 
resources, ignoring the ordering constraint.

My cluster configuration:
pcs cluster setup --name MDA1PFP MDA1PFP-PCS01,MDA1PFP-S01 
MDA1PFP-PCS02,MDA1PFP-S02
pcs cluster start --all
sleep 5
crm_attribute --type nodes --node MDA1PFP-PCS01 --name ServerRole --update PRIME
crm_attribute --type nodes --node MDA1PFP-PCS02 --name ServerRole --update 
BACKUP
pcs property set stonith-enabled=false
pcs resource defaults resource-stickiness=100

rm -f mda; pcs cluster cib mda
pcs -f mda property set no-quorum-policy=ignore

pcs -f mda resource create mda-ip ocf:heartbeat:IPaddr2 ip=192.168.120.20 
cidr_netmask=24 nic=bond0 op monitor interval=1s
pcs -f mda constraint location mda-ip prefers MDA1PFP-PCS01=50
pcs -f mda resource create ping ocf:pacemaker:ping dampen=5s multiplier=1000 
host_list=pf-pep-dev-1  params timeout=1 attempts=3  op monitor interval=1 
--clone
pcs -f mda constraint location mda-ip rule score=-INFINITY pingd lt 1 or 
not_defined pingd

pcs -f mda resource create ACTIVE ocf:heartbeat:dummy
pcs -f mda constraint colocation add ACTIVE with mda-ip score=INFINITY

pcs -f mda resource create drbd1 ocf:linbit:drbd drbd_resource=shared_fs op 
monitor interval=60s
pcs -f mda resource master drbd1_sync drbd1 master-max=1 master-node-max=1 
clone-max=2 clone-node-max=1 notify=true
pcs -f mda constraint colocation add master drbd1_sync with mda-ip 
score=INFINITY

pcs -f mda resource create shared_fs Filesystem device="/dev/drbd1" 
directory=/shared_fs fstype="xfs"
pcs -f mda constraint order promote drbd1_sync then start shared_fs
pcs -f mda constraint colocation add shared_fs with master drbd1_sync 
score=INFINITY 

pcs -f mda resource create supervisor ocf:pfpep:supervisor params 
config="/shared_fs/pfpep.ini" --clone 
pcs -f mda resource create snmpAgent ocf:pfpep:snmpAgent params 
config="/shared_fs/pfpep.ini" --clone
pcs -f mda resource create clusterSwitchNotification ocf:pfpep:clusterSwitch 
params config="/shared_fs/pfpep.ini"

pcs -f mda constraint order start shared_fs then snmpAgent-clone  kind=Optional
pcs -f mda constraint order start shared_fs then supervisor-clone kind=Optional
pcs -f mda constraint order start snmpAgent-clone then supervisor-clone 
kind=Optional
pcs -f mda constraint order start supervisor-clone then 
clusterSwitchNotification kind=Optional
pcs -f mda constraint colocation add clusterSwitchNotification with shared_fs 
score=INFINITY

pcs cluster cib-push mda

The order of resource startup in the log file is:
Sep 21 13:01:21 MDA1PFP-S01 crmd[2760]:  notice: Operation snmpAgent_start_0: 
ok (node=MDA1PFP-PCS01, call=40, rc=0, cib-update=82, confirmed=true)
Sep 21 13:01:21 MDA1PFP-S01 crmd[2760]:  notice: Operation drbd1_start_0: ok 
(node=MDA1PFP-PCS01, call=39, rc=0, cib-update=83, confirmed=true)
Sep 21 13:01:23 MDA1PFP-S01 crmd[2760]:  notice: Operation ping_start_0: ok 
(node=MDA1PFP-PCS01, call=38, rc=0, cib-update=85, confirmed=true)
Sep 21 13:01:23 MDA1PFP-S01 crmd[2760]:  notice: Operation supervisor_start_0: 
ok (node=MDA1PFP-PCS01, call=45, rc=0, cib-update=88, confirmed=true)
Sep 21 13:01:28 MDA1PFP-S01 crmd[2760]:  notice: Operation ACTIVE_start_0: ok 
(node=MDA1PFP-PCS01, call=48, rc=0, cib-update=94, confirmed=true)
Sep 21 13:01:28 MDA1PFP-S01 crmd[2760]:  notice: Operation mda-ip_start_0: ok 
(node=MDA1PFP-PCS01, call=47, rc=0, cib-update=96, confirmed=true)
Sep 21 13:01:28 MDA1PFP-S01 crmd[2760]:  notice: Operation 
clusterSwitchNotification_start_0: ok (node=MDA1PFP-PCS01, call=50, rc=0, 
cib-update=98, confirmed=true)
Sep 21 13:01:28 MDA1PFP-S01 crmd[2760]:  notice: Operation shared_fs_start_0: 
ok (node=MDA1PFP-PCS01, call=57, rc=0, cib-update=101, confirmed=true)

Why is the shared file system started after the other resources?

Best wishes,
  Jens

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Virtual ip resource restarted on node with down network device

2016-09-20 Thread Auer, Jens
Hi,

>> I've decided to create two answers for the two problems. The cluster
>> still fails to relocate the resource after unloading the modules even
>> with resource-agents 3.9.7
> From the point of view of the resource agent,
> you configured it to use a non-existing network.
> Which it considers to be a configuration error,
> which is treated by pacemaker as
> "don't try to restart anywhere
> but let someone else configure it properly, first".
> Still, I have yet to see what scenario you are trying to test here.
> To me, this still looks like "scenario evil admin".  If so, I'd not even
> try, at least not on the pacemaker configuration level.
It's not an "evil admin" scenario, as that would not make sense. I am trying to find 
a way to force a failover condition, e.g. by simulating a network card defect or a 
network outage, without running to the server room every time. 
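(As an aside, one way to simulate such an outage without touching the interface 
configuration itself is to drop its traffic with iptables, so the address assignment 
stays intact and only connectivity fails; a sketch:

iptables -A INPUT  -i bond0 -j DROP
iptables -A OUTPUT -o bond0 -j DROP
# and to undo the simulated outage afterwards:
iptables -D INPUT  -i bond0 -j DROP
iptables -D OUTPUT -o bond0 -j DROP
)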

> CONFIDENTIALITY NOTICE:
> Oh please :-/
> This is a public mailing list.
Sorry, this is a standard disclaimer I usually remove. We are forced to add 
this to e-mails, but I think this is fairly common for commercial companies.

>> Also the netmask and the ip address are wrong. I have configured the
>> device to 192.168.120.10 with netmask 192.168.120.10. How does IpAddr2
>> get the wrong configuration? I have no idea.
>A netmask of "192.168.120.10" is nonsense.
>That is the address, not a mask.
Oops, my fault when writing the e-mail. Obviously this is the address. The 
configured netmask for the device is 255.255.255.0, but after IPaddr2 brings it 
up again it is 255.255.255.255, which is not what I configured in the network 
configuration. 

> Also, according to some posts back,
> you have configured it in pacemaker with
> cidr_netmask=32, which is not particularly useful either.
Thanks for pointing this out. I copied the parameters from the manual/tutorial, 
but did not think about the values.

> Again: the IPaddr2 resource agent is supposed to control the assignment
> of an IP address, hence the name.
> It is not supposed to create or destroy network interfaces,
> or configure bonding, or bridges, or anything like that.
> In fact, it is not even supposed to bring up or down the interfaces,
> even though for "convenience" it seems to do "ip link set up".
This is what made me wonder in the beginning. When I bring down the device, 
this leads to a failure of the resource agent, which is exactly what I expected. 
I did not expect it to bring the device up again, and definitely not while ignoring 
the default network configuration.

> Monitoring connectivity, or dealing with removed interface drivers,
> or unplugged devices, or whatnot, has to be dealt with elsewhere.
I am using a ping daemon for that. 

> What you did is: down the bond, remove all slave assignments, even
> remove the driver, and expect the resource agent to "heal" things that
> it does not know about. It can not.
I am not expecting the RA to heal anything. How could it? And why would I 
expect it? In fact, I am expecting the opposite, that is, a consistent failure 
when the device is down. This may also be wrong, because you can assign IP 
addresses to downed devices.

My initial expectation was that the resource cannot be started when the device 
is down and is then relocated. I think this is more or less the core functionality 
of the cluster. I can see a reason why it does not switch to another node when 
there is a configuration error in the cluster, because it is fair to assume that 
the configuration is identical (and equally wrong) on all nodes. But what happens 
if the network device is broken? Would the server start, fail to assign the IP 
address, and then prevent the whole cluster from working? What happens if the 
network card breaks while the cluster is running? 

Best wishes,
  Jens

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] No DRBD resource promoted to master in Active/Passive setup

2016-09-20 Thread Auer, Jens
om 
the membership list
Sep 20 12:08:04 MDA1PFP-S02 attrd[2351]:  notice: Purged 1 peers with id=1 
and/or uname=MDA1PFP-PCS01 from the membership cache
Sep 20 12:08:04 MDA1PFP-S02 stonith-ng[2349]:  notice: crm_update_peer_proc: 
Node MDA1PFP-PCS01[1] - state is now lost (was member)
Sep 20 12:08:04 MDA1PFP-S02 stonith-ng[2349]:  notice: Removing MDA1PFP-PCS01/1 
from the membership list
Sep 20 12:08:04 MDA1PFP-S02 stonith-ng[2349]:  notice: Purged 1 peers with id=1 
and/or uname=MDA1PFP-PCS01 from the membership cache
Sep 20 12:08:04 MDA1PFP-S02 cib[2348]:  notice: crm_update_peer_proc: Node 
MDA1PFP-PCS01[1] - state is now lost (was member)
Sep 20 12:08:04 MDA1PFP-S02 cib[2348]:  notice: Removing MDA1PFP-PCS01/1 from 
the membership list
Sep 20 12:08:04 MDA1PFP-S02 cib[2348]:  notice: Purged 1 peers with id=1 and/or 
uname=MDA1PFP-PCS01 from the membership cache
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]: warning: FSA: Input I_ELECTION_DC from 
do_election_check() received in state S_INTEGRATION
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Notifications disabled
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:   error: pcmkRegisterNode: Triggered 
assert at xml.c:594 : node->type == XML_ELEMENT_NODE
Sep 20 12:08:04 MDA1PFP-S02 pengine[2353]:  notice: On loss of CCM Quorum: 
Ignore
Sep 20 12:08:04 MDA1PFP-S02 pengine[2353]:  notice: Demote  drbd1:0 (Master 
-> Slave MDA1PFP-PCS02)
Sep 20 12:08:04 MDA1PFP-S02 pengine[2353]:  notice: Calculated Transition 0: 
/var/lib/pacemaker/pengine/pe-input-1813.bz2
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Initiating action 55: notify 
drbd1_pre_notify_demote_0 on MDA1PFP-PCS02 (local)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Operation drbd1_notify_0: ok 
(node=MDA1PFP-PCS02, call=39, rc=0, cib-update=0, confirmed=true)
Sep 20 12:08:04 MDA1PFP-S02 crmd[2354]:  notice: Initiating action 18: demote 
drbd1_demote_0 on MDA1PFP-PCS02 (local)

Best wishes,
  Jens

--
Jens Auer | CGI | Software-Engineer
CGI (Germany) GmbH & Co. KG
Rheinstraße 95 | 64295 Darmstadt | Germany
T: +49 6151 36860 154
jens.a...@cgi.com
Unsere Pflichtangaben gemäß § 35a GmbHG / §§ 161, 125a HGB finden Sie unter 
de.cgi.com/pflichtangaben.



From: Ken Gaillot [kgail...@redhat.com]
Sent: Monday, 19 September 2016 17:27
To: Auer, Jens; Cluster Labs - All topics related to open-source clustering 
welcomed
Subject: Re: AW: [ClusterLabs] No DRBD resource promoted to master in 
Active/Passive setup

On 09/19/2016 09:48 AM, Auer, Jens wrote:
> Hi,
>
>> Is the network interface being taken down here used for corosync
>> communication? If so, that is a node-level failure, and pacemaker will
>> fence.
>
> We have different connections on each server:
> - A bonded 10GB network card for data traffic that will be accessed via a 
> virtual ip managed by pacemaker in 192.168.120.1/24. In the cluster nodes 
> MDA1PFP-S01 and MDA1PFP-S02 are assigned to 192.168.120.10 and 192.168.120.11.
>
> - A dedicated back-to-back connection for corosync heartbeats in 
> 192.168.121.1/24. MDA1PFP-PCS01 and MDA1PFP-S02 are assigned to 
> 192.168.121.10 and 192.168.121.11. When the cluster is created, we use these 
> as primary node names and use the 10GB device as a second backup connection 
> for increased reliability: pcs cluster setup --name MDA1PFP 
> MDA1PFP-PCS01,MDA1PFP-S01 MDA1PFP-PCS02,MDA1PFP-S02
>
> - A dedicated back-to-back connection for drbd in 192.168.122.1/24. Hosts 
> MDA1PFP-DRBD01 and MDA1PFP-DRBD02 are assigned 192.168.23.10 and 
> 192.168.123.11.

Ah, nice.

> Given that I think it is not a node-level failure. pcs status also reports 
> the nodes as online. I think this should not trigger fencing from pacemaker.
>
>> When DRBD is configured with 'fencing resource-only' and 'fence-peer
>> "/usr/lib/drbd/crm-fence-peer.sh";', and DRBD detects a network outage,
>> it will try to add a constraint that prevents the other node from
>> becoming master. It removes the constraint when connectivity is restored.
>
>> I am not familiar with all the under-the-hood details, but IIUC, if
>> pacemaker actually fences the node, then the other node can still take
>> over the DRBD. But if there is a network outage and no pacemaker
>> fencing, then you'll see the behavior you describe -- DRBD prevents
>> m

Re: [ClusterLabs] Virtual ip resource restarted on node with down network device

2016-09-20 Thread Auer, Jens
Hi,

one thing to add is that everything works as expected when I physically unplug 
the network cables to force a failover. 

Best wishes,
  Jens

--
Jens Auer | CGI | Software-Engineer
CGI (Germany) GmbH & Co. KG
Rheinstraße 95 | 64295 Darmstadt | Germany
T: +49 6151 36860 154
jens.a...@cgi.com
Unsere Pflichtangaben gemäß § 35a GmbHG / §§ 161, 125a HGB finden Sie unter 
de.cgi.com/pflichtangaben.



From: Auer, Jens [jens.a...@cgi.com]
Sent: Tuesday, 20 September 2016 13:44
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] Virtual ip resource restarted on node with down 
network device

Hi,

I've decided to create two answers for the two problems. The cluster still 
fails to relocate the resource after unloading the modules even with 
resource-agents 3.9.7
MDA1PFP-S01 11:42:50 2533 0 ~ # yum list resource-agents
Loaded plugins: langpacks, product-id, search-disabled-repos, 
subscription-manager
Installed Packages
resource-agents.x86_64  
  3.9.7-4.el7   
 @/resource-agents-3.9.7-4.el7.x86_64

Sep 20 11:42:52 MDA1PFP-S01 crmd[13908]: warning: Action 9 (mda-ip_start_0) on 
MDA1PFP-PCS01 failed (target: 0 vs. rc: 6): Error
Sep 20 11:42:52 MDA1PFP-S01 crmd[13908]: warning: Action 9 (mda-ip_start_0) on 
MDA1PFP-PCS01 failed (target: 0 vs. rc: 6): Error
Sep 20 11:42:52 MDA1PFP-S01 crmd[13908]:  notice: Transition 5 (Complete=3, 
Pending=0, Fired=0, Skipped=0, Incomplete=1, 
Source=/var/lib/pacemaker/pengine/pe-input-552.bz2): Complete
Sep 20 11:42:52 MDA1PFP-S01 pengine[13907]:  notice: On loss of CCM Quorum: 
Ignore
Sep 20 11:42:52 MDA1PFP-S01 pengine[13907]: warning: Processing failed op start 
for mda-ip on MDA1PFP-PCS01: not configured (6)
Sep 20 11:42:52 MDA1PFP-S01 pengine[13907]:   error: Preventing mda-ip from 
re-starting anywhere: operation start failed 'not configured' (6)
Sep 20 11:42:52 MDA1PFP-S01 pengine[13907]: warning: Processing failed op start 
for mda-ip on MDA1PFP-PCS01: not configured (6)
Sep 20 11:42:52 MDA1PFP-S01 pengine[13907]:   error: Preventing mda-ip from 
re-starting anywhere: operation start failed 'not configured' (6)
Sep 20 11:42:52 MDA1PFP-S01 pengine[13907]:  notice: Stopmda-ip 
(MDA1PFP-PCS01)
Sep 20 11:42:52 MDA1PFP-S01 pengine[13907]:  notice: Calculated Transition 6: 
/var/lib/pacemaker/pengine/pe-input-553.bz2
Sep 20 11:42:52 MDA1PFP-S01 crmd[13908]:  notice: Initiating action 2: stop 
mda-ip_stop_0 on MDA1PFP-PCS01 (local)
Sep 20 11:42:52 MDA1PFP-S01 IPaddr2(mda-ip)[15336]: INFO: IP status = no, 
IP_CIP=
Sep 20 11:42:52 MDA1PFP-S01 lrmd[13905]:  notice: mda-ip_stop_0:15336:stderr [ 
Device "bond0" does not exist. ]
Sep 20 11:42:52 MDA1PFP-S01 crmd[13908]:  notice: Operation mda-ip_stop_0: ok 
(node=MDA1PFP-PCS01, call=18, rc=0, cib-update=48, confirmed=true)
Sep 20 11:42:53 MDA1PFP-S01 corosync[13887]: [TOTEM ] Retransmit List: 93
Sep 20 11:42:53 MDA1PFP-S01 corosync[13887]: [TOTEM ] Retransmit List: 93 96 98
Sep 20 11:42:53 MDA1PFP-S01 corosync[13887]: [TOTEM ] Retransmit List: 93 98 9a 
9c
Sep 20 11:42:53 MDA1PFP-S01 corosync[13887]: [TOTEM ] Marking ringid 1 
interface 192.168.120.10 FAULTY
Sep 20 11:42:53 MDA1PFP-S01 corosync[13887]: [TOTEM ] Retransmit List: 98 9c 9f 
a1
Sep 20 11:42:53 MDA1PFP-S01 crmd[13908]:  notice: Transition 6 (Complete=2, 
Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-553.bz2): Complete
Sep 20 11:42:53 MDA1PFP-S01 crmd[13908]:  notice: State transition 
S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL 
origin=notify_crmd ]
Sep 20 11:42:53 MDA1PFP-S01 crmd[13908]:  notice: State transition S_IDLE -> 
S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
origin=abort_transition_graph ]
Sep 20 11:42:53 MDA1PFP-S01 pengine[13907]:  notice: On loss of CCM Quorum: 
Ignore
Sep 20 11:42:53 MDA1PFP-S01 pengine[13907]: warning: Processing failed op start 
for mda-ip on MDA1PFP-PCS01: not configured (6)
Sep 20 11:42:53 MDA1PFP-S01 pengine[13907]:   error: Preventing mda-ip from 
re-starting anywhere: operation start failed 'not configured' (6)
Sep 20 11:42:53 MDA1PFP-S01 pengine[13907]: warning: Forcing mda-ip away from 
MDA1PFP-PCS01 after 100 failures (max=100)
Sep 20 11:42:53 MDA1P

Re: [ClusterLabs] Virtual ip resource restarted on node with down network device

2016-09-20 Thread Auer, Jens
192.168.120.20: icmp_seq=3 ttl=64 time=0.029 ms

MDA1PFP-S02 11:33:31 1273 0 ~ # ping 192.168.120.20
PING 192.168.120.20 (192.168.120.20) 56(84) bytes of data.
From 192.168.120.11 icmp_seq=10 Destination Host Unreachable
From 192.168.120.11 icmp_seq=11 Destination Host Unreachable
From 192.168.120.11 icmp_seq=12 Destination Host Unreachable

Best wishes,
  Jens


--
Jens Auer | CGI | Software-Engineer
CGI (Germany) GmbH & Co. KG
Rheinstraße 95 | 64295 Darmstadt | Germany
T: +49 6151 36860 154
jens.a...@cgi.com
Unsere Pflichtangaben gemäß § 35a GmbHG / §§ 161, 125a HGB finden Sie unter 
de.cgi.com/pflichtangaben.



From: Ken Gaillot [kgail...@redhat.com]
Sent: Monday, 19 September 2016 17:31
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] Virtual ip resource restarted on node with down 
network device

On 09/19/2016 10:04 AM, Jan Pokorný wrote:
> On 19/09/16 10:18 +, Auer, Jens wrote:
>> Ok, after reading the log files again I found
>>
>> Sep 19 10:03:45 MDA1PFP-S01 crmd[7797]:  notice: Initiating action 3: stop 
>> mda-ip_stop_0 on MDA1PFP-PCS01 (local)
>> Sep 19 10:03:45 MDA1PFP-S01 crmd[7797]:  notice: 
>> MDA1PFP-PCS01-mda-ip_monitor_1000:14 [ ocf-exit-reason:Unknown interface 
>> [bond0] No such device.\n ]
>> Sep 19 10:03:45 MDA1PFP-S01 IPaddr2(mda-ip)[8745]: ERROR: Unknown interface 
>> [bond0] No such device.
>> Sep 19 10:03:45 MDA1PFP-S01 IPaddr2(mda-ip)[8745]: WARNING: [findif] failed
>> Sep 19 10:03:45 MDA1PFP-S01 lrmd[7794]:  notice: mda-ip_stop_0:8745:stderr [ 
>> ocf-exit-reason:Unknown interface [bond0] No such device. ]
>> Sep 19 10:03:45 MDA1PFP-S01 crmd[7797]:  notice: Operation mda-ip_stop_0: ok 
>> (node=MDA1PFP-PCS01, call=16, rc=0, cib-update=49, confirmed=true)
>> Sep 19 10:03:46 MDA1PFP-S01 crmd[7797]:  notice: Transition 3 (Complete=2, 
>> Pending=0, Fired=0, Skipped=0, Incomplete=0, 
>> Source=/var/lib/pacemaker/pengine/pe-input-501.bz2): Complete
>> Sep 19 10:03:46 MDA1PFP-S01 crmd[7797]:  notice: State transition 
>> S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL 
>> origin=notify_crmd ]
>> Sep 19 10:03:46 MDA1PFP-S01 crmd[7797]:  notice: State transition S_IDLE -> 
>> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL 
>> origin=abort_transition_graph ]
>> Sep 19 10:03:46 MDA1PFP-S01 pengine[7796]:  notice: On loss of CCM Quorum: 
>> Ignore
>> Sep 19 10:03:46 MDA1PFP-S01 pengine[7796]: warning: Processing failed op 
>> monitor for mda-ip on MDA1PFP-PCS01: not configured (6)
>> Sep 19 10:03:46 MDA1PFP-S01 pengine[7796]:   error: Preventing mda-ip from 
>> re-starting anywhere: operation monitor failed 'not configured' (6)
>>
>> I think that explains why the resource is not started on the other
>> node, but I am not sure this is a good decision. It seems to be a
>> little harsh to prevent the resource from starting anywhere,
>> especially considering that the other node will be able to start the
>> resource.

The resource agent is supposed to return "not configured" only when the
*pacemaker* configuration of the resource is inherently invalid, so
there's no chance of it starting anywhere.

As Jan suggested, make sure you've applied any resource-agents updates.
If that doesn't fix it, it sounds like a bug in the agent, or something
really is wrong with your pacemaker resource config.

>
> The problem to start with is that based on
>
>> Sep 19 10:03:45 MDA1PFP-S01 IPaddr2(mda-ip)[8745]: ERROR: Unknown interface 
>> [bond0] No such device.
>> Sep 19 10:03:45 MDA1PFP-S01 IPaddr2(mda-ip)[8745]: WARNING: [findif] failed
>
> you may be using too ancient version resource-agents:
>
> https://github.com/ClusterLabs/resource-agents/pull/320
>
> so until you update, the troubleshooting would be quite moot.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org



Re: [ClusterLabs] Virtual ip resource restarted on node with down network device

2016-09-20 Thread Auer, Jens
nput-555.bz2
Sep 20 11:43:02 MDA1PFP-S01 crmd[13908]:  notice: Transition 8 (Complete=0, 
Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-555.bz2): Complete
Sep 20 11:43:02 MDA1PFP-S01 crmd[13908]:  notice: State transition 
S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL 
origin=notify_crmd ]

Cheers,
  Jens

--
Jens Auer | CGI | Software-Engineer
CGI (Germany) GmbH & Co. KG
Rheinstraße 95 | 64295 Darmstadt | Germany
T: +49 6151 36860 154
jens.a...@cgi.com
Unsere Pflichtangaben gemäß § 35a GmbHG / §§ 161, 125a HGB finden Sie unter 
de.cgi.com/pflichtangaben.


____
From: Auer, Jens [jens.a...@cgi.com]
Sent: Monday, 19 September 2016 16:36
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] Virtual ip resource restarted on node with down 
network device

Hi,

>> After the restart ifconfig still shows the device bond0 to be not RUNNING:
>> MDA1PFP-S01 09:07:54 2127 0 ~ # ifconfig
>> bond0: flags=5123<UP,BROADCAST,MASTER,MULTICAST>  mtu 1500
>> inet 192.168.120.20  netmask 255.255.255.255  broadcast 0.0.0.0
>> ether a6:17:2c:2a:72:fc  txqueuelen 3  (Ethernet)
>> RX packets 2034  bytes 286728 (280.0 KiB)
>> RX errors 0  dropped 29  overruns 0  frame 0
>> TX packets 2284  bytes 355975 (347.6 KiB)
>> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

There seems to be some difference because the device is not RUNNING;
mdaf-pf-pep-spare 14:17:53 999 0 ~ # ifconfig
bond0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 1500
inet 192.168.120.10  netmask 255.255.255.0  broadcast 192.168.120.255
inet6 fe80::5eb9:1ff:fe9c:e7fc  prefixlen 64  scopeid 0x20
ether 5c:b9:01:9c:e7:fc  txqueuelen 3  (Ethernet)
RX packets 15455692  bytes 22377220306 (20.8 GiB)
RX errors 0  dropped 2392  overruns 0  frame 0
TX packets 14706747  bytes 21361519159 (19.8 GiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Also the netmask and the ip address are wrong. I have configured the device to 
192.168.120.10 with netmask 192.168.120.10. How does IpAddr2 get the wrong 
configuration? I have no idea.

>Anyway, you should rather be using "ip" command from iproute suite
>than various if* tools that come short in some cases:
>http://inai.de/2008/02/19
>This would also be consistent with IPaddr2 uses under the hood.

We are using RedHat 7, which uses either NetworkManager or the network 
scripts. We use the latter, and ifup/ifdown should be the correct way to manage 
the network card. I also tried using ip link set dev bond0 up/down, and it brings 
up the device with the correct IP address and network mask.
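For completeness, the iproute2 way to inspect what is actually assigned would be 
something like:

# show addresses and routes currently attached to bond0
ip -4 addr show dev bond0
ip route show dev bond0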

Best wishes,
  Jens

--
Jens Auer | CGI | Software-Engineer
CGI (Germany) GmbH & Co. KG
Rheinstraße 95 | 64295 Darmstadt | Germany
T: +49 6151 36860 154
jens.a...@cgi.com
Unsere Pflichtangaben gemäß § 35a GmbHG / §§ 161, 125a HGB finden Sie unter 
de.cgi.com/pflichtangaben.



From: Jan Pokorný [jpoko...@redhat.com]
Sent: Monday, 19 September 2016 14:57
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] Virtual ip resource restarted on node with down 
network device

On 19/09/16 09:15 +, Auer, Jens wrote:
> After the restart ifconfig still shows the device bond0 to be not RUNNING:
> MDA1PFP-S01 09:07:54 2127 0 ~ # ifconfig
> bond0: flags=5123<UP,BROADCAST,MASTER,MULTICAST>  mtu 1500
> inet 192.168.120.20  netmask 255.255.255.255  broadcast 0.0.0.0
> ether a6:17:2c:2a:72:fc  txqueuelen 3  (Ethernet)
> RX packets 2034  bytes 286728 (280.0 KiB)
> RX errors 0  dropped 29  over

Re: [ClusterLabs] No DRBD resource promoted to master in Active/Passive setup

2016-09-19 Thread Auer, Jens
Hi,

> Is the network interface being taken down here used for corosync
> communication? If so, that is a node-level failure, and pacemaker will
> fence.

We have different connections on each server:
- A bonded 10GB network card for data traffic that will be accessed via a 
virtual ip managed by pacemaker in 192.168.120.1/24. In the cluster nodes 
MDA1PFP-S01 and MDA1PFP-S02 are assigned to 192.168.120.10 and 192.168.120.11.

- A dedicated back-to-back connection for corosync heartbeats in 
192.168.121.1/24. MDA1PFP-PCS01 and MDA1PFP-S02 are assigned to 192.168.121.10 
and 192.168.121.11. When the cluster is created, we use these as primary node 
names and use the 10GB device as a second backup connection for increased 
reliability: pcs cluster setup --name MDA1PFP MDA1PFP-PCS01,MDA1PFP-S01 
MDA1PFP-PCS02,MDA1PFP-S02

- A dedicated back-to-back connection for drbd in 192.168.122.1/24. Hosts 
MDA1PFP-DRBD01 and MDA1PFP-DRBD02 are assigned 192.168.23.10 and 192.168.123.11.

Given that, I think it is not a node-level failure. pcs status also reports the 
nodes as online. I think this should not trigger fencing from pacemaker.

> When DRBD is configured with 'fencing resource-only' and 'fence-peer
> "/usr/lib/drbd/crm-fence-peer.sh";', and DRBD detects a network outage,
> it will try to add a constraint that prevents the other node from
> becoming master. It removes the constraint when connectivity is restored.

> I am not familiar with all the under-the-hood details, but IIUC, if
> pacemaker actually fences the node, then the other node can still take
> over the DRBD. But if there is a network outage and no pacemaker
> fencing, then you'll see the behavior you describe -- DRBD prevents
> master takeover, to avoid stale data being used.

This is my understanding as well, but there should be no network outage for 
DRBD. I can reproduce the behavior by stopping cluster nodes, which DRBD seems 
to interpret as a network outage since it can no longer communicate with the 
stopped node. Maybe I should ask on the DRBD mailing list?
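For reference, the DRBD side of the fencing hooks quoted above is configured roughly 
like this (DRBD 8.4-style syntax; section placement differs between DRBD versions, so 
this is only a sketch):

resource shared_fs {
  disk {
    fencing resource-only;
  }
  handlers {
    # adds a constraint in the CIB on connection loss, removes it after resync
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
  # device, disk and address definitions omitted
}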

Cheers,
  Jens
--
Jens Auer | CGI | Software-Engineer
CGI (Germany) GmbH & Co. KG
Rheinstraße 95 | 64295 Darmstadt | Germany
T: +49 6151 36860 154
jens.a...@cgi.com
Unsere Pflichtangaben gemäß § 35a GmbHG / §§ 161, 125a HGB finden Sie unter 
de.cgi.com/pflichtangaben.



From: Ken Gaillot [kgail...@redhat.com]
Sent: Monday, 19 September 2016 16:28
To: Auer, Jens; Cluster Labs - All topics related to open-source clustering 
welcomed
Subject: Re: [ClusterLabs] No DRBD resource promoted to master in 
Active/Passive setup

On 09/19/2016 02:31 AM, Auer, Jens wrote:
> Hi,
>
> I am not sure that pacemaker should do any fencing here. In my setting, 
> corosync is configured to use a back-to-back connection for heartbeats. This 
> is a different subnet than the one used by the ping resource that checks the 
> network connectivity and detects a failure. In my test, I bring down the 
> network device used by ping and this triggers the failover. The node status is 
> known by pacemaker since it still receives heartbeats, and it is only a 
> resource failure. I asked about fencing conditions a few days ago, and it was 
> basically asserted that a resource failure should not trigger STONITH actions 
> if not explicitly configured.

Is the network interface being taken down here used for corosync
communication? If so, that is a node-level failure, and pacemaker will
fence.

There is a bit of a distinction between DRBD fencing and pacemaker
fencing. The DRBD configuration is designed so that DRBD's fencing
method is to go through pacemaker.

When DRBD is configured with 'fencing resource-only' and 'fence-peer
"/usr/lib/drbd/crm-fence-peer.sh";', and DRBD detects a network outage,
it will try to add a constraint that prevents the other node from
becoming master. It removes the constraint when connectivity is restored.

I am not familiar with all the under-the-hood details, but IIUC, if
pacemaker actually fences the node, then the other node can still take
over the DRBD. But if there is a network outage and no pacemaker
fencing, then you'll see the behavior you describe -- DRBD prevents
master takeover, to avoid stale data being used.


> I am also wondering why this is "sticky". After a failover test the DRBD 
> resources are not working even if I restart the clust

Re: [ClusterLabs] Virtual ip resource restarted on node with down network device

2016-09-19 Thread Auer, Jens
Hi,

>> After the restart ifconfig still shows the device bond0 to be not RUNNING:
>> MDA1PFP-S01 09:07:54 2127 0 ~ # ifconfig
>> bond0: flags=5123<UP,BROADCAST,MASTER,MULTICAST>  mtu 1500
>> inet 192.168.120.20  netmask 255.255.255.255  broadcast 0.0.0.0
>> ether a6:17:2c:2a:72:fc  txqueuelen 3  (Ethernet)
>> RX packets 2034  bytes 286728 (280.0 KiB)
>> RX errors 0  dropped 29  overruns 0  frame 0
>> TX packets 2284  bytes 355975 (347.6 KiB)
>> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

There seems to be some difference because the device is not RUNNING;
mdaf-pf-pep-spare 14:17:53 999 0 ~ # ifconfig
bond0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 1500
inet 192.168.120.10  netmask 255.255.255.0  broadcast 192.168.120.255
inet6 fe80::5eb9:1ff:fe9c:e7fc  prefixlen 64  scopeid 0x20
ether 5c:b9:01:9c:e7:fc  txqueuelen 3  (Ethernet)
RX packets 15455692  bytes 22377220306 (20.8 GiB)
RX errors 0  dropped 2392  overruns 0  frame 0
TX packets 14706747  bytes 21361519159 (19.8 GiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Also the netmask and the ip address are wrong. I have configured the device to 
192.168.120.10 with netmask 192.168.120.10. How does IpAddr2 get the wrong 
configuration? I have no idea.

>Anyway, you should rather be using "ip" command from iproute suite
>than various if* tools that come short in some cases:
>http://inai.de/2008/02/19
>This would also be consistent with IPaddr2 uses under the hood.

We are using RedHat 7, which uses either NetworkManager or the network 
scripts. We use the latter, and ifup/ifdown should be the correct way to manage 
the network card. I also tried using ip link set dev bond0 up/down, and it brings 
up the device with the correct IP address and network mask. 

Best wishes,
  Jens

--
Jens Auer | CGI | Software-Engineer
CGI (Germany) GmbH & Co. KG
Rheinstraße 95 | 64295 Darmstadt | Germany
T: +49 6151 36860 154
jens.a...@cgi.com
Unsere Pflichtangaben gemäß § 35a GmbHG / §§ 161, 125a HGB finden Sie unter 
de.cgi.com/pflichtangaben.



From: Jan Pokorný [jpoko...@redhat.com]
Sent: Monday, 19 September 2016 14:57
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] Virtual ip resource restarted on node with down 
network device

On 19/09/16 09:15 +, Auer, Jens wrote:
> After the restart ifconfig still shows the device bond0 to be not RUNNING:
> MDA1PFP-S01 09:07:54 2127 0 ~ # ifconfig
> bond0: flags=5123<UP,BROADCAST,MASTER,MULTICAST>  mtu 1500
> inet 192.168.120.20  netmask 255.255.255.255  broadcast 0.0.0.0
> ether a6:17:2c:2a:72:fc  txqueuelen 3  (Ethernet)
> RX packets 2034  bytes 286728 (280.0 KiB)
> RX errors 0  dropped 29  overruns 0  frame 0
> TX packets 2284  bytes 355975 (347.6 KiB)
> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

This seems to suggest bond0 interface is up and address-assigned
(well, the netmask is strange).  So there would be nothing
contradictory to what I said on the address of IPaddr2.

Anyway, you should rather be using "ip" command from iproute suite
than various if* tools that come short in some cases:
http://inai.de/2008/02/19
This would also be consistent with IPaddr2 uses under the hood.

--
Jan (Poki)

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Preferred location is sometimes ignored

2016-09-16 Thread Auer, Jens
On 09/16/2016 09:45 AM, Auer, Jens wrote:
>> Hi,
>>
>> MDA1PFP-S01 14:41:35 1805 0 ~ # pcs constraint --full
>> Location Constraints:
>>   Resource: mda-ip
>> Enabled on: MDA1PFP-PCS01 (score:50)
>> (id:location-mda-ip-MDA1PFP-PCS01-50)
>> Constraint: location-mda-ip
>>   Rule: score=-INFINITY boolean-op=or  (id:location-mda-ip-rule)
>> Expression: pingd lt 1  (id:location-mda-ip-rule-expr)
>> Expression: not_defined pingd  (id:location-mda-ip-rule-expr-1)
>If I'm reading this right, you have two separate location constraints
>for mda-ip, a positive preference for one particular node (score 50 for
>MDA1PFP-PCS01), and a -INFINITY preference whenever the ping attribute
i>s bad.

Yes, I have two location constraints. The first one is a preferred location for 
startup, and the second one should move the resource if ping fails. When I start 
the nodes for the test, ping should not fail since everything is OK, so the 
location preference should be the only score taken into account. I am therefore 
wondering why the resource is sometimes not started on the preferred node. 

>> Which constraint is sometimes ignored?
The location constraint of 50 for node MDA1PFP-PCS01.

>> Is there a way to get more debugging output from pcs, e.g. what
>> triggered actions, which scores are computed and from which values?

>Not from pcs, but there are some lower-level tools that can sometimes be
>helpful. "crm_simulate -sL" will show all the scores that went into the
>current placement.
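
(For reference, a hedged sketch of how this can be invoked -- options as shipped
with pacemaker 1.1.x; the pe-input file name below is just an example:

crm_simulate -sL                                                # scores against the live CIB
crm_simulate -s -x /var/lib/pacemaker/pengine/pe-input-42.bz2   # replay a saved transition
)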

Thanks for the tip. Is there any way to get more output when the cluster is starting 
up initially? That is the only case I am concerned about here, because afterwards 
it works fine. 

Cheers,
  Jens


Re: [ClusterLabs] Virtual ip resource restarted on node with down network device

2016-09-16 Thread Auer, Jens
Hi,

thanks for the help.

> I'm not sure what you mean by "the device the virtual ip is attached
> to", but a separate question is why the resource agent reported that
> restarting the IP was successful, even though that device was
> unavailable. If the monitor failed when the device was made unavailable,
> I would expect the restart to fail as well.

I created the virtual ip with the parameter nic=bond0, and this is the device I am 
bringing down and was referring to in my question. I think the current behavior is a 
little inconsistent: I bring down the device, pacemaker recognizes this and restarts 
the resource. However, the monitor should then fail again, but it just doesn't detect 
any problems. 
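
(If I read the IPaddr2 agent correctly, its monitor mainly checks that the address is 
still assigned, not the link state -- a hedged way to see the difference from the 
shell, using the interface and address from above:

ip -4 addr show dev bond0    # the 192.168.120.20 address can still be listed here...
ip link show bond0           # ...even while the link itself reports state DOWN
)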

Cheers,
  Jens



From: Ken Gaillot [kgail...@redhat.com]
Sent: Friday, 16 September 2016 17:27
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] Virtual ip resource restarted on node with down 
network device

On 09/16/2016 10:08 AM, Auer, Jens wrote:
> Hi,
>
> I have configured an Active/Passive cluster to host a virtual ip
> address. To test failovers, I shutdown the device the virtual ip is
> attached to and expected that it moves to the other node. However, the
> virtual ip is detected as FAILED, but is then restarted on the same
> node. I was able to solve this by using a ping resource which we want to
> do anyway, but I am wondering why the resource is restarted on the node
> and no failure is detected anymore.

If a *node* fails, pacemaker will recover all its resources elsewhere,
if possible.

If a *resource* fails but the node is OK, the response is configurable,
via the "on-fail" operation option and "migration-threshold" resource
option.

By default, on-fail=restart for monitor operations, and
migration-threshold=INFINITY. This means that if a monitor fails,
pacemaker will attempt to restart the resource on the same node.

To get an immediate failover of the resource, set migration-threshold=1
on the resource.
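
For example, a hedged sketch with pcs (syntax as in pcs 0.9.x; resource name taken
from the configuration shown above):

# make the first monitor failure move the resource to the other node
pcs resource meta mda-ip migration-threshold=1
# verify the meta attribute
pcs resource show mda-ip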

I'm not sure what you mean by "the device the virtual ip is attached
to", but a separate question is why the resource agent reported that
restarting the IP was successful, even though that device was
unavailable. If the monitor failed when the device was made unavailable,
I would expect the restart to fail as well.

>
> On my setup, this is very easy to reproduce:
> 1. Start cluster with virtual ip
> 2. On the node hosting the virtual ip, bring down the network device
> with ifdown
> => The resource is detected as failed
> => The resource is restarted
> => No failures are dected from now on
>
> Best wishes,
>   Jens
>



Re: [ClusterLabs] (no subject)

2016-09-16 Thread Auer, Jens
Please ignore this mail with the empty subject. I have reposted it with the subject 
"No DRBD resource promoted to master in Active/Passive setup".

Sorry,
  Jens

____
From: Auer, Jens [jens.a...@cgi.com]
Sent: Friday, 16 September 2016 16:51
To: users@clusterlabs.org
Subject: [ClusterLabs] (no subject)

Hi,

I have an Active/Passive configuration with a drbd master/slave resource:

MDA1PFP-S01 14:40:27 1803 0 ~ # pcs status
Cluster name: MDA1PFP
Last updated: Fri Sep 16 14:41:18 2016          Last change: Fri Sep 16 14:39:49 
2016 by root via cibadmin on MDA1PFP-PCS01
Stack: corosync
Current DC: MDA1PFP-PCS02 (version 1.1.13-10.el7-44eb2dd) - partition with 
quorum
2 nodes and 7 resources configured

Online: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]

Full list of resources:

 Master/Slave Set: drbd1_sync [drbd1]
     Masters: [ MDA1PFP-PCS02 ]
     Slaves: [ MDA1PFP-PCS01 ]
 mda-ip        (ocf::heartbeat:IPaddr2):       Started MDA1PFP-PCS02
 Clone Set: ping-clone [ping]
     Started: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
 ACTIVE        (ocf::heartbeat:Dummy):         Started MDA1PFP-PCS02
 shared_fs     (ocf::heartbeat:Filesystem):    Started MDA1PFP-PCS02

PCSD Status:
  MDA1PFP-PCS01: Online
  MDA1PFP-PCS02: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

MDA1PFP-S01 14:41:19 1804 0 ~ # pcs resource --full
 Master: drbd1_sync
  Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 
notify=true
  Resource: drbd1 (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=shared_fs
   Operations: start interval=0s timeout=240 (drbd1-start-interval-0s)
   promote interval=0s timeout=90 (drbd1-promote-interval-0s)
   demote interval=0s timeout=90 (drbd1-demote-interval-0s)
   stop interval=0s timeout=100 (drbd1-stop-interval-0s)
   monitor interval=60s (drbd1-monitor-interval-60s)
 Resource: mda-ip (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.120.20 cidr_netmask=32 nic=bond0
  Operations: start interval=0s timeout=20s (mda-ip-start-interval-0s)
  stop interval=0s timeout=20s (mda-ip-stop-interval-0s)
  monitor interval=1s (mda-ip-monitor-interval-1s)
 Clone: ping-clone
  Resource: ping (class=ocf provider=pacemaker type=ping)
   Attributes: dampen=5s multiplier=1000 host_list=pf-pep-dev-1 timeout=1 
attempts=3
   Operations: start interval=0s timeout=60 (ping-start-interval-0s)
   stop interval=0s timeout=20 (ping-stop-interval-0s)
   monitor interval=1 (ping-monitor-interval-1)
 Resource: ACTIVE (class=ocf provider=heartbeat type=Dummy)
  Operations: start interval=0s timeout=20 (ACTIVE-start-interval-0s)
  stop interval=0s timeout=20 (ACTIVE-stop-interval-0s)
  monitor interval=10 timeout=20 (ACTIVE-monitor-interval-10)
 Resource: shared_fs (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/drbd1 directory=/shared_fs fstype=xfs
  Operations: start interval=0s timeout=60 (shared_fs-start-interval-0s)
  stop interval=0s timeout=60 (shared_fs-stop-interval-0s)
  monitor interval=20 timeout=40 (shared_fs-monitor-interval-20)

MDA1PFP-S01 14:41:35 1805 0 ~ # pcs constraint --full
Location Constraints:
  Resource: mda-ip
Enabled on: MDA1PFP-PCS01 (score:50) (id:location-mda-ip-MDA1PFP-PCS01-50)
Constraint: location-mda-ip
  Rule: score=-INFINITY boolean-op=or  (id:location-mda-ip-rule)
Expression: pingd lt 1  (id:location-mda-ip-rule-expr)
Expression: not_defined pingd  (id:location-mda-ip-rule-expr-1)
Ordering Constraints:
  start ping-clone then start mda-ip (kind:Optional) 
(id:order-ping-clone-mda-ip-Optional)
  promote drbd1_sync then start shared_fs (kind:Mandatory) 
(id:order-drbd1_sync-shared_fs-mandatory)
Colocation Constraints:
  ACTIVE with mda-ip (score:INFINITY) (id:colocation-ACTIVE-mda-ip-INFINITY)
  drbd1_sync with mda-ip (score:INFINITY) (rsc-role:Master) 
(with-rsc-role:Started) (id:colocation-drbd1_sync-mda-ip-INFINITY)
  shared_fs with drbd1_sync (score:INFINITY) (

[ClusterLabs] Preferred location is sometimes ignored

2016-09-16 Thread Auer, Jens
Hi,

I have an Active/Passive configuration with a drbd master/slave resource:

MDA1PFP-S01 14:40:27 1803 0 ~ # pcs status
Cluster name: MDA1PFP
Last updated: Fri Sep 16 14:41:18 2016          Last change: Fri Sep 16 14:39:49 
2016 by root via cibadmin on MDA1PFP-PCS01
Stack: corosync
Current DC: MDA1PFP-PCS02 (version 1.1.13-10.el7-44eb2dd) - partition with 
quorum
2 nodes and 7 resources configured

Online: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]

Full list of resources:

 Master/Slave Set: drbd1_sync [drbd1]
     Masters: [ MDA1PFP-PCS02 ]
     Slaves: [ MDA1PFP-PCS01 ]
 mda-ip        (ocf::heartbeat:IPaddr2):       Started MDA1PFP-PCS02
 Clone Set: ping-clone [ping]
     Started: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
 ACTIVE        (ocf::heartbeat:Dummy):         Started MDA1PFP-PCS02
 shared_fs     (ocf::heartbeat:Filesystem):    Started MDA1PFP-PCS02

PCSD Status:
  MDA1PFP-PCS01: Online
  MDA1PFP-PCS02: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

MDA1PFP-S01 14:41:19 1804 0 ~ # pcs resource --full
 Master: drbd1_sync
  Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 
notify=true
  Resource: drbd1 (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=shared_fs
   Operations: start interval=0s timeout=240 (drbd1-start-interval-0s)
   promote interval=0s timeout=90 (drbd1-promote-interval-0s)
   demote interval=0s timeout=90 (drbd1-demote-interval-0s)
   stop interval=0s timeout=100 (drbd1-stop-interval-0s)
   monitor interval=60s (drbd1-monitor-interval-60s)
 Resource: mda-ip (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.120.20 cidr_netmask=32 nic=bond0
  Operations: start interval=0s timeout=20s (mda-ip-start-interval-0s)
  stop interval=0s timeout=20s (mda-ip-stop-interval-0s)
  monitor interval=1s (mda-ip-monitor-interval-1s)
 Clone: ping-clone
  Resource: ping (class=ocf provider=pacemaker type=ping)
   Attributes: dampen=5s multiplier=1000 host_list=pf-pep-dev-1 timeout=1 
attempts=3
   Operations: start interval=0s timeout=60 (ping-start-interval-0s)
   stop interval=0s timeout=20 (ping-stop-interval-0s)
   monitor interval=1 (ping-monitor-interval-1)
 Resource: ACTIVE (class=ocf provider=heartbeat type=Dummy)
  Operations: start interval=0s timeout=20 (ACTIVE-start-interval-0s)
  stop interval=0s timeout=20 (ACTIVE-stop-interval-0s)
  monitor interval=10 timeout=20 (ACTIVE-monitor-interval-10)
 Resource: shared_fs (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/drbd1 directory=/shared_fs fstype=xfs
  Operations: start interval=0s timeout=60 (shared_fs-start-interval-0s)
  stop interval=0s timeout=60 (shared_fs-stop-interval-0s)
  monitor interval=20 timeout=40 (shared_fs-monitor-interval-20)

MDA1PFP-S01 14:41:35 1805 0 ~ # pcs constraint --full
Location Constraints:
  Resource: mda-ip
Enabled on: MDA1PFP-PCS01 (score:50) (id:location-mda-ip-MDA1PFP-PCS01-50)
Constraint: location-mda-ip
  Rule: score=-INFINITY boolean-op=or  (id:location-mda-ip-rule)
Expression: pingd lt 1  (id:location-mda-ip-rule-expr)
Expression: not_defined pingd  (id:location-mda-ip-rule-expr-1)
Ordering Constraints:
  start ping-clone then start mda-ip (kind:Optional) 
(id:order-ping-clone-mda-ip-Optional)
  promote drbd1_sync then start shared_fs (kind:Mandatory) 
(id:order-drbd1_sync-shared_fs-mandatory)
Colocation Constraints:
  ACTIVE with mda-ip (score:INFINITY) (id:colocation-ACTIVE-mda-ip-INFINITY)
  drbd1_sync with mda-ip (score:INFINITY) (rsc-role:Master) 
(with-rsc-role:Started) (id:colocation-drbd1_sync-mda-ip-INFINITY)
  shared_fs with drbd1_sync (score:INFINITY) (rsc-role:Started) 
(with-rsc-role:Master) (id:colocation-shared_fs-drbd1_sync-INFINITY)
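
(For reference, constraints of this shape can be created with pcs roughly like 
this -- a hedged sketch; the generated ids will differ:

pcs constraint location mda-ip prefers MDA1PFP-PCS01=50
pcs constraint location mda-ip rule score=-INFINITY pingd lt 1 or not_defined pingd
pcs constraint colocation add ACTIVE with mda-ip INFINITY
)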

As you can see, I have defined a location constraint for the virtual ip mda-ip 
and colocated the drbd master and the filesystem with it. However, the location 
constraint is sometimes ignored. Can anybody point me to the correct configuration?

Is there a way to get more debugging output from pcs, e.g. what triggered 
actions, which scores are computed and from which values?

Best wishes,
  Jens


[ClusterLabs] (no subject)

2016-09-16 Thread Auer, Jens
Hi,

I have an Active/Passive configuration with a drbd master/slave resource:

MDA1PFP-S01 14:40:27 1803 0 ~ # pcs status
Cluster name: MDA1PFP
Last updated: Fri Sep 16 14:41:18 2016          Last change: Fri Sep 16 14:39:49 
2016 by root via cibadmin on MDA1PFP-PCS01
Stack: corosync
Current DC: MDA1PFP-PCS02 (version 1.1.13-10.el7-44eb2dd) - partition with 
quorum
2 nodes and 7 resources configured

Online: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]

Full list of resources:

 Master/Slave Set: drbd1_sync [drbd1]
     Masters: [ MDA1PFP-PCS02 ]
     Slaves: [ MDA1PFP-PCS01 ]
 mda-ip        (ocf::heartbeat:IPaddr2):       Started MDA1PFP-PCS02
 Clone Set: ping-clone [ping]
     Started: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
 ACTIVE        (ocf::heartbeat:Dummy):         Started MDA1PFP-PCS02
 shared_fs     (ocf::heartbeat:Filesystem):    Started MDA1PFP-PCS02

PCSD Status:
  MDA1PFP-PCS01: Online
  MDA1PFP-PCS02: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

MDA1PFP-S01 14:41:19 1804 0 ~ # pcs resource --full
 Master: drbd1_sync
  Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 
notify=true
  Resource: drbd1 (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=shared_fs
   Operations: start interval=0s timeout=240 (drbd1-start-interval-0s)
   promote interval=0s timeout=90 (drbd1-promote-interval-0s)
   demote interval=0s timeout=90 (drbd1-demote-interval-0s)
   stop interval=0s timeout=100 (drbd1-stop-interval-0s)
   monitor interval=60s (drbd1-monitor-interval-60s)
 Resource: mda-ip (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.120.20 cidr_netmask=32 nic=bond0
  Operations: start interval=0s timeout=20s (mda-ip-start-interval-0s)
  stop interval=0s timeout=20s (mda-ip-stop-interval-0s)
  monitor interval=1s (mda-ip-monitor-interval-1s)
 Clone: ping-clone
  Resource: ping (class=ocf provider=pacemaker type=ping)
   Attributes: dampen=5s multiplier=1000 host_list=pf-pep-dev-1 timeout=1 
attempts=3
   Operations: start interval=0s timeout=60 (ping-start-interval-0s)
   stop interval=0s timeout=20 (ping-stop-interval-0s)
   monitor interval=1 (ping-monitor-interval-1)
 Resource: ACTIVE (class=ocf provider=heartbeat type=Dummy)
  Operations: start interval=0s timeout=20 (ACTIVE-start-interval-0s)
  stop interval=0s timeout=20 (ACTIVE-stop-interval-0s)
  monitor interval=10 timeout=20 (ACTIVE-monitor-interval-10)
 Resource: shared_fs (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/drbd1 directory=/shared_fs fstype=xfs
  Operations: start interval=0s timeout=60 (shared_fs-start-interval-0s)
  stop interval=0s timeout=60 (shared_fs-stop-interval-0s)
  monitor interval=20 timeout=40 (shared_fs-monitor-interval-20)

MDA1PFP-S01 14:41:35 1805 0 ~ # pcs constraint --full
Location Constraints:
  Resource: mda-ip
Enabled on: MDA1PFP-PCS01 (score:50) (id:location-mda-ip-MDA1PFP-PCS01-50)
Constraint: location-mda-ip
  Rule: score=-INFINITY boolean-op=or  (id:location-mda-ip-rule)
Expression: pingd lt 1  (id:location-mda-ip-rule-expr)
Expression: not_defined pingd  (id:location-mda-ip-rule-expr-1)
Ordering Constraints:
  start ping-clone then start mda-ip (kind:Optional) 
(id:order-ping-clone-mda-ip-Optional)
  promote drbd1_sync then start shared_fs (kind:Mandatory) 
(id:order-drbd1_sync-shared_fs-mandatory)
Colocation Constraints:
  ACTIVE with mda-ip (score:INFINITY) (id:colocation-ACTIVE-mda-ip-INFINITY)
  drbd1_sync with mda-ip (score:INFINITY) (rsc-role:Master) 
(with-rsc-role:Started) (id:colocation-drbd1_sync-mda-ip-INFINITY)
  shared_fs with drbd1_sync (score:INFINITY) (rsc-role:Started) 
(with-rsc-role:Master) (id:colocation-shared_fs-drbd1_sync-INFINITY)

The cluster starts fine, except resources starting not on the preferred host. I 
asked this in a different question to keep things separated.
The status after starting is:
Last updated: Fri Sep 16 14:39:57 2016  Last change: Fri Sep 16 
14:39:49 2016 by root via cibadmin on MDA1PFP-PCS01
Stack: corosync
Current DC: MDA1PFP-PCS02 (version 1.1.13-10.el7-44eb2dd) - partition with 
quorum
2 nodes and 7 resources configured

Online: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]

 Master/Slave Set: drbd1_sync [drbd1]
     Masters: [ MDA1PFP-PCS02 ]
     Slaves: [ MDA1PFP-PCS01 ]
 mda-ip        (ocf::heartbeat:IPaddr2):       Started MDA1PFP-PCS02
 Clone Set: ping-clone [ping]
     Started: [ MDA1PFP-PCS01 MDA1PFP-PCS02 ]
 ACTIVE        (ocf::heartbeat:Dummy):         Started MDA1PFP-PCS02
 shared_fs     (ocf::heartbeat:Filesystem):    Started MDA1PFP-PCS02

From this state, I did two tests to simulate a cluster failover:
1. Shutdown the cluster node with the master with pcs cluster stop
2. Disable the network device for the virtual ip with ifdown and wait until 
ping detects it
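
(A hedged sketch of what these two tests look like on the command line, using the 
node and interface names from above:

pcs cluster stop      # test 1: run on the node currently hosting the master
ifdown bond0          # test 2: run on the node holding mda-ip
crm_mon -1            # watch the failover from the other node
)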

In both cases, the failover is executed but 

Re: [ClusterLabs] Cluster administration from non-root users

2016-06-17 Thread Auer, Jens
Thanks a lot. Everything works as expected.
  Jens



From: Tomas Jelinek [tojel...@redhat.com]
Sent: Monday, 13 June 2016 14:32
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] Cluster administration from non-root users

On 13.6.2016 13:57, Auer, Jens wrote:
> Hi,
>
> I am trying to give admin rights to my clusters to non-root users. I
> have two users which need to be able to control the cluster. Both are
> members of the haclient group, and I have created acl roles granting
> write-access. I can query the cluster status, but I am unable to perform
> any commands:
> id
> uid=1000(mdaf) gid=1000(mdaf)
> groups=1000(mdaf),10(wheel),189(haclient),801(mdaf),802(mdafkey),803(mdafmaintain)
>
> pcs acl
> ACLs are enabled
>
> User: mdaf
>Roles: admin
> User: mdafmaintain
>Roles: admin
> Role: admin
>Permission: write xpath /cib (admin-write)
>
> pcs cluster status
> Cluster Status:
>   Last updated: Mon Jun 13 11:46:45 2016        Last change: Mon Jun 13
> 11:46:38 2016 by root via cibadmin on MDA2PFP-S02
>   Stack: corosync
>   Current DC: MDA2PFP-S01 (version 1.1.13-10.el7-44eb2dd) - partition
> with quorum
>   2 nodes and 9 resources configured
>   Online: [ MDA2PFP-S01 MDA2PFP-S02 ]
>
> PCSD Status:
>MDA2PFP-S01: Online
>MDA2PFP-S02: Online
>
> pcs cluster stop
> Error: localhost: Permission denied - (HTTP error: 403)
>
> pcs cluster start
> Error: localhost: Permission denied - (HTTP error: 403)

Hi Jens,

You configured permissions to edit the CIB. But you also need to assign
permissions to use pcsd (only root is allowed to start and stop services,
so the request goes through pcsd).

This can be done using pcs web UI:
- open the web UI in your browser at https://<node address>:2224
- login as hacluster user
- add existing cluster
- go to permissions
- set permissions for your cluster
- don't forget to apply changes
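
Once both layers are in place, a hedged sketch of how to double-check from the
command line (commands as used earlier in this thread; pcs acl syntax as in pcs
0.9.x):

pcs acl            # the CIB-level roles and user assignments should show up here
pcs cluster stop   # goes through pcsd; should no longer return HTTP error 403
pcs cluster start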

Regards,
Tomas

>
> I tried to use sudo instead, but this also not working:
> sudo pcs status
> Permission denied
> Error: unable to locate command: /usr/sbin/crm_mon
>
> Any help would be greatly appreciated.
>
> Best wishes,
>Jens
>



[ClusterLabs] Cluster administration from non-root users

2016-06-13 Thread Auer, Jens
Hi,

I am trying to give non-root users admin rights to my clusters. I have two 
users who need to be able to control the cluster. Both are members of the 
haclient group, and I have created ACL roles granting write access. I can query 
the cluster status, but I am unable to perform any commands:
id
uid=1000(mdaf) gid=1000(mdaf) 
groups=1000(mdaf),10(wheel),189(haclient),801(mdaf),802(mdafkey),803(mdafmaintain)

pcs acl
ACLs are enabled

User: mdaf
  Roles: admin
User: mdafmaintain
  Roles: admin
Role: admin
  Permission: write xpath /cib (admin-write)

pcs cluster status
Cluster Status:
 Last updated: Mon Jun 13 11:46:45 2016        Last change: Mon Jun 13 11:46:38 
2016 by root via cibadmin on MDA2PFP-S02
 Stack: corosync
 Current DC: MDA2PFP-S01 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
 2 nodes and 9 resources configured
 Online: [ MDA2PFP-S01 MDA2PFP-S02 ]

PCSD Status:
  MDA2PFP-S01: Online
  MDA2PFP-S02: Online

pcs cluster stop
Error: localhost: Permission denied - (HTTP error: 403)

pcs cluster start
Error: localhost: Permission denied - (HTTP error: 403)

I tried to use sudo instead, but this is also not working:
sudo pcs status
Permission denied
Error: unable to locate command: /usr/sbin/crm_mon

Any help would be greatly appreciated.

Best wishes,
  Jens
