Re: [ClusterLabs] start a resource

2016-05-17 Thread Ken Gaillot
On 05/16/2016 12:22 PM, Dimitri Maziuk wrote:
> On 05/13/2016 04:31 PM, Ken Gaillot wrote:
> 
>> That is definitely not a properly functioning cluster. Something
>> is going wrong at some level.
> 
> Yeah, well... how do I find out what/where?

What happens after "pcs resource cleanup"? "pcs status" reports the
time associated with each failure, so you can check whether you are
seeing the same failure or a new one.
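
As a rough sketch of that comparison: the helper name failure_times is
made up here, and it assumes each "Failed Actions" entry carries a
last-rc-change='...' field like the one in the original report. It pulls
the timestamps out of the status text so they can be compared before and
after the cleanup:

```shell
# Print the last-rc-change timestamp of every failed action read from stdin.
# Assumes the field looks like: last-rc-change='Thu May  5 13:55:50 2016'
failure_times() {
    sed -n "s/.*last-rc-change='\([^']*\)'.*/\1/p"
}
```

It reads from stdin, so something like "pcs status | failure_times" run
before and after "pcs resource cleanup rsyncd" should show whether the
timestamp changed (a new failure) or stayed the same (the old one).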

The system log is usually the best starting point, as it will have
messages from pacemaker, corosync and the resource agents. Check the
entries around the time of the failure(s) for details or anything
unusual.
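
One possible way to narrow the log down, as a sketch only: the helper
name cluster_log_lines is hypothetical, and the daemon names in the
pattern are an assumption for a Pacemaker 1.1-era system, so adjust them
to whatever your syslog actually shows.

```shell
# Keep only log lines that mention the cluster stack or the resource.
# The pattern is an assumption (Pacemaker 1.1-era daemon names plus the
# rsyncd resource from this thread); edit it to match your own logs.
cluster_log_lines() {
    grep -E 'corosync|pacemaker|crmd|lrmd|pengine|rsyncd'
}
```

Feed it the part of the system log around the failure time, for example
the lines from /var/log/messages (or wherever your syslog writes) that
fall within a few minutes of the last-rc-change time in "pcs status".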

Pacemaker also has a detail log (by default, /var/log/pacemaker.log).
In general, this is more useful to developers than administrators, but
if the system log doesn't help, it can sometimes shed a little more light.

> One question: in corosync.conf I have
>
> nodelist {
>     node {
>         ring0_addr: node1_name
>         nodeid: 1
>     }
>     node {
>         ring0_addr: node2_name
>         nodeid: 2
>     }
> }
> 
> Could 'pcs cluster stop/start' reset the interface that resolves
> to nodeX_name? If so, that would answer why ssh connections get
> killed.

No, Pacemaker and pcs don't touch the interfaces (unless of course you
explicitly add a cluster resource to do so, which wouldn't work anyway
for the interface(s) that corosync itself needs to use).


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] start a resource

2016-05-13 Thread Ken Gaillot
On 05/06/2016 01:01 PM, Dimitri Maziuk wrote:
> On 05/06/2016 12:05 PM, Ian wrote:
>> Are you getting any other errors now that you've fixed the
>> config?
> 
> It's running now that I did the cluster stop/start, but no: I
> wasn't getting any other errors. I did have a symlink resource
> "stopped" for no apparent reason and with no errors logged.
> 
> The cluster is a basic active-passive pair. The relevant part of
> the setup is:
> 
> drbd filesystem
> floating ip
>   colocated with drbd filesystem +inf
>   order drbd filesystem then floating ip
> 
> ocf:heartbeat:symlink resource that
>   does /etc/rsyncd.conf -> /drbd/etc/rsyncd.conf
>   colocated with drbd filesystem +inf
>   order drbd filesystem then the symlink
> 
> ocf:heartbeat:rsyncd resource that is
>   colocated with the symlink
>   order symlink then rsyncd
>   order floating ip then rsyncd
> 
> (Looking at this, maybe I should also colocate rsyncd with floating
> ip to avoid any confusion in pacemaker's little brain.)

Not strictly necessary, since rsyncd is colocated with symlink, which is
colocated with filesystem, and ip is also colocated with filesystem.

But it is a good idea to model all logical dependencies, since you
don't know what changes you might make to the configuration in the
future. If you want rsyncd to always be with the floating ip, then by
all means add a colocation constraint.
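
For instance, a minimal sketch of what that could look like with pcs.
The resource ID floating_ip is hypothetical (the thread never gives the
actual IDs), and the helper colocation_cmd is made up for illustration.
It only prints the command, so it can be reviewed before being run on a
production cluster:

```shell
# Print, but do not run, the pcs command that keeps one resource on the
# same node as another (a score of INFINITY makes the colocation mandatory).
colocation_cmd() {
    printf 'pcs constraint colocation add %s with %s INFINITY\n' "$1" "$2"
}

# "floating_ip" is a hypothetical resource ID; use the ID shown by "pcs status".
colocation_cmd rsyncd floating_ip
```

The existing "order floating ip then rsyncd" constraint already covers
the start order, so only the colocation would need to be added.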

> But this is not specific to rsyncd: the behaviour was exactly the
> same when a co-worker made a typo in apache config (which is
> another resource on the same cluster). The only way to restart
> apache was to "pcs cluster stop ; pcs cluster start" and that
> randomly killed ssh connections to the nodes' "proper" IPs.

That is definitely not a properly functioning cluster. Something is
going wrong at some level.

When you say that "pcs resource cleanup" didn't fix the issue, what
happened after that? Did "pcs status" still show an error for the
resource? If so, there was an additional failure.




Re: [ClusterLabs] start a resource

2016-05-07 Thread Dimitri Maziuk
On 05/06/2016 12:05 PM, Ian wrote:
> Are you getting any other errors now that you've fixed the config?

It's running now that I did the cluster stop/start, but no: I wasn't
getting any other errors. I did have a symlink resource "stopped" for no
apparent reason and with no errors logged.

The cluster is a basic active-passive pair. The relevant part of the
setup is:

drbd filesystem
floating ip
  colocated with drbd filesystem +inf
  order drbd filesystem then floating ip

ocf:heartbeat:symlink resource that
  does /etc/rsyncd.conf -> /drbd/etc/rsyncd.conf
  colocated with drbd filesystem +inf
  order drbd filesystem then the symlink

ocf:heartbeat:rsyncd resource that is
  colocated with the symlink
  order symlink then rsyncd
  order floating ip then rsyncd

(Looking at this, maybe I should also colocate rsyncd with floating ip
to avoid any confusion in pacemaker's little brain.)

But this is not specific to rsyncd: the behaviour was exactly the same
when a co-worker made a typo in apache config (which is another resource
on the same cluster). The only way to restart apache was to "pcs cluster
stop ; pcs cluster start" and that randomly killed ssh connections to
the nodes' "proper" IPs.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ClusterLabs] start a resource

2016-05-06 Thread Ian
Are you getting any other errors now that you've fixed the config?

What does your config file look like?
On May 6, 2016 10:33 AM, "Dmitri Maziuk"  wrote:

> On 2016-05-05 23:50, Moiz Arif wrote:
>
>> Hi Dimitri,
>>
>> Try cleanup of the fail count for the resource with any of the commands
>> below:
>>
>> via pcs : pcs resource cleanup rsyncd
>>
>
> Tried it, didn't work. Tried pcs resource debug-start rsyncd -- got no
> errors, resource didn't start. Tried disable/enable.
>
> So far the only way I've been able to do this is pcs cluster stop ; pcs
> cluster start which is ridiculous on a production cluster with drbd and a
> database etc. (And it killed my ssh connection to the other node, again.)
>
> Any other suggestions?
> Thanks,
> Dima
>
>


Re: [ClusterLabs] start a resource

2016-05-06 Thread Dmitri Maziuk

On 2016-05-05 23:50, Moiz Arif wrote:

> Hi Dimitri,
>
> Try cleanup of the fail count for the resource with any of the commands
> below:
>
> via pcs : pcs resource cleanup rsyncd

Tried it, didn't work. Tried pcs resource debug-start rsyncd -- got no
errors, resource didn't start. Tried disable/enable.

So far the only way I've been able to do this is pcs cluster stop ; pcs
cluster start which is ridiculous on a production cluster with drbd and
a database etc. (And it killed my ssh connection to the other node, again.)

Any other suggestions?
Thanks,
Dima




Re: [ClusterLabs] start a resource

2016-05-05 Thread Moiz Arif
Hi Dimitri,
Try cleanup of the fail count for the resource with any of the commands
below:
via pcs: pcs resource cleanup rsyncd
via crm: crm resource cleanup rsyncd
Hope it helps. 
Moiz
To: users@clusterlabs.org
From: dmaz...@bmrb.wisc.edu
Date: Thu, 5 May 2016 14:15:09 -0500
Subject: [ClusterLabs] start a resource

Hi all,
 
I'm sure it must be a FAQ, but how do I start a resource? E.g.
 
Failed Actions:
* rsyncd_start_0 on tarpon 'unknown error' (1): call=78,
status=complete, exitreason='Error. "pid file" entry required in the
rsyncd config file by rsyncd OCF RA.',
last-rc-change='Thu May  5 13:55:50 2016', queued=0ms, exec=51ms
 
OK, I fixed the config file, how do I restart rsyncd now?
 
TIA
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
 
