Re: [ClusterLabs] Regression in Filesystem RA

2017-11-29 Thread Christian Balzer

Hello,

sorry for the late reply; moving data centers tends to keep one busy.

I looked at the PR and, while it works and is certainly an improvement, it
wouldn't help much in my case.
The biggest issue is fuser and its exponential slowdown, and the RA still
uses it.

What I did was to recklessly force my crap code into a script:
---
#!/bin/bash
# Print the PIDs of processes holding open directories (lsof TYPE "DIR",
# e.g. their cwd) on the given mount point; avoids fuser's slowdown.
lsof -n | grep "$1" | grep DIR | awk '{print $2}'
---

I call that instead of fuser, and also removed all kill logging by
default (determining the number of PIDs isn't free either).

With that in place it can deal with 10k processes to kill in less than 10
seconds.
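
For illustration, this is roughly how the RA's kill path could use such a
helper instead of fuser. This is only a sketch: get_blocking_pids and
MOUNTPOINT are made-up names, not the actual Filesystem RA code.

    # Hypothetical helper: PIDs of processes holding directories open on
    # the mount point given as $1 (same lsof pipeline as above).
    get_blocking_pids() {
        lsof -n | grep "$1" | grep DIR | awk '{print $2}' | sort -u
    }

    for sig in TERM KILL; do
        pids=$(get_blocking_pids "$MOUNTPOINT")
        [ -z "$pids" ] && break
        # No per-PID logging: even counting PIDs isn't free at this scale.
        kill -s $sig $pids 2>/dev/null
        sleep 1
    done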

Regards,

Christian

On Tue, 24 Oct 2017 09:07:50 +0200 Dejan Muhamedagic wrote:

> On Tue, Oct 24, 2017 at 08:59:17AM +0200, Dejan Muhamedagic wrote:
> > [...]
> > I just made a pull request:
> > 
> > https://github.com/ClusterLabs/resource-agents/pull/1042  
> 
> NB: It is completely untested!
> 
> > It would be great if you could test it!
> > 
> > Cheers,
> > 
> > Dejan
> >   
> > > Regards,
> > > 
> > > Christian
> > >   
> > > > > Maybe we can even come up with a way
> > > > > to both "pretty print" and kill fast?
> > > > 
> > > > My best guess right now is no ;-) But we could log nicely for the
> > > > usual case of a small number of stray processes ... maybe
> > > > something like this:
> > > > 
> > > > i=""
> > > > get_pids | tr '\n' ' ' | fold -s |
> > > > while read procs; do
> > > > if [ -z "$i" ]; then
> > > > killnlog $procs
> > > > i="nolog"
> > > > else
> > > > justkill $procs
> > > > fi
> > > > done
> > > > 
> > > > Cheers,
> > > > 
> > > > Dejan
> > > >   
> > > > > -- 
> > > > > : Lars Ellenberg
> > > > > : LINBIT | Keeping the Digital World Running
> > > > > : DRBD -- Heartbeat -- Corosync -- Pacemaker
> > > > > : R&D, Integration, Ops, Consulting, Support
> > > > > 
> > > > > DRBD® and LINBIT® are registered trademarks of LINBIT
> > > > > 
> > > >   
> > > 
> > > 
> > > -- 
> > > Christian Balzer        Network/Systems Engineer
> > > ch...@gol.com           Rakuten Communications
> > 
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Rakuten Communications



[ClusterLabs] Antw: Re: questions about startup fencing

2017-11-29 Thread Ulrich Windl



> Kristoffer Gronlund  wrote:
>>Adam Spiers  writes:
>>
>>> - The whole cluster is shut down cleanly.
>>>
>>> - The whole cluster is then started up again.  (Side question: what
>>>   happens if the last node to shut down is not the first to start up?
>>>   How will the cluster ensure it has the most recent version of the
>>>   CIB?  Without that, how would it know whether the last man standing
>>>   was shut down cleanly or not?)
>>
>>This is my opinion, I don't really know what the "official" pacemaker
>>stance is: There is no such thing as shutting down a cluster cleanly. A
>>cluster is a process stretching over multiple nodes - if they all shut
>>down, the process is gone. When you start up again, you effectively have
>>a completely new cluster.
> 
> Sorry, I don't follow you at all here.  When you start the cluster up
> again, the cluster config from before the shutdown is still there.
> That's very far from being a completely new cluster :-)

The problem is you cannot "start the cluster" in pacemaker; you can only "start 
nodes". The nodes will come up one by one. As opposed (as I had said) to HP 
Sertvice Guard, where there is a "cluster formation timeout". That is, the 
nodes wait for the specified time for the cluster to "form". Then the cluster 
starts as a whole. Of course that only applies if the whole cluster was down, 
not if a single node was down.


> 
>>When starting up, how is the cluster, at any point, to know if the
>>cluster it has knowledge of is the "latest" cluster?
> 
> That was exactly my question.
> 
>>The next node could have a newer version of the CIB which adds yet
>>more nodes to the cluster.
> 
> Yes, exactly.  If the first node to start up was not the last man
> standing, the CIB history is effectively being forked.  So how is this
> issue avoided?

Quorum? "Cluster formation delay"?

> 
>>The only way to bring up a cluster from being completely stopped is to
>>treat it as creating a completely new cluster. The first node to start
>>"creates" the cluster and later nodes join that cluster.
> 
> That's ignoring the cluster config, which persists even when the
> cluster's down.
> 
> But to be clear, you picked a small side question from my original
> post and answered that.  The main questions I had were about startup
> fencing :-)
> 


Re: [ClusterLabs] Is corosync supposed to be restarted if it dies?

2017-11-29 Thread Jan Pokorný
On 29/11/17 22:00 +0100, Jan Pokorný wrote:
> On 28/11/17 22:35 +0300, Andrei Borzenkov wrote:
>> On 28.11.2017 13:01, Jan Pokorný wrote:
>>> On 27/11/17 17:43 +0300, Andrei Borzenkov wrote:
 Sent from my iPhone
 
> On 27 Nov 2017, at 14:36, Ferenc Wágner  wrote:
> 
> Andrei Borzenkov  writes:
> 
>> On 25.11.2017 10:05, Andrei Borzenkov wrote:
>> 
>>> One of the guides suggested killing the corosync process to simulate
>>> split brain. It actually worked on one cluster, but on another the
>>> corosync process was restarted after being killed without the cluster
>>> noticing anything. Except after several attempts pacemaker died,
>>> stopping resources ... :)
>>> 
>>> This is SLES12 SP2; I do not see any Restart in the service definition,
>>> so it is probably not systemd.
>>> 
>> FTR - it was not corosync but pacemaker; its unit file specifies
>> Restart=on-failure, so killing corosync caused pacemaker to fail and be
>> restarted by systemd.
> 
> And starting corosync via a Requires dependency?
 
 Exactly.
>>> 
>>> From my testing it looks like we should change
>>> "Requires=corosync.service" to "BindsTo=corosync.service"
>>> in pacemaker.service.
>>> 
>>> Could you give it a try?
>>> 
>> 
>> I'm not sure what the expected outcome is, but pacemaker.service is still
>> restarted (due to Restart=on-failure).
> 
> The expected outcome is that pacemaker.service will become
> "inactive (dead)" after killing corosync (as a result of pacemaker being
> "bound" to corosync).  Have you indeed issued "systemctl
> daemon-reload" after updating the pacemaker unit file?
> 
> (FTR, I tried with systemd 235).
> 
> If the intention is to unconditionally stop it when corosync dies,
> pacemaker should probably exit with a unique code and the unit file
> should have RestartPreventExitStatus set to it.
> 
> That would be an elaborate way to reach the same.
> 
> But it's a good point to question what the "best intention" is around these
> scenarios -- normally, fencing would happen, but as you note, the node
> actually survived by being fast enough to put corosync back to
> life, and from there it's unclear whether it adds any value to have
> pacemaker restarted on non-clean terminations at all.  I don't know.
> 
> Would it make more sense to have FailureAction=reboot-immediate to
> at least in part emulate the fencing instead?

Although the restart may also be blazingly fast in some cases, making
little difference except for forcibly taking down all the previously
running resources as an extra step, which may be either good or bad.

-- 
Jan (Poki)




Re: [ClusterLabs] Is corosync supposed to be restarted if it dies?

2017-11-29 Thread Jan Pokorný
On 28/11/17 22:35 +0300, Andrei Borzenkov wrote:
> On 28.11.2017 13:01, Jan Pokorný wrote:
>> On 27/11/17 17:43 +0300, Andrei Borzenkov wrote:
>>> Sent from my iPhone
>>> 
 On 27 Nov 2017, at 14:36, Ferenc Wágner  wrote:
 
 Andrei Borzenkov  writes:
 
> On 25.11.2017 10:05, Andrei Borzenkov wrote:
> 
>> One of the guides suggested killing the corosync process to simulate
>> split brain. It actually worked on one cluster, but on another the
>> corosync process was restarted after being killed without the cluster
>> noticing anything. Except after several attempts pacemaker died,
>> stopping resources ... :)
>> 
>> This is SLES12 SP2; I do not see any Restart in the service definition,
>> so it is probably not systemd.
>> 
> FTR - it was not corosync but pacemaker; its unit file specifies
> Restart=on-failure, so killing corosync caused pacemaker to fail and be
> restarted by systemd.
 
 And starting corosync via a Requires dependency?
>>> 
>>> Exactly.
>> 
>> From my testing it looks like we should change
>> "Requires=corosync.service" to "BindsTo=corosync.service"
>> in pacemaker.service.
>> 
>> Could you give it a try?
>> 
> 
> I'm not sure what the expected outcome is, but pacemaker.service is still
> restarted (due to Restart=on-failure).

The expected outcome is that pacemaker.service will become
"inactive (dead)" after killing corosync (as a result of pacemaker being
"bound" to corosync).  Have you indeed issued "systemctl
daemon-reload" after updating the pacemaker unit file?

(FTR, I tried with systemd 235).
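
For reference, one way to try this without editing the packaged unit file
is a systemd drop-in; just a sketch, using the standard systemd paths and
commands:

    # Add BindsTo=corosync.service on top of the shipped pacemaker.service,
    # then reload systemd so the override takes effect.
    mkdir -p /etc/systemd/system/pacemaker.service.d
    printf '[Unit]\nBindsTo=corosync.service\n' \
        > /etc/systemd/system/pacemaker.service.d/bindsto.conf
    systemctl daemon-reload
    systemctl cat pacemaker.service   # verify the drop-in is picked up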

> If the intention is to unconditionally stop it when corosync dies,
> pacemaker should probably exit with a unique code and the unit file
> should have RestartPreventExitStatus set to it.

That would be an elaborate way to reach the same.

But it's a good point to question what the "best intention" is around these
scenarios -- normally, fencing would happen, but as you note, the node
actually survived by being fast enough to put corosync back to
life, and from there it's unclear whether it adds any value to have
pacemaker restarted on non-clean terminations at all.  I don't know.

Would it make more sense to have FailureAction=reboot-immediate to
at least in part emulate the fencing instead?

-- 
Jan (Poki)




Re: [ClusterLabs] questions about startup fencing

2017-11-29 Thread Klaus Wenninger
On 11/29/2017 09:09 PM, Kristoffer Grönlund wrote:
> Adam Spiers  writes:
>
>> OK, so reading between the lines, if we don't want our cluster's
>> latest config changes accidentally discarded during a complete cluster
>> reboot, we should ensure that the last man standing is also the first
>> one booted up - right?
> That would make sense to me, but I don't know if it's the only
> solution. If you separately ensure that they all have the same
> configuration first, you could start them in any order I guess.

I guess it is not that bad: after the last man standing has left
the stage, it would take a quorate number of nodes (actually depending
on how many you allow to survive) before anything
happens again (equivalent to wait-for-all in 2-node clusters),
and one of those should have a reasonably current CIB.

>
>> If so, I think that's a perfectly reasonable thing to ask for, but
>> maybe it should be documented explicitly somewhere?  Apologies if it
>> is already and I missed it.
> Yeah, maybe a section discussing both starting and stopping a whole
> cluster would be helpful, but I don't know if I feel like I've thought
> about it enough myself. Regarding the HP Service Guard commands that
> Ulrich Windl mentioned, the very idea of such commands offends me on
> some level but I don't know if I can clearly articulate why. :D
>




Re: [ClusterLabs] questions about startup fencing

2017-11-29 Thread Kristoffer Grönlund
Adam Spiers  writes:

>
> OK, so reading between the lines, if we don't want our cluster's
> latest config changes accidentally discarded during a complete cluster
> reboot, we should ensure that the last man standing is also the first
> one booted up - right?

That would make sense to me, but I don't know if it's the only
solution. If you separately ensure that they all have the same
configuration first, you could start them in any order I guess.

>
> If so, I think that's a perfectly reasonable thing to ask for, but
> maybe it should be documented explicitly somewhere?  Apologies if it
> is already and I missed it.

Yeah, maybe a section discussing both starting and stopping a whole
cluster would be helpful, but I don't know if I feel like I've thought
about it enough myself. Regarding the HP Service Guard commands that
Ulrich Windl mentioned, the very idea of such commands offends me on
some level but I don't know if I can clearly articulate why. :D

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] questions about startup fencing

2017-11-29 Thread Adam Spiers

Kristoffer Gronlund  wrote:

Adam Spiers  writes:

Kristoffer Gronlund  wrote:

Adam Spiers  writes:


- The whole cluster is shut down cleanly.

- The whole cluster is then started up again.  (Side question: what
  happens if the last node to shut down is not the first to start up?
  How will the cluster ensure it has the most recent version of the
  CIB?  Without that, how would it know whether the last man standing
  was shut down cleanly or not?)


This is my opinion, I don't really know what the "official" pacemaker
stance is: There is no such thing as shutting down a cluster cleanly. A
cluster is a process stretching over multiple nodes - if they all shut
down, the process is gone. When you start up again, you effectively have
a completely new cluster.


Sorry, I don't follow you at all here.  When you start the cluster up
again, the cluster config from before the shutdown is still there.
That's very far from being a completely new cluster :-)


You have a new cluster with (possibly fragmented) memories of a previous
life ;)


Well yeah, that's another way of describing it :-)


Yes, exactly.  If the first node to start up was not the last man
standing, the CIB history is effectively being forked.  So how is this
issue avoided?


The only way to bring up a cluster from being completely stopped is to
treat it as creating a completely new cluster. The first node to start
"creates" the cluster and later nodes join that cluster.


That's ignoring the cluster config, which persists even when the
cluster's down.


There could be a command in pacemaker which resets a set of nodes to a
common known state, basically to pick the CIB from one of the nodes as
the survivor and copy that to all of them. But in the end, that's just
the same thing as just picking one node as the first node, and telling
the others to join that one and to discard their configurations. So,
treating it as a new cluster.


OK, so reading between the lines, if we don't want our cluster's
latest config changes accidentally discarded during a complete cluster
reboot, we should ensure that the last man standing is also the first
one booted up - right?

If so, I think that's a perfectly reasonable thing to ask for, but
maybe it should be documented explicitly somewhere?  Apologies if it
is already and I missed it.



Re: [ClusterLabs] cluster with two ESX server

2017-11-29 Thread Klaus Wenninger
On 11/29/2017 08:24 PM, Andrei Borzenkov wrote:
> On 29.11.2017 20:14, Klaus Wenninger wrote:
>> On 11/28/2017 07:41 PM, Andrei Borzenkov wrote:
>>> On 28.11.2017 10:45, Ramann, Björn wrote:
 hi@all,

 in my configuration, the 1st node runs on ESX1 and the second runs on ESX2. Now
 I'm looking for a way to configure the cluster fence/stonith with two ESX
 servers - is this possible?
>>> if you have shared storage, SBD may be an option.
>> True.
>> And if you feel like experimenting you can have a look at
>> https://github.com/wenningerk/sbd/tree/vmware.
>>
>> On ESX you don't have virtual watchdog-devices with
>> a kernel-driver sitting on top (contrary to e.g.
>> with qemu-kvm).
>> This basically is a test-implementation using
>> vSphere HA Application Monitoring as a replacement.
>>
> This sure sounds interesting. Does it work with open-vm-tools or does it
> require VMware tools?

Unfortunately, with neither.
You need libappmonitorlib.so from the GuestSDK, which I didn't
find anywhere else.
Apart from that library you are fine with open-vm-tools.
See VMware_GuestSDK.spec from my github-repo for details.

When setting up a vSphere Cluster enable Application
Monitoring and check that the following is true.

('Failure interval' = 'Minimum uptime') * 'Maximum per-VM resets' ==
'Maximum reset time window'

Otherwise your 'watchdog' will stop working after 3 resets
till the reset time window is over (maybe never).

Regards,
Klaus

>
>> In comparison to using softdog this approach doesn't rely
>> on any working code inside vm to trigger a reboot.
>>
>>  [root@node4 ~]# sbd query-watchdog
>>
>>   Discovered 3 watchdog devices:
>>
>>   [1] vmware
>>   Identity: VMware Application Monitoring (gray)
>>   Driver: 
>>
>>   [2] /dev/watchdog
>>   Identity: Software Watchdog
>>   Driver: softdog
>>   CAUTION: Not recommended for use with sbd.
>>
>>   [3] /dev/watchdog0
>>   Identity: Software Watchdog
>>   Driver: softdog
>>   CAUTION: Not recommended for use with sbd.
>>
>>
>> Have in mind that this is just a proof-of-concept
>> implementation. So expect any kind of changes and
>> be aware that in the current state it is definitely
>> not fit to go into any distribution.
>>
>> Regarding building you can find VMware_GuestSDK.spec
>> in the vmware-branch of my sbd-fork.
>> Basically this builds rpms from the vmware-GuestSDK-tarball -
>> both library-binary-rpm for the target and devel-rpm
>> for building vmware-enabled-sbd.
>>
>> Regards,
>> Klaus
>>
 I tried to use fence_vmware with vCenter, but then vCenter is a single
 point of failure and running two vCenters is currently not possible.

>>> You can run vCenter on a vFT VM, in which case it should be pretty robust.
>>>


Re: [ClusterLabs] cluster with two ESX server

2017-11-29 Thread Andrei Borzenkov
On 29.11.2017 20:14, Klaus Wenninger wrote:
> On 11/28/2017 07:41 PM, Andrei Borzenkov wrote:
>> On 28.11.2017 10:45, Ramann, Björn wrote:
>>> hi@all,
>>>
>>> in my configuration, the 1st node runs on ESX1 and the second runs on ESX2. Now
>>> I'm looking for a way to configure the cluster fence/stonith with two ESX
>>> servers - is this possible?
>> if you have shared storage, SBD may be an option.
> 
> True.
> And if you feel like experimenting you can have a look at
> https://github.com/wenningerk/sbd/tree/vmware.
> 
> On ESX you don't have virtual watchdog-devices with
> a kernel-driver sitting on top (contrary to e.g.
> with qemu-kvm).
> This basically is a test-implementation using
> vSphere HA Application Monitoring as a replacement.
> 

This sure sounds interesting. Does it work with open-vm-tools or does it
require VMware tools?

> In comparison to using softdog this approach doesn't rely
> on any working code inside vm to trigger a reboot.
> 
>  [root@node4 ~]# sbd query-watchdog
> 
>   Discovered 3 watchdog devices:
> 
>   [1] vmware
>   Identity: VMware Application Monitoring (gray)
>   Driver: 
> 
>   [2] /dev/watchdog
>   Identity: Software Watchdog
>   Driver: softdog
>   CAUTION: Not recommended for use with sbd.
> 
>   [3] /dev/watchdog0
>   Identity: Software Watchdog
>   Driver: softdog
>   CAUTION: Not recommended for use with sbd.
> 
> 
> Have in mind that this is just a proof-of-concept
> implementation. So expect any kind of changes and
> be aware that in the current state it is definitely
> not fit to go into any distribution.
> 
> Regarding building you can find VMware_GuestSDK.spec
> in the vmware-branch of my sbd-fork.
> Basically this builds rpms from the vmware-GuestSDK-tarball -
> both library-binary-rpm for the target and devel-rpm
> for building vmware-enabled-sbd.
> 
> Regards,
> Klaus
> 
>>
>>> I tried to use fence_vmware with vCenter, but then vCenter is a single
>>> point of failure and running two vCenters is currently not possible.
>>>
>> You can run vCenter on a vFT VM, in which case it should be pretty robust.
>>


Re: [ClusterLabs] questions about startup fencing

2017-11-29 Thread Kristoffer Grönlund
Adam Spiers  writes:

> Kristoffer Gronlund  wrote:
>>Adam Spiers  writes:
>>
>>> - The whole cluster is shut down cleanly.
>>>
>>> - The whole cluster is then started up again.  (Side question: what
>>>   happens if the last node to shut down is not the first to start up?
>>>   How will the cluster ensure it has the most recent version of the
>>>   CIB?  Without that, how would it know whether the last man standing
>>>   was shut down cleanly or not?)
>>
>>This is my opinion, I don't really know what the "official" pacemaker
>>stance is: There is no such thing as shutting down a cluster cleanly. A
>>cluster is a process stretching over multiple nodes - if they all shut
>>down, the process is gone. When you start up again, you effectively have
>>a completely new cluster.
>
> Sorry, I don't follow you at all here.  When you start the cluster up
> again, the cluster config from before the shutdown is still there.
> That's very far from being a completely new cluster :-)

You have a new cluster with (possibly fragmented) memories of a previous
life ;)

>
> Yes, exactly.  If the first node to start up was not the last man
> standing, the CIB history is effectively being forked.  So how is this
> issue avoided?
>
>>The only way to bring up a cluster from being completely stopped is to
>>treat it as creating a completely new cluster. The first node to start
>>"creates" the cluster and later nodes join that cluster.
>
> That's ignoring the cluster config, which persists even when the
> cluster's down.

There could be a command in pacemaker which resets a set of nodes to a
common known state, basically to pick the CIB from one of the nodes as
the survivor and copy that to all of them. But in the end, that's just
the same thing as just picking one node as the first node, and telling
the others to join that one and to discard their configurations. So,
treating it as a new cluster.

>
> But to be clear, you picked a small side question from my original
> post and answered that.  The main questions I had were about startup
> fencing :-)

I did! :)

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] questions about startup fencing

2017-11-29 Thread Adam Spiers

Klaus Wenninger  wrote:

On 11/29/2017 04:23 PM, Kristoffer Grönlund wrote:

Adam Spiers  writes:


- The whole cluster is shut down cleanly.

- The whole cluster is then started up again.  (Side question: what
  happens if the last node to shut down is not the first to start up?
  How will the cluster ensure it has the most recent version of the
  CIB?  Without that, how would it know whether the last man standing
  was shut down cleanly or not?)

This is my opinion, I don't really know what the "official" pacemaker
stance is: There is no such thing as shutting down a cluster cleanly. A
cluster is a process stretching over multiple nodes - if they all shut
down, the process is gone. When you start up again, you effectively have
a completely new cluster.

When starting up, how is the cluster, at any point, to know if the
cluster it has knowledge of is the "latest" cluster? The next node could
have a newer version of the CIB which adds yet more nodes to the
cluster.


To make it even clearer imagine a node being reverted
to a previous state by recovering it from a backup.


Yes, I'm asking how this kind of scenario is dealt with :-)

Another example is a config change being made after one or more of the
cluster nodes had already been shut down.



Re: [ClusterLabs] questions about startup fencing

2017-11-29 Thread Adam Spiers

Kristoffer Gronlund  wrote:

Adam Spiers  writes:


- The whole cluster is shut down cleanly.

- The whole cluster is then started up again.  (Side question: what
  happens if the last node to shut down is not the first to start up?
  How will the cluster ensure it has the most recent version of the
  CIB?  Without that, how would it know whether the last man standing
  was shut down cleanly or not?)


This is my opinion, I don't really know what the "official" pacemaker
stance is: There is no such thing as shutting down a cluster cleanly. A
cluster is a process stretching over multiple nodes - if they all shut
down, the process is gone. When you start up again, you effectively have
a completely new cluster.


Sorry, I don't follow you at all here.  When you start the cluster up
again, the cluster config from before the shutdown is still there.
That's very far from being a completely new cluster :-)


When starting up, how is the cluster, at any point, to know if the
cluster it has knowledge of is the "latest" cluster?


That was exactly my question.


The next node could have a newer version of the CIB which adds yet
more nodes to the cluster.


Yes, exactly.  If the first node to start up was not the last man
standing, the CIB history is effectively being forked.  So how is this
issue avoided?


The only way to bring up a cluster from being completely stopped is to
treat it as creating a completely new cluster. The first node to start
"creates" the cluster and later nodes join that cluster.


That's ignoring the cluster config, which persists even when the
cluster's down.

But to be clear, you picked a small side question from my original
post and answered that.  The main questions I had were about startup
fencing :-)



Re: [ClusterLabs] cluster with two ESX server

2017-11-29 Thread Klaus Wenninger
On 11/28/2017 07:41 PM, Andrei Borzenkov wrote:
> On 28.11.2017 10:45, Ramann, Björn wrote:
>> hi@all,
>>
>> in my configuration, the 1st node runs on ESX1 and the second runs on ESX2. Now
>> I'm looking for a way to configure the cluster fence/stonith with two ESX
>> servers - is this possible?
> if you have shared storage, SBD may be an option.

True.
And if you feel like experimenting you can have a look at
https://github.com/wenningerk/sbd/tree/vmware.

On ESX you don't have virtual watchdog-devices with
a kernel-driver sitting on top (contrary to e.g.
with qemu-kvm).
This basically is a test-implementation using
vSphere HA Application Monitoring as a replacement.

In comparison to using softdog this approach doesn't rely
on any working code inside vm to trigger a reboot.

 [root@node4 ~]# sbd query-watchdog

  Discovered 3 watchdog devices:

  [1] vmware
  Identity: VMware Application Monitoring (gray)
  Driver: 

  [2] /dev/watchdog
  Identity: Software Watchdog
  Driver: softdog
  CAUTION: Not recommended for use with sbd.

  [3] /dev/watchdog0
  Identity: Software Watchdog
  Driver: softdog
  CAUTION: Not recommended for use with sbd.


Have in mind that this is just a proof-of-concept
implementation. So expect any kind of changes and
be aware that in the current state it is definitely
not fit to go into any distribution.

Regarding building you can find VMware_GuestSDK.spec
in the vmware-branch of my sbd-fork.
Basically this builds rpms from the vmware-GuestSDK-tarball -
both library-binary-rpm for the target and devel-rpm
for building vmware-enabled-sbd.
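
If you go the shared-storage route Andrei mentioned instead, a minimal
disk-based SBD setup could look roughly like the sketch below; the device
path is made up and the crmsh syntax is only an example, so adapt it to
your distribution:

    # Shared LUN visible to both VMs (hypothetical path).
    SBD_DEV=/dev/disk/by-id/scsi-EXAMPLE-SHARED-LUN

    sbd -d "$SBD_DEV" create    # write the SBD header/slots to the device
    sbd -d "$SBD_DEV" list      # verify the slots are readable

    # On both nodes: tell the sbd daemon which device to use, then restart
    # the cluster stack so sbd comes up together with corosync/pacemaker.
    echo "SBD_DEVICE=$SBD_DEV" >> /etc/sysconfig/sbd

    # Fencing resource plus stonith enabled (crmsh as an example):
    crm configure primitive stonith-sbd stonith:external/sbd
    crm configure property stonith-enabled=true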

Regards,
Klaus

>
>> I tried to use fence_vmware with vCenter, but then vCenter is a single
>> point of failure and running two vCenters is currently not possible.
>>
> You can run vCenter on a vFT VM, in which case it should be pretty robust.
>


Re: [ClusterLabs] questions about startup fencing

2017-11-29 Thread Gao,Yan

On 11/29/2017 04:54 PM, Ken Gaillot wrote:

On Wed, 2017-11-29 at 14:22 +, Adam Spiers wrote:

The same questions apply if this troublesome node was actually a
remote node running pacemaker_remoted, rather than the 5th node in
the
cluster.


Remote nodes don't join at the crmd level as cluster nodes do, so they
don't "start up" in the same sense, and start-up fencing doesn't apply
to them. Instead, the cluster initiates the connection when called for
(I don't remember for sure whether it fences the remote node if the
connection fails, but that would make sense).
According to link_rsc2remotenode() and handle_startup_fencing(), similar 
startup-fencing applies to remote nodes too. So if a remote resource 
fails to start, the remote node will be fenced. The global setting 
startup-fencing=false will change the behavior for remote nodes too.
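
For completeness, the option is set like any other cluster property, e.g.
(either tool works; note the documentation's warning that turning startup
fencing off is very unsafe):

    # crmsh
    crm configure property startup-fencing=false
    # pcs
    pcs property set startup-fencing=false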


Regards,
  Yan



Re: [ClusterLabs] questions about startup fencing

2017-11-29 Thread Ken Gaillot
On Wed, 2017-11-29 at 14:22 +, Adam Spiers wrote:
> Hi all,
> 
> A colleague has been valiantly trying to help me belatedly learn
> about
> the intricacies of startup fencing, but I'm still not fully
> understanding some of the finer points of the behaviour.
> 
> The documentation on the "startup-fencing" option[0] says
> 
> Advanced Use Only: Should the cluster shoot unseen nodes? Not
> using the default is very unsafe!
> 
> and that it defaults to TRUE, but doesn't elaborate any further:
> 
> https://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/s-cluster-options.html
> 
> Let's imagine the following scenario:
> 
> - We have a 5-node cluster, with all nodes running cleanly.
> 
> - The whole cluster is shut down cleanly.
> 
> - The whole cluster is then started up again.  (Side question: what
>   happens if the last node to shut down is not the first to start up?
>   How will the cluster ensure it has the most recent version of the
>   CIB?  Without that, how would it know whether the last man standing
>   was shut down cleanly or not?)

Of course, the cluster can't know what CIB version nodes it doesn't see
have, so if a set of nodes is started with an older version, it will go
with that.

However, a node can't do much without quorum, so it would be difficult
to get in a situation where CIB changes were made with quorum before
shutdown, but none of those nodes are present at the next start-up with
quorum.

In any case, when a new node joins a cluster, the nodes do compare CIB
versions. If the new node has a newer CIB, the cluster will use it. If
other changes have been made since then, the newest CIB wins, so one or
the other's changes will be lost.
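
If you want to check this by hand before bringing the nodes back, the
version counters the comparison is based on are visible in the stored CIB.
A sketch (the CIB path below is the usual default):

    # On each node, while the cluster is down:
    grep -o -E '(admin_epoch|epoch|num_updates)="[0-9]+"' \
        /var/lib/pacemaker/cib/cib.xml

    # On a running node the same fields appear in the cib tag:
    cibadmin --query | head -n 1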

Whether missing nodes were shut down cleanly or not relates to your
next question ...

> - 4 of the nodes boot up fine and rejoin the cluster within the
>   dc-deadtime interval, forming a quorum, but the 5th doesn't.
> 
> IIUC, with startup-fencing enabled, this will result in that 5th node
> automatically being fenced.  If I'm right, is that really *always*
> necessary?

It's always safe. :-) As you mentioned, if the missing node was the
last one alive in the previous run, the cluster can't know whether it
shut down cleanly or not. Even if the node was known to shut down
cleanly in the last run, the cluster still can't know whether the node
was started since then and is now merely unreachable. So, fencing is
necessary to ensure it's not accessing resources.

The same scenario is why a single node can't have quorum at start-up in
a cluster with "two_node" set. Both nodes have to see each other at
least once before they can assume it's safe to do anything.

> Let's suppose further that the cluster configuration is such that no
> stateful resources which could potentially conflict with other nodes
> will ever get launched on that 5th node.  For example it might only
> host stateless clones, or resources with requires=nothing set, or it
> might not even host any resources at all due to some temporary
> constraints which have been applied.
> 
> In those cases, what is to be gained from fencing?  The only thing I
> can think of is that using (say) IPMI to power-cycle the node *might*
> fix whatever issue was preventing it from joining the cluster.  Are
> there any other reasons for fencing in this case?  It wouldn't help
> avoid any data corruption, at least.

Just because constraints are telling the node it can't run a resource
doesn't mean the node isn't malfunctioning and running it anyway. If
the node can't tell us it's OK, we have to assume it's not.

> Now let's imagine the same scenario, except rather than a clean full
> cluster shutdown, all nodes were affected by a power cut, but also
> this time the whole cluster is configured to *only* run stateless
> clones, so there is no risk of conflict between two nodes
> accidentally
> running the same resource.  On startup, the 4 nodes in the quorum
> have
> no way of knowing that the 5th node was also affected by the power
> cut, so in theory from their perspective it could still be running a
> stateless clone.  Again, is there anything to be gained from fencing
> the 5th node once it exceeds the dc-deadtime threshold for joining,
> other than the chance that a reboot might fix whatever was preventing
> it from joining, and get the cluster back to full strength?

If a cluster runs only services that have no potential to conflict,
then you don't need a cluster. :-)

Unique clones require communication even if they're stateless (think
IPaddr2). I'm pretty sure even some anonymous stateless clones require
communication to avoid issues.

> Also, when exactly does the dc-deadtime timer start ticking?
> Is it reset to zero after a node is fenced, so that potentially that
> node could go into a reboot loop if dc-deadtime is set too low?

A node's crmd starts the timer at start-up and whenever a new election
starts, and is stopped when the DC makes it a join offer. I don't t

[ClusterLabs] Antw: Re: questions about startup fencing

2017-11-29 Thread Ulrich Windl



> Adam Spiers  writes:
> 
>> - The whole cluster is shut down cleanly.
>>
>> - The whole cluster is then started up again.  (Side question: what
>>   happens if the last node to shut down is not the first to start up?
>>   How will the cluster ensure it has the most recent version of the
>>   CIB?  Without that, how would it know whether the last man standing
>>   was shut down cleanly or not?)
> 
> This is my opinion, I don't really know what the "official" pacemaker
> stance is: There is no such thing as shutting down a cluster cleanly. A
> cluster is a process stretching over multiple nodes - if they all shut
> down, the process is gone. When you start up again, you effectively have
> a completely new cluster.
> 
> When starting up, how is the cluster, at any point, to know if the
> cluster it has knowledge of is the "latest" cluster? The next node could
> have a newer version of the CIB which adds yet more nodes to the
> cluster.
> 
> The only way to bring up a cluster from being completely stopped is to
> treat it as creating a completely new cluster. The first node to start
> "creates" the cluster and later nodes join that cluster.

I think it is (once again) a problem of pacemaker: In HP Service Guard there
was a "cmhaltnode" to halt a node, and a "cmhaltcluster" (AFAIR) to halt the
whole cluster. The other direction was "cmrunnode" and "cmruncluster" (AFAIR).

So when doing it at the cluster level, all nodes end up with the same
information (and can start with the "latest")...

> 
> Cheers,
> Kristoffer
> 
> -- 
> // Kristoffer Grönlund
> // kgronl...@suse.com 
> 


Re: [ClusterLabs] questions about startup fencing

2017-11-29 Thread Klaus Wenninger
On 11/29/2017 04:23 PM, Kristoffer Grönlund wrote:
> Adam Spiers  writes:
>
>> - The whole cluster is shut down cleanly.
>>
>> - The whole cluster is then started up again.  (Side question: what
>>   happens if the last node to shut down is not the first to start up?
>>   How will the cluster ensure it has the most recent version of the
>>   CIB?  Without that, how would it know whether the last man standing
>>   was shut down cleanly or not?)
> This is my opinion, I don't really know what the "official" pacemaker
> stance is: There is no such thing as shutting down a cluster cleanly. A
> cluster is a process stretching over multiple nodes - if they all shut
> down, the process is gone. When you start up again, you effectively have
> a completely new cluster.
>
> When starting up, how is the cluster, at any point, to know if the
> cluster it has knowledge of is the "latest" cluster? The next node could
> have a newer version of the CIB which adds yet more nodes to the
> cluster.

To make it even clearer imagine a node being reverted
to a previous state by recovering it from a backup.

Regards,
Klaus

>
> The only way to bring up a cluster from being completely stopped is to
> treat it as creating a completely new cluster. The first node to start
> "creates" the cluster and later nodes join that cluster.
>
> Cheers,
> Kristoffer
>




Re: [ClusterLabs] questions about startup fencing

2017-11-29 Thread Kristoffer Grönlund
Adam Spiers  writes:

> - The whole cluster is shut down cleanly.
>
> - The whole cluster is then started up again.  (Side question: what
>   happens if the last node to shut down is not the first to start up?
>   How will the cluster ensure it has the most recent version of the
>   CIB?  Without that, how would it know whether the last man standing
>   was shut down cleanly or not?)

This is my opinion, I don't really know what the "official" pacemaker
stance is: There is no such thing as shutting down a cluster cleanly. A
cluster is a process stretching over multiple nodes - if they all shut
down, the process is gone. When you start up again, you effectively have
a completely new cluster.

When starting up, how is the cluster, at any point, to know if the
cluster it has knowledge of is the "latest" cluster? The next node could
have a newer version of the CIB which adds yet more nodes to the
cluster.

The only way to bring up a cluster from being completely stopped is to
treat it as creating a completely new cluster. The first node to start
"creates" the cluster and later nodes join that cluster.

Cheers,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] building from source

2017-11-29 Thread Ken Gaillot
On Tue, 2017-11-28 at 11:23 -0800, Aaron Cody wrote:
> I'm trying to build all of the pacemaker/corosync components from
> source instead of using the redhat rpms - I have a few questions.
> 
> I'm building on redhat 7.2 and so far I have been able to build:
> 
> libqb 1.0.2
> pacemaker 1.1.18
> corosync 2.4.3
> resource-agents 4.0.1
> 
> however I have not been able to build pcs yet, i'm getting ruby
> errors:
> 
> sudo make install_pcsd
> which: no python3 in (/sbin:/bin:/usr/sbin:/usr/bin)
> make -C pcsd build_gems
> make[1]: Entering directory `/home/whacuser/pcs/pcsd'
> bundle package
> `ruby_22` is not a valid platform. The available options are: [:ruby,
> :ruby_18, :ruby_19, :ruby_20, :ruby_21, :mri, :mri_18, :mri_19,
> :mri_20, :mri_21, :rbx, :jruby,
> :jruby_18, :jruby_19, :mswin, :mingw, :mingw_18, :mingw_19,
> :mingw_20, :mingw_21, :x64_mingw, :x64_mingw_20, :x64_mingw_21]
> make[1]: *** [get_gems] Error 4
> make[1]: Leaving directory `/home/whacuser/pcs/pcsd'
> make: *** [install_pcsd] Error 2
> 
> 
> Q1: Is this the complete set of components I need to build?

Not considering pcs, yes.
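
For reference, each of those components builds with the usual autotools
sequence; a rough sketch (the configure flags and prefixes are only
examples, not required values):

    for src in libqb corosync pacemaker resource-agents; do
        ( cd "$src" &&
          ./autogen.sh &&
          ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var &&
          make &&
          sudo make install )
    done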

> Q2: do I need cluster-glue?

It's only used now to be able to use heartbeat-style fence agents. If
you have what you need in Red Hat's fence agent packages, you don't
need it.

> Q3: any idea how I can get past the build error with pcsd?
> Q4: if I use the pcs rpm instead of building pcs from source, I see
> an error when my cluster starts up 'unable to get cib'. This didn't
> happen when I was using the redhat rpms, so i'm wondering what i'm
> missing...
> 
> thanks

pcs development is closely tied to Red Hat releases, so it's hit-or-miss
mixing and matching pcs and RHEL versions. Upgrading to RHEL 7.4
would get you recent versions of everything, though, so that would be
easiest if it's an option.
-- 
Ken Gaillot 



[ClusterLabs] questions about startup fencing

2017-11-29 Thread Adam Spiers

Hi all,

A colleague has been valiantly trying to help me belatedly learn about
the intricacies of startup fencing, but I'm still not fully
understanding some of the finer points of the behaviour.

The documentation on the "startup-fencing" option[0] says

   Advanced Use Only: Should the cluster shoot unseen nodes? Not
   using the default is very unsafe!

and that it defaults to TRUE, but doesn't elaborate any further:

   
https://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/s-cluster-options.html

Let's imagine the following scenario:

- We have a 5-node cluster, with all nodes running cleanly.

- The whole cluster is shut down cleanly.

- The whole cluster is then started up again.  (Side question: what
 happens if the last node to shut down is not the first to start up?
 How will the cluster ensure it has the most recent version of the
 CIB?  Without that, how would it know whether the last man standing
 was shut down cleanly or not?)

- 4 of the nodes boot up fine and rejoin the cluster within the
 dc-deadtime interval, forming a quorum, but the 5th doesn't.

IIUC, with startup-fencing enabled, this will result in that 5th node
automatically being fenced.  If I'm right, is that really *always*
necessary?

Let's suppose further that the cluster configuration is such that no
stateful resources which could potentially conflict with other nodes
will ever get launched on that 5th node.  For example it might only
host stateless clones, or resources with requires=nothing set, or it
might not even host any resources at all due to some temporary
constraints which have been applied.

In those cases, what is to be gained from fencing?  The only thing I
can think of is that using (say) IPMI to power-cycle the node *might*
fix whatever issue was preventing it from joining the cluster.  Are
there any other reasons for fencing in this case?  It wouldn't help
avoid any data corruption, at least.

Now let's imagine the same scenario, except rather than a clean full
cluster shutdown, all nodes were affected by a power cut, but also
this time the whole cluster is configured to *only* run stateless
clones, so there is no risk of conflict between two nodes accidentally
running the same resource.  On startup, the 4 nodes in the quorum have
no way of knowing that the 5th node was also affected by the power
cut, so in theory from their perspective it could still be running a
stateless clone.  Again, is there anything to be gained from fencing
the 5th node once it exceeds the dc-deadtime threshold for joining,
other than the chance that a reboot might fix whatever was preventing
it from joining, and get the cluster back to full strength?

Also, when exactly does the dc-deadtime timer start ticking?
Is it reset to zero after a node is fenced, so that potentially that
node could go into a reboot loop if dc-deadtime is set too low?

The same questions apply if this troublesome node was actually a
remote node running pacemaker_remoted, rather than the 5th node in the
cluster.

I have an uncomfortable feeling that I'm missing something obvious,
probably due to the documentation's warning that "Not using the
default [for startup-fencing] is very unsafe!"  Or is it only unsafe
when the resource which exceeded dc-deadtime on startup could
potentially be running a stateful resource which the cluster now wants
to restart elsewhere?  If that's the case, would it be possible to
optionally limit startup fencing to when it's really needed?

Thanks for any light you can shed!



[ClusterLabs] building from source

2017-11-29 Thread Aaron Cody
I'm trying to build all of the pacemaker/corosync components from
source instead of using the redhat rpms - I have a few questions.

I'm building on redhat 7.2 and so far I have been able to build:

libqb 1.0.2
pacemaker 1.1.18
corosync 2.4.3
resource-agents 4.0.1

however I have not been able to build pcs yet, i'm getting ruby errors:

sudo make install_pcsd
which: no python3 in (/sbin:/bin:/usr/sbin:/usr/bin)
make -C pcsd build_gems
make[1]: Entering directory `/home/whacuser/pcs/pcsd'
bundle package
`ruby_22` is not a valid platform. The available options are: [:ruby,
:ruby_18, :ruby_19, :ruby_20, :ruby_21, :mri, :mri_18, :mri_19, :mri_20,
:mri_21, :rbx, :jruby,
:jruby_18, :jruby_19, :mswin, :mingw, :mingw_18, :mingw_19, :mingw_20,
:mingw_21, :x64_mingw, :x64_mingw_20, :x64_mingw_21]
make[1]: *** [get_gems] Error 4
make[1]: Leaving directory `/home/whacuser/pcs/pcsd'
make: *** [install_pcsd] Error 2


Q1: Is this the complete set of components I need to build?
Q2: do I need cluster-glue?
Q3: any idea how I can get past the build error with pcsd?
Q4: if I use the pcs rpm instead of building pcs from source, I see an
error when my cluster starts up 'unable to get cib'. This didn't happen
when I was using the redhat rpms, so i'm wondering what i'm missing...

thanks