Re: [ClusterLabs] Fwd: FW: heartbeat can monitor virtual IP alive or not .

2016-04-21 Thread Digimer
ter for mailing lists). > Could you explain in *heartbeat can monitor virtual IP alive or not* > please ? thank a lot. Pacemaker can do this just fine. It's one of the initial examples in "Clusters from Scratch"; http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/

Re: [ClusterLabs] STONITH Fencing for Amazon EC2

2016-08-02 Thread Digimer
thing else I should be looking for? I *think* it fell behind (fence_ec2, iirc). It might need to be picked up, updated/tested and then it can be re-added to the official list. I'm not 100% on this though, so if someone contradicts me, ignore me. -- Digimer Papers and Projects: https://alteev

[ClusterLabs] clvmd in rgmanager with self_fence="true"

2016-08-02 Thread Digimer
ve come up frustratingly blank. If anyone can give me a clue, I would be very grateful. :) digimer 1. https://bugzilla.redhat.com/show_bug.cgi?id=1349755 -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to

Re: [ClusterLabs] Default Behavior

2016-06-28 Thread Digimer
> *Vladimir Pavlov* > > > > > > ___ > Users mailing list: Users@clusterlabs.org > http://clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clu

Re: [ClusterLabs] Proposed change for 1.1.16: ending python 2.6 compatibility

2016-07-05 Thread Digimer
I say go for it. A key to good HA is simplicity, and maintaining two branches (or getting stuck on a dead-end branch) seems to go against that ethos. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education?

Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster

2016-08-04 Thread Digimer
On 04/08/16 11:44 PM, Andrei Borzenkov wrote: > 05.08.2016 02:33, Digimer пишет: >> On 04/08/16 07:21 PM, Dan Swartzendruber wrote: >>> On 2016-08-04 19:03, Digimer wrote: >>>> On 04/08/16 06:56 PM, Dan Swartzendruber wrote: >>>>> I'm setting up an

Re: [ClusterLabs] Antw: Fencing with a 3-node (1 for quorum only) cluster

2016-08-05 Thread Digimer
of the hosts. >> All it is for is quorum. So, looking at fencing next. The primary > > I wonder what happens if the machine where the VM runs crashes (2 of 3 nodes > down). 2 of 3 dead is loss of quorum. Surviving node stops offering cluster services when it could have otherwise su
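The quorum arithmetic in this reply can be sketched in a few lines. This is only an illustration, assuming one vote per node and a strict-majority rule; the function name `has_quorum` is hypothetical and is not part of corosync or pacemaker.

```python
def has_quorum(total_votes: int, live_votes: int) -> bool:
    """Return True when the live partition holds a strict majority of votes.

    Assumption: every node carries exactly one vote and no tie-breaker
    (qdevice, two_node mode, etc.) is in play.
    """
    if total_votes <= 0 or not 0 <= live_votes <= total_votes:
        raise ValueError("vote counts out of range")
    return live_votes > total_votes // 2


if __name__ == "__main__":
    # Three-node cluster with two nodes dead: the lone survivor is inquorate.
    print(has_quorum(3, 1))
    # Three-node cluster with one node dead: the two survivors stay quorate.
    print(has_quorum(3, 2))
```

Under these assumptions a lone survivor of a three-node cluster never has a majority, which is the "2 of 3 dead is loss of quorum" case described above.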

Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster

2016-08-06 Thread Digimer
On 06/08/16 08:22 PM, Dan Swartzendruber wrote: > On 2016-08-06 19:46, Digimer wrote: >> On 06/08/16 07:33 PM, Dan Swartzendruber wrote: >>> >>> Okay, I almost have this all working. fence_ipmilan for the supermicro >>> host. Had to specify lanplus for i

Re: [ClusterLabs] Recovering after split-brain

2016-06-20 Thread Digimer
On 20/06/16 05:58 PM, Dimitri Maziuk wrote: > On 06/20/2016 03:58 PM, Digimer wrote: > >> Then wouldn't it be a lot better to just run your services on both nodes >> all the time and take HA out of the picture? Availability is predicated >> on building the simplest syst

Re: [ClusterLabs] [ClusterLabs Developers] HA/Clusterlabs Summit 2017 Proposal

2017-01-31 Thread Digimer
On 31/01/17 03:19 AM, Kristoffer Grönlund wrote: > Digimer <li...@alteeve.ca> writes: > >> On 30/01/17 09:23 AM, Kristoffer Grönlund wrote: >>> Hi everyone! >>> >>> The last time we had an HA summit was in 2015, and the intention then >>> was

Re: [ClusterLabs] [ClusterLabs Developers] HA/Clusterlabs Summit 2017 Proposal

2017-01-31 Thread Digimer
lier (change to Wed/Thu instead of Thu/Fri) to >> make it easier for people traveling to/from the conference. > > Hi Chris, > > Sounds great! Happy to move it to September 6-7 if that works out > better. > > Cheers, > Kristoffer I've updated the wiki to set the

Re: [ClusterLabs] [ClusterLabs Developers] HA/Clusterlabs Summit 2017 Proposal

2017-01-31 Thread Digimer
nformal sight seeing outing for the following weekend, whether it be held Wed/Thu or Thu/Fri. The last few times I've been to Europe, I afforded myself little to no time to see any sights. I don't plan to rush out this time, and would love to have some friendly company. :) -- Digimer Papers an

Re: [ClusterLabs] Releasing crmsh version 3.0.0

2017-01-31 Thread Digimer
pensuse.org/repositories/network:/ha-clustering:/Stable/ > > Archives of the tagged release: > > * https://github.com/ClusterLabs/crmsh/archive/3.0.0.tar.gz > * https://github.com/ClusterLabs/crmsh/archive/3.0.0.zip > > As usual, a huge thank you to all contributors and users o

Re: [ClusterLabs] resource-agents v4.0.0

2017-01-31 Thread Digimer
: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org -- Digimer Papers and Projects: https://alteeve.com/w/ "I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty

Re: [ClusterLabs] HA/Clusterlabs Summit 2017 Proposal

2017-02-06 Thread Digimer
there are, though, is here: http://plan.alteeve.ca/index.php/HA_Cluster_Summit_2015 Please feel free to comment/edit as you wish. I can set up an account on the wiki if you don't have one from last time (I only close it normally to keep the spammers out). digimer On 07/02/17 12:47 AM, Gang He wrote: >

Re: [ClusterLabs] Primitive and clone relation .

2017-02-05 Thread Digimer
/clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Clusters_from_Scratch/index.html#_ensure_resources_run_on_the_same_host -- Digimer Papers and Projects: https://alteeve.com/w/ "I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certai

Re: [ClusterLabs] Pacemaker kill does not cause node fault ???

2017-01-30 Thread Digimer
running. If you can pass both of these tests, you will have simulated most all possible node failure modes (I say 'most' because it is impossible to think of everything :) ). -- Digimer Papers and Projects: https://alteeve.com/w/ "I am, somehow, less interested in the weight and convoluti

Re: [ClusterLabs] snapshots in a clvm environment - some questions for proceeding

2017-02-24 Thread Digimer
system and applications are in a clean state when you take the snapshot, so using the image is like recovering from sudden power loss. If data was in cache but not flushed out, you could have corruption. If you can't stop your VMs, I'd recommend using a backup application inside the VM that kn

Re: [ClusterLabs] snapshots in a clvm environment - some questions for proceeding

2017-02-24 Thread Digimer
On 24/02/17 06:27 PM, Lentes, Bernd wrote: > > > - On Feb 24, 2017, at 7:20 PM, Digimer li...@alteeve.ca wrote: > > >>> >>> I read >>> https://www.centos.org/docs/5/html/Cluster_Logical_Volume_Manager/cluster_activation.html >>> . >>

Re: [ClusterLabs] using IPMI for fencing - configuring IPMI with ipmitool - HELP

2017-02-28 Thread Digimer
i curse in a mailing > list, but ipmitool really frustrates me. > Why can't i set access to this channel ? I'm running the commands as root. > It's ipmitool 1.8.15. > > Can someone help me in configuring IPMI that i can used it from the other > node to fence this node ? > > Bi

Re: [ClusterLabs] pacemaker doesn't failover when httpd killed

2016-09-05 Thread Digimer
Depends on your OS, but generally /var/log/messages. Also, please share your full pacemaker config. Please only obfuscate passwords. digimer On 05/09/16 07:53 PM, Nurit Vilosny wrote: > Hi Kristoffer, > Thanks for the prompt answer. > Result of kill -9 is a dead process. Restart is

Re: [ClusterLabs] DRBD failover in Pacemaker

2016-09-07 Thread Digimer
oot@node1 ~]# > > Any help would be appreciated, I think there is something dumb that I'm > missing. > > Thank you.

Re: [ClusterLabs] DRBD failover in Pacemaker

2016-09-08 Thread Digimer
h PDUs are called to open the circuits feeding the lost node, thus ensuring it is off. If for some reason both methods fail, pacemaker goes back to IPMI and tries that again, then on to PDUs, ... and will loop until one of the methods succeeds, leaving the cluster (intentionally) hung in the mean
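The ordered, repeating fence sequence described in this reply (try IPMI, then PDUs, then start over) might be modelled roughly as below. This is a sketch under stated assumptions, not pacemaker's actual implementation: `fence_until_confirmed`, the `(name, callable)` method pairs, and the `max_rounds` cap are all hypothetical names introduced for illustration. The snippet says the real cluster keeps looping until a method succeeds; the optional cap exists only so the example can terminate in a test.

```python
import itertools
from typing import Callable, Optional, Sequence, Tuple

# A fence method is a label plus a callable that returns True once the
# target node is confirmed off. Both are illustrative, not a real API.
FenceMethod = Tuple[str, Callable[[str], bool]]


def fence_until_confirmed(
    node: str,
    methods: Sequence[FenceMethod],
    max_rounds: Optional[int] = None,
) -> str:
    """Try each method in order, repeating the list until one succeeds.

    Returns the name of the method that confirmed the fence. With
    max_rounds=None the loop never gives up, mirroring the intentional
    hang described in the message.
    """
    if not methods:
        raise ValueError("at least one fence method is required")
    rounds = itertools.count() if max_rounds is None else range(max_rounds)
    for _ in rounds:
        for name, attempt in methods:
            if attempt(node):
                return name
    raise RuntimeError(f"no fence method confirmed {node} off")


if __name__ == "__main__":
    pdu_calls = {"count": 0}

    def ipmi(node: str) -> bool:
        # Simulate a dead BMC: IPMI never answers.
        return False

    def pdu(node: str) -> bool:
        # Simulate a PDU that only succeeds on its second attempt.
        pdu_calls["count"] += 1
        return pdu_calls["count"] >= 2

    print(fence_until_confirmed("node2", [("ipmi", ipmi), ("pdu", pdu)]))
```

Keeping the methods as an ordered list makes the priority explicit: the faster method goes first and the backup is only reached when the first one fails.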

Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

2016-09-08 Thread Digimer
e method with a pair of switched PDUs as a backup fence method. This provides full coverage and is generally a lot faster. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? _

Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

2016-09-08 Thread Digimer
he starved node should be declared lost by corosync, the remaining nodes reform and if they're still quorate, the hung node should be fenced. Recovery occurs and life goes on. Unless you don't have fencing, then may $deity have mercy. ;) -- Digimer Papers and Projects: https://alteeve.ca/w

Re: [ClusterLabs] stonithd/fenced filling up logs

2016-10-04 Thread Digimer
a lot of "tutorials" make when the author doesn't understand the role of fencing. In your case, pcs setup cman to use the fence_pcmk "passthrough" fence agent, as it should. So when something went wrong, corosync detected it, informed cman which then requested pacemaker to fe

[ClusterLabs] Live migration problem

2016-10-05 Thread Digimer
in production, and many are using the .102 drivers. So I have a feeling that it wasn't so much the upgrade that made the difference, but instead the reinstall of the drivers. I have no idea why this bug happened, but hopefully this might save someone some grief in the future if they hit the same.

Re: [ClusterLabs] Cluster active/active

2016-10-08 Thread Digimer
Can you share your current full configuration please? If you're hitting errors, please also share the relevant log entries from the nodes. digimer On 07/10/16 09:06 PM, Dayvidson Bezerra wrote: > The company only uses Ubuntu, and do not want another distro in your > environment. >

Re: [ClusterLabs] stonithd/fenced filling up logs

2016-10-05 Thread Digimer
ry best, you lose your services. At worst, you corrupt your data. Why risk that at all when fencing solves the problem perfectly fine? -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? _

Re: [ClusterLabs] stonithd/fenced filling up logs

2016-10-04 Thread Digimer
On 04/10/16 07:09 PM, Israel Brewster wrote: > On Oct 4, 2016, at 3:03 PM, Digimer <li...@alteeve.ca> wrote: >> >> On 04/10/16 06:50 PM, Israel Brewster wrote: >>> On Oct 4, 2016, at 2:26 PM, Ken Gaillot <kgail...@redhat.com >>> <mailto:kgail...@redhat.

Re: [ClusterLabs] stonithd/fenced filling up logs

2016-10-04 Thread Digimer
On 04/10/16 07:50 PM, Israel Brewster wrote: > On Oct 4, 2016, at 3:38 PM, Digimer <li...@alteeve.ca> wrote: >> >> On 04/10/16 07:09 PM, Israel Brewster wrote: >>> On Oct 4, 2016, at 3:03 PM, Digimer <li...@alteeve.ca> wrote: >>>> >>>>

[ClusterLabs] Recovering a failed (but running) server in rgmanager

2016-09-18 Thread Digimer
cts that the server is running fine and simply marks the server as 'started'. Is there no way to do something similar to go 'failed' -> 'started' without the 'disable' step? I tried freezing the service, no luck. I also tried coalescing via '-c', but that didn't help either. Thanks! -- Digimer

Re: [ClusterLabs] [rgmanager] Recovering a failed (but running) server in rgmanager

2016-09-19 Thread Digimer
On 19/09/16 03:07 PM, Digimer wrote: > On 19/09/16 02:39 PM, Digimer wrote: >> On 19/09/16 02:30 PM, Jan Pokorný wrote: >>> On 18/09/16 15:37 -0400, Digimer wrote: >>>> If, for example, a server's definition file is corrupted while the >>>> server

Re: [ClusterLabs] [rgmanager] Recovering a failed (but running) server in rgmanager

2016-09-19 Thread Digimer
On 19/09/16 03:13 PM, Digimer wrote: > On 19/09/16 03:07 PM, Digimer wrote: >> On 19/09/16 02:39 PM, Digimer wrote: >>> On 19/09/16 02:30 PM, Jan Pokorný wrote: >>>> On 18/09/16 15:37 -0400, Digimer wrote: >>>>> If, for example, a server's defini

Re: [ClusterLabs] [rgmanager] Recovering a failed (but running) server in rgmanager

2016-09-19 Thread Digimer
On 19/09/16 02:30 PM, Jan Pokorný wrote: > On 18/09/16 15:37 -0400, Digimer wrote: >> If, for example, a server's definition file is corrupted while the >> server is running, rgmanager will put the server into a 'failed' state. >> That's fine and fair. > > Please,

Re: [ClusterLabs] best practice fencing with ipmi in 2node-setups / cloneresource/monitor/timeout

2016-09-20 Thread Digimer
reason to shutdown a node. What is your opinion on > this? Can i just set the primitive monitor operation to disabled? Monitoring is how you will detect that, for example, the IPMI cable failed or was unplugged. I do not believe the node will get fenced on fence agent monitor failing... At least not b

Re: [ClusterLabs] Which cluster HA package to choose

2016-08-22 Thread Digimer
> Do you know if the latest ver is stable? > > And which companies are using it? > > > > Thanks in advance, > > Ron Short answer is "Corosync v2 + pacemaker 1.1.10+" (1.1.14+, ideally) Long answer is here: https://alteeve.ca/w/History_of_HA_Clustering --

Re: [ClusterLabs] I've been working on a split-brain prevention strategy for 2-node clusters.

2016-10-09 Thread Digimer
On 09/10/16 11:58 PM, Andrei Borzenkov wrote: > 10.10.2016 00:42, Eric Robinson пишет: >> Digimer, thanks for your thoughts. Booth is one of the solutions I >> looked at, but I don't like it because it is complex and difficult to >> implement > > HA is comp

Re: [ClusterLabs] Can Bonding Cause a Broadcast Storm?

2016-11-15 Thread Digimer
The condition stopped as > soon as the Linux server in question became reachable again. > > -- > Eric Robinson A properly built mode=1 bond will only use one interface or the other, not both, so it shouldn't cause a storm. -- Digimer Papers and Projects: https://alteeve.ca/w/ What

Re: [ClusterLabs] [pacemaker+ clvm] Cluster lvm must be active exclusively to create snapshot

2016-12-05 Thread Digimer
VG Attr > LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert > volume-1b0ea468-37c8-4b47-a6fa-6cce65b068b5 cinder-volumes -wi--- > 1.00g > > > thank you very much! > > > > > > __

Re: [ClusterLabs] [pacemaker+ clvm] Cluster lvm must be active exclusively to create snapshot

2016-12-05 Thread Digimer
On 05/12/16 10:32 PM, su liu wrote: > Digimer, thank you very much! > > I do not need to have the data accessible on both nodes at once. I want > to use the clvm+pacemaker+corosync in OpenStack Cinder. I'm not sure what "cinder" is, so I don't know what it needs to work.

Re: [ClusterLabs] [pacemaker+ clvm] Cluster lvm must be active exclusively to create snapshot

2016-12-05 Thread Digimer
r 'lvscan'? You should see it on both nodes at the same time as soon as it is created, *if* things are working properly. It is possible, without stonith, that they are not. Please configure and test stonith, and see if the problem remains. If it does, tail the system logs on both nodes, creat

Re: [ClusterLabs] New ClusterLabs logo unveiled :-)

2016-12-22 Thread Digimer
> Nice, and congratulations, Krig, for the logo escalation :) > > (Still looking forward to seeing the animated version...) I failed to get the designer to get an update for me, but it doesn't matter because I do really like this one. Thanks, Ken! -- Digimer Papers and Projects: https://al

Re: [ClusterLabs] Three node cluster becomes completely fenced if one node leaves

2017-03-24 Thread Digimer
16:35:31 b015 stonith-ng[2251]: warning: fence_scsi[20020] > stderr: [ ] > Mar 24 16:35:31 b015 stonith-ng[2251]: warning: fence_scsi[20020] > stderr: [ Failed: keys cannot be same. You can not fence yourself. ] > Mar 24 16:35:31 b015 stonith-ng[2251]: warning: fence_scsi[20020] > stderr: [ ] > Mar 24 16:35:31 b015 stonith-ng[2251]:

Re: [ClusterLabs] stonith in dual HMC environment

2017-03-21 Thread Digimer
> question is: how to avoid such behaviour? > > Thank you! Please share your config along with the logs from the nodes that were affected. cheers, digimer -- Digimer Papers and Projects: https://alteeve.com/w/ "I am, somehow, less interested in the weight and convolutions of Einste

Re: [ClusterLabs] [Linux-cluster] Active/passive cluster between physical and VM

2017-03-22 Thread Digimer
is deprecated, so please switch over to there (http://lists.clusterlabs.org/mailman/listinfo/users). -- Digimer Papers and Projects: https://alteeve.com/w/ "I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal tal

Re: [ClusterLabs] Three node cluster becomes completely fenced if one node leaves

2017-03-31 Thread Digimer
I think it is reasonable to expect corosync to handle this properly. How hard would it be to make corosync resilient to this fault case? -- Digimer Papers and Projects: https://alteeve.com/w/ "I am, somehow, less interested in the weight and convolutions of Einst

[ClusterLabs] Teamed interfaces and Corosync v2

2017-04-08 Thread Digimer
contraindications for using broadcast teams (or teamd at all) under corosync v2? Thanks! -- Digimer Papers and Projects: https://alteeve.com/w/ "I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and

Re: [ClusterLabs] Loss of quoram not detected. corosync 1.4.8 , pacemaker 1.1.14. CentOS 6

2017-04-13 Thread Digimer
configure everything for you. Do NOT configure corosync directly; You need to configure it inside cman itself. Reset corosync.conf back to defaults. Reference; http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Clusters_from_Scratch/index.html -- Digimer Papers and Projects: ht

Re: [ClusterLabs] 2-Node Cluster Pointless?

2017-04-16 Thread Digimer
On 16/04/17 04:04 PM, Eric Robinson wrote: >> -Original Message- >> From: Digimer [mailto:li...@alteeve.ca] >> Sent: Sunday, April 16, 2017 11:17 AM >> To: Cluster Labs - All topics related to open-source clustering welcomed >> <users@clusterlab

Re: [ClusterLabs] 2-Node Cluster Pointless?

2017-04-17 Thread Digimer
that's not his problem. > > Dima Can you elaborate? I'm not following your point/concern here... Availability is all about making sure customers/users can access their services. -- Digimer Papers and Projects: https://alteeve.com/w/ "I am, somehow, less interested in the weight and conv

Re: [ClusterLabs] Antw: Re: how important would you consider to have two independent fencing device for each node ?

2017-04-19 Thread Digimer
On 19/04/17 02:38 AM, Ulrich Windl wrote: >>>> Digimer <li...@alteeve.ca> schrieb am 18.04.2017 um 19:08 in Nachricht > <26e49390-b384-b46e-4965-eba5bfe59...@alteeve.ca>: >> On 18/04/17 11:07 AM, Lentes, Bernd wrote: >>> Hi, >>> >>> i'm

Re: [ClusterLabs] Loss of quoram not detected. corosync 1.4.8 , pacemaker 1.1.14. CentOS 6

2017-04-13 Thread Digimer
it, and you can pretend cman doesn't exist for all intents and purposes. If that's not good enough, switch to EL7 where it's pure pacemaker and corosync v2. digimer On 13/04/17 06:18 PM, neeraj ch wrote: > I have dreaded that answer. Maybe I can fix vote quorum on corosync 1.4. > Or maybe I can

Re: [ClusterLabs] 2-Node Cluster Pointless?

2017-04-16 Thread Digimer
thing should ever be more important than availability, and availability is a product of simplicity. So in my view, a 3-node cluster adds complexity that is avoidable, and so is sub-optimal. I'm happy to answer any questions you have on my comments/point of view on this. -- Digimer Paper

Re: [ClusterLabs] Antw: Re: 2-Node Cluster Pointless?

2017-04-18 Thread Digimer
On 18/04/17 03:47 AM, Ulrich Windl wrote: >>>> Digimer <li...@alteeve.ca> schrieb am 16.04.2017 um 20:17 in Nachricht > <12cde13f-8bad-a2f1-6834-960ff3afc...@alteeve.ca>: >> On 16/04/17 01:53 PM, Eric Robinson wrote: >>> I was reading in "Clusters

Re: [ClusterLabs] Antw: Re: 2-Node Cluster Pointless?

2017-04-18 Thread Digimer
On 18/04/17 10:00 AM, Digimer wrote: > On 18/04/17 03:47 AM, Ulrich Windl wrote: >>>>> Digimer <li...@alteeve.ca> schrieb am 16.04.2017 um 20:17 in Nachricht >> <12cde13f-8bad-a2f1-6834-960ff3afc...@alteeve.ca>: >>> On 16/04/17 01:53 PM, Eric Robinso

Re: [ClusterLabs] lvm on shared storage and a lot of...

2017-04-18 Thread Digimer
VM to the letter. I don't know what could be > the problem. > > would you suggest ways to troubleshoot it? Is it faulty/failing hardware? > > many thanks, > L. LVM or clustered LVM? -- Digimer Papers and Projects: https://alteeve.com/w/ "I am, somehow, less interested in

Re: [ClusterLabs] how important would you consider to have two independent fencing device for each node ?

2017-04-18 Thread Digimer
, as one example. It's slow, and needs shared storage, but a small box somewhere running a small tgtd or iscsid should do the trick (note that I have never used SBD myself...). -- Digimer Papers and Projects: https://alteeve.com/w/ "I am, somehow, less interested in the weight and convolutions

Re: [ClusterLabs] 2-Node Cluster Pointless?

2017-04-17 Thread Digimer
ut I am still lost... > On Mon, Apr 17, 2017 at 1:45 PM, Dimitri Maziuk <dmaz...@bmrb.wisc.edu > <mailto:dmaz...@bmrb.wisc.edu>> wrote: > > On 04/17/2017 11:58 AM, Digimer wrote: > > > ... Unless I am misunderstanding, your comment is related to > > s

Re: [ClusterLabs] 2-Node Cluster Pointless?

2017-04-17 Thread Digimer
ng code figure this one out? That sounds like a use-case where a full HA cluster is overkill already. In any case, it would be a tiny fraction of installs and would be tangential to the 2v3+ node debate that this thread started with. -- Digimer Papers and Projects: https://alteeve.com/w/ "I a

Re: [ClusterLabs] 2-Node Cluster Pointless?

2017-04-18 Thread Digimer
it's just the cluster network interconnect that has > failed. > > IMO SCSI fencing should never be used on a 2 node cluster for reasons > you have already described very clearly. > > Chrissie I was fairly generic on that term because I've seen (and even wrote one!) where snmp was u

Re: [ClusterLabs] Antw: Re: 2-Node Cluster Pointless?

2017-04-22 Thread Digimer
On 23/04/17 12:51 AM, Andrei Borzenkov wrote: > 22.04.2017 23:33, Dmitri Maziuk пишет: >> On 4/22/2017 12:02 PM, Digimer wrote: >> >>> Having SBD properly configured is *massively* safer than no fencing at >>> all. So for people where other fence methods are not

Re: [ClusterLabs] Antw: Re: 2-Node Cluster Pointless?

2017-04-22 Thread Digimer
or me. That said, I have the same reservation with IPMI itself. So to me, "proper" fencing requires a backup, totally external, option like a pair of switched PDUs. Of course, I'm more paranoid than most. Having SBD properly configured is *massively* safer than no fencing at all. So for

Re: [ClusterLabs] STONITH not communicated back to initiator until token expires

2017-03-13 Thread Digimer
mmences. > > Is this delay a feature of the cpg_mcast_joined function? If I > understand correctly (unlikely), it looks like cpg_mcast_joined is not > completing because one of the nodes in the group is missing, but I > haven't looked at that code closely yet. Is it advisable to have >

[ClusterLabs] FenceAgentAPI

2017-03-06 Thread Digimer
' document and I will have anyone interested comment before making it an official update. Comments? -- Digimer Papers and Projects: https://alteeve.com/w/ "I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have

Re: [ClusterLabs] FenceAgentAPI

2017-03-07 Thread Digimer
On 07/03/17 05:09 AM, Jan Pokorný wrote: > On 06/03/17 17:12 -0500, Digimer wrote: >> The old FenceAgentAPI document on fedorahosted is gone now that fedora >> hosted is closed. So I created a copy on the clusterlabs wiki: >> >> http://wiki.clusterlabs.org/wiki/FenceA

Re: [ClusterLabs] Antw: Re: 2-Node Cluster Pointless?

2017-04-18 Thread Digimer
t someone *might* build unimportant clusters doesn't change anything, really. One could ask "if the service isn't important, why go to the hassle of building a cluster at all? It's just avoidable complication". So, in closing, I still argue that if you need HA, you always need fencing. -- Dig

Re: [ClusterLabs] Antw: Re: 2-Node Cluster Pointless?

2017-04-18 Thread Digimer
On 18/04/17 08:50 PM, Dimitri Maziuk wrote: > On 04/18/2017 07:05 PM, Digimer wrote: > >> Certainly, the people creating the software have to assume that a >> split-brain is devastating. Same for people who teach others and people >> who write documentation. > > s

Re: [ClusterLabs] Stonith hostname vs port vs plug

2017-07-31 Thread Digimer
1 > node2"? But how the fence device will combine the hostname with port > (or plug)? I presume that node1 must somehow know that node2's plug is > Centos2, otherwise It could reboot itself (?) > Thank you. The "plug" should match the name used by the hypervisor, not the

Re: [ClusterLabs] DRBD AND cLVM ???

2017-07-31 Thread Digimer
ecommended, but it is possible with creative use of filter = [] in lvm.conf. I've not done it myself, mind you. As far as clvmd on DRBD, to LVM, it's no different if the block device is a SAN LUN or DRBD... It only cares that a changed block/inode on one side is the same on the other. -- Digime

Re: [ClusterLabs] Failover due to intermittent network partition.

2017-08-11 Thread Digimer
election say 10 - 15 seconds before > considering quorum loss ? > > Of reference, I am using pacemaker 1.14 with corosync. > > > Thank you You should be able to increase corosync's token timeout to do this. -- Digimer Papers and Projects: https://alteeve.com/w/ "I am, somehow, l

Re: [ClusterLabs] DRBD or SAN ?

2017-07-17 Thread Digimer
ice that can be taken offline by a bad firmware update, user error, etc. DRBD gives you full mechanical and electrical replication of the data and has survived some serious in-the-field faults in our Anvil! system (including a case where three drives were ejected at the same time from the node hostin

Re: [ClusterLabs] DRBD or SAN ?

2017-07-17 Thread Digimer
ion system, it > is probably better to use that (or at least use it at one level). You are absolutely correct. However, OP asked about DRBD vs SAN, not DRBD/SAN versus backup. Proper continuity planning requires redundancy (DRBD + clustering), backup and DR as three separate components. -- Digimer Pape

Re: [ClusterLabs] DRBD split brain after Cluster node recovery

2017-07-12 Thread Digimer
er.sh' script (and crm-unfence-peer.sh for after resync). This is the only way to avoid split-brains. -- Digimer Papers and Projects: https://alteeve.com/w/ "I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have li

Re: [ClusterLabs] DRBD or SAN ?

2017-07-18 Thread Digimer
on). With DRBD 9, you can set it up to momentarily do dual-primary to support live migration, though I have not used this myself yet. With dual-primary, you need to be sure a few things are in place (ie: proper fencing, but you need that anyway, a cluster resource manager, etc). -- Digimer Pap

Re: [ClusterLabs] DRBD or SAN ?

2017-07-18 Thread Digimer
out warning. * Failed backplanes causing multiple disks to be lost. * User error destroying RAID arrays. * Bad components used during upgrades causing a node to be offline until a new part is delivered Etc. -- Digimer Papers and Projects: https://alteeve.com/w/ "I am, somehow, less interest

Re: [ClusterLabs] DRBD or SAN ?

2017-07-18 Thread Digimer
understand how that would work... The goal of clvmd is to ensure changes to the VG (on a shared PV, like a LUN or DRBD) happen on all nodes at the same time. -- Digimer Papers and Projects: https://alteeve.com/w/ "I am, somehow, less interested in the weight and convolutions of Einstein

Re: [ClusterLabs] DRBD or SAN ?

2017-07-18 Thread Digimer
d LVM to manage the DRBD space, creating per-VM LVs, and then use the resource manager to manage the servers. This keeps the LVM data in sync and avoids the cost of cluster locking. -- Digimer Papers and Projects: https://alteeve.com/w/ "I am, somehow, less interested in the weight and convoluti

Re: [ClusterLabs] DRBD or SAN ?

2017-07-18 Thread Digimer
> >> If you want to have a shared FS, yes. If you want to back VMs though, we >> use clustered LVM to manage the DRBD space, creating per-VM LVs, and >> then use the resource manager to manage the servers. This keeps the LVM >> data in sync and avoids the cost of cluster locking. >

Re: [ClusterLabs] epic fail

2017-07-23 Thread Digimer
to 'fencing resource-and-stonith'? If so, then the only way to get a split-brain is if something is configured wrong in pacemaker or if something caused crm-fence-peer.sh to report success when it didn't actually succeed... -- Digimer Papers and Projects: https://alteeve.com/w/ "I am, somehow, l

Re: [ClusterLabs] why resources are restarted when a node rejoins a cluster?

2017-07-24 Thread Digimer
respond to the list, not developers-ow...@clusterlabs.org. digimer -- Digimer Papers and Projects: https://alteeve.com/w/ "I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and

Re: [ClusterLabs] convert already clustered LVM to cLVM

2017-06-28 Thread Digimer
> > > Thanks in advance > > Cristiano So to confirm, they have locking type = 3, but vgdisplay does not show clustered? I've not tried this myself in ages, but yes, 'vgchange -cy ...' should do the trick. It's possible to have fallback_to_local_locking = 1 and a mix of clustered

Re: [ClusterLabs] clusterlabs.org now supports https :-)

2017-06-29 Thread Digimer
ogin and change your password if you > have an account on one of these sites. > > [1] https://letsencrypt.org/ > [2] https://wiki.clusterlabs.org/ > [3] https://bugs.clusterlabs.org/ More security is more better! Thanks, Ken! -- Digimer Papers and Projects: https://al

Re: [ClusterLabs] Question about STONITH for VM HA cluster in shared hosts environment

2017-06-29 Thread Digimer
d argue. If you want to keep the services in VMs, that's fine, get a pair of nodes and make them an HA cluster to protect the VMs as the services (we do this all the time). With that, then you pair IPMI and switched PDUs for complete coverage (IPMI alone isn't enough, because if the host is destroye

[ClusterLabs] Introducing the Anvil! Intelligent Availability platform

2017-07-05 Thread Digimer
, of course, to all of you for the years of advice, banter and debate. I still have very much to learn! Now, time to start working full time on version 3! -- Digimer Papers and Projects: https://alteeve.com/w/ "I am, somehow, less interested in the weight and convolutions of Einstein’s brain th

Re: [ClusterLabs] About Corosync up to 16 nodes limit

2017-07-05 Thread Digimer
der and harder to keep things stable as the number of nodes grows. There is a lot of coordination that has to happen between the nodes and it gets ever more complex. Generally speaking, you don't want large clusters. It is always advised to break things up into separate smaller clusters whenever possible. --

Re: [ClusterLabs] About Corosync up to 16 nodes limit

2017-07-05 Thread Digimer
large. Again, there is no hard code limit here, just what is practical. Can I ask how large of a cluster you are planning to build, and what it will be used for? Note also; This is not related to pacemaker remote. You can have very large counts of remote nodes. digimer On 2017-07-05 11:27 PM

Re: [ClusterLabs] [Linux-cluster] HA cluster 6.5 redhat active passive Error

2017-04-29 Thread Digimer
locking_type = 3; fallback_to_clustered_locking = 1 fallback_to_local_locking = 0 } This assumes you are not trying to use LVM and clustered LVM at the same time. If you are, you probably don't want to. If you do anyway, don't set the fallback variables. With this, you then sta
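The three settings quoted above can be checked mechanically. What follows is only a minimal sketch, assuming the settings appear as plain `key = value` lines; the function names are hypothetical and the parser is far simpler than the real lvm.conf grammar.

```python
import re

# Values quoted in the post: clustered locking, no fallback to local locking.
EXPECTED = {
    "locking_type": "3",
    "fallback_to_clustered_locking": "1",
    "fallback_to_local_locking": "0",
}


def parse_settings(text):
    """Collect simple `key = value` pairs, ignoring '#' comments.

    Assumption: one setting per line. Section headers such as
    `global {` and closing braces are skipped because they have no '='.
    """
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]
        match = re.match(r"\s*(\w+)\s*=\s*(\S+)", line)
        if match:
            # Drop surrounding quotes or a trailing semicolon, if any.
            settings[match.group(1)] = match.group(2).strip('";')
    return settings


def mismatches(text):
    """Return {setting: value found (or None)} for each unexpected value."""
    found = parse_settings(text)
    return {
        key: found.get(key)
        for key, wanted in EXPECTED.items()
        if found.get(key) != wanted
    }
```

An empty result from `mismatches()` means all three settings match the values in the post; anything else names the setting that differs.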

Re: [ClusterLabs] [Linux-cluster] Need advice Redhat Clusters

2017-07-30 Thread Digimer
IPMI is common on most servers, so fence_ipmilan is quite common. Switched PDUs from APC are also popular, and they use fence_apc_snmp, etc. -- Digimer Papers and Projects: https://alteeve.com/w/ "I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the

[ClusterLabs] OT: Anyone going to Open Source Summit in Tokyo?

2017-05-16 Thread Digimer
Hi all, Pardon the off-topic post. I'll be attending Open Source Summit in Tokyo (http://events.linuxfoundation.org/events/open-source-summit-japan) in a couple of weeks. If anyone else from the cluster world will be attending, maybe we can meet up for beer/sake/coffee. :) -- Digimer

Re: [ClusterLabs] Notifications on changes in clustered LVM

2017-06-19 Thread Digimer
On 19/06/17 11:40 PM, Andrei Borzenkov wrote: > 20.06.2017 02:15, Digimer пишет: >> On 19/06/17 06:59 PM, Ferenc Wágner wrote: >>> Digimer <li...@alteeve.ca> writes: >>> >>>> So we have a tool that watches for changes to clvmd by running >>>

[ClusterLabs] Notifications on changes in clustered LVM

2017-06-19 Thread Digimer
So we have a tool that watches for changes to clvmd by running pvscan/vgscan/lvscan, but this seems to be expensive and occasionally causes trouble. Is there any other way to be notified or to check when something changes? cheers -- Digimer Papers and Projects: https://alteeve.com/w/ "
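For readers unfamiliar with the polling approach described in this post, it can be sketched roughly as follows. This is not the tool from the post, only an illustration; `snapshot` and `changed` are hypothetical names, and it assumes the scan output is identical between runs when nothing has changed.

```python
import hashlib
import subprocess

# The three scan commands named in the post.
SCAN_COMMANDS = [["pvscan"], ["vgscan"], ["lvscan"]]


def snapshot(commands=SCAN_COMMANDS, run=subprocess.run):
    """Run each command and return one digest over all of their output.

    `run` is injectable so the function can be exercised without LVM.
    """
    digest = hashlib.sha256()
    for cmd in commands:
        result = run(cmd, capture_output=True, text=True, check=False)
        digest.update(result.stdout.encode())
    return digest.hexdigest()


def changed(previous, current):
    """True when a prior snapshot exists and differs from the new one."""
    return previous is not None and previous != current
```

Hashing only makes the comparison cheap. Each poll still runs all three scans, which is the expense the post is asking how to avoid.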

Re: [ClusterLabs] Notifications on changes in clustered LVM

2017-06-19 Thread Digimer
On 19/06/17 06:59 PM, Ferenc Wágner wrote: > Digimer <li...@alteeve.ca> writes: > >> So we have a tool that watches for changes to clvmd by running >> pvscan/vgscan/lvscan, but this seems to be expensive and occasionally >> cause trouble. > > Wha

Re: [ClusterLabs] (no subject)

2017-05-24 Thread Digimer
rlabs.org > > > > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://w

Re: [ClusterLabs] Antw: Re: (no subject)

2017-05-26 Thread Digimer
On 26/05/17 02:05 AM, Ulrich Windl wrote: > PLEASE learn how to use the subject in E_mail messages! Christopher explained that the email was sent early by accident. digimer >>>> Christopher Pax <ops...@gmail.com> schrieb am 24.05.2017 um 22:36 in >>>> Nachricht >

Re: [ClusterLabs] question about fence-virsh

2017-05-19 Thread Digimer
ats not really an option. > > -- > Andrew W. Kerber fence_virsh -a -l root -p -n -o status That should show the status. To reboot, change 'status' to 'reboot'. If this doesn't work, make sure you can ssh from the nodes to the hypervisor as the root user. -- Digimer Papers and Proje
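The values for `-a`, `-p` and `-n` are missing from the command as archived. As a hedged sketch, the helper below assembles the same argument list from caller-supplied values. The function name is hypothetical, and limiting the action to 'status' and 'reboot' is a choice of this sketch, since those are the two actions the post mentions.

```python
def fence_virsh_argv(hypervisor, password, plug, action="status", login="root"):
    """Build an argument list matching the command quoted in the post.

    hypervisor, password and plug are supplied by the caller; the post
    does not give them. -a is the hypervisor address, -l the login,
    -p the password, -n the VM (plug) name and -o the action.
    """
    if action not in ("status", "reboot"):
        raise ValueError("this sketch only handles 'status' and 'reboot'")
    return [
        "fence_virsh",
        "-a", hypervisor,
        "-l", login,
        "-p", password,
        "-n", plug,
        "-o", action,
    ]
```

The list can be passed to `subprocess.run()` on a node. As the post notes, the call will fail unless the node can ssh to the hypervisor as the root user.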

Re: [ClusterLabs] Antw: clearing failed actions

2017-05-31 Thread Digimer
erlabs.org/mailman/listinfo/users >> >> Project Home: http://www.clusterlabs.org >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >> Bugs: http://bugs.clusterlabs.org > > ___ > Users mailing list: U

Re: [ClusterLabs] Antw: Re: Antw: clearing failed actions

2017-06-01 Thread Digimer
ni-regensburg.de] >> Sent: Thursday, June 1, 2017 8:34 AM >> To: users@clusterlabs.org >> Subject: [ClusterLabs] Antw: Re: Antw: clearing failed actions >> >>>>> Digimer <li...@alteeve.ca> schrieb am 01.06.2017 um 00:03 in Nachricht >> <50aad2

Re: [ClusterLabs] Installing on SLES 12 -- Where's the Repos?

2017-06-16 Thread Digimer
learning HA on SUSE/RHEL and then, after you know what config works for you, migrate to the target OS. That way you have only one set of variables at a time. Also, use fencing. Seriously, just do it. -- Digimer Papers and Projects: https://alteeve.com/w/ "I am, somehow, less interested
