Re: [ClusterLabs] Antw: After reboot each node thinks the other is offline.

2017-08-01 Thread Dmitri Maziuk
On 2017-08-01 03:05, Stephen Carville (HA List) wrote: Can clustering even be done reliably on CentOS 6? I have no objection to moving to 7 but I was hoping I could get this up quicker than building out a bunch of new balancers. I have a number of centos 6 active/passive pairs running

Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Dmitri Maziuk
On 2017-07-24 07:51, Tomer Azran wrote: We don't have the ability to use it. Is that the only solution? No, but I'd recommend thinking about it first. Are you sure you will care about your cluster working when your server room is on fire? 'Cause unless you have halon suppression, your server

[ClusterLabs] epic fail

2017-07-23 Thread Dmitri Maziuk
So yesterday I ran yum update that pulled in the new pacemaker and tried to restart it. The node went into its usual "can't unmount drbd because kernel is using it" and got stonith'ed in the middle of the yum transaction. The end result: DRBD reports split brain, HA daemons don't start on boot, RPM

Re: [ClusterLabs] Antw: Re: DRBD or SAN ?

2017-07-19 Thread Dmitri Maziuk
On 7/19/2017 1:29 AM, Ulrich Windl wrote: Maybe it's like with the cluster: Once you have set it up correctly, it runs quite well, but the way to get there may be painful. I quit my experiments with dual-primary DRBD in some early SLES11 (SP1), because it fenced a lot and refused to come up

Re: [ClusterLabs] DRBD or SAN ?

2017-07-18 Thread Dmitri Maziuk
On 7/17/2017 2:07 PM, Chris Adams wrote: However, just like RAID is not a replacement for backups, DRBD is IMHO not a replacement for database replication. DRBD would just replicate database files, so, for example, file corruption would be copied from host to host. When something provides a

Re: [ClusterLabs] DRBD or SAN ?

2017-07-17 Thread Dmitri Maziuk
On 7/17/2017 4:51 AM, Lentes, Bernd wrote: I'm asking myself if a DRBD configuration wouldn't be more redundant and highly available. ... Is DRBD in conjunction with a database (MySQL or Postgres) possible ? Have you seen https://github.com/ewwhite/zfs-ha/wiki ? -- I recently deployed one

Re: [ClusterLabs] DRBD split brain after Cluster node recovery

2017-07-14 Thread Dmitri Maziuk
On 7/14/2017 3:57 AM, ArekW wrote: Hi, I have stonith running and tested. The problem was that there is a mistake in the drbd documentation: the 'fencing' option belongs in net (not disk). If you are running NFS on top of a dual-primary DRBD with some sort of a cluster filesystem, I'd think *that* is your
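
A minimal drbd.conf sketch of what that fix might look like (a sketch only; the resource name r0 is an assumption, the placement of 'fencing' under net follows ArekW's finding for DRBD 9, and the handler script names shown are the DRBD 9 variants shipped with drbd-utils -- 8.4 uses crm-fence-peer.sh with an after-resync-target handler instead):

    resource r0 {
      net {
        fencing resource-and-stonith;   # per ArekW: net section, not disk
      }
      handlers {
        fence-peer   "/usr/lib/drbd/crm-fence-peer.9.sh";    # constrain the peer away while disconnected
        unfence-peer "/usr/lib/drbd/crm-unfence-peer.9.sh";  # drop the constraint after resync
      }
    }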

Re: [ClusterLabs] DRBD split brain after Cluster node recovery

2017-07-12 Thread Dmitri Maziuk
On 7/12/2017 4:33 AM, ArekW wrote: Hi, Can it be fixed that the drbd is entering split brain after cluster node recovery? I always configure "after-sb*" handlers and drbd-level fence but I never ran it with allow-two-primaries. You'll have to read the fine manual on how that works in a
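
For reference, a hedged sketch of the after-sb policies Dmitri mentions, in the net section of a single-primary resource (the values shown are the common conservative choices, not a recommendation for any particular setup):

    net {
      after-sb-0pri discard-zero-changes;   # no primaries: keep the side that actually has changes
      after-sb-1pri discard-secondary;      # one primary: throw away the secondary's changes
      after-sb-2pri disconnect;             # two primaries: no auto-recovery, resolve by hand
    }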

Re: [ClusterLabs] Installing on SLES 12 -- Where's the Repos?

2017-06-16 Thread Dmitri Maziuk
On 2017-06-16 10:16, Digimer wrote: On 16/06/17 11:07 AM, Eric Robinson wrote: Step over to the *bsd side. They have cookies. Also zfs. And no lennartware, that alone's worth $700/year. Dima I left BSD for Linux back in 2000 or so. I have often been wistful for those days. ;-) --Eric

Re: [ClusterLabs] Installing on SLES 12 -- Where's the Repos?

2017-06-16 Thread Dmitri Maziuk
On 2017-06-16 02:21, Eric Robinson wrote: Someone talk me off the ledge here. Step over to the *bsd side. They have cookies. Also zfs. And no lennartware, that alone's worth $700/year. Dima ___ Users mailing list: Users@clusterlabs.org

Re: [ClusterLabs] how to set a dedicated fence delay for a stonith agent ?

2017-05-17 Thread Dmitri Maziuk
On 2017-05-17 06:24, Lentes, Bernd wrote: ... I'd like to know what the software I use is doing. Am I the only one having that opinion ? No. How do you solve the problem of a deathmatch or killing the wrong node ? *I* live dangerously with fencing disabled. But then my clusters only

Re: [ClusterLabs] Antw: Re: 2-Node Cluster Pointless?

2017-04-23 Thread Dmitri Maziuk
On 4/22/2017 11:51 PM, Andrei Borzenkov wrote: As a real life example (not Linux/pacemaker) - a panicking node flushed disk buffers, so it was not safe to access the shared filesystem until this was complete. This could take quite a lot of time, so without an agent on *surviving* node(s) that monitors

Re: [ClusterLabs] Antw: Re: 2-Node Cluster Pointless?

2017-04-22 Thread Dmitri Maziuk
On 4/22/2017 12:02 PM, Digimer wrote: Having SBD properly configured is *massively* safer than no fencing at all. So for people where other fence methods are not available for whatever reason, SBD is the way to go. Now you're talking. IMO in a 2-node cluster, a node that kills itself in

Re: [ClusterLabs] 2-Node Cluster Pointless?

2017-04-17 Thread Dmitri Maziuk
On 2017-04-16 15:04, Eric Robinson wrote: On 16/04/17 01:53 PM, Eric Robinson wrote: I was reading in "Clusters from Scratch" where Beekhof states, "Some would argue that two-node clusters are always pointless, but that is an argument for another time." What you want to know is whether the

Re: [ClusterLabs] Fraud Detection Check?

2017-04-13 Thread Dmitri Maziuk
On 2017-04-13 01:39, Jan Pokorný wrote: After a bit of a search, the best practice at the list server seems to be: [...] if you change the message (eg, by adding a list signature or by adding the list name to the Subject field), you *should* DKIM sign. This is of course going entirely

Re: [ClusterLabs] OS Patching Process

2016-11-24 Thread Dmitri Maziuk
On 2016-11-24 10:41, Toni Tschampke wrote: We recently did an upgrade for our cluster nodes from Wheezy to Jessie. IIRC it's the MIT CS joke that they have clusters whose uptime goes way back past the manufacturing date of any/every piece of hardware they're running on. They aren't linux-ha

Re: [ClusterLabs] Antw: OS Patching Process

2016-11-23 Thread Dmitri Maziuk
On 2016-11-23 02:23, Ulrich Windl wrote: I'd recommend making a backup of the DRBD data (you always should, anyway), then shut down the cluster, upgrade all the needed components, then start the cluster again. Do your basic tests. If you corrupted your data, re-create DRBD from scratch. Then
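
Ulrich's outline, roughly translated into commands (a sketch only; it assumes pcs, and that the DRBD-backed data is backed up before you start):

    # back up the DRBD-backed data first
    pcs cluster stop --all     # stop pacemaker/corosync on every node
    yum update                 # upgrade the needed components on each node
    pcs cluster start --all
    pcs status                 # basic tests before putting it back in service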

Re: [ClusterLabs] OS Patching Process

2016-11-22 Thread Dmitri Maziuk
On 2016-11-22 10:35, Jason A Ramsey wrote: Can anyone recommend a bulletproof process for OS patching a pacemaker cluster that manages a drbd mirror (with LVM on top of the drbd and luns defined for an iscsi target cluster if that matters)? Any time I’ve tried to mess with the cluster, it seems

Re: [ClusterLabs] Antw: Re: Can't do anything right; how do I start over?

2016-10-17 Thread Dmitri Maziuk
On 2016-10-17 02:12, Ulrich Windl wrote: Have you tried a proper variant of "lsof" before? So maybe you know which process might block the device. I also think if you have LVM on top of DRBD, you must deactivate the VG before trying to unmount. No LVM here: AFAIMC these days it's another

Re: [ClusterLabs] Can't do anything right; how do I start over?

2016-10-15 Thread Dmitri Maziuk
On 2016-10-15 01:56, Jay Scott wrote: So, what's wrong? (I'm a newbie, of course.) Here's what worked for me on centos 7: http://octopus.bmrb.wisc.edu/dokuwiki/doku.php?id=sysadmin:pacemaker YMMV and all that. cheers, Dima

Re: [ClusterLabs] Virtual ip resource restarted on node with down network device

2016-09-20 Thread Dmitri Maziuk
On 2016-09-20 09:53, Ken Gaillot wrote: I do think ifdown is not quite the best failure simulation, since there aren't that many real-world situations that merely take an interface down. To simulate network loss (without pulling the cable), I think maybe using the firewall to block all traffic
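
One hedged way to do the firewall-based test Ken suggests (assuming iptables and that eth0 is the cluster-facing interface; run on the node whose failure you want to simulate):

    iptables -A INPUT  -i eth0 -j DROP    # drop everything arriving on the cluster interface
    iptables -A OUTPUT -o eth0 -j DROP    # and everything leaving it
    # ...watch the failover, then remove the rules again:
    iptables -D INPUT  -i eth0 -j DROP
    iptables -D OUTPUT -o eth0 -j DROP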

Re: [ClusterLabs] Change disk

2016-09-15 Thread Dmitri Maziuk
On 2016-09-14 09:30, NetLink wrote: 1. Put node 2 in standby. 2. Change and configure the new bigger disk on node 2. 3. Put node 2 back online and wait for syncing. 4. Put node 1 in standby and repeat the procedure. Would this approach work? I
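
A sketch of that rolling procedure with pcs (the node names are assumptions; on DRBD 8.x you would watch /proc/drbd for the resync, on DRBD 9 drbdadm status):

    pcs cluster standby node2      # 1. move everything off node 2
    # 2. swap the disk, re-create the backing device, re-attach/resize DRBD on node 2
    pcs cluster unstandby node2    # 3. bring it back
    cat /proc/drbd                 #    wait until both sides report UpToDate
    pcs cluster standby node1      # 4. repeat the same steps for node 1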

Re: [ClusterLabs] DRBD failover in Pacemaker

2016-09-08 Thread Dmitri Maziuk
On 2016-09-08 02:03, Digimer wrote: You need to solve the problem with fencing in DRBD. Leaving it off WILL result in a split-brain eventually, full stop. With working fencing, you will NOT get a split-brain, full stop. "Split brain is a situation where, due to temporary failure of all

Re: [ClusterLabs] DRBD failover in Pacemaker

2016-09-07 Thread Dmitri Maziuk
On 2016-09-06 14:04, Devin Ortner wrote: I have a 2-node cluster running CentOS 6.8 and Pacemaker with DRBD. I have been using the "Clusters from Scratch" documentation to create my cluster and I am running into a problem where DRBD is not failing over to the other node when one goes down. I

Re: [ClusterLabs] ocf scripts shell and local variables

2016-08-30 Thread Dmitri Maziuk
On 2016-08-30 03:44, Dejan Muhamedagic wrote: The kernel reads the shebang line and it is what defines the interpreter which is to be invoked to run the script. Yes, and does the kernel read it when the script is source'd or executed via any of the mechanisms that have the executable specified
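
A quick illustration of the point being argued (a sketch; file name and shells are arbitrary): the shebang only matters when the kernel exec()s the file, not when the script is sourced into an already-running shell.

    $ cat > demo.sh <<'EOF'
    #!/bin/bash
    echo "BASH_VERSION=${BASH_VERSION:-unset} KSH_VERSION=${KSH_VERSION:-unset}"
    EOF
    $ chmod +x demo.sh
    $ ./demo.sh              # executed: kernel honors #!/bin/bash, BASH_VERSION is set
    $ ksh -c '. ./demo.sh'   # sourced: the shebang is just a comment, it runs inside ksh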

Re: [ClusterLabs] ocf scripts shell and local variables

2016-08-29 Thread Dmitri Maziuk
On 2016-08-29 04:06, Gabriele Bulfon wrote: Thanks, though this does not work :) Uhm... right. Too many languages, sorry: perl's system() will call the login shell, C's system() uses /bin/sh, and exec()s will run whatever the programmer tells them to. The point is none of them cares what

Re: [ClusterLabs] ocf scripts shell and local variables

2016-08-26 Thread Dmitri Maziuk
On 2016-08-26 08:56, Ken Gaillot wrote: On 08/26/2016 08:11 AM, Gabriele Bulfon wrote: I tried adding some debug in ocf-shellfuncs, showing env and ps -ef into the corosync.log I suspect it's always using ksh, because in the env output I produced I find this: KSH_VERSION=.sh.version This is

Re: [ClusterLabs] Unable to Build fence-agents from Source on RHEL6

2016-08-10 Thread Dmitri Maziuk
On 2016-08-10 10:04, Jason A Ramsey wrote: Traceback (most recent call last): File "eps/fence_eps", line 14, in if sys.version_info.major > 2: AttributeError: 'tuple' object has no attribute 'major' Replace with sys.version_info[0] Dima
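
A minimal illustration of the suggested fix (assuming the agent only needs the major version): on RHEL 6's Python 2.6, sys.version_info is a plain tuple without named fields, so indexing is the portable spelling.

    import sys

    # sys.version_info.major exists only on Python 2.7+/3.x; [0] works everywhere
    if sys.version_info[0] > 2:
        print("running under Python 3")
    else:
        print("running under Python 2")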

Re: [ClusterLabs] Recovering after split-brain

2016-06-21 Thread Dmitri Maziuk
On 2016-06-20 17:19, Digimer wrote: Nikhil indicated that they could switch where traffic went up-stream without issue, if I understood properly. They have some interesting setup, but that notwithstanding: if split brain happens some clients will connect to "old master" and some to "new

Re: [ClusterLabs] Recovering after split-brain

2016-06-20 Thread Dmitri Maziuk
On 2016-06-20 09:13, Jehan-Guillaume de Rorthais wrote: I've heard this kind of argument multiple times in the field, but sooner or later these clusters actually had a split brain scenario with clients connected on both sides, some very bad corruptions, data lost, etc. I'm sure it's a very

Re: [ClusterLabs] restarting pacemakerd

2016-06-18 Thread Dmitri Maziuk
On 2016-06-18 05:15, Ferenc Wágner wrote: ... On the other hand, one could argue that restarting failed services should be the default behavior of systemd (or any init system). Still, it is not. As an off-topic snide comment, I never understood the thinking behind that: restarting without

Re: [ClusterLabs] dovecot RA

2016-06-08 Thread Dmitri Maziuk
On 2016-06-08 09:11, Ken Gaillot wrote: On 06/08/2016 03:26 AM, Jan Pokorný wrote: Pacemaker can drive systemd-managed services for quite some time. This is as easy as changing lsb:dovecot to systemd:dovecot. Great! Any chance that could be mentioned on
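
For the archives, a hedged pcs example of that change (the resource name and monitor interval are assumptions): the agent class is part of the resource definition, so an existing lsb:dovecot resource is normally removed and re-created rather than edited in place.

    pcs resource delete dovecot                 # drop the old lsb:dovecot definition, if any
    pcs resource create dovecot systemd:dovecot op monitor interval=30s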

Re: [ClusterLabs] mail server (postfix)

2016-06-04 Thread Dmitri Maziuk
On 2016-06-04 01:10, Digimer wrote: We're running postfix/dovecot/postgres for our mail on an HA cluster, but we put it all in a set of VMs and made the VMs HA on DRBD. Hmm. I deliver to ~/Maildir and /home is NFS-mounted all over the place, so my primary goal is HA NFS server. I'd hesitate

Re: [ClusterLabs] start a resource

2016-05-06 Thread Dmitri Maziuk
On 2016-05-05 23:50, Moiz Arif wrote: Hi Dimitri, Try cleaning up the fail count for the resource with any of the below commands: via pcs: pcs resource cleanup rsyncd Tried it, didn't work. Tried pcs resource debug-start rsyncd -- got no errors, resource didn't start. Tried
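
A few more things worth checking when cleanup and debug-start both come up empty (a sketch; the resource name rsyncd is taken from the thread):

    pcs resource failcount show rsyncd   # leftover failcount still pinning the resource?
    crm_mon -1rf                         # one-shot status: -r inactive resources, -f failcounts
    pcs constraint --full                # any -INFINITY location constraint left behind?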

Re: [ClusterLabs] Monitoring action of Pacemaker resources fail because of high load on the nodes

2016-04-26 Thread Dmitri Maziuk
On 2016-04-26 00:58, Klaus Wenninger wrote: But what you are attempting doesn't sound entirely proprietary. So once you have something that looks like it might be useful for others as well let the community participate and free yourself from having to always take care of your private copy ;-)

Re: [ClusterLabs] Monitoring action of Pacemaker resources fail because of high load on the nodes

2016-04-25 Thread Dmitri Maziuk
On 2016-04-24 16:20, Ken Gaillot wrote: Correct, you would need to customize the RA. Well, you wouldn't because your custom RA will be overwritten by the next RPM update. Dimitri
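
One common way around that (a sketch; "Foo" and the "local" provider name are placeholders) is to keep the customized copy under your own provider directory instead of editing the packaged agent:

    mkdir -p /usr/lib/ocf/resource.d/local
    cp /usr/lib/ocf/resource.d/heartbeat/Foo /usr/lib/ocf/resource.d/local/Foo
    # edit the copy, then reference it as ocf:local:Foo in the cluster configuration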

Re: [ClusterLabs] dropping ssh connection on failover

2016-04-15 Thread Dmitri Maziuk
On 2016-04-15 07:46, Klaus Wenninger wrote: Which IP-address did you use to ssh to that box? One controlled by pacemaker and possibly being migrated or a fixed one assigned to that box? Good try but no: the "sunken" (as opposed to floating ;) address of course. If what digimer says is true,