[ClusterLabs] dead cluster after centos update

2017-10-23 Thread Dimitri Maziuk
nd pcs resource debug-start resource-zfs --full works fine: the pool is imported, filesystems are mounted and exported -- but the resources remain stopped no matter what. I don't see anything useful in the logs. How do I unfsck this mess? -- Dimitri Maziuk Programmer/sysadmin BioMa

Re: [ClusterLabs] start one node only?

2017-08-24 Thread Dimitri Maziuk
PS. centos 7.latest w/ the current pcs/corosync/pacemaker rpms as distributed by centos, resources are stonith:fence_scsi, IPaddr2, and ZFS. -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu signature.asc Description: OpenPGP digital signature

[ClusterLabs] start one node only?

2017-08-24 Thread Dimitri Maziuk
, and then I can shut one of them down and it'll keep running. But that doesn't seem to happen when starting cold. What am I missing? TIA -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] IPaddr2 RA and bonding

2017-08-07 Thread Dimitri Maziuk
> > exit( $rc ); and it doesn't have to be a "resource agent" or a custom implementation of ifdown, or anything. -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu signa

Re: [ClusterLabs] epic fail

2017-07-24 Thread Dimitri Maziuk
On 07/24/2017 11:34 AM, Ken Gaillot wrote: > On Mon, 2017-07-24 at 18:09 +0200, Valentin Vidic wrote: >> On Mon, Jul 24, 2017 at 11:01:26AM -0500, Dimitri Maziuk wrote: >>> Lsof/fuser show the PID of the process holding FS open as "kernel". >> >> That could

Re: [ClusterLabs] epic fail

2017-07-24 Thread Dimitri Maziuk
[6886]: INFO: Running > stop for /dev/drbd0 on /raid > Jul 22 14:03:48 zebrafish Filesystem(drbd_filesystem)[6886]: INFO: Trying to > unmount /raid > Jul 22 14:03:48 zebrafish Filesystem(drbd_filesystem)[6886]: ERROR: Couldn't > unmount /raid; trying cleanup with TERM ... -- Dimitr

Re: [ClusterLabs] epic fail

2017-07-24 Thread Dimitri Maziuk
78]: notice: Transition aborted by > operation drbd_filesystem_stop_0 'modify' on zebrafish: Event failed > Jul 22 14:03:55 zebrafish crmd[1078]: warning: Action 45 > (drbd_filesystem_stop_0) on zebrafish failed (target: 0 vs. rc: 1): Error > Jul 22 14:03:55 zebrafish c

Re: [ClusterLabs] epic fail

2017-07-24 Thread Dimitri Maziuk
s simply the wrong tool for this particular job. -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu signature.asc Description: OpenPGP digital signature ___ Users mailing list: Users@clusterlabs.org http://lists.

Re: [ClusterLabs] vip is not removed after node lost connection with the other two nodes

2017-06-23 Thread Dimitri Maziuk
on modern kernels? -- That's an honest question, I have not seen that in forever (fingers crossed knock on wood). I.e. is the expectation that real life failure will be "nice" to corosync actually warranted? -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http:

Re: [ClusterLabs] Installing on SLES 12 -- Where's the Repos?

2017-06-16 Thread Dimitri Maziuk
On 06/16/2017 12:55 PM, Eric Robinson wrote: > I must have misspoken. No, I had invisible tags all over my last two messages. (Digimer and I have differing views on usefulness of fencing in two-node active-passive clusters.) -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madi

Re: [ClusterLabs] Installing on SLES 12 -- Where's the Repos?

2017-06-16 Thread Dimitri Maziuk
On 06/16/2017 12:26 PM, Eric Robinson wrote: > > Out of curiosity, what did I say that indicates that we're not using fencing? > Same place you said you were new to HA and needed to learn corosync and pacemaker to use OpenBSD. HTH, -- Dimitri Maziuk Programmer/sysadmin BioMagRe

Re: [ClusterLabs] how to set a dedicated fence delay for a stonith agent ?

2017-05-10 Thread Dimitri Maziuk
On 05/10/2017 01:54 PM, Ken Gaillot wrote: > On 05/10/2017 12:26 PM, Dimitri Maziuk wrote: >> - fencing in 2-node clusters does not work reliably without fixed delay > > Not quite. Fixed delay allows a particular method for avoiding a death > match in a two-node cluster.

Re: [ClusterLabs] how to set a dedicated fence delay for a stonith agent ?

2017-05-10 Thread Dimitri Maziuk
a > fixed delay. I believe that's what digimer uses. Is it just me or does this sound like catch-22: - pacemaker does not work reliably without fencing - fencing in 2-node clusters does not work reliably without fixed delay - code that ships with pacemaker does not implement fixed delay. -- Dimitr

Re: [ClusterLabs] Antw: Behavior after stop action failure with the failure-timeout set and STONITH disabled

2017-05-05 Thread Dimitri Maziuk
the DRBD device, and the power button is the only way to "unfreeze" it. Hack the RA to write the status file somewhere else perhaps? -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] Antw: Re: 2-Node Cluster Pointless?

2017-04-18 Thread Dimitri Maziuk
p, assume makes an ass of you and me. -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] Antw: Re: 2-Node Cluster Pointless?

2017-04-18 Thread Dimitri Maziuk
people get a 404 until you get back to work on Monday? The whole SCARY SPLIT BRANE! RUN!! RUN AWAY!!! spiel is really quite pointless without the answer to that. -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] Antw: Re: 2-Node Cluster Pointless?

2017-04-18 Thread Dimitri Maziuk
ps you should consider a different definition or different cluster software. Oh, wait... -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] 2-Node Cluster Pointless?

2017-04-17 Thread Dimitri Maziuk
elopment" host: postgres 9.x in hot standby streaming replication, static contents is pushed with zfs snapshots, the only thing you need to "cluster" is floating ip. Yes, this works perfectly fine with haresources and a couple of two-liner mon scripts. And nagios on the "maste

Re: [ClusterLabs] 2-Node Cluster Pointless?

2017-04-17 Thread Dimitri Maziuk
"wrong" node, then the only practical difference between that and "proper" fencing with split brain detection and trimmings is the cost of the latter. Send an SMS to the sysadmin and have them figure it out. Better still, pay an extra nickel and buy servers that don't go titsup in the

Re: [ClusterLabs] 2-Node Cluster Pointless?

2017-04-17 Thread Dimitri Maziuk
floating ip is bound >> to eth0. In shared-nothing cluster "split brain" means whichever MAC address is in ARP cache of the border router is the one that gets the traffic. How does the existing code figure this one out? -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, U

Re: [ClusterLabs] 2-Node Cluster Pointless?

2017-04-17 Thread Dimitri Maziuk
"best" in that it's simple, stupid, does all you need/can do and nothing that doesn't make your cluster run any "better". It's also very unexciting. -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] 2-Node Cluster Pointless?

2017-04-17 Thread Dimitri Maziuk
art where we all like to write something new, clever, and exciting? Which is usually not the same as the best we can do for the actual problem at hand? -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] Fraud Detection Check?

2017-04-12 Thread Dimitri Maziuk
part that is signed does not get altered by adding the mime part with list footers.) DKIM is the example of how to do it wrong *after* we worked out the way to do it right. -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] Pacemaker for Embedded Systems

2017-04-11 Thread Dimitri Maziuk
On 04/10/2017 10:22 PM, Klaus Wenninger wrote: > On 04/11/2017 12:11 AM, Dimitri Maziuk wrote: >> When fencing puts my vehicle in a "known" state, I'd want to be very >> sure it's the *safe* state. > > *safe* for -- the other vehicles driving along... > So i

Re: [ClusterLabs] Pacemaker for Embedded Systems

2017-04-10 Thread Dimitri Maziuk
sure it's the *safe* state. -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] Fraud Detection Check?

2017-04-07 Thread Dimitri Maziuk
On 04/07/2017 02:22 PM, Eric Robinson wrote: >>> You guys got a thing against Office 365? > >> doesn't everybody? > > Fair enough. ;) On a serious note, I too received your e-mails without any red flags attached. -- Dimitri Maziuk Programmer/sysadmin BioMagRes

Re: [ClusterLabs] Fraud Detection Check?

2017-04-07 Thread Dimitri Maziuk
On 04/07/2017 01:32 PM, Eric Robinson wrote: > You guys got a thing against Office 365? doesn't everybody? -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] Sync Apache config files

2017-03-13 Thread Dimitri Maziuk
best IME, although on our two-node active/passive pairs I haven't had any problems with DRBD either -- as long as it's not exported via nfs on centos 7/corosync/pacemaker. -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] Sync Apache config files

2017-03-13 Thread Dimitri Maziuk
mber of ways to do it but pcs is not one of them. Automation solutions like chef or puppet can do it (saltstack has an event-reactor system that can make it transparent, don't know about others), you could put it on zfs and clone snapshots, syncthing, rsync, and so on. -- Dimitri Maziuk Programmer/sysadmin

Re: [ClusterLabs] centos 7 drbd fubar

2017-01-06 Thread Dimitri Maziuk
hat fails over just fine, the difference is it doesn't export the drbd over nfs. So it could be nfs. I also had it working initially -- otherwise it'd never made it into production, so it may be the recent redhat kernels. Thanks though, I'll probably try drbd-users next. -- Dimitri Maziuk Programmer

[ClusterLabs] centos 7 drbd fubar

2016-12-27 Thread Dimitri Maziuk
CESS COMMAND > /raid: root kernel mount (root)/raid After running yum up on the primary and rebooting it again, 5. pcs cluster unstandby causes the same fail to unmount loop on the secondary, that has to be powered down until the primary recovers. Hopefully I'm doing something

Re: [ClusterLabs] ocf:heartbeat:IPaddr2 - Different network segment

2016-11-30 Thread Dimitri Maziuk
PS you could probably use iptables to block/log outgoing traffic from the wrong ip (different on each node) to be really really sure. -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] ocf:heartbeat:IPaddr2 - Different network segment

2016-11-30 Thread Dimitri Maziuk
s long as outgoing packets don't have it as their from address, you should be fine. I.e. just have both ips up on either node and see what happens. -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] Can't do anything right; how do I start over?

2016-10-15 Thread Dimitri Maziuk
md[1137]: warning: Action 46 > (drbd_filesystem_stop_0) on lionfish failed (target: 0 vs. rc: 1): Error > Oct 15 15:32:00 lionfish crmd[1137]: notice: Transition aborted by > drbd_filesystem_stop_0 'modify' on lionfish: Event failed > (magic=0:1;46:4:0:700f71e0-d565 > -496f-a2c6-6b97f0cfd940

Re: [ClusterLabs] Can't do anything right; how do I start over?

2016-10-14 Thread Dimitri Maziuk
aside for it. If it's small enough, dd if=/dev/zero of=/your/partition Get DRBD working and fully sync'ed outside of the cluster before you start adding it. -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] Corosync ring shown faulty between healthy nodes & networks (rrp_mode: passive)

2016-10-06 Thread Dimitri Maziuk
PS. in security handling everything at one (high) level is known as "hard crunchy shell with soft chewy center". It's not seen as a good thing. -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] Corosync ring shown faulty between healthy nodes & networks (rrp_mode: passive)

2016-10-06 Thread Dimitri Maziuk
ctions, is way more disruptive than mdadm going into "degraded" state and sending you an e-mail. -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] Corosync ring shown faulty between healthy nodes & networks (rrp_mode: passive)

2016-10-06 Thread Dimitri Maziuk
ious counter-example is a hard disk failure: they're common on commodity spinning rust drives and they're cheap and easy to handle at lower level by throwing in a 2nd one in mdadm raid-1. -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] stonithd/fenced filling up logs

2016-10-05 Thread Dimitri Maziuk
On 10/05/2016 12:19 PM, Digimer wrote: > Explain why this is a bad idea, because I don't see anything wrong with it. My point exactly. -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] hi list

2016-09-30 Thread Dimitri Maziuk
ed to "ip". -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] DRBD failover in Pacemaker

2016-09-08 Thread Dimitri Maziuk
ng around the back of the rack. Maybe if you run a zillion of stacked active-active resources on a 100-node cluster DRBD split brain becomes a real problem, from where I'm sitting stonith'ing DRBD nodes is a solution in search of a problem. -- Dimitri Maziuk Programmer/sysadmin BioMagResBank,

Re: [ClusterLabs] ocf scripts shell and local variables

2016-09-01 Thread Dimitri Maziuk
to use, but the system doesn't have to listen. So e.g. whoever suggested (Lars?) that on non-Linux platforms you sed all the shebang lines to /usr/bin/bash or whatever -- that's not guaranteed to work. -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] ocf scripts shell and local variables

2016-08-30 Thread Dimitri Maziuk
ise > wouldn't to be able to maintain software. Not sure where local > originates, but wouldn't bet that it's bash. Well 2 out of 3 is "most", can't argue with that. -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] ocf scripts shell and local variables

2016-08-29 Thread Dimitri Maziuk
On 08/29/2016 03:27 PM, Vladislav Bogdanov wrote: > Maybe #!/bin/ocfsh symlink provided by resource-agents package? ... and that's how lennartware ended up implementing its own syslog... -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.

Re: [ClusterLabs] design question to DRBD

2016-06-25 Thread Dimitri Maziuk
ause I haven't looked into gfs lock manager: I'm sure it sucks just as hard only differently.) -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] design question to DRBD

2016-06-22 Thread Dimitri Maziuk
don't know which of them would be "less complicated". -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] design question to DRBD

2016-06-22 Thread Dimitri Maziuk
r filesystem? Otherwise it'll be mounted on one node only and you can't run your webapp on the other as documentroot etc. are unavailable there. -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] design question to DRBD

2016-06-22 Thread Dimitri Maziuk
On 06/22/2016 01:00 PM, Lentes, Bernd wrote: > - On Jun 22, 2016, at 7:17 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote: >> Does your webapp ever write to /srv/www? > it does. Yeah, OK, in that case you want DRBD so the writes go to both nodes at once. If you have to use

Re: [ClusterLabs] design question to DRBD

2016-06-22 Thread Dimitri Maziuk
store and transactional replication on the database side, and have only the floating IP address controlled by the cluster. -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] Recovering after split-brain

2016-06-21 Thread Dimitri Maziuk
ontrolled power socket. I knew that, actually, that's why I hung on to heartbeat for as long as I could. It'd be nice to have it spelled out in bold at the start of every "explained from scratch" document on clusterlabs.org for the young players. -- Dimitri Maziuk Programmer/sysadmin BioMag

Re: [ClusterLabs] Recovering after split-brain

2016-06-20 Thread Dimitri Maziuk
s, then make life simpler and remove pacemaker entirely. Obviously you'd have to remove the other node as well since you now can't have the single service access point anymore. -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] restarting pacemakerd

2016-06-18 Thread Dimitri Maziuk
g and generating alert and failing" is the alert flood. -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

[ClusterLabs] dovecot RA

2016-06-07 Thread Dimitri Maziuk
Hi all, next question: I'm on centos 7 and there's no more /etc/init.d/. With lennartware spreading, is there a coherent plan to deal with former LSB agents? Specifically, should I roll my own RA for dovecot or is there one in the works somewhere? TIA, -- Dimitri Maziuk Programmer/sysadmin

Re: [ClusterLabs] mail server (postfix)

2016-06-06 Thread Dimitri Maziuk
eah, that could work... but if my way works I won't have to write my own RA -- or at least not for postfix. ;) Thanks, -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] mail server (postfix)

2016-06-05 Thread Dimitri Maziuk
irs, then c) restart postfix in send-only "slave" configuration. On the other node I could simply restart the "master" postfix after b), but on the node going passive the b) has to be between a) and c). Thx, -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- h

[ClusterLabs] mail server (postfix)

2016-06-03 Thread Dimitri Maziuk
omplish it? (I know running an MTA that way is not the Approved Way(tm), I have my reasons for wanting to do it like this.) TIA -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] start a resource

2016-05-07 Thread Dimitri Maziuk
h connections to the nodes' "proper" IPs. -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

[ClusterLabs] start a resource

2016-05-05 Thread Dimitri Maziuk
55:50 2016', queued=0ms, exec=51ms OK, I fixed the config file, how do I restart rsyncd now? TIA -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] Monitoring action of Pacemaker resources fail because of high load on the nodes

2016-04-28 Thread Dimitri Maziuk
/curl and enable /server-status in the first place.) -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

Re: [ClusterLabs] Monitoring action of Pacemaker resources fail because of high load on the nodes

2016-04-24 Thread Dimitri Maziuk
-apache.html suggests that apache RA does not and all you can do in practice is run the same curl http://localhost/server-status check with different frequencies. Would that be what we actually have ATM? -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.

Re: [ClusterLabs] dropping ssh connection on failover

2016-04-15 Thread Dimitri Maziuk
didn't have the auto-recovery and notification handlers set up initially and ended up split-braining it. Now that everything's clean and happy, pcs cluster stop works without killing the login. Solved for now. Thx -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wi

[ClusterLabs] dropping ssh connection on failover

2016-04-15 Thread Dimitri Maziuk
of the new and improved ip RA and/or ip command? TIA, -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu