nd
pcs resource debug-start resource-zfs --full
works fine: the pool is imported, filesystems are mounted and exported
-- but the resources remain stopped no matter what.
I don't see anything useful in the logs. How do I unfsck this mess?
--
PS. centos 7.latest w/ the current pcs/corosync/pacemaker rpms as
distributed by centos, resources are stonith:fence_scsi, IPaddr2, and ZFS.
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
signature.asc
Description: OpenPGP digital signature
, and then I
can shut one of them down and it'll keep running. But that doesn't seem
to happen when starting cold.
What am I missing?
TIA
--
>
> exit( $rc );
and it doesn't have to be a "resource agent" or a custom implementation
of ifdown, or anything.
--
On 07/24/2017 11:34 AM, Ken Gaillot wrote:
> On Mon, 2017-07-24 at 18:09 +0200, Valentin Vidic wrote:
>> On Mon, Jul 24, 2017 at 11:01:26AM -0500, Dimitri Maziuk wrote:
>>> Lsof/fuser show the PID of the process holding FS open as "kernel".
>>
>> That could
[6886]: INFO: Running
> stop for /dev/drbd0 on /raid
> Jul 22 14:03:48 zebrafish Filesystem(drbd_filesystem)[6886]: INFO: Trying to
> unmount /raid
> Jul 22 14:03:48 zebrafish Filesystem(drbd_filesystem)[6886]: ERROR: Couldn't
> unmount /raid; trying cleanup with TERM
...
--
78]: notice: Transition aborted by
> operation drbd_filesystem_stop_0 'modify' on zebrafish: Event failed
> Jul 22 14:03:55 zebrafish crmd[1078]: warning: Action 45
> (drbd_filesystem_stop_0) on zebrafish failed (target: 0 vs. rc: 1): Error
> Jul 22 14:03:55 zebrafish c
s simply the wrong tool for this particular job.
--
on modern kernels? -- That's an honest question, I have not seen that in
forever (fingers crossed knock on wood).
I.e. is the expectation that real life failure will be "nice" to
corosync actually warranted?
--
On 06/16/2017 12:55 PM, Eric Robinson wrote:
> I must have misspoken.
No, I had invisible tags all over my last two messages.
(Digimer and I have differing views on usefulness of fencing in two-node
active-passive clusters.)
--
On 06/16/2017 12:26 PM, Eric Robinson wrote:
>
> Out of curiosity, what did I say that indicates that we're not using fencing?
>
Same place you said you were new to HA and needed to learn corosync and
pacemaker to use OpenBSD.
HTH,
--
On 05/10/2017 01:54 PM, Ken Gaillot wrote:
> On 05/10/2017 12:26 PM, Dimitri Maziuk wrote:
>> - fencing in 2-node clusters does not work reliably without fixed delay
>
> Not quite. Fixed delay allows a particular method for avoiding a death
> match in a two-node cluster.
a
> fixed delay. I believe that's what digimer uses.
Is it just me or does this sound like catch-22:
- pacemaker does not work reliably without fencing
- fencing in 2-node clusters does not work reliably without fixed delay
- code that ships with pacemaker does not implement fixed delay.
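The usual way out of the fence race described above is to make the delay asymmetric, so that one node always shoots first. A minimal sketch of that idea only, assuming hypothetical node names node1 and node2 and a 15 second default; the helper name is made up for illustration and it does not call any fence agent:

```shell
#!/bin/sh
# Hypothetical helper: print how long (in seconds) a node should wait
# before fencing its peer. The preferred node waits 0 seconds, every
# other node waits $3 seconds (default 15), so the preferred node
# wins a fence race instead of both nodes shooting each other.
fence_delay_for() {
    this_node=$1
    preferred_node=$2
    delay=${3:-15}
    if [ "$this_node" = "$preferred_node" ]; then
        echo 0
    else
        echo "$delay"
    fi
}

fence_delay_for node1 node1    # preferred node: no delay
fence_delay_for node2 node1    # the other node backs off
```

Whether the delay lives in the fence agent, the cluster configuration, or a wrapper is a separate question; the sketch only shows why a fixed, unequal delay breaks the tie.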
--
the DRBD device, and the
power button is the only way to "unfreeze" it.
Hack the RA to write the status file somewhere else perhaps?
--
p, assume makes an ass of you and me.
--
people get a 404 until you get
back to work on Monday?
The whole SCARY SPLIT BRANE! RUN!! RUN AWAY!!! spiel is really quite
pointless without the answer to that.
--
ps you should consider a different definition or different cluster
software.
Oh, wait...
--
elopment" host: postgres 9.x in hot
standby streaming replication, static content is pushed with zfs
snapshots, the only thing you need to "cluster" is floating ip.
Yes, this works perfectly fine with haresources and a couple of
two-liner mon scripts. And nagios on the "maste
"wrong" node, then the only
practical difference between that and "proper" fencing with split brain
detection and trimmings is the cost of the latter.
Send an SMS to the sysadmin and have them figure it out. Better still,
pay an extra nickel and buy servers that don't go titsup in the
floating ip is bound
>> to eth0.
In a shared-nothing cluster "split brain" means whichever MAC address is in
the ARP cache of the border router is the one that gets the traffic. How
does the existing code figure this one out?
--
"best" in that it's simple, stupid, does all you
need/can do and nothing that doesn't make your cluster run any "better".
It's also very unexciting.
--
art where we all like to write something
new, clever, and exciting? Which is usually not the same as the best we
can do for the actual problem at hand?
--
part that is
signed does not get altered by adding the mime part with list footers.)
DKIM is the example of how to do it wrong *after* we worked out the way
to do it right.
--
On 04/10/2017 10:22 PM, Klaus Wenninger wrote:
> On 04/11/2017 12:11 AM, Dimitri Maziuk wrote:
>> When fencing puts my vehicle in a "known" state, I'd want to be very
>> sure it's the *safe* state.
>
> *safe* for -- the other vehicles driving along...
> So i
sure it's the *safe* state.
--
On 04/07/2017 02:22 PM, Eric Robinson wrote:
>>> You guys got a thing against Office 365?
>
>> doesn't everybody?
>
> Fair enough.
;)
On a serious note, I too received your e-mails without any red flags
attached.
--
On 04/07/2017 01:32 PM, Eric Robinson wrote:
> You guys got a thing against Office 365?
doesn't everybody?
--
best IME, although on our
two-node active/passive pairs I haven't had any problems with DRBD
either -- as long as it's not exported via nfs on centos
7/corosync/pacemaker.
--
mber of ways to do it but pcs is not one of them.
Automation solutions like chef or puppet can do it (saltstack has an
event-reactor system that can make it transparent, don't know about
others), you could put it on zfs and clone snapshots, syncthing, rsync,
and so on.
--
hat fails over just fine, the difference is it
doesn't export the drbd over nfs. So it could be nfs. I also had it
working initially -- otherwise it'd never have made it into production, so it
may be the recent redhat kernels.
Thanks though, I'll probably try drbd-users next.
--
CESS COMMAND
> /raid: root kernel mount (root)/raid
After running yum up on the primary and rebooting it again,
5. pcs cluster unstandby
causes the same fail to unmount loop on the secondary, that has to be
powered down until the primary recovers.
Hopefully I'm doing something
PS you could probably use iptables to block/log outgoing traffic from
the wrong ip (different on each node) to be really really sure.
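A rough sketch of what such rules could look like. It only prints the iptables commands rather than running them; the function name and the log prefix are made up, and 192.0.2.10 is a documentation address standing in for whatever the "wrong" ip is on a given node:

```shell
#!/bin/sh
# Print (do not run) iptables rules that log and then drop outgoing
# packets whose source is the address this node should NOT be using.
# Review the output, then run the printed commands as root if they fit.
wrong_ip_rules() {
    wrong_ip=$1
    echo "iptables -A OUTPUT -s $wrong_ip -j LOG --log-prefix 'wrong-src: '"
    echo "iptables -A OUTPUT -s $wrong_ip -j DROP"
}

wrong_ip_rules 192.0.2.10
```

The LOG rule has to come before the DROP rule, since a dropped packet never reaches later rules in the chain.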
--
s long as outgoing packets don't have it as
their from address, you should be fine.
I.e. just have both ips up on either node and see what happens.
--
md[1137]: warning: Action 46
> (drbd_filesystem_stop_0) on lionfish failed (target: 0 vs. rc: 1): Error
> Oct 15 15:32:00 lionfish crmd[1137]: notice: Transition aborted by
> drbd_filesystem_stop_0 'modify' on lionfish: Event failed
> (magic=0:1;46:4:0:700f71e0-d565
> -496f-a2c6-6b97f0cfd940
aside for it.
If it's small enough, dd if=/dev/zero of=/your/partition
Get DRBD working and fully sync'ed outside of the cluster before you
start adding it.
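One way to check the "fully sync'ed" part before touching the cluster config, sketched under the assumption of a DRBD 8.x style /proc/drbd where each resource line carries a ds:Local/Peer disk-state field. The function name is made up, and newer DRBD versions report status differently:

```shell
#!/bin/sh
# Read /proc/drbd-style text on stdin. Return 0 only if at least one
# "ds:" field was seen and every one of them is UpToDate/UpToDate;
# return 1 otherwise (still syncing, inconsistent, or no resources).
drbd_in_sync() {
    awk '
        /ds:/ { seen = 1; if ($0 !~ /ds:UpToDate\/UpToDate/) bad = 1 }
        END   { exit (seen && !bad) ? 0 : 1 }
    '
}

# On a real node:
#   drbd_in_sync < /proc/drbd && echo "in sync, OK to add to the cluster"
```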
--
PS. in security handling everything at one (high) level is known as
"hard crunchy shell with soft chewy center". It's not seen as a good thing.
--
ctions, is way more disruptive than mdadm going into "degraded"
state and sending you an e-mail.
--
ious counter-example is a hard disk failure: they're common on
commodity spinning rust drives and they're cheap and easy to handle at
lower level by throwing in a 2nd one in mdadm raid-1.
--
On 10/05/2016 12:19 PM, Digimer wrote:
> Explain why this is a bad idea, because I don't see anything wrong with it.
My point exactly.
--
ed to "ip".
--
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users
ng around the back of the rack. Maybe if you run a
zillion of stacked active-active resources on a 100-node cluster DRBD
split brain becomes a real problem, from where I'm sitting stonith'ing
DRBD nodes is a solution in search of a problem.
--
to use, but the
system doesn't have to listen.
So e.g. whoever suggested (Lars?) that on non-Linux platforms you sed
all the shebang lines to /usr/bin/bash or whatever -- that's not
guaranteed to work.
--
ise
> wouldn't to be able to maintain software. Not sure where local
> originates, but wouldn't bet that it's bash.
Well 2 out of 3 is "most", can't argue with that.
--
On 08/29/2016 03:27 PM, Vladislav Bogdanov wrote:
> Maybe #!/bin/ocfsh symlink provided by resource-agents package?
... and that's how lennartware ended up implementing its own syslog...
--
ause I haven't looked into gfs lock manager: I'm sure it sucks
just as hard only differently.)
--
don't know which of them would be "less complicated".
--
r filesystem?
Otherwise it'll be mounted on one node only and you can't run your
webapp on the other as documentroot etc. are unavailable there.
--
On 06/22/2016 01:00 PM, Lentes, Bernd wrote:
> - On Jun 22, 2016, at 7:17 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:
>> Does your webapp ever write to /srv/www?
> it does.
Yeah, OK, in that case you want DRBD so the writes go to both nodes at
once.
If you have to use
store and transactional
replication on the database side, and have only the floating IP address
controlled by the cluster.
--
ontrolled power socket.
I knew that, actually, that's why I hung on to heartbeat for as long as
I could. It'd be nice to have it spelled out in bold at the start of
every "explained from scratch" document on clusterlabs.org for the young
players.
--
s, then make life simpler and remove pacemaker entirely.
Obviously you'd have to remove the other node as well since you now
can't have the single service access point anymore.
--
g and
generating alert and failing" is the alert flood.
--
Hi all,
next question: I'm on centos 7 and there's no more /etc/init.d/. With lennartware spreading, is there a coherent plan to deal
with former LSB agents?
Specifically, should I roll my own RA for dovecot or is there one in the
works somewhere?
TIA,
--
eah, that could work... but if my way works I won't have to write my
own RA -- or at least not for postfix. ;)
Thanks,
--
irs, then c) restart postfix
in send-only "slave" configuration. On the other node I could simply
restart the "master" postfix after b), but on the node going passive the
b) has to be between a) and c).
Thx,
--
omplish it?
(I know running an MTA that way is not the Approved Way(tm), I have my
reasons for wanting to do it like this.)
TIA
--
h connections to
the nodes' "proper" IPs.
--
55:50 2016', queued=0ms, exec=51ms
OK, I fixed the config file, how do I restart rsyncd now?
TIA
--
/curl and enable /server-status in the first place.)
--
-apache.html
suggests that apache RA does not and all you can do in practice is run
the same curl http://localhost/server-status check with different
frequencies. Would that be what we actually have ATM?
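For reference, the kind of check being discussed boils down to something like the sketch below. This is not the apache RA's actual code; the function name, default URL, and 5 second timeout are assumptions, and only the two OCF return codes (0 for success, 7 for not running) are standard:

```shell
#!/bin/sh
# Sketch of a monitor check: fetch a status URL with curl and map the
# result onto OCF return codes (0 = OCF_SUCCESS, 7 = OCF_NOT_RUNNING).
# curl -f makes HTTP errors (4xx/5xx) count as failures too.
monitor_status_url() {
    url=${1:-http://localhost/server-status}
    if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
        return 0
    fi
    return 7
}
```

Running it at two different intervals gives the same answer more or less often; it does not make the slower check any "deeper" than the faster one.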
--
didn't have the auto-recovery and
notification handlers set up initially and ended up split-braining it.
Now that everything's clean and happy, pcs cluster stop works without
killing the login. Solved for now.
Thx
--
of the new and
improved ip RA and/or ip command?
TIA,
--