Re: [ClusterLabs] DRBD or SAN ?
On 2017-07-17 03:07 PM, Chris Adams wrote:
> Once upon a time, Ian said:
>> I think a big advantage compared to native replication is that DRBD
>> offers synchronous replication at the block level as opposed to the
>> transaction level.
>
> However, just like RAID is not a replacement for backups, DRBD is IMHO
> not a replacement for database replication. DRBD would just replicate
> database files, so file corruption, for example, would be copied from
> host to host. When something provides a native replication system, it
> is probably better to use that (or at least use it at one level).

You are absolutely correct. However, the OP asked about DRBD vs SAN, not
DRBD/SAN versus backup. Proper continuity planning requires redundancy
(DRBD + clustering), backup, and DR as three separate components.

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] DRBD or SAN ?
Once upon a time, Ian said:
> I think a big advantage compared to native replication is that DRBD
> offers synchronous replication at the block level as opposed to the
> transaction level.

However, just like RAID is not a replacement for backups, DRBD is IMHO
not a replacement for database replication. DRBD would just replicate
database files, so file corruption, for example, would be copied from
host to host. When something provides a native replication system, it is
probably better to use that (or at least use it at one level).

-- 
Chris Adams
Re: [ClusterLabs] DRBD or SAN ?
DRBD is absolutely compatible with mysql/postgres. In my experience,
with a 10G pipe for block replication, there's also basically no
performance degradation compared to native disk writes, even with an SSD
array.

> I've always favored native replication over disk replication for
> databases. I'm not sure that's a necessity, but I would think the
> biggest advantage is that with disk replication, you couldn't run the
> database server all the time, you'd have to start it (and its VM in
> your case) after disk failover.

This is true: you have to start mysql/the VM during failover, but in my
experience that is usually very fast (depending on mysql/VM
configuration). Also, depending on what replication tools you are using,
you'd still have to promote the slave to a master, which might save only
seconds compared to starting mysql (which, I recognize, could be
important). Note that I am unfamiliar with the methods of postgres
replication.

I think a big advantage compared to native replication is that DRBD
offers synchronous replication at the block level as opposed to the
transaction level. I have not run my own tests, but my vicarious
experience with tools such as Galera or other synchronous forms of SQL
replication indicates that there may be a significant performance hit
depending on workload. Obviously, there's no significant performance hit
if you are doing mysql native asynchronous replication (I guess as long
as you aren't spending all your I/O on the master on your binlogs), but
then you are relying on asynchronous processes to keep your data safe
and available. Perhaps that is not a big deal; I am not well versed in
that level of replication theory.

> physical proximity so that environmental factors could take down the
> whole thing.

Literal server fires come to mind.
@OP, I agree with Ken that a multi-datacenter setup is ideal if your
application can deal with its various caveats, and it may be worth
investigating, since moving to a DRBD setup doesn't eliminate any more
problems than a multi-DC setup would, as long as your SAN is already set
up on independent electrical circuits and separate networking stacks to
begin with. E.g., both a multi-server and a multi-center setup would
protect from small disasters that take out a whole server, but a DRBD
setup does not add much more than that, whereas a multi-center setup
would also protect from large-scale disaster. DRBD and a SAN would both
suffer from a building-wide power outage. Do you have generators?

On Mon, Jul 17, 2017 at 1:30 PM, Digimer wrote:
> On 2017-07-17 05:51 AM, Lentes, Bernd wrote:
>> Hi,
>>
>> i established a two node cluster with two HP servers and SLES 11 SP4.
>> I'd like to start now with a test period. Resources are virtual
>> machines. The vm's reside on a FC SAN. The SAN has two power
>> supplies, two storage controllers, two network interfaces for
>> configuration. Each storage controller has two FC connectors. On each
>> server i have one FC controller with two connectors in a multipath
>> configuration. Each connector from the FC controller inside the
>> server is connected to a different storage controller on the SAN. But
>> isn't a SAN, despite all that redundancy, a SPOF ?
>>
>> I'm asking myself if a DRBD configuration wouldn't be more redundant
>> and highly available. There i have two completely independent
>> instances of the vm.
>>
>> We have one web application with a database which is really crucial
>> for us. Downtime should be maximum one or two hours; if longer, we
>> run into trouble.
>>
>> Is DRBD in conjunction with a database (MySQL or Postgres) possible ?
>>
>> Bernd
>
> DRBD any day.
>
> Yes, even with all the redundancy, it's a single electrical/mechanical
> device that can be taken offline by a bad firmware update, user error,
> etc.
> DRBD gives you full mechanical and electrical replication of the data
> and has survived some serious in-the-field faults in our Anvil!
> system (including a case where three drives were ejected at the same
> time from the node hosting the VMs, and the servers lived).
>
> -- 
> Digimer
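To make the "start mysql after disk failover" sequence discussed above
concrete, here is a minimal Pacemaker sketch in crmsh syntax of the
common DRBD-backed MySQL stack. It is only an illustration: all resource
names, the DRBD resource name (r0), the device path, mount point, and IP
address are hypothetical placeholders, not values from this thread, and
timeouts would need tuning for a real cluster.

```text
# Hypothetical crmsh configuration sketch: DRBD device, filesystem,
# MySQL, and a floating IP, colocated and ordered so that mysql starts
# on whichever node holds the DRBD primary after failover.
primitive p_drbd_mysql ocf:linbit:drbd \
    params drbd_resource=r0 \
    op monitor interval=29s role=Master \
    op monitor interval=31s role=Slave
ms ms_drbd_mysql p_drbd_mysql \
    meta master-max=1 clone-max=2 notify=true
primitive p_fs_mysql ocf:heartbeat:Filesystem \
    params device=/dev/drbd0 directory=/var/lib/mysql fstype=ext4
primitive p_mysql ocf:heartbeat:mysql \
    op start timeout=120s \
    op stop timeout=120s
primitive p_ip_mysql ocf:heartbeat:IPaddr2 \
    params ip=192.0.2.10 cidr_netmask=24
group g_mysql p_fs_mysql p_mysql p_ip_mysql
colocation c_mysql_on_drbd inf: g_mysql ms_drbd_mysql:Master
order o_drbd_before_mysql inf: ms_drbd_mysql:promote g_mysql:start
```

The ordering constraint is what produces the "promote, mount, then start
mysql" failover sequence being debated above; the few seconds it takes
is the cost compared with an always-running native replica.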
Re: [ClusterLabs] DRBD or SAN ?
On 2017-07-17 05:51 AM, Lentes, Bernd wrote:
> Hi,
>
> i established a two node cluster with two HP servers and SLES 11 SP4.
> I'd like to start now with a test period. Resources are virtual
> machines. The vm's reside on a FC SAN. The SAN has two power supplies,
> two storage controllers, two network interfaces for configuration.
> Each storage controller has two FC connectors. On each server i have
> one FC controller with two connectors in a multipath configuration.
> Each connector from the FC controller inside the server is connected
> to a different storage controller on the SAN. But isn't a SAN, despite
> all that redundancy, a SPOF ?
>
> I'm asking myself if a DRBD configuration wouldn't be more redundant
> and highly available. There i have two completely independent
> instances of the vm.
>
> We have one web application with a database which is really crucial
> for us. Downtime should be maximum one or two hours; if longer, we run
> into trouble.
>
> Is DRBD in conjunction with a database (MySQL or Postgres) possible ?
>
> Bernd

DRBD any day.

Yes, even with all the redundancy, it's a single electrical/mechanical
device that can be taken offline by a bad firmware update, user error,
etc. DRBD gives you full mechanical and electrical replication of the
data and has survived some serious in-the-field faults in our Anvil!
system (including a case where three drives were ejected at the same
time from the node hosting the VMs, and the servers lived).

-- 
Digimer
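For reference, the synchronous block-level replication being recommended
in this thread corresponds to DRBD's protocol C. A minimal sketch of a
two-node resource file follows; the hostnames, backing device, and IP
addresses are placeholders, not taken from the thread:

```text
# /etc/drbd.d/r0.res -- hypothetical two-node DRBD resource.
# Protocol C: a write is reported complete only after it has reached
# the peer's disk, i.e. fully synchronous replication.
resource r0 {
    protocol C;
    device    /dev/drbd0;
    disk      /dev/sdb1;
    meta-disk internal;
    on node1 {
        address 192.0.2.1:7789;
    }
    on node2 {
        address 192.0.2.2:7789;
    }
}
```

The /dev/drbd0 device is then used like any block device (filesystem for
MySQL data, or a VM disk), typically managed by Pacemaker as a
master/slave resource.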
Re: [ClusterLabs] DRBD or SAN ?
On 07/17/2017 04:51 AM, Lentes, Bernd wrote:
> Hi,
>
> i established a two node cluster with two HP servers and SLES 11 SP4.
> I'd like to start now with a test period. Resources are virtual
> machines. The vm's reside on a FC SAN. The SAN has two power supplies,
> two storage controllers, two network interfaces for configuration.
> Each storage controller has two FC connectors. On each server i have
> one FC controller with two connectors in a multipath configuration.
> Each connector from the FC controller inside the server is connected
> to a different storage controller on the SAN. But isn't a SAN, despite
> all that redundancy, a SPOF ?

What types of failure would be an example of a SPOF? Perhaps a single
logic board, or physical proximity such that environmental factors could
take down the whole thing.

> I'm asking myself if a DRBD configuration wouldn't be more redundant
> and highly available. There i have two completely independent
> instances of the vm.

I'd agree. However, you'd have to estimate the above risks of the SAN
approach, compare any other advantages/disadvantages, and decide whether
it's worth a redesign.

> We have one web application with a database which is really crucial
> for us. Downtime should be maximum one or two hours; if longer, we run
> into trouble.

Another thing to consider is that the risks of SAN vs DRBD are probably
much less than the risks of a single data center. If you used booth to
set up 2-3 redundant clusters, it would be easier to accept small risks
at any one site. Of course, that may not be possible in your situation;
every case is different.

> Is DRBD in conjunction with a database (MySQL or Postgres) possible ?
>
> Bernd

I've always favored native replication over disk replication for
databases. I'm not sure that's a necessity, but I would think the
biggest advantage is that with disk replication, you couldn't run the
database server all the time; you'd have to start it (and its VM in your
case) after disk failover.
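The booth approach mentioned above coordinates otherwise independent
Pacemaker clusters at different sites through tickets; resources at a
site run only while that site holds the ticket. A minimal sketch of a
booth configuration for two sites plus an arbitrator; all IP addresses
and the ticket name are made-up placeholders:

```text
# /etc/booth/booth.conf -- hypothetical geo-cluster layout:
# two full clusters plus a third-location arbitrator to break ties.
transport = UDP
port = 9929

site       = 192.0.2.10     # cluster at site A
site       = 198.51.100.10  # cluster at site B
arbitrator = 203.0.113.10   # tie-breaker only, runs no resources

# The database stack would be constrained to run only where this
# ticket is granted; if a site goes dark, the ticket expires and can
# be granted to the surviving site.
ticket = "ticket-db"
    expire = 600
```

The arbitrator at a third location is what lets either surviving site
win the ticket cleanly after a whole-datacenter failure.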
Re: [ClusterLabs] DRBD or SAN ?
On 7/17/2017 4:51 AM, Lentes, Bernd wrote:
> I'm asking myself if a DRBD configuration wouldn't be more redundant
> and highly available. ... Is DRBD in conjunction with a database
> (MySQL or Postgres) possible ?

Have you seen https://github.com/ewwhite/zfs-ha/wiki ? I recently
deployed one, and so far it's working better than one centos 7 drbd +
pacemaker + nfs cluster I have. Although in 20/20 hindsight I wonder if
I should've gone BSD instead.

If your database is postgres, streaming replication in 9.latest is
something to consider. I haven't had any problems running it on top of
drbd, but there are advantages to having two completely independent
copies of everything, especially on top of zfs's advantages.

We already "upgraded" our webserver postgres to streaming replication;
in the next month or so I plan to bump it to 9.latest from the postgres
repo, run two independent instances, ditch drbd altogether, and use
pacemaker only for the floating ip (and zfs incremental snapshots to
replicate static content). In that scenario freebsd's carp looks like a
much leaner and cleaner alternative to the whole linux ha stack, and
with zfs in the kernel and the absence of systemd... the grass on the
other side looks greener and greener every day.

Dima
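As a rough illustration of the streaming-replication setup described
above (PostgreSQL 9.x era, where the standby is configured through
recovery.conf), here is a minimal sketch. The hostname, replication
role, and network range are hypothetical, and real deployments would
also want a replication slot or archiving to avoid losing WAL:

```text
# postgresql.conf on the primary (9.x-style settings):
wal_level = hot_standby       # WAL carries enough info for a hot standby
max_wal_senders = 3           # allow replication connections
wal_keep_segments = 64        # retain WAL so the standby can catch up

# pg_hba.conf on the primary -- let the standby connect:
#   host  replication  replicator  198.51.100.0/24  md5

# recovery.conf on the standby (base backup taken first, e.g. with
# pg_basebackup):
standby_mode = 'on'
primary_conninfo = 'host=db-primary port=5432 user=replicator'

# postgresql.conf on the standby, to allow read-only queries:
hot_standby = on
```

With two such independent instances, failover reduces to promoting the
standby and moving the floating IP, which fits the "pacemaker only for
the floating ip" plan above.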
Re: [ClusterLabs] did anyone manage to combine ClusterMon RA with HP systems insight manager ? - SOLVED
- On Jul 11, 2017, at 7:25 PM, Bernd Lentes
bernd.len...@helmholtz-muenchen.de wrote:
> Hi,
>
> i established a two node cluster and i'd like to start a test period
> now with some not very important resources.
> I'd like to monitor the cluster via SNMP, so i notice if it is e.g.
> migrating. I followed
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Pacemaker_Explained/index.html#s-notification-snmp .
> Configuration of the RA went fine, and the cluster is sending traps.
> As a central management station we use HP Systems Insight Manager,
> because we have some HP servers and SIM can monitor the hardware
> quite well.
> I tried to integrate the respective MIB into SIM. Compilation and
> adding seem to work fine, but SIM does not relate the traps to the
> MIB. Traps arrive quite fine, i checked it with tcpdump and wireshark.
> For SIM the traps are "unregistered", meaning it does not relate them
> to any MIB.
> I also tried the most recent one from
> https://github.com/ClusterLabs/pacemaker/blob/master/extra/PCMK-MIB.txt .
> Same problem. I also tried the one from
> https://github.com/sys4/pacemaker-snmp (combined with the respective
> perl-script), but it also did not work.
> I'd like to stay with SIM, because of our HP hardware. And maintaining
> a second system, e.g. Nagios, just for the cluster... is this really
> necessary ?
> I read already a lot about SNMP and MIBs, and what i learned until now
> is that SNMP is maybe "simple", but not trivial. Same with the MIBs.
>
> Did anyone combine these two successfully ?
>
> We use SLES 11 SP4 and pacemaker 1.1.12.

Hi,

i managed it. If someone is interested, here is an explanation:
https://community.saas.hpe.com/t5/Systems-Insight-Manager-Forum/adding-a-pacemaker-cluster-to-HP-SIM-7-4/td-p/1596730

Bernd

Helmholtz Zentrum Muenchen
Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
Ingolstaedter Landstr.
1 85764 Neuherberg
www.helmholtz-muenchen.de
Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons Enhsen
Registergericht: Amtsgericht Muenchen HRB 6466
USt-IdNr: DE 129521671
[ClusterLabs] DRBD or SAN ?
Hi,

i established a two node cluster with two HP servers and SLES 11 SP4.
I'd like to start now with a test period. Resources are virtual
machines. The vm's reside on a FC SAN. The SAN has two power supplies,
two storage controllers, and two network interfaces for configuration.
Each storage controller has two FC connectors. On each server i have one
FC controller with two connectors in a multipath configuration. Each
connector from the FC controller inside the server is connected to a
different storage controller on the SAN. But isn't a SAN, despite all
that redundancy, a SPOF ?

I'm asking myself if a DRBD configuration wouldn't be more redundant and
highly available. There i have two completely independent instances of
the vm.

We have one web application with a database which is really crucial for
us. Downtime should be maximum one or two hours; if longer, we run into
trouble.

Is DRBD in conjunction with a database (MySQL or Postgres) possible ?

Bernd

-- 
Bernd Lentes
Systemadministration
institute of developmental genetics
Gebäude 35.34 - Raum 208
HelmholtzZentrum München
bernd.len...@helmholtz-muenchen.de
phone: +49 (0)89 3187 1241
fax: +49 (0)89 3187 2294

no backup - no mercy
Re: [ClusterLabs] Antw: Re: Problem with stonith and starting services
> On 17 Jul 2017, at 8:02, Ulrich Windl wrote:
>
> Hi!
>
> Could this mean the stonith-timeout is significantly larger than the
> time for a complete reboot? So the fenced node would be up again when
> the cluster thinks the fencing has just completed.
>
> Regards,
> Ulrich
> P.S: Sorry for the late reply; I was offline for a while...

Thanks. Well, the issue has been resolved; I sent a mail some days ago.
And yes, the fencing timeout is set to 180 seconds.

Cheers
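For anyone tuning this interaction, the timeout Ulrich is asking about
is a cluster-wide property. A hedged sketch in crmsh syntax; the values
are illustrative only and would need matching against the actual fence
hardware and node boot time:

```text
# Illustrative values only. stonith-timeout must be long enough for the
# fence agent to confirm the power-off, but note the caveat raised
# above: if it also exceeds a full reboot cycle, the fenced node can be
# back up while the cluster still considers the fencing in flight.
crm configure property stonith-enabled=true
crm configure property stonith-timeout=180s
```

Comparing that value against the measured reboot duration in the syslog
below is one way to check for the race Ulrich describes.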
[ClusterLabs] Antw: Re: Problem with stonith and starting services
Hi!

Could this mean the stonith-timeout is significantly larger than the
time for a complete reboot? So the fenced node would be up again when
the cluster thinks the fencing has just completed.

Regards,
Ulrich
P.S: Sorry for the late reply; I was offline for a while...

>>> Cesar Hernandez wrote on 06.07.2017 at 16:20 in message
<0674aeed-8fd2-4dab-a27f-498db0f36...@medlabmg.com>:
>> If node2 is getting the notification of its own fencing, it wasn't
>> successfully fenced. Successful fencing would render it incapacitated
>> (powered down, or at least cut off from the network and any shared
>> resources).
>
> Maybe I don't understand you, or maybe you don't understand me... ;)
> This is the syslog of the machine, where you can see that the machine
> has rebooted successfully, and as I said, it has rebooted successfully
> all the times:
>
> Jul 5 10:41:54 node2 kernel: [0.00] Initializing cgroup subsys cpuset
> Jul 5 10:41:54 node2 kernel: [0.00] Initializing cgroup subsys cpu
> Jul 5 10:41:54 node2 kernel: [0.00] Initializing cgroup subsys cpuacct
> Jul 5 10:41:54 node2 kernel: [0.00] Linux version 3.16.0-4-amd64 (debian-ker...@lists.debian.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.39-1 (2016-12-30)
> Jul 5 10:41:54 node2 kernel: [0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-3.16.0-4-amd64 root=UUID=711e1ec2-2a36-4405-bf46-44b43cfee42e ro init=/bin/systemd console=ttyS0 console=hvc0
> Jul 5 10:41:54 node2 kernel: [0.00] e820: BIOS-provided physical RAM map:
> Jul 5 10:41:54 node2 kernel: [0.00] BIOS-e820: [mem 0x-0x0009dfff] usable
> Jul 5 10:41:54 node2 kernel: [0.00] BIOS-e820: [mem 0x0009e000-0x0009] reserved
> Jul 5 10:41:54 node2 kernel: [0.00] BIOS-e820: [mem 0x000e-0x000f] reserved
> Jul 5 10:41:54 node2 kernel: [0.00] BIOS-e820: [mem 0x0010-0x3fff] usable
> Jul 5 10:41:54 node2 kernel: [0.00] BIOS-e820: [mem 0xfc00-0x] reserved
> Jul 5 10:41:54 node2 kernel: [0.00] NX (Execute Disable) protection:
> active
> Jul 5 10:41:54 node2 kernel: [0.00] SMBIOS 2.4 present.
>
> ...
>
> Jul 5 10:41:54 node2 dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
>
> ...
>
> Jul 5 10:41:54 node2 corosync[585]: [MAIN ] Corosync Cluster Engine ('UNKNOWN'): started and ready to provide service.
> Jul 5 10:41:54 node2 corosync[585]: [MAIN ] Corosync built-in features: nss
> Jul 5 10:41:54 node2 corosync[585]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
>
> ...
>
> Jul 5 10:41:57 node2 crmd[608]: notice: Defaulting to uname -n for the local classic openais (with plugin) node name
> Jul 5 10:41:57 node2 crmd[608]: notice: Membership 4308: quorum acquired
> Jul 5 10:41:57 node2 crmd[608]: notice: plugin_handle_membership: Node node2[1108352940] - state is now member (was (null))
> Jul 5 10:41:57 node2 crmd[608]: notice: plugin_handle_membership: Node node11[794540] - state is now member (was (null))
> Jul 5 10:41:57 node2 crmd[608]: notice: The local CRM is operational
> Jul 5 10:41:57 node2 crmd[608]: notice: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
> Jul 5 10:41:57 node2 stonith-ng[604]: notice: Watching for stonith topology changes
> Jul 5 10:41:57 node2 stonith-ng[604]: notice: Membership 4308: quorum acquired
> Jul 5 10:41:57 node2 stonith-ng[604]: notice: plugin_handle_membership: Node node11[794540] - state is now member (was (null))
> Jul 5 10:41:57 node2 stonith-ng[604]: notice: On loss of CCM Quorum: Ignore
> Jul 5 10:41:58 node2 stonith-ng[604]: notice: Added 'st-fence_propio:0' to the device list (1 active devices)
> Jul 5 10:41:59 node2 stonith-ng[604]: notice: Operation reboot of node2 by node11 for crmd.2141@node11.61c3e613: OK
> Jul 5 10:41:59 node2 crmd[608]: crit: We were allegedly just fenced by node11 for node11!
> Jul 5 10:41:59 node2 corosync[585]: [pcmk ] info: pcmk_ipc_exit: Client crmd (conn=0x228d970, async-conn=0x228d970) left
> Jul 5 10:41:59 node2 pacemakerd[597]: warning: The crmd process (608) can no longer be respawned, shutting the cluster down.
> Jul 5 10:41:59 node2 pacemakerd[597]: notice: Shutting down Pacemaker
> Jul 5 10:41:59 node2 pacemakerd[597]: notice: Stopping pengine: Sent -15 to process 607
> Jul 5 10:41:59 node2 pengine[607]: notice: Invoking handler for signal 15: Terminated
> Jul 5 10:41:59 node2 pacemakerd[597]: notice: Stopping attrd: Sent -15 to process 606
> Jul 5 10:41:59 node2 attrd[606]: notice: Invoking handler for signal 15: Terminated
> Jul 5 10:41:59 node2 attrd[606]: notice: