Re: [ClusterLabs] Resources suddenly get target-role="stopped"
Hi Ken,
The comments are in the text.

On 4.12.2015 19:06, Ken Gaillot wrote:
> On 12/04/2015 10:22 AM, Klechomir wrote:
>> Hi list,
>> My issue is the following:
>>
>> I have a very stable cluster, using Corosync 2.1.0.26 and Pacemaker 1.1.8
>> (I observed the same problem with Corosync 2.3.5 & Pacemaker 1.1.13-rc3).
>>
>> I bumped into this issue when I started playing with VirtualDomain
>> resources, but it seems to be unrelated to the RA.
>>
>> The problem is that, for no apparent reason, a resource gets
>> target-role="Stopped". This happens after a (successful) migration,
>> after a failover, or after a VM restart.
>>
>> My tests showed that changing the resource name fixes the problem, but
>> that seems to be only a temporary workaround.
>>
>> The resource configuration is:
>>
>> primitive VMA_VM1 ocf:heartbeat:VirtualDomain \
>>     params config="/NFSvolumes/CDrive1/VM1/VM1.xml" hypervisor="qemu:///system" migration_transport="tcp" \
>>     meta allow-migrate="true" target-role="Started" \
>>     op start interval="0" timeout="120s" \
>>     op stop interval="0" timeout="120s" \
>>     op monitor interval="10" timeout="30" depth="0" \
>>     utilization cpu="1" hv_memory="925"
>> order VM_VM1_after_Filesystem_CDrive1 inf: Filesystem_CDrive1 VMA_VM1
>>
>> Here is the log from one such stop, after a successful migration with
>> "crm migrate resource VMA_VM1":
>>
>> Dec 04 15:18:22 [3818929] CLUSTER-1 crmd: debug: cancel_op: Cancelling op 5564 for VMA_VM1 (VMA_VM1:5564)
>> Dec 04 15:18:22 [4434] CLUSTER-1 lrmd: info: cancel_recurring_action: Cancelling operation VMA_VM1_monitor_1
>> Dec 04 15:18:23 [3818929] CLUSTER-1 crmd: debug: cancel_op: Op 5564 for VMA_VM1 (VMA_VM1:5564): cancelled
>> Dec 04 15:18:23 [3818929] CLUSTER-1 crmd: debug: do_lrm_rsc_op: Performing key=351:199:0:fb6e486a-023a-4b44-83cf-4c0c208a0f56 op=VMA_VM1_migrate_to_0
>> VirtualDomain(VMA_VM1)[1797698]: 2015/12/04_15:18:23 DEBUG: Virtual domain VM1 is currently running.
>> VirtualDomain(VMA_VM1)[1797698]: 2015/12/04_15:18:23 INFO: VM1: Starting live migration to CLUSTER-2 (using virsh --connect=qemu:///system --quiet migrate --live VM1 qemu+tcp://CLUSTER-2/system ).
>> Dec 04 15:18:24 [3818929] CLUSTER-1 crmd: info: process_lrm_event: LRM operation VMA_VM1_monitor_1 (call=5564, status=1, cib-update=0, confirmed=false) Cancelled
>> Dec 04 15:18:24 [3818929] CLUSTER-1 crmd: debug: update_history_cache: Updating history for 'VMA_VM1' with monitor op
>> VirtualDomain(VMA_VM1)[1797698]: 2015/12/04_15:18:26 INFO: VM1: live migration to CLUSTER-2 succeeded.
>> Dec 04 15:18:26 [4434] CLUSTER-1 lrmd: debug: operation_finished: VMA_VM1_migrate_to_0:1797698 - exited with rc=0
>> Dec 04 15:18:26 [4434] CLUSTER-1 lrmd: notice: operation_finished: VMA_VM1_migrate_to_0:1797698 [ 2015/12/04_15:18:23 INFO: VM1: Starting live migration to CLUSTER-2 (using virsh --connect=qemu:///system --quiet migrate --live VM1 qemu+tcp://CLUSTER-2/system ). ]
>> Dec 04 15:18:26 [4434] CLUSTER-1 lrmd: notice: operation_finished: VMA_VM1_migrate_to_0:1797698 [ 2015/12/04_15:18:26 INFO: VM1: live migration to CLUSTER-2 succeeded. ]
>> Dec 04 15:18:27 [3818929] CLUSTER-1 crmd: debug: create_operation_update: do_update_resource: Updating resouce VMA_VM1 after complete migrate_to op (interval=0)
>> Dec 04 15:18:27 [3818929] CLUSTER-1 crmd: notice: process_lrm_event: LRM operation VMA_VM1_migrate_to_0 (call=5697, rc=0, cib-update=89, confirmed=true) ok
>> Dec 04 15:18:27 [3818929] CLUSTER-1 crmd: debug: update_history_cache: Updating history for 'VMA_VM1' with migrate_to op
>> Dec 04 15:18:31 [3818929] CLUSTER-1 crmd: debug: cancel_op: Operation VMA_VM1:5564 already cancelled
>> Dec 04 15:18:31 [3818929] CLUSTER-1 crmd: debug: do_lrm_rsc_op: Performing key=225:200:0:fb6e486a-023a-4b44-83cf-4c0c208a0f56 op=VMA_VM1_stop_0
>> VirtualDomain(VMA_VM1)[1798719]: 2015/12/04_15:18:31 DEBUG: Virtual domain VM1 is not running: failed to get domain 'vm1' domain not found: no domain with matching name 'vm1'
>
> This looks like the problem. Configuration error?

As far as I checked, this is a harmless bug in the VirtualDomain RA: it
downcases the output of the "virsh domain info" command so that it can
parse the status easily, and the downcased output then no longer matches
the mixed-case domain name. In any case this error doesn't affect the
RA's functionality; here it merely concludes that the resource is
already stopped. My big concern is why the resource is stopped at all.

>> VirtualDomain(VMA_VM1)[1798719]: 2015/12/04_15:18:31 INFO: Domain VM1 already stopped.
>> Dec 04 15:18:31 [4434] CLUSTER-1 lrmd: debug: operation_finished: VMA_VM1_stop_0:1798719 - exited with rc=0
>> Dec 04 15:18:31 [4434] CLUSTER-1 lrmd: notice: operation_finished: VMA_VM1_stop_0:1798719 [ 2015/12/04_15:18:31 INFO: Domain VM1 already stopped. ]
>> Dec 04 15:18:32 [3818929] CLUSTER-1 crmd: debug: create_operation_upd
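Klechomir's diagnosis can be illustrated with a few lines of plain shell. This is a sketch of the suspected behaviour, not the actual VirtualDomain RA code; the variable names and the stand-in virsh output are assumptions made for the illustration:

```shell
#!/bin/sh
# Hypothetical illustration: downcasing virsh output before matching
# breaks a case-sensitive comparison against a mixed-case domain name.
DOMAIN_NAME="VM1"
# Stand-in for the real virsh error output seen in the log:
virsh_output="error: failed to get domain 'VM1'"
lowered=$(printf '%s' "$virsh_output" | tr 'A-Z' 'a-z')
# After downcasing, "VM1" no longer appears in the text:
case "$lowered" in
  *"$DOMAIN_NAME"*) echo "domain name matched" ;;
  *)                echo "no match: '$DOMAIN_NAME' not found in downcased output" ;;
esac
```

As Klechomir notes, in the RA this mismatch is benign: the stop action simply concludes the domain is not running.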
Re: [ClusterLabs] Resources suddenly get target-role="stopped"
Hi,
Sorry, I didn't get your point.

The XML of the VM is on an active-active DRBD device with an OCFS2
filesystem on it, and it is visible from both nodes.
The live migration is always successful.

On 4.12.2015 19:30, emmanuel segura wrote:
> I think the xml of your vm needs to be available on both nodes, but
> you are using a failover resource (Filesystem_CDrive1); pacemaker
> monitors resources on both nodes to check whether they are running on
> multiple nodes.
>
> 2015-12-04 18:06 GMT+01:00 Ken Gaillot:
>> On 12/04/2015 10:22 AM, Klechomir wrote:
>>> Hi list,
>>> My issue is the following:
>>> [...]
[ClusterLabs] Antw: new version of Cronlink RA
Hi!

I wonder: it seems it does the same thing as the RA I wrote some years
ago (output of "crm ra info ISC-cron"):

OCF Resource Agent managing crontabs for ISC cron (ocf:xola:ISC-cron)

This RA manages crontabs for the ISC cron daemon by managing links to
specific crontabs in /etc/cron.d. The "start" method adds a symbolic
link for the specified crontab file to /etc/cron.d, while the "stop"
method removes that link again. The "monitor" method tests whether a
link for the specified crontab exists in /etc/cron.d.

Parameters (*: required, []: default):

crontab* (string): Name of crontab file
    Name of the crontab file. The file cannot be inside /etc/cron.d.

linkname (string): Name of crontab link
    Name of the link to the crontab file inside /etc/cron.d. If absent,
    the name of the "crontab" parameter is used.

Operations' defaults (advisory minimum):
    start    timeout=30s
    stop     timeout=30s
    reload   timeout=30s
    monitor  timeout=30s interval=5m

An example config would look like this:

primitive prm_cron_sample ocf:xola:ISC-cron \
    params crontab="/etc/crontabs/sample" \
    op start interval=0 timeout=30 \
    op stop interval=0 timeout=30 \
    op monitor interval=600 timeout=30

If active, the RA will create a symlink in /etc/cron.d pointing to
/etc/crontabs/sample (which must be owned by root and writable only by
root). We use it combined with colocation for other primitives...

If people are interested, I can publish the current version.

Regards,
Ulrich

>>> Charles Williams wrote on 17.11.2015 at 10:05 in message
<1447751118.5585.3.camel@attitude>:
> hey all,
>
> just finished a new version of the cronlink RA. If interested in testing
> you can get it here:
> https://wiki.itadmins.net/doku.php?id=high_availability:ocf_cronlink
>
> I currently have this in production (since 2012) and have had no issues.
> If you have any problems or ideas just let me know.
>
> Chuck

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
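The start/stop/monitor behaviour Ulrich describes boils down to symlink management, which can be sketched in a few lines of shell. This is a simplified illustration, not the actual RA; the cron directory is replaced by a temp directory here so the sketch can run without touching the real /etc/cron.d:

```shell
#!/bin/sh
# Simplified sketch of the ISC-cron RA's three methods.
# CRON_D stands in for /etc/cron.d; a temp dir is used for safety.
CRON_D=$(mktemp -d)
CRONTAB=$(mktemp)                  # stands in for e.g. /etc/crontabs/sample
LINKNAME=$(basename "$CRONTAB")

start()   { ln -sf "$CRONTAB" "$CRON_D/$LINKNAME"; }
stop()    { rm -f "$CRON_D/$LINKNAME"; }
monitor() { [ -L "$CRON_D/$LINKNAME" ]; }   # "running" iff the link exists

monitor && echo "active" || echo "inactive"   # inactive
start
monitor && echo "active" || echo "inactive"   # active
stop
monitor && echo "active" || echo "inactive"   # inactive
```

Pacemaker then treats "link present" as the started state, which is why colocating this primitive with other resources enables/disables the crontab alongside them.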
[ClusterLabs] Antw: Re: start service after filesystemressource
>>> Ken Gaillot wrote on 20.11.2015 at 16:06 in message
<564f36e2.90...@redhat.com>:
[...]
>> location cli-prefer-collectd collectd inf: host-1
>> location cli-prefer-failover-ip1 failover-ip1 inf: host-1
>> location cli-prefer-failover-ip2 failover-ip2 inf: host-1
>> location cli-prefer-failover-ip3 failover-ip3 inf: host-1
>> location cli-prefer-res_drbd_export res_drbd_export inf: hermes-1
>> location cli-prefer-res_fs res_fs inf: host-1
>
> A word of warning: these "cli-" constraints were added automatically
> when you ran CLI commands to move resources to specific hosts. You have
> to clear these when you're done with whatever the move was for,
> otherwise the resources will only run on those nodes from now on.

Actually, I prefer to add timing restrictions when creating such
constraints, so they become ineffective automatically. Usually it's
sufficient to say "for 5 minutes" (PT5M). Of course this depends on your
location preferences and stickiness settings. For us, resources remain
where they run unless there's a failure...

Regards,
Ulrich
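Ulrich's suggestion of a limited lifetime can be expressed from the crm shell. The resource and node names below are taken from the quoted config; treat this as a sketch of the standard crm syntax rather than a tested recipe for any particular crmsh version:

```shell
# Move failover-ip1 to host-1, letting the implicit cli-prefer-*
# constraint expire automatically after five minutes
# (PT5M is an ISO 8601 duration):
crm resource migrate failover-ip1 host-1 PT5M

# Without a lifetime, the constraint persists until cleared manually:
crm resource unmigrate failover-ip1
```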
[ClusterLabs] Antw: Perl Modules for resource agents (was: Resource Agent language discussion)
Hi!

A few comments.

Build.PL: The part beginning at

---
$ocf_dirs = qx{ . "$lib_ocf_dirs" 2> /dev/null
echo "\$INITDIR"
...
---

is somewhat complicated. Why not do something like

---
$ocf_dirs = qx{ . "$lib_ocf_dirs" 2> /dev/null
echo "INITDIR=\$INITDIR"
...
---

and then parse the output to get both the variable name and the value?

OCF_ReturnCodes.pm: Why not replace

our $OCF_SUCCESS = 0;

with

use constant OCF_SUCCESS => 0;

?

OCF_Functions.pm: What's the use of "/g" in "$ARG[0] =~ /(\d+)/g"?

And maybe a working sample RA would be nice to see...

Regards,
Ulrich

>>> Jehan-Guillaume de Rorthais wrote on 26.11.2015 at 01:13 in message
<20151126011336.1ca93535@firost>:
> On Thu, 20 Aug 2015 18:21:01 +0200,
> Jehan-Guillaume de Rorthais wrote:
>
>> On Thu, 20 Aug 2015 15:05:24 +1000
>> Andrew Beekhof wrote:
> [...]
>
>>>> What I was discussing here was:
>>>>
>>>> * if not using bash, is there any trap we should avoid that is
>>>>   already addressed in the ocf-shellfuncs library?
>>>
>>> No, you just might have to re-implement some things.
>>> Particularly logging.
>>
>> Ok, that was my conclusion so far. I'll have a look at the logging
>> funcs then.
>>
>>>> * is there a chance a perl version of such a library would be
>>>>   accepted upstream?
>>>
>>> Depends if you're volunteering to maintain it too :)
>>
>> I do. I'll have to do it on my own for my RA anyway.
>
> Months are flying! Already 3 of them since my last answer...
>
> I spent some time porting the "ocf-shellfuncs", "ocf-returncodes" and
> "ocf-directories" shell scripts to Perl modules called
> "OCF_Functions.pm", "OCF_ReturnCodes.pm" and "OCF_Directories.pm". They
> are currently hiding in our pgsql-resource-agent repository under the
> "multistate/lib" folder. See:
>
> https://github.com/dalibo/pgsql-resource-agent/
>
> They are used from the "pgsqlms" resource agent available in the
> "multistate/script" folder. They are supposed to live in
> "$OCF_ROOT/lib/heartbeat/". The pgsqlms agent has been tested again and
> again in various failure situations under CentOS 6 and CentOS 7. The
> modules seem to behave correctly.
>
> Before considering pushing them out to a dedicated repository (or
> upstream?), where maintaining them would be easier, I would like to
> hear some feedback about them.
>
> First, OCF_Functions does not implement all the shell functions
> available in ocf-shellfuncs. As a first step, I focused on a simple
> module supporting the popular functions we actually needed for our own
> agent. Let me know if I forgot a function that MUST be in this first
> version.
>
> Second, "OCF_Directories.pm" is actually generated from
> "OCF_Directories.pm.PL", because I cannot rely on the upstream
> autogen/configure to detect the distribution-specific destination
> folders. I wrote a wrapper in "multistate/Build.PL" around the
> "ocf-directories" shell script to export these variables to a temp
> file. Then, when "building" the module, OCF_Directories.pm.PL reads
> this temp file to produce the final distribution-dependent
> "OCF_Directories.pm". I don't like stuffing too much shell into perl
> scripts, but it's really like the autogen/configure process at the end
> of the day, and this piece of code only runs at build time.
>
> Cleaner ways would be to:
>
> * generate OCF_Directories.pm from the upstream ./configure, which
>   already has all the logic
> * re-implement the logic to find the appropriate destination folders
>   in "Build.PL". I am currently not able to follow this solution, as
>   reverse engineering the autogen/configure process seems pretty
>   difficult and time consuming.
>
> The libs are currently auto-installed with our pgsqlms agent following
> the quite standard way to install perl modules and scripts:
>
>   perl Build.PL
>   perl Build
>   perl Build install
>
> Any feedback, advice, patch etc. would be appreciated!
>
> PS: files are in attachment for ease of review.
>
> Regards,
> --
> Jehan-Guillaume de Rorthais
> Dalibo
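Ulrich's Build.PL suggestion at the top of this message — have the sourced script echo name=value pairs and parse both sides — can be sketched on the shell side like this. The file contents and variable are illustrative stand-ins, not the real ocf-directories script:

```shell
#!/bin/sh
# Sketch: emit "NAME=value" lines from a sourced script, then parse
# both the variable name and its value out of each line.

# Stand-in for the real ocf-directories / lib_ocf_dirs script:
lib_ocf_dirs=$(mktemp)
echo 'INITDIR=/etc/init.d' > "$lib_ocf_dirs"

# Source the script in a subshell and echo a name=value pair:
ocf_dirs=$(. "$lib_ocf_dirs" 2>/dev/null; echo "INITDIR=$INITDIR")

# Split each line into name and value:
name=${ocf_dirs%%=*}
value=${ocf_dirs#*=}
echo "name=$name value=$value"    # name=INITDIR value=/etc/init.d

rm -f "$lib_ocf_dirs"
```

The caller (Build.PL, in Jehan-Guillaume's case) then only needs a generic "split on the first `=`" parser instead of hard-coding which variable each output line belongs to.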
Re: [ClusterLabs] Resources suddenly get target-role="stopped"
Next time, please show your full config, unless it contains something
special that you can't share.

2015-12-07 9:08 GMT+01:00 Klechomir:
> Hi,
> Sorry, I didn't get your point.
>
> The XML of the VM is on an active-active DRBD device with an OCFS2
> filesystem on it, and it is visible from both nodes.
> The live migration is always successful.
>
> On 4.12.2015 19:30, emmanuel segura wrote:
>> I think the xml of your vm needs to be available on both nodes, but
>> you are using a failover resource (Filesystem_CDrive1); pacemaker
>> monitors resources on both nodes to check whether they are running on
>> multiple nodes.
>>
>> 2015-12-04 18:06 GMT+01:00 Ken Gaillot:
>>> On 12/04/2015 10:22 AM, Klechomir wrote:
>>>> Hi list,
>>>> My issue is the following:
>>>> [...]
[ClusterLabs] design of a two-node cluster
Hi,

I was asking around here a while ago. Unfortunately I couldn't continue
to work on my cluster, so I'm still thinking about the design. I hope
you will help me again with some recommendations, because once the
cluster is running, changing the design is not possible anymore.

These are my requirements:

- All services run inside virtual machines (KVM), mostly databases and
  static/dynamic webpages.
- I have two nodes and would like to have some VMs running on node A and
  some on node B during normal operation, as a kind of load balancing.
- I'd like to keep the setup simple (if possible).
- Availability is important, performance not so much (webpages with some
  hundred requests per day, databases with some hundred inserts/selects
  per day).
- I'd like to have snapshots of the VMs.
- Live migration of the VMs should be possible.
- Nodes are SLES 11 SP4; VMs are Windows 7 and several Linux
  distributions (Ubuntu, SLES, OpenSuSE).
- The setup should be extensible (adding further VMs).
- I have shared storage (FC SAN).

My ideas/questions:

Should I install all VMs in one partition, or every VM in a separate
partition? The advantage of one VM per partition is that I don't need a
cluster FS, right? I read that one should avoid a cluster FS if
possible, because it adds further complexity. Below the FS I'd like to
have logical volumes, because they are easy to expand. Do I need cLVM
(I think so)?

Is it an advantage to install the VMs in plain partitions, without a FS?
It would reduce the complexity further, because I wouldn't need a FS at
all. Would live migration still be possible?

Snapshots:
I was playing around with virsh (libvirt) to create snapshots of the
VMs. In the end I gave up. virsh explains commands in its help, but when
you want to use them you get messages like "not supported yet", although
I use libvirt 1.2.11. This is ridiculous. I think I will create my
snapshots inside the VMs using LVM. We have a network-based backup
solution (Legato/EMC) which saves the disks every night. With a snapshot
supplied for that, I have a consistent backup. The databases are dumped
with their respective tools.

Thanks in advance.

Bernd

--
Bernd Lentes
Systemadministration
Institute of Developmental Genetics
Gebäude 35.34 - Raum 208
HelmholtzZentrum München
bernd.len...@helmholtz-muenchen.de
phone: +49 (0)89 3187 1241
fax: +49 (0)89 3187 2294

Wer Visionen hat soll zum Hausarzt gehen
Helmut Schmidt

Helmholtz Zentrum Muenchen
Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
Ingolstaedter Landstr. 1, 85764 Neuherberg, www.helmholtz-muenchen.de
Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Dr. Nikolaus Blum, Dr. Alfons Enhsen
Registergericht: Amtsgericht Muenchen HRB 6466
USt-IdNr: DE 129521671
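The in-VM LVM snapshot approach Bernd mentions would look roughly like this. The VG/LV names and sizes are assumptions for illustration; the snapshot size only needs to hold the changes made to the origin during the backup window:

```shell
# Inside the guest: create a snapshot LV just before the nightly backup:
lvcreate --snapshot --name root_snap --size 2G /dev/vg0/root

# Let the backup client read from /dev/vg0/root_snap (or mount it
# read-only) to get a crash-consistent view, then drop the snapshot:
lvremove -f /dev/vg0/root_snap
```

As noted elsewhere in the thread, this captures disk state but not RAM, so databases should still be dumped with their own tools.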
Re: [ClusterLabs] design of a two-node cluster
On 07/12/15 12:35 PM, Lentes, Bernd wrote:
> Hi,
>
> I was asking around here a while ago. Unfortunately I couldn't
> continue to work on my cluster, so I'm still thinking about the design.
> I hope you will help me again with some recommendations, because once
> the cluster is running, changing the design is not possible anymore.
>
> These are my requirements:
>
> - all services are running inside virtual machines (KVM), mostly
>   databases and static/dynamic webpages

This is fine, it's what we do with our 2-node clusters.

> - I have two nodes and would like to have some vm's running on node A
>   and some on node B during normal operation as a kind of loadbalancing

I used to do this, but I've since stopped. The reasons are:

1. You need to know that one node can host all servers and still perform
properly. By always running on one node, you know that this is the case.
Further, if one node ever stops being powerful enough, you will find out
early and can address the issue immediately.

2. If there is a problem, you can always be sure which node to terminate
(ie: the node hosting all servers gets the fence delay, so the node
without servers will always get fenced). If you lose input power, you
can quickly power down the backup node to shed load, etc.

> - I'd like to keep the setup simple (if possible)

There is a minimum complexity in HA, but you can get as close as
possible. We've spent years trying to simplify our VM hosting clusters
as much as possible.

> - availability is important, performance not so much (webpages some
>   hundred requests per day, databases some hundred inserts/selects
>   per day)

All the more reason to consolidate all VMs on one host.

> - I'd like to have snapshots of the vm's

This is never a good idea, as you catch the state of the disk at the
point of the snapshot, but not RAM. Anything in buffers will be missed,
so you can not rely on the snapshot images to always be consistent or
even functional.

> - live migration of the vm's should be possible

Easy enough.

> - nodes are SLES 11 SP4, vm's are Windows 7 and several linux
>   distributions (Ubuntu, SLES, OpenSuSE)

The OS installed on the guest VMs should not factor. As for the node OS,
SUSE invests in making sure that HA works well, so you should be fine.

> - setup should be extensible (add further vm's)

That is entirely a question of available hardware resources.

> - I have a shared storage (FC SAN)

Personally, I prefer DRBD (truly replicated storage), but SAN is fine.

> My ideas/questions:
>
> Should I install all vm's in one partition or every vm in a separate
> partition? The advantage of one vm per partition is that I don't need
> a cluster fs, right?

I would put each VM on a dedicated LV and not have an FS between the VM
and the host. The question then becomes: what is the PV? I use clustered
LVM to make sure all nodes are in sync, LVM-wise.

> I read to avoid a cluster fs if possible because it adds further
> complexity. Below the fs I'd like to have logical volumes because
> they are easy to expand.

Avoiding a clustered FS is always preferable, yes. I use a small gfs2
partition, but this is just for storing VM XML data, install media, etc.
Things that change rarely. Some advocate for having independent FSes on
each node and keeping the data in sync using things like rsync or what
have you.

> Do I need cLVM (I think so)? Is it an advantage to install the vm's
> in plain partitions, without a fs?

I advise it, yes.

> It would reduce the complexity further because I don't need a fs.
> Would live migration still be possible?

Live migration is possible provided both nodes can see the same physical
storage at the same time. For example, DRBD dual-primary works. If you
use clustered LVM, you can be sure that the backing LVs are the same
across the nodes.

> snapshots:
> I was playing around with virsh (libvirt) to create snapshots of the
> vm's. In the end I gave up. virsh explains commands in its help, but
> when you want to use them you get messages like "not supported yet",
> although I use libvirt 1.2.11. This is ridiculous. I think I will
> create my snapshots inside the vm's using lvm.
> We have a network based backup solution (Legato/EMC) which saves the
> disks every night. Supplying a snapshot for that I have a consistent
> backup. The databases are dumped with their respective tools.
>
> Thanks in advance.

I don't recommend snapshots, as I mentioned. Focus on your backup
application, and create DR VMs if you want to minimize the time to
recovery after a total VM loss, is what I recommend.

> Bernd

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
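The layout Digimer describes — one dedicated LV per VM, used directly as the guest's disk with no filesystem in between, on a clustered VG so LVM metadata stays in sync across nodes — can be sketched like this. The device, VG, and LV names are illustrative, not from the thread, and the clustered VG requires clvmd/DLM to be running:

```shell
# Shared SAN LUN visible to both nodes:
pvcreate /dev/sdb

# -cy marks the VG as clustered, so LVM metadata changes are
# coordinated across nodes via clvmd:
vgcreate -cy vm_vg /dev/sdb

# One raw LV per VM, handed to the guest as its disk:
lvcreate -L 20G -n vm1_disk vm_vg
```

In the libvirt domain XML, the guest's disk then points at the raw LV, e.g. `<source dev='/dev/vm_vg/vm1_disk'/>` in a `<disk type='block'>` element; during live migration only the destination's qemu writes to the LV once the source hands over, which is why this works without a cluster FS.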
Re: [ClusterLabs] design of a two-node cluster
Digimer wrote: > > On 07/12/15 12:35 PM, Lentes, Bernd wrote: > > Hi, > > > > i've been asking all around here a while ago. Unfortunately I couldn't > > continue to work on my cluster, so I'm still thinking about the design. > > I hope you will help me again with some recommendations, because > when > > the cluster is running changing of the design is not possible anymore. > > > > These are my requirements: > > > > - all services are running inside virtual machines (KVM), mostly > > databases and static/dynamic webpages > > This is fine, it's what we do with our 2-node clusters. > > > - I have two nodes and would like to have some vm's running on node > A > > and some on node B during normal operation as a kind of loadbalancing > > I used to do this, but I've since stopped. The reasons are: > > 1. You need to know that one node can host all servers and still perform > properly. By always running on one node, you know that this is the case. > Further, if one node ever stops being powerful enough, you will find out > early and can address the issue immediately. > > 2. If there is a problem, you can always be sure which node to terminate > (ie: the node hosting all servers gets the fence delay, so the node > without servers will always get fenced). If you lose input power, you can > quickly power down the backup node to shed load, etc. Hi Digimer, thanks for your reply. I don't understand what you want to say in (2). > > > - I'd like to keep the setup simple (if possible) > > There is a minimum complexity in HA, but you can get as close as > possible. We've spent years trying to simplify our VM hosting clusters as > much as possible. > > > - availability is important, performance not so much (webpages some > > hundred requests per day, databases some hundred inserts/selects > per > > day) > > All the more reason to consolidate all VMs on one host. 
> > > - I'd like to have snapshots of the vm's > > This is never a good idea, as you catch the state of the disk at the point of > the snapshot, but not RAM. Anything in buffers will be missed so you can > not rely on the snapshot images to always be consistent or even > functional. > > > - live migration of the vm's should be possible > > Easy enough. > > > - nodes are SLES 11 SP4, vm's are Windows 7 and severable linux > > distributions (Ubuntu, SLES, OpenSuSE) > > The OS installed on the guest VMs should not factor. As for the node OS, > SUSE invests in making sure that HA works well so you should be fine. > > > - setup should be extensible (add further vm's) > > That is entirely a question of available hardware resources. > > > - I have a shared storage (FC SAN) > > Personally, I prefer DRBD (truly replicated storage), but SAN is fine. > > > My ideas/questions: > > > > Should I install all vm's in one partition or every vm in a seperate > > partition ? The advantage of one vm per partition is that I don't need > > a cluster fs, right ? > > I would put each VM on a dedicated LV and not have an FS between the > VM and the host. The question then becomes; What is the PV? I use > clustered LVM to make sure all nodes are in sync, LVM-wise. Is this the setup you are running (without fs) ? > > > I read to avoid a cluster fs if possible because it adds further > > complexity. Below the fs I'd like to have logical volumes because they > > are easy to expand. > > Avoiding clustered FS is always preferable, yes. I use a small gfs2 > partition, but this is just for storing VM XML data, install media, etc. > Things that change rarely. Some advocate for having independent FSes > on each node and keeping the data in sync using things like rsync or what > have you. > > > Do I need cLVM (I think so) ? Is it an advantage to install the vm's > > in plain partitions, without a fs ? > > I advise it, yes. > > > It would reduce the complexity further because I don't need a fs. 
> > Would live migration still be possible? > > Live migration is possible provided both nodes can see the same physical > storage at the same time. For example, DRBD dual-primary works. If you > use clustered LVM, you can be sure that the backing LVs are the same > across the nodes. And this works without a cluster fs? But when both nodes access the LV concurrently (during the migration), will the data not be destroyed? cLVM does not control concurrent access; it just takes care of propagating the LVM metadata to all nodes and locking during changes of the metadata. > > > snapshots: > > I was playing around with virsh (libvirt) to create snapshots of the vm's. > > In the end I gave up. virsh explains commands in its help, but when > > you want to use them you get messages like "not supported yet", > > although I use libvirt 1.2.11. This is ridiculous. I think I will > > create my snapshots inside the vm's using lvm. > > We have a network based backup solution (Legato/EMC) which saves the > > disks every night. > > Supplying a snapshot for that I have a consistent backup.
Re: [ClusterLabs] Resources suddenly get target-role="stopped"
On 12/07/2015 02:03 AM, Klechomir wrote: > Hi Ken, > The comments are in the text. > > On 4.12.2015 19:06, Ken Gaillot wrote: >> On 12/04/2015 10:22 AM, Klechomir wrote: >>> Hi list, >>> My issue is the following: >>> >>> I have a very stable cluster, using Corosync 2.1.0.26 and Pacemaker 1.1.8 >>> (observed the same problem with Corosync 2.3.5 & Pacemaker 1.1.13-rc3) >>> >>> Bumped into this issue when I started playing with VirtualDomain resources, >>> but this seems to be unrelated to the RA. >>> >>> The problem is that without apparent reason a resource gets >>> target-role="Stopped". This happens after (successful) migration, >>> after failover, or after VM restart. >>> >>> My tests showed that changing the resource name fixes this problem, but >>> this seems to be a temporary workaround. >>> >>> The resource configuration is: >>> primitive VMA_VM1 ocf:heartbeat:VirtualDomain \ >>> params config="/NFSvolumes/CDrive1/VM1/VM1.xml" >>> hypervisor="qemu:///system" migration_transport="tcp" \ >>> meta allow-migrate="true" target-role="Started" \ >>> op start interval="0" timeout="120s" \ >>> op stop interval="0" timeout="120s" \ >>> op monitor interval="10" timeout="30" depth="0" \ >>> utilization cpu="1" hv_memory="925" >>> order VM_VM1_after_Filesystem_CDrive1 inf: Filesystem_CDrive1 VMA_VM1 >>> >>> Here is the log from one such stop, after a successful migration with "crm >>> migrate resource VMA_VM1": >>> >>> Dec 04 15:18:22 [3818929] CLUSTER-1 crmd:debug: cancel_op: >>> Cancelling op 5564 for VMA_VM1 (VMA_VM1:5564) >>> Dec 04 15:18:22 [4434] CLUSTER-1 lrmd: info: >>> cancel_recurring_action: Cancelling operation >>> VMA_VM1_monitor_1 >>> Dec 04 15:18:23 [3818929] CLUSTER-1 crmd:debug: cancel_op: >>> Op 5564 for VMA_VM1 (VMA_VM1:5564): cancelled >>> Dec 04 15:18:23 [3818929] CLUSTER-1 crmd:debug: >>> do_lrm_rsc_op:Performing >>> key=351:199:0:fb6e486a-023a-4b44-83cf-4c0c208a0f56 >>> op=VMA_VM1_migrate_to_0 >>> VirtualDomain(VMA_VM1)[1797698]:2015/12/04_15:18:23 
DEBUG: >>> Virtual domain VM1 is currently running. >>> VirtualDomain(VMA_VM1)[1797698]:2015/12/04_15:18:23 INFO: VM1: >>> Starting live migration to CLUSTER-2 (using virsh >>> --connect=qemu:///system --quiet migrate --live VM1 >>> qemu+tcp://CLUSTER-2/system ). >>> Dec 04 15:18:24 [3818929] CLUSTER-1 crmd: info: >>> process_lrm_event:LRM operation VMA_VM1_monitor_1 (call=5564, >>> status=1, cib-update=0, confirmed=false) Cancelled >>> Dec 04 15:18:24 [3818929] CLUSTER-1 crmd:debug: >>> update_history_cache: Updating history for 'VMA_VM1' with >>> monitor op >>> VirtualDomain(VMA_VM1)[1797698]:2015/12/04_15:18:26 INFO: VM1: >>> live migration to CLUSTER-2 succeeded. >>> Dec 04 15:18:26 [4434] CLUSTER-1 lrmd:debug: >>> operation_finished: VMA_VM1_migrate_to_0:1797698 - exited with rc=0 >>> Dec 04 15:18:26 [4434] CLUSTER-1 lrmd: notice: >>> operation_finished: VMA_VM1_migrate_to_0:1797698 [ >>> 2015/12/04_15:18:23 INFO: VM1: Starting live migration to CLUSTER-2 >>> (using virsh --connect=qemu:///system --quiet migrate --live VM1 >>> qemu+tcp://CLUSTER-2/system ). ] >>> Dec 04 15:18:26 [4434] CLUSTER-1 lrmd: notice: >>> operation_finished: VMA_VM1_migrate_to_0:1797698 [ >>> 2015/12/04_15:18:26 INFO: VM1: live migration to CLUSTER-2 succeeded. 
] >>> Dec 04 15:18:27 [3818929] CLUSTER-1 crmd:debug: >>> create_operation_update: do_update_resource: Updating resouce >>> VMA_VM1 after complete migrate_to op (interval=0) >>> Dec 04 15:18:27 [3818929] CLUSTER-1 crmd: notice: >>> process_lrm_event:LRM operation VMA_VM1_migrate_to_0 (call=5697, >>> rc=0, cib-update=89, confirmed=true) ok >>> Dec 04 15:18:27 [3818929] CLUSTER-1 crmd:debug: >>> update_history_cache: Updating history for 'VMA_VM1' with >>> migrate_to op >>> Dec 04 15:18:31 [3818929] CLUSTER-1 crmd:debug: cancel_op: >>> Operation VMA_VM1:5564 already cancelled >>> Dec 04 15:18:31 [3818929] CLUSTER-1 crmd:debug: >>> do_lrm_rsc_op:Performing >>> key=225:200:0:fb6e486a-023a-4b44-83cf-4c0c208a0f56 op=VMA_VM1_stop_0 >>> VirtualDomain(VMA_VM1)[1798719]:2015/12/04_15:18:31 DEBUG: >>> Virtual domain VM1 is not running: failed to get domain 'vm1' domain >>> not found: no domain with matching name 'vm1' >> This looks like the problem. Configuration error? > > As far as I checked, this is a harmless bug in the VirtualDomain RA. It > downcases the output of the "virsh domain info" command, to be able to > parse the status easily, which prevents matching the domain name. > In any case this error doesn't affect the RA functionality; in this case > it just finds out that the resource is already stopped, while my big > concern is why it is stopped. I'm not sure whether it's the cause of your main issue, but it's not harmless.
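To illustrate the downcasing behaviour Klechomir describes, here is a minimal sketch (my own code, not the actual VirtualDomain RA): lowercasing the whole virsh output makes state matching trivial, but the domain name in the same output is lowercased along with it, so it can no longer be matched against 'VM1'.

```python
# Hypothetical sketch of the parsing behaviour described above -- not the
# actual VirtualDomain RA code.

def parse_dominfo(virsh_output: str) -> dict:
    """Lowercase 'virsh dominfo'-style output, then parse 'key: value' lines."""
    info = {}
    for line in virsh_output.lower().splitlines():
        key, _, value = line.partition(":")
        if value:
            info[key.strip()] = value.strip()
    return info

info = parse_dominfo("Name:  VM1\nState: running\n")
print(info["state"])           # state matching is easy: 'running'
print(info["name"] == "VM1")   # but the name is now 'vm1', so this is False
```

This matches the log above: the RA reports the domain as 'vm1' and then fails to find "a domain with matching name 'vm1'".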
[ClusterLabs] Multisite Clusters: how many servers?
Hi, Is it possible / advisable to set up a multisite cluster with booth with one server at each site? So having three servers all together? Kind regards, Michael Schwartzkopff -- sys4 AG http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044 Franziskanerstraße 15, 81669 München Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263 Vorstand: Patrick Ben Koetter, Marc Schiffbauer Aufsichtsratsvorsitzender: Florian Kirstein ___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Multisite Clusters: how many servers?
On 07/12/15 03:57 PM, Michael Schwartzkopff wrote: > Hi, > > is it possible / advisable to set up a multisite cluster with booth with one > server at each site? > > So having three servers all together? > > Kind regards, > > Michael Schwartzkopff As I understand it, no. You need a cluster on each site to be able to trust that a lost site behaves predictably. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education?
Re: [ClusterLabs] Multisite Clusters: how many servers?
Hi, My 2 cents: you need to be able to form a quorum to keep writes going in case of a split-brain scenario, if that is important to you and you are prepared to pay for it; if it's data protection only, then failing safe to read-only is all you need. So with 3 machines the quorum is 2; with 15 machines the quorum is 8. I am actually building one at the moment with Gluster as the backend, so the main site will have two nodes and the DR site one. regards Steven From: Digimer Sent: Tuesday, 8 December 2015 10:00 a.m. To: m...@sys4.de; Cluster Labs - All topics related to open-source clustering welcomed Subject: Re: [ClusterLabs] Multisite Clusters: how many servers? On 07/12/15 03:57 PM, Michael Schwartzkopff wrote: > Hi, > > is it possible / advisable to set up a multisite cluster with booth with one > server at each site? > > So having three servers all together? > > Kind regards, > > Michael Schwartzkopff As I understand it, no. You need a cluster on each site to be able to trust that a lost site behaves predictably. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education?
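The quorum figures Steven gives follow the usual strict-majority rule: for n voting members, quorum is floor(n/2) + 1. A quick sketch (the function name is mine, not from any cluster tool):

```python
def quorum(members: int) -> int:
    """Smallest number of votes forming a strict majority of `members`."""
    return members // 2 + 1

print(quorum(3))   # 2, as in the message above
print(quorum(15))  # 8
```

Note that for a 2-node cluster this formula gives 2, which is why two-node setups need special handling (fencing plus a tie-breaker or two_node mode) rather than plain quorum.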
Re: [ClusterLabs] design of a two-node cluster
On 07/12/15 03:27 PM, Lentes, Bernd wrote: > Digimer wrote: >> >> On 07/12/15 12:35 PM, Lentes, Bernd wrote: >>> Hi, >>> >>> I've been asking all around here a while ago. Unfortunately I couldn't >>> continue to work on my cluster, so I'm still thinking about the design. >>> I hope you will help me again with some recommendations, because when >>> the cluster is running, changing the design is not possible anymore. >>> >>> These are my requirements: >>> >>> - all services are running inside virtual machines (KVM), mostly >>> databases and static/dynamic webpages >> >> This is fine, it's what we do with our 2-node clusters. >> >>> - I have two nodes and would like to have some vm's running on node A >>> and some on node B during normal operation as a kind of load balancing >> >> I used to do this, but I've since stopped. The reasons are: >> >> 1. You need to know that one node can host all servers and still perform >> properly. By always running on one node, you know that this is the case. >> Further, if one node ever stops being powerful enough, you will find out >> early and can address the issue immediately. >> >> 2. If there is a problem, you can always be sure which node to terminate >> (ie: the node hosting all servers gets the fence delay, so the node >> without servers will always get fenced). If you lose input power, you can >> quickly power down the backup node to shed load, etc. > > Hi Digimer, > thanks for your reply. > I don't understand what you want to say in (2). To prevent a dual fence, where both nodes fence each other when communication between the nodes fails but the nodes are otherwise healthy, you need to set a fence delay against one of the nodes. So when this happens, if the delay is on node 1, this will happen: Node 1 looks up how to fence node 2, sees no delay and fences immediately. Node 2 looks up how to fence node 1, sees a delay and pauses. 
Node 2 will be dead long before the delay expires, ensuring that node 2 always loses in such a case. If you have VMs on both nodes, then no matter which node the delay is on, some servers will be interrupted. This is just one example. The other, as I mentioned, would be a lost power condition. Your UPSes can hold up both nodes for a period of time. If you can shut down one node, you can extend how long the UPSes can run. So if the power goes out for a period of time, you can immediately power down one node (the one hosting no servers) without first live-migrating VMs, which will make things simpler and save time. Another similar example would be a loss of cooling, where you would want to shut down nodes to minimize how much heat is being created. There are other examples, but I think this clarifies what I meant. >>> - I'd like to keep the setup simple (if possible) >> >> There is a minimum complexity in HA, but you can get as close as >> possible. We've spent years trying to simplify our VM hosting clusters as >> much as possible. >> >>> - availability is important, performance not so much (webpages some >>> hundred requests per day, databases some hundred inserts/selects per >>> day) >> >> All the more reason to consolidate all VMs on one host. >> >>> - I'd like to have snapshots of the vm's >> >> This is never a good idea, as you catch the state of the disk at the point of >> the snapshot, but not RAM. Anything in buffers will be missed so you cannot >> rely on the snapshot images to always be consistent or even >> functional. >> >>> - live migration of the vm's should be possible >> >> Easy enough. >> >>> - nodes are SLES 11 SP4, vm's are Windows 7 and several Linux >>> distributions (Ubuntu, SLES, OpenSuSE) >> >> The OS installed on the guest VMs should not factor. As for the node OS, >> SUSE invests in making sure that HA works well so you should be fine. 
>> >>> - setup should be extensible (add further vm's) >> >> That is entirely a question of available hardware resources. >> >>> - I have a shared storage (FC SAN) >> >> Personally, I prefer DRBD (truly replicated storage), but SAN is fine. >> >>> My ideas/questions: >>> >>> Should I install all vm's in one partition or every vm in a separate >>> partition? The advantage of one vm per partition is that I don't need >>> a cluster fs, right? >> >> I would put each VM on a dedicated LV and not have an FS between the >> VM and the host. The question then becomes; What is the PV? I use >> clustered LVM to make sure all nodes are in sync, LVM-wise. > > Is this the setup you are running (without fs)? Yes, we use DRBD to replicate the storage and use the /dev/drbdX device as the clustered LVM PV. We have one VG for the space (could add a new DRBD resource later if needed...) and then create a dedicated LV per VM. We have, as I mentioned, one small LV formatted with gfs2 where we store the VM's XML files (so that any change made to a VM is immediately available to all nodes). >>> I read to avoid a cluster fs if possible because it adds further >>> complexity. Below the fs I'd like to have logical volumes because they are easy to expand.
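Digimer's storage stack (dual-primary DRBD as the clustered-LVM PV, one raw LV per VM, plus a small gfs2 LV for the XML files) might be assembled roughly along these lines. All device names, sizes, and the cluster name here are hypothetical, and exact commands depend on the distribution; this is a sketch, not a tested recipe:

```shell
# Hypothetical sketch of the layout described above.
# /dev/drbd0 is a dual-primary DRBD resource visible on both nodes.
pvcreate /dev/drbd0
vgcreate --clustered y vm_vg /dev/drbd0   # clustered VG; requires clvmd + DLM

# One raw LV per VM, with no filesystem between the VM and the host:
lvcreate -L 50G -n vm1_disk vm_vg

# A small shared gfs2 LV for VM XML definitions, install media, etc.:
lvcreate -L 2G -n shared vm_vg
mkfs.gfs2 -p lock_dlm -t mycluster:shared -j 2 /dev/vm_vg/shared
```

Each VM's libvirt definition would then point at its LV (e.g. /dev/vm_vg/vm1_disk) directly as the disk device.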
Re: [ClusterLabs] Multisite Clusters: how many servers?
On 12/07/2015 09:57 PM, Michael Schwartzkopff wrote: > Hi, > > is it possible / advisable to set up a multisite cluster with booth with one > server at each site? > > So having three servers all together? Yes. No. Might be. The concept of a geo cluster is a cluster of clusters, where the local cluster handles local issues using local timing (seconds), while the geo cluster handles geo issues with much wider timing constraints (minutes, if not hour(s)). A geo setup needs a minimum of 2 sites + an arbitrator site. The arbitrator can be a single node (even a small VM) with the cluster stack running, if only used as a plain arbitrator. In theory you could run without an arbitrator, if the admins are the arbitrators. The other sites need a cluster stack running; of course this will work with a single-node cluster, too. I wonder what setup needs geo redundancy but no local clusters, or what setup needs booth but isn't geo. greetings Kai Dupke Senior Product Manager Server Product Line -- Sell not virtue to purchase wealth, nor liberty to purchase power. Phone: +49-(0)5102-9310828 Mail: kdu...@suse.com Mobile: +49-(0)173-5876766 WWW: www.suse.com SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany) GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)
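For illustration, the minimal 2-sites-plus-arbitrator layout Kai describes corresponds to a booth configuration along these lines (the addresses and ticket name are placeholders, not taken from this thread):

```
# /etc/booth/booth.conf -- hypothetical example
transport = UDP
port = 9929
site = 192.168.1.10        # cluster at site A
site = 192.168.2.10        # cluster at site B
arbitrator = 192.168.3.10  # small arbitrator node, e.g. a VM
ticket = "service-ticket"
```

Each site's cluster then runs a booth daemon, the arbitrator runs one as well, and resources at a site are constrained to run only while that site holds the ticket.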
Re: [ClusterLabs] Multisite Clusters: how many servers?
On 12/07/2015 10:30 PM, Steven Jones wrote: > You need to be able to form a quorum to keep writes going in case of a split > brain scenario, if that is important to you and you can/prepared to pay for > it, if its data protection only then a fail safe to read only is all you need. > > So 3 machines a quorum is 2, 15 machines the quorum is 8. Booth is based on the sites, not the nodes. It doesn't matter how many nodes are at each site as long as the site works on its own. When a site goes down, booth uses an arbitrator (site) to determine that the failing site is down. For most environments such a switch will not be done automatically but with some organizational interaction. You don't want to switch your data center between EU and US if not really needed, without an SEC filing, or whatever some C* has in mind. Of course this is not a technical limitation but an organizational one. greetings kai
[ClusterLabs] Antw: Re: design of a two-node cluster
>>> Digimer wrote on 07.12.2015 at 22:40 in message <5665fcdc.1030...@alteeve.ca>: [...] > Node 1 looks up how to fence node 2, sees no delay and fences > immediately. Node 2 looks up how to fence node 1, sees a delay and > pauses. Node 2 will be dead long before the delay expires, ensuring that > node 2 always loses in such a case. If you have VMs on both nodes, then > no matter which node the delay is on, some servers will be interrupted. AFAIK, the cluster will try to migrate resources if a fencing is pending, but not yet complete. Is that true? [...] Regards, Ulrich
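For reference, the fence-delay scheme quoted above might be written in crm syntax roughly as follows; the fence agent, addresses, credentials, and delay value are placeholders. The `delay` on node 1's fence device means fencing *node 1* is postponed, so node 1 survives a fence race:

```
primitive fence_node1 stonith:fence_ipmilan \
    params ipaddr="10.0.0.1" login="admin" passwd="secret" \
        pcmk_host_list="node1" delay="15" \
    op monitor interval="60s"
primitive fence_node2 stonith:fence_ipmilan \
    params ipaddr="10.0.0.2" login="admin" passwd="secret" \
        pcmk_host_list="node2" \
    op monitor interval="60s"
```

With this layout, if both nodes try to fence each other simultaneously, node 1 fences node 2 immediately while node 2 must wait out the 15-second delay, by which time it is already dead.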