Re: [Linux-HA] Beginner questions
Juha Heinanen wrote:
> Juha Heinanen writes:
>
> > the real problem is that start of mysql server by pacemaker stops
> > altogether after a few manual stops (/etc/init.d/mysql stop).
>
> i think i figured this out. when pacemaker needed to start my
> mysql-server resource three times on node lenny1, it migrated the group
> to node lenny2. when i then repeated stopping of mysql-server on lenny2,
> it migrated the group back to lenny1, but didn't start mysql-server,
> because it remembered that it had already started it there 3 times.
>
> if so, my conclusion is to forget the migration-threshold parameter.

That sounds about right. You can configure a failure-timeout: that's an amount of time after which the cluster forgets about failures. Read up on failure-timeout, and don't miss the section "how to ensure time based rules take effect" in the PDF documentation.

Regards
Dominik

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
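Dominik's failure-timeout suggestion can be sketched with the same tools used later in this thread. This is an illustrative sketch only: the resource name mysql-server-group comes from elsewhere in the thread, and the timeout values are made-up placeholders, not recommendations.

```shell
# Sketch only: resource name and values are placeholders.
# Forget failures on mysql-server-group after 600 seconds,
# using the same crm_resource idiom shown later in the thread:
crm_resource --meta --resource mysql-server-group \
    --set-parameter failure-timeout --property-value 600

# Time-based settings such as failure-timeout are only re-evaluated
# when the policy engine runs; making it run periodically is what the
# "how to ensure time based rules take effect" section is about:
crm configure property cluster-recheck-interval="5min"
```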
Re: [Linux-HA] Heartbeat degrades drbd resource
I don't know what your CIB XML config looks like, but if you're forcing collocation within a group, you may want to try using the drbddisk RA provided by heartbeat instead of the drbd CRM OCF script.

Heiko Schellhorn wrote:

Hi

I installed drbd (8.0.14) together with heartbeat (2.0.8) on a Gentoo-system.

I have the following problem:
Standalone, the drbd resource works perfectly. I can mount/unmount it alternately on both nodes. Reading/writing works and /proc/drbd looks fine.

But when I start heartbeat, it degrades the resource step by step until it's marked as unconfigured. An excerpt of the logfile is attached.
Heartbeat itself starts up and runs. Two of the three resources configured up to now are also working. Only drbd shows problems. (See the file crm_mon-out)

I don't think it's a problem of communication between the nodes, because drbd is working standalone and e.g. the IPaddr2 resource is also working within heartbeat.
I also tried several heartbeat configurations. First I defined the resources as single resources, and then I combined the resources into a resource group. There was no difference.

Has someone seen such an issue before? Any ideas?
I didn't find anything helpful in the list archive.

If you need more information I can provide a complete log and the config.

Thanks

Heiko

Last updated: Mon Mar 23 13:11:49 2009
Current DC: mainsrv2 (d7bd5c11-babc-4b69-97d6-3d20d01d8d66)
2 Nodes configured.
1 Resources configured.
Node: mainsrv2 (d7bd5c11-babc-4b69-97d6-3d20d01d8d66): online
Node: mainsrv1 (6a5eacba-7389-4305-9074-de6116504c49): online
Resource Group: heartbeat_group_1
    resource_IP (heartbeat::ocf:IPaddr2): Started mainsrv1
    resource_drbd (heartbeat::ocf:drbd): Started mainsrv1
    fs_drbd (heartbeat::ocf:Filesystem): Stopped

drbd[30750][30763]: 2009/03/23_12:42:19 DEBUG: dr0: Calling /sbin/drbdadm -c /etc/drbd.conf state dr0
drbd[30750][30772]: 2009/03/23_12:42:19 DEBUG: dr0: Exit code 0
drbd[30750][30778]: 2009/03/23_12:42:19 DEBUG: dr0: Command output: Secondary/Secondary
drbd[30750][30794]: 2009/03/23_12:42:19 DEBUG: dr0: Calling /sbin/drbdadm -c /etc/drbd.conf cstate dr0
drbd[30750][30798]: 2009/03/23_12:42:19 DEBUG: dr0: Exit code 0
drbd[30750][30799]: 2009/03/23_12:42:19 DEBUG: dr0: Command output: Connected
drbd[30750][30800]: 2009/03/23_12:42:19 DEBUG: dr0 status: Secondary/Secondary Secondary Secondary Connected
drbd[30808][30815]: 2009/03/23_12:42:20 DEBUG: dr0: Calling /sbin/drbdadm -c /etc/drbd.conf state dr0
drbd[30808][30819]: 2009/03/23_12:42:20 DEBUG: dr0: Exit code 0
drbd[30808][30820]: 2009/03/23_12:42:20 DEBUG: dr0: Command output: Secondary/Unknown
drbd[30808][30830]: 2009/03/23_12:42:20 DEBUG: dr0: Calling /sbin/drbdadm -c /etc/drbd.conf cstate dr0
drbd[30808][30836]: 2009/03/23_12:42:20 DEBUG: dr0: Exit code 0
drbd[30808][30837]: 2009/03/23_12:42:20 DEBUG: dr0: Command output: WFConnection
drbd[30808][30839]: 2009/03/23_12:42:20 DEBUG: dr0 status: Secondary/Unknown Secondary Unknown WFConnection
drbd[30808][30841]: 2009/03/23_12:42:20 DEBUG: dr0: Calling /sbin/drbdadm -c /etc/drbd.conf down dr0
drbd[30808][30873]: 2009/03/23_12:42:21 DEBUG: dr0: Exit code 0
drbd[30808][30874]: 2009/03/23_12:42:21 DEBUG: dr0: Command output:
drbd[30808][30875]: 2009/03/23_12:42:21 DEBUG: dr0 stop: drbdadm down succeeded.
drbd[30876][30883]: 2009/03/23_12:42:21 DEBUG: dr0: Calling /sbin/drbdadm -c /etc/drbd.conf state dr0
drbd[30876][30888]: 2009/03/23_12:42:21 DEBUG: dr0: Exit code 0
drbd[30876][30889]: 2009/03/23_12:42:21 DEBUG: dr0: Command output: Unconfigured
drbd[30876][30897]: 2009/03/23_12:42:21 DEBUG: dr0: Calling /sbin/drbdadm -c /etc/drbd.conf cstate dr0
drbd[30876][30901]: 2009/03/23_12:42:21 DEBUG: dr0: Exit code 0
drbd[30876][30902]: 2009/03/23_12:42:21 DEBUG: dr0: Command output: Unconfigured
drbd[30876][30903]: 2009/03/23_12:42:21 DEBUG: dr0 status: Unconfigured Unconfigured Unconfigured Unconfigured
drbd[30876][30904]: 2009/03/23_12:42:21 DEBUG: dr0 start: already configured.
Re: [Linux-HA] STONITH: internal vs. external
Hi,

On Mon, Mar 23, 2009 at 09:41:38AM -0700, Ethan Bannister wrote:
> This may turn out to be a silly question, but what is the true difference
> between an external STONITH plugin and an internal STONITH plugin.

"internal" (not a good name) plugins are locked in memory on start in order to have them function even in situations when memory is tight. So, they are a bit better than external plugins. But nowadays, with enough memory, there's not much difference between the two.

> I have a SAN set up for fail-over, and everything looks like it is working
> as it should. However, I would like to use STONITH to prevent split-brain.
> I do not have a STONITH device, so I would need to use something like
> meatware (which does not allow automatic fail-over) or ssh. But there are
> two types, ssh and external/ssh. What is the difference? I will try to do
> some research in the meantime, but if I get an answer before I find out
> myself, that would be greatly appreciated. Also, if I were to use ssh as a
> STONITH plugin, will my machine automatically migrate resources to the
> other machine?

You should never use ssh for production clusters. It is not reliable. It is good for testing only. If you have a SAN, you can try external/sbd.

Thanks,
Dejan

> Thanks for any help you can provide :)
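For the SAN case Dejan mentions, external/sbd needs a small shared partition visible to both nodes. A rough sketch of the setup, assuming a hypothetical shared device /dev/sdc1 (the device path is a placeholder, not from the thread):

```shell
# Sketch only: /dev/sdc1 stands in for a small shared LUN both nodes can see.
# Initialize the sbd metadata on the shared partition (run once, from one node):
sbd -d /dev/sdc1 create

# Then define the stonith resource, here in crm shell syntax:
crm configure primitive stonith-sbd stonith:external/sbd \
    params sbd_device="/dev/sdc1"
```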
Re: [Linux-HA] suicide in no-quorum-policy
Hi,

On Fri, Mar 20, 2009 at 05:09:00PM +0100, Michael Schwartzkopff wrote:
> Hi,
>
> in the metadata of the pengine I found the option no-quorum-policy, which
> can be set to "suicide".
>
> What exactly does the node do when this option is set to suicide?

It should commit suicide, i.e. reboot. But I never tried that.

> Has STONITH to be configured to make this option work?

No.

> As far as I remember, there once was the discussion that no node can commit
> suicide via STONITH. Is this still valid?

Yes, with the exception of the suicide plugin. But I'd recommend using a "real" device for stonith.

> Does this option make sense if STONITH is used? Or is some other mechanism
> used?

Normally, stonith should take care of that.

Thanks,
Dejan

> Thanks for the enlightening answers.
>
> --
> Dr. Michael Schwartzkopff
> MultiNET Services GmbH
> Addresse: Bretonischer Ring 7; 85630 Grasbrunn; Germany
> Tel: +49 - 89 - 45 69 11 0
> Fax: +49 - 89 - 45 69 11 21
> mob: +49 - 174 - 343 28 75
>
> mail: mi...@multinet.de
> web: www.multinet.de
>
> Sitz der Gesellschaft: 85630 Grasbrunn
> Registergericht: Amtsgericht München HRB 114375
> Geschäftsführer: Günter Jurgeneit, Hubert Martens
>
> PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B
> Skype: misch42
[Linux-HA] STONITH: internal vs. external
This may turn out to be a silly question, but what is the true difference between an external STONITH plugin and an internal STONITH plugin.

I have a SAN set up for fail-over, and everything looks like it is working as it should. However, I would like to use STONITH to prevent split-brain. I do not have a STONITH device, so I would need to use something like meatware (which does not allow automatic fail-over) or ssh. But there are two types, ssh and external/ssh. What is the difference? I will try to do some research in the meantime, but if I get an answer before I find out myself, that would be greatly appreciated. Also, if I were to use ssh as a STONITH plugin, will my machine automatically migrate resources to the other machine?

Thanks for any help you can provide :)

--
View this message in context: http://www.nabble.com/STONITH%3A-internal-vs.-external-tp22663871p22663871.html
Sent from the Linux-HA mailing list archive at Nabble.com.
Re: [Linux-HA] Beginner questions
Juha Heinanen writes:
> the real problem is that start of mysql server by pacemaker stops
> altogether after a few manual stops (/etc/init.d/mysql stop).

i think i figured this out. when pacemaker needed to start my mysql-server resource three times on node lenny1, it migrated the group to node lenny2. when i then repeated stopping of mysql-server on lenny2, it migrated the group back to lenny1, but didn't start mysql-server, because it remembered that it had already started it there 3 times.

if so, my conclusion is to forget the migration-threshold parameter.

-- juha
Re: [Linux-HA] Beginner questions
Dominik Klein writes:
> Heartbeat will for example no longer be part of the next suse enterprise
> linux (sles11) ha solution. It will be based on openais. So for new
> setups, this should be the way to go - at least imho.

yes, after there are packages available for debian lenny.

-- juha
Re: [Linux-HA] Beginner questions
Dominik Klein writes:
> I read your email on the pacemaker list and from what you've shared and
> explained, i cannot spot a configuration issue. It should just work
> like that (and does work like that for me).

i did more experiments and noticed that migration-threshold=N doesn't work as i thought it would. i thought that if starting of a resource fails N times, the group of the resource will migrate to the other node.

what happens instead is that if N is 3, for example, and i stop the resource (e.g., mysql server) three times, pacemaker will start it two times on the original node and on the third start migrates the resources to the other one, even if start worked fine each time. is there a means to achieve the migration only when start has failed N times?

> Maybe post your entire configuration, preferably an hb_report
> archive.

i think i had a bug in my crm during the earlier tests. i had set migration-threshold on an individual resource (mysql-server)

crm_resource --meta --resource mysql-server --set-parameter migration-threshold --property-value 3

instead of the whole group. now i have

group mysql-server-group fs0 virtual-ip mysql-server \
    meta migration-threshold="3"

and migration of the resources takes place after the third start. complete config is below.

the real problem is that start of mysql server by pacemaker stops altogether after a few manual stops (/etc/init.d/mysql stop). here is an example.
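The behaviour described above, where migration is triggered by counted stops rather than failed starts, matches failcount accounting: migration-threshold is compared against the resource's per-node failcount, which can be inspected and reset by hand. A sketch using the resource and node names from this thread (exact crm_failcount flags may vary between pacemaker versions):

```shell
# Sketch; flags may differ slightly between pacemaker versions.
# Show the current failcount of mysql-server on node lenny1:
crm_failcount -G -U lenny1 -r mysql-server

# Reset it so the cluster is willing to start the resource there again:
crm_failcount -D -U lenny1 -r mysql-server
```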
i stop mysql and all other resources are started on the other node except mysql server:

crmd[9940]: 2009/03/23_19:33:23 info: send_direct_ack: ACK'ing resource op drbd0:0_monitor_6 from 5:8:0:84c3fc98-c640-4a3f-b0ea-c1f17e5f73bc: lrm_invoke-lrmd-1237829603-11
crmd[9940]: 2009/03/23_19:33:23 info: do_lrm_rsc_op: Performing key=59:8:0:84c3fc98-c640-4a3f-b0ea-c1f17e5f73bc op=drbd0:0_notify_0 )
lrmd[9937]: 2009/03/23_19:33:23 info: rsc:drbd0:0: notify
crmd[9940]: 2009/03/23_19:33:23 info: do_lrm_rsc_op: Performing key=61:8:0:84c3fc98-c640-4a3f-b0ea-c1f17e5f73bc op=drbd0:0_notify_0 )
crmd[9940]: 2009/03/23_19:33:24 info: process_lrm_event: LRM operation drbd0:0_monitor_6 (call=31, rc=-2, cib-update=0, confirmed=true) Cancelled unknown exec error
lrmd[9937]: 2009/03/23_19:33:24 info: rsc:drbd0:0: notify
crmd[9940]: 2009/03/23_19:33:24 info: process_lrm_event: LRM operation drbd0:0_notify_0 (call=32, rc=0, cib-update=49, confirmed=true) complete ok
crmd[9940]: 2009/03/23_19:33:24 info: process_lrm_event: LRM operation drbd0:0_notify_0 (call=33, rc=0, cib-update=50, confirmed=true) complete ok
crmd[9940]: 2009/03/23_19:33:26 info: do_lrm_rsc_op: Performing key=62:8:0:84c3fc98-c640-4a3f-b0ea-c1f17e5f73bc op=drbd0:0_notify_0 )
lrmd[9937]: 2009/03/23_19:33:26 info: rsc:drbd0:0: notify
crmd[9940]: 2009/03/23_19:33:26 info: do_lrm_rsc_op: Performing key=13:8:0:84c3fc98-c640-4a3f-b0ea-c1f17e5f73bc op=drbd0:0_promote_0 )
crm_master[13804]: 2009/03/23_19:33:26 info: Invoked: /usr/sbin/crm_master -l reboot -v 75
lrmd[9937]: 2009/03/23_19:33:27 info: RA output: (drbd0:0:notify:stdout) 0 Trying master-drbd0:0=75 update via attrd
lrmd[9937]: 2009/03/23_19:33:27 info: rsc:drbd0:0: promote
crmd[9940]: 2009/03/23_19:33:27 info: process_lrm_event: LRM operation drbd0:0_notify_0 (call=34, rc=0, cib-update=51, confirmed=true) complete ok
lrmd[9937]: 2009/03/23_19:33:27 info: RA output: (drbd0:0:promote:stdout) drbd[13811]: 2009/03/23_19:33:27 INFO: drbd0 promote: primary succeeded
crmd[9940]: 2009/03/23_19:33:27 info: process_lrm_event: LRM operation drbd0:0_promote_0 (call=35, rc=0, cib-update=52, confirmed=true) complete ok
crmd[9940]: 2009/03/23_19:33:29 info: do_lrm_rsc_op: Performing key=60:8:0:84c3fc98-c640-4a3f-b0ea-c1f17e5f73bc op=drbd0:0_notify_0 )
lrmd[9937]: 2009/03/23_19:33:29 info: rsc:drbd0:0: notify
crm_master[13983]: 2009/03/23_19:33:29 info: Invoked: /usr/sbin/crm_master -l reboot -v 75
lrmd[9937]: 2009/03/23_19:33:29 info: RA output: (drbd0:0:notify:stdout) 0 Trying master-drbd0:0=75 update via attrd
crmd[9940]: 2009/03/23_19:33:29 info: process_lrm_event: LRM operation drbd0:0_notify_0 (call=36, rc=0, cib-update=53, confirmed=true) complete ok
crmd[9940]: 2009/03/23_19:33:31 info: do_lrm_rsc_op: Performing key=44:8:0:84c3fc98-c640-4a3f-b0ea-c1f17e5f73bc op=fs0_start_0 )
lrmd[9937]: 2009/03/23_19:33:31 info: rsc:fs0: start
crmd[9940]: 2009/03/23_19:33:31 info: do_lrm_rsc_op: Performing key=14:8:8:84c3fc98-c640-4a3f-b0ea-c1f17e5f73bc op=drbd0:0_monitor_59000 )
Filesystem[13990]: 2009/03/23_19:33:31 INFO: Running start for /dev/drbd0 on /var/lib/mysql
crmd[9940]: 2009/03/23_19:33:31 info: process_lrm_event: LRM operation drbd0:0_monitor_59000 (call=38, rc=8, cib-update=54, confirmed=false) complete master
crmd[9940]: 2009/03/23_19:33:31 info: process_lrm_event: LRM operation fs0_start_0 (call=37, rc=0, cib-update=55, confirmed=true) complete ok
crmd[9940]: 2009/03/23_19:33:33 info: do_lrm_rsc_op: Performing key=46:8:0:84c3fc98-c640-4a3f-b0ea-c
Re: [Linux-HA] Beginner questions
Dominik Klein wrote:
> > Is there some documentation available for openais? I can't even find a
> > good description of what it does or why you would use it. Also, will
> > this help with my 2nd question: having a few spares for a large number
> > of servers? While my objective with the squid cache is to proxy
> > everything through one server to maximize the cache hits, I may switch
> > to memcached on a group of machines and would like to have a standby or
> > 2 that could take over for any failing machine.
>
> Well, there are man-pages and the mailing list. The install page even
> has a configuration example. And I have found this thread to be
> especially helpful:
> https://lists.linux-foundation.org/pipermail/openais/2009-March/010894.html

Yes, but I want to know why I should use it before dealing with how to install and configure. Is there a feature list, FAQ, or comparison to other mechanisms?

> openais will be the future platform for pacemaker clusters providing the
> communication infrastructure and node failure detection. Heartbeat will
> for example no longer be part of the next suse enterprise linux (sles11)
> ha solution. It will be based on openais. So for new setups, this should
> be the way to go - at least imho.

The code may be great, but it really needs a little public relations effort unless I'm missing something. Is there any way to find the answer to my question above (many active hosts per spare)?

--
Les Mikesell
lesmikes...@gmail.com
Re: [Linux-HA] Beginner questions
> Is there some documentation available for openais? I can't even find a
> good description of what it does or why you would use it. Also, will
> this help with my 2nd question: having a few spares for a large number
> of servers? While my objective with the squid cache is to proxy
> everything through one server to maximize the cache hits, I may switch
> to memcached on a group of machines and would like to have a standby or
> 2 that could take over for any failing machine.

Well, there are man-pages and the mailing list. The install page even has a configuration example. And I have found this thread to be especially helpful: https://lists.linux-foundation.org/pipermail/openais/2009-March/010894.html

openais will be the future platform for pacemaker clusters, providing the communication infrastructure and node failure detection. Heartbeat will for example no longer be part of the next suse enterprise linux (sles11) ha solution. It will be based on openais. So for new setups, this should be the way to go - at least imho.

Regards
Dominik
Re: [Linux-HA] Beginner questions
Dominik Klein wrote:
> > My first HA setup is for a squid proxy where all I need is to move an IP
> > address to a backup server if the primary fails (and the cache can just
> > rebuild on its own). This seems to work, but will only fail over if the
> > machine goes down completely or the primary IP is unreachable. Is that
> > typical, or are there monitors for the service itself so failover would
> > happen if the squid process is not running or stops accepting connections?
> >
> > Second question (unrelated): Can heartbeat be set up so one or two spare
> > machines could automatically take over the IP address of any of a much
> > larger pool of machines that might fail?
>
> Heartbeat in v1 mode (haresources configuration) cannot do any resource
> level monitoring itself. You'd need to do that externally by any means.
>
> If you're just starting out learning now, I'd suggest going with openais
> and pacemaker instead of heartbeat right away. Check out the
> documentation on www.clusterlabs.org/wiki/install and
> www.clusterlabs.org/wiki/Documentation

Is there some documentation available for openais? I can't even find a good description of what it does or why you would use it. Also, will this help with my 2nd question: having a few spares for a large number of servers? While my objective with the squid cache is to proxy everything through one server to maximize the cache hits, I may switch to memcached on a group of machines and would like to have a standby or 2 that could take over for any failing machine.

--
Les Mikesell
lesmikes...@gmail.com
Re: [Linux-HA] Heartbeat degrades drbd resource
Dominik Klein wrote:
> You cannot use drbd in heartbeat the way you configured it.
>
> Please refer to http://wiki.linux-ha.org/DRBD/HowTov2

Sorry, copy/paste error. I meant to say http://www.clusterlabs.org/wiki/DRBD_HowTo_1.0
Re: [Linux-HA] Heartbeat degrades drbd resource
You cannot use drbd in heartbeat the way you configured it.

Please refer to http://wiki.linux-ha.org/DRBD/HowTov2 and (if that wasn't made clear enough on the page) make sure the first thing you do is upgrade your cluster software. Read here on how to do that: http://clusterlabs.org/wiki/Install

Regards
Dominik

Heiko Schellhorn wrote:
> Hi
>
> I installed drbd (8.0.14) together with heartbeat (2.0.8) on a Gentoo-system.
>
> I have the following problem:
> Standalone, the drbd resource works perfectly. I can mount/unmount it
> alternately on both nodes. Reading/writing works and /proc/drbd looks fine.
>
> But when I start heartbeat, it degrades the resource step by step until it's
> marked as unconfigured. An excerpt of the logfile is attached.
> Heartbeat itself starts up and runs. Two of the three resources configured
> up to now are also working. Only drbd shows problems. (See the file
> crm_mon-out)
>
> I don't think it's a problem of communication between the nodes, because
> drbd is working standalone and e.g. the IPaddr2 resource is also working
> within heartbeat.
> I also tried several heartbeat configurations. First I defined the resources
> as single resources, and then I combined the resources into a resource
> group. There was no difference.
>
> Has someone seen such an issue before? Any ideas?
> I didn't find anything helpful in the list archive.
>
> If you need more information I can provide a complete log and the config.
>
> Thanks
>
> Heiko
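The HowTo Dominik points at runs drbd as a master/slave resource under the CRM rather than as a plain primitive inside a group. A rough sketch of that shape in crm shell syntax, with the resource and device names borrowed from this thread; treat it as an outline of the HowTo's approach, not a drop-in config:

```shell
# Sketch of the master/slave layout from the clusterlabs DRBD HowTo.
# "dr0" is the drbd resource name used elsewhere in this thread.
crm configure primitive resource_drbd ocf:heartbeat:drbd \
    params drbd_resource="dr0" \
    op monitor interval="59s" role="Master" timeout="30s" \
    op monitor interval="60s" role="Slave" timeout="30s"
crm configure ms ms_drbd resource_drbd \
    meta clone-max="2" notify="true" globally-unique="false"

# The filesystem must run where drbd is primary, and only after promotion:
crm configure colocation fs_on_drbd inf: fs_drbd ms_drbd:Master
crm configure order fs_after_drbd inf: ms_drbd:promote fs_drbd:start
```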
[Linux-HA] Heartbeat degrades drbd resource
Hi

I installed drbd (8.0.14) together with heartbeat (2.0.8) on a Gentoo-system.

I have the following problem:
Standalone, the drbd resource works perfectly. I can mount/unmount it alternately on both nodes. Reading/writing works and /proc/drbd looks fine.

But when I start heartbeat, it degrades the resource step by step until it's marked as unconfigured. An excerpt of the logfile is attached.
Heartbeat itself starts up and runs. Two of the three resources configured up to now are also working. Only drbd shows problems. (See the file crm_mon-out)

I don't think it's a problem of communication between the nodes, because drbd is working standalone and e.g. the IPaddr2 resource is also working within heartbeat.
I also tried several heartbeat configurations. First I defined the resources as single resources, and then I combined the resources into a resource group. There was no difference.

Has someone seen such an issue before? Any ideas?
I didn't find anything helpful in the list archive.

If you need more information I can provide a complete log and the config.

Thanks

Heiko

--
Dipl. Inf. Heiko Schellhorn
University of Bremen            Room: NW1-U 2065
Inst. of Environmental Physics  Phone: +49(0)421 218 4080
P.O. Box 33 04 40               Fax: +49(0)421 218 4555
D-28334 Bremen                  Mail: mailto:sch...@physik.uni-bremen.de
Germany                         www: http://www.iup.uni-bremen.de
                                     http://www.sciamachy.de
                                     http://www.geoscia.de

Last updated: Mon Mar 23 13:11:49 2009
Current DC: mainsrv2 (d7bd5c11-babc-4b69-97d6-3d20d01d8d66)
2 Nodes configured.
1 Resources configured.
Node: mainsrv2 (d7bd5c11-babc-4b69-97d6-3d20d01d8d66): online
Node: mainsrv1 (6a5eacba-7389-4305-9074-de6116504c49): online
Resource Group: heartbeat_group_1
    resource_IP (heartbeat::ocf:IPaddr2): Started mainsrv1
    resource_drbd (heartbeat::ocf:drbd): Started mainsrv1
    fs_drbd (heartbeat::ocf:Filesystem): Stopped

drbd[30750][30763]: 2009/03/23_12:42:19 DEBUG: dr0: Calling /sbin/drbdadm -c /etc/drbd.conf state dr0
drbd[30750][30772]: 2009/03/23_12:42:19 DEBUG: dr0: Exit code 0
drbd[30750][30778]: 2009/03/23_12:42:19 DEBUG: dr0: Command output: Secondary/Secondary
drbd[30750][30794]: 2009/03/23_12:42:19 DEBUG: dr0: Calling /sbin/drbdadm -c /etc/drbd.conf cstate dr0
drbd[30750][30798]: 2009/03/23_12:42:19 DEBUG: dr0: Exit code 0
drbd[30750][30799]: 2009/03/23_12:42:19 DEBUG: dr0: Command output: Connected
drbd[30750][30800]: 2009/03/23_12:42:19 DEBUG: dr0 status: Secondary/Secondary Secondary Secondary Connected
drbd[30808][30815]: 2009/03/23_12:42:20 DEBUG: dr0: Calling /sbin/drbdadm -c /etc/drbd.conf state dr0
drbd[30808][30819]: 2009/03/23_12:42:20 DEBUG: dr0: Exit code 0
drbd[30808][30820]: 2009/03/23_12:42:20 DEBUG: dr0: Command output: Secondary/Unknown
drbd[30808][30830]: 2009/03/23_12:42:20 DEBUG: dr0: Calling /sbin/drbdadm -c /etc/drbd.conf cstate dr0
drbd[30808][30836]: 2009/03/23_12:42:20 DEBUG: dr0: Exit code 0
drbd[30808][30837]: 2009/03/23_12:42:20 DEBUG: dr0: Command output: WFConnection
drbd[30808][30839]: 2009/03/23_12:42:20 DEBUG: dr0 status: Secondary/Unknown Secondary Unknown WFConnection
drbd[30808][30841]: 2009/03/23_12:42:20 DEBUG: dr0: Calling /sbin/drbdadm -c /etc/drbd.conf down dr0
drbd[30808][30873]: 2009/03/23_12:42:21 DEBUG: dr0: Exit code 0
drbd[30808][30874]: 2009/03/23_12:42:21 DEBUG: dr0: Command output:
drbd[30808][30875]: 2009/03/23_12:42:21 DEBUG: dr0 stop: drbdadm down succeeded.
drbd[30876][30883]: 2009/03/23_12:42:21 DEBUG: dr0: Calling /sbin/drbdadm -c /etc/drbd.conf state dr0
drbd[30876][30888]: 2009/03/23_12:42:21 DEBUG: dr0: Exit code 0
drbd[30876][30889]: 2009/03/23_12:42:21 DEBUG: dr0: Command output: Unconfigured
drbd[30876][30897]: 2009/03/23_12:42:21 DEBUG: dr0: Calling /sbin/drbdadm -c /etc/drbd.conf cstate dr0
drbd[30876][30901]: 2009/03/23_12:42:21 DEBUG: dr0: Exit code 0
drbd[30876][30902]: 2009/03/23_12:42:21 DEBUG: dr0: Command output: Unconfigured
drbd[30876][30903]: 2009/03/23_12:42:21 DEBUG: dr0 status: Unconfigured Unconfigured Unconfigured Unconfigured
drbd[30876][30904]: 2009/03/23_12:42:21 DEBUG: dr0 start: already configured.
Re: [Linux-HA] expected-quorum-votes
> crmd metadata tells me that expected-quorum-votes are used to calculate
> quorum in openais based clusters. Its default value is 2. Do I have to
> change this value if I have 3 or more nodes in an OpenAIS based cluster?

No. It is automatically adjusted by the cluster.

Regards
Dominik
Re: [Linux-HA] maintenance-mode of pengine
Michael Schwartzkopff wrote:
> Hi,
>
> In the metadata of the pengine I found the attribute maintenance-mode. I did
> not find any documentation about it. The long description also just says:
> "Should the cluster ...". Does anybody know what this option does?
>
> Thanks.

It disables resource management when set to true. Like "is-managed-default" did in the old days, plus, iirc, it also disables all ops. But better let Andrew verify the latter.

Regards
Dominik
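A minimal sketch of toggling the property Dominik describes, in crm shell syntax (the property name comes from the pengine metadata discussed above; treat the invocation as illustrative):

```shell
# Stop the cluster from managing (starting/stopping) resources,
# e.g. before maintenance work on a node:
crm configure property maintenance-mode=true

# Hand control back to the cluster afterwards:
crm configure property maintenance-mode=false
```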
Re: [Linux-HA] Beginner questions
Juha Heinanen wrote:
> Dominik Klein writes:
>
> > Heartbeat in v1 mode (haresources configuration) cannot do any resource
> > level monitoring itself. You'd need to do that externally by any
> > means.
>
> yes, in v2 mode i have managed to make pacemaker monitor resources,
> for example, like this:
>
> primitive test lsb:test \
>     op monitor interval="30s" timeout="5s" \
>     meta target-role="Started"
>
> but i have still failed to find out how to make pacemaker migrate
> a resource group to another node if one of the resources in the group
> fails to start.
>
> for example, if test is the last member of group
>
> group test-group fs0 mysql-server virtual-ip test
>
> and fails to start, the group is not migrated to another node.
>
> i have tried to add
>
> primitive test lsb:test op monitor interval=30s timeout=5s meta
> migration-threshold=3
>
> but it just stopped monitoring of test after 3 attempts.
>
> any ideas how to achieve migration?

I read your email on the pacemaker list and, from what you've shared and explained, I cannot spot a configuration issue. It should just work like that (and does work like that for me).

Maybe post your entire configuration, preferably an hb_report archive.

Regards
Dominik
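For reference, hb_report collects the logs, the CIB and the PE inputs from all nodes into one archive. A sketch of a typical invocation; the start time and destination path are placeholders, and exact options may differ between versions:

```shell
# Sketch only: time and destination are placeholders.
# -f gives the start of the period the report should cover;
# the last argument names the destination of the report archive.
hb_report -f "2009/03/23 19:00" /tmp/mysql-failover-report
```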
Re: [Linux-HA] Beginner questions
Dominik Klein writes:
> Heartbeat in v1 mode (haresources configuration) cannot do any resource
> level monitoring itself. You'd need to do that externally by any
> means.

yes, in v2 mode i have managed to make pacemaker monitor resources, for example, like this:

primitive test lsb:test \
    op monitor interval="30s" timeout="5s" \
    meta target-role="Started"

but i have still failed to find out how to make pacemaker migrate a resource group to another node if one of the resources in the group fails to start.

for example, if test is the last member of group

group test-group fs0 mysql-server virtual-ip test

and fails to start, the group is not migrated to another node.

i have tried to add

primitive test lsb:test op monitor interval=30s timeout=5s meta migration-threshold=3

but it just stopped monitoring of test after 3 attempts.

any ideas how to achieve migration?

-- juha
Re: [Linux-HA] Beginner questions
Les Mikesell wrote:
> My first HA setup is for a squid proxy where all I need is to move an IP
> address to a backup server if the primary fails (and the cache can just
> rebuild on its own). This seems to work, but will only fail over if the
> machine goes down completely or the primary IP is unreachable. Is that
> typical, or are there monitors for the service itself so failover would
> happen if the squid process is not running or stops accepting connections?
>
> Second question (unrelated): Can heartbeat be set up so one or two spare
> machines could automatically take over the IP address of any of a much
> larger pool of machines that might fail?

Heartbeat in v1 mode (haresources configuration) cannot do any resource level monitoring itself. You'd need to do that externally by any means.

If you're just starting out learning now, I'd suggest going with openais and pacemaker instead of heartbeat right away. Check out the documentation on www.clusterlabs.org/wiki/install and www.clusterlabs.org/wiki/Documentation

Regards
Dominik
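Under pacemaker, the service-level monitoring asked about in the first question is just a monitor operation on the resource. A hedged sketch in crm shell syntax; the IP address, the intervals, and the assumption that an /etc/init.d/squid LSB script exists are all placeholders, not from the thread:

```shell
# Sketch: a floating IP plus a monitored squid service, grouped so
# both fail over together. 192.168.1.100 and intervals are made up;
# lsb:squid assumes an LSB-compliant /etc/init.d/squid script.
crm configure primitive squid-ip ocf:heartbeat:IPaddr2 \
    params ip="192.168.1.100" \
    op monitor interval="10s"
crm configure primitive squid-svc lsb:squid \
    op monitor interval="30s" timeout="20s"
crm configure group squid-group squid-ip squid-svc
```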