Re: [Linux-HA] mysql drbd and SAN all together
Indeed, I can confirm that for the stable versions. The MySQL Cluster (NDB) engine provides great performance, but every node must have enough memory to cope with the current database size. As long as you're willing to go for a small, lean, mean database, that's fine. However, the more recent bleeding-edge versions (as of 5.1.6) do have disk data support; it depends on your environment whether you'd like to have a go at 5.1. There are always the more proven, mature engines to turn to. On 4/22/07, Eddie C <[EMAIL PROTECTED]> wrote: MySQL Cluster is an in-memory database. Your database size is limited by your RAM, if I read the documentation properly. Not useful for anything I am doing. ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] heartbeat-2.0.8: load balancing
On Sat, Apr 21, 2007 at 07:48:09PM -0600, Alan Robertson wrote: > Gerry Reno wrote: > > I have a virtual IP resource that I'm making available via heartbeat and > > I am controlling this via the GUI. Now I want to add ldirectord load > > balancing for a service on three machines. How can this be added? > > ldirectord is installed but there is no config file. How do I see and > > control these ldirectord load balanced resources in the GUI? Or are > > they not manageable via heartbeat and GUI? > > At this point in time, the load balancer infrastructure doesn't > integrate with the heartbeat infrastructure beyond being able to keep > the load balancer running. > > Sorry :-( > > I can see that being a nice thing to do, though... Is the answer that ldirectord needs to be extended so that the GUI knows how to configure it? If so I am (as always) happy to consider patches. -- Horms H: http://www.vergenet.net/~horms/ W: http://www.valinux.co.jp/en/ ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] standalone pingd.sh
On Fri, 20 Apr 2007, Alan Robertson wrote: > David Lee wrote: > > On Thu, 19 Apr 2007, Xinwei Hu wrote: > > > >> [David Lee had earlier written:] > >>> 5. "ping -q -c 1 $ping_host". The options for "ping" are notoriously > >>> variable from system to system. Keep it simple. (For example my system > >>> doesn't have a "-q" option; and it says that "-c " is for a thing > >>> called "traffic class", only valid on IPv6.) If they are not necessary, > >>> leave them out. If they are necessary, then for those of us who come > >>> along later to maintain code, especially on other operating systems, it is > >>> worth adding comments about your intentions, such as: > >>># -q: to do foo > >>># -c to do bar > >> Here on my system: > >>-c count > >> Stop after sending count ECHO_REQUEST packets. With > >> deadline > >> option, ping waits for count ECHO_REPLY packets, until the > >> time‐ > >> out expires. > >>-q Quiet output. Nothing is displayed except the summary lines > >> at > >> startup time and when finished. > > > > "ping" on Linux and Solaris (to name two of our OSes) seem incompatible in > > their options. > > > >> -q can be removed as we did ">/dev/null 2>&1" already. > > > > Yes. The ">/dev/null 2>&1" method is the way to go to suppress output > > across a range of OSes > > > >> -c is used so that ping won't last forever. > > > > On Solaris: "ping hostname [data_size ] [ count ]" > > > > In practice, it seems that "ping hostname number" also causes a swift > > return for non-replying hosts. > > > > See "resources/heartbeat/IPaddr.in" and "resources/OCF/IPaddr.in" which > > tryi to do the right thing according to which OS they are running on. > > > > So it might be worth us trying to develop our own "ping-wrapper" command > > with a fixed, portable, interface, whose contents are based on those in > > those other two files, and which they would then use, and which your new > > "pingd" could also use. > [...] 
> Of course, we already have portable ping code in C in our base. The > best way to do this portably is probably to use that code, which is > guaranteed not to change without us knowing about it... Alan: The context of this discussion is for callers that are shell-scripts (the proposed "pingd.sh"; also noting two "IPaddr.in") rather than C code. The principle of calling the system "ping" is clean and simple -- far more so (isn't it?) than having to (re-)write a ping-like command to call the C code in our base. The current problem is simply that the system "ping" in different OSes has different options, and the current discussion is about how to handle that. So I'm wondering whether, in the case of script-based (not C) callers, we could simply have a shell function (e.g. "pingfn") with a fixed interface acting as a wrapper to the system-shell ping command and handling all the relevant OS incompatibilities. -- : David LeeI.T. Service : : Senior Systems ProgrammerComputer Centre : : UNIX Team Leader Durham University : : South Road: : http://www.dur.ac.uk/t.d.lee/Durham DH1 3LE: : Phone: +44 191 334 2752 U.K. : ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
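David's proposed "pingfn" shell function might look something like the sketch below. The per-OS option choices are assumptions drawn from this thread (Solaris taking a bare count argument, Linux/iputils using -c), not a tested portability matrix.

```shell
# Sketch of the proposed "pingfn" wrapper: one fixed interface,
# pingfn <host> [count], hiding per-OS ping option differences.
# The OS cases are illustrative assumptions from this thread.
pingfn() {
    host=$1
    count=${2:-1}
    case `uname -s` in
        SunOS)
            # Solaris form noted above: ping hostname [count]
            ping "$host" "$count" >/dev/null 2>&1 ;;
        *)
            # Linux/iputils and most BSDs: -c limits the packet count
            ping -c "$count" "$host" >/dev/null 2>&1 ;;
    esac
}
```

Output is discarded, so callers only see the exit code, which is the one thing that is reasonably portable across ping implementations.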
Re: [Linux-HA] standalone pingd.sh
David Lee wrote: On Fri, 20 Apr 2007, Alan Robertson wrote: Of course, we already have portable ping code in C in our base. The best way to do this portably is probably to use that code, which is guaranteed not to change without us knowing about it... Alan: The context of this discussion is for callers that are shell-scripts (the proposed "pingd.sh"; also noting two "IPaddr.in") rather than C code. The principle of calling the system "ping" is clean and simple -- far more so (isn't it?) than having to (re-)write a ping-like command to call the C code in our base. Sadly, no, it isn't. The system pings are wildly incompatible and lacking in useful features. And since the C code already exists (or so I believe, as Alan made the claim), wrapping a main() and getopts() around it is trivial. The current problem is simply that the system "ping" in different OSes has different options, and the current discussion is about how to handle that. So I'm wondering whether, in the case of script-based (not C) callers, we could simply have a shell function (e.g. "pingfn") with a fixed interface acting as a wrapper to the system-shell ping command and handling all the relevant OS incompatibilities. Good luck - mostly, you can't do so and keep useful semantics. How many ICMP ECHO REQUEST packets do you send? How many REPLY packets do you require for it to be "good"? How long do you wait for each packet? How long do you wait between each REQUEST, and does it depend on the timing of the REPLY? I've had to do this for our monitoring system, and ended up writing a wrapper around fping. -- Carson ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
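A wrapper around fping, as Carson describes, sidesteps most of those questions because fping exposes count and timeout directly and prints a per-host summary. The helper below is a hypothetical sketch that only parses such a summary line; the "xmt/rcv/%loss" format is assumed from what `fping -q -c <count>` prints.

```shell
# Hypothetical piece of an fping wrapper. "fping -q -c <count>" is
# assumed to emit per-host summary lines like:
#   10.0.0.1 : xmt/rcv/%loss = 3/3/0%
# alive_from_fping succeeds only if the host lost no packets.
alive_from_fping() {
    # $1 = one fping summary line
    loss=$(printf '%s\n' "$1" | sed -n 's|.*%loss = [0-9]*/[0-9]*/\([0-9]*\)%.*|\1|p')
    [ -n "$loss" ] && [ "$loss" -eq 0 ]
}
```

A stricter policy ("good if at least one reply") would compare the rcv field instead of the loss percentage; that is exactly the kind of semantic choice Carson's questions are about.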
Re: [Linux-HA] heartbeat-2.0.8: load balancing
On Monday, 23 April 2007 at 11:06, Simon Horman wrote: > On Sat, Apr 21, 2007 at 07:48:09PM -0600, Alan Robertson wrote: > > Gerry Reno wrote: > > > I have a virtual IP resource that I'm making available via heartbeat > > > and I am controlling this via the GUI. Now I want to add ldirectord > > > load balancing for a service on three machines. How can this be added? > > > ldirectord is installed but there is no config file. How do I see and > > > control these ldirectord load balanced resources in the GUI? Or are > > > they not manageable via heartbeat and GUI? > > > > At this point in time, the load balancer infrastructure doesn't > > integrate with the heartbeat infrastructure beyond being able to keep > > the load balancer running. > > > > Sorry :-( > > > > I can see that being a nice thing to do, though... > > Is the answer that ldirectord needs to be extended so that > the GUI knows how to configure it? If so I am (as always) > happy to consider patches. Hi, if your platform is Linux and your distro supports the CLUSTERIP target of iptables, I would be glad if you could do some tests on my IPaddr2 resource agent. It does load sharing between several nodes. Please mail me if you want to test it. Status of the script: works for me. -- Dr. Michael Schwartzkopff MultiNET Services GmbH Address: Bretonischer Ring 7; 85630 Grasbrunn; Germany Tel: +49 - 89 - 45 69 11 0 Fax: +49 - 89 - 45 69 11 21 mob: +49 - 174 - 343 28 75 mail: [EMAIL PROTECTED] web: www.multinet.de Registered office: 85630 Grasbrunn Commercial register: Amtsgericht München HRB 114375 Managing directors: Günter Jurgeneit, Hubert Martens --- PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B Skype: misch42 ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
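For readers unfamiliar with the CLUSTERIP target Michael mentions: it lets several nodes share one IP by answering ARP with a shared multicast MAC and hashing which node handles which client. The command below is purely illustrative of that mechanism; every value is made up, and it is not taken from the IPaddr2 agent itself, so check the iptables documentation before trying anything like it.

```shell
# Illustrative only: roughly the kind of rule a CLUSTERIP-based
# load-sharing setup installs (run on each node with its own
# --local-node value). All addresses and values here are invented.
iptables -I INPUT -d 192.168.1.100 -i eth0 -p tcp --dport 3306 \
    -j CLUSTERIP --new --hashmode sourceip \
    --clustermac 01:00:5e:00:00:20 --total-nodes 2 --local-node 1
```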
Re: [Linux-HA] Heartbeat Active-Active Mysql servers
On Mon, Apr 09, 2007 at 11:45:12AM -0400, Benjamin Lawetz wrote: > I'm upgrading my existing mysql replication to new servers and moving up > from heartbeat 1.x to 2.x > I've got the basics up and running and was reading up on the monitoring > feature. I'm a little confused so I just wanted to run this by you guys to > make sure I understood correctly: > > I have 2 mysql servers > Mysql1 192.168.1.4 > Mysql2 192.168.1.5 > I have a heartbeat that controls 2 virtual Ips: 192.168.1.6 (which has a > preffered node of mysql1) and 192.168.1.7 (which has a preffered node of > mysql2) > > I don't want hearbeat to control starting and stopping of mysql (these > services should always be running and are controlled by the OS). That's what heartbeat can do for you as well. I can't imagine a reason to run them outside of the cluster if you want them HA. > But I do want heartbeat to monitor mysql and failover if the monitoring > fails. What would failover consist of then? > The way I understand how to configure this is that I have to add the mysql > ressource to my configuration but disable (or make them return that > everything proceeded correctly) the "start" and "stop" commands on the mysql > script so that start and stop does nothing but monitor functions correctly. > > Is this correct? > > Thanks in advance for your help > > -- > Benjamin > TéliPhone inc. > > > -- > N'envoyé pas de courriel à l'adresse qui suit, sinon vous serez > automatiquement mis sur notre liste noire. > [EMAIL PROTECTED] > Do not send an email to the email above or you will automatically be > blacklisted. > > ___ > Linux-HA mailing list > Linux-HA@lists.linux-ha.org > http://lists.linux-ha.org/mailman/listinfo/linux-ha > See also: http://linux-ha.org/ReportingProblems -- Dejan ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
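What Benjamin describes (start/stop as no-ops, a real monitor) could be sketched as an OCF-style agent like the one below. This is an assumption-laden illustration: the `OCF_RESKEY_pidfile` parameter and the pidfile check are invented for this sketch and are not the real mysql agent's interface.

```shell
# Sketch of Benjamin's approach: start/stop pretend to succeed
# (the OS owns mysqld), while monitor genuinely checks the daemon.
# The pidfile parameter is hypothetical, not the real mysql RA's.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

mysql_shadow() {
    action=$1
    pidfile=${OCF_RESKEY_pidfile:-/var/run/mysqld/mysqld.pid}
    case "$action" in
        start|stop)
            # No-op: report success and let the init system do the work.
            return $OCF_SUCCESS ;;
        monitor|status)
            # Real check: is the pid in the pidfile alive?
            if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
                return $OCF_SUCCESS
            fi
            return $OCF_NOT_RUNNING ;;
    esac
}
```

As Dejan's reply points out, though, letting heartbeat own start/stop outright is simpler and answers the "what would failover consist of?" question cleanly.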
Re: [Linux-HA] status check
On Fri, Apr 20, 2007 at 04:18:24PM +0200, Andrew Beekhof wrote: > On 4/19/07, Alan Robertson <[EMAIL PROTECTED]> wrote: > >Andrew Beekhof wrote: > >> On 4/18/07, Lars Marowsky-Bree <[EMAIL PROTECTED]> wrote: > >>> On 2007-04-17T19:40:13, [EMAIL PROTECTED] wrote: > >>> > >>> > >Easiest way is to model after an existing resource agent, Xen for > >>> > >example. > >>> > I've found the Dummy one a good start in the past. Simple, and shows > >>> > the basic required components. > >>> > >>> Yeah, but that one now calls the ha_pseudo_resource wrappers which isn't > >>> exactly obvious. > >> > >> it never used to and I'd much prefer it didn't for precisely the > >> reason that it was a good template > >> > >> in fact i might just go and revert that particular change now - at > >> least for Dummy > > > >We actually use that resource, and it's not such a good template, IIRC > >for other reasons. > > as the person who wrote this RA, I can tell you it has always had two > purposes (as also mentioned in the commit message). > > 1 - be so insanely simple that it was guaranteed to work (and therefor > good for testing) > 2 - be an appropriate starting point for people writing RAs > (without any extra baggage that would then get copied a million times) > > _please_ do not add features to it, nor refactor it. > I like it exactly how it is. I removed all features on Friday. > >If we want a template, maybe we should just put it in the doc directory > >as a template with lots and lots of comments? > > > >-- > >Alan Robertson <[EMAIL PROTECTED]> > > > >"Openness is the foundation and preservative of friendship... Let me > >claim from you at all times your undisguised opinions." 
- William Wilberforce -- Dejan ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] IPv6, service fail on start behaviour
On Thu, Apr 19, 2007 at 12:08:48PM +0200, Andrew Beekhof wrote: > On 4/19/07, Benjamin Watine <[EMAIL PROTECTED]> wrote: > >Andrew Beekhof a écrit : > >> On 4/18/07, Benjamin Watine <[EMAIL PROTECTED]> wrote: > >>> Hi all > >>> > >>> I have two questions about Heartbeat v2 configuration : > >>> > >>> 1. IPv6addr : I've tried to configure virtual IPv6 address for a > >>> resource group. Because I didn't find documentation about this script, I > >>> did it like IPaddr, but it don't seems to be OK. > >>> What are the parameters to use with IPv6addr script ? Is this script > >>> fully OK ? > >> > >> /path/to/IPv6addr meta-data > >> > ># /etc/ha.d/resource.d/IPv6addr meta-data > >usage: /etc/ha.d/resource.d/IPv6addr > >(start|stop|status|usage|meta-data) > > > >That's why I ask. > >"/usr/lib/ocf/resource.d/heartbeat/IPv6addr meta-data" works, thank you > >Should I open a bugzilla ? > > couldn't hurt to None of the heartbeat class agents supports meta-data. Is this usage wrong? > > > > >>> > >>> 2. On a resource group, when a service shut down (manually or because of > >>> fail), heartbeat try to start it again. It's ok, but if the service > >>> have, for any reason, stop after one minute, HB will try to start it > >>> undefinitely... How can I configure HeartBeat in the way it tries to > >>> load a service only 3 times between two reboot, and failover if it can't > >>> run durably the service ? I look after DTD, but didn't find anything > >>> about this. > >> > >> http://www.linux-ha.org/v2/faq/forced_failover > > > >Thanks a lot ! > > > >> > >>> > >>> I hope you understand what I mean :) > >>> > >>> Thanks ! 
> >>> Ben -- Dejan ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Location constraints
Any hint ? Benjamin Watine a écrit : Hi the list I'm trying to set location constraint for 2 resources group, but I don't understand very well how it works. I want to define a prefered node for each group, and tell HeartBeat to move the group on the other node if 3 resources fail (and restart) occurs. So, I defined default-resource-stickiness at 100, default-resource-failure-stickiness at -100, and put a score of 1200 on prefered node, and 1000 for "second" node. ((1200-1000+100)/100 = 3). I'm trying to do this for 2 group. If 3 fails occurs for the resource of a group, all the group have to be moved to the other node. Can I configure group location constraint as for resource ? How can I get group failcount (if it make sense) ? ... and nothing works :p The resource group don't start on the good node, and never failover if I manually stop 3 times a resource of the group. Some light about this location constraints would be greatly appreciated ! cibadmin -Ql attached. Thank you, in advance. Ben ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
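Benjamin's arithmetic above can be checked directly. Under the rule his "(1200-1000+100)/100 = 3" implies, a group abandons its preferred node once accumulated failure penalties outweigh its score advantage plus stickiness:

```shell
# Worked check of the score arithmetic from the message above:
#   failcount * |failure_stickiness| >= (pref - other) + stickiness
pref_score=1200         # constraint score on the preferred node
other_score=1000        # constraint score on the second node
stickiness=100          # default-resource-stickiness
failure_penalty=100     # |default-resource-failure-stickiness|

fails_needed=$(( (pref_score - other_score + stickiness) / failure_penalty ))
echo "$fails_needed"    # 3 failures before the group should move
```

If your heartbeat build ships crm_failcount, something like `crm_failcount -G -r <resource>` should show the per-resource failcount actually accumulated; note that failcounts are tracked per resource, not per group, which may be why three manual stops of one member don't move the whole group.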
Re: [Linux-HA] IPv6, service fail on start behaviour
Dejan Muhamedagic a écrit : On Thu, Apr 19, 2007 at 12:08:48PM +0200, Andrew Beekhof wrote: On 4/19/07, Benjamin Watine <[EMAIL PROTECTED]> wrote: Andrew Beekhof a écrit : On 4/18/07, Benjamin Watine <[EMAIL PROTECTED]> wrote: Hi all I have two questions about Heartbeat v2 configuration : 1. IPv6addr : I've tried to configure virtual IPv6 address for a resource group. Because I didn't find documentation about this script, I did it like IPaddr, but it don't seems to be OK. What are the parameters to use with IPv6addr script ? Is this script fully OK ? /path/to/IPv6addr meta-data # /etc/ha.d/resource.d/IPv6addr meta-data usage: /etc/ha.d/resource.d/IPv6addr (start|stop|status|usage|meta-data) That's why I ask. "/usr/lib/ocf/resource.d/heartbeat/IPv6addr meta-data" works, thank you Should I open a bugzilla ? couldn't hurt to What couldn't hurt ? This behaviour or to open a bugzilla ? :) None of the heartbeat class agents supports meta-data. Is this usage wrong? 2. On a resource group, when a service shut down (manually or because of fail), heartbeat try to start it again. It's ok, but if the service have, for any reason, stop after one minute, HB will try to start it undefinitely... How can I configure HeartBeat in the way it tries to load a service only 3 times between two reboot, and failover if it can't run durably the service ? I look after DTD, but didn't find anything about this. http://www.linux-ha.org/v2/faq/forced_failover Thanks a lot ! I hope you understand what I mean :) Thanks ! 
Ben ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
[Linux-HA] Help : Is RHEL Advanced Server 3 Supports Heartbeat 2.0.8 and DRBD?
Hi All, I want to set up MySQL failover (for a single DB failure) with the help of Heartbeat 2.0.8 and DRBD (disk-based replication). Could anybody please advise on the operating system requirements for Heartbeat 2.0.8 and DRBD? I understand that DRBD is bundled with Red Hat Advanced Server 3, which is what I have right now; if it supports Heartbeat 2.0.8, please let me know. I have done a Heartbeat 2.0.8 setup on CentOS 4.3, but the setup did not work perfectly. I have a couple of questions about the Heartbeat setup: 1. What should I watch out for when generating cib.xml using the haresource2cib.py file? (I got many errors here because my virtual IP was not activated by doing this.) 2. Where do I put the MySQL service so Heartbeat can execute it? Right now I have only these two queries, but tomorrow I will get RHEL Advanced Server 3, so I can explain my problems in more detail during the setup. Thanks & Regards, Jugal Shah ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
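One common source of converter errors is a malformed haresources line. Before running the conversion script, it can help to confirm each line has the expected shape: preferred node first, then the resources started left to right. The example line below is hypothetical.

```shell
# Sanity-check the shape of a haresources line before converting it:
# first field = preferred node, remainder = resources (left to right).
# The example line is made up for illustration.
line="db1 192.168.1.6 mysql"
node=${line%% *}          # strip everything after the first space
resources=${line#* }      # strip the first field
echo "node=$node"
echo "resources=$resources"
```

After generating cib.xml, if your build includes crm_verify, running it with `-L -V` against the live CIB is a reasonable way to surface configuration errors before they bite.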
Re: [Linux-HA] IPv6, service fail on start behaviour
On 4/23/07, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote: On Thu, Apr 19, 2007 at 12:08:48PM +0200, Andrew Beekhof wrote: > On 4/19/07, Benjamin Watine <[EMAIL PROTECTED]> wrote: > >Andrew Beekhof a écrit : > >> On 4/18/07, Benjamin Watine <[EMAIL PROTECTED]> wrote: > >>> Hi all > >>> > >>> I have two questions about Heartbeat v2 configuration : > >>> > >>> 1. IPv6addr : I've tried to configure virtual IPv6 address for a > >>> resource group. Because I didn't find documentation about this script, I > >>> did it like IPaddr, but it don't seems to be OK. > >>> What are the parameters to use with IPv6addr script ? Is this script > >>> fully OK ? > >> > >> /path/to/IPv6addr meta-data > >> > ># /etc/ha.d/resource.d/IPv6addr meta-data > >usage: /etc/ha.d/resource.d/IPv6addr > >(start|stop|status|usage|meta-data) > > > >That's why I ask. > >"/usr/lib/ocf/resource.d/heartbeat/IPv6addr meta-data" works, thank you > >Should I open a bugzilla ? > > couldn't hurt to None of the heartbeat class agents supports meta-data. Is this usage wrong? 99% of them just redirect to the OCF agent... so in theory it should work anyway ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
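The redirection Andrew describes amounts to a thin shim: a heartbeat-class script that forwards every action to the OCF agent of the same name. The sketch below is an assumption about that shape, not the actual shipped shims; the overridable `OCF_ROOT` is introduced here only so the sketch can be exercised (real shims would use /usr/lib/ocf directly).

```shell
# Sketch of a heartbeat-class shim forwarding to the OCF agent of the
# same name, as Andrew describes. OCF_ROOT is overridable purely for
# illustration; the real agents hardcode their install path.
run_shim() {
    agent=$1; shift
    ocf_root=${OCF_ROOT:-/usr/lib/ocf}
    target="$ocf_root/resource.d/heartbeat/$agent"
    if [ -x "$target" ]; then
        "$target" "$@"            # pass the action (start/stop/meta-data/...) through
    else
        echo "no OCF agent for $agent" >&2
        return 1
    fi
}
```

Under this scheme, meta-data "should work anyway", because the heartbeat-class path simply relays the request to the OCF agent that does implement it.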
R: [Linux-HA] I: Mysql Ha cluster
Hello, thank you for your reply, but if I type "Mysql meta-data" the output is the following: 1.0 Resource script for MySQL. It manages a MySQL Database instance as an HA resource. MySQL resource agent Configuration file MySQL config Directory containing databases MySQL datadir User running MySQL daemon MySQL user Group running MySQL daemon (for logfile and directory permissions) MySQL group Table to be tested in monitor statement (in . notation) MySQL test table MySQL test user MySQL test user MySQL test user password MySQL test user password If the MySQL database does not exist, it will be created Create the database if it does not exist And there isn't any parameter for the binary location of mysqld. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On behalf of Alan Robertson Sent: Saturday, 14 April 2007 16:37 To: General Linux-HA mailing list Subject: Re: [Linux-HA] I: Mysql Ha cluster Viesti Luca wrote: > > > > I need to create a centralized MySQL cluster with 3 different > mysql. One mysql for the devel instance > One mysql for the test instance and the last for production. > Now i have installed version 2.08 of heartbeat on a sles9 SP3. > > > Can i specify different binaries for starting and stopping mysql when i use the > mysql agent? There is a parameter called 'binary' which you can use to configure that (at least in the version I'm looking at) > I think that i use the OCF agent and after specify 3 different > OCF-resource-name (mysql-dev,..,mysql-prod) > > Can someone help me to configure OCF? Have you read the script? Have you run it and asked for its metadata? /usr/lib/ocf/resource.d/heartbeat/mysql meta-data will tell you all the parameters it supports and what they mean. The output from this is as follows: 1.0 Resource script for MySQL. It manages a MySQL Database instance as an HA resource.
MySQL resource agent Location of the MySQL binary MySQL binary Configuration file MySQL config Directory containing databases MySQL datadir User running MySQL daemon MySQL user Group running MySQL daemon (for logfile and directory permissions) MySQL group Table to be tested in monitor statement (in . notation) MySQL test table MySQL test user MySQL test user MySQL test user password MySQL test user password If the MySQL database does not exist, it will be created Create the database if it does not exist -- Alan Robertson <[EMAIL PROTECTED]> "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems The information contained in this electronic message and any attachments (the "Message") is intended for one or more specific individuals or entities, and may be confidential, proprietary, privileged or otherwise protected by law. If you are not the intended recipient, please notify the sender immediately, delete this Message and do not disclose, distribute, or copy it to any third party or otherwise use this Message. Electronic messages are not secure or error free and can contain viruses or may be delayed, and the sender is not liable for any of these occurrences. The sender reserves the right to monitor, record and retain electronic messages. Le informazioni contenute in questo messaggio e gli eventuali allegati (il "Messaggio") si intendono inviate a uno o piu' specifici destinatari. Il contenuto del Messaggio puo' essere confidenziale, riservato e comunque protetto dalla legge applicabile. Se non siete i destinatari del Messaggio, siete pregati di informare immediatamente il mittente, cancellate questo Messaggio, non rivelatelo, non distribuitelo ne' inoltratelo a terzi, non copiatelo ne' fatene alcun uso. 
___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] nodes stays offline after communication is restored
On Mon, Apr 23, 2007 at 10:27:55AM +0200, Thomas Åkerblom (HF/EBC) wrote: > Hi. > > OS: SLES10 > Linux-HA 2.0.8 > > I have a system with two nodes (HA-1 & HA-2) and one standby (HA-3). > To illustrate my problem I have set up HA to define two alias addresses each > on the two hosts HA-1 & HA-2. > After initiation all is OK and crm_mon on all three nodes shows that all > nodes are online. > When I unplug the network cable to HA-2, HA-3 will take over and also then > all seems OK. > HA-1 and HA-3 are online and HA-2 is offline. > HA-2 is still running but considers HA-1 & HA-3 to be offline, as they are. > The problem starts when I plug the network back to HA-2. > The situation stays, HA-2 is offline to HA-1 & HA-3 and vice versa. > I have a persistent split brain situation. > In the syslog I can see that they recognize each other to be alive but it > just doesn't appear to be good enough. > Is my configuration faulty? > I'm attaching the syslogs from the time when I plugged the cable back and for > a short time after. There is a problem somewhere in CCM, but I couldn't see anything obvious. BTW, you're running a version of heartbeat recently pulled from the dev branch (or you downloaded a compiled package from somewhere) which has lame logging, i.e. all messages are tagged with "logd" which is not very useful. It was me that broke it, but then fixed it on Thursday, so you should either get the newer code or, since this bug is exercised only if ha_logd logs through syslog, change your logging config accordingly. Thanks. Dejan > I also attach the cib.xml and the ha.cf > > > <> <> <> > > <> <> > > Regards > *** Thomas > This communication is confidential and intended solely for the addressee(s). > Any unauthorized review, use, disclosure or distribution is prohibited. If > you believe this message has been sent to you in error, please notify the > sender by replying to this transmission and delete the message without > disclosing it. Thank you. 
> E-mail including attachments is susceptible to data corruption, interruption, unauthorized amendment, tampering and viruses, and we only send and receive e-mails on the basis that we are not liable for any such corruption, interception, amendment, tampering or viruses or any consequences thereof. > Content-Description: cib.xml > [quoted cib.xml attachment omitted: its XML markup was stripped by the list archive] -- Dejan ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
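Dejan's workaround above, changing the logging config so ha_logd does not log through syslog, might look like the ha.cf fragment below. This is a sketch: the directive names are standard ha.cf ones, the paths are illustrative, and whether this actually sidesteps the "logd" tagging bug depends on the build.

```
# ha.cf logging sketch: log to files directly instead of via
# ha_logd/syslog (paths illustrative)
use_logd off
logfacility none
logfile /var/log/ha-log
debugfile /var/log/ha-debug
```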
Re: [Linux-HA] heartbeat-2.0.8: load balancing
Simon Horman wrote: > On Sat, Apr 21, 2007 at 07:48:09PM -0600, Alan Robertson wrote: >> Gerry Reno wrote: >>> I have a virtual IP resource that I'm making available via heartbeat and >>> I am controlling this via the GUI. Now I want to add ldirectord load >>> balancing for a service on three machines. How can this be added? >>> ldirectord is installed but there is no config file. How do I see and >>> control these ldirectord load balanced resources in the GUI? Or are >>> they not manageable via heartbeat and GUI? >> At this point in time, the load balancer infrastructure doesn't >> integrate with the heartbeat infrastructure beyond being able to keep >> the load balancer running. >> >> Sorry :-( >> >> I can see that being a nice thing to do, though... > > Is the answer that ldirectord needs to be extended so that > the GUI knows how to configure it? If so I am (as always) > happy to consider patches. I think so, but also of course, the GUI would need work as well. It just seems like a nice thing to think about. -- Alan Robertson <[EMAIL PROTECTED]> "Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] ipfail not failing over when ping nodes are unpingable
On Fri, 2007-04-20 at 13:55 -0600, Alan Robertson wrote: > Faisal Shaikh wrote: > > Hi all, > > > > Im having trouble with setting up a pair of machines with a single > > resource (an IP address) to failover between them. > > The machines are Sun Netras (T1 105) running Gentoo Linux. > > > > The scenario is as follows: > > fw1: (primary resource holder) > > eth0: 192.168.1.52 > > eth1: 10.0.0.2 > > > > fw3: (secondary resource holder) > > eth0: 192.168.1.60 > > eth1: 10.0.0.1 > > > > > > eth1 is used as a private network between these two machines for the > > heartbeat. (I havn't got the correct cable type to use the serial > > connection for the heartbeat.) > > > > > > The IP address fails over correctly in the following cases: > > > > 1. When I switch off the primary resource holder. > > 2. When I stop heartbeat on the primary resource holder. > > > > However, if I disconnect the primary resource holder from the network so > > that it cant ping the ping nodes, the IP address does not fail over to > > the secondary resource holder. > > > > After disconnecting the cable, The log entries in the primary resource > > holder is as follows: > > > > Apr 20 19:47:03 fw1 heartbeat: [4232]: WARN: node 192.168.1.2: is dead > > Apr 20 19:47:03 fw1 heartbeat: [4232]: WARN: node 192.168.1.3: is dead > > Apr 20 19:47:03 fw1 heartbeat: [4232]: debug: StartNextRemoteRscReq(): > > child count 1 > > Apr 20 19:47:03 fw1 heartbeat: [4232]: info: Link > > 192.168.1.2:192.168.1.2 dead. > > Apr 20 19:47:03 fw1 heartbeat: [4232]: info: Link > > 192.168.1.3:192.168.1.3 dead. > > Apr 20 19:47:03 fw1 heartbeat: [4496]: debug: notify_world: setting > > SIGCHLD Handler to SIG_DFL > > Apr 20 19:47:03 fw1 harc[4496]: info: Running /etc/ha.d/rc.d/status > > status > > Apr 20 19:47:03 fw1 heartbeat: [4512]: debug: notify_world: setting > > SIGCHLD Handler to SIG_DFL > > Apr 20 19:47:03 fw1 harc[4512]: info: Running /etc/ha.d/rc.d/status > > status > > > > > > And it stays there doing nothing. 
> > My ha.cf file is as follows:
> >
> > ucast eth1 10.0.0.1
> > logfile /var/log/ha-log
> > debugfile /var/log/ha-debug
> > keepalive 2
> > warntime 10
> > deadtime 30
> > initdead 120
> > baud 19200
> > udpport 694
> > auto_failback on
> > node fw1
> > node fw3
> >
> > respawn hacluster /usr/lib/heartbeat/ipfail
> > ping 192.168.1.2 192.168.1.3
> > crm off
> >
> > My haresources file is:
> > fw1 192.168.1.100/32/192.168.1.255
> >
> > I'd appreciate it greatly if someone could point me in the right
> > direction please.
>
> You need redundant communication for ipfail to work.
>
> You see, ipfail will only fail over if the two nodes can communicate
> with each other and agree to move things around.
>
> What you've done is created a split-brain, where each node thinks
> the other is dead. If the other is dead, who can take over?

Hi Alan,

Many thanks for your quick reply! I thought that I did have a redundant communication link. The service IP was on eth0 (the 192.168.1.0 network) while the heartbeat link was on eth1 (the 10.0.0.0 network). After pulling the network cable on eth0, I could still see the heartbeat packets and their replies on eth1 using tcpdump.

After pulling the network cable on eth0, if I stop heartbeat on the primary, the secondary takes over with no problems. This would indicate that the servers are able to communicate, but the handover doesn't occur if a NIC/cable goes down on the primary.

I'm going to try to obtain a null modem adaptor for a serial cable that I'll run between the two Netras.

Regards,
Faisal
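Alan's point about redundant communication amounts to listing more than one path in ha.cf. A minimal sketch of what fw1's file might look like with a second unicast path and the planned serial link added (the eth0 line and the serial device name are illustrative assumptions, not from the thread):

```
# /etc/ha.d/ha.cf on fw1 -- sketch only, not the poster's actual file
ucast eth1 10.0.0.1          # existing dedicated crossover link to fw3
ucast eth0 192.168.1.60      # assumed: second path over the service LAN
serial /dev/ttyS0            # assumed: null-modem cable, once obtained
baud 19200
```

Redundant paths let the nodes keep talking when one link fails, which is what ipfail needs in order to negotiate a takeover rather than have each node declare the other dead.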
Re: [Linux-HA] ipfail not failing over when ping nodes are unpingable
Both cards on each machine (Sun Netras) have the same MAC address. Can this be a cause of the problem?

fw1 ha.d # lspci | grep Ethernet
01:01.1 Ethernet controller: Sun Microsystems Computer Corp. Happy Meal (rev 01)
01:03.1 Ethernet controller: Sun Microsystems Computer Corp. Happy Meal (rev 01)
fw1 ha.d # dmesg | grep Ethernet
Ethernet address: 08:00:20:c2:d5:3e
eth0: HAPPY MEAL (PCI/CheerIO) 10/100BaseT Ethernet 08:00:20:c2:d5:3e
eth1: HAPPY MEAL (PCI/CheerIO) 10/100BaseT Ethernet 08:00:20:c2:d5:3e

fw3 ~ # lspci | grep Ethernet
01:01.1 Ethernet controller: Sun Microsystems Computer Corp. Happy Meal (rev 01)
01:03.1 Ethernet controller: Sun Microsystems Computer Corp. Happy Meal (rev 01)
fw3 ~ # dmesg | grep Ethernet
[0.00] Ethernet address: 08:00:20:c2:d3:4e
[ 16.613126] eth0: HAPPY MEAL (PCI/CheerIO) 10/100BaseT Ethernet 08:00:20:c2:d3:4e
[ 16.714221] eth1: HAPPY MEAL (PCI/CheerIO) 10/100BaseT Ethernet 08:00:20:c2:d3:4e

Faisal
Re: [Linux-HA] STONITH in response to stop failures (suicide or ssh)
I think I want the same functionality as Christopher wants:

* when a resource on a node goes into FAILED state, reboot the machine (currently we have no STONITH device - I know it is insecure, but I have to use what I have)

Heartbeat version 2.0.8

Situation:
* 2 node cluster
* dummy-resource provided by heartbeat runs on management2
* DC is management1

Actions:
* touch /tmp/Dummy.monitor /tmp/Dummy.stop --> in this way the monitor and stop operations fail

Afterwards:
* dummy-resource does not run anywhere
* stonithd seems to core dump
* reboot of management2 failed (?? may this be because stonithd core dumps?)
* "/etc/init.d/heartbeat stop" on management2 hangs forever

Attached are the CIB, the pe-warn* files from management1, and the ha-log of both machines. Am I doing something wrong with the stonith device?

On Tuesday 17 April 2007 15:07, Dave Blaschke wrote:
> Christophe Zwecker wrote:
> > Dave Blaschke wrote:
> >> Christophe Zwecker wrote:
> >>> Dave Blaschke wrote:
> >>>> Christophe Zwecker wrote:
> >>>>> Hi Dave,
> >>>>>
> >>>>> it's this:
> >>>>>
> >>>>> grep mw-test /etc/ha.d/ha.cf
> >>>>> node mw-test-n1.i-dis.net
> >>>>> node mw-test-n2.i-dis.net
> >>>>>
> >>>>> [EMAIL PROTECTED] ~]# uname -n
> >>>>> mw-test-n2.i-dis.net
> >>>> And your cib.xml?
> >>> grep mw-test /var/lib/heartbeat/crm/cib.xml
> >>> id="5b1a3c52-a893-44c5-a9c7-035fc632ff8d">
> >>> id="cc1c8955-58d2-4ee3-8e98-b07599335e0c">
> >>> id="prefered_location_group_1_expr" operation="eq" value="mw-test-n1.i-dis.net"/>
> >> I'd actually like to see the whole thing please...
> > here ya go, sorry for the delay, i was on vacation!
> Ahh, vacation. Okay, envying over... :-)
>
> I don't proclaim to be a R2 config expert, but I'm pretty sure you'll
> need something similar to the following in your CIB to tell heartbeat
> how to STONITH:
>
> provider="heartbeat">
>
> You won't need any attributes for suicide; you'll need a hostlist if you
> choose to use ssh.
> See http://www.linux-ha.org/ConfiguringStonithPlugins for the full XML
> sample.
>
> > thx alot for your input and time
> >
> > Christophe

Attachments (BZip2 compressed data): pe-input-44.bz2, pe-warn-297.bz2, pe-warn-298.bz2, pe-warn-299.bz2, pe-warn-300.bz2, pe-warn-301.bz2, pe-warn-302.bz2, ha-log-mgt2.bz2, ha-log-mgt1.bz2
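The stonith XML in Dave's reply was garbled by the archive. A rough sketch of the kind of ssh stonith clone he seems to describe - all ids and the hostlist value here are illustrative guesses, so treat the ConfiguringStonithPlugins page as authoritative:

```
<clone id="DoFencing">
  <instance_attributes id="DoFencing_ia">
    <attributes>
      <nvpair id="DoFencing_unique" name="globally_unique" value="false"/>
    </attributes>
  </instance_attributes>
  <primitive id="child_DoFencing" class="stonith" type="ssh" provider="heartbeat">
    <instance_attributes id="child_DoFencing_ia">
      <attributes>
        <!-- the ssh plugin needs a hostlist; the suicide plugin takes no attributes -->
        <nvpair id="child_DoFencing_hostlist" name="hostlist" value="management1 management2"/>
      </attributes>
    </instance_attributes>
  </primitive>
</clone>
```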
Re: [Linux-HA] Cannot create group containing drbd using HB GUI
OK, unstuck, and moving forward with a patch from the DRBD email list...

I've got drbd configured in a fairly reliable Master/Slave setup, and I can fail it back and forth between nodes using cibadmin and XML that changes the place constraint from node to node. (Not sure what this means, but when the drbd processes first come up, the GUI indicates one as Master but does not show the other as Slave, only that it is running. When I change the place constraint, Master moves from one node to the other, and then the formerly Master node indicates Slave. From that point on, behavior is as expected.)

Now I've created a group containing only a single Filesystem resource, colocated with the drbd master (based on the previously discussed constraint rules of a -infinity score for existing on a stopped or slave drbd node), and ordered to come up after the drbd master. I'm using target_role to control whether HA starts it or not (one XML file sets target_role to stopped, the other to started).

First question: what is the best way to start and stop resources without using the GUI (in other words, is my use of target_role a good way to control resources)?

Second question: does it make more sense to have target_role defined in the group's instance_attributes or in the instance_attributes within the individual primitive resource?

Thanks,
Doug

On Fri, 2007-04-20 at 14:46 -0400, Doug Knight wrote:
> Well, whatever was stuck, I had to do an rmmod to remove the drbd module
> from the kernel, then modprobe it back in, and the "stuck" Secondary
> indication went away.
>
> Doug
>
> On Fri, 2007-04-20 at 14:30 -0400, Doug Knight wrote:
> > I completely shut down heartbeat on both nodes, cleared out the backup
> > cib.xml files, recopied the cib.xml from the primary node to the
> > secondary, then brought everything back up. This cleared the "diff"
> > error. The drbd master/slave pair came up as expected, but when I tried
> > to stop them, they eventually went into an unmanaged state.
> > Looking at the logs and comparing to the stop function in the OCF
> > script, I noticed that I was seeing a successful "drbdadm down", but the
> > additional check for status after the down was indicating that the down
> > was unsuccessful (from checking drbdadm state). Further, I manually
> > verified that the drbd processes were indeed down, and executed the
> > following:
> >
> > [EMAIL PROTECTED] xml]# /sbin/drbdadm -c /etc/drbd.conf state pgsql
> > Secondary/Unknown
> > [EMAIL PROTECTED] xml]# cat /proc/drbd
> > version: 8.0.1 (api:86/proto:86)
> > SVN Revision: 2784 build by [EMAIL PROTECTED], 2007-04-09 11:30:31
> > 0: cs:Unconfigured
> >
> > It's the same output on either node, and drbd is definitely down on both
> > nodes. So /proc/drbd correctly indicates drbd is down, but the
> > subsequent check using drbdadm state comes back indicating one side is
> > up in Secondary mode, which it's not. This is why the resource is now in
> > unmanaged mode. Any ideas why the two tools would differ?
> >
> > Doug
> >
> > On Fri, 2007-04-20 at 11:35 -0400, Doug Knight wrote:
> > > In the interim, I set the filesystem group to unmanaged to test failing
> > > the drbd master/slave processes back and forth, using the value part
> > > of the place constraint. On my first attempt to switch nodes, it
> > > basically took both drbd processes down, and they stayed down. When I
> > > checked the logs on the node to which I was switching the primary drbd,
> > > I found a message about a failed application of a diff. I switched the
> > > place constraint back to the original node.
> > > I decided to shut down heartbeat on the node where I was seeing the
> > > diff error; now the shutdown is hung and the diff error below is
> > > repeating every minute:
> > >
> > > cib[3040]: 2007/04/20_11:24:52 WARN: cib_process_diff: Diff 0.11.587 -> 0.11.588 not applied to 0.11.593: current "num_updates" is greater than required
> > > cib[3040]: 2007/04/20_11:24:52 WARN: do_cib_notify: cib_apply_diff of FAILED: Application of an update diff failed
> > > cib[3040]: 2007/04/20_11:24:52 WARN: cib_process_request: cib_apply_diff operation failed: Application of an update diff failed
> > > cib[3040]: 2007/04/20_11:24:52 WARN: cib_process_diff: Diff 0.11.588 -> 0.11.589 not applied to 0.11.593: current "num_updates" is greater than required
> > > cib[3040]: 2007/04/20_11:24:52 WARN: do_cib_notify: cib_apply_diff of FAILED: Application of an update diff failed
> > > cib[3040]: 2007/04/20_11:24:52 WARN: cib_process_request: cib_apply_diff operation failed: Application of an update diff failed
> > >
> > > My boss and I are getting rather frustrated trying to get this setup to
> > > work. Is there something obvious I'm missing? Has anyone ever had HA
> > > 2.0.8, using v2 monitoring and the drbd ocf script, and drbd version
> > > 8.0.1 working in a two node cluster? I'm concerned because of the
> > > comment made earlier by Bernhard.
> > >
> > > Doug
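On the target_role question, one common shape in a heartbeat v2 CIB is to hang the attribute off the group's instance_attributes, so a single nvpair flipped between "started" and "stopped" controls all members at once. All ids and resource names below are illustrative, not taken from Doug's cib:

```
<group id="grp_fs">
  <instance_attributes id="grp_fs_ia">
    <attributes>
      <!-- flip the value and push the fragment with cibadmin to start/stop the group -->
      <nvpair id="grp_fs_target_role" name="target_role" value="started"/>
    </attributes>
  </instance_attributes>
  <primitive id="rsc_fs" class="ocf" type="Filesystem" provider="heartbeat"/>
</group>
```

Placing target_role on an individual primitive instead would control just that resource; the group-level placement makes more sense when the members should never be started independently.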
RE: [Linux-HA] Cannot create group containing drbd using HB GUI
I'm also wrangling with this issue (getting the drbd OCF to work in V2, logically grouping master mode with the services that are on it). One thing I've run into so far is that there appear to be some bugs in the drbd ocf script.

1) In do_cmd() it uses "local cmd_out" immediately before taking the result code from $?. This always succeeds (on CentOS 4.4 32 bit anyway). Declaring this local in an earlier line returns the correct return code from the drbdadm command from the function. As this return code is used elsewhere, it helps that failure codes are passed back as intended.

2) There needs to be a wait loop after the module is loaded, same as in the drbd-distributed /etc/init.d/drbd script. I inserted this into drbd_start() (UDEV_TIMEOUT is set in the script header to 10):

# make sure udev has time to create the device files
for RESOURCE in `$DRBDADM sh-resources`; do
    for DEVICE in `$DRBDADM sh-dev $RESOURCE`; do
        UDEV_TIMEOUT_LOCAL=$UDEV_TIMEOUT
        while [ ! -e $DEVICE ] && [ $UDEV_TIMEOUT_LOCAL -gt 0 ]; do
            sleep 1
            UDEV_TIMEOUT_LOCAL=$(( $UDEV_TIMEOUT_LOCAL - 1 ))
        done
    done
done

It takes several seconds after the modload returns for the /dev/drbd0 device to appear - and nothing works until it does.

3) A similar timer is needed in drbd_promote, as drbdadm won't let you "Primary" until the other side is not "Primary". I found that heartbeat was firing off the promote on "b" slightly before the demote on "a", causing a failure. I added this (REMOTE_DEMOTE_TIMEOUT is set in the script header to 10):

drbd_get_status
DEMOTE_TIMEOUT_LOCAL=$REMOTE_DEMOTE_TIMEOUT
while [ "x$DRBD_STATE_REMOTE" = "xPrimary" ] && [ $DEMOTE_TIMEOUT_LOCAL -gt 0 ]; do
    sleep 1
    DEMOTE_TIMEOUT_LOCAL=$(( $DEMOTE_TIMEOUT_LOCAL - 1 ))
    drbd_get_status
done

With these changes I was able to get drbd to start, stop and migrate cleanly when I tweaked the location scores.
Getting the services dependent on that disk to do the same is still an open question :-) My modified drbd ocf script is attached, use at your own risk.

Alastair Young
Director, Operations
Ludi labs
399 West El Camino Real
Mountain View, CA 94040
Email: [EMAIL PROTECTED]
Direct: 650-241-0068
Mobile: 925-784-0812

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Martin Fick
Sent: Thursday, April 19, 2007 1:13 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Cannot create group containing drbd using HB GUI

Hi Doug,

I personally could not get the DRBD OCF to work; I am using drbd 0.7x, what about you? I never tried a master/slave setup though. I created my own drbd OCF, it is on my site along with the CIB scripts:

http://www.theficks.name/bin/lib/ocf/drbd

You can even use the drbd CIBs as a starting place if you want:

http://www.theficks.name/bin/lib/heartbeat/drbd

I just updated them all (CIBs and OCF agents) if you want to try them out.

-Martin

--- Doug Knight <[EMAIL PROTECTED]> wrote:
> I made the ID change indicated below (for the colocation constraints),
> and everything configured fine using cibadmin. Now, I started JUST the
> drbd master/slave resource, with the rsc_location rule setting the
> expression uname to one of the two nodes in the cluster. Both drbd
> processes come up and sync up the partition, but both are still in
> slave/secondary mode (i.e. the rsc_location rule did not cause a
> promotion). Am I missing something here? This is the rsc_location
> constraint:
>
> score="100">
> attribute="#uname" operation="eq" value="arc-dknightlx"/>
>
> (By the way, the example from the Idioms/MasterConstraints web page does
> not have an ID specified in the expression tag, so I added one to mine.)
>
> Doug
>
> On Thu, 2007-04-19 at 13:04 -0400, Doug Knight wrote:
> > ...
> > > > > > > >> > > > > > For exemple > > > > > rsc="drbd1"> > > > > > score="600"> > > > > > operation="eq" value="nodeA" > > > > > id="pref_drbd1_loc_nodeA_attr"/> > > > > > > > > > > score="800"> > > > > > operation="eq" value="nodeB" > > > > > id="pref_drbd1_loc_nodeB_attr"/> > > > > > > > > > > > > > > > > > > > > In this case, nodeB will be primary for > resource drbd1. Is that what > > > > > > > > > > >> you > > > > > >> > > > > > were looking for ? > > > > > > > > > > >>> Not like this, not when using the drbd > OCF Resource Agent as a > > > > > >>> master-slave one. In that case, you need > to bind the rsc_location to > > > > > >>> > > > > > >> the > > > > > >> > > > > > >>> role=Master as well. > > > > > >>> > > > > > >> I was missing this in the CIB idioms > page. I just added it. > > > > > >> > > > > > >>h
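Alastair's first fix is easy to reproduce outside the agent: in bash, `local` is itself a command that succeeds, so declaring a local between a command and the `$?` check silently discards the exit status. A minimal sketch of the pattern (the function names are hypothetical stand-ins, not the actual do_cmd):

```shell
#!/bin/bash

buggy_do_cmd() {
    false               # stand-in for a failing drbdadm call; $? is now 1
    local cmd_out       # BUG: 'local' runs as a command and resets $? to 0
    return $?           # the failure is lost; always returns 0
}

fixed_do_cmd() {
    local cmd_out       # declare first, as in Alastair's patch
    false               # now $? still holds the command's status...
    return $?           # ...and the failure propagates as intended
}

buggy_do_cmd; echo "buggy: $?"   # prints "buggy: 0"
fixed_do_cmd; echo "fixed: $?"   # prints "fixed: 1"
```

This is why the unmodified script kept reporting success to heartbeat even when drbdadm failed.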
RE: [Linux-HA] Forced umount of DRBD volume
This is what I have in my /etc/init.d/drbdctrl for my HB V1 machines. This almost always seems to work. I make sure that all important services that should be accessing the disk are already dead, but who knows who may be logged in scanning logfiles, etc.

It is important to point fuser -mk at the disk device, not at the mount point. If the device is already dismounted, the latter syntax kills a lot of processes that it shouldn't.

#!/bin/bash
case "$1" in
  start)
    # makes this node the primary node
    /sbin/drbdadm primary all
    mount /dev/drbd0 /service
    ;;
  stop)
    # makes this node the secondary node
    fuser -mk /dev/drbd0
    sleep 2
    umount /service
    /sbin/drbdadm secondary all
    ;;
  *)
    echo "Usage: /etc/init.d/drbdctrl {start|stop}"
    exit 1
    ;;
esac
exit 0

Alastair Young
Director, Operations
Ludi labs
399 West El Camino Real
Mountain View, CA 94040
Email: [EMAIL PROTECTED]
Direct: 650-241-0068
Mobile: 925-784-0812

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Piotr Kaczmarzyk
Sent: Saturday, April 21, 2007 5:30 AM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Forced umount of DRBD volume

> It would be best to replace the underlying DRBD with NFS so that the user
> could still operate on his files without logging out, but user processes
> (including bash) would have to be restarted to reopen the files.

I'll answer myself (partially) - first of all, I found the 'Filesystem' OCF script and its use of the fuser command (so the recommended solution is to kill everything); second, I read a note on http://www.linux-ha.org/HaNFS saying that "NFS-mounting any filesystem on your NFS servers is highly discouraged". I don't understand why.
I did that manually and it worked:
- mounted /dev/drbd2 as /usr/local/mysql-drbd
- added IP 10.0.0.1 to eth0
- started NFS
- mounted via NFS 10.0.0.1:/usr/local/mysql-drbd as /usr/local/mysql

Then:
- stopped mysqld (just in case)
- stopped the NFS server
- removed IP 10.0.0.1
- unmounted /dev/drbd2 (only nfsd used it) and set it as secondary

On the second node:
- set /dev/drbd2 as primary
- mounted it as /usr/local/mysql-drbd
- added IP 10.0.0.1 to eth0
- started NFS

And the directory was still accessible from the first node. So what's wrong with such a configuration, and why should it be avoided? It has advantages - users with shell access won't notice that something has changed, postfix will be able to deliver mail queued in the local spool, etc.

Best regards,
Piotr
RE: [Linux-HA] Cannot create group containing drbd using HB GUI
Attached is the cib I am using. By adjusting the scores on the drbd_m_like_ rules I can migrate the drbd master between nodes; the filesystem cleanly dismounts first and remounts on the new master afterwards.

What I also need it to do is migrate the services in response to a failure or other score change of the grp_www group. I've tried many permutations and I can't figure this out. The best I come up with is failure of the rsc_www_fs resource in situ after I manually dismount it a few times. At worst, Bad Things Happen. As best as I can guess, grp_www won't move to the slave node no matter what. Perhaps because of the -INFINITY in the colocation?

What I need is to have the other node become master and then have grp_www start on it. Essentially I need the master state of drbd-ms to effectively be the first member of grp_www. I know that cannot be done overtly, but how does one get that effect? What's the incantation to get the master_slave to change master in response to a failure/score change on a collocated service?

I am running hb2.0.8 on CentOS 4.4 i386 running under vmware. Drbd is v0.7 with the modified/fixed drbd ocf script I posted earlier.

Alastair Young
Director, Operations
Ludi labs
399 West El Camino Real
Mountain View, CA 94040
Email: [EMAIL PROTECTED]
Direct: 650-241-0068
Mobile: 925-784-0812

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Alastair N. Young
Sent: Monday, April 23, 2007 2:19 PM
To: General Linux-HA mailing list
Subject: RE: [Linux-HA] Cannot create group containing drbd using HB GUI

I'm also wrangling with this issue (getting the drbd OCF to work in V2, logically grouping master mode with the services that are on it). One thing I've run into so far is that there appear to be some bugs in the drbd ocf script.

1) In do_cmd() it uses "local cmd_out" immediately before taking the result code from $?. This always succeeds (on CentOS 4.4 32 bit anyway).
Declaring this local in an earlier line returns the correct return code from the drbdadm command from the function. As this return code is used elsewhere, it helps that failure codes are passed back as intended.

2) There needs to be a wait loop after the module is loaded, same as in the drbd-distributed /etc/init.d/drbd script. I inserted this into drbd_start() (UDEV_TIMEOUT is set in the script header to 10):

# make sure udev has time to create the device files
for RESOURCE in `$DRBDADM sh-resources`; do
    for DEVICE in `$DRBDADM sh-dev $RESOURCE`; do
        UDEV_TIMEOUT_LOCAL=$UDEV_TIMEOUT
        while [ ! -e $DEVICE ] && [ $UDEV_TIMEOUT_LOCAL -gt 0 ]; do
            sleep 1
            UDEV_TIMEOUT_LOCAL=$(( $UDEV_TIMEOUT_LOCAL - 1 ))
        done
    done
done

It takes several seconds after the modload returns for the /dev/drbd0 device to appear - and nothing works until it does.

3) A similar timer is needed in drbd_promote, as drbdadm won't let you "Primary" until the other side is not "Primary". I found that heartbeat was firing off the promote on "b" slightly before the demote on "a", causing a failure. I added this (REMOTE_DEMOTE_TIMEOUT is set in the script header to 10):

drbd_get_status
DEMOTE_TIMEOUT_LOCAL=$REMOTE_DEMOTE_TIMEOUT
while [ "x$DRBD_STATE_REMOTE" = "xPrimary" ] && [ $DEMOTE_TIMEOUT_LOCAL -gt 0 ]; do
    sleep 1
    DEMOTE_TIMEOUT_LOCAL=$(( $DEMOTE_TIMEOUT_LOCAL - 1 ))
    drbd_get_status
done

With these changes I was able to get drbd to start, stop and migrate cleanly when I tweaked the location scores. Getting the services dependent on that disk to do the same is still an open question :-) My modified drbd ocf script is attached, use at your own risk.
Alastair Young
Director, Operations
Ludi labs
399 West El Camino Real
Mountain View, CA 94040
Email: [EMAIL PROTECTED]
Direct: 650-241-0068
Mobile: 925-784-0812

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Martin Fick
Sent: Thursday, April 19, 2007 1:13 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Cannot create group containing drbd using HB GUI

Hi Doug,

I personally could not get the DRBD OCF to work; I am using drbd 0.7x, what about you? I never tried a master/slave setup though. I created my own drbd OCF, it is on my site along with the CIB scripts:

http://www.theficks.name/bin/lib/ocf/drbd

You can even use the drbd CIBs as a starting place if you want:

http://www.theficks.name/bin/lib/heartbeat/drbd

I just updated them all (CIBs and OCF agents) if you want to try them out.

-Martin

--- Doug Knight <[EMAIL PROTECTED]> wrote:
> I made the ID change indicated below (for the colocation constraints),
> and everything configured fine using cibadmin. Now, I started JUST the
> drbd master/slave resource, with the rsc_location rule setting the
> expression uname to one of the two nodes in the cluster. Both dr
Re: [Linux-HA] heartbeat-2.0.8: load balancing
On Mon, Apr 23, 2007 at 09:42:35AM -0600, Alan Robertson wrote:
> Simon Horman wrote:
> > On Sat, Apr 21, 2007 at 07:48:09PM -0600, Alan Robertson wrote:
> >> Gerry Reno wrote:
> >>> I have a virtual IP resource that I'm making available via heartbeat and
> >>> I am controlling this via the GUI. Now I want to add ldirectord load
> >>> balancing for a service on three machines. How can this be added?
> >>> ldirectord is installed but there is no config file. How do I see and
> >>> control these ldirectord load balanced resources in the GUI? Or are
> >>> they not manageable via heartbeat and the GUI?
> >> At this point in time, the load balancer infrastructure doesn't
> >> integrate with the heartbeat infrastructure beyond being able to keep
> >> the load balancer running.
> >>
> >> Sorry :-(
> >>
> >> I can see that being a nice thing to do, though...
> >
> > Is the answer that ldirectord needs to be extended so that
> > the GUI knows how to configure it? If so I am (as always)
> > happy to consider patches.
>
> I think so, but also of course the GUI would need work as well.
>
> It just seems like a nice thing to think about.

Yes, I totally agree.

--
Horms
H: http://www.vergenet.net/~horms/
W: http://www.valinux.co.jp/en/