Re: [Linux-HA] routing issue on cluster node
Hi Nick,

Well, as you recommended, I set an IP on the eth2 NIC and it worked :-D. Thank you very much for your help.

One more thing: that machine is also a VPN server, and the OpenVPN client side complained about sending the VPN start-up request to IP A (assigned to eth2:0) and receiving the response from IP B (assigned to eth2). So I added the "float" option to the client's OpenVPN configuration file, and everything is OK now. This means that every packet sent from the server will carry eth2's IP as its source address, but I think this can be solved with the IPsrcaddr resource agent, so that's the next thing I will be experimenting with :-)

Again Nick, thank you very much for your help, it was very... well, helpful ;-)

Regards,

Nick wrote:
Gabriel Bermudez wrote:
Hi Nick,
I followed your advice but it didn't work. These are the contents of the files where I try to set the default gateway:

/etc/rc.local
ip route add default via xxx.xxx.xxx.xxx dev eth2
touch /var/lock/subsys/local

/etc/sysconfig/network-scripts/route-eth2
default via xxx.xxx.xxx.xxx

You can add "dev eth2" to the end of that command, but it's not required, as long as there's already an interface with an IP in the same subnet on that network, which should be eth2 (although now you mention that there isn't one assigned...). Do you get any errors in /var/log/messages?

/etc/sysconfig/network
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=gw1.mynetwork.net
GATEWAY=xxx.xxx.xxx.xxx

None of this seems wrong.

I don't think it has anything to do with it, but eth2 doesn't have an IP assigned to it:

/etc/sysconfig/network-scripts/ifcfg-eth2
DEVICE=eth2
BOOTPROTO=none
HWADDR=00:15:17:3a:fa:be
ONBOOT=yes
TYPE=Ethernet

Hmm, it's probably not going to work without an interface that has a source address. The logs should give you more information, or at least an error to start from.

Thanks for your help.

Nick wrote:
Gabriel Bermudez wrote:
Hi,
I'm trying to configure a high-availability router for my internal network.
I'm able to set both the private and public IPs on eth0:0 and eth2:0 respectively with Heartbeat. The gateway is configured in the /etc/sysconfig/network file (using CentOS 5.2):

NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=gw1.mynetwork.net
GATEWAY=xxx.xxx.xxx.xxx

but for some reason it doesn't persist when I reboot the server. I have to use the route command manually to restore the default gateway:

route add -net 0.0.0.0 gw xxx.xxx.xxx.xxx

I've also tried to set up the route in the "/etc/rc.local" and "/etc/sysconfig/network-scripts/route-eth2" files, with no success. I know that this is not a Heartbeat-related problem, but I've tried to google this without success, so any help on this issue would be greatly appreciated.

Thanks in advance,
Gabriel.

Have a look at the ip command (ip route in particular), and instead of all those lines add:

default via xxx.xxx.xxx.xxx

in the route-eth2 file. ip is much simpler and, if you ever decide to do policy-based routing, much more powerful.

Nick

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
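Nick's diagnosis, that eth2 needs an address of its own so the default route has a source address, can be sketched as the ifcfg-eth2 file with static addressing added. This is a minimal sketch under assumptions: the IPADDR and NETMASK values are hypothetical placeholders (a documentation-range address) and would have to be replaced by a fixed, per-node address in the same subnet as the xxx.xxx.xxx.xxx gateway; the floating address on eth2:0 stays under Heartbeat's control.

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth2 (sketch; the file is read as shell variables)
DEVICE=eth2
BOOTPROTO=none
HWADDR=00:15:17:3a:fa:be
ONBOOT=yes
TYPE=Ethernet
# Hypothetical fixed, per-node address; it must be in the gateway's subnet.
# The floating address on eth2:0 remains managed by Heartbeat.
IPADDR=203.0.113.10
NETMASK=255.255.255.0
```

With an address like this on eth2, the existing route-eth2 line (default via xxx.xxx.xxx.xxx) should have a directly connected source address to use when the network service brings the interface up at boot.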
Re: [Linux-HA] Fail over to another node (was: Problem with LSB init script when monitoring)
On Thu, 2008-12-04 at 13:48 +0100, Dejan Muhamedagic wrote:
> > I tried the tomcat OCF RA but there are lots of incorrect values hard
> > coded in so I edited up an init script to what I thought was LSB
> > compatible.
>
> It should be fixed then. Can you provide a list of stuff which is
> wrong on your platform/distribution?
>
> Thanks,
>
> Dejan

For some reason, when I first tried it I couldn't get the java process to show the -Dname parameter, which the pgrep and pkill commands in the OCF script rely on, so I thought it had been dropped in Tomcat 6. Now that I've tried it again it is there, so I'm happily using the OCF script, which is working fine.

I now have just one more question. I have 2 nodes, both running identical independent Tomcats. When I kill the Java process, Heartbeat immediately restarts it, but I want it to fail the IP and Tomcat resources over to the other node. I can't find this in the documentation. Could anyone provide any pointers on what I need to do so that these 2 resources fail over to the other node?

Thanks for all the help so far!

Darren
Re: [Linux-HA] Bug in MailTo in heartbeat-resources-2.99.2-6.1 and long failover duration with pingd
> IPaddr does not check reachability in the net.
> There is a parameter in the ha.cf called deadping. It is 30s by default. So
> changes in the reachability take 30s + damping time to get active. Set this
> parameter to 10s (or lower) according to your needs. You will have to restart
> heartbeat.
>
> Greetings,

Hi,

I tested setting 'deadping' to 3 with a 'dampen' value of 1.

ha.cf:
initdead 30
deadtime 2
keepalive 800ms
warntime 1800ms
deadping 3

XML snippet for resource pingd:

But the cluster still behaves as before. After the master node loses its connection to one ping node, the failover occurs at least 70 seconds later.

In addition, it would be great to reduce the ping testing interval; it still sticks at 10 seconds. As you can see, I tried to set it to one second. When I look into the meta-data of the OCF RA pingd, I find these entries:

I guess the op attribute 'interval' means something like the pingd test interval, right? As far as I can remember, in heartbeat version 2.1.4 the default test interval for pingd was 1 second.

Greetings,
Joerg

--
Dipl.-Ing. (FH) Joerg Streckfuss, Phone: +49 40 808077-631
DFN-CERT Services GmbH, https://www.dfn-cert.de/, Phone +49 40 808077-555
Registered office / Register: Hamburg, AG Hamburg, HRB 88805, VAT ID: DE 232129737
Sachsenstraße 5, 20097 Hamburg/Germany, CEO: Dr. Klaus-Peter Kossakowski
Re: [Linux-HA] Bug in MailTo in heartbeat-resources-2.99.2-6.1 and long failover duration with pingd
Okay, that did the job. In /usr/lib/ocf/resource.d/heartbeat/.ocf-binaries I had to replace

: ${MAILCMD:=}

with

: ${MAILCMD:=/usr/bin/mail}

Thanks,
Joerg

> Hi,
>
> On Wed, Dec 03, 2008 at 08:40:46AM +0100, Dominik Klein wrote:
> > There was a thread about this in November. Search the archives for thread
> > "E-Mail Notification Problem by takeover". Iirc, it was a problem with the
> > package which should be manually fixable by editing an included file.
>
> Right. Though I'll fix MailTo to print a more sensible error
> message.
>
> Thanks,
>
> Dejan
>
> > Regards
> > Dominik
> >
> > Joerg Streckfuss wrote:
> >> Hi list,
> >>
> >> I'm using heartbeat-2.99.2-6.1 with pacemaker-1.0.1-3.1 in a test setup for a
> >> firewall cluster. This setup has two nodes, each with two physical
> >> interfaces, eth0 and eth1.
> >>
> >> I configured two resources of type IPaddr2 and one MailTo resource to
> >> get an email when a failover occurs. I put these resources into one group to
> >> ensure that the resources will always run on one node.
> >> The problem is that each time I force a failover, the MailTo resource
> >> produces the following report in /var/log/ha-log, and unfortunately no
> >> email is sent:
> >>
> >> RA output:(MailTo-admin:start:stderr) /usr/lib/ocf/resource.d//heartbeat/MailTo:
> >> line 86: -s: command not found
> >>
> >> To me it looks like MailTo has no valid $MAILCMD.
> >> Here is my xml snippet.
> >>
> >> [XML snippet largely lost in the archive. Surviving fragments: three elements with provider="heartbeat" (presumably the two IPaddr2 resources and the MailTo resource); two of them carry a monitor operation with timeout="3s" role="Started" on-fail="restart"; attribute values "24" and "VIP" on the first; "192.168.2.50", "eth1", cidr_netmask="24" and iflabel="VIP" on the second.]
> >>
> >> In addition, when ping packets from my configured pingd on the preferred
> >> master node stay away, a complete failover takes about 75
> >> seconds. This is a long time and not reasonable for a firewall cluster.
> >> I tried to set the monitor interval of pingd to 3 seconds, but this
> >> changed nothing. The interval for ping packets remains at 10 seconds.
> >> Are there better places, like adding another resource to monitor the link
> >> status of the network interfaces, to achieve a faster failover? I believe
> >> IPaddr2 won't check network link status, right?
> >>
> >> Here is my xml snippet for pingd:
> >>
> >> [XML snippet largely lost in the archive. Surviving fragments: value="2", name="clone_node_max" value="1", value="200", value="default-gateway switch1 switch2".]
> >>
> >> Thanks in advance,
> >>
> >> Joerg
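The fix Joerg applied relies on the shell's ${VAR:=default} expansion, which assigns the default only when the variable is unset or empty. With an empty default, MAILCMD stays empty, and a call shaped like "$MAILCMD -s subject recipient" degenerates into running "-s" as a command, which is consistent with the "-s: command not found" error. A minimal sketch of that behaviour; send_alert is a hypothetical stand-in for the MailTo agent's mail invocation and only prints what it would run.

```shell
#!/bin/sh
# Sketch of the ${VAR:=default} idiom from .ocf-binaries and of why an
# empty MAILCMD breaks mail delivery. send_alert is hypothetical: it mimics
# a "$MAILCMD -s subject recipient" call but only prints the command.

send_alert() {
    subject=$1
    recipient=$2
    if [ -z "$MAILCMD" ]; then
        # Unguarded, "$MAILCMD -s ..." would expand to "-s ...", and the
        # shell would try to run "-s" as a command.
        echo "MAILCMD is empty" >&2
        return 1
    fi
    echo "would run: $MAILCMD -s \"$subject\" $recipient"
}

# As packaged: ":=" assigns only if MAILCMD is unset or empty, and the
# default given here is itself empty, so MAILCMD ends up empty.
unset MAILCMD
: ${MAILCMD:=}
send_alert "Takeover" root || echo "no mail command configured"

# Joerg's fix: a non-empty default. Because MAILCMD is still empty,
# ":=" now assigns /usr/bin/mail.
: ${MAILCMD:=/usr/bin/mail}
send_alert "Takeover" root
```

One consequence of using ":=" rather than a plain assignment is that a non-empty MAILCMD already present in the environment still takes precedence over /usr/bin/mail.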
Re: [Linux-HA] Problem with mailman
Hi,

On Wed, Dec 03, 2008 at 12:14:52PM -0800, Syn, Joonho wrote:
> Here is the output when I try a re-probe of services.
>
> Dec 3 12:10:09 mail1 crm_resource: [23866]: info: Invoked: crm_resource -P -H mail1
> Dec 3 12:10:09 mail1 crmd: [4914]: ERROR: verify_stopped: Resource masterHttpd was active at shutdown. You may ignore this error if it is unmanaged.
> Dec 3 12:10:09 mail1 crmd: [4914]: ERROR: verify_stopped: Resource masterFS was active at shutdown. You may ignore this error if it is unmanaged.
> Dec 3 12:10:09 mail1 crmd: [4914]: ERROR: verify_stopped: Resource master_IPaddr was active at shutdown. You may ignore this error if it is unmanaged.
> Dec 3 12:10:09 mail1 crmd: [4914]: ERROR: verify_stopped: Resource masterPostfix was active at shutdown. You may ignore this error if it is unmanaged.
> Dec 3 12:10:09 mail1 crmd: [4914]: ERROR: verify_stopped: Resource masterDovecot was active at shutdown. You may ignore this error if it is unmanaged.
> Dec 3 12:10:09 mail1 crm_resource: [23866]: WARN: main: here i am - 3

This output doesn't say much. Anyhow, you should check whether your mailman init script is LSB compliant. Take a look here: http://www.linux-ha.org/LSBResourceAgent for details.

Thanks,

Dejan

>
> On 12/2/08 3:52 PM, "Alex Strachan" <[EMAIL PROTECTED]> wrote:
>
> Any output in /var/log/messages for when HA tries to start masterMailman?
> e.g.
>
> Dec 2 17:55:12 itbaims lrmd: [3790]: info: RA output:
> (resource_its_fild:start:stdout) Warning: no access to tty (Bad file
> descriptor). Thus no job control in this shell.
>
> This is the output from a script. Enable debug on the shell script, then
> the output will be captured by HA.
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:linux-ha-
> > [EMAIL PROTECTED] On Behalf Of Syn, Joonho
> > Sent: Wednesday, 3 December 2008 9:10 AM
> > To: Linux-HA mailing list
> > Subject: [Linux-HA] Problem with mailman
> >
> > Hello List,
> > I'm a newcomer to mailman and I'm having an issue where mailman does not
> > start. I can start the process manually using the service command in
> > RHEL5, but attempting to start it via the crm_resource command and/or the
> > GUI seemingly has no effect. Looking at my cib.xml I don't see anything
> > particularly wrong, but I'm hoping some better-trained eyes can help me
> > identify my issue.
> >
> > Last updated: Mon Dec 1 15:31:51 2008
> > Current DC: mail1 (bad25385-cb55-44d7-9f66-22a25e3f30e7)
> > 2 Nodes configured.
> > 1 Resources configured.
> >
> > Node: mail2 (6b7231f6-ad2e-4879-87b9-8d38bff5420b): online
> > Node: mail1 (bad25385-cb55-44d7-9f66-22a25e3f30e7): online
> >
> > Resource Group: mastermailer_group
> >     masterFS (heartbeat::ocf:Filesystem): Started mail1
> >     master_IPaddr (heartbeat::ocf:IPaddr2): Started mail1
> >     masterPostfix (lsb:postfix): Started mail1
> >     masterDovecot (lsb:dovecot): Started mail1
> >     masterHttpd (lsb:httpd): Started mail1
> >     masterMailman (lsb:mailman): Stopped
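Dejan's suggestion to check the mailman init script for LSB compliance can be exercised from the shell. This is a minimal sketch, assuming the checks on the LSBResourceAgent page come down to the standard LSB exit codes: start and stop succeed (0) even when repeated, and status returns 0 while running and 3 once stopped. check_rc, lsb_check and make_mock are hypothetical helpers, and the mock script only stands in for a real init script.

```shell
#!/bin/sh
# Sketch: exercise an init script against the LSB exit codes a cluster
# relies on. check_rc, lsb_check and make_mock are hypothetical helpers.

# check_rc DESCRIPTION EXPECTED ACTUAL
check_rc() {
    if [ "$2" -eq "$3" ]; then
        echo "PASS: $1"
    else
        echo "FAIL: $1 (expected $2, got $3)"
        FAILED=1
    fi
}

# lsb_check SCRIPT: returns 0 only if every check passes.
# Warning: this really starts and stops whatever SCRIPT controls.
lsb_check() {
    script=$1
    FAILED=0
    "$script" start  >/dev/null 2>&1; check_rc "start"                0 $?
    "$script" start  >/dev/null 2>&1; check_rc "start while running"  0 $?
    "$script" status >/dev/null 2>&1; check_rc "status while running" 0 $?
    "$script" stop   >/dev/null 2>&1; check_rc "stop"                 0 $?
    "$script" stop   >/dev/null 2>&1; check_rc "stop while stopped"   0 $?
    "$script" status >/dev/null 2>&1; check_rc "status while stopped" 3 $?
    return $FAILED
}

# make_mock DIR: write a tiny LSB-style script to DIR/mock that tracks
# "running" with a state file, so lsb_check can be demonstrated safely.
make_mock() {
    cat > "$1/mock" <<EOF
#!/bin/sh
state="$1/running"
case "\$1" in
    start)  touch "\$state"; exit 0 ;;
    stop)   rm -f "\$state"; exit 0 ;;
    status) if [ -f "\$state" ]; then exit 0; else exit 3; fi ;;
    *)      exit 2 ;;
esac
EOF
    chmod +x "$1/mock"
}

tmp=$(mktemp -d)
make_mock "$tmp"
lsb_check "$tmp/mock" && echo "mock script passes all checks"
rm -rf "$tmp"
```

Pointing lsb_check at the real script (for an lsb:mailman resource, /etc/init.d/mailman) actually starts and stops the service, so it is only safe on a node where that does no harm, and with the resource taken out of cluster control first.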
Re: [Linux-HA] New Kernel - Can Not Compile DRBD
Hi,

On Wed, Dec 03, 2008 at 01:11:54PM -0700, [EMAIL PROTECTED] wrote:
> I am running drbd-8.0.8 and was forced to install a new kernel on a
> fedora 8 machine (local security policy). Before I booted off the new
> kernel, I am trying to
> compile and install the new drbd.ko module.

I think you should post this to the drbd related lists.

Thanks,

Dejan
Re: [Linux-HA] Problem with LSB init script when monitoring
Hi,

On Wed, Dec 03, 2008 at 05:09:03PM +, Darren Mansell wrote:
> Hello everyone.
>
> I am trying to run a 2 node cluster with 1 shared IP for Tomcat. This
> works fine until I set the monitor operation inside the Tomcat resource
> where the CRM keeps trying to restart Tomcat over and over infinitely.
>
> Without the monitor operation in the CIB it won't keep trying to restart
> Tomcat but if I stop it manually it doesn't automatically get started
> again.
>
> I tried the tomcat OCF RA but there are lots of incorrect values hard
> coded in so I edited up an init script to what I thought was LSB
> compatible.

It should be fixed then. Can you provide a list of stuff which is wrong on your platform/distribution?

Thanks,

Dejan

> This is the init script:
>
> #!/bin/sh
> # description: Start or stop the Tomcat server
> #
> ### BEGIN INIT INFO
> # Provides: tomcat
> # Required-Start: $network $syslog
> # Required-Stop: $network
> # Default-Start: 3
> # Default-Stop: 0
> # Description: Start or stop the Tomcat server
> ### END INIT INFO
>
> RETVAL=$?
> NAME=tomcat
> export JRE_HOME=/opt/java
> export CATALINA_HOME=/opt/$NAME
> export CATALINA_BASE=/opt/$NAME
> export JAVA_HOME=/opt/java
>
> check_running() {
>     NAME=$1
>     LINES=`ps -ef | grep java | grep opt | grep $NAME | grep -v grep | wc -l `
>     [ $LINES -gt 0 ] && echo "yes"
> }
>
> case "$1" in
> 'start')
>     RUNNING=`check_running $NAME`
>     [ "$RUNNING" ] && exit 0
>     if [ -f $CATALINA_HOME/bin/startup.sh ];
>     then
>         echo $"Starting Tomcat"
>         $CATALINA_HOME/bin/startup.sh
>     fi
>     ;;
> 'stop')
>     RUNNING=`check_running $NAME`
>     [ ! "$RUNNING" ] && exit 0
>     if [ -f $CATALINA_HOME/bin/shutdown.sh ];
>     then
>         echo $"Stopping Tomcat"
>         $CATALINA_HOME/bin/shutdown.sh
>     fi
>     ;;
> 'restart')
>     $0 stop
>     sleep 15
>     $0 start
>     ;;
> 'status')
>     RUNNING=`check_running $NAME`
>     [ "$RUNNING" ] && exit 0 || exit 1;;
> *)
>     echo
>     echo $"Usage: $0 {start|stop}"
>     echo
>     exit 1;;
>
> esac
> exit $RETVAL
>
> This is my cib.xml:
>
> [cib.xml largely lost in the archive. Surviving fragments: num_peers="2" cib_feature_revision="1.3" crm_feature_set="2.0" epoch="125" num_updates="82" cib-last-written="Wed Dec 3 16:45:56 2008" ccm_transition="2" dc_uuid="ae4489bf-2c5d-4cfd-bf81-5e25b11932eb", and value="2.1.3-node: a3184d5240c6e7032aef9cce6e5b7752ded544b3".]
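Measured against the LSB conventions, the quoted script's status action exits 1 when Tomcat is down, while LSB reserves 3 for "not running" (1 means "program is dead and a pid file exists"), and its usage branch exits 1 rather than 2 ("invalid argument"). Below is a minimal sketch of the same script with LSB exit codes, assuming the /opt/tomcat and /opt/java layout from the quoted script and keeping its ps/grep heuristic. The tomcat_ctl and is_running names are hypothetical, and whether this change alone stops the restart loop Darren describes is untested.

```shell
#!/bin/sh
### BEGIN INIT INFO
# Provides: tomcat
# Required-Start: $network $syslog
# Required-Stop: $network
# Default-Start: 3
# Default-Stop: 0
# Description: Start or stop the Tomcat server
### END INIT INFO
# Sketch only: same /opt/tomcat and /opt/java layout as the quoted script,
# reworked so that every action returns an LSB exit code.

NAME=tomcat
export JRE_HOME=/opt/java
export JAVA_HOME=/opt/java
export CATALINA_HOME=/opt/$NAME
export CATALINA_BASE=/opt/$NAME

# Same ps/grep heuristic as the quoted check_running, as a true/false test.
is_running() {
    ps -ef | grep java | grep opt | grep "$NAME" | grep -v grep >/dev/null
}

tomcat_ctl() {
    case "$1" in
    start)
        is_running && return 0                               # already running: success
        [ -f "$CATALINA_HOME/bin/startup.sh" ] || return 5   # 5: not installed
        echo "Starting Tomcat"
        "$CATALINA_HOME/bin/startup.sh"
        ;;
    stop)
        is_running || return 0                               # already stopped: success
        echo "Stopping Tomcat"
        "$CATALINA_HOME/bin/shutdown.sh"
        ;;
    restart)
        tomcat_ctl stop
        sleep 15
        tomcat_ctl start
        ;;
    status)
        # LSB: 0 = running, 3 = not running (the quoted script used 1 here)
        if is_running; then return 0; else return 3; fi
        ;;
    *)
        echo "Usage: $0 {start|stop|restart|status}"
        return 2                                             # 2: invalid argument
        ;;
    esac
}

# The script's exit status is whatever tomcat_ctl returns.
tomcat_ctl "$@"
```

Keeping the actions in a function lets restart call stop and start directly instead of re-executing $0, and makes the exit status of each action explicit.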
[Linux-HA] monitor OCF script
Hi All,

If the return value of the monitor action is other than 0 or 7, does the cluster:
1. try to start the resource again on the primary node itself, or
2. shift it to the secondary node?

Awaiting your help!!
Padmaja.
Re: [Linux-HA] monitor action
Hi Sony,

> If the return value of the monitor action is other than 0 or 7, does it
> 1. try to start the resource on the primary node itself, or
> 2. shift it to the secondary node?
> Awaiting your help!!

That totally depends on your configuration. Tell us what you'd like the cluster to do and which version you use.

Regards
Dominik
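As a rough illustration of the codes the question refers to: in the OCF resource agent API, a monitor action returns 0 (OCF_SUCCESS) when the resource is running and 7 (OCF_NOT_RUNNING) when it is cleanly stopped, while a code such as 1 (OCF_ERR_GENERIC) reports a failure. What the cluster then does about a failure depends on the configuration, for example the operation's on-fail setting, as Dominik says. The sketch below is a hypothetical pidfile-based monitor, not taken from any shipped agent; PIDFILE and its default path are assumptions.

```shell
#!/bin/sh
# Sketch of OCF monitor return codes. The numeric values follow the OCF
# resource agent API; the pidfile logic and PIDFILE default are hypothetical.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

PIDFILE=${PIDFILE:-/var/run/myservice.pid}

monitor() {
    # No pidfile: the resource is cleanly stopped, which is not a failure.
    [ -f "$PIDFILE" ] || return $OCF_NOT_RUNNING
    pid=$(cat "$PIDFILE")
    case "$pid" in
        ''|*[!0-9]*) return $OCF_ERR_GENERIC ;;   # unreadable pidfile
    esac
    # kill -0 sends no signal; it only checks that the process exists
    # and can be signalled.
    if kill -0 "$pid" 2>/dev/null; then
        return $OCF_SUCCESS
    fi
    # Pidfile left behind but process gone: report a failure and let the
    # cluster's configuration decide how to recover.
    return $OCF_ERR_GENERIC
}

monitor
echo "monitor returned $?"
```

The distinction between 7 and a failure code matters because the cluster uses monitor results (including initial probes) to decide where a resource is active; an agent that reports a stopped resource as an error can trigger recovery actions that were never needed.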
[Linux-HA] monitor action
Hi All,

If the return value of the monitor action is other than 0 or 7, does the cluster:
1. try to start the resource again on the primary node itself, or
2. shift it to the secondary node?

Awaiting your help!!

Regards,
Sony.