Re: [Linux-HA] routing issue on cluster node

2008-12-04 Thread Gabriel Bermudez

Hi Nick,

As you recommended, I assigned an IP address to the eth2 NIC and it 
worked :-D  Thank you very much for your help.  One more thing: that 
machine is also a VPN server, and the OpenVPN client complained about 
sending the VPN start-up request to IP A (assigned to eth2:0) and 
receiving the response from IP B (assigned to eth2), so I added the 
"float" option to the client's OpenVPN configuration file and 
everything is OK now.  This means that every packet sent from the 
server will carry eth2's IP, but I think this can be solved with the 
IPsrcaddr resource agent, so that's the next thing I will be 
experimenting with :-)  Again, Nick, thank you very much for your help; 
it was very ... well, helpful ;-)
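
The fix above works because a default route needs a gateway that is directly reachable through an address already configured on an interface. A minimal sketch of that on-link test in plain shell arithmetic follows; the helper names ip_to_int and same_subnet are hypothetical, and the RFC 5737 documentation addresses stand in for the masked xxx.xxx.xxx.xxx values.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if gateway $3 lies in the network of address $1 with prefix length $2.
same_subnet() {
    addr=$(ip_to_int "$1")
    gw=$(ip_to_int "$3")
    mask=$(( (0xFFFFFFFF << (32 - $2)) & 0xFFFFFFFF ))
    [ $(( addr & mask )) -eq $(( gw & mask )) ]
}

# Example values only: interface address 192.0.2.10/24, gateway 192.0.2.1.
if same_subnet 192.0.2.10 24 192.0.2.1; then
    echo "gateway is on-link"
else
    echo "gateway is not on-link"
fi
```

With no address on eth2 at boot there is nothing for the gateway to be on-link with, which is consistent with the route only sticking once eth2 got its own IP.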


Regards,

Nick wrote:

Gabriel Bermudez wrote:

Hi Nick,

I followed your advice but it didn't work out.  These are the contents 
of the files where I try to set the default gateway:


/etc/rc.local

ip route add default via xxx.xxx.xxx.xxx dev eth2
touch /var/lock/subsys/local

/etc/sysconfig/network-scripts/route-eth2

default via xxx.xxx.xxx.xxx
You can add dev eth2 to the end of that command, but it's not 
required, as long as there's an interface with an IP in the same 
subnet already on that network, which should be eth2 (although now you 
mention that there isn't one assigned...).


Do you get any errors in /var/log/messages?



/etc/sysconfig/network

NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=gw1.mynetwork.net
GATEWAY=xxx.xxx.xxx.xxx

None of this seems wrong.


I don't think it is related, but eth2 doesn't have an IP assigned to 
it:


/etc/sysconfig/network-scripts/ifcfg-eth2

DEVICE=eth2
BOOTPROTO=none
HWADDR=00:15:17:3a:fa:be
ONBOOT=yes
TYPE=Ethernet
Hmm, it's probably not going to work without an interface that has 
a source address.


The logs should give you more information, or at least an error to start.



Thanks for your help.


Nick wrote:

Gabriel Bermudez wrote:

Hi,

I'm trying to configure a high-availability router for my internal 
network.  I'm able to set both private and public IPs on eth0:0 and 
eth2:0 respectively with heartbeat.  The gateway is configured 
using the /etc/sysconfig/network file (on CentOS 5.2):


NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=gw1.mynetwork.net
GATEWAY=xxx.xxx.xxx.xxx

but for some reason it doesn't persist when I reboot the server.  
I have to manually use the route command to restore the default 
gateway:


route add -net 0.0.0.0 gw xxx.xxx.xxx.xxx

I've also tried to set up the route in "/etc/rc.local" and in 
"/etc/sysconfig/network-scripts/route-eth2" files with no success.  
I know that this is not a heartbeat related problem but I've tried 
to google this with no success, so any help on this issue would be 
greatly appreciated.


Thanks in advance,

Gabriel.

Have a look at the ip command (ip route in particular), and instead 
of all those lines add:


default via xxx.xxx.xxx.xxx

in the route-eth2 file.

ip is much simpler, and if you ever decide to do policy based 
routing, much more powerful.


Nick





___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems




Re: [Linux-HA] Fail over to another node (was: Problem with LSB init script when monitoring)

2008-12-04 Thread Darren Mansell
On Thu, 2008-12-04 at 13:48 +0100, Dejan Muhamedagic wrote:
> > I tried the tomcat OCF RA but there are lots of incorrect values hard
> > coded in so I edited up an init script to what I thought was LSB
> > compatible.
> 
> It should be fixed then. Can you provide a list of stuff which is
> wrong on your platform/distribution?
> 
> Thanks,
> 
> Dejan

For some reason when I first tried it I couldn't get the java process to
show the -Dname parameter which the pgrep and pkill commands in the OCF
script rely on so I thought it had been dropped in Tomcat 6. Now I've
tried it again it is there so I'm happily using the OCF script which is
working fine.

I now just have one more question. I have 2 nodes both running identical
independent Tomcats. When I kill the Java process heartbeat will
immediately restart it but I want it to fail the IP and tomcat resource
over to the other node.

I can't find it in the documentation. Could anyone provide any pointers
on what I need to do so these 2 resources failover to the other node?

Thanks for all the help so far!

Darren


Re: [Linux-HA] Bug in MailTo in heartbeat-resources-2.99.2-6.1 and long failover duration with pingd

2008-12-04 Thread Joerg Streckfuss
 
> IPaddr does not check reachability in the net.
> There is a parameter in the ha.cf called deadping. It is 30s by default. So 
> changes in the reachability take 30s + damping time to get active. Set this 
> parameter to 10s (or lower) according to your needs. You will have to restart 
> heartbeat.
> 
> Greetings,

Hi, I tried setting 'deadping' to 3 with a 'dampen' value of 1.

ha.cf:


initdead 30
deadtime 2
keepalive 800ms
warntime 1800ms
deadping 3


XML snippet for resource pingd:

[XML stripped by the list archive]


But the cluster still behaves as before. After the master node lost its
connection to one ping node, the failover occurs at least 70 seconds later.

In addition, it would be great to reduce the ping test interval. It still
sticks at 10 seconds. As you can see, I tried to set it to one second.

When I look into the meta-data of the OCF RA pingd, I find these entries:

[XML stripped by the list archive]

I guess the op attribute 'interval' means something like pingd test interval,
right?

As far as I can remember in heartbeat version 2.1.4 the default test interval
for pingd was 1 second. 

Greetings,

Joerg

-- 
Dipl.-Ing. (FH) Joerg Streckfuss, Phone: +49 40 808077-631

DFN-CERT Services GmbH, https://www.dfn-cert.de/, Phone  +49 40 808077-555
Sitz / Register: Hamburg, AG Hamburg, HRB 88805,  Ust-IdNr.:  DE 232129737
Sachsenstraße 5, 20097 Hamburg/Germany, CEO: Dr. Klaus-Peter Kossakowski


Re: [Linux-HA] Bug in MailTo in heartbeat-resources-2.99.2-6.1 and long failover duration with pingd

2008-12-04 Thread Joerg Streckfuss

Okay, this did the job.

In /usr/lib/ocf/resource.d/heartbeat/.ocf-binaries I had to replace

: ${MAILCMD:=}

with

: ${MAILCMD:=/usr/bin/mail}
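
The one-line change works because `: ${VAR:=default}` assigns the default only when the variable is unset or empty. With an empty default, MAILCMD stays empty and the shell presumably ends up treating `-s` as the command, which matches the "-s: command not found" error in the original report. A minimal sketch of that effect, using a hypothetical build_mail_command helper (not the actual MailTo code):

```shell
#!/bin/sh
# Hypothetical helper: shows which word the shell would try to execute
# for a given packaged default. It is not the MailTo agent's real code.
build_mail_command() {
    MAILCMD=""                          # nothing configured by the admin
    : ${MAILCMD:=$1}                    # assign the default only if unset or empty
    set -- $MAILCMD -s "takeover" root  # word-split as a command line would be
    echo "first word executed as the command: $1"
}

build_mail_command ""               # empty default, as shipped in the package
build_mail_command /usr/bin/mail    # default after the edit described above
```

The first call reports `-s` as the command word, the second reports `/usr/bin/mail`.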

Thanks, Joerg

> Hi,
> 
> On Wed, Dec 03, 2008 at 08:40:46AM +0100, Dominik Klein wrote:
> > There was a thread about this in November. Search the archives for thread 
> > "E-Mail Notification Problem by takeover". Iirc, it was a problem with the 
> > package which should be manually fixable by editing an included file.
> 
> Right. Though I'll fix MailTo to print a more sensible error
> message.
> 
> Thanks,
> 
> Dejan
> 
> > Regards
> > Dominik
> >
> > Joerg Streckfuss wrote:
> >> Hi list,
> >>
> >> I'm using heartbeat-2.99.2-6.1 with pacemaker-1.0.1-3.1 in a test setup for a
> >> firewall cluster. This setup has two nodes, each with two physical
> >> interfaces, eth0 and eth1.
> >>
> >> I configured two resources of the type IPaddr2 and one MailTo resource to
> >> get an email when a failover occurs. I put these resources into one group
> >> to ensure that the resources will always run on one node.
> >> The problem is that each time I force a failover, the MailTo resource
> >> produces the following report in /var/log/ha-log and, unfortunately, no
> >> email is sent.
> >>
> >> 
> >> RA output: (MailTo-admin:start:stderr) /usr/lib/ocf/resource.d//heartbeat/MailTo: line 86: -s: command not found
> >>
> >> To me it looks like MailTo has no valid $MAILCMD.
> >> Here is my xml snippet.
> >>
> >> [XML stripped by the list archive. Surviving fragments, in order:
> >> provider="heartbeat"; timeout="3s" role="Started" on-fail="restart";
> >> value="24"; value="VIP"; provider="heartbeat"; timeout="3s"
> >> role="Started" on-fail="restart"; value="192.168.2.50"; value="eth1";
> >> name="cidr_netmask" value="24"; name="iflabel" value="VIP";
> >> provider="heartbeat".]
> >>
> >> In addition, when the ping packets from my configured pingd on the preferred
> >> master node go unanswered, a complete failover takes about 75 seconds.
> >> This is a long time and not acceptable for a firewall cluster.
> >> I tried to set the monitor interval for pingd to 3 seconds, but this
> >> changed nothing. The interval for ping packets remains at 10 seconds.
> >> Are there better approaches, like adding another resource to monitor the link
> >> status of the network interfaces to achieve a faster failover? I believe
> >> IPaddr2 won't check network link status, right?
> >>
> >>
> >> Here is my xml snippet for pingd
> >>
> >> [XML stripped by the list archive. Surviving fragments, in order:
> >> value="2"; name="clone_node_max" value="1"; value="200";
> >> value="default-gateway switch1 switch2".]
> >>
> >> Thanks in advance,
> >>
> >>Joerg


Re: [Linux-HA] Problem with mailman

2008-12-04 Thread Dejan Muhamedagic
Hi,

On Wed, Dec 03, 2008 at 12:14:52PM -0800, Syn, Joonho wrote:
> Here is the output when I try a re-probe of services.
> 
> Dec  3 12:10:09 mail1 crm_resource: [23866]: info: Invoked: crm_resource -P 
> -H mail1
> Dec  3 12:10:09 mail1 crmd: [4914]: ERROR: verify_stopped: Resource 
> masterHttpd was active at shutdown.  You may ignore this error if it is 
> unmanaged.
> Dec  3 12:10:09 mail1 crmd: [4914]: ERROR: verify_stopped: Resource masterFS 
> was active at shutdown.  You may ignore this error if it is unmanaged.
> Dec  3 12:10:09 mail1 crmd: [4914]: ERROR: verify_stopped: Resource 
> master_IPaddr was active at shutdown.  You may ignore this error if it is 
> unmanaged.
> Dec  3 12:10:09 mail1 crmd: [4914]: ERROR: verify_stopped: Resource 
> masterPostfix was active at shutdown.  You may ignore this error if it is 
> unmanaged.
> Dec  3 12:10:09 mail1 crmd: [4914]: ERROR: verify_stopped: Resource 
> masterDovecot was active at shutdown.  You may ignore this error if it is 
> unmanaged.
> Dec  3 12:10:09 mail1 crm_resource: [23866]: WARN: main: here i am - 3

This output doesn't say much. Anyhow, you should check if your
mailman init script is LSB compliant. Take a look here:

http://www.linux-ha.org/LSBResourceAgent

for details.
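
A rough way to run that check by hand, assuming the exit codes described on the page above (0 for start, stop, and status of a running service; 3 for status of a stopped service). The helper names expect_rc and check_lsb are hypothetical, and the sketch assumes the service is stopped when the run begins.

```shell
#!/bin/sh
# Run one action of an init script and compare its exit code with the
# value expected for an LSB-compliant script.
expect_rc() {
    script=$1 action=$2 want=$3
    "$script" "$action" >/dev/null 2>&1
    got=$?
    if [ "$got" -eq "$want" ]; then
        echo "ok   $action -> $got"
    else
        echo "FAIL $action -> $got (expected $want)"
        return 1
    fi
}

# Sequence of checks; stops at the first mismatch.
check_lsb() {
    expect_rc "$1" status 3 &&   # stopped service must report 3
    expect_rc "$1" start 0 &&
    expect_rc "$1" status 0 &&   # running service must report 0
    expect_rc "$1" start 0 &&    # starting twice is not an error
    expect_rc "$1" stop 0 &&
    expect_rc "$1" status 3 &&
    expect_rc "$1" stop 0        # stopping twice is not an error
}
```

Usage would look like `check_lsb /etc/init.d/mailman`, run as root on a node where the cluster is not currently managing the service; the path is an assumption.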

Thanks,

Dejan

> 
> On 12/2/08 3:52 PM, "Alex Strachan" <[EMAIL PROTECTED]> wrote:
> 
> Any output in /var/log/messages for when HA tries to start masterMailman?
> e.g.
> 
> Dec  2 17:55:12 itbaims lrmd: [3790]: info: RA output:
> (resource_its_fild:start:stdout) Warning: no access to tty (Bad file
> descriptor). Thus no job control in this shell.
> 
> This is the output from a script.  Enable debug on the shell script, then
> the output will be captured by HA.
> 
> 
> 
> > -Original Message-
> > From: [EMAIL PROTECTED] [mailto:linux-ha-
> > [EMAIL PROTECTED] On Behalf Of Syn, Joonho
> > Sent: Wednesday, 3 December 2008 9:10 AM
> > To: Linux-HA mailing list
> > Subject: [Linux-HA] Problem with mailman
> >
> > Hello List,
> > I'm a newcomer to mailman and I'm having an issue where mailman does not
> > start.  I can start the process manually using the service command in
> > RHEL5 but attempting to start it via the crm_resource command and/or the
> > gui seemingly has no effect.  Looking at my cib.xml I don't see anything
> > particularly wrong but I'm hoping some better trained eyes can help me to
> > identify my issue.
> >
> > 
> > Last updated: Mon Dec  1 15:31:51 2008
> > Current DC: mail1 (bad25385-cb55-44d7-9f66-22a25e3f30e7)
> > 2 Nodes configured.
> > 1 Resources configured.
> > 
> >
> > Node: mail2 (6b7231f6-ad2e-4879-87b9-8d38bff5420b): online
> > Node: mail1 (bad25385-cb55-44d7-9f66-22a25e3f30e7): online
> >
> > Resource Group: mastermailer_group
> >     masterFS       (heartbeat::ocf:Filesystem): Started mail1
> >     master_IPaddr  (heartbeat::ocf:IPaddr2):    Started mail1
> >     masterPostfix  (lsb:postfix):               Started mail1
> >     masterDovecot  (lsb:dovecot):               Started mail1
> >     masterHttpd    (lsb:httpd):                 Started mail1
> >     masterMailman  (lsb:mailman):               Stopped


Re: [Linux-HA] New Kernel - Can Not Compile DRBD

2008-12-04 Thread Dejan Muhamedagic
Hi,

On Wed, Dec 03, 2008 at 01:11:54PM -0700, [EMAIL PROTECTED] wrote:
> I am running drbd-8.0.8 and was forced to install a new kernel on a
> Fedora 8 machine (local security policy).  Before I boot into the new
> kernel, I am trying to compile and install the new drbd.ko module.

I think you should post this to the drbd related lists.

Thanks,

Dejan


Re: [Linux-HA] Problem with LSB init script when monitoring

2008-12-04 Thread Dejan Muhamedagic
Hi,

On Wed, Dec 03, 2008 at 05:09:03PM +, Darren Mansell wrote:
> Hello everyone.
> 
> I am trying to run a 2 node cluster with 1 shared IP for Tomcat. This
> works fine until I set the monitor operation inside the Tomcat resource
> where the CRM keeps trying to restart Tomcat over and over infinitely.
> 
> Without the monitor operation in the CIB it won't keep trying to restart
> Tomcat but if I stop it manually it doesn't automatically get started
> again.
> 
> I tried the tomcat OCF RA but there are lots of incorrect values hard
> coded in so I edited up an init script to what I thought was LSB
> compatible.

It should be fixed then. Can you provide a list of stuff which is
wrong on your platform/distribution?

Thanks,

Dejan


> This is the init script:
> 
> 
> 
> #!/bin/sh
> # description: Start or stop the Tomcat server
> #
> ### BEGIN INIT INFO
> # Provides: tomcat
> # Required-Start: $network $syslog
> # Required-Stop: $network
> # Default-Start: 3
> # Default-Stop: 0
> # Description: Start or stop the Tomcat server
> ### END INIT INFO
>
> RETVAL=$?
> NAME=tomcat
> export JRE_HOME=/opt/java
> export CATALINA_HOME=/opt/$NAME
> export CATALINA_BASE=/opt/$NAME
> export JAVA_HOME=/opt/java
>
> check_running() {
>     NAME=$1
>     LINES=`ps -ef | grep java | grep opt | grep $NAME | grep -v grep | wc -l`
>     [ $LINES -gt 0 ] && echo "yes"
> }
>
> case "$1" in
> 'start')
>     RUNNING=`check_running $NAME`
>     [ "$RUNNING" ] && exit 0
>     if [ -f $CATALINA_HOME/bin/startup.sh ];
>     then
>         echo $"Starting Tomcat"
>         $CATALINA_HOME/bin/startup.sh
>     fi
>     ;;
> 'stop')
>     RUNNING=`check_running $NAME`
>     [ ! "$RUNNING" ] && exit 0
>     if [ -f $CATALINA_HOME/bin/shutdown.sh ];
>     then
>         echo $"Stopping Tomcat"
>         $CATALINA_HOME/bin/shutdown.sh
>     fi
>     ;;
> 'restart')
>     $0 stop
>     sleep 15
>     $0 start
>     ;;
> 'status')
>     RUNNING=`check_running $NAME`
>     [ "$RUNNING" ] && exit 0 || exit 1;;
> *)
>     echo
>     echo $"Usage: $0 {start|stop}"
>     echo
>     exit 1;;
>
> esac
> exit $RETVAL
> 
> 
> 
> 
> 
> This is my cib.xml
>
> [XML stripped by the list archive. Surviving fragments, in order:
> num_peers="2" cib_feature_revision="1.3" crm_feature_set="2.0" epoch="125"
> num_updates="82" cib-last-written="Wed Dec  3 16:45:56 2008"
> ccm_transition="2" dc_uuid="ae4489bf-2c5d-4cfd-bf81-5e25b11932eb";
> value="2.1.3-node: a3184d5240c6e7032aef9cce6e5b7752ded544b3".]

[Linux-HA] monitor OCF script

2008-12-04 Thread lakshmipadmaja maddali
Hi All,

If the return value of the monitor action is anything other than 0 or
7, does the cluster
1. try to start the resource on the primary node itself,
or
2. move it to the secondary node?

Awaiting your help!

Padmaja.


Re: [Linux-HA] monitor action

2008-12-04 Thread Dominik Klein

Hi Sony


If the return value of the monitor action is anything other than 0 or
7, does the cluster
1. try to start the resource on the primary node itself,
or
2. move it to the secondary node?

Awaiting your help!


That totally depends on your configuration.

Tell us what you'd like the cluster to do and which version you use.
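
For reference, a small sketch of how the monitor exit codes in the question are usually grouped under the OCF conventions: 0 means running, 7 means cleanly stopped, and anything else counts as a failure. The function name classify_monitor_rc is hypothetical; what the cluster does after a failure (restart in place or move to the other node) is the configuration-dependent part.

```shell
#!/bin/sh
# Standard OCF return codes relevant to the monitor action.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Group a monitor exit code into the three cases the question distinguishes.
classify_monitor_rc() {
    case "$1" in
        "$OCF_SUCCESS")     echo "running" ;;
        "$OCF_NOT_RUNNING") echo "not running" ;;
        *)                  echo "failed" ;;
    esac
}

for rc in 0 7 1; do
    echo "rc=$rc: $(classify_monitor_rc "$rc")"
done
```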

Regards
Dominik


Re: [Linux-HA] routing issue on cluster node

2008-12-04 Thread Nick

Gabriel Bermudez wrote:

Hi Nick,

I followed your advice but it didn't work out.  These are the contents 
of the files where I try to set the default gateway:


/etc/rc.local

ip route add default via xxx.xxx.xxx.xxx dev eth2
touch /var/lock/subsys/local

/etc/sysconfig/network-scripts/route-eth2

default via xxx.xxx.xxx.xxx
You can add dev eth2 to the end of that command, but it's not 
required, as long as there's an interface with an IP in the same subnet 
already on that network, which should be eth2 (although now you mention 
that there isn't one assigned...).


Do you get any errors in /var/log/messages?



/etc/sysconfig/network

NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=gw1.mynetwork.net
GATEWAY=xxx.xxx.xxx.xxx

None of this seems wrong.


I don't think it is related, but eth2 doesn't have an IP assigned to 
it:


/etc/sysconfig/network-scripts/ifcfg-eth2

DEVICE=eth2
BOOTPROTO=none
HWADDR=00:15:17:3a:fa:be
ONBOOT=yes
TYPE=Ethernet
Hmm, it's probably not going to work without an interface that has a 
source address.


The logs should give you more information, or at least an error to start.



Thanks for your help.


Nick wrote:

Gabriel Bermudez wrote:

Hi,

I'm trying to configure a high-availability router for my internal 
network.  I'm able to set both private and public IPs on eth0:0 and 
eth2:0 respectively with heartbeat.  The gateway is configured using 
the /etc/sysconfig/network file (on CentOS 5.2):


NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=gw1.mynetwork.net
GATEWAY=xxx.xxx.xxx.xxx

but for some reason it doesn't persist when I reboot the server.  I 
have to manually use the route command to restore the default gateway:


route add -net 0.0.0.0 gw xxx.xxx.xxx.xxx

I've also tried to set up the route in "/etc/rc.local" and in 
"/etc/sysconfig/network-scripts/route-eth2" files with no success.  
I know that this is not a heartbeat related problem but I've tried 
to google this with no success, so any help on this issue would be 
greatly appreciated.


Thanks in advance,

Gabriel.

Have a look at the ip command (ip route in particular), and instead 
of all those lines add:


default via xxx.xxx.xxx.xxx

in the route-eth2 file.

ip is much simpler, and if you ever decide to do policy based 
routing, much more powerful.


Nick







[Linux-HA] monitor action

2008-12-04 Thread sony thapa
Hi All,

If the return value of the monitor action is anything other than 0 or
7, does the cluster
1. try to start the resource on the primary node itself,
or
2. move it to the secondary node?

Awaiting your help!

Regards,
Sony.