Hi,

On Tue, Apr 22, 2008 at 01:34:58PM +0000, [EMAIL PROTECTED] wrote:
> OK, as a better test I stopped squid on ha-1 and quickly
> modified the squid.conf file with a "Bungled" directive that
> would prevent squid from starting (the init.d/squid script is back to
> normal).
> 
> When heartbeat checks to see if squid is running, it's not. It
> tries to restart squid and fails because of the error in the
> config. No squid.pid is created, and no squid process is running.
> 
> crm_mon shows squid as down on ha-1, but after it tries to
> restart it and fails, crm_mon shows it running on ha-1,
> even though it is not. Something in my config is making
> Heartbeat restart squid and, despite the process not running,
> report that it is. No failover is being done.

Looks like your RA is not behaving as it should. Did you check
that it returns the proper exit codes in all these situations?
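
As a reference point, here is a minimal sketch of what "proper exit
codes" means for an OCF agent's monitor and stop actions. This is not
the squid RA itself; the pidfile path and function names are
assumptions for illustration:

```shell
#!/bin/sh
# Hedged sketch of the exit codes an OCF resource agent must return.
# PIDFILE and the helper names are assumptions, not the real squid RA.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

PIDFILE=${PIDFILE:-/var/run/squid.pid}   # assumption: adjust to your setup

squid_running() {
    # true only if the pidfile exists and the process is alive
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

squid_monitor() {
    # "not running" must be 7, never 1: rc=1 on monitor means "failed",
    # and the cluster reacts very differently to the two codes
    if squid_running; then
        return $OCF_SUCCESS
    else
        return $OCF_NOT_RUNNING
    fi
}

squid_stop() {
    # stop must be idempotent: stopping an already-stopped resource is
    # success; returning 1 here is what leaves a resource unmanaged
    squid_running || return $OCF_SUCCESS
    /etc/init.d/squid stop > /dev/null 2>&1
    squid_running && return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}
```

The key points: monitor must return 7 (OCF_NOT_RUNNING) rather than 1
for a cleanly stopped resource, and stop must succeed when the
resource is already down.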

Thanks,

Dejan
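
The behaviour Dominik analyses below can be reproduced outside the
cluster. This standalone sketch wraps a deliberately nonexistent init
script the same way the quoted RA does and shows the exit codes the
cluster would see (the path and the `ra` wrapper are made up for
illustration):

```shell
#!/bin/sh
# Stand-in for the deleted /etc/init.d/squid script (the path is
# deliberately nonexistent so every invocation of it fails).
INIT_SCRIPT=/nonexistent/init.d/squid

# Same exit-code pattern as the quoted resource agent.
ra() {
    case "$1" in
        monitor) "$INIT_SCRIPT" status > /dev/null 2>&1 || return 7 ;;
        stop)    "$INIT_SCRIPT" stop   > /dev/null 2>&1 && return 0 || return 1 ;;
    esac
}

ra monitor; echo "monitor rc=$?"   # rc=7: cluster decides "resource stopped"
ra stop;    echo "stop rc=$?"      # rc=1: stop failed, resource goes unmanaged
```

Monitor returning 7 makes the cluster attempt a stop/start cycle; the
stop then returning 1 is exactly the "squid_stop_0 ... rc=1" error in
the crm_mon output below, and without STONITH that leaves the resource
unmanaged.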

> BTW, thanks for all the replies so far; again, I'm new but
> slowly getting it.

> 
> -------------- Original message -------------- 
> From: Dominik Klein <[EMAIL PROTECTED]> 
> 
> > Nick Duda wrote: 
> > > I renamed the restart script for squid. 
> > 
> > Your OCF Script or your /etc/init.d script? 
> > 
> > > My current setup (based on 
> > > examples on the web) shows that if squid fails on the currently running 
> > > server, it will try to restart itself. If the restart fails, it will fail over. 
> > > So basically I am trying to make a test case scenario where, if the squid 
> > > startup script in /etc/init.d got deleted 
> > 
> > Ah, your /etc/init.d script. 
> > 
> > Okay, look at your OCF script and see what it does when /etc/init.d/squid 
> > is not there. 
> > 
> > ----------- 
> > INIT_SCRIPT=/etc/init.d/squid 
> > 
> > case "$1" in 
> > start) 
> >     ${INIT_SCRIPT} start > /dev/null 2>&1 && exit || exit 1 
> >     ;; 
> > 
> > stop) 
> >     ${INIT_SCRIPT} stop > /dev/null 2>&1 && exit || exit 1 
> >     ;; 
> > 
> > status) 
> >     ${INIT_SCRIPT} status > /dev/null 2>&1 && exit || exit 1 
> >     ;; 
> > 
> > monitor) 
> >     # Check if resource is stopped 
> >     ${INIT_SCRIPT} status > /dev/null 2>&1 || exit 7 
> > 
> >     # Otherwise check service (XXX: maybe loosen retry/timeout) 
> >     wget -o /dev/null -O /dev/null -T 1 -t 1 http://localhost:3128/ && exit || exit 1 
> >     ;; 
> > 
> > meta-data) 
> > -------------- 
> > 
> > So for the next monitor operation, it will exec 
> > "${INIT_SCRIPT} status > /dev/null 2>&1 || exit 7" 
> > 
> > This will probably return 7, so the cluster thinks your resource is 
> > stopped. As it was running before (I guess?), the cluster will now try 
> > to stop and start it. 
> > 
> > Stop calls 
> > "${INIT_SCRIPT} stop > /dev/null 2>&1 && exit || exit 1" 
> > 
> > This will return 1, so the stop operation failed. 
> > 
> > With stonith, your node would be rebooted now. I don't see a stonith 
> > device, so the resource goes "unmanaged". 
> > 
> > I think what you see is intended. 
> > 
> > Regards 
> > Dominik 
> > 
> > > and squid crashed, it should 
> > > fail over to the other box... it's not. 
> > > 
> > > Dominik Klein wrote: 
> > >> Nick Duda wrote: 
> > >>> (sorry for the long email, but all my configs are here to view) 
> > >>> 
> > >>> I posted before about HA with 2 squid servers. It's just about done, 
> > >>> but I'm stumbling on something. Every time I manually cause something 
> > >>> in hopes of seeing a failover, it doesn't happen. For example, I get 
> > >>> crm_mon to show everything as I want it, and when I kill squid (and 
> > >>> prevent the xml from restarting it) it just goes into a failed 
> > >>> state... more below. Does anyone see anything wrong with my configs? 
> > >>> 
> > >>> Server #1 
> > >>> Hostname: ha-1 
> > >>> eth0 - lan (192.168.95.1) 
> > >>> eth1 - xover to eth1 on other server 
> > >>> 
> > >>> Server #2 
> > >>> Hostname: ha-2 
> > >>> eth0 - lan (192.168.95.2) 
> > >>> eth1 - xover to eth1 on other server 
> > >>> 
> > >>> ha.cf on each server: 
> > >>> 
> > >>> bcast eth1 
> > >>> mcast eth0 239.0.0.2 694 1 0 
> > >>> node ha-1 ha-2 
> > >>> crm on 
> > >>> 
> > >>> Not using haresources because of crm 
> > >>> 
> > >>> Here is the output from crm_mon: 
> > >>> 
> > >>> ============ 
> > >>> Last updated: Mon Apr 21 15:44:53 2008 
> > >>> Current DC: ha-1 (2422b230-22f2-451b-aa95-0b783eccab8d) 
> > >>> 2 Nodes configured. 
> > >>> 1 Resources configured. 
> > >>> ============ 
> > >>> 
> > >>> Node: ha-1 (2422b230-22f2-451b-aa95-0b783eccab8d): online 
> > >>> Node: ha-2 (1691d699-2a81-4545-8242-b00862431514): online 
> > >>> 
> > >>> Resource Group: squid-cluster 
> > >>> ip0 (heartbeat::ocf:IPaddr2): Started ha-1 
> > >>> squid (heartbeat::ocf:squid): Started ha-1 
> > >>> 
> > >>> If squid stops on the current heartbeat server, ha-1, it will restart 
> > >>> within 60 sec... so the scripting is working. If I stop the squid 
> > >>> process and rename /etc/init.d/squid to something else, the 
> > >>> script won't be able to execute the squid start and should fail over to 
> > >>> ha-2, but it doesn't; instead this appears (on both ha-1 and ha-2): 
> > >> 
> > >> What exactly do you "rename", and how? It's likely the cluster is 
> > >> behaving sanely and you're just creating a test case you don't understand. 
> > >> 
> > >> Regards 
> > >> Dominik 
> > >> 
> > >>> ============ 
> > >>> Last updated: Mon Apr 21 15:47:49 2008 
> > >>> Current DC: ha-1 (2422b230-22f2-451b-aa95-0b783eccab8d) 
> > >>> 2 Nodes configured. 
> > >>> 1 Resources configured. 
> > >>> ============ 
> > >>> 
> > >>> Node: ha-1 (2422b230-22f2-451b-aa95-0b783eccab8d): online 
> > >>> Node: ha-2 (1691d699-2a81-4545-8242-b00862431514): online 
> > >>> 
> > >>> Resource Group: squid-cluster 
> > >>> ip0 (heartbeat::ocf:IPaddr2): Started ha-1 
> > >>> squid (heartbeat::ocf:squid): Started ha-1 (unmanaged) FAILED 
> > >>> 
> > >>> Failed actions: 
> > >>> squid_stop_0 (node=ha-1, call=74, rc=1): Error 
> > >> _______________________________________________ 
> > >> Linux-HA mailing list 
> > >> Linux-HA@lists.linux-ha.org 
> > >> http://lists.linux-ha.org/mailman/listinfo/linux-ha 
> > >> See also: http://linux-ha.org/ReportingProblems 
> > >> 
> > > 
> > 
> > 
> > -- 
> > 
> > IN-telegence GmbH & Co. KG 
> > Oskar-Jäger-Str. 125 
> > 50825 Köln 
> > 
> > Registergericht Köln - HRA 14064, USt-ID Nr. DE 194 156 373 
> > ph Gesellschafter: komware Unternehmensverwaltungsgesellschaft mbH, 
> > Registergericht Köln - HRB 38396 
> > Geschäftsführende Gesellschafter: Christian Plötke und Holger Jansen 
