On 19/09/11 14:53, Guillaume Bettayeb wrote:
> Hi Andrew,
>
> Yes, I tried with both "/etc/init.d/apache2 start" and "service apache2
> start".
>
> I suppose I have to test with the path defined in crm configure show, which
> might be something like /usr/bin/apache2 or similar (I am not at home just
> now, so I can't check).
Try running something like:

  OCF_ROOT=/usr/lib/ocf \
  OCF_RESKEY_configfile=/etc/apache2/apache2.conf \
  OCF_RESKEY_httpd=/usr/sbin/apache2 \
  /usr/lib/ocf/resource.d/heartbeat/apache start

Or, if you really want to see what the RA is doing:

  OCF_ROOT=/usr/lib/ocf \
  OCF_RESKEY_configfile=/etc/apache2/apache2.conf \
  OCF_RESKEY_httpd=/usr/sbin/apache2 \
  sh -x /usr/lib/ocf/resource.d/heartbeat/apache start

Note that those OCF_RESKEY_* vars need to match what you set for the
resource parameters in the crm config.

See also: http://www.clusterlabs.org/wiki/Debugging_Resource_Failures

Regards,

Tim

> I will have a look. Thanks very much for your help.
>
> G
>
> On 19 September 2011 00:41, Andrew Beekhof <and...@beekhof.net> wrote:
>
>> On Fri, Sep 16, 2011 at 11:22 PM, Guillaume Bettayeb
>> <guillaume1...@gmail.com> wrote:
>>> Hi all,
>>>
>>> I have been through my Apache configuration again and I confirm Apache
>>> works fine.
>>
>> I assume you're testing by running "/etc/init.d/apache2 start" or
>> something similar?
>> This is not what the cluster executes to start apache, and therefore the
>> test doesn't help much.
>>
>>> I have changed the corosync config file to dump all the corosync log into
>>> /var/log/corosync/corosync.log
>>>
>>> Then I restarted corosync, and the log file has the following:
>>>
>>> http://pastebin.com/BFVVfxCh
>>>
>>> Could the following lines be the consequence of the error?
>>
>> A consequence yes, but not the cause.
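As an aside on the manual invocation above: OCF resource agents report their outcome purely through the exit code, so after running the RA by hand it helps to translate `$?`. The mapping below follows the standard OCF return codes; the helper function itself is just my own convenience sketch, not something shipped with resource-agents:

```shell
#!/bin/sh
# Sketch: translate an OCF resource agent exit code into its symbolic name.
# The code-to-name mapping is the standard OCF one; the wrapper is mine.
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        *) echo "unknown rc $1" ;;
    esac
}

# Usage after a manual RA run:
#   OCF_ROOT=/usr/lib/ocf ... /usr/lib/ocf/resource.d/heartbeat/apache start
#   ocf_rc_name $?
ocf_rc_name 1    # prints OCF_ERR_GENERIC (the rc=1 seen in the logs below)
```

The rc=1 in the failed-actions output later in this thread is exactly OCF_ERR_GENERIC, i.e. "the agent failed, no more specific reason given", which is why the cluster status alone says little and a hand run of the RA is the right next step.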
>>
>>> root@node1:/var/log/corosync# cat corosync-apache.log | grep "INFINITY times"
>>> Sep 16 14:11:13 node1 pengine: [7103]: info: get_failcount: apache has failed INFINITY times on node1
>>> Sep 16 14:11:15 node1 pengine: [7103]: info: get_failcount: apache has failed INFINITY times on node1
>>> Sep 16 14:11:16 node1 pengine: [7103]: info: get_failcount: apache has failed INFINITY times on node1
>>> Sep 16 14:11:18 node1 pengine: [7103]: info: get_failcount: apache has failed INFINITY times on node1
>>> Sep 16 14:11:18 node1 pengine: [7103]: info: get_failcount: apache has failed INFINITY times on node2
>>> Sep 16 14:11:18 node1 pengine: [7103]: info: get_failcount: apache has failed INFINITY times on node1
>>> Sep 16 14:11:18 node1 pengine: [7103]: info: get_failcount: apache has failed INFINITY times on node2
>>>
>>> Thanks again,
>>>
>>> G
>>>
>>> On 16 September 2011 11:00, Guillaume Bettayeb <guillaume1...@gmail.com> wrote:
>>>
>>>> Hi Dejan,
>>>>
>>>> I am not sure, because Apache runs like a charm when not started via
>>>> Corosync, but I don't know.
>>>>
>>>> Thanks,
>>>>
>>>> Guillaume
>>>>
>>>> On 16 September 2011 09:01, Dejan Muhamedagic <deja...@fastmail.fm> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> On Fri, Sep 16, 2011 at 03:01:12AM +0100, Guillaume Bettayeb wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I am still struggling to run apache under corosync. My Apache service
>>>>>> is OK and runs fine if I start it manually, and I have mod_status
>>>>>> enabled on both nodes. Ulrich made a good point earlier by asking if
>>>>>> the cgisock in /var/run/apache2 was used by another process, but
>>>>>> that's not the case.
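A side note on the "failed INFINITY times" lines above: once the failcount hits INFINITY on a node, the policy engine bans the resource from that node, so even after the underlying start failure is fixed the resource stays Stopped until the failcount is cleared. A sketch of the reset step ("apache" matches the primitive name in the crm config quoted further down; the fallback branch just keeps this runnable on a machine without the crm shell):

```shell
#!/bin/sh
# Sketch: clear a resource's failcount after fixing the underlying failure.
# "apache" is the resource name from the crm config in this thread.

cleanup_cmd() {
    # crm shell (Pacemaker 1.0-era) syntax; resets the failcount and the
    # failed-operation history for the resource on all nodes
    echo "crm resource cleanup $1"
}

if command -v crm >/dev/null 2>&1; then
    $(cleanup_cmd apache)
else
    echo "crm shell not installed here; would run: $(cleanup_cmd apache)"
fi
```

Until that cleanup is done, retesting the cluster can look like the fix "did nothing", which is easy to confuse with the original problem.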
>>>>>>
>>>>>> I've restarted corosync after midnight and have added the full syslog here:
>>>>>>
>>>>>> http://pastebin.com/LXmLUu3W
>>>>>>
>>>>>> I keep digging on Google to find out why it is not working, but any
>>>>>> help would be greatly appreciated. Does anyone else here run an
>>>>>> Ubuntu cluster with Apache, by any chance?
>>>>>
>>>>> Perhaps, but since this seems to be an issue with apache on
>>>>> Ubuntu, I guess it's best to enquire in some Ubuntu forum.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Dejan
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Guillaume
>>>>>>
>>>>>> On 15 September 2011 16:04, Guillaume Bettayeb <guillaume1...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Ulrich,
>>>>>>>
>>>>>>> Nope, there's nothing in there at the moment:
>>>>>>> root@node1:/home/user# ls -l /var/run/apache2
>>>>>>> ls: cannot access /var/run/apache2: No such file or directory
>>>>>>>
>>>>>>> It looks like that error comes up when the cluster starts apache.
>>>>>>>
>>>>>>> Guillaume
>>>>>>>
>>>>>>> On 15 September 2011 16:00, Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de> wrote:
>>>>>>>
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> What about "ls -l /var/run/apache2"? Any cgisock* there? Permissions
>>>>>>>> of the directory OK? Who is using that cgisock?
>>>>>>>>
>>>>>>>> Ulrich
>>>>>>>>
>>>>>>>>>>> Guillaume Bettayeb <guillaume1...@gmail.com> wrote on 15.09.2011 at 16:06 in message
>>>>>>>> <CAG6QY=LP6S1t=+jsq7qv+rmp5qqq02crnbea38nzbgvbd...@mail.gmail.com>:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> Thanks for your advice. I have double-checked mod_status in Apache
>>>>>>>>> and it's definitely enabled on both nodes:
>>>>>>>>>
>>>>>>>>> ls /etc/apache2/mods-enabled
>>>>>>>>> alias.conf            authz_user.load  dir.conf          reqtimeout.conf
>>>>>>>>> alias.load            autoindex.conf   dir.load          reqtimeout.load
>>>>>>>>> auth_basic.load       autoindex.load   env.load          setenvif.conf
>>>>>>>>> authn_file.load       cgid.conf        mime.conf         setenvif.load
>>>>>>>>> authz_default.load    cgid.load        mime.load         status.conf
>>>>>>>>> authz_groupfile.load  deflate.conf     negotiation.conf  status.load
>>>>>>>>> authz_host.load       deflate.load     negotiation.load
>>>>>>>>>
>>>>>>>>> I have checked the status page at http://node/server-status and I
>>>>>>>>> can see the status page OK. mod_status is enabled on my node and
>>>>>>>>> runs fine.
>>>>>>>>>
>>>>>>>>> I had a look at my Apache log as you advised, but I can't see Apache
>>>>>>>>> complaining about a specific error, apart from multiple stops and
>>>>>>>>> restarts due to my tests:
>>>>>>>>>
>>>>>>>>> [Thu Sep 15 14:20:37 2011] [notice] caught SIGTERM, shutting down
>>>>>>>>> [Thu Sep 15 14:20:38 2011] [notice] Apache/2.2.17 (Ubuntu) configured -- resuming normal operations
>>>>>>>>> [Thu Sep 15 14:20:38 2011] [error] (2)No such file or directory: Couldn't bind unix domain socket /var/run/apache2/cgisock.4278
>>>>>>>>> [Thu Sep 15 14:20:39 2011] [notice] caught SIGTERM, shutting down
>>>>>>>>>
>>>>>>>>> That's for the primary node. It looks like Corosync shuts down Apache.
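About the "Couldn't bind unix domain socket /var/run/apache2/cgisock.*" errors above: on Ubuntu, /var/run is a tmpfs, and /var/run/apache2 is normally (re)created by the init-script path (via envvars/apache2ctl), not by the httpd binary itself. As far as I can tell, when the OCF RA launches apache2 directly after a reboot the directory can simply be missing, which also matches the "ls: cannot access /var/run/apache2" result earlier in the thread. A minimal sketch of pre-creating it, assuming that is indeed the cause (the /tmp path in the demo call and the www-data owner are placeholders for a real /var/run/apache2 setup):

```shell
#!/bin/sh
# Sketch (assumption: the cgid failure is only the missing runtime dir).
# Ensure a daemon's runtime directory exists before the daemon starts;
# on a tmpfs /var/run this has to be redone after every reboot.
ensure_run_dir() {
    dir=$1; owner=$2
    if [ ! -d "$dir" ]; then
        mkdir -p "$dir"
        # chown needs root for the real /var/run/apache2; harmless to
        # skip in this demo if the user doesn't exist here
        chown "$owner" "$dir" 2>/dev/null || true
        chmod 0755 "$dir"
    fi
}

# Real use would be something like:
#   ensure_run_dir /var/run/apache2 www-data
# run before the apache RA starts (e.g. from a small wrapper script).
ensure_run_dir /tmp/demo-apache2-run www-data
```

Whether the missing directory is also what makes the RA's start/monitor fail, or just a side effect, is worth confirming with the manual RA run suggested at the top of this thread.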
>>>>>>>>> In the Apache log file of the second node, I see the following:
>>>>>>>>>
>>>>>>>>> [Thu Sep 15 14:18:27 2011] [notice] Apache/2.2.17 (Ubuntu) configured -- resuming normal operations
>>>>>>>>> [Thu Sep 15 14:18:27 2011] [error] (2)No such file or directory: Couldn't bind unix domain socket /var/run/apache2/cgisock.1338
>>>>>>>>> [Thu Sep 15 14:18:28 2011] [crit] cgid daemon failed to initialize
>>>>>>>>>
>>>>>>>>> I still have errors, but the http service keeps on running, no SIGTERM.
>>>>>>>>>
>>>>>>>>> And then my node status is:
>>>>>>>>>
>>>>>>>>> Online: [ node1 node2 ]
>>>>>>>>>
>>>>>>>>> Resource Group: group1
>>>>>>>>>     failover-ip (ocf::heartbeat:IPaddr): Started node1
>>>>>>>>>     apache      (ocf::heartbeat:apache): Stopped
>>>>>>>>>
>>>>>>>>> Failed actions:
>>>>>>>>>     apache_start_0 (node=node2, call=6, rc=1, status=complete): unknown error
>>>>>>>>>     apache_monitor_0 (node=node1, call=3, rc=1, status=complete): unknown error
>>>>>>>>>     apache_start_0 (node=node1, call=7, rc=1, status=complete): unknown error
>>>>>>>>>
>>>>>>>>> Of interest, some information found in /var/log/syslog on node1:
>>>>>>>>>
>>>>>>>>> Sep 15 02:32:56 node1 crmd: [710]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
>>>>>>>>> Sep 15 02:32:56 node1 apache[928]: INFO: apache not running
>>>>>>>>> Sep 15 02:32:56 node1 apache[928]: INFO: waiting for apache /etc/apache2/apache2.conf to come up
>>>>>>>>> Sep 15 02:32:58 node1 apache[928]: INFO: Killing apache PID 995
>>>>>>>>> Sep 15 02:32:59 node1 lrmd: [707]: info: RA output: (apache:start:stderr) kill: 833:
>>>>>>>>> Sep 15 02:32:59 node1 lrmd: [707]: info: RA output: (apache:start:stderr) No such process
>>>>>>>>> Sep 15 02:32:59 node1 lrmd: [707]: info: RA output: (apache:start:stderr)
>>>>>>>>> Sep 15 02:32:59 node1 apache[928]: INFO: Killing apache PID 995
>>>>>>>>> Sep 15 02:32:59 node1 apache[928]: INFO: apache stopped.
>>>>>>>>> Sep 15 02:32:59 node1 crmd: [710]: info: process_lrm_event: LRM operation apache_start_0 (call=6, rc=1, cib-update=37, confirmed=true) unknown error
>>>>>>>>> Sep 15 02:32:59 node1 crmd: [710]: WARN: status_from_rc: Action 8 (apache_start_0) on node1 failed (target: 0 vs. rc: 1): Error
>>>>>>>>> Sep 15 02:32:59 node1 crmd: [710]: WARN: update_failcount: Updating failcount for apache on node1 after failed start: rc=1 (update=INFINITY, time=1316050379)
>>>>>>>>> Sep 15 02:32:59 node1 crmd: [710]: info: abort_transition_graph: match_graph_event:272 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=apache_start_0, magic=0:1;8:3:0:a4e41810-3e8f-439a-9b92-489edf657291, cib=0.172.10) : Event failed
>>>>>>>>> Sep 15 02:32:59 node1 crmd: [710]: info: update_abort_priority: Abort priority upgraded from 0 to 1
>>>>>>>>> Sep 15 02:32:59 node1 crmd: [710]: info: update_abort_priority: Abort action done superceeded by restart
>>>>>>>>> Sep 15 02:32:59 node1 crmd: [710]: info: match_graph_event: Action apache_start_0 (8) confirmed on node1 (rc=4)
>>>>>>>>> Sep 15 02:32:59 node1 crmd: [710]: info: run_graph: ====================================================
>>>>>>>>> Sep 15 02:32:59 node1 crmd: [710]: notice: run_graph: Transition 3 (Complete=3, Pending=0, Fired=0, Skipped=4, Incomplete=0, Source=/var/lib/pengine/pe-input-247.bz2): Stopped
>>>>>>>>> Sep 15 02:32:59 node1 crmd: [710]: info: te_graph_trigger: Transition 3 is now complete
>>>>>>>>> Sep 15 02:32:59 node1 crmd: [710]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
>>>>>>>>> Sep 15 02:32:59 node1 crmd: [710]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
>>>>>>>>>
>>>>>>>>> Here is my crm configure show -- is there anything I can change there?
>>>>>>>>>
>>>>>>>>> root@node1:/home/user# crm configure show
>>>>>>>>> node node1 \
>>>>>>>>>     attributes standby="off"
>>>>>>>>> node node2 \
>>>>>>>>>     attributes standby="off"
>>>>>>>>> primitive apache ocf:heartbeat:apache \
>>>>>>>>>     params configfile="/etc/apache2/apache2.conf" httpd="/usr/sbin/apache2" \
>>>>>>>>>     op start interval="10" timeout="40s" \
>>>>>>>>>     op stop interval="10" timeout="60s" \
>>>>>>>>>     op monitor interval="5s"
>>>>>>>>> primitive failover-ip ocf:heartbeat:IPaddr \
>>>>>>>>>     params ip="192.168.0.105" \
>>>>>>>>>     op monitor interval="5s"
>>>>>>>>> group group1 failover-ip apache
>>>>>>>>> location cli-prefer-failover-ip failover-ip \
>>>>>>>>>     rule $id="cli-prefer-rule-failover-ip" inf: #uname eq node1
>>>>>>>>> property $id="cib-bootstrap-options" \
>>>>>>>>>     dc-version="1.0.9-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
>>>>>>>>>     cluster-infrastructure="openais" \
>>>>>>>>>     expected-quorum-votes="2" \
>>>>>>>>>     stonith-enabled="false" \
>>>>>>>>>     no-quorum-policy="ignore"
>>>>>>>>>
>>>>>>>>> Thank you for your help,
>>>>>>>>>
>>>>>>>>> Guillaume
>>>>>>>>>
>>>>>>>>> On 13 September 2011 08:29, Tim Serong <tser...@suse.com> wrote:
>>>>>>>>>
>>>>>>>>>> On 13/09/11 00:39, Guillaume Bettayeb wrote:
>>>>>>>>>>> Hi there,
>>>>>>>>>>>
>>>>>>>>>>> This is my first post on this list, so hello everybody :)
>>>>>>>>>>>
>>>>>>>>>>> I am currently testing the fun of Linux HA clustering (just for
>>>>>>>>>>> personal interest), and I have successfully set up a tiny Ubuntu
>>>>>>>>>>> VirtualBox 2-node cluster with IP failover and Apache running as
>>>>>>>>>>> resources.
>>>>>>>>>>>
>>>>>>>>>>> Right after the install, I tried to move the resources from one
>>>>>>>>>>> node to the other (standby command), and everything worked like a
>>>>>>>>>>> charm.
>>>>>>>>>>> Then I tried some failure tests, and started with a simple
>>>>>>>>>>> /etc/init.d/networking stop on one node; I noticed that the other
>>>>>>>>>>> node took ownership of the resources automatically, and all was
>>>>>>>>>>> fine.
>>>>>>>>>>>
>>>>>>>>>>> Then I rebooted the nodes just to see how they would restart the
>>>>>>>>>>> cluster, and since then I have the following error:
>>>>>>>>>>>
>>>>>>>>>>> apache_start_0 (node=node1, call=8, rc=1, status=complete): unknown error
>>>>>>>>>>>
>>>>>>>>>>> For reading convenience, my outputs are available at
>>>>>>>>>>> http://pastebin.com/w1J4TWaG
>>>>>>>>>>> Just to clarify, that's:
>>>>>>>>>>> - the crm configure show output
>>>>>>>>>>> - the crm_mon status
>>>>>>>>>>> - all relevant information from my /var/log/syslog (although I was
>>>>>>>>>>>   not sure what to look at; I have never used corosync before)
>>>>>>>>>>>
>>>>>>>>>>> I have read in an older post that this apache error usually has
>>>>>>>>>>> something to do with either the timeout or mod_status.
>>>>>>>>>>> As you can see on my pastebin, my timeout values are OK:
>>>>>>>>>>> op stop interval="60s" timeout="120" \
>>>>>>>>>>> op start interval="60s" timeout="120" \
>>>>>>>>>>>
>>>>>>>>>>> As for mod_status, it's already enabled in Apache:
>>>>>>>>>>> root@node1:/etc/apache2# a2enmod status
>>>>>>>>>>> Module status already enabled
>>>>>>>>>>>
>>>>>>>>>>> Have I done anything wrong, or is there anything else I should
>>>>>>>>>>> check/configure?
>>>>>>>>>>>
>>>>>>>>>>> Any help with this matter would be greatly appreciated :)
>>>>>>>>>>
>>>>>>>>>> On a punt, it's probably mod_status. Check your Apache logs at the
>>>>>>>>>> time the start failed. If it's whining about a 403 or 404 for
>>>>>>>>>> /server-status (or similar), you need to fix that in your Apache
>>>>>>>>>> config.
>>>>>>>>>>
>>>>>>>>>> HTH,
>>>>>>>>>>
>>>>>>>>>> Tim
>>>>>>>>>> --
>>>>>>>>>> Tim Serong
>>>>>>>>>> Senior Clustering Engineer
>>>>>>>>>> SUSE
>>>>>>>>>> tser...@suse.com
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Linux-HA mailing list
>>>>>>>>>> Linux-HA@lists.linux-ha.org
>>>>>>>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>>>>>>>>> See also: http://linux-ha.org/ReportingProblems

--
Tim Serong
Senior Clustering Engineer
SUSE
tser...@suse.com