Hi Andrew,

Yes, I tried with both "/etc/init.d/apache2 start" and "service apache2 start".
I suppose I have to test with the path defined in crm configure show, which might be something like /usr/bin/apache2 or something like that (I am not at home just now so I can't check). I will have a look.

Thanks very much for your help.

G

On 19 September 2011 00:41, Andrew Beekhof <and...@beekhof.net> wrote:
> On Fri, Sep 16, 2011 at 11:22 PM, Guillaume Bettayeb
> <guillaume1...@gmail.com> wrote:
> > Hi all,
> >
> > I have been through my Apache configuration again and I confirm Apache
> > works fine.
>
> I assume you're testing by running "/etc/init.d/apache2 start" or
> something similar?
> This is not what the cluster executes to start apache, and therefore the
> test doesn't help much.
>
> > I have changed the corosync config file to dump all the corosync log into
> > /var/log/corosync/corosync.log
> >
> > Then I restarted corosync, and the log file has the following:
> >
> > http://pastebin.com/BFVVfxCh
> >
> > Could the following lines be the consequence of the error?
>
> A consequence yes, but not the cause.
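[Editor's note: what the cluster runs instead of the init script is the ocf:heartbeat:apache resource agent. One way to reproduce the cluster's view of the resource is to invoke the agent by hand with the same parameters as the primitive. This is a sketch only: the paths below are the usual resource-agents locations on Debian/Ubuntu and may differ on your install, and the OCF_RESKEY_* values are taken from the primitive shown later in this thread.]

```shell
# Invoke the same OCF agent Pacemaker uses, passing the primitive's
# parameters via the OCF_RESKEY_* environment variables.
export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_configfile=/etc/apache2/apache2.conf
export OCF_RESKEY_httpd=/usr/sbin/apache2

/usr/lib/ocf/resource.d/heartbeat/apache start;   echo "start rc=$?"
/usr/lib/ocf/resource.d/heartbeat/apache monitor; echo "monitor rc=$?"
/usr/lib/ocf/resource.d/heartbeat/apache stop;    echo "stop rc=$?"
```

An rc of 0 means success; any non-zero OCF return code here should correspond to the "unknown error" that crm_mon reports. If cluster-glue is installed, the ocf-tester utility wraps the same calls.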
> > root@node1:/var/log/corosync# cat corosync-apache.log | grep "INFINITY times"
> > Sep 16 14:11:13 node1 pengine: [7103]: info: get_failcount: apache has failed INFINITY times on node1
> > Sep 16 14:11:15 node1 pengine: [7103]: info: get_failcount: apache has failed INFINITY times on node1
> > Sep 16 14:11:16 node1 pengine: [7103]: info: get_failcount: apache has failed INFINITY times on node1
> > Sep 16 14:11:18 node1 pengine: [7103]: info: get_failcount: apache has failed INFINITY times on node1
> > Sep 16 14:11:18 node1 pengine: [7103]: info: get_failcount: apache has failed INFINITY times on node2
> > Sep 16 14:11:18 node1 pengine: [7103]: info: get_failcount: apache has failed INFINITY times on node1
> > Sep 16 14:11:18 node1 pengine: [7103]: info: get_failcount: apache has failed INFINITY times on node2
> >
> > Thanks again,
> >
> > G
> >
> > On 16 September 2011 11:00, Guillaume Bettayeb <guillaume1...@gmail.com> wrote:
> >> Hi Dejan,
> >>
> >> I am not sure, because Apache runs like a charm when not started via
> >> Corosync, but I don't know.
> >>
> >> Thanks,
> >>
> >> Guillaume
> >>
> >> On 16 September 2011 09:01, Dejan Muhamedagic <deja...@fastmail.fm> wrote:
> >>> Hi,
> >>>
> >>> On Fri, Sep 16, 2011 at 03:01:12AM +0100, Guillaume Bettayeb wrote:
> >>> > Hi all,
> >>> >
> >>> > I am still struggling to run apache in corosync. My Apache service is OK
> >>> > and runs fine if I start it manually, and I have mod_status enabled on
> >>> > both nodes. Ulrich made a good point earlier by asking if the cgisock
> >>> > shown by "ls -l /var/run/apache2" was used by another process, but
> >>> > that's not the case.
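[Editor's note: the "failed INFINITY times" lines above have a practical consequence: once a resource's failcount reaches INFINITY on a node, the policy engine will not try to start it there again, even after the underlying problem is fixed, until the failcount is cleared. With the crm shell used in this thread, something like:]

```shell
# Clear the stored failcount and operation history for the resource,
# so the policy engine will attempt to start it again.
# Run this only AFTER fixing the underlying start failure.
crm resource cleanup apache
```

Per-node fail counts can also be inspected with `crm_mon -f`.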
> >>> > > >>> > I've restarted corosync after midnight and have added the full syslog > >>> here : > >>> > > >>> > http://pastebin.com/LXmLUu3W > >>> > > >>> > > >>> > I keep digging on google to find out why is it not working but any > help > >>> > would be greatly appreciated..is anyone else runs an Ubuntu cluster > with > >>> > Apache here by any chance ? > >>> > >>> Perhaps, but since this seems to be an issue with apache on > >>> Ubuntu I guess it's best to enquire in some ubuntu forum. > >>> > >>> Thanks, > >>> > >>> Dejan > >>> > >>> > Thanks, > >>> > > >>> > Guillaume > >>> > > >>> > > >>> > On 15 September 2011 16:04, Guillaume Bettayeb < > guillaume1...@gmail.com > >>> >wrote: > >>> > > >>> > > Hi Ulrich, > >>> > > > >>> > > nope, there's nothing in there at the moment : > >>> > > root@node1:/home/user# ls -l /var/run/apache2 > >>> > > ls: cannot access /var/run/apache2: No such file or directory > >>> > > > >>> > > > >>> > > it looks like that error comes up when the cluster starts apache. > >>> > > > >>> > > > >>> > > Guillaume > >>> > > > >>> > > On 15 September 2011 16:00, Ulrich Windl < > >>> > > ulrich.wi...@rz.uni-regensburg.de> wrote: > >>> > > > >>> > >> Hi! > >>> > >> > >>> > >> What about "ls -l /var/run/apache2"? Any cgisock* there? > Permissions > >>> of > >>> > >> the directory OK? Who is using that cgisock? 
> >>> > >>
> >>> > >> Ulrich
> >>> > >>
> >>> > >> >>> Guillaume Bettayeb <guillaume1...@gmail.com> wrote on 15.09.2011 at 16:06 in message
> >>> > >> <CAG6QY=LP6S1t=+jsq7qv+rmp5qqq02crnbea38nzbgvbd...@mail.gmail.com>:
> >>> > >> > Hi all,
> >>> > >> >
> >>> > >> > Thanks for your advice. I have double-checked mod_status in Apache and
> >>> > >> > it's definitely enabled on both nodes:
> >>> > >> > ls /etc/apache2/mods-enabled
> >>> > >> > alias.conf            authz_user.load  dir.conf          reqtimeout.conf
> >>> > >> > alias.load            autoindex.conf   dir.load          reqtimeout.load
> >>> > >> > auth_basic.load       autoindex.load   env.load          setenvif.conf
> >>> > >> > authn_file.load       cgid.conf        mime.conf         setenvif.load
> >>> > >> > authz_default.load    cgid.load        mime.load         status.conf
> >>> > >> > authz_groupfile.load  deflate.conf     negotiation.conf  status.load
> >>> > >> > authz_host.load       deflate.load     negotiation.load
> >>> > >> >
> >>> > >> > I have checked the status page http://node/server-status and I can see
> >>> > >> > it OK. mod_status is enabled on my node and runs fine.
> >>> > >> >
> >>> > >> > I had a look at my Apache log as you advised, but I can't see Apache
> >>> > >> > moaning about a specific error, apart from multiple stops and restarts
> >>> > >> > due to my tests:
> >>> > >> >
> >>> > >> > [Thu Sep 15 14:20:37 2011] [notice] caught SIGTERM, shutting down
> >>> > >> > [Thu Sep 15 14:20:38 2011] [notice] Apache/2.2.17 (Ubuntu) configured -- resuming normal operations
> >>> > >> > [Thu Sep 15 14:20:38 2011] [error] (2)No such file or directory: Couldn't bind unix domain socket /var/run/apache2/cgisock.4278
> >>> > >> > [Thu Sep 15 14:20:39 2011] [notice] caught SIGTERM, shutting down
> >>> > >> >
> >>> > >> > That's for the primary node. It looks like Corosync shuts down Apache.
> >>> > >> > In the Apache log file of the second node, I see the following:
> >>> > >> >
> >>> > >> > [Thu Sep 15 14:18:27 2011] [notice] Apache/2.2.17 (Ubuntu) configured -- resuming normal operations
> >>> > >> > [Thu Sep 15 14:18:27 2011] [error] (2)No such file or directory: Couldn't bind unix domain socket /var/run/apache2/cgisock.1338
> >>> > >> > [Thu Sep 15 14:18:28 2011] [crit] cgid daemon failed to initialize
> >>> > >> >
> >>> > >> > I still have errors, but the http service keeps running, no SIGTERM.
> >>> > >> >
> >>> > >> > And then my node status is:
> >>> > >> >
> >>> > >> > Online: [ node1 node2 ]
> >>> > >> >
> >>> > >> > Resource Group: group1
> >>> > >> >     failover-ip (ocf::heartbeat:IPaddr): Started node1
> >>> > >> >     apache      (ocf::heartbeat:apache): Stopped
> >>> > >> >
> >>> > >> > Failed actions:
> >>> > >> >     apache_start_0 (node=node2, call=6, rc=1, status=complete): unknown error
> >>> > >> >     apache_monitor_0 (node=node1, call=3, rc=1, status=complete): unknown error
> >>> > >> >     apache_start_0 (node=node1, call=7, rc=1, status=complete): unknown error
> >>> > >> >
> >>> > >> > Of interest, some information found in /var/log/syslog on node1:
> >>> > >> >
> >>> > >> > Sep 15 02:32:56 node1 crmd: [710]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
> >>> > >> > Sep 15 02:32:56 node1 apache[928]: INFO: apache not running
> >>> > >> > Sep 15 02:32:56 node1 apache[928]: INFO: waiting for apache /etc/apache2/apache2.conf to come up
> >>> > >> >
> >>> > >> > Sep 15 02:32:58 node1 apache[928]: INFO: Killing apache PID 995
> >>> > >> > Sep 15 02:32:59 node1 lrmd: [707]: info: RA output: (apache:start:stderr) kill: 833:
> >>> > >> > Sep 15 02:32:59 node1 lrmd: [707]: info: RA output: (apache:start:stderr) No such process
> >>> > >> > Sep 15 02:32:59 node1 lrmd: [707]: info: RA output: (apache:start:stderr)
> >>> > >> > Sep 15 02:32:59 node1 apache[928]: INFO: Killing apache PID 995
> >>> > >> > Sep 15 02:32:59 node1 apache[928]: INFO: apache stopped.
> >>> > >> > Sep 15 02:32:59 node1 crmd: [710]: info: process_lrm_event: LRM operation apache_start_0 (call=6, rc=1, cib-update=37, confirmed=true) unknown error
> >>> > >> > Sep 15 02:32:59 node1 crmd: [710]: WARN: status_from_rc: Action 8 (apache_start_0) on node1 failed (target: 0 vs. rc: 1): Error
> >>> > >> > Sep 15 02:32:59 node1 crmd: [710]: WARN: update_failcount: Updating failcount for apache on node1 after failed start: rc=1 (update=INFINITY, time=1316050379)
> >>> > >> > Sep 15 02:32:59 node1 crmd: [710]: info: abort_transition_graph: match_graph_event:272 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=apache_start_0, magic=0:1;8:3:0:a4e41810-3e8f-439a-9b92-489edf657291, cib=0.172.10) : Event failed
> >>> > >> > Sep 15 02:32:59 node1 crmd: [710]: info: update_abort_priority: Abort priority upgraded from 0 to 1
> >>> > >> > Sep 15 02:32:59 node1 crmd: [710]: info: update_abort_priority: Abort action done superceeded by restart
> >>> > >> > Sep 15 02:32:59 node1 crmd: [710]: info: match_graph_event: Action apache_start_0 (8) confirmed on node1 (rc=4)
> >>> > >> > Sep 15 02:32:59 node1 crmd: [710]: info: run_graph: ====================================================
> >>> > >> > Sep 15 02:32:59 node1 crmd: [710]: notice: run_graph: Transition 3 (Complete=3, Pending=0, Fired=0, Skipped=4, Incomplete=0, Source=/var/lib/pengine/pe-input-247.bz2): Stopped
> >>> > >> > Sep 15 02:32:59 node1 crmd: [710]: info: te_graph_trigger: Transition 3 is now complete
> >>> > >> > Sep 15 02:32:59 node1 crmd: [710]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
> >>> > >> > Sep 15 02:32:59 node1 crmd: [710]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
> >>> > >> >
> >>> > >> > Here is my crm configure show; is there anything I can change there?
> >>> > >> >
> >>> > >> > root@node1:/home/user# crm configure show
> >>> > >> > node node1 \
> >>> > >> >     attributes standby="off"
> >>> > >> > node node2 \
> >>> > >> >     attributes standby="off"
> >>> > >> > primitive apache ocf:heartbeat:apache \
> >>> > >> >     params configfile="/etc/apache2/apache2.conf" httpd="/usr/sbin/apache2" \
> >>> > >> >     op start interval="10" timeout="40s" \
> >>> > >> >     op stop interval="10" timeout="60s" \
> >>> > >> >     op monitor interval="5s"
> >>> > >> > primitive failover-ip ocf:heartbeat:IPaddr \
> >>> > >> >     params ip="192.168.0.105" \
> >>> > >> >     op monitor interval="5s"
> >>> > >> > group group1 failover-ip apache
> >>> > >> > location cli-prefer-failover-ip failover-ip \
> >>> > >> >     rule $id="cli-prefer-rule-failover-ip" inf: #uname eq node1
> >>> > >> > property $id="cib-bootstrap-options" \
> >>> > >> >     dc-version="1.0.9-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
> >>> > >> >     cluster-infrastructure="openais" \
> >>> > >> >     expected-quorum-votes="2" \
> >>> > >> >     stonith-enabled="false" \
> >>> > >> >     no-quorum-policy="ignore"
> >>> > >> >
> >>> > >> > Thank you for your help,
> >>> > >> >
> >>> > >> > Guillaume
> >>> > >> >
> >>> > >> > On 13 September 2011 08:29, Tim Serong <tser...@suse.com> wrote:
> >>> > >> > > On 13/09/11 00:39, Guillaume Bettayeb wrote:
> >>> > >> > > > Hi there,
> >>> > >> > > >
> >>> > >> > > > This is my first post on this list, so hello everybody :)
> >>> > >> > > >
> >>> > >> > > > I am currently testing the fun of Linux HA clustering (just for
> >>> > >> > > > personal interest), and I have successfully set up a tiny Ubuntu
> >>> > >> > > > VirtualBox two-node cluster with IP failover and Apache running
> >>> > >> > > > as resources.
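[Editor's note: one detail that stands out in the configuration above, although it is unlikely to be the root cause of the start failure: the start and stop operations carry interval="10". Intervals only make sense for recurring operations such as monitor; start and stop are one-shot, and their interval should be 0 (later Pacemaker versions warn about this). A possible cleanup of the primitive, with the monitor timeout of 20s being an assumed value, not something from the thread:]

```
primitive apache ocf:heartbeat:apache \
    params configfile="/etc/apache2/apache2.conf" httpd="/usr/sbin/apache2" \
    op start interval="0" timeout="40s" \
    op stop interval="0" timeout="60s" \
    op monitor interval="5s" timeout="20s"
```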
> >>> > >> > > >
> >>> > >> > > > Right after the install, I tried to move the resources from one
> >>> > >> > > > node to the other (standby command) and everything worked like a
> >>> > >> > > > charm. Then I tried some failure tests, starting with a simple
> >>> > >> > > > /etc/init.d/networking stop on one node, and noticed that the
> >>> > >> > > > other node took ownership of the resources automatically; all was
> >>> > >> > > > fine.
> >>> > >> > > >
> >>> > >> > > > Then I rebooted the nodes just to see how they would restart the
> >>> > >> > > > cluster, and since then I have the following error:
> >>> > >> > > >
> >>> > >> > > > apache_start_0 (node=node1, call=8, rc=1, status=complete): unknown error
> >>> > >> > > >
> >>> > >> > > > For reading convenience, my outputs are available at
> >>> > >> > > > http://pastebin.com/w1J4TWaG
> >>> > >> > > > Just to clarify, that's:
> >>> > >> > > > - the crm configure show command
> >>> > >> > > > - crm_mon status
> >>> > >> > > > - all relevant information from my /var/log/syslog (although I
> >>> > >> > > >   was not sure what to look at; I have never used corosync before)
> >>> > >> > > >
> >>> > >> > > > I have read in an older post that this apache error usually has
> >>> > >> > > > something to do with either the timeout or mod_status.
> >>> > >> > > > As you can see on my pastebin, my timeout values are OK:
> >>> > >> > > > op stop interval="60s" timeout="120" \
> >>> > >> > > > op start interval="60s" timeout="120" \
> >>> > >> > > >
> >>> > >> > > > As for mod_status, it's already enabled in Apache:
> >>> > >> > > > root@node1:/etc/apache2# a2enmod status
> >>> > >> > > > Module status already enabled
> >>> > >> > > >
> >>> > >> > > > Have I done anything wrong, or is there anything else I should
> >>> > >> > > > check/configure?
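[Editor's note: since the ocf:heartbeat:apache agent's monitor action works by fetching Apache's status page from the local node, it is worth checking that URL from a shell on each node rather than only from a browser elsewhere; the statusurl commonly resolves to something like http://localhost/server-status, derived from the Listen directive. A quick check, assuming curl is installed:]

```shell
# The OCF agent's monitor fetches the server-status page from the
# node itself; -f makes curl fail on HTTP errors such as 403/404,
# which is exactly what would make the monitor action fail.
curl -fsS http://localhost/server-status >/dev/null && echo "status OK"
```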
> >>> > >> > > >
> >>> > >> > > > Any help with this matter would be greatly appreciated :)
> >>> > >> > >
> >>> > >> > > On a punt, it's probably mod_status. Check your Apache logs at the
> >>> > >> > > time the start failed. If it's whining about a 403 or 404 for
> >>> > >> > > /server-status (or similar), you need to fix that in your Apache
> >>> > >> > > config.
> >>> > >> > >
> >>> > >> > > HTH,
> >>> > >> > >
> >>> > >> > > Tim
> >>> > >> > > --
> >>> > >> > > Tim Serong
> >>> > >> > > Senior Clustering Engineer
> >>> > >> > > SUSE
> >>> > >> > > tser...@suse.com
> >>> > >> > > _______________________________________________
> >>> > >> > > Linux-HA mailing list
> >>> > >> > > Linux-HA@lists.linux-ha.org
> >>> > >> > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> >>> > >> > > See also: http://linux-ha.org/ReportingProblems
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems