> On 13 Aug 2015, at 2:20 am, Ken Gaillot <kgail...@redhat.com> wrote:
>
> On 08/12/2015 10:45 AM, Miloš Kozák wrote:
>> Thank you for your answer, but.
>>
>> 1) This sounds ok, but in other words it means that a delayed first
>> check is not possible.
>>
>> 2) Start of init script? I follow lsb scripts from the distribution, so
>> there is no way to change them (I can change them, but the changes will
>> be lost with a package upgrade). This is quite a typical approach, how
>> can I do HA for atlassian for example? Jira takes 5 minutes to load..
>
> I think your situation involves multiple issues which are worth
> separating for clarity:
>
> 1. As Alexander mentioned, Pacemaker will do a monitor BEFORE trying to
> start a service, to make sure it's not already running. So these don't
> need any delay and are expected to "fail".
>
> 2. Resource agents MUST NOT return success for "start" until the service
> is fully up and running, so the next monitor should succeed, again
> without needing any delay. If that's not the case, it's a bug in the agent.
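
A minimal sketch of point 2 in plain shell, for illustration only. The names `start_daemon`, `monitor` and `READY_FILE` are hypothetical stand-ins (a background job that touches a file after two seconds plays the part of a slow daemon); they are not taken from any real agent or init script. The return values 0 and 7 are the OCF codes for "success" and "not running". The loop has no deadline of its own, on the assumption that the start timeout configured in the cluster bounds the wait.

```shell
#!/bin/sh
# Sketch only: start_daemon and monitor are hypothetical stand-ins for the
# real agent's functions. Return codes follow the OCF convention.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Hypothetical readiness marker used by the stand-in daemon.
READY_FILE="${TMPDIR:-/tmp}/demo_service.$$.ready"

# Stand-in for launching the real daemon: it becomes "ready" after 2 seconds.
start_daemon() {
    ( sleep 2; : > "$READY_FILE" ) &
}

# Stand-in for the real health check.
monitor() {
    if [ -e "$READY_FILE" ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

# start returns only once monitor succeeds. There is no deadline here;
# the assumption is that the cluster's start timeout bounds the wait.
start() {
    start_daemon
    while ! monitor; do
        sleep 1
    done
    return $OCF_SUCCESS
}

# The probe before start is expected to report "not running" (7).
before=0;  monitor || before=$?
started=0; start   || started=$?
# The first monitor after start succeeds without any extra delay.
after=0;   monitor || after=$?

echo "before=$before start=$started after=$after"
rm -f "$READY_FILE"
```

Run as-is, this prints `before=7 start=0 after=0`: the probe before start reports "not running", and the monitor right after start succeeds because start itself did the waiting.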

Consider the ordering constraint “start A then B”. Regardless of whether you delay A’s monitor operation, B is going to expect A to be up when “start A” completes. So the start action should only report completion once the service is actually usable.

>
> 3. It's generally better to use OCF resource agents whenever available,
> as they have better integration with pacemaker than lsb/systemd/upstart.
> In this case, take a look at ocf:heartbeat:apache.
>
> 4. You can configure the timeout used with each action (stop, start,
> monitor, restart) on a given resource. The default is 20 seconds. For
> example, if a "start" action is expected to take 5 minutes, you would
> define a start operation on the resource with timeout=300s. How you do
> that depends on your management tool (pcs, crmsh, or cibadmin).
>
> Bottom line, you should never need a delay on the monitor; instead, set
> appropriate timeouts for each action, and make sure that the agent does
> not return from "start" until the service is fully up.
>
>> On 12.8.2015 at 16:14, Nekrasov, Alexander wrote:
>>> 1. Pacemaker will/may call a monitor before starting a resource, in
>>> which case it expects a NOT_RUNNING response. It's just checking
>>> assumptions at that point.
>>>
>>> 2. A resource::start must only return when resource::monitor is
>>> successful. Basically the logic of a start() must follow this:
>>>
>>> start() {
>>>     start_daemon
>>>     while ! monitor; do
>>>         sleep 1
>>>     done
>>>     return $OCF_SUCCESS
>>> }
>>>
>>>> -----Original Message-----
>>>> From: Miloš Kozák [mailto:milos.ko...@lejmr.com]
>>>> Sent: Wednesday, August 12, 2015 10:03 AM
>>>> To: users@clusterlabs.org
>>>> Subject: [ClusterLabs] Delayed first monitoring
>>>>
>>>> Hi,
>>>>
>>>> I have set up Corosync+CMAN+Pacemaker on CentOS 6.5 in order to
>>>> provide high availability for OpenNebula. However, I am facing a
>>>> strange problem which arises from my lack of knowledge..
>>>>
>>>> In the log I can see that when I create a resource based on an init
>>>> script, typically:
>>>>
>>>> pcs resource create httpd lsb:httpd
>>>>
>>>> The httpd daemon gets started, but monitor is initiated at the same time
>>>> and the resource is identified as not running. This behaviour makes
>>>> sense once we realize that starting the daemon takes some time. In this
>>>> particular case, I get error code 2 which means that process is running,
>>>> but environment is not locked. The effect of this is that the httpd
>>>> resource gets restarted.
>>>>
>>>> My workaround is an extra sleep in the status function of the init
>>>> script, but I don't like this solution at all! Do you have any idea how
>>>> to tackle this problem in a proper way? I expected an op attribute which
>>>> would specify a delay between service start and the first monitor, but I
>>>> could not find it..
>>>>
>>>> Thank you, Milos
>
>
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org