On Thu, May 05, 2011 at 12:26:57PM +0200, Benjamin Knoth wrote:
> Hi again,
>
> i copied the jboss OCF and modified the variables so that the script
> uses my variables if i start it. Now if i start the ocf script i get
> the following every time.
>
> ./jboss-test start
> jboss-test[6165]: DEBUG: [jboss] Enter jboss start
> jboss-test[6165]: DEBUG: start_jboss[jboss]: retry monitor_jboss
> jboss-test[6165]: DEBUG: start_jboss[jboss]: retry monitor_jboss
> jboss-test[6165]: DEBUG: start_jboss[jboss]: retry monitor_jboss
>
> Something is wrong.
Typically, the start operation includes a monitor at the end to make
sure that the resource really started. In this case it looks like the
monitor repeatedly fails. You should check the monitor operation. Take
a look at the output of "crm ra info jboss" for parameters which have
an effect on monitoring.

BTW, you can test your resource without the cluster using ocf-tester.

Thanks,

Dejan

> Cheers
> Benjamin
>
> On 05.05.2011 12:03, Benjamin Knoth wrote:
> > Hi,
> >
> > On 05.05.2011 11:46, Dejan Muhamedagic wrote:
> >> On Wed, May 04, 2011 at 03:44:02PM +0200, Benjamin Knoth wrote:
> >>>
> >>> On 04.05.2011 13:18, Benjamin Knoth wrote:
> >>>> Hi,
> >>>>
> >>>> On 04.05.2011 12:18, Dejan Muhamedagic wrote:
> >>>>> Hi,
> >>>>>
> >>>>> On Wed, May 04, 2011 at 10:37:40AM +0200, Benjamin Knoth wrote:
> >>>>>
> >>>>> On 04.05.2011 09:42, Florian Haas wrote:
> >>>>>>>> On 05/04/2011 09:31 AM, Benjamin Knoth wrote:
> >>>>>>>>> Hi Florian,
> >>>>>>>>> i tested it with the ocf agent, but couldn't get it running.
> >>>>>>>>
> >>>>>>>> Well that's really helpful information. Logs? Error messages?
> >>>>>>>> Anything?
> >>>>>
> >>>>> Logs
> >>>>>
> >>>>> May 4 09:55:10 vm36 lrmd: [19214]: WARN: p_jboss_ocf:start process (PID
> >>>>> 27702) timed out (try 1). Killing with signal SIGTERM (15).
> >>>>>
> >>>>>> You need to set/increase the timeout for the start operation to
> >>>>>> match the maximum expected start time. Take a look at "crm ra
> >>>>>> info jboss" for minimum values.
> >>>>>
> >>>>> May 4 09:55:10 vm36 attrd: [19215]: info: find_hash_entry: Creating
> >>>>> hash entry for fail-count-p_jboss_ocf
> >>>>> May 4 09:55:10 vm36 lrmd: [19214]: WARN: operation start[342] on
> >>>>> ocf::jboss::p_jboss_ocf for client 19217, its parameters:
> >>>>> CRM_meta_name=[start] crm_feature_set=[3.0.1]
> >>>>> java_home=[/usr/lib64/jvm/java] CRM_meta_timeout=[240000]
> >>>>> jboss_stop_timeout=[30] jboss_home=[/usr/share/jboss]
> >>>>> jboss_pstring=[java -Dprogram.name=run.sh] : pid [27702] timed out
> >>>>> May 4 09:55:10 vm36 attrd: [19215]: info: attrd_trigger_update: Sending
> >>>>> flush op to all hosts for: fail-count-p_jboss_ocf (INFINITY)
> >>>>> May 4 09:55:10 vm36 crmd: [19217]: WARN: status_from_rc: Action 64
> >>>>> (p_jboss_ocf_start_0) on vm36 failed (target: 0 vs. rc: -2): Error
> >>>>> May 4 09:55:10 vm36 lrmd: [19214]: info: rsc:p_jboss_ocf:346: stop
> >>>>> May 4 09:55:10 vm36 attrd: [19215]: info: attrd_perform_update: Sent
> >>>>> update 2294: fail-count-p_jboss_ocf=INFINITY
> >>>>> May 4 09:55:10 vm36 pengine: [19216]: notice: unpack_rsc_op: Hard error
> >>>>> - p_jboss_lsb_monitor_0 failed with rc=5: Preventing p_jboss_lsb from
> >>>>> re-starting on vm36
> >>>>> May 4 09:55:10 vm36 crmd: [19217]: WARN: update_failcount: Updating
> >>>>> failcount for p_jboss_ocf on vm36 after failed start: rc=-2
> >>>>> (update=INFINITY, time=1304495710)
> >>>>> May 4 09:55:10 vm36 attrd: [19215]: info: find_hash_entry: Creating
> >>>>> hash entry for last-failure-p_jboss_ocf
> >>>>> May 4 09:55:10 vm36 pengine: [19216]: notice: unpack_rsc_op: Operation
> >>>>> p_jboss_cs_monitor_0 found resource p_jboss_cs active on vm36
> >>>>> May 4 09:55:10 vm36 crmd: [19217]: info: abort_transition_graph:
> >>>>> match_graph_event:272 - Triggered transition abort (complete=0,
> >>>>> tag=lrm_rsc_op, id=p_jboss_ocf_start_0,
> >>>>> magic=2:-2;64:1375:0:fc16910d-2fe9-4daa-834a-348a4c7645ef,
> >>>>> cib=0.535.2) : Event failed
> >>>>> May 4 09:55:10 vm36 attrd: [19215]: info: attrd_trigger_update: Sending
> >>>>> flush op to all hosts for: last-failure-p_jboss_ocf (1304495710)
> >>>>> May 4 09:55:10 vm36 pengine: [19216]: notice: unpack_rsc_op: Hard error
> >>>>> - p_jboss_init_monitor_0 failed with rc=5: Preventing p_jboss_init from
> >>>>> re-starting on vm36
> >>>>> May 4 09:55:10 vm36 crmd: [19217]: info: match_graph_event: Action
> >>>>> p_jboss_ocf_start_0 (64) confirmed on vm36 (rc=4)
> >>>>> May 4 09:55:10 vm36 attrd: [19215]: info: attrd_perform_update: Sent
> >>>>> update 2297: last-failure-p_jboss_ocf=1304495710
> >>>>> May 4 09:55:10 vm36 pengine: [19216]: WARN: unpack_rsc_op: Processing
> >>>>> failed op p_jboss_ocf_start_0 on vm36: unknown exec error (-2)
> >>>>> May 4 09:55:10 vm36 crmd: [19217]: info: te_rsc_command: Initiating
> >>>>> action 1: stop p_jboss_ocf_stop_0 on vm36 (local)
> >>>>> May 4 09:55:10 vm36 pengine: [19216]: notice: unpack_rsc_op: Operation
> >>>>> p_jboss_ocf_monitor_0 found resource p_jboss_ocf active on vm37
> >>>>> May 4 09:55:10 vm36 crmd: [19217]: info: do_lrm_rsc_op: Performing
> >>>>> key=1:1376:0:fc16910d-2fe9-4daa-834a-348a4c7645ef op=p_jboss_ocf_stop_0 )
> >>>>> May 4 09:55:10 vm36 pengine: [19216]: notice: native_print: p_jboss_ocf
> >>>>> (ocf::heartbeat:jboss): Stopped
> >>>>> May 4 09:55:10 vm36 pengine: [19216]: info: get_failcount: p_jboss_ocf
> >>>>> has failed INFINITY times on vm36
> >>>>> May 4 09:55:10 vm36 pengine: [19216]: WARN: common_apply_stickiness:
> >>>>> Forcing p_jboss_ocf away from vm36 after 1000000 failures (max=1000000)
> >>>>> May 4 09:59:10 vm36 pengine: [19216]: info: unpack_config: Node scores:
> >>>>> 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> >>>>> May 4 09:59:10 vm36 crmd: [19217]: WARN: status_from_rc: Action 50
> >>>>> (p_jboss_ocf_start_0) on vm37 failed (target: 0 vs. rc: -2): Error
> >>>>> May 4 09:59:10 vm36 pengine: [19216]: info: determine_online_status:
> >>>>> Node vm36 is online
> >>>>> May 4 09:59:10 vm36 crmd: [19217]: WARN: update_failcount: Updating
> >>>>> failcount for p_jboss_ocf on vm37 after failed start: rc=-2
> >>>>> (update=INFINITY, time=1304495950)
> >>>>> May 4 09:59:10 vm36 pengine: [19216]: notice: unpack_rsc_op: Hard error
> >>>>> - p_jboss_lsb_monitor_0 failed with rc=5: Preventing p_jboss_lsb from
> >>>>> re-starting on vm36
> >>>>> May 4 09:59:10 vm36 crmd: [19217]: info: abort_transition_graph:
> >>>>> match_graph_event:272 - Triggered transition abort (complete=0,
> >>>>> tag=lrm_rsc_op, id=p_jboss_ocf_start_0,
> >>>>> magic=2:-2;50:1377:0:fc16910d-2fe9-4daa-834a-348a4c7645ef,
> >>>>> cib=0.535.12) : Event failed
> >>>>> May 4 09:59:10 vm36 pengine: [19216]: notice: unpack_rsc_op: Operation
> >>>>> p_jboss_cs_monitor_0 found resource p_jboss_cs active on vm36
> >>>>> May 4 09:59:10 vm36 crmd: [19217]: info: match_graph_event: Action
> >>>>> p_jboss_ocf_start_0 (50) confirmed on vm37 (rc=4)
> >>>>> May 4 09:59:10 vm36 pengine: [19216]: notice: native_print: p_jboss_ocf
> >>>>> (ocf::heartbeat:jboss): Stopped
> >>>>> May 4 09:59:10 vm36 pengine: [19216]: info: get_failcount: p_jboss_ocf
> >>>>> has failed INFINITY times on vm37
> >>>>> May 4 09:59:10 vm36 pengine: [19216]: WARN: common_apply_stickiness:
> >>>>> Forcing p_jboss_ocf away from vm37 after 1000000 failures (max=1000000)
> >>>>> May 4 09:59:10 vm36 pengine: [19216]: info: get_failcount: p_jboss_ocf
> >>>>> has failed INFINITY times on vm36
> >>>>> May 4 09:59:10 vm36 pengine: [19216]: info: native_color: Resource
> >>>>> p_jboss_ocf cannot run anywhere
> >>>>> May 4 09:59:10 vm36 pengine: [19216]: notice: LogActions: Leave
> >>>>> resource p_jboss_ocf (Stopped)
> >>>>> May 4 09:59:31 vm36 pengine: [19216]: notice: native_print: p_jboss_ocf
> >>>>> (ocf::heartbeat:jboss): Stopped
> >>>>> ....
> >>>>>
> >>>>> Now i don't know how i can reset the resource p_jboss_ocf to test it
> >>>>> again.
> >>>>>
> >>>>>> crm resource cleanup p_jboss_ocf
> >>>>
> >>>> That's the right way, but if i run this command on the shell or in the
> >>>> crm shell, in both i get
> >>>>
> >>>> Cleaning up p_jboss_ocf on vm37
> >>>> Cleaning up p_jboss_ocf on vm36
> >>>>
> >>>> But if i look at the monitoring with crm_mon -1 i get every time
> >>>>
> >>>> Failed actions:
> >>>>     p_jboss_ocf_start_0 (node=vm36, call=-1, rc=1, status=Timed Out):
> >>>> unknown error
> >>>>     p_jboss_monitor_0 (node=vm37, call=205, rc=5, status=complete): not
> >>>> installed
> >>>>     p_jboss_ocf_start_0 (node=vm37, call=281, rc=-2, status=Timed Out):
> >>>> unknown exec error
> >>>>
> >>>> p_jboss was deleted in the config yesterday.
> >>>
> >>> For demonstration:
> >>>
> >>> 15:34:22 ~ # crm_mon -1
> >>>
> >>> Failed actions:
> >>>     p_jboss_ocf_start_0 (node=vm36, call=376, rc=-2, status=Timed Out):
> >>> unknown exec error
> >>>     p_jboss_monitor_0 (node=vm37, call=205, rc=5, status=complete): not
> >>> installed
> >>>     p_jboss_ocf_start_0 (node=vm37, call=283, rc=-2, status=Timed Out):
> >>> unknown exec error
> >>>
> >>> 15:35:02 ~ # crm resource cleanup p_jboss_ocf
> >>> INFO: no curses support: you won't see colors
> >>> Cleaning up p_jboss_ocf on vm37
> >>> Cleaning up p_jboss_ocf on vm36
> >>>
> >>> 15:39:12 ~ # crm resource cleanup p_jboss
> >>> INFO: no curses support: you won't see colors
> >>> Cleaning up p_jboss on vm37
> >>> Cleaning up p_jboss on vm36
> >>>
> >>> 15:39:19 ~ # crm_mon -1
> >>>
> >>> Failed actions:
> >>>     p_jboss_ocf_start_0 (node=vm36, call=376, rc=-2, status=Timed Out):
> >>> unknown exec error
> >>>     p_jboss_monitor_0 (node=vm37, call=205, rc=5, status=complete): not
> >>> installed
> >>>     p_jboss_ocf_start_0 (node=vm37, call=283, rc=-2, status=Timed Out):
> >>> unknown exec error
> >
> > Strange, after i edit the config all other Failed actions are deleted;
> > only these Failed actions will be
displayed.
> >
> > Failed actions:
> >     p_jboss_ocf_start_0 (node=vm36, call=380, rc=-2, status=Timed Out):
> > unknown exec error
> >     p_jboss_ocf_start_0 (node=vm37, call=287, rc=-2, status=Timed Out):
> > unknown exec error
> >
> >> Strange, perhaps you ran into a bug here. You can open a bugzilla
> >> with hb_report.
> >>
> >> Anyway, you should fix the timeout issue.
> >
> > I know, but what should i do to resolve this issue?
> >
> > My config entry for jboss is:
> >
> > primitive p_jboss_ocf ocf:heartbeat:jboss \
> >     params java_home="/usr/lib64/jvm/java" jboss_home="/usr/share/jboss" \
> >         jboss_pstring="java -Dprogram.name=run.sh" jboss_stop_timeout="30" \
> >     op start interval="0" timeout="240s" \
> >     op stop interval="0" timeout="240s" \
> >     op monitor interval="20s"
> >
> > In the worst case jboss needs at most 120s, and that's really the worst.
> >
> > Cheers,
> > Benjamin
> >
> >> Thanks,
> >>
> >> Dejan
> >>
> >>>>> And after some tests i have some no longer existing resources in the
> >>>>> Failed actions list. How can i delete them?
> >>>>>
> >>>>>> The same way.
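[For readers of the archive: after changing operation timeouts, the fail-count of INFINITY seen in the logs must still be cleared before the cluster will retry the start. A sketch of that sequence using the crm shell commands already shown in this thread; the 300s timeout is illustrative, not a recommendation:]

```shell
# Edit the resource definition in $EDITOR, e.g. raising the start
# timeout (illustrative value):
#   op start interval="0" timeout="300s"
crm configure edit p_jboss_ocf

# Clear the accumulated fail-count (INFINITY) on all nodes so the
# resource is allowed to run again:
crm resource cleanup p_jboss_ocf

# Verify that the failed actions are gone:
crm_mon -1
```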
> >>>>>
> >>>>>> Thanks,
> >>>>>
> >>>>>> Dejan
> >>>
> >>> Thx
> >>>
> >>> Benjamin
> >>>>>>>>
> >>>>>>>> Florian
> >>>>
> >>>> _______________________________________________
> >>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> >>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >>>>
> >>>> Project Home: http://www.clusterlabs.org
> >>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >>>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
> --
> Benjamin Knoth
> Max Planck Digital Library (MPDL)
> Systemadministration
> Amalienstrasse 33
> 80799 Munich, Germany
> http://www.mpdl.mpg.de
>
> Mail: kn...@mpdl.mpg.de
> Phone: +49 89 38602 202
> Fax: +49-89-38602-280
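[For readers of the archive: Dejan's ocf-tester suggestion above can be tried as follows. A sketch, assuming the resource-agents package is installed; the parameter values are those quoted in this thread, and the agent path may differ by distribution:]

```shell
# Exercise the jboss agent's start/monitor/stop cycle outside the
# cluster: -n names the test resource, -o passes each OCF parameter.
/usr/sbin/ocf-tester -n p_jboss_ocf \
    -o java_home="/usr/lib64/jvm/java" \
    -o jboss_home="/usr/share/jboss" \
    -o jboss_pstring="java -Dprogram.name=run.sh" \
    /usr/lib/ocf/resource.d/heartbeat/jboss
```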