I see you've created a bug for this; I'll follow up there.

On Wed, Sep 29, 2010 at 10:15 AM, <renayama19661...@ybb.ne.jp> wrote:
> Hi,
>
> We tested a resource failure during a cluster partition and the
> subsequent recovery of the cluster.
>
> However, when the cluster recovered, the fail-count disappeared.
> The Failed actions entries did not disappear.
>
> It occurred with the following procedure.
>
> Step1) We start Heartbeat.
>
> Step2) We isolate the cgl60 node using iptables.
>
> Step3) When a sfex resource has started on the cgl63 node, we remove the
> isolation of the cgl60 node.
>
> Step4) On the cgl63 node, the start of VIPcheck and sfex fails.
> * VIPcheck and sfex are resources that detect a double start.
>
> Step5) The fail-count is lost.
>
> ============
> Last updated: Thu Sep 16 17:26:10 2010
> Stack: Heartbeat
> Current DC: cgl63 (16349f88-0203-40d1-ba48-b7a5c4547a26) - partition with quorum
> Version: 1.0.9-74392a28b7f3 stable-1.0 tip
> 4 Nodes configured, unknown expected votes
> 10 Resources configured.
> ============
>
> Online: [ cgl60 cgl61 cgl62 cgl63 ]
>
>  Resource Group: UMgroup01
>      UmVIPcheck (ocf::heartbeat:VIPcheck): Started cgl60
>      UmIPaddr (ocf::heartbeat:IPaddr2): Started cgl60
>      UmDummy01 (ocf::pacemaker:Dummy): Started cgl60
>      UmDummy02 (ocf::pacemaker:Dummy): Started cgl60
>  Resource Group: OVDBgroup02-1
>      prmExPostgreSQLDB1 (ocf::heartbeat:sfex): Started cgl60
>      prmFsPostgreSQLDB1-1 (ocf::heartbeat:Filesystem): Started cgl60
>      prmFsPostgreSQLDB1-2 (ocf::heartbeat:Filesystem): Started cgl60
>      prmFsPostgreSQLDB1-3 (ocf::heartbeat:Filesystem): Started cgl60
>      prmIpPostgreSQLDB1 (ocf::heartbeat:IPaddr2): Started cgl60
>      prmApPostgreSQLDB1 (ocf::heartbeat:pgsql): Started cgl60
>  Resource Group: OVDBgroup02-2
>      prmExPostgreSQLDB2 (ocf::heartbeat:sfex): Started cgl61
>      prmFsPostgreSQLDB2-1 (ocf::heartbeat:Filesystem): Started cgl61
>      prmFsPostgreSQLDB2-2 (ocf::heartbeat:Filesystem): Started cgl61
>      prmFsPostgreSQLDB2-3 (ocf::heartbeat:Filesystem): Started cgl61
>      prmIpPostgreSQLDB2 (ocf::heartbeat:IPaddr2): Started cgl61
>      prmApPostgreSQLDB2 (ocf::heartbeat:pgsql): Started cgl61
>  Resource Group: OVDBgroup02-3
>      prmExPostgreSQLDB3 (ocf::heartbeat:sfex): Started cgl62
>      prmFsPostgreSQLDB3-1 (ocf::heartbeat:Filesystem): Started cgl62
>      prmFsPostgreSQLDB3-2 (ocf::heartbeat:Filesystem): Started cgl62
>      prmFsPostgreSQLDB3-3 (ocf::heartbeat:Filesystem): Started cgl62
>      prmIpPostgreSQLDB3 (ocf::heartbeat:IPaddr2): Started cgl62
>      prmApPostgreSQLDB3 (ocf::heartbeat:pgsql): Started cgl62
> (snip)
> Migration summary:
> * Node cgl60:
> * Node cgl61:
> * Node cgl62:
> * Node cgl63:   -----> Lost fail-count.....
>
> Failed actions:
>     prmExPostgreSQLDB1_start_0 (node=cgl63, call=46, rc=1, status=complete): unknown error
>     UmVIPcheck_start_0 (node=cgl63, call=45, rc=1, status=complete): unknown error
>
>
> Looking at the log, the start failure does appear to have been detected.
>
> Sep 16 17:25:29 cgl63 crmd: [9757]: info: process_lrm_event: LRM operation prmExPostgreSQLDB1_start_0 (call=46, rc=1, cib-update=91, confirmed=true) unknown error
>
> What is the cause of the disappearance of the fail-count?
>
> I have attached the log.
> * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2496
>
> Best Regards,
> Hideo Yamauchi.
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
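
In case it helps anyone checking for the same symptom, here is a minimal Python sketch that scans crm_mon-style text and lists resources that have a failed action on a node but no fail-count under that node in the Migration summary. It assumes the plain-text layout quoted above. It also assumes that a fail-count, when present, is printed under its node as a line of the form "<resource>: ... fail-count=<n>"; that form does not appear in the report, so treat it as an assumption. The function names and the SAMPLE text are hypothetical, for illustration only.

```python
import re

# Operation names recognised in "<resource>_<op>_<interval>" keys (assumed list).
_OPS = "start|stop|monitor|promote|demote|migrate_to|migrate_from|notify"
_FAILED_RE = re.compile(r"(\S+?)_(?:%s)_\d+ \(node=([^,\s)]+)" % _OPS)
_NODE_RE = re.compile(r"\* Node (\S+):")
# Assumed format of a per-resource fail-count line in the Migration summary.
_COUNT_RE = re.compile(r"(\S+): .*fail-count=(\S+)")


def parse_crm_mon(text):
    """Return (fail_counts, failed_actions) parsed from crm_mon-style text."""
    fail_counts = {}     # node -> {resource: fail-count as a string}
    failed_actions = {}  # node -> set of resources with a failed action
    section = None
    node = None
    for raw in text.splitlines():
        # Drop mail quote markers and indentation.
        line = raw.lstrip("> ").rstrip()
        if line.startswith("Migration summary:"):
            section = "migration"
            continue
        if line.startswith("Failed actions:"):
            section = "failed"
            continue
        if section == "migration":
            m = _NODE_RE.match(line)
            if m:
                node = m.group(1)
                fail_counts.setdefault(node, {})
                continue
            m = _COUNT_RE.match(line)
            if m and node is not None:
                fail_counts[node][m.group(1)] = m.group(2)
        elif section == "failed":
            m = _FAILED_RE.match(line)
            if m:
                failed_actions.setdefault(m.group(2), set()).add(m.group(1))
    return fail_counts, failed_actions


def missing_fail_counts(text):
    """List (node, resource) pairs with a failed action but no fail-count."""
    fail_counts, failed_actions = parse_crm_mon(text)
    missing = []
    for node in sorted(failed_actions):
        for rsc in sorted(failed_actions[node]):
            if rsc not in fail_counts.get(node, {}):
                missing.append((node, rsc))
    return missing


# Excerpt of the status output from the report (annotation removed).
SAMPLE = """\
Migration summary:
* Node cgl60:
* Node cgl61:
* Node cgl62:
* Node cgl63:

Failed actions:
    prmExPostgreSQLDB1_start_0 (node=cgl63, call=46, rc=1, status=complete): unknown error
    UmVIPcheck_start_0 (node=cgl63, call=45, rc=1, status=complete): unknown error
"""

if __name__ == "__main__":
    for node, rsc in missing_fail_counts(SAMPLE):
        print("%s: %s has a failed action but no fail-count" % (node, rsc))
```

On the excerpt above this reports both prmExPostgreSQLDB1 and UmVIPcheck on cgl63, which matches the inconsistency described in the report. It only reads status text; it says nothing about why the fail-count was cleared.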