Hi Andrew,

Thank you for your comment.
> The problem here is that attrd is supposed to be the authoritative
> source for this sort of data.

Yes, I understand.

> Additionally, you don't always want attrd reading from the status
> section - like after the cluster restarts.

Even so, it seems the problem could be solved by having attrd retrieve
the status section from the cib after attrd itself restarts. That is
what I meant by "method 2".

> > method 2) When attrd starts, attrd communicates with the cib and
> > receives fail-count.

> For failcount, the crmd could keep a hashtable of the current values
> which it could re-send to attrd if it detects a disconnection.
> But that might not be a generic-enough solution.

If the crmd can maintain such a hash table, that may be a good idea.
However, I have a feeling that the same problem would occur if the crmd
itself fails and is restarted.

> The chance that attrd dies _and_ there were relevant values for
> fail-count is pretty remote though... is this a real problem you've
> experienced or a theoretical one?

I did not fully understand your meaning here. Do you mean that the
fail-count also exists in attrd on the other node?

Best Regards,
Hideo Yamauchi.

--- Andrew Beekhof <and...@beekhof.net> wrote:

> On Mon, Sep 27, 2010 at 7:26 AM, <renayama19661...@ybb.ne.jp> wrote:
> > Hi,
> >
> > When I investigated another problem, I discovered this phenomenon.
> > If attrd fails but does not restart, the problem does not occur.
> >
> > Step 1) After startup, cause a monitor error in UmIPaddr twice.
> >
> > Online: [ srv01 srv02 ]
> >
> >  Resource Group: UMgroup01
> >      UmVIPcheck (ocf::heartbeat:Dummy):  Started srv01
> >      UmIPaddr   (ocf::heartbeat:Dummy2): Started srv01
> >
> > Migration summary:
> > * Node srv02:
> > * Node srv01:
> >    UmIPaddr: migration-threshold=10 fail-count=2
> >
> > Step 2) Kill attrd; attrd restarts.
> >
> > Online: [ srv01 srv02 ]
> >
> >  Resource Group: UMgroup01
> >      UmVIPcheck (ocf::heartbeat:Dummy):  Started srv01
> >      UmIPaddr   (ocf::heartbeat:Dummy2): Started srv01
> >
> > Migration summary:
> > * Node srv02:
> > * Node srv01:
> >    UmIPaddr: migration-threshold=10 fail-count=2
> >
> > Step 3) Cause a monitor error in UmIPaddr.
> >
> > Online: [ srv01 srv02 ]
> >
> >  Resource Group: UMgroup01
> >      UmVIPcheck (ocf::heartbeat:Dummy):  Started srv01
> >      UmIPaddr   (ocf::heartbeat:Dummy2): Started srv01
> >
> > Migration summary:
> > * Node srv02:
> > * Node srv01:
> >    UmIPaddr: migration-threshold=10 fail-count=1 -----> Fail-count
> > has returned to its initial value.
> >
> > The problem is that attrd loses the fail-count when it reboots (the
> > hash table is lost).
> > It is a serious problem that the failure count is reset.
> >
> > I can think of the following methods:
> >
> > method 1) attrd maintains fail-count in a file in the "/var/run"
> > directory and refers to it.
> >
> > method 2) When attrd starts, attrd communicates with the cib and
> > receives fail-count.
> >
> > Is there a better method?
> >
> > Please think about a solution to this problem.
>
> Hmmmm... a tricky one.
>
> The problem here is that attrd is supposed to be the authoritative
> source for this sort of data.
> Additionally, you don't always want attrd reading from the status
> section - like after the cluster restarts.
>
> For failcount, the crmd could keep a hashtable of the current values
> which it could re-send to attrd if it detects a disconnection.
> But that might not be a generic-enough solution.
>
> The chance that attrd dies _and_ there were relevant values for
> fail-count is pretty remote though... is this a real problem you've
> experienced or a theoretical one?
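P.S. To illustrate "method 2", here is a minimal sketch (not real attrd
code) of rebuilding the lost in-memory hash table from the cib's status
section at startup. The XML fragment is a simplified, hypothetical
approximation of the CIB status section, and `rebuild_attrd_table` is a
name I made up for illustration:

```python
# Sketch of "method 2": on restart, attrd could repopulate its hash
# table of transient attributes (e.g. fail-count-UmIPaddr) by reading
# the cib's status section instead of starting empty.
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment of a CIB status section.
CIB_STATUS = """
<status>
  <node_state uname="srv01">
    <transient_attributes id="srv01">
      <instance_attributes id="status-srv01">
        <nvpair id="status-srv01-fail-count-UmIPaddr"
                name="fail-count-UmIPaddr" value="2"/>
      </instance_attributes>
    </transient_attributes>
  </node_state>
  <node_state uname="srv02">
    <transient_attributes id="srv02">
      <instance_attributes id="status-srv02"/>
    </transient_attributes>
  </node_state>
</status>
"""

def rebuild_attrd_table(status_xml):
    """Return {node: {attr_name: value}} recovered from the status section."""
    table = {}
    root = ET.fromstring(status_xml)
    for node_state in root.iter("node_state"):
        attrs = {}
        for nvpair in node_state.iter("nvpair"):
            attrs[nvpair.get("name")] = nvpair.get("value")
        table[node_state.get("uname")] = attrs
    return table

if __name__ == "__main__":
    table = rebuild_attrd_table(CIB_STATUS)
    # The value that would otherwise be lost when attrd died:
    print(table["srv01"]["fail-count-UmIPaddr"])  # → 2
```

With something like this, the fail-count=2 from Step 1 would survive the
attrd restart in Step 2, so the next monitor error would raise it to 3
rather than resetting to 1.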
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker