Hi Andrew,

Thank you for your comment.
> The problem here is that attrd is supposed to be the authoritative
> source for this sort of data.

Yes, I understand.

> Additionally, you don't always want attrd reading from the status
> section - like after the cluster restarts.

Even so, it seems the problem could be solved by having attrd retrieve
the status section from the cib after attrd itself restarts. That is
what I meant by "method 2".

> > method 2) When attrd starts, attrd communicates with the cib and
> > receives fail-count.

> For failcount, the crmd could keep a hashtable of the current values
> which it could re-send to attrd if it detects a disconnection.
> But that might not be a generic-enough solution.

If the crmd can maintain such a hash table, that may be a good idea.
However, I have a feeling that the same problem would occur if the crmd
itself fails and is restarted.

> The chance that attrd dies _and_ there were relevant values for
> fail-count is pretty remote though... is this a real problem you've
> experienced or a theoretical one?

I did not fully understand your meaning here. Do you mean that the
fail-count also exists in attrd on the other node?

Best Regards,
Hideo Yamauchi.

--- Andrew Beekhof <and...@beekhof.net> wrote:

> On Mon, Sep 27, 2010 at 7:26 AM, <renayama19661...@ybb.ne.jp> wrote:
> > Hi,
> >
> > When I investigated another problem, I discovered this phenomenon.
> > If attrd fails but does not restart, the problem does not occur.
> >
> > Step 1) After startup, cause a monitor error in UmIPaddr twice.
> >
> > Online: [ srv01 srv02 ]
> >
> >  Resource Group: UMgroup01
> >      UmVIPcheck (ocf::heartbeat:Dummy):  Started srv01
> >      UmIPaddr   (ocf::heartbeat:Dummy2): Started srv01
> >
> > Migration summary:
> > * Node srv02:
> > * Node srv01:
> >    UmIPaddr: migration-threshold=10 fail-count=2
> >
> > Step 2) Kill attrd; attrd restarts.
> >
> > Online: [ srv01 srv02 ]
> >
> >  Resource Group: UMgroup01
> >      UmVIPcheck (ocf::heartbeat:Dummy):  Started srv01
> >      UmIPaddr   (ocf::heartbeat:Dummy2): Started srv01
> >
> > Migration summary:
> > * Node srv02:
> > * Node srv01:
> >    UmIPaddr: migration-threshold=10 fail-count=2
> >
> > Step 3) Cause a monitor error in UmIPaddr.
> >
> > Online: [ srv01 srv02 ]
> >
> >  Resource Group: UMgroup01
> >      UmVIPcheck (ocf::heartbeat:Dummy):  Started srv01
> >      UmIPaddr   (ocf::heartbeat:Dummy2): Started srv01
> >
> > Migration summary:
> > * Node srv02:
> > * Node srv01:
> >    UmIPaddr: migration-threshold=10 fail-count=1 -----> Fail-count
> > has returned to its initial value.
> >
> > The problem is that attrd loses the fail-count when it reboots (the
> > hash table is lost).
> > It is a serious problem that the failure count is reset.
> >
> > I can think of the following methods:
> >
> > method 1) attrd maintains fail-count in a file in the "/var/run"
> > directory and refers to it.
> >
> > method 2) When attrd starts, attrd communicates with the cib and
> > receives fail-count.
> >
> > Is there a better method?
> >
> > Please think about a solution to this problem.
>
> Hmmmm... a tricky one.
>
> The problem here is that attrd is supposed to be the authoritative
> source for this sort of data.
> Additionally, you don't always want attrd reading from the status
> section - like after the cluster restarts.
>
> For failcount, the crmd could keep a hashtable of the current values
> which it could re-send to attrd if it detects a disconnection.
> But that might not be a generic-enough solution.
>
> The chance that attrd dies _and_ there were relevant values for
> fail-count is pretty remote though... is this a real problem you've
> experienced or a theoretical one?
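P.S. To illustrate "method 2", here is a minimal sketch (not real attrd
code) of rebuilding the lost in-memory hash table from the cib's status
section at startup. The XML fragment is a simplified, hypothetical
approximation of the CIB status section, and `rebuild_attrd_table` is a
name I made up for illustration:

```python
# Sketch of "method 2": on restart, attrd could repopulate its hash
# table of transient attributes (e.g. fail-count-UmIPaddr) by reading
# the cib's status section instead of starting empty.
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment of a CIB status section.
CIB_STATUS = """
<status>
  <node_state uname="srv01">
    <transient_attributes id="srv01">
      <instance_attributes id="status-srv01">
        <nvpair id="status-srv01-fail-count-UmIPaddr"
                name="fail-count-UmIPaddr" value="2"/>
      </instance_attributes>
    </transient_attributes>
  </node_state>
  <node_state uname="srv02">
    <transient_attributes id="srv02">
      <instance_attributes id="status-srv02"/>
    </transient_attributes>
  </node_state>
</status>
"""

def rebuild_attrd_table(status_xml):
    """Return {node: {attr_name: value}} recovered from the status section."""
    table = {}
    root = ET.fromstring(status_xml)
    for node_state in root.iter("node_state"):
        attrs = {}
        for nvpair in node_state.iter("nvpair"):
            attrs[nvpair.get("name")] = nvpair.get("value")
        table[node_state.get("uname")] = attrs
    return table

if __name__ == "__main__":
    table = rebuild_attrd_table(CIB_STATUS)
    # The value that would otherwise be lost when attrd died:
    print(table["srv01"]["fail-count-UmIPaddr"])  # → 2
```

With something like this, the fail-count=2 from Step 1 would survive the
attrd restart in Step 2, so the next monitor error would raise it to 3
rather than resetting to 1.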
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker